# Install conda on your Colab environment

Skip this first cell if you are running the notebook in a local environment.

Running it locally is harmless: it has no effect outside Colab.

In [1]:
# Run this cell first - it will install a conda distribution (mamba)
# on your Drive then restart the kernel automatically 
# (don't worry about the crashing/restarting kernel messages)
# It HAS to be run FIRST every time you use the notebook in Colab

import os
import sys
RunningInCOLAB  = 'google.colab' in str(get_ipython())

if RunningInCOLAB:
    !pip install -q condacolab
    import condacolab
    condacolab.install()
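
The check `'google.colab' in str(get_ipython())` relies on `get_ipython` being defined, which is only the case inside IPython or Jupyter. As a minimal sketch, assuming that Colab preloads the `google.colab` package into `sys.modules`, the same test can be written so that it also runs in a plain Python interpreter. The helper name `running_in_colab` is hypothetical and not part of the repository.

```python
import sys


def running_in_colab() -> bool:
    """Return True if the google.colab package is already loaded.

    Assumption: the Colab runtime imports google.colab at start-up,
    so its presence in sys.modules signals a Colab session.
    """
    return "google.colab" in sys.modules


# Outside Colab this prints False and the install step can be skipped.
print(running_in_colab())
```

The result can be used in place of `RunningInCOLAB` in the cells of this notebook.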

# Set up your Colab or local environment, then import libraries

Run this cell in both cases (local or Colab).

In [1]:
import os
import sys
RunningInCOLAB  = 'google.colab' in str(get_ipython())

if RunningInCOLAB:
    
    # Check everything is fine with conda in Colab
    import condacolab
    condacolab.check()
    
    # Mount your drive environment in the colab runtime
    from google.colab import drive
    drive.mount('/content/drive',force_remount=True)
    
    # Change this variable to your path on Google Drive to which the repo has been cloned
    # If you followed the colab notebook 'repo_cloning.ipynb', nothing to change here
    repo_path_in_drive = '/content/drive/My Drive/Github/amn_release/'
    # Change directory to your repo cloned in your drive
    DIRECTORY = repo_path_in_drive
    os.chdir(repo_path_in_drive)
    # Install the packages listed in environment_amn_light.yml into the base environment
    !mamba env update -n base -f environment_amn_light.yml
    
    # This is one of the few Colab-compatible fonts
    font = 'Liberation Sans'
    
else:
    
    # In this case the local root of the repo is our working directory
    DIRECTORY = './'
    font = 'arial'

# Print the files in the working directory. Check that you see the same folders and files as on the repository web page.
print(os.listdir(DIRECTORY))

from Library.Build_Dataset import *

['README.md', 'Duplicate_Model.ipynb', 'Build_Model_Dense.ipynb', 'Dataset_experimental', 'Tutorial.ipynb', '.ipynb_checkpoints', '.git', 'Build_Model_RC.ipynb', 'environment_amn_light.yml', 'Build_Experimental.ipynb', 'Reservoir', 'Build_Model_MM.ipynb', 'Dataset_model', 'Figures.ipynb', 'Result', 'Figures', '.gitignore', 'LICENSE', 'Build_Model_ANN.ipynb', 'Build_Dataset.ipynb', 'Dataset_input', 'Functions', 'environment_amn.yml', 'Build_Model_AMN.ipynb', '.DS_Store']
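
Rather than comparing the printed listing by eye, the check can be automated. The sketch below is only an illustration: the helper `missing_entries` is hypothetical, and the expected names are the two folders the later cells read from and write to (`Dataset_input` and `Dataset_model`). The demonstration runs against a temporary directory so that it works anywhere.

```python
import os
import tempfile


def missing_entries(directory, expected):
    """Return the names in `expected` that are absent from `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected if name not in present]


# Demonstration on a throwaway directory containing only Dataset_input.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "Dataset_input"))
    demo_missing = missing_entries(root, ["Dataset_input", "Dataset_model"])

print(demo_missing)
```

In the notebook itself one would call `missing_entries(DIRECTORY, [...])` and stop if the returned list is not empty.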


# Generate Training Sets with FBA simulation or experimental data file



The examples below generate training sets from different metabolic models or from experimental data files.

We also provide a way to run cobrapy on user-supplied inputs, such as the reservoir computing predictions (see Figure 5 of the research paper).

## Examples of FBA simulation training set generation

The first example uses *E. coli* core, a model of the central metabolic network common to most *E. coli* strains.

In [5]:
# Generate training set with E coli core model with FBA simulation

# What you can change
seed = 10
np.random.seed(seed=seed)  # seed for random number generator
cobraname =  'e_coli_core_duplicated'  # name of the model 
mediumname = 'e_coli_core' # name of the medium file 
mediumbound = 'UB' # Exact bound (EB) or upper bound (UB)
method = 'pFBA' # FBA, pFBA or EXP
size  = 50 # training set size
reduce = False # Set at True if you want to reduce the model
# End of What you can change

# Run cobra
cobrafile  = DIRECTORY+'Dataset_input/'+cobraname
mediumfile = DIRECTORY+'Dataset_input/'+mediumname
parameter = TrainingSet(cobraname=cobrafile, 
                        mediumname=mediumfile, mediumbound=mediumbound, 
                        method=method,objective=[],
                        measure=[])
# Note: Leaving objective and measure as empty lists sets the default
# objective reaction of the SBML model as the objective reaction
# and the measure (Y) as this objective reaction.
parameter.get(sample_size=size)

# Saving file
trainingfile  = DIRECTORY+'Dataset_model/'+mediumname+'_'+parameter.mediumbound+'_'+str(size)
parameter.save(trainingfile, reduce=reduce)

# Verifying
parameter = TrainingSet()
parameter.load(trainingfile)
parameter.printout()

model file name: ./Dataset_model/e_coli_core_UB_50
reduced model: False
medium file name: ./Dataset_input/e_coli_core
medium bound: UB
list of reactions in objective: ['BIOMASS_Ecoli_core_w_GAM']
method: pFBA
trainingsize: 50
list of medium reactions: 20
list of medium levels: 20
list of medium values: 20
ratio of variable medium turned on: 0.5
list of measured reactions: 154
Stoichiometric matrix (72, 154)
Boundary matrix from reactions to medium: (20, 154)
Measurement matrix from reaction to measures: (154, 154)
Reaction to metabolite matrix: (72, 154)
Metabolite to reaction matrix: (154, 72)
Training set X: (50, 20)
Training set Y: (50, 154)
S_int matrix (67, 154)
S_ext matrix (154, 298)
Q matrix (154, 67)
P matrix (154, 154)
b_int vector (67,)
b_ext vector (50, 298)
Sb matrix (154, 72)
c vector (154,)
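
The name of the saved training file is assembled from the medium name, the bound type and, for this example, the sample size, which is why the printout reports `e_coli_core_UB_50`. As a small sketch of that naming convention, the hypothetical helper `training_path` below builds the same path with `os.path.join` instead of string concatenation. It is not part of the library.

```python
import os


def training_path(directory, mediumname, mediumbound, size=None):
    """Build the training-file path under Dataset_model.

    The size suffix is optional because some cells of this notebook
    save files without it.
    """
    parts = [mediumname, mediumbound]
    if size is not None:
        parts.append(str(size))
    return os.path.join(directory, "Dataset_model", "_".join(parts))


print(training_path("./", "e_coli_core", "UB", 50))
print(training_path("./", "iML1515", "UB"))
```

Using `os.path.join` avoids depending on whether `DIRECTORY` ends with a slash.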


The next example uses iML1515 together with an experimental file that guides the generation of the training set. Rather than drawing the medium at random from the usual 'mediumname' file, an 'expname' file containing all experimental media compositions selects which medium metabolites are turned on, so that the training set covers the biologically relevant flux distributions for these compositions. Note that we reduce the model in this next cell.
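
To make the selection step concrete, here is a toy sketch of how the next cell derives the variable medium for each experimental composition: a medium reaction is kept when it has more than one level and its experimental value is positive. All data below are made up for illustration, and the reaction identifier `EX_glc__D_e_i` is an assumption (only `EX_gal_e_i` appears in the outputs of this notebook).

```python
# Hypothetical toy inputs mirroring parameter.medium, parameter.levmed and X.
medium = ["EX_glc__D_e_i", "EX_gal_e_i", "EX_o2_e_i"]
levmed = [3, 3, 1]       # number of levels per medium reaction
X = [
    [1, 0, 1],           # composition 0: glucose and oxygen present
    [0, 1, 1],           # composition 1: galactose and oxygen present
]


def variable_medium(X, levmed, medium):
    """For each row of X, list the multi-level reactions that are turned on."""
    return [
        [medium[j] for j, value in enumerate(row) if levmed[j] > 1 and value > 0]
        for row in X
    ]


varmed_demo = variable_medium(X, levmed, medium)
print(varmed_demo)
```

Oxygen is never listed here because it has a single level, so it is not a variable medium component in this toy setup.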

In [3]:
# Generate training set with E coli iML1515 with FBA simulation 
# constrained by experimental file: metabolites in medium are not drawn at
# random but are the same as in the provided experimental training file
# This cell may take several hours to execute! Avoid running this in Colab
    
# What you can change
seed = 10
np.random.seed(seed=seed)  # seed for random number generator
cobraname =  'iML1515_duplicated' # name of the model 
mediumname = 'iML1515' # name of the medium file 
mediumbound = 'UB' # Exact bound (EB) or upper bound (UB)
expname = 'iML1515_EXP' # name of the experimental dataset for constraints
method = 'pFBA' # FBA, pFBA or EXP
size, size_i  = 110, 100 # expname training set size, training set size per item in expname
reduce = True # Set at True if you want to reduce the model
verbose = True
# End of What you can change

# Get X from experimental data set
cobrafile = DIRECTORY+'Dataset_input/'+cobraname
expfile  = DIRECTORY+'Dataset_input/'+expname
parameter = TrainingSet(cobraname=cobrafile, 
                        mediumname=expfile, 
                        mediumbound=mediumbound, 
                        mediumsize=38, 
                        method='EXP',verbose=False)
X = parameter.X.copy()

# Get other parameters from medium file
mediumfile = DIRECTORY+'Dataset_input/'+mediumname
parameter = TrainingSet(cobraname=cobrafile, 
                        mediumname=mediumfile, 
                        mediumbound=mediumbound, 
                        method=method, verbose=False)

# Create varmed, the list of variable medium reactions, based on the experimental file
varmed = {}
for i in range(X.shape[0]):
    varmed[i] = []
    for j in range(X.shape[1]):
        if parameter.levmed[j] > 1 and X[i,j] > 0:
            varmed[i].append(parameter.medium[j])
varmed = list(varmed.values())

# Get a Cobra training set constrained by varmed
for i in range(X.shape[0]): 
    parameter.get(sample_size=size_i, varmed=varmed[i], verbose=True) 

# Saving file
trainingfile  = DIRECTORY+'Dataset_model/'+mediumname+'_'+parameter.mediumbound
parameter.save(trainingfile, reduce=reduce)

# Verifying
parameter = TrainingSet()
parameter.load(trainingfile)
print(trainingfile)
parameter.printout()

sample: 0
pass (varmed, obj): ['EX_gal_e_i'] 0.09339486031638548
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.09339486031638464
sample: 1
pass (varmed, obj): ['EX_gal_e_i'] 0.09864436780341955
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.09864436780341955
sample: 2
pass (varmed, obj): ['EX_gal_e_i'] 0.1397443454891211
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.1397443454891211
sample: 3
pass (varmed, obj): ['EX_gal_e_i'] 0.10975066942808978
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.10975066942808978
sample: 4
pass (varmed, obj): ['EX_gal_e_i'] 0.16057328719817235
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.16057328719817235
sample: 5
pass (varmed, obj): ['EX_gal_e_i'] 0.16390591787162176
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.16390591787162176
sample: 6
pass (varmed, obj): ['EX_gal_e_i'] 0.1105838270964515
primal objectif = ['BIOMASS_Ec_iML1515_core_75p37M'] pFBA 0.1105838270964515
sam

## Examples of experimental or manual training set generation

This cell builds a training set with the same object (`parameter`) as the simulated training sets, but directly from the experimental data: Cobra is not run to generate the training set.

In [5]:
# Generate training set for E coli iML1515 experimental file 

# What you can change
seed = 10
np.random.seed(seed=seed)  # seed for random number generator
cobraname = 'iML1515_EXP'  # name of the model, here a reduced iML1515 model
mediumbound = 'UB' # a must, exact bounds unknown
mediumname = 'iML1515_EXP' # name of experimental file 
method    = 'EXP' # FBA, pFBA or EXP
reduce = False # Set at True if you want to reduce the model
# End of What you can change

# Get data
cobrafile = DIRECTORY+'Dataset_input/'+cobraname
mediumfile  = DIRECTORY+'Dataset_input/'+mediumname
parameter = TrainingSet(cobraname=cobrafile, 
                        mediumname=mediumfile, mediumbound=mediumbound, mediumsize=38, 
                        method=method,verbose=False)

# Saving file
trainingfile  = DIRECTORY+'Dataset_model/'+mediumname+'_'+parameter.mediumbound
parameter.save(trainingfile, reduce=reduce)

# Verifying
parameter = TrainingSet()
parameter.load(trainingfile)
parameter.printout()

model file name: ./Dataset_model/iML1515_EXP_reduced_UB
reduced model: False
medium file name: ./Dataset_input/iML1515_EXP_reduced
medium bound: UB
list of reactions in objective: ['BIOMASS_Ec_iML1515_core_75p37M']
method: EXP
trainingsize: 110
list of medium reactions: 38
list of medium levels: 0
list of medium values: 0
ratio of variable medium turned on: 0
list of measured reactions: 543
Stoichiometric matrix (1080, 543)
Boundary matrix from reactions to medium: (38, 543)
Measurement matrix from reaction to measures: (543, 543)
Reaction to metabolite matrix: (1080, 543)
Metabolite to reaction matrix: (543, 1080)
Training set X: (110, 38)
Training set Y: (110, 1)
S_int matrix (478, 543)
S_ext matrix (543, 2703)
Q matrix (543, 478)
P matrix (543, 543)
b_int vector (478,)
b_ext vector (110, 2703)
Sb matrix (543, 1080)
c vector (543,)


## Running Cobra on a provided dataset

This cell has a completely different purpose from the rest of the notebook: it runs Cobrapy with provided values as inputs. These inputs are extracted from Reservoir Computing; an example is given in the notebook `Build_Model_RC.ipynb`.
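
The cell that follows scales each medium input, adds a small floor to inputs that are close to zero, runs Cobra, and then scores the calculated objective against the provided one with R2. The sketch below reproduces only the arithmetic in plain Python, under the assumption that for a single output R2 is the usual 1 - SS_res / SS_tot. The helper names are hypothetical, and the five value pairs are the first rows of the output printed below the cell.

```python
def scaled_bound(x, scaler=1.0, floor=1.0e-4):
    """Scale a medium input and add a floor when the raw input is below it."""
    eps = floor if x < floor else 0.0
    return scaler * x + eps


def r2(y_true, y_pred):
    """Coefficient of determination for a single output."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# First five (provided, calculated) pairs from the printed output.
provided = [0.1696, 0.1340, 0.1886, 0.1990, 0.0720]
calculated = [0.2041, 0.1503, 0.1996, 0.1580, 0.1089]

print(scaled_bound(0.0), scaled_bound(0.5, scaler=2.0))
print(r2(provided, provided))    # identical values give a perfect score
print(r2(provided, calculated))  # experimental data score below 1
```

A perfect score is what one expects when the training set was itself generated by FBA; experimental values generally score lower.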

In [3]:
# This cell runs FBA on a provided training set and computes R2 between
# the provided objective and the calculated objective
# R2 = 1 when the training set was generated by FBA, but may differ from 1
# when the training set is an experimental one
# For an experimental training set, medium input fluxes can be scaled by a value

from sklearn.metrics import r2_score

# What you can change 
seed = 10
np.random.seed(seed=seed)  
cobraname = 'iML1515_EXP'  # name of the model 
mediumbound = 'UB' # a must, exact bounds unknown
# mediumname = 'iML1515_EXP' # name of experimental file, for out-of-the-box FBA
# mediumname = 'iML1515_UB_AMN_QP_RC_AMN_solution_for_Cobra_train' # for running Cobra with RC training points as inputs
mediumname = 'iML1515_UB_AMN_QP_RC_AMN_solution_for_Cobra_pred' # for running Cobra with RC predictions as inputs
method = 'EXP' # FBA, pFBA or EXP
# End of What you can change

# Get data
cobrafile =  DIRECTORY+'Dataset_input/'+cobraname
mediumfile = DIRECTORY+'Dataset_input/'+mediumname
parameter = TrainingSet(cobraname=cobrafile, 
                        mediumname=mediumfile, mediumbound=mediumbound, mediumsize=38, 
                        method=method,verbose=False)
# Only the last uncommented assignment is used; uncomment the one you need
# scaler_list = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0] # test different scalers
# scaler_list = [2.4] # best scaler for out-of-the-box FBA
scaler_list = [1] # for running Cobra with RC inputs, see mediumname

# regression cobra vs. true values
L = parameter.X.shape[0]
for scaler in scaler_list:
    Y = {}
    for i in range(L):
        inf = {r.id: 0 for r in parameter.model.reactions}
        for j in range(len(parameter.medium)):
            #print(j, parameter.medium[j],parameter.X[i,j], len(parameter.model.reactions))
            eps = 1.0e-4 if parameter.X[i,j] < 1.0e-4 else 0
            inf[parameter.medium[j]] = scaler * parameter.X[i,j] + eps
        out,Y[i] = run_cobra(parameter.model, parameter.objective, inf, method='pFBA', verbose=False)
        print("%d %.4f %.4f" % (i, parameter.Y[i], Y[i]))

    Y = list(Y.values())
    r2 = r2_score(parameter.Y[0:L], Y[0:L], multioutput='variance_weighted')
    print('scaler %.2f R2 %.4f ' % (scaler, r2))

0 0.1696 0.2041
1 0.1340 0.1503
2 0.1886 0.1996
3 0.1990 0.1580
4 0.0720 0.1089
5 0.0924 0.0839
6 0.0881 0.0852
7 0.0900 0.1311
8 0.1989 0.1737
9 0.1054 0.0839
10 0.2681 0.2577
11 0.1576 0.1124
12 0.1209 0.1699
13 0.2729 0.2433
14 0.2945 0.2854
15 0.2386 0.2434
16 0.2531 0.2012
17 0.2606 0.2869
18 0.2816 0.2001
19 0.1351 0.1724
20 0.1449 0.1716
21 0.2409 0.2750
22 0.2437 0.2655
23 0.1059 0.1018
24 0.1082 0.1249
25 0.2451 0.3241
26 0.3099 0.2835
27 0.2000 0.2761
28 0.2077 0.2065
29 0.3837 0.2881
30 0.2247 0.2361
31 0.3520 0.3883
32 0.2255 0.2113
33 0.1340 0.1275
34 0.2397 0.1937
35 0.3654 0.3591
36 0.1863 0.2036
37 0.1612 0.3016
38 0.3442 0.3725
39 0.2964 0.3434
40 0.4135 0.4223
41 0.2561 0.2930
42 0.3949 0.4262
43 0.4205 0.3552
44 0.3050 0.3002
45 0.2315 0.1863
46 0.2708 0.2802
47 0.3351 0.2890
48 0.2785 0.2193
49 0.0765 0.1460
50 0.0704 0.0852
51 0.2095 0.2374
52 0.1135 0.1447
53 0.2193 0.2378
54 0.3316 0.2954
55 0.1368 0.0852
56 0.1362 0.1624
57 0.1074 0.1310
58 0.2277 0.2304
59 0.20