# Functions

In [1]:
from IPython.display import display
from cobramod import __version__
print(__version__)
# From Escher:
# This option turns off the warning message if you leave or refresh this page
import escher
escher.rc['never_ask_before_quit'] = True

0.5.5-alpha.1


## Retrieving metabolic pathway information

CobraMod can obtain metabolic pathway information (metabolites, reactions or pathways) from various databases by using database-specific identifiers. It supports all databases from the [BioCyc collection](
https://biocyc.org/
), [Plant Metabolic Network (PMN)](
https://pmn.plantcyc.org/
), [the KEGG database](
https://www.genome.jp/kegg/
) and [the BiGG Models repository](
http://bigg.ucsd.edu/
). Call `cobramod.available_databases` to see all supported databases.

In [2]:
from cobramod import available_databases
                                         
available_databases

| Database | URL with identifier (bold) |
| --- | --- |
| BioCyc, sub-database ECOLI | https://biocyc.org/compound?orgid=ECOLI&id=**PPI** |
| Plant Metabolic Network, sub-database CORN | https://pmn.plantcyc.org/compound?orgid=CORN&id=**PPI** |
| KEGG | https://www.genome.jp/entry/**C00013** |
| BiGG Models Repository, universal model | http://bigg.ucsd.edu/universal/metabolites/**ppi** |

| Database | Abbreviation |
| --- | --- |
| BioCyc | META or identifier of sub-database, e.g. ECOLI, ARA, GCF_000010885 |
| Plant Metabolic Network | Prefix "pmn:" with the sub-database identifier, e.g. pmn:PLANT, pmn:ARA, pmn:CORN |
| KEGG | KEGG |
| BiGG Models Repository | BIGG |
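
The abbreviation rules listed above can be summarized in a few lines of code. The following is a minimal sketch only: the helper `describe_database` is hypothetical and not part of CobraMod. It merely restates the conventions from the table (a `pmn:` prefix for PMN, the fixed names `KEGG` and `BIGG`, and any other value treated as a BioCyc sub-database).

```python
def describe_database(abbreviation: str) -> str:
    """Return the database family implied by an abbreviation.

    Hypothetical helper for illustration; CobraMod does not expose it.
    """
    if abbreviation == "KEGG":
        return "KEGG"
    if abbreviation == "BIGG":
        return "BiGG Models Repository"
    if abbreviation.startswith("pmn:"):
        # Everything after the prefix names the PMN sub-database
        sub_database = abbreviation[len("pmn:"):]
        return f"Plant Metabolic Network, sub-database {sub_database}"
    # Assumption: any other value is read as a BioCyc sub-database
    return f"BioCyc, sub-database {abbreviation}"


for name in ("META", "ECOLI", "pmn:CORN", "KEGG", "BIGG"):
    print(f"{name:10s} -> {describe_database(name)}")
```

The sketch does not validate that a sub-database actually exists; it only shows how the abbreviation string is structured.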




The user can download the metabolic pathway information using the
`cobramod.get_data` function. In this example, we download information for seven metabolites from the
BioCyc sub-database for yeast (`YEAST`).


In [3]:
from cobramod import get_data
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
    "CPD-14074",
    "CPD-14075",
    "CPD-14076",
    "CPD-14553",
    "CPD-15317",
    "CPD-15322",
    "CPD-15323",
]

for metabolite in identifiers:
    get_data(
        directory=dir_data,
        identifier=metabolite,
        database="YEAST"
    )

The first argument in [cobramod.get_data()](
module/cobramod/index.html#cobramod.get_data) is the system path where
CobraMod stores the metabolic pathway information. CobraMod uses [pathlib](
https://docs.python.org/3/library/pathlib.html#pathlib.Path) for path
representation. The second argument is the original identifier used in the respective database. The last argument is the abbreviation of the database. In this example, we retrieve data from the BioCyc
sub-database `YEAST`.

CobraMod creates a directory with the name of the database and stores the
metabolic pathway information in it:

```
data
`-- YEAST
    |-- CPD-14074.xml
    |-- CPD-14075.xml
    |-- CPD-14076.xml
    |-- CPD-14553.xml
    |-- CPD-15317.xml
    |-- CPD-15322.xml
    `-- CPD-15323.xml
```
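
Based on the directory layout shown above, the location of a stored entry can be predicted from the directory, the database abbreviation and the identifier. The following is a minimal sketch under that assumption; `expected_cache_file` is a hypothetical helper, and the `.xml` extension is taken from the BioCyc example only (other databases may store their data in a different format).

```python
from pathlib import Path


def expected_cache_file(directory: Path, database: str, identifier: str) -> Path:
    """Build the path <directory>/<database>/<identifier>.xml.

    Hypothetical helper; mirrors the tree shown for the YEAST example.
    """
    return directory / database / f"{identifier}.xml"


dir_data = Path("data")
path = expected_cache_file(dir_data, "YEAST", "CPD-14074")
print(path.as_posix())  # data/YEAST/CPD-14074.xml
print(path.exists())    # True only if the entry was downloaded before
```

Such a check can be useful to verify which entries are already available locally before running a larger batch of downloads.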

## Converting stored-data into COBRApy objects

CobraMod can convert metabolic pathway information (metabolites, reactions, pathways) into COBRApy objects ([cobra.Reaction](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Reaction
) and [cobra.Metabolite](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Metabolite
)). It can thus be seamlessly integrated with a COBRApy workflow.

The function [cobramod.create_object()](
module/cobramod/index.html#cobramod.create_object) creates COBRApy objects from metabolic pathway data retrieved by [cobramod.get_data()](
module/cobramod/index.html#cobramod.get_data). If no pathway information was downloaded beforehand, [cobramod.create_object()](module/cobramod/index.html#cobramod.create_object) retrieves it automatically.

In this example, we convert the metabolite *2-Oxoglutarate* with the
KEGG identifier [C00026](
https://www.genome.jp/dbget-bin/www_bget?C00026) into a COBRApy object.
CobraMod automatically identifies the KEGG entry as a metabolite and converts it into the corresponding COBRApy Metabolite.

The first argument is the database-specific identifier (`C00026`) followed by
the database abbreviation (`KEGG`). The third argument is the path
representation for the directory of the metabolic pathway information. CobraMod
downloads the metabolic pathway information into this directory and will always
use the stored copy instead of downloading it again. The last argument is the
compartment of the metabolite (`c` for cytosol).

In [4]:
from cobramod import create_object
from pathlib import Path

# Path for the metabolic pathway information directory                                                                        
dir_data = Path.cwd().resolve().joinpath("data")

new_object = create_object(
    identifier="C00026",
    database="KEGG",
    directory=dir_data,
    compartment="c"
)
                                             
print(type(new_object))
new_object

<class 'cobra.core.metabolite.Metabolite'>


0,1
Metabolite identifier,C00026_c
Name,2-Oxoglutarate;
Memory address,0x07fed40fad6d0
Formula,C5H6O5
Compartment,c
In 0 reaction(s),


In the second example below, we convert the reaction [RXN-11501](
https://pmn.plantcyc.org/CORN/NEW-IMAGE?object=RXN-11501) from the PMN sub-database CORN into a COBRApy reaction. The
first argument is the database-specific identifier (`RXN-11501`) followed by
the database identifier (`pmn:CORN`). CobraMod automatically takes
reversibility and gene information from the database entry and adds it to the reaction object.

In [5]:
new_object = create_object(
    identifier="RXN-11501",
    database="pmn:CORN",
    directory=dir_data,
    compartment="c"
)
                                             
print(type(new_object))
display(new_object)
new_object.genes

<class 'cobra.core.reaction.Reaction'>


0,1
Reaction identifier,RXN_11501_c
Name,alkaline α- galactosidase
Memory address,0x07fed40f99050
Stoichiometry,CPD_170_c + WATER_c --> ALPHA_D_GALACTOSE_c + CPD_1099_c  stachyose + H2O --> alpha-D-galactopyranose + raffinose
GPR,ZM00001D031300 or ZM00001D031303 or ZM00001D003279
Lower bound,0
Upper bound,1000


frozenset({<Gene ZM00001D003279 at 0x7fed40faddd0>,
           <Gene ZM00001D031300 at 0x7fed40fadd10>,
           <Gene ZM00001D031303 at 0x7fed40fad350>})


## Adding metabolites

The function [cobramod.add_metabolites()](
module/cobramod/index.html#cobramod.add_metabolites)
extends the COBRApy function [model.add_metabolites()](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/model/index.html#cobra.core.model.Model.add_metabolites
) and can be used with a simple syntax. It can utilize a single string, a list of strings, a file path or a COBRApy metabolite object. In the next
examples we showcase these options. We use the *E. coli* core model from COBRApy as test model. This core model can be found under `cobramod.test.textbook`.

If the argument `obj` is used as a string, it can be the database-specific identifier of the respective metabolite and its compartment or a user-defined metabolite. It uses the following syntax:

------

**SYNTAX**  

To add metabolite information from a database:

    database-specific_identifier, compartment

To add user-curated metabolites:

    user-curated_identifier, name, compartment, chemical_formula, molecular_charge

------
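
To make the two forms of the syntax concrete, the sketch below splits such a string into its fields. It is an illustration only: `MetaboliteSpec` and `parse_metabolite_line` are hypothetical names, and the field-counting logic is an assumption based on the syntax above, not CobraMod's internal parser.

```python
from typing import NamedTuple, Optional


class MetaboliteSpec(NamedTuple):
    """Fields of one metabolite entry (hypothetical container)."""
    identifier: str
    compartment: str
    name: Optional[str] = None
    formula: Optional[str] = None
    charge: Optional[float] = None


def parse_metabolite_line(line: str) -> MetaboliteSpec:
    """Split a metabolite string into its comma-separated fields."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) == 2:
        # Database form: identifier, compartment
        identifier, compartment = fields
        return MetaboliteSpec(identifier, compartment)
    if len(fields) == 5:
        # User-curated form: identifier, name, compartment, formula, charge
        identifier, name, compartment, formula, charge = fields
        return MetaboliteSpec(identifier, compartment, name, formula, float(charge))
    raise ValueError(f"Expected 2 or 5 comma-separated fields, got {len(fields)}")


print(parse_metabolite_line("MET, c"))
print(parse_metabolite_line("MALTOSE_c, MALTOSE[c], c, C12H22O11, 1"))
```

Note that the position of the compartment differs between the two forms: it is the second field for database entries and the third field for user-curated metabolites.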

In the first example, we add the metabolite *L-methionine* with the MetaCyc
identifier [MET](
https://metacyc.org/compound?orgid=META&id=MET) to the test model. The first
argument is the model to extend. The argument `obj` contains the identifier
`MET` and the compartment `c`. The next argument is the database identifier
(`META`) and the last argument is the directory where CobraMod stores the metabolite information.


In [6]:
from cobramod import add_metabolites
from cobramod.test import textbook_biocyc
from pathlib import Path

# Path for the metabolic pathway information directory                                                                        
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_biocyc.copy()

add_metabolites(
    model=test_model,
    obj="MET, c",
    database="META",
    directory=dir_data,
)
print(type(test_model.metabolites.get_by_id("MET_c")))
test_model.metabolites.get_by_id("MET_c")

<class 'cobra.core.metabolite.Metabolite'>


0,1
Metabolite identifier,MET_c
Name,L-methionine
Memory address,0x07fed4089b790
Formula,C5H11N1O2S1
Compartment,c
In 0 reaction(s),


In the second example, we add two metabolites ([methionine](
https://metacyc.org/compound?orgid=META&id=MET) and [sucrose](
https://metacyc.org/compound?orgid=META&id=SUCROSE
)) from MetaCyc. In the argument `obj` we define a list with the database-specific metabolite identifiers and the respective compartments.
The rest of the arguments remain the same as in the previous example.
CobraMod skips metabolites that are already in the model and prints
a warning. The metabolite information contains cross-references to multiple
database entries. If one of these entries is found in the model, CobraMod uses it
instead of creating a new COBRApy metabolite.

In [7]:
add_metabolites(
    model=test_model,
    obj=["MET, c", "SUCROSE, c"],
    database="META",
    directory=dir_data,
)
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("MET_c"))  
test_model.metabolites.get_by_id("SUCROSE_c")



0,1
Metabolite identifier,MET_c
Name,L-methionine
Memory address,0x07fed4089b790
Formula,C5H11N1O2S1
Compartment,c
In 0 reaction(s),


0,1
Metabolite identifier,SUCROSE_c
Name,sucrose
Memory address,0x07fed4089d890
Formula,C12H22O11
Compartment,c
In 0 reaction(s),


In the third example, we use a text file to add metabolites to the test
model. We use the file *metabolites.txt* in the `data` directory with the following content:

    SUCROSE, c  
    MET, c  
    MALTOSE_c, MALTOSE[c], c, C12H22O11, 1

In this example, CobraMod downloads the first two metabolites from MetaCyc, while `MALTOSE_c` is a user-defined metabolite. The user can specify the path to this file in the `obj` argument. The remaining arguments are the same as in the previous examples. We added two print statements to show that CobraMod adds the metabolites to the model.

In [8]:
# Path for the metabolic pathway information directory
dir_data = Path.cwd().resolve().joinpath("data")
# This is our file
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()

print(f'Number of metabolites prior addition: {len(test_model.metabolites)}')
# Using CobraMod
add_metabolites(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="META",
)
print(f'Number of metabolites after addition: {len(test_model.metabolites)}')
# Show metabolites in jupyter
display(test_model.metabolites.get_by_id("MET_c"))
display(test_model.metabolites.get_by_id("SUCROSE_c"))
test_model.metabolites.get_by_id("MALTOSE_c")

Number of metabolites prior addition: 72
Number of metabolites after addition: 75


0,1
Metabolite identifier,MET_c
Name,L-methionine
Memory address,0x07fed408a2ed0
Formula,C5H11N1O2S1
Compartment,c
In 0 reaction(s),


0,1
Metabolite identifier,SUCROSE_c
Name,sucrose
Memory address,0x07fed407cc810
Formula,C12H22O11
Compartment,c
In 0 reaction(s),


0,1
Metabolite identifier,MALTOSE_c
Name,MALTOSE[c]
Memory address,0x07fed40f5c6d0
Formula,C12H22O11
Compartment,c
In 0 reaction(s),
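
The input file used in this example can also be created programmatically. The following is a minimal sketch that writes the three lines with `pathlib`; the temporary directory is an assumption made to keep the sketch self-contained and stands in for the `data` directory of this tutorial.

```python
import tempfile
from pathlib import Path

# One metabolite per line, using the syntax described above
lines = [
    "SUCROSE, c",
    "MET, c",
    "MALTOSE_c, MALTOSE[c], c, C12H22O11, 1",
]

# A temporary directory replaces the tutorial's `data` directory here
with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp).joinpath("metabolites.txt")
    file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Read the file back to confirm its content
    written = file.read_text(encoding="utf-8").splitlines()

print(written)
```

In a real workflow, the resulting path would be passed to the `obj` argument as shown in the cell above.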


Since [cobramod.add_metabolites()](
module/cobramod/index.html#cobramod.add_metabolites)
is an extension of the COBRApy function
[model.add_metabolites()](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/model/index.html#cobra.core.model.Model.add_metabolites),
the user can also pass COBRApy metabolites. In the following example, we use a variation of the test model (`textbook_biocyc`) which uses
BioCyc metabolite identifiers. We copy a COBRApy metabolite from
the original test model (`textbook`) and add it to the BioCyc test model.

In [9]:
from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc
                        
# Copying Metabolite from original model
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")
# Using a copy
test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj=metabolite
)
                                                               
test_model.metabolites.get_by_id("xu5p__D_c")

0,1
Metabolite identifier,xu5p__D_c
Name,D-Xylulose 5-phosphate
Memory address,0x07fed40d5e390
Formula,C5H9O8P
Compartment,c
In 3 reaction(s),"RPE, TKT2, TKT1"


If CobraMod detects large molecules (e.g. enzymes) or if the metabolite information does not include a chemical formula, the user receives a warning. In this example, we use the enzyme with the MetaCyc identifier `Red-NADPH-Hemoprotein-Reductases` and add it to the test model. CobraMod raises a warning due to the missing chemical formula.

In [10]:
# Using a copy
test_model = textbook.copy()

add_metabolites(
    model=test_model,
    obj="Red-NADPH-Hemoprotein-Reductases, c",
    directory=dir_data,
    database="META",
)
test_model.metabolites.get_by_id("Red_NADPH_Hemoprotein_Reductases_c")

  warn(msg)


0,1
Metabolite identifier,Red_NADPH_Hemoprotein_Reductases_c
Name,Red-NADPH-Hemoprotein-Reductases
Memory address,0x07fed4076c410
Formula,X
Compartment,c
In 0 reaction(s),


----------------

**NOTES**

- CobraMod replaces hyphens (`-`) with underscores (`_`) in the identifiers when
creating COBRApy metabolites.
- When adding several metabolites, the user can only specify one database identifier per function call. To use two databases, call the function
twice.

----------------
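
The identifier convention from the first note can be reproduced in one line. The sketch below is an assumption derived from the outputs shown in this section (for instance `Red-NADPH-Hemoprotein-Reductases` in compartment `c` becomes `Red_NADPH_Hemoprotein_Reductases_c`); `to_cobra_id` is a hypothetical helper and not a CobraMod function.

```python
def to_cobra_id(identifier: str, compartment: str) -> str:
    """Predict the COBRApy identifier for a database entry.

    Hypothetical helper: replaces hyphens with underscores and
    appends the compartment as a suffix.
    """
    return f"{identifier.replace('-', '_')}_{compartment}"


print(to_cobra_id("Red-NADPH-Hemoprotein-Reductases", "c"))
print(to_cobra_id("MET", "c"))
```

This is handy for calls such as `model.metabolites.get_by_id(...)`, where the converted identifier is needed. User-curated entries such as `MALTOSE_c` already carry their final identifier, so the sketch applies to database identifiers only.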

## Adding reactions

The function [cobramod.add_reactions()](
module/cobramod/index.html#cobramod.add_reactions) extends the COBRApy
function [model.add_reactions()](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html?highlight=optimize#cobra.Model.add_reactions
) and can be used with a simple syntax. It can utilize a single string, a
list of strings, a file path or a COBRApy reaction object. In the examples we
showcase these options. Again, we use the *E. coli* core model from COBRApy as test model.
 
If the argument `obj` is used as a string, it can be the
database-specific identifier of the respective reaction and its
compartment or a user-curated reaction. It uses the following syntax:

--------

**SYNTAX**  

To add reaction information from a database:

    database-specific_identifier, compartment

To add user-curated reactions, the user should write the identifier and name of the reaction following the [COBRApy reaction string syntax](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/reaction/index.html#cobra.core.reaction.Reaction.build_reaction_from_string
):

    user-curated_identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2

Metabolites must contain a suffix which specifies the compartment. This is given by an underscore (`_`) followed by the compartment abbreviation. In this case, we create a transport reaction for oxygen between the extracellular compartment
(`e`) and the cytosol (`c`):

    TRANS_O2_ec, Oxygen Transport | 2 OXYGEN_MOLECULE_e <-> 2 OXYGEN_MOLECULE_c

-------
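
The two forms of the reaction syntax differ by the presence of the `|` separator. The following sketch shows how such a string can be taken apart. It is illustrative only: `parse_reaction_line` is a hypothetical function, and the splitting rules are assumptions based on the syntax above rather than CobraMod's actual parser.

```python
from typing import Dict


def parse_reaction_line(line: str) -> Dict[str, str]:
    """Split a reaction string into its parts (hypothetical helper)."""
    header, separator, equation = line.partition("|")
    if not separator:
        # Database form: identifier, compartment
        identifier, compartment = (field.strip() for field in header.split(","))
        return {"identifier": identifier, "compartment": compartment}
    # User-curated form: identifier, name | reaction string
    identifier, name = (field.strip() for field in header.split(",", 1))
    return {"identifier": identifier, "name": name, "equation": equation.strip()}


print(parse_reaction_line("R04382, c"))
print(parse_reaction_line(
    "C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e"
))
```

The part after the `|` is left untouched in this sketch, since it follows the COBRApy reaction string syntax linked above.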

In the first example, we add the KEGG reaction [R04382](https://www.kegg.jp/dbget-bin/www_bget?rn:R04382) to the test model. The first argument is the model to extend. The `obj` argument uses the identifier `R04382` and the compartment `c`. The next argument is the database identifier (`KEGG`), followed by the directory where CobraMod stores and reads the metabolic pathway information. The argument `genome` is a KEGG-specific argument. Please read the notes below for more information about it.

CobraMod parses the reaction information for gene identifiers and automatically adds them to the COBRApy reaction. In this example, CobraMod creates the gene `c0319` and adds it to the reaction.

In [11]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                           
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
    genome="ecc"
)
                                                           
display(test_model.reactions.get_by_id("R04382_c"))
print(test_model.reactions.get_by_id("R04382_c").genes)

0,1
Reaction identifier,R04382_c
Name,4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address,0x07fed36085f50
Stoichiometry,C06118_c <=> 2.0 C04053_c  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 2.0 5-Dehydro-4-deoxy-D-glucuronate;
GPR,c0319
Lower bound,-1000
Upper bound,1000


frozenset({<Gene c0319 at 0x7fed40a568d0>})


In the second example, we add two reactions ([R04382](
  https://www.kegg.jp/entry/R04382
) and [R02736](
  https://www.kegg.jp/entry/R02736
 )) from KEGG. In the argument
`obj` we pass a list with the database-specific identifiers and their compartments. The
rest of the arguments remain the same as in the previous example. CobraMod skips
reactions that are already included in the model and shows
a warning. The reaction information contains cross-references to multiple database entries. If one of these entries is found in the model, CobraMod uses it instead of creating a new COBRApy Reaction.

In [12]:
add_reactions(
    model=test_model,
    obj=["R04382, c", "R02736 ,c"],
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)
                                                            
display(test_model.reactions.get_by_id("R04382_c"))
test_model.reactions.get_by_id("R02736_c")



0,1
Reaction identifier,R04382_c
Name,4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address,0x07fed36085f50
Stoichiometry,C06118_c <=> 2.0 C04053_c  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 2.0 5-Dehydro-4-deoxy-D-glucuronate;
GPR,c0319
Lower bound,-1000
Upper bound,1000


0,1
Reaction identifier,R02736_c
Name,beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address,0x07fed40fa3890
Stoichiometry,"C00006_c + C01172_c --> C00005_c + C00080_c + C01236_c  Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate --> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone"
GPR,c2265
Lower bound,0
Upper bound,1000


In the following example, we use a text file to add reactions to the test model.
We use the file *reactions.txt* in the `data` directory with the following content:

    R04382, c  
    R02736, c  
    C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e

CobraMod downloads the first two reactions from KEGG, while `C06118_ce` is a
user-defined reaction.

The user can pass the path of this file in the `obj` argument to add
the reactions to the test model. The remaining arguments are the same as in the
previous examples. We added two print statements to show that CobraMod adds
the reactions to the model.

In [13]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                                     
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
# This is the file with text
file = dir_data.joinpath("reactions.txt")

print(f'Number of reactions prior addition: {len(test_model.reactions)}')
                                                                     
add_reactions(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Show in jupyter
display(test_model.reactions.get_by_id("R04382_c"))
display(test_model.reactions.get_by_id("R02736_c"))
test_model.reactions.get_by_id("C06118_ce")

Number of reactions prior addition: 95
Number of reactions after addition: 98


0,1
Reaction identifier,R04382_c
Name,4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address,0x07fed40ce9490
Stoichiometry,C06118_c <=> 2.0 C04053_c  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 2.0 5-Dehydro-4-deoxy-D-glucuronate;
GPR,c0319
Lower bound,-1000
Upper bound,1000


0,1
Reaction identifier,R02736_c
Name,beta-D-glucose-6-phosphate:NADP+ 1-oxoreductase
Memory address,0x07fed40ce9b10
Stoichiometry,"C00006_c + C01172_c --> C00005_c + C00080_c + C01236_c  Nicotinamide adenine dinucleotide phosphate + beta-D-Glucose 6-phosphate --> Nicotinamide adenine dinucleotide phosphate - reduced + H+ + 6-phospho-D-glucono-1,5-lactone"
GPR,c2265
Lower bound,0
Upper bound,1000


0,1
Reaction identifier,C06118_ce
Name,digalacturonate transport
Memory address,0x07fed406b5290
Stoichiometry,C06118_c <=> C06118_e  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate;
GPR,
Lower bound,-1000
Upper bound,1000


Since [cobramod.add_reactions()](
module/cobramod/index.html#cobramod.add_reactions
) is an extension of the original COBRApy function
[model.add_reactions()](https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html?highlight=optimize#cobra.Model.add_reactions), the user can also
pass COBRApy reactions. In this example, we use a variation of the test model
(`textbook_kegg`) which uses KEGG identifiers for its metabolites. We copy a
COBRApy Reaction from the original test model (`textbook`) and then add it to the KEGG test model.

In [14]:
from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path

# Using copy of test model
test_model = textbook_kegg.copy()
# Obtaining a reaction
reaction = textbook.reactions.get_by_id("ACALDt")
                                                                  
add_reactions(model=test_model, obj=reaction)

test_model.reactions.get_by_id("ACALDt")



0,1
Reaction identifier,ACALDt
Name,R acetaldehyde reversible - transport
Memory address,0x07fed406b5210
Stoichiometry,C00084_e <=> C00084_c  Acetaldehyde <=> Acetaldehyde
GPR,s0001
Lower bound,-1000.0
Upper bound,1000.0


By default, COBRApy ignores metabolites that appear on
both sides of a reaction equation. CobraMod identifies such reactions, assigns one of these metabolites to the extracellular compartment and raises a
warning, expecting the user to curate the reaction manually. In the following example, we
add a
[transport reaction for acetic acid](
https://biocyc.org/META/new-image?object=TRANS-RXN-455
) from the BioCyc sub-database `YEAST` to the test model.

In [15]:
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="TRANS-RXN-455, c",
    database="YEAST",
    directory=dir_data,
)
# Show in jupyter
test_model.reactions.get_by_id("TRANS_RXN_455_c")



0,1
Reaction identifier,TRANS_RXN_455_c
Name,acetic acid uptake
Memory address,0x07fed35eea5d0
Stoichiometry,CPD_24335_e --> CPD_24335_c  acetic+acid --> acetic+acid
GPR,G3O-32144
Lower bound,0
Upper bound,1000


---

**NOTES**

- CobraMod replaces hyphens (`-`) with underscores (`_`) in the identifiers when
creating COBRApy reactions.
- When adding several reactions, the user can only specify one database identifier per function call. To use two databases, call the function twice.
- CobraMod tries to identify reactions or metabolites that are already present
in the model. The metabolic pathway information contains cross-references to
multiple database entries. If one of these entries is found in the model,
CobraMod uses it instead of creating new COBRApy objects.
- The argument `genome` can be used with the database `KEGG` and specifies the genome for which gene information will be retrieved. The complete list of all available genomes can be found [here](
https://www.genome.jp/kegg/catalog/org_list.html).
If no genome is specified, no gene information will be retrieved and
a warning is printed as shown below:

In [16]:
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
)
test_model.reactions.get_by_id("R04382_c")



0,1
Reaction identifier,R04382_c
Name,4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address,0x07fed35f32210
Stoichiometry,C06118_c <=> 2.0 C04053_c  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 2.0 5-Dehydro-4-deoxy-D-glucuronate;
GPR,
Lower bound,-1000
Upper bound,1000


---

## Adding pathways
 
CobraMod can add metabolic pathways to a given model. The function
[cobramod.add_pathway()](
module/cobramod/index.html#cobramod.add_pathway) can handle either a sequence
of database-specific reaction identifiers or a single pathway identifier as an
argument. It is recommended to add pathways one at a time. The user should curate
the model after adding a pathway if necessary. In the examples below we
showcase these two options. Again, we use the *E. coli* core model from COBRApy
as test model.

In the first example, we add the [acetoacetate degradation pathway](
https://biocyc.org/ECOLI/new-image?object=ACETOACETATE-DEG-PWY
) from the BioCyc sub-database `ECOLI` to the test model. This pathway contains two reactions and six metabolites.

<img src="https://websvc.biocyc.org/ECOLI/diagram-only?type=PATHWAY&object=ACETOACETATE-DEG-PWY&pfontsize=normal"/>



In the following example, the first argument is the model to extend. The `pathway` argument uses the database-specific identifier `ACETOACETATE-DEG-PWY` and the `database` argument uses the database identifier `ECOLI`. We define the compartment as `c` (cytosol), i.e. all metabolites and reactions will be assigned to the cytosol. With the argument `filename` the user can specify a file to which the summary of the changes is written. All COBRApy reactions included in the pathway are tested for their capacity to carry a non-zero flux. Read more about it in the
[non-zero flux test](#Non-zero-flux-test) section. The function shows a summary
of the additions and deletions made while extending the model with a pathway.


Additionally, displaying a [cobramod.Pathway](
module/cobramod/index.html#cobramod.Pathway
) object outputs a table with the main attributes of the object.


In [17]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")
                                   
# Using copy of test model
test_model = textbook.copy()

add_pathway(
    model=test_model,
    pathway="ACETOACETATE-DEG-PWY",
    database="ECOLI",
    compartment="c",
    filename="summary.txt",
    directory=dir_data,
)

# Display in jupyter
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")



Number of       new   | removed entities in
Reactions        2    |    0              
Metabolites      2    |    0              
Exchange         0    |    0              
Demand           0    |    0              
Sinks            1    |    0              
Genes            4    |    0              
Groups           1    |    0              



0,1
Pathway identifier,ACETOACETATE-DEG-PWY
Name,
Memory address,0x0140657564711120
Reactions involved,"ACETOACETYL_COA_TRANSFER_RXN_c, ACETYL_COA_ACETYLTRANSFER_RXN_c"
Genes involved,"EG12432, EG11669, EG11670, EG11672"
Visualization attributes,vertical = False color_negative = None color_positive = None color_quantile = False


Below is an example of the summary in the form of a text file. The first
part lists the names of all reactions, metabolites, exchange reactions,
auxiliary demand and sink reactions, genes, and groups in the model. The second part of the summary lists all elements that were added or removed by the function call `add_pathway()`.

In [18]:
%cat summary.txt

Summary:
Model identifier: e_coli_core
Model name:

Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'TPI', 'ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['13dpg_c', '2pg_c', '3pg_c', '6pgc_c', '6pgl_c', 'ac_c', 'ac_e', 'acald_c', 'acald_e', 'accoa_c', 'acon_C_c', 'actp_c', 'adp_c', 'akg_c', 'akg_e', 'amp_c', 'atp_c', 'cit_c', 'co2_c', 'co2_e', 'c

In the next example, we use a list of database-specific reaction identifiers as `pathway` argument. We use the database identifier `ECOLI` and the
compartment `c` (cytosol). Additionally, we define a pathway name by using the argument `group`. The user can also use this argument to merge pathways by using the same group names.

In [19]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
# Defining directory
dir_data = Path.cwd().resolve().joinpath("data")

test_model = textbook_biocyc.copy()
# Defining database-specific identifiers
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]
                                                                
print(f'Number of reaction prior addition: {len(test_model.reactions)}')
                                                                
add_pathway(
    model=test_model,
    pathway=sequence,
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    group="curated_pathway"
)

print(f'Number of reactions after addition: {len(test_model.reactions)}')
# Display in jupyter
test_model.groups.get_by_id("curated_pathway")

Number of reaction prior addition: 95
Number of       new   | removed entities in
Reactions        3    |    0              
Metabolites      2    |    0              
Exchange         0    |    0              
Demand           0    |    0              
Sinks            1    |    0              
Genes           11    |    0              
Groups           1    |    0              

Number of reactions after addition: 99




0,1
Pathway identifier,curated_pathway
Name,
Memory address,0x0140656786885008
Reactions involved,"PEPDEPHOS_RXN_c, PYRUVFORMLY_RXN_c, FHLMULTI_RXN_c"
Genes involved,"EG10803, EG10804, EG10701, G7627, EG10479, EG10477, EG10476, EG10478, EG10285, EG10475, EG10480"
Visualization attributes,vertical = False color_negative = None color_positive = None color_quantile = False


--------------------

**NOTES**

- A pathway is a set of COBRApy reactions. All the notes listed for `add_metabolites()` and `add_reactions()` also apply to pathways, i.e., the handling of duplicate elements, transport reactions and the argument `genome` for KEGG.

--------------------


## Non-zero flux test

When calling the function `add_pathway()`, CobraMod tests each reaction of the `Pathway` object for its capability to carry a non-zero flux, i.e., whether the involved metabolites can be turned over. Additionally, the user can test individual COBRApy reactions for their capability to carry a non-zero flux by using the function [cobramod.test_non_zero_flux()](
module/cobramod/index.html#cobramod.test_non_zero_flux
).


During the test, CobraMod selects one of the metabolites in the reaction and
creates a [demand reaction](
  https://cobrapy.readthedocs.io/en/latest/building_model.html#Exchanges,-Sinks-and-Demands
) to force a flux in the reaction that is tested. CobraMod checks that the
metabolites of the reaction can be turned over by counting the
number of reactions related to each metabolite. Each metabolite must participate in at least one reaction besides the one that is tested. If the
test initially fails, auxiliary [sink reactions](
  https://cobrapy.readthedocs.io/en/latest/building_model.html#Exchanges,-Sinks-and-Demands
) are added to the model and CobraMod raises an error suggesting manual intervention. Otherwise, if no message is printed, the test has passed and the
demand reaction is removed. The user can also use the argument `ignore_list` to
specify metabolites for which no auxiliary sink reaction should be created.
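
The turnover check described above can be illustrated with a small, self-contained sketch. It is not CobraMod's implementation: the function name `sink_candidates` and the dictionary-based toy model are hypothetical, chosen only to show the counting idea, namely that a metabolite appearing in no reaction other than the tested one cannot be turned over.

```python
from collections import Counter


def sink_candidates(reactions, tested, ignore_list=()):
    """Return metabolites of `tested` that occur in no other reaction.

    `reactions` maps a reaction identifier to the set of metabolite
    identifiers it involves. This is a toy stand-in for a model,
    not a COBRApy object.
    """
    # Count in how many reactions each metabolite participates
    usage = Counter(
        metabolite
        for metabolites in reactions.values()
        for metabolite in metabolites
    )
    # A count below 2 means the metabolite only appears in the tested reaction
    return sorted(
        metabolite
        for metabolite in reactions[tested]
        if usage[metabolite] < 2 and metabolite not in ignore_list
    )


# Hypothetical toy model: identifiers are invented for this illustration
toy_model = {
    "R_test": {"A_c", "B_c", "C_c"},
    "R_other": {"A_c", "D_c"},
    "R_export": {"C_c"},
}
blocked = sink_candidates(toy_model, "R_test")
print(blocked)  # ['B_c']
```

Here `B_c` is the only metabolite of `R_test` that appears nowhere else, so it is the only candidate for an auxiliary sink. Passing `ignore_list=["B_c"]` would return an empty list.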

In the following example, we test the glutathione synthase reaction
(`GLUTATHIONE-SYN-RXN`) in the plastid compartment (`p`) for its capability to
carry a non-zero flux. This reaction has the following equation:

    ATP_p + GLY_p + L_GAMMA_GLUTAMYLCYSTEINE_p --> ADP_p + GLUTATHIONE_p + PROTON_p + Pi_p

The test model has only reactions and metabolites in the cytosol compartment.
Because we test a reaction from another compartment, CobraMod creates auxiliary
sink reactions for those metabolites. To showcase the error, we ignore
the metabolite `PROTON_p` using the `ignore_list` argument, so CobraMod does not
create an auxiliary sink reaction for it.

We use the function `cobramod.add_reactions()` to add the respective reaction
and some transport reactions for its metabolites, excluding `PROTON_p`. Then we run
the function [test_non_zero_flux()](
module/cobramod/index.html#cobramod.test_non_zero_flux
) with the argument `ignore_list`. Because `PROTON_p` cannot be turned over, the
model cannot fulfill the demand reaction of the test. The model becomes
infeasible and the user sees an error suggesting manual curation.


In [20]:
from cobramod import test_non_zero_flux, add_reactions
from cobramod.test import textbook_biocyc

test_model = textbook_biocyc.copy()

add_reactions(
    model=test_model,
    # These reactions will break the model and raise errors
    obj=[
        "Redox_ADP_ATP_p, Redox_ADP_ATP_p | ADP_p <-> ATP_p",
        "TRANS_Pi_cp, Transport Phosphate_cp | Pi_c <-> Pi_p",
        "TRANS_GLUTATHIONE_cp, Transport GLUTATHIONE_cp | "
        + "GLUTATHIONE_c <-> GLUTATHIONE_p",
        "GLUTATHIONE-SYN-RXN, p",
    ],
    directory=dir_data,
    database="ECOLI",
    replacement={},
)
test_non_zero_flux(
    model=test_model,
    reaction="GLUTATHIONE_SYN_RXN_p",
    ignore_list=["PROTON_p"],
)


{'charge': -1.0, 'O': 3.0, 'P': 1.0}


NotInRangeError: The following reaction "GLUTATHIONE_SYN_RXN_p" failed the non-zero flux test multiple times. Flux value are below solver tolerance. Please curate manually by adding reactions that enable turnover of metabolites: GLY_p, L_GAMMA_GLUTAMYLCYSTEINE_p, PROTON_p, ATP_p, GLUTATHIONE_p, Pi_p, ADP_p

## Curation process

CobraMod automatically performs the following curation steps during the creation of COBRApy reaction and metabolite objects and CobraMod
pathway objects:


1. If CobraMod encounters large molecules or data objects with missing entries,
it prints a warning.
2. CobraMod tries to identify COBRApy reactions and metabolites that are
already in the model instead of creating them. The reaction and metabolite
information contains multiple cross-referenced database entries. If one of
these entries is found in the model, CobraMod uses the existing object instead
of creating a new COBRApy object.
3. CobraMod utilizes the COBRApy method [cobra.Reaction.check_mass_balance()](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/reaction/index.html#cobra.core.reaction.Reaction.check_mass_balance 
) and returns a warning if imbalances are found.
4. This package uses the reaction reversibility information provided with the
obtained reaction data. If reversibility information is missing, CobraMod raises a warning.
5. When CobraMod adds a pathway, every pathway reaction undergoes a *non-zero flux test*. If a reaction cannot carry a non-zero flux, CobraMod adds
auxiliary sink reactions to unblock the reaction and suggests manual curation steps based on these auxiliary modifications.
6. All information about downloads, the creation of objects, warnings and exceptions is written to the log file `debug.log`. As an example, below we show part of such a log file.

In [21]:
!head debug.log -n 20

2021-07-02 12:49:56,077 INFO Data for "CPD-14074" retrieved.
2021-07-02 12:49:56,082 INFO Data for "CPD-14075" retrieved.
2021-07-02 12:49:56,086 INFO Data for "CPD-14076" retrieved.
2021-07-02 12:49:56,092 INFO Data for "CPD-14553" retrieved.
2021-07-02 12:49:56,096 INFO Data for "CPD-15317" retrieved.
2021-07-02 12:49:56,102 INFO Data for "CPD-15322" retrieved.
2021-07-02 12:49:56,108 INFO Data for "CPD-15323" retrieved.
2021-07-02 12:49:56,114 INFO Data for "CPD-15326" retrieved.
2021-07-02 12:49:56,142 INFO Object 'C00026_c' identified as a metabolite
2021-07-02 12:49:56,616 INFO Metabolite "MET_c" was added to model.
2021-07-02 12:49:56,635 INFO Metabolite "SUCROSE_c" was added to model.
2021-07-02 12:49:56,688 INFO Metabolite "SUCROSE_c" was added to model.
2021-07-02 12:49:56,689 INFO Metabolite "MET_c" was added to model.
2021-07-02 12:49:56,689 INFO Metabolite "MALTOSE_c" was added to model.
2021-07-02 12:49:56,719 INFO Metabolite "xu5p__D_c" was added to model.
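
For longer log files, it can help to summarize the entries by level instead of reading them line by line. The following is a minimal sketch that assumes every entry follows the `<date> <time> <LEVEL> <message>` layout shown above. The function name `summarize_log` is hypothetical, and the `WARNING` line in the sample is invented for illustration; it is not taken from a real CobraMod log.

```python
from collections import Counter


def summarize_log(lines):
    """Count log entries per level and collect all non-INFO messages."""
    levels = Counter()
    flagged = []
    for line in lines:
        # Assumed layout: "<date> <time> <LEVEL> <message>"
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        _date, _time, level, message = parts
        levels[level] += 1
        if level != "INFO":
            flagged.append((level, message))
    return levels, flagged


sample = [
    '2021-07-02 12:49:56,077 INFO Data for "CPD-14074" retrieved.',
    '2021-07-02 12:49:56,616 INFO Metabolite "MET_c" was added to model.',
    # Hypothetical warning line, invented for this illustration
    "2021-07-02 12:49:57,000 WARNING Example warning message.",
]
levels, flagged = summarize_log(sample)
print(dict(levels))  # {'INFO': 2, 'WARNING': 1}
print(flagged)       # [('WARNING', 'Example warning message.')]
```

For a real log, the function could be applied to an open file handle, for example `with open("debug.log") as handle: summarize_log(handle)`. Multi-line entries such as tracebacks are not handled by this sketch.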

## Converting COBRApy Groups back to CobraMod Pathways 

The COBRApy function [cobra.io.write_sbml_model()](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/io/index.html#cobra.io.write_sbml_model
) writes COBRApy models to SBML files. If a model contains a [cobramod.Pathway](
module/cobramod/index.html#cobramod.Pathway
)
and the user calls the function `write_sbml_model()`, the pathway is saved
as a [COBRApy Group](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/group/index.html#cobra.core.group.Group
). If the user loads the written model using the COBRApy function
[cobra.io.read_sbml_model()](
  https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/io/index.html#cobra.io.read_sbml_model
), the loaded model contains COBRApy group objects instead of CobraMod pathway
objects.

To overcome this problem, we created the function [cobramod.model_convert()](
module/cobramod/core/pathway/index.html#cobramod.core.pathway.model_convert
) which converts the COBRApy group objects into CobraMod pathway objects.

In the following example, we create a `Group` and add four reactions to
it. We add this group to the model to simulate loading a model with groups.
We then call `model_convert()`, whose only argument is `model`, i.e., the model
that contains groups instead of pathway objects.

Finally, we retrieve the converted object to confirm that it is a CobraMod
pathway object.

In [22]:
from cobramod import model_convert
from cobramod.test import textbook_biocyc
from cobra.core.group import Group

test_model = textbook_biocyc.copy()
# Creation of group
test_group = Group(id="curated_pathway")
for reaction in ("GLCpts", "G6PDH2r", "PGL", "GND"):
    test_group.add_members([test_model.reactions.get_by_id(reaction)])
test_model.add_groups([test_group])

# Conversion to a Pathway
model_convert(model=test_model)
# Display to Jupyter
test_model.groups.get_by_id("curated_pathway")

0,1
Pathway identifier,curated_pathway
Name,
Memory address,0x0140657006540112
Reactions involved,"GLCpts, G6PDH2r, PGL, GND"
Genes involved,"b1818, b1621, b1101, b2416, b2417, b1819, b1817, b2415, b1852, b0767, b2029"
Visualization attributes,vertical = False color_negative = None color_positive = None color_quantile = False


## Visualization with Escher

CobraMod uses [Escher](https://escher.readthedocs.io/en/latest/) to visualize
pathways and fluxes. Each CobraMod pathway includes a visualization method
[Pathway.visualize()](
module/cobramod/core/pathway/index.html#cobramod.core.pathway.Pathway.visualize
) which automatically generates pathway maps of the respective set of
reactions. These pathway maps can be easily customized to visualize flux
distributions using default or user-defined colors and gradients (linear or
quantile normalized).

In the following example, we call the method `visualize()` without any arguments.

In [23]:
test_model.groups.get_by_id("curated_pathway").visualize()

Builder(never_ask_before_quit=True, reaction_scale={}, reaction_styles=['color', 'text'])

We can modify the orientation of our pathway by changing the attribute 
`vertical` to `True`.

In [24]:
test_model.groups.get_by_id("curated_pathway").vertical = True
test_model.groups.get_by_id("curated_pathway").visualize()

Builder(never_ask_before_quit=True, reaction_scale={}, reaction_styles=['color', 'text'])

The visualization method can also be called with the argument `solution_fluxes`.
This argument can be a dictionary with the fluxes of the reactions
or a [COBRApy Solution](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/solution/index.html#cobra.core.solution.Solution
). CobraMod assigns colors to the flux values based on the chosen normalization method. By default, smaller absolute fluxes (positive and negative) get a paler coloring, and zero flux values are colored grey. By default, the visualization method uses the minimum and maximum flux values
from the argument `solution_fluxes` as bounds, which receive the strongest coloring. CobraMod uses
these bounds to equally distribute the color gradient.

In the following example, we create a dictionary with fluxes and we pass it to the visualization method.

In [25]:
# For flux visualization of the group
solution = {
    "GLCpts": -2, "G6PDH2r": -2, "PGL": 0.4, "GND": 1
}
# Visualizing the pathway with the flux values
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1}, reactio…

We can change the colors of the fluxes by changing the attributes
`color_negative` and `color_positive`. In this example, we use red for
negative fluxes and green for positive fluxes.

In [26]:
# Modifying attributes
test_model.groups.get_by_id("curated_pathway").color_negative = "red"
test_model.groups.get_by_id("curated_pathway").color_positive = "green"
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1}, reactio…

The user can also set the bounds of the coloring by modifying the CobraMod
pathway attribute `color_min_max`. In this example, we change the bounds to -10
and 10. The fluxes are shown in pale colors because their values are not
near the bounds. This option is useful when the user wants to
compare a specific range of values, for instance when comparing only positive
or only negative values between multiple pathways.

In [27]:
test_model.groups.get_by_id("curated_pathway").color_min_max = [-10, 10]
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1}, reactio…
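
The effect of the bounds on the coloring can be sketched numerically. The function below is a hypothetical illustration, not CobraMod's or Escher's code. It assumes a simple linear scaling in which a flux equal to a bound gets intensity 1 (strongest color) and a flux of zero gets intensity 0 (grey).

```python
def linear_intensity(fluxes, bounds=None):
    """Scale each flux to an intensity in [0, 1] relative to the bounds.

    With `bounds=None`, the bounds are the smallest and largest flux
    values. Negative fluxes are scaled against the lower bound and
    positive fluxes against the upper bound.
    """
    lower, upper = bounds if bounds is not None else (
        min(fluxes.values()), max(fluxes.values())
    )
    intensity = {}
    for reaction, flux in fluxes.items():
        if flux < 0 and lower < 0:
            value = flux / lower
        elif flux > 0 and upper > 0:
            value = flux / upper
        else:
            value = 0.0
        # Clip fluxes outside the bounds and round for display
        intensity[reaction] = round(min(value, 1.0), 3)
    return intensity


solution = {"GLCpts": -2, "G6PDH2r": -2, "PGL": 0.4, "GND": 1}
print(linear_intensity(solution))
# {'GLCpts': 1.0, 'G6PDH2r': 1.0, 'PGL': 0.4, 'GND': 1.0}
print(linear_intensity(solution, bounds=(-10, 10)))
# {'GLCpts': 0.2, 'G6PDH2r': 0.2, 'PGL': 0.04, 'GND': 0.1}
```

Under this assumption, widening the bounds to -10 and 10 reduces every intensity by the same factor per sign, which matches the paler colors seen in the map.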

In the next example, we use the default behavior of the bounds by setting the 
`color_min_max` attribute to `None` and change the colors to orange for
negative flux values and light blue for positive flux values. Available colors
can be found [here](
https://www.w3schools.com/cssref/css_colors.asp
).

In [28]:
# New flux with high value
solution = {
    "GLCpts": -2, "G6PDH2r": -2, "PGL": 0.4, "GND": 1, "Other": 1000
}
# Using defaults
test_model.groups.get_by_id("curated_pathway").color_min_max = None

test_model.groups.get_by_id("curated_pathway").color_negative = "orange"
test_model.groups.get_by_id("curated_pathway").color_positive = "lightskyblue"
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1, 'Other':…

The user can change the color gradient to a quantile normalization. This means
that the color gradient is determined by the quantiles of the
`solution_fluxes` argument, rather than the maximum and minimum bounds.
The user can activate this option by changing the attribute `color_quantile` to
`True`. This is useful, for example, when the flux values vary by several orders of magnitude. For instance, in the previous example, we added a reaction to the dictionary with a flux value of 1000. As a result, the positive colors are quite pale. In the next example, we set the attribute `color_quantile` to `True`, and the colors become much stronger.

In [29]:
test_model.groups.get_by_id("curated_pathway").color_quantile = True
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1, 'Other':…
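
To see why a quantile-based gradient copes better with outliers, consider the following sketch. It is an assumption-laden illustration rather than CobraMod's actual normalization: the hypothetical function `quantile_intensity` ranks each flux among the fluxes of the same sign and uses the rank fraction as the color intensity.

```python
from bisect import bisect_right


def quantile_intensity(fluxes):
    """Map each flux to the fraction of same-sign fluxes that do not
    exceed it in absolute value (an empirical rank, from 0 to 1)."""
    positives = sorted(v for v in fluxes.values() if v > 0)
    negatives = sorted(-v for v in fluxes.values() if v < 0)
    intensity = {}
    for reaction, flux in fluxes.items():
        if flux == 0:
            intensity[reaction] = 0.0
            continue
        group = positives if flux > 0 else negatives
        # Number of same-sign values with magnitude <= this flux
        rank = bisect_right(group, abs(flux))
        intensity[reaction] = round(rank / len(group), 3)
    return intensity


solution = {"GLCpts": -2, "G6PDH2r": -2, "PGL": 0.4, "GND": 1, "Other": 1000}
# Linear scaling against the maximum makes GND almost colorless
linear_gnd = round(solution["GND"] / max(solution.values()), 3)
print(linear_gnd)  # 0.001
print(quantile_intensity(solution))
# {'GLCpts': 1.0, 'G6PDH2r': 1.0, 'PGL': 0.333, 'GND': 0.667, 'Other': 1.0}
```

With linear scaling, the single flux of 1000 compresses all other positive values towards zero. The rank-based mapping spreads them evenly, regardless of how large the outlier is.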

The user can display the `Pathway` object to get a summary of its current
attributes.

In [30]:
test_model.groups.get_by_id("curated_pathway")

0,1
Pathway identifier,curated_pathway
Name,
Memory address,0x0140657006540112
Reactions involved,"GLCpts, G6PDH2r, PGL, GND"
Genes involved,"b1818, b1621, b1101, b2416, b2417, b1819, b1817, b2415, b1852, b0767, b2029"
Visualization attributes,vertical = True color_negative = orange color_positive = lightskyblue color_quantile = True


CobraMod pathway maps are saved as HTML files with the default name
`pathway.html`. The user can specify the file name with the argument `filename`. In the following example, we name the file `curated_pathway.html`.


In [31]:
test_model.groups.get_by_id("curated_pathway").visualize(
    solution_fluxes=solution, filename="curated_pathway.html"
)

Builder(never_ask_before_quit=True, reaction_data={'GLCpts': -2, 'G6PDH2r': -2, 'PGL': 0.4, 'GND': 1, 'Other':…

We can verify that the file exists by using the `ls` command.


In [32]:
!ls curated_pathway.html

curated_pathway.html
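
The `ls` command is only available in Unix-like shells. As a platform-independent alternative, the same check can be sketched with `pathlib` from the Python standard library. The helper name `describe_file` is hypothetical and not part of CobraMod.

```python
from pathlib import Path


def describe_file(path):
    """Return a short report on whether `path` exists and how large it is."""
    target = Path(path)
    if not target.is_file():
        return f"{target.name}: not found"
    return f"{target.name}: {target.stat().st_size} bytes"


# Reports the size if the map was written to the working directory
print(describe_file("curated_pathway.html"))
```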
