# Functions

## Retrieving biochemical data
CobraMod can retrieve data for a list of identifiers. The package
uses [pathlib](https://docs.python.org/3/library/pathlib.html) to create system paths that work across
operating systems. A single loop is enough to obtain the data for every identifier in a list. This example uses data from MetaCyc:

In [1]:
from cobramod import get_data
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
    "CPD-14074",
    "CPD-14075",
    "CPD-14076",
    "CPD-14553",
    "CPD-15317",
    "CPD-15322",
    "CPD-15323",
    "CPD-15326"]

for single in identifiers:
    get_data(
        directory=dir_data,
        identifier=single,
        database="META"
    )



The first argument of [cobramod.get_data](
module/cobramod/index.html#cobramod.get_data) represents the system path where
the data will be stored. We used a [Path](
https://docs.python.org/3/library/pathlib.html#pathlib.Path) object from `pathlib`.

The next argument indicates the original identifier as found in the given database
(*META*). The last argument corresponds to the abbreviation of the database.

CobraMod will create a directory with the name of the database and store the
new data in it, e.g.:
```
data
`-- META
    |-- CPD-14074.xml
    |-- CPD-14075.xml
    |-- CPD-14076.xml
    |-- CPD-14553.xml
    |-- CPD-15317.xml
    |-- CPD-15322.xml
    |-- CPD-15323.xml
    `-- CPD-15326.xml
```
**NOTE**  
Check the section about [databases](#databases) for more information.
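
As a rough illustration of the layout above, the following sketch uses only `pathlib` to find out which identifiers already have a local file. The helper `is_cached` and the assumption that every entry is stored as `<identifier>.xml` are taken from the directory tree shown here; they are not part of CobraMod's API.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def is_cached(directory: Path, database: str, identifier: str) -> bool:
    """Return True if <directory>/<database>/<identifier>.xml exists.

    Hypothetical helper: it only mirrors the directory tree shown above.
    """
    return directory.joinpath(database, f"{identifier}.xml").is_file()


# Demonstration in a temporary directory so that no real data is touched.
with TemporaryDirectory() as tmp:
    dir_data = Path(tmp).joinpath("data")
    dir_data.joinpath("META").mkdir(parents=True)
    dir_data.joinpath("META", "CPD-14074.xml").write_text("<xml/>")

    cached = [
        identifier
        for identifier in ("CPD-14074", "CPD-14075")
        if is_cached(dir_data, "META", identifier)
    ]

print(cached)
```

Only the first identifier is reported, because only its file was created in the temporary directory.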




## Converting data to COBRApy objects

To simplify the creation of COBRApy objects users can call the function
[cobramod.create_object](module/cobramod/index.html#cobramod.create_object). 
This function will automatically use the stored data or download the 
biochemical information.

In [2]:
from cobramod import create_object
from pathlib import Path
                                             
dir_data = Path.cwd().resolve().joinpath("data")
                                             
new_object = create_object(
    identifier="C00026",
    directory=dir_data,
    database="KEGG",
    compartment="c"
)
                                             
type(new_object)

cobra.core.metabolite.Metabolite

In this example, the KEGG metabolite [C00026](
https://www.genome.jp/dbget-bin/www_bget?C00026) (2-Oxoglutarate) is identified
as a metabolite and is automatically built as a COBRApy object. 

## Adding metabolites

To add metabolites to a model, users can simply employ the function
[cobramod.add_metabolites](
module/cobramod/index.html#cobramod.add_metabolites). This function is an 
extension of the original COBRApy function. Users can now use a simple syntax
to create the metabolites.

------

**SYNTAX**  
To retrieve biochemical data from a database:

    identifier, compartment

Additionally, users are able to create self-curated metabolites:

    formatted_identifier, name, compartment, chemical_formula, molecular_charge

------

The function `add_metabolites` accepts a single string, a list of strings,
the path of a text file that contains such strings, or regular Metabolite objects.
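
To make the two forms of the syntax concrete, here is a minimal sketch that splits such a string into its fields. The function `parse_metabolite` is hypothetical and written only for this illustration; CobraMod's own parsing may differ.

```python
from typing import Dict, Optional


def parse_metabolite(line: str) -> Dict[str, Optional[str]]:
    """Split a metabolite string into its comma-separated fields.

    Hypothetical helper for illustration; not CobraMod's parser.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) == 2:
        # Database form: "identifier, compartment"
        identifier, compartment = fields
        name = formula = charge = None
    elif len(fields) == 5:
        # Self-curated form:
        # "formatted_identifier, name, compartment, formula, charge"
        identifier, name, compartment, formula, charge = fields
    else:
        raise ValueError(f"Expected 2 or 5 fields, got {len(fields)}: {line!r}")
    return {
        "identifier": identifier,
        "name": name,
        "compartment": compartment,
        "formula": formula,
        "charge": charge,
    }


from_database = parse_metabolite("MET, c")
self_curated = parse_metabolite("MALTOSE_c, MALTOSE[c], c, C12H22O11, 1")
print(from_database["identifier"], self_curated["formula"])
```

All values are kept as strings here; a real implementation would presumably convert the charge to a number.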

In [3]:
from cobramod import add_metabolites
from cobramod.test import textbook_biocyc
from pathlib import Path
                                                                        
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_biocyc.copy()

add_metabolites(
    model=test_model,
    obj="MET, c",
    directory=dir_data,
    database="META"
)
type(test_model.metabolites.get_by_id("MET_c"))

cobra.core.metabolite.Metabolite

The first argument is the model to extend. The argument `obj` is a string with
the identifier and the corresponding compartment.

Instead of a single string, users can also add a list with strings. In the
following example, a list with two identifiers is used for `obj`.

In [4]:
add_metabolites(
    model=test_model,
    obj=["MET, c", "SUCROSE, c"],
    directory=dir_data,
    database="META",
)
print(type(test_model.metabolites.get_by_id("MET_c")))
                                                               
test_model.metabolites.get_by_id("SUCROSE_c")

<class 'cobra.core.metabolite.Metabolite'>




0,1
Metabolite identifier,SUCROSE_c
Name,sucrose
Memory address,0x07fdaa6e12550
Formula,C12H22O11
Compartment,c
In 0 reaction(s),


If a metabolite is already present in the model, CobraMod will skip
its addition.

It is also possible to pass the path of a text file.
For instance, given the file *metabolites.txt* in the data directory
with the content:

    SUCROSE, c  
    MET, c  
    MALTOSE_c, MALTOSE[c], c, C12H22O11, 1

Users can define `obj` as the system path of that specific file:

In [5]:
# Defining where the data is loaded and saved
dir_data = Path.cwd().resolve().joinpath("data")
# This is our file
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()

print(f'Before: {len(test_model.metabolites)}')
                                                                     
add_metabolites(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="META",
)
print(f'After: {len(test_model.metabolites)}')

Before: 72
After: 75
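
The cell above expects *metabolites.txt* to exist already. A minimal sketch for creating such a file with `pathlib` and reading its non-empty lines back could look as follows. It writes to a temporary directory, and the variable names are only illustrative.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# The three entries used in the example above, one per line.
lines = [
    "SUCROSE, c",
    "MET, c",
    "MALTOSE_c, MALTOSE[c], c, C12H22O11, 1",
]

with TemporaryDirectory() as tmp:
    file = Path(tmp).joinpath("metabolites.txt")
    file.write_text("\n".join(lines) + "\n")

    # Read the file back and ignore blank lines.
    entries = [
        line.strip()
        for line in file.read_text().splitlines()
        if line.strip()
    ]

print(len(entries))
```

The three entries match the three metabolites added to the model in the example.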


Finally, regular COBRApy Metabolites or lists with them are also supported.

In [6]:
from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc
                        
# Copying Metabolite from original model
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")
# Using a copy
test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj=metabolite
)
                                                               
type(test_model.metabolites.get_by_id("xu5p__D_c"))

cobra.core.metabolite.Metabolite

----------------
**NOTES**

- Hyphens ("-") in identifiers will be replaced by underscores ("_").
- When CobraMod encounters large molecules such as enzymes, or when
the data is incomplete, users will receive a warning about the
properties of the metabolite:

In [7]:
test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj="Red-NADPH-Hemoprotein-Reductases, c",
    directory=dir_data,
    database="META",
)
test_model.metabolites.get_by_id("Red_NADPH_Hemoprotein_Reductases_c")

  warn(msg)


0,1
Metabolite identifier,Red_NADPH_Hemoprotein_Reductases_c
Name,Red-NADPH-Hemoprotein-Reductases
Memory address,0x07fdaa6c33750
Formula,X
Compartment,c
In 0 reaction(s),


----------------

## Adding reactions

CobraMod includes the function [cobramod.add_reactions](
module/cobramod/index.html#cobramod.add_reactions) that works similarly to
[cobramod.add_metabolites](
module/cobramod/index.html#cobramod.add_metabolites) and is an extension of
the original COBRApy method.

The function accepts a string, a list of strings, the path of a text file, or a Reaction object.

--------

**SYNTAX**  
Users can use the following syntax to retrieve reactions from a database:

    original_identifier, compartment

In case of user-curated reactions, the users can specify the identifier and the name of the reaction, following the COBRApy reaction string syntax:

    identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2

Each metabolite must include its compartment as a suffix, defined by an underscore
and a letter, e.g.:

    TRANS_H2O_ec, Oxygen Transport | 2 OXYGEN-MOLECULE_e <-> 2 OXYGEN_MOLECULE_c

-------
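
The user-curated form can be read as "header | equation". The sketch below splits such a string into identifier, name and signed coefficients (negative for the left side). Both functions are hypothetical helpers written for this illustration, and they only handle the `<->` arrow.

```python
from typing import Dict, Tuple


def parse_side(side: str, sign: int) -> Dict[str, float]:
    """Parse one side of the equation, e.g. '2 A_c + B_c'."""
    coefficients: Dict[str, float] = {}
    for term in side.split(" + "):
        parts = term.split()
        if len(parts) == 2:
            value, metabolite = float(parts[0]), parts[1]
        elif len(parts) == 1:
            # A missing coefficient is read as 1.
            value, metabolite = 1.0, parts[0]
        else:
            raise ValueError(f"Cannot parse term: {term!r}")
        coefficients[metabolite] = sign * value
    return coefficients


def parse_reaction(line: str) -> Tuple[str, str, Dict[str, float]]:
    """Split 'identifier, name | equation' into its three parts."""
    header, equation = line.split("|", maxsplit=1)
    identifier, name = (item.strip() for item in header.split(",", maxsplit=1))
    left, right = equation.split("<->")
    stoichiometry = parse_side(left, -1)
    # Assumes no metabolite identifier is repeated on both sides.
    stoichiometry.update(parse_side(right, 1))
    return identifier, name, stoichiometry


identifier, name, stoichiometry = parse_reaction(
    "C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e"
)
print(identifier, name, stoichiometry)
```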

In [8]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                           
dir_data = Path.cwd().resolve().joinpath("data")
# Using copy
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
    genome="ecc"
)
                                                           
type(test_model.reactions.get_by_id("R04382_c"))

cobra.core.reaction.Reaction

The first argument represents the model to extend. The argument `obj` takes
a string with the identifier of the reaction and the compartment where it
should take place. Then, the data directory and the database must be
passed.
Please read the notes below for more information about the argument `genome`.

Users can also use a list with multiple strings:

In [9]:
add_reactions(
    model=test_model,
    obj=["R04382, c", "R02736, c"],
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)
                                                            
type(test_model.reactions.get_by_id("R04382_c"))



cobra.core.reaction.Reaction

Another option is to use the path of a text file. Given the file
*reactions.txt* in the data directory with:

    R04382, c  
    R02736, c  
    C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e

Users can define `obj` as a system path for that specific file:

In [10]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                                     
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
# This is the file with text
file = dir_data.joinpath("reactions.txt")
                                                                     
print(f'Before: {len(test_model.reactions)}')
                                                                     
add_reactions(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

print(f'After: {len(test_model.reactions)}')

Before: 95
After: 98


Finally, regular COBRApy reactions can be added.

In [11]:
from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path
                                                                  
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
reaction = textbook.reactions.get_by_id("ACALDt")
                                                                  
add_reactions(model=test_model, obj=reaction, directory=dir_data)
type(test_model.reactions.get_by_id("ACALDt"))




cobra.core.reaction.Reaction

---
**NOTES**

- Hyphens ("-") in identifiers will be replaced by underscores ("_").
- If CobraMod identifies that a reaction or its metabolites are already present
in the model under another name, the objects already in the model will be
used instead. This safeguard prevents duplicates.
- By default, COBRApy ignores metabolites that appear on
both sides of a reaction equation. CobraMod identifies these reactions
and changes the compartment of one of the metabolites to the extracellular space, which ensures that no metabolites are ignored. A warning is
raised for the user, e.g.:


In [12]:
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="TRANS-RXN-455, c",
    database="YEAST",
    directory=dir_data,
)
# This reaction has 
test_model.reactions.get_by_id("TRANS_RXN_455_c")



0,1
Reaction identifier,TRANS_RXN_455_c
Name,acetic acid uptake
Memory address,0x07fdaa6ac0cd0
Stoichiometry,CPD_24335_e --> CPD_24335_c  acetic+acid --> acetic+acid
GPR,G3O-32144
Lower bound,0
Upper bound,1000


- The argument `genome` is a special argument that can be used in combination
with the database *KEGG*. It is responsible for selecting the genes
from this database. The complete list of available organisms is [here](
https://www.genome.jp/kegg/catalog/org_list.html).
If no argument is passed, no genes will be created and
a warning will appear, as shown below:

In [13]:
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
)
test_model.reactions.get_by_id("R04382_c")



0,1
Reaction identifier,R04382_c
Name,4-(4-deoxy-alpha-D-galact-4-enuronosyl)-D-galacturonate lyase
Memory address,0x07fdaa6b0f910
Stoichiometry,C06118_c <=> 2.0 C04053_c  4-(4-Deoxy-alpha-D-gluc-4-enuronosyl)-D-galacturonate; <=> 2.0 5-Dehydro-4-deoxy-D-glucuronate;
GPR,
Lower bound,-1000
Upper bound,1000
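
Returning to the note on metabolites that appear on both sides of a reaction, the idea can be sketched with plain Python. The function `split_transport` is a hypothetical helper; in particular, the choice to move the reactant side to the extracellular space is an assumption based on the `TRANS_RXN_455_c` output above, not a description of CobraMod's internals.

```python
from typing import List, Tuple


def split_transport(
    reactants: List[str], products: List[str], compartment: str
) -> Tuple[List[str], List[str]]:
    """Assign compartments so that shared metabolites do not cancel out.

    Hypothetical helper: metabolites found on both sides are placed in the
    extracellular space ("e") on the reactant side; everything else goes to
    the requested compartment. Hyphens are replaced by underscores.
    """
    shared = set(reactants) & set(products)

    def build(identifier: str, location: str) -> str:
        return f"{identifier.replace('-', '_')}_{location}"

    left = [build(m, "e" if m in shared else compartment) for m in reactants]
    right = [build(m, compartment) for m in products]
    return left, right


left, right = split_transport(["CPD-24335"], ["CPD-24335"], "c")
print(left, "-->", right)
```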


---
## Adding pathways

CobraMod can add complete pathways into the metabolic models. Users can use the
function [cobramod.add_pathway](
module/cobramod/index.html#cobramod.add_pathway) to either add a sequence of
reaction identifiers or a pathway identifier. For the first example, the
pathway [ACETOACETATE-DEG-PWY](
https://biocyc.org/ECOLI/new-image?object=ACETOACETATE-DEG-PWY)
(acetoacetate degradation) is used to extend the model:
<img src="https://websvc.biocyc.org/ECOLI/diagram-only?type=PATHWAY&object=ACETOACETATE-DEG-PWY&pfontsize=normal"/>

This specific pathway has two reactions with six metabolites.

In [14]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook

dir_data = Path.cwd().resolve().joinpath("data")
                                   
print(
    f'Before extension: {textbook.slim_optimize()}'
)
                                               
test_model = textbook.copy()
add_pathway(
    model=test_model,
    pathway="ACETOACETATE-DEG-PWY",
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    filename="summary.txt"
)
print(
    f'After extension: {textbook.slim_optimize()}'
)
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")



Before extension: 0.8739215069684307
Quantity of     new   | removed entities in
Reactions        2    |    0              
Metabolites      2    |    0              
Exchange         0    |    0              
Demand           0    |    0              
Sinks            1    |    0              
Genes            4    |    0              
Groups           1    |    0              

After extension: 0.8739215069684307


<Pathway ACETOACETATE-DEG-PWY at 0x7fdaa6903290>

The first argument is the model to extend. The argument `pathway` represents
the identifier of the pathway. The directory and the abbreviation of the
database must also be included. The argument `compartment` places all
reactions in that specific compartment. Users can use the argument `filename` to
specify the file where the summary of the changes should be written. Possible
options are a text file, a CSV file or an Excel file:

In [15]:
%cat summary.txt

Summary:
Model identifier: e_coli_core
Model name:

Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'TPI', 'ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['13dpg_c', '2pg_c', '3pg_c', '6pgc_c', '6pgl_c', 'ac_c', 'ac_e', 'acald_c', 'acald_e', 'accoa_c', 'acon_C_c', 'actp_c', 'adp_c', 'akg_c', 'akg_e', 'amp_c', 'atp_c', 'cit_c', 'co2_c', 'co2_e', 'c

Similar results can be achieved using a sequence. For this example, three
reactions from the [mixed acid fermentation](
<https://biocyc.org/META/NEW-IMAGE?type=PATHWAY&object=FERMENTATION-PWY>)
pathway from MetaCyc will be added to the metabolic model:

In [16]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
                                                                
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_biocyc.copy()
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]
                                                                
print(f'Before: {len(test_model.reactions)}')
                                                                
add_pathway(
    model=test_model,
    pathway=sequence,
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    group="test_group"
)

print(f'After: {len(test_model.reactions)}')

Before: 95
Quantity of     new   | removed entities in
Reactions        3    |    0              
Metabolites      2    |    0              
Exchange         0    |    0              
Demand           0    |    0              
Sinks            1    |    0              
Genes           11    |    0              
Groups           1    |    0              

After: 99




We defined the argument `group` as *test_group*. This argument sets the
identifier of the new Pathway object. If we search for that identifier in
the model, we will find the new group with these reactions as members:

In [17]:
test_model.groups.get_by_id("test_group").members

[<Reaction PEPDEPHOS_RXN_c at 0x7fdaa6ac7210>,
 <Reaction PYRUVFORMLY_RXN_c at 0x7fdaa67c3c90>,
 <Reaction FHLMULTI_RXN_c at 0x7fdaa6854cd0>]

--------------------

**NOTES**

- Because a Pathway is a set of multiple reactions, the notes above also
apply when adding a pathway, i.e. duplicate elements, transport reactions and
the argument `genome` for KEGG.
- It is possible to merge pathways. Users can pass the same value for
the argument `group` when adding multiple pathways, and CobraMod will automatically join them. This is
useful when encountering multiple sub-pathways.
- Each reaction will be tested for its capability to carry a non-zero flux. Read
the section *Non-zero flux test* for more information.

--------------------

## Visualization with Escher

The group created above is a special type of COBRApy Group called [Pathway](
module/cobramod/index.html#cobramod.Pathway). It
can display its members through Escher and, if given, their flux
distribution with the method [Pathway.visualize()](
module/cobramod/core/pathway/index.html#cobramod.core.pathway.Pathway.visualize):

In [18]:
test_model.groups.get_by_id("test_group").visualize()

Builder(reaction_scale={}, reaction_styles=['color', 'text'])

Pathway objects can be easily modified to show a vertical
orientation or to display flux distributions with default or user-defined colors.
Users can also choose how the flux color gradient is normalized (linear or quantile).


In [19]:
# For flux visualization of the group
solution = {
    "PEPDEPHOS_RXN_c": -2, "PYRUVFORMLY_RXN_c": -2, "FHLMULTI_RXN_c": 0.4
}
# Modifying attributes
group = test_model.groups.get_by_id("test_group")
group.vertical = True
group.color_negative = "red"
group.color_positive = "green"
group.color_quantile = True
group.visualize(solution_fluxes=solution)

Builder(reaction_data={'PEPDEPHOS_RXN_c': -2, 'PYRUVFORMLY_RXN_c': -2, 'FHLMULTI_RXN_c': 0.4}, reaction_scale=…
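
To give an intuition for the difference between a linear and a quantile-normalized gradient, the sketch below scales the absolute fluxes of the example in both ways. The functions `linear_scale` and `quantile_scale` are hypothetical and only illustrate the principle; they do not reproduce how CobraMod or Escher compute colors.

```python
from typing import Dict


def linear_scale(fluxes: Dict[str, float]) -> Dict[str, float]:
    """Scale absolute fluxes to [0, 1] by dividing by the largest one."""
    largest = max(abs(value) for value in fluxes.values())
    if largest == 0:
        return {key: 0.0 for key in fluxes}
    return {key: abs(value) / largest for key, value in fluxes.items()}


def quantile_scale(fluxes: Dict[str, float]) -> Dict[str, float]:
    """Replace each absolute flux by the fraction of values that are <= it."""
    magnitudes = [abs(value) for value in fluxes.values()]
    total = len(magnitudes)
    return {
        key: sum(m <= abs(value) for m in magnitudes) / total
        for key, value in fluxes.items()
    }


solution = {
    "PEPDEPHOS_RXN_c": -2, "PYRUVFORMLY_RXN_c": -2, "FHLMULTI_RXN_c": 0.4
}
linear = linear_scale(solution)
quantile = quantile_scale(solution)
print(linear)
print(quantile)
```

With a linear scale the smallest flux sits at one fifth of the gradient, whereas the rank-based scale places it at one third. Rank-based scaling is less sensitive to a few very large fluxes.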

--------------------

**NOTES**

- Calling the `visualize()` method without a flux solution will display only
the map without colors.
- Users can pass either a Solution object or
a dictionary with the fluxes for each reaction to the argument `solution_fluxes`.
- The pathways will always be saved as an HTML file. The default name
is *pathway.html*. Users can customize the name of the file.

--------------------
## Non-zero flux test

When using the function `add_pathway`, every new reaction will be tested for
its capability to carry a non-zero flux. Additionally, users can run the
test for single reactions using `test_non_zero_flux`.

In this test, CobraMod verifies that the metabolites of the reaction can be
turned over in the model. If a reaction fails, auxiliary sink
reactions will be added. CobraMod will raise warnings for the users and suggest
manual curation. If nothing appears, the test passed.

In [20]:
from cobramod import test_non_zero_flux

test_non_zero_flux(model=test_model, reaction="PEPDEPHOS_RXN_c")

## Curation process

Using the multiple functions from above, CobraMod will download the biochemical
information and parse it to create the corresponding COBRApy objects. During
this process, every single object will be tested for multiple criteria.

1. CobraMod will try to identify whether metabolites are already in the model
under different names.
2. Every time a Metabolite is created, CobraMod will read the metadata
of the files and will try to find duplicates in the model.
3. If CobraMod encounters large molecules or data with missing properties,
users will be warned about them.
4. Every time a Reaction is created, CobraMod will check if the reaction is
already in the model, to avoid adding duplicates.
5. CobraMod will use the COBRApy method `check_mass_balance` and will raise a
warning if imbalances are found.
6. This package will always respect the reaction reversibility stated in the
biochemical data of the reaction. If the reversibility is missing, a
warning will be raised.
7. When CobraMod adds a pathway, every single reaction will undergo a
"non-zero flux test". This test ensures that the added reactions can carry a
non-zero flux. If a reaction encounters problems, CobraMod will create
auxiliary sink reactions and will suggest manual curation steps based on these
auxiliary modifications.
8. All the information about the download, the creation of every single object,
the warnings and exceptions will be written to a log file with the name
`debug.log`. This file should help users keep track of the changes to the model.

In [21]:
!tail debug.log -n 20

2021-07-14 14:48:30,574 INFO Reaction "PYRUVFORMLY_RXN_c" passed the non-zero flux test.
2021-07-14 14:48:30,575 INFO Reaction "PYRUVFORMLY_RXN_c" added to group "test_group".
2021-07-14 14:48:30,576 INFO Reaction "FHLMULTI_RXN_c" was added to model.
2021-07-14 14:48:30,576 INFO Test to carry non-zero fluxes for "FHLMULTI_RXN_c" started
2021-07-14 14:48:30,577 INFO Reaction "FHLMULTI_RXN_c" passed the non-zero flux test.
2021-07-14 14:48:30,578 INFO Reaction "FHLMULTI_RXN_c" added to group "test_group".
2021-07-14 14:48:30,578 INFO Pathway "test_group" added to Model.
2021-07-14 14:48:30,589 INFO Data for "PEPDEPHOS-RXN" retrieved from "ECOLI".
2021-07-14 14:48:30,598 INFO Data for "PYRUVFORMLY-RXN" retrieved from "ECOLI".
2021-07-14 14:48:30,608 INFO Data for "FHLMULTI-RXN" retrieved from "ECOLI".
2021-07-14 14:48:30,661 INFO Test to carry non-zero fluxes for "PEPDEPHOS_RXN_c" started
2021-07-14 14:48:30,663 INFO Reaction "PEPDEPHOS_RXN_c" passed the non-zero flux test.
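
The mass balance check mentioned in item 5 can be illustrated without COBRApy. The sketch below counts the atoms of simple chemical formulas and sums them over a reaction. The functions `count_elements` and `mass_imbalance` are hypothetical helpers that ignore charges and bracketed formulas, so they are far simpler than a full check.

```python
import re
from collections import Counter
from typing import Dict


def count_elements(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C12H22O11' (no brackets)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def mass_imbalance(stoichiometry: Dict[str, float]) -> Dict[str, float]:
    """Sum element counts weighted by coefficients; {} means balanced.

    Keys are chemical formulas, values are signed coefficients
    (negative for reactants, positive for products). Using formulas as
    keys assumes that no two metabolites share the same formula.
    """
    total: Dict[str, float] = {}
    for formula, coefficient in stoichiometry.items():
        for element, number in count_elements(formula).items():
            total[element] = total.get(element, 0) + coefficient * number
    return {element: value for element, value in total.items() if value != 0}


# Sucrose hydrolysis written with one hexose formula: balanced.
balanced = mass_imbalance({"C12H22O11": -1, "H2O": -1, "C6H12O6": 2})
# The same reaction without water: two H and one O are unaccounted for.
unbalanced = mass_imbalance({"C12H22O11": -1, "C6H12O6": 2})
print(balanced, unbalanced)
```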


## Converting Group back to Pathway

Reloading or copying a model converts every `Pathway` into a regular
COBRApy `Group`. To convert the groups back,
we only need the function [cobramod.model_convert()](
module/cobramod/core/pathway/index.html#model_convert):

In [22]:
from cobramod import model_convert
from cobramod.test import textbook_biocyc
from cobra.core.group import Group

new_model = textbook_biocyc.copy()
test_group = Group(id="test")
for reaction in ("GLCpts", "G6PDH2r", "PGL", "GND"):
    test_group.add_members([new_model.reactions.get_by_id(reaction)])
new_model.add_groups([test_group])

model_convert(model=new_model)
type(new_model.groups[-1])

cobramod.core.pathway.Pathway

## Databases

CobraMod supports all databases from the BioCyc collection, the KEGG database and the BiGG Models repository. Users can import `cobramod.available_databases` and display it to see the supported abbreviations.

In [23]:
from cobramod import available_databases
                                         
available_databases

Biocyc includes around 18.000 sub-databases. The complete list can be found in 'https://biocyc.org/biocyc-pgdb-list.shtml'. Please use the corresponding object identifier. e.g: 'ARA', 'GCF_000963925'


['META', 'PLANT', 'KEGG', 'BIGG']