# Guide

## Retrieving data
CobraMod can retrieve data for a list of BioCyc identifiers. The package
uses [pathlib's](https://docs.python.org/3/library/pathlib.html)
API to build system paths that work across operating
systems. A single loop is enough to download the data for every identifier in a list:

In [1]:
from cobramod import get_data
from pathlib import Path

dir_data = Path.cwd().resolve().joinpath("data")
identifiers = [
    "CPD-14074",
    "CPD-14075",
    "CPD-14076",
    "CPD-14553",
    "CPD-15317",
    "CPD-15322",
    "CPD-15323",
    "CPD-15326"]

for single in identifiers:
    get_data(
        directory=dir_data,
        identifier=single,
        database="META"
    )

The first argument of [cobramod.get_data](module/cobramod/index.html#cobramod.get_data)
is the system path where the data will be stored. In this case, it is a [Path](
https://docs.python.org/3/library/pathlib.html#pathlib.Path) object created with
`pathlib`.
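
As a small illustration of why `pathlib` is used here, the sketch below builds the same relative location for two operating systems and then recreates the `dir_data` path from the example. It only uses the standard library; the folder name `project` is a placeholder chosen for this illustration.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same relative location rendered for two operating systems.
# "project" is a placeholder name used only for this illustration.
posix_path = PurePosixPath("project").joinpath("data")
windows_path = PureWindowsPath("project").joinpath("data")
print(posix_path)    # project/data
print(windows_path)  # project\data

# The directory used throughout this guide: an absolute path below the
# current working directory. Creating the object does not touch the disk.
dir_data = Path.cwd().resolve().joinpath("data")
same_dir = Path.cwd().resolve() / "data"  # the "/" operator is equivalent
print(dir_data == same_dir, dir_data.is_absolute())  # True True
```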

The next argument is the original identifier as it appears in the given
database, and the database itself is named in the last argument (*META*). A
directory named after the database is created inside the data directory, and
the new data is stored there, e.g.:

```
data
`-- META
    |-- CPD-14074.xml
    |-- CPD-14075.xml
    |-- CPD-14076.xml
    |-- CPD-14553.xml
    |-- CPD-15317.xml
    |-- CPD-15322.xml
    |-- CPD-15323.xml
    `-- CPD-15326.xml
```
As expected, a new directory was created to store the data locally. The name
*META* is derived from [MetaCyc](<https://metacyc.org/>), which aggregates
BioCyc's sub-databases. To see the supported databases, load
`cobramod.available_databases`:

In [2]:
from cobramod import available_databases
                                         
available_databases

Biocyc includes around 18.000 sub-databases. The complete list can be found in 'https://biocyc.org/biocyc-pgdb-list.shtml'. Please use the corresponding object identifier. e.g: 'ARA', 'GCF_000963925'


['META', 'KEGG', 'BIGG']
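
To check which identifiers are not yet stored locally, the directory layout shown in the tree above can be inspected with `pathlib`. The following is a minimal sketch and not part of CobraMod: the helper `missing_identifiers` is hypothetical, and it assumes the `<directory>/<database>/<identifier>.xml` layout shown for *META*. Other databases may use a different file extension.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_identifiers(
    directory: Path, database: str, identifiers: Iterable[str]
) -> List[str]:
    """Return identifiers without a '<directory>/<database>/<id>.xml' file.

    Hypothetical helper; assumes the layout shown for META in this guide.
    """
    folder = directory.joinpath(database)
    return [
        single
        for single in identifiers
        if not folder.joinpath(f"{single}.xml").is_file()
    ]


# Demonstration in a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    dir_data = Path(tmp)
    dir_data.joinpath("META").mkdir()
    dir_data.joinpath("META", "CPD-14074.xml").write_text("<xml/>")
    missing = missing_identifiers(dir_data, "META", ["CPD-14074", "CPD-14075"])

print(missing)  # ['CPD-14075']
```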

## Converting data to objects

One of the goals of CobraMod is to simplify the creation of COBRApy objects,
such as *Metabolites* and *Reactions*. Single objects can be transformed
directly with the function [cobramod.create_object](module/cobramod/index.html#cobramod.create_object):

In [3]:
from cobramod import create_object
from pathlib import Path
                                             
dir_data = Path.cwd().resolve().joinpath("data")
                                             
new_object = create_object(
    identifier="C00026",
    directory=dir_data,
    database="KEGG",
    compartment="c"
)
                                             
type(new_object)

cobra.core.metabolite.Metabolite

In this example, the KEGG entry [C00026](https://www.genome.jp/dbget-bin/www_bget?C00026)
(2-Oxoglutarate) is
identified as a metabolite and is automatically built as a COBRApy object. It
is not necessary to call [get_data](module/cobramod/index.html#cobramod.get_data)
beforehand, as this function obtains the data on its own.

## Adding Objects

CobraMod uses `cobra`'s native objects
[cobra.Reaction](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Reaction)
and [cobra.Metabolite](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/index.html#cobra.Metabolite).
Furthermore, CobraMod includes an extra class
[cobramod.Pathway](module/cobramod/index.html#cobramod.Pathway), which inherits and
expands the attributes and methods of its parent class [cobra.core.group.Group](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/group/index.html#cobra.core.group.Group).

### Metabolites

To add metabolites to a Model, you can simply use the function
[cobramod.add_metabolites](
module/cobramod/index.html#cobramod.add_metabolites).


In [4]:
from cobramod import add_metabolites
from cobramod.test import textbook_biocyc
from pathlib import Path
                                                                        
dir_data = Path.cwd().resolve().joinpath("data")
                                                                        
test_model = textbook_biocyc.copy()

add_metabolites(
    model=test_model,
    obj="MET, c",
    directory=dir_data,
    database="META"
)
type(test_model.metabolites.get_by_id("MET_c"))

cobra.core.metabolite.Metabolite

In this example, a copy of the test model is created. The function `add_metabolites` takes
the model to extend as its first argument. The argument `obj` is, in this case,
a string with the identifier and the corresponding compartment.

The syntax for a custom metabolite:

> formatted_identifier, name, compartment, chemical_formula,
molecular_charge

Otherwise, retrieve from database with:

> metabolite_identifier, compartment
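
The two forms can be told apart by the number of comma-separated fields. The sketch below only illustrates that idea and is not CobraMod's own parser: the function `parse_metabolite` and the keys of the returned dictionary are hypothetical names chosen for this example.

```python
from typing import Dict, Union


def parse_metabolite(line: str) -> Dict[str, Union[str, int]]:
    """Split a metabolite string into its fields (illustrative only)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) == 2:
        # metabolite_identifier, compartment
        identifier, compartment = fields
        return {
            "source": "database",
            "identifier": identifier,
            "compartment": compartment,
        }
    if len(fields) == 5:
        # formatted_identifier, name, compartment, chemical_formula, charge
        identifier, name, compartment, formula, charge = fields
        return {
            "source": "custom",
            "identifier": identifier,
            "name": name,
            "compartment": compartment,
            "formula": formula,
            "charge": int(charge),
        }
    raise ValueError(f"Expected 2 or 5 fields, got {len(fields)}: {line!r}")


from_database = parse_metabolite("MET, c")
custom = parse_metabolite("MALTOSE_c, MALTOSE[c], c, C12H22O11, 1")
print(from_database["source"], custom["source"])  # database custom
```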

Moreover, instead of a single string you can pass a list of strings. In
the next example, the argument *obj* receives a list with two
identifiers:

In [5]:
add_metabolites(
    model=test_model,
    obj=["MET, c", "SUCROSE, c"],
    directory=dir_data,
    database="META",
)
print(type(test_model.metabolites.get_by_id("MET_c")))
                                                               
test_model.metabolites.get_by_id("SUCROSE_c")

<class 'cobra.core.metabolite.Metabolite'>


0,1
Metabolite identifier,SUCROSE_c
Name,sucrose
Memory address,0x07f1cd80837d0
Formula,C12H22O11
Compartment,c
In 0 reaction(s),


Additionally, the path of a text file can be passed.
For instance, given the file *metabolites.txt* in the data directory
with the content:

> SUCROSE, c  
> MET, c  
> MALTOSE_c, MALTOSE[c], c, C12H22O11, 1
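
Such a file can be created with any text editor or directly with `pathlib`. The sketch below writes the three lines into a temporary directory so that it can run anywhere; in practice the file would be placed where the example below expects it (`dir_data`). It assumes one entry per line without the quote markers used for display above.

```python
import tempfile
from pathlib import Path

# The entries from the example, one per line.
lines = [
    "SUCROSE, c",
    "MET, c",
    "MALTOSE_c, MALTOSE[c], c, C12H22O11, 1",
]

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp).joinpath("metabolites.txt")
    # Join the entries with newlines and end the file with a newline.
    file.write_text("\n".join(lines) + "\n")
    content = file.read_text().splitlines()

print(len(content))  # 3
```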


In [6]:
# Defining where the data is loaded and saved
dir_data = Path.cwd().resolve().joinpath("data")
file = dir_data.joinpath("metabolites.txt")
# Using a copy
test_model = textbook_biocyc.copy()

print(f'Before: {len(test_model.metabolites)}')
                                                                     
add_metabolites(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="META",
)
print(f'After: {len(test_model.metabolites)}')

Before: 72
After: 75


Additionally, regular COBRApy Metabolites can be added (lists are also
supported):

In [7]:
from cobramod import add_metabolites
from cobramod.test import textbook, textbook_biocyc
                        
# Copying Metabolite
metabolite = textbook.metabolites.get_by_id("xu5p__D_c")

test_model = textbook_biocyc.copy()
add_metabolites(
    model=test_model,
    obj=metabolite
)
                                                               
type(test_model.metabolites.get_by_id("xu5p__D_c"))

cobra.core.metabolite.Metabolite

### Reactions

As with metabolites, a model can also be extended with reactions.
CobraMod includes the function [cobramod.add_reactions](
module/cobramod/index.html#cobramod.add_reactions):

In [8]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                           
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
    genome="ecc"
)
                                                           
type(test_model.reactions.get_by_id("R04382_c"))

cobra.core.reaction.Reaction

The first argument is the model to which the reactions are added. The next argument,
`obj`, takes a string with the identifier and the compartment in which the
reaction takes place. Then, the data directory and the database must be
passed. For now, the argument `genome` can be ignored.

Moreover, a list with multiple strings can be passed as well.

In [9]:
add_reactions(
    model=test_model,
    obj=["R04382, c", "R02736, c"],
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)
                                                            
type(test_model.reactions.get_by_id("R04382_c"))

cobra.core.reaction.Reaction

Additionally, instead of a list, the path of a file can be used.
Given the file *reactions.txt* in the data directory with:

> R04382, c  
> R02736, c  
> C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e

**Syntax**  
To retrieve reactions from the database:

> original_identifier, compartment

For custom reactions, CobraMod uses COBRApy reaction strings:

> identifier, name | coefficient_1 metabolite_1 <-> coefficient_2 metabolite_2

Each metabolite needs to include its compartment as a suffix, made up of an underscore
and a letter, e.g.:

> 2 OXYGEN-MOLECULE_c
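
To make the structure of a custom reaction line explicit, the sketch below splits one into identifier, name, reactants and products. It is an illustration rather than CobraMod's or COBRApy's parser: the functions `parse_side` and `parse_custom_reaction` are hypothetical, only the reversible arrow `<->` from the example is handled, and terms on one side are assumed to be joined with ` + ` as in COBRApy reaction strings.

```python
from typing import Dict, Tuple


def parse_side(side: str) -> Dict[str, float]:
    """Turn '1 A_c + 2 B_c' into {'A_c': 1.0, 'B_c': 2.0}."""
    metabolites: Dict[str, float] = {}
    for term in side.strip().split(" + "):
        parts = term.split()
        # A missing coefficient is read as 1.
        coefficient = float(parts[0]) if len(parts) == 2 else 1.0
        metabolites[parts[-1]] = coefficient
    return metabolites


def parse_custom_reaction(
    line: str,
) -> Tuple[str, str, Dict[str, float], Dict[str, float]]:
    """Split 'identifier, name | left <-> right' into its parts."""
    header, equation = line.split("|", 1)
    identifier, name = [part.strip() for part in header.split(",", 1)]
    left, right = equation.split("<->")
    return identifier, name, parse_side(left), parse_side(right)


identifier, name, reactants, products = parse_custom_reaction(
    "C06118_ce, digalacturonate transport | 1 C06118_c <-> 1 C06118_e"
)
print(identifier, name)  # C06118_ce digalacturonate transport
print(reactants, products)  # {'C06118_c': 1.0} {'C06118_e': 1.0}

# The compartment is the part after the last underscore.
compartment = "OXYGEN-MOLECULE_c".rsplit("_", 1)[1]
print(compartment)  # c
```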

In [10]:
from cobramod.test import textbook_kegg
from cobramod import add_reactions
from pathlib import Path
                                                                     
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
file = dir_data.joinpath("reactions.txt")
                                                                     
print(f'Before: {len(test_model.reactions)}')
                                                                     
add_reactions(
    model=test_model,
    obj=file,
    directory=dir_data,
    database="KEGG",
    genome="ecc"
)

print(f'After: {len(test_model.reactions)}')

Before: 95
After: 98


Finally, regular COBRApy reactions can be added.

In [11]:
from cobramod.test import textbook_kegg, textbook
from cobramod import add_reactions
from pathlib import Path
                                                                  
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_kegg.copy()
reaction = textbook.reactions.get_by_id("ACALDt")
                                                                  
add_reactions(model=test_model, obj=reaction, directory=dir_data)
type(test_model.reactions.get_by_id("ACALDt"))


cobra.core.reaction.Reaction

---
**NOTES**

- If CobraMod detects that either the reaction or its
metabolites are already in the model under another name, then these
already-in-model objects will be used instead. This is a safety
measure to prevent duplicates.
- The argument `genome` is a special argument that can be used in combination
with the database *KEGG*. This argument is responsible for selecting the genes
from this database. If no argument is passed, no genes will be created and
a warning will appear, as shown below:


In [12]:
test_model = textbook_kegg.copy()
                                                           
add_reactions(
    model=test_model,
    obj="R04382, c",
    database="KEGG",
    directory=dir_data,
)



---
### Pathways

CobraMod can add complete pathways into the metabolic models. Using the
function [cobramod.add_pathway](
module/cobramod/index.html#cobramod.add_pathway), either a sequence of reaction
identifiers or a pathway identifier can be used as an argument.

In [13]:
from pathlib import Path
from cobramod.test import textbook
                                               
dir_data = Path.cwd().resolve().joinpath("data")
                                               
textbook.optimize().objective_value

0.8739215069684307

The original metabolic model *e_coli_core* from COBRApy is loaded under the
name `cobramod.test.textbook`. It shows an objective value of 0.874. For
this example, the identifier [ACETOACETATE-DEG-PWY](
https://biocyc.org/ECOLI/new-image?object=ACETOACETATE-DEG-PWY) will be
used for the test model. This specific pathway has two reactions, in which six
metabolites participate:

In [14]:
from cobramod import add_pathway

test_model = textbook.copy()
add_pathway(
    model=test_model,
    pathway="ACETOACETATE-DEG-PWY",
    directory=dir_data,
    database="ARA",
    compartment="c",
    summary="txt"
)
test_model.groups.get_by_id("ACETOACETATE-DEG-PWY")

Changes:
Reactions 	2
Metabolites	2
Exchange 	0
Demand		0
Sinks		1
Genes		2
Groups		1



<Pathway ACETOACETATE-DEG-PWY at 0x7f1cd7d56410>

With the argument `summary`, the changes made to the model are tracked in a
file. CSV, text and Excel files are the available options. For instance, the text version
shows the elements of the model and their differences.

In [15]:
%cat summary.txt

Summary:
Model identifier: e_coli_core
Model name:

Reactions:
['ACETOACETYL_COA_TRANSFER_RXN_c', 'ACETYL_COA_ACETYLTRANSFER_RXN_c']
Metabolites:
['3_KETOBUTYRATE_c', 'ACETOACETYL_COA_c']
Exchange:
[]
Demand:
[]
Sinks:
['SK_3_KETOBUTYRATE_c']
Genes:
['AT5G48230', 'AT5G47720']
Groups:
['ACETOACETATE-DEG-PWY']

Changes:
Reactions:
['ACALD', 'ACALDt', 'ACKr', 'ACONTa', 'ACONTb', 'ACt2r', 'ADK1', 'AKGDH', 'AKGt2r', 'ALCD2x', 'ATPM', 'ATPS4r', 'Biomass_Ecoli_core', 'CO2t', 'CS', 'CYTBD', 'D_LACt2', 'ENO', 'ETOHt2r', 'FBA', 'FBP', 'FORt2', 'FORti', 'FRD7', 'FRUpts2', 'FUM', 'FUMt2_2', 'G6PDH2r', 'GAPD', 'GLCpts', 'GLNS', 'GLNabc', 'GLUDy', 'GLUN', 'GLUSy', 'GLUt2r', 'GND', 'H2Ot', 'ICDHyr', 'ICL', 'LDH_D', 'MALS', 'MALt2_2', 'MDH', 'ME1', 'ME2', 'NADH16', 'NADTRHD', 'NH4t', 'O2t', 'PDH', 'PFK', 'PFL', 'PGI', 'PGK', 'PGL', 'PGM', 'PIt2r', 'PPC', 'PPCK', 'PPS', 'PTAr', 'PYK', 'PYRt2', 'RPE', 'RPI', 'SUCCt2_2', 'SUCCt3', 'SUCDi', 'SUCOAS', 'TALA', 'THD2', 'TKT1', 'TKT2', 'T

The pathway included two new metabolites. Sink reactions are
automatically built for them if needed. In this case, only one sink reaction
is created since the second metabolite can be synthesized by another reaction.
All changes that CobraMod makes are stored in a log file that can be
used to track them:

In [16]:
%system tail debug.log -n 20

 "2021-05-31 12:31:16,710 INFO Object 'ACETYL-COA-ACETYLTRANSFER-RXN' identified as a reaction.",
 '2021-05-31 12:31:16,752 INFO For reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c", gene "AT5G48230" was created and its name changed to "AACT2".',
 '2021-05-31 12:31:16,752 INFO For reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c", gene "AT5G47720" was created and its name changed to "AACT1".',
 '2021-05-31 12:31:16,753 INFO Reaction "ACETOACETYL_COA_TRANSFER_RXN_c" added to model',
 '2021-05-31 12:31:16,753 INFO Test to carry non-zero fluxes for "ACETOACETYL_COA_TRANSFER_RXN_c" started',
 '2021-05-31 12:31:16,759 INFO Reaction "ACETYL_COA_ACETYLTRANSFER_RXN_c" added to model',
 '2021-05-31 12:31:16,759 INFO Test to carry non-zero fluxes for "ACETYL_COA_ACETYLTRANSFER_RXN_c" started',
 '2021-05-31 12:31:18,584 CRITICAL "ACETOACETYL-COA-TRANSFER-RXN" is not available in "ARA"',
 '2021-05-31 12:31:18,586 INFO Data for "ACETOACETYL-COA-TRANSFER-RXN" retrieved.',
 '2021-05-31 12:31:18,621 INFO Data for 

It can be seen that metabolites such as *ACETYL-COA* or *CO-A* were recognized by
CobraMod as metabolites that were already included in the model.
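
To pull specific entries out of the log, its lines can be filtered by level. The sketch below assumes the `date time,milliseconds LEVEL message` layout visible in the excerpt above; the function `entries_with_level` is a hypothetical helper, and the sample lines are copied from that excerpt.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the excerpt: "YYYY-MM-DD HH:MM:SS,mmm LEVEL message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def entries_with_level(lines: Iterable[str], level: str) -> List[Tuple[str, str]]:
    """Return (timestamp, message) pairs for all lines with the given level."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == level:
            found.append((match.group("time"), match.group("message")))
    return found


sample = [
    '2021-05-31 12:31:16,753 INFO Reaction "ACETOACETYL_COA_TRANSFER_RXN_c" added to model',
    '2021-05-31 12:31:18,584 CRITICAL "ACETOACETYL-COA-TRANSFER-RXN" is not available in "ARA"',
    '2021-05-31 12:31:18,586 INFO Data for "ACETOACETYL-COA-TRANSFER-RXN" retrieved.',
]

critical = entries_with_level(sample, "CRITICAL")
print(len(critical))  # 1
```

For a real log, the lines could be read with `Path("debug.log").read_text().splitlines()` instead of the sample list.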

Moreover, the objective value changed drastically due to the insertion of
a sink reaction. Checking the fluxes shows that both reactions are
active.

In [17]:
print(test_model.sinks)
solution = test_model.optimize()
solution

[<Reaction SK_3_KETOBUTYRATE_c at 0x7f1cd7d6bf10>]


Unnamed: 0,fluxes,reduced_costs
ACALD,-8.646247,-6.521120e-18
ACALDt,0.000000,-0.000000e+00
ACKr,-161.140741,-1.496236e-18
ACONTa,401.748995,-1.115934e-16
ACONTb,401.748995,0.000000e+00
...,...,...
TKT2,-10.986560,-5.553047e-17
TPI,-10.241399,-3.474497e-17
ACETOACETYL_COA_TRANSFER_RXN_c,838.859259,2.602085e-18
SK_3_KETOBUTYRATE_c,-838.859259,1.734723e-18


If the sink reaction *SK_3_KETOBUTYRATE_c* is removed, the fluxes of this
new pathway drop to zero since no reaction remains to synthesize the
start metabolite.

In [18]:
test_model.remove_reactions(["SK_3_KETOBUTYRATE_c"])
test_model.optimize()

Unnamed: 0,fluxes,reduced_costs
ACALD,-3.189643e-15,6.762643e-18
ACALDt,0.000000e+00,-0.000000e+00
ACKr,1.024517e-15,7.481180e-19
ACONTa,6.007250e+00,-1.417065e-16
ACONTb,6.007250e+00,0.000000e+00
...,...,...
TKT1,1.496984e+00,0.000000e+00
TKT2,1.181498e+00,1.586585e-17
TPI,7.477382e+00,0.000000e+00
ACETOACETYL_COA_TRANSFER_RXN_c,0.000000e+00,1.734723e-17
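
Values such as `-3.189643e-15` in the table above are numerical noise from the solver rather than real fluxes. A simple way to list only the active reactions is to compare the absolute flux against a small tolerance. The sketch below works on a plain dictionary with a few values copied from the table; the function `active_reactions` and the tolerance of `1e-9` are assumptions made for this illustration.

```python
from typing import Dict


def active_reactions(
    fluxes: Dict[str, float], tolerance: float = 1e-9
) -> Dict[str, float]:
    """Keep only reactions whose absolute flux exceeds the tolerance."""
    return {
        reaction: value
        for reaction, value in fluxes.items()
        if abs(value) > tolerance
    }


# A few values copied from the table above.
fluxes = {
    "ACALD": -3.189643e-15,
    "ACALDt": 0.0,
    "ACKr": 1.024517e-15,
    "ACONTa": 6.007250,
    "TPI": 7.477382,
    "ACETOACETYL_COA_TRANSFER_RXN_c": 0.0,
}

active = active_reactions(fluxes)
print(sorted(active))  # ['ACONTa', 'TPI']
```

With a COBRApy solution, the fluxes are stored as a pandas Series in `solution.fluxes`, so `solution.fluxes.to_dict()` should give a dictionary of this shape.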


Similar results can be achieved using a sequence. For this example, three
reactions from the [mixed acid fermentation](
<https://biocyc.org/META/NEW-IMAGE?type=PATHWAY&object=FERMENTATION-PWY>)
pathway from MetaCyc will be added to the metabolic model:

In [19]:
from pathlib import Path
from cobramod import add_pathway
from cobramod.test import textbook_biocyc
                                                                
dir_data = Path.cwd().resolve().joinpath("data")
test_model = textbook_biocyc.copy()
sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]
                                                                
print(f'Before: {len(test_model.reactions)}')
                                                                
add_pathway(
    model=test_model,
    pathway=sequence,
    directory=dir_data,
    database="ECOLI",
    compartment="c",
    group="test_group"
)

print(f'After: {len(test_model.reactions)}')

Before: 95
Changes:
Reactions 	3
Metabolites	2
Exchange 	0
Demand		0
Sinks		1
Genes		11
Groups		1

After: 99
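
The numbers in this output are consistent with each other: three pathway reactions plus one sink reaction account for the difference between 95 and 99. The sketch below reproduces that arithmetic from the printed summary. It assumes that sink, demand and exchange reactions are counted separately from the `Reactions` entry, which matches the output above but is not confirmed elsewhere in this guide. `parse_changes` is a hypothetical helper.

```python
from typing import Dict


def parse_changes(text: str) -> Dict[str, int]:
    """Read 'Label <number>' lines of the printed summary into a dictionary."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


# Printed summary from the example above (whitespace normalized).
printed = """Changes:
Reactions 3
Metabolites 2
Exchange 0
Demand 0
Sinks 1
Genes 11
Groups 1"""

counts = parse_changes(printed)
# Every kind of reaction adds to len(model.reactions).
added = sum(counts[key] for key in ("Reactions", "Exchange", "Demand", "Sinks"))
print(95 + added)  # 99
```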


We defined the argument `group` as *test_group*. If we search for that
identifier in the model we will find the new group with these reactions as
members:

In [20]:
test_model.groups.get_by_id("test_group").members

[<Reaction PEPDEPHOS_RXN_c at 0x7f1cd7ce0c90>,
 <Reaction PYRUVFORMLY_RXN_c at 0x7f1cd5bfe210>,
 <Reaction FHLMULTI_RXN_c at 0x7f1cd5ae4dd0>]
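
Comparing the identifiers passed in `sequence` with the members listed above suggests a naming pattern: hyphens become underscores and the compartment is appended as a suffix. The sketch below reproduces that observed pattern only. The function `model_identifier` is hypothetical and does not claim to mirror CobraMod's actual implementation.

```python
def model_identifier(original: str, compartment: str) -> str:
    """Build an identifier following the pattern observed in this guide."""
    return f"{original.replace('-', '_')}_{compartment}"


sequence = ["PEPDEPHOS-RXN", "PYRUVFORMLY-RXN", "FHLMULTI-RXN"]
expected = [model_identifier(single, "c") for single in sequence]
print(expected)
# ['PEPDEPHOS_RXN_c', 'PYRUVFORMLY_RXN_c', 'FHLMULTI_RXN_c']
```

This can be handy when looking up a newly added object, e.g. with `get_by_id`.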

This group is a special type of COBRApy Group called [Pathway](
module/cobramod/index.html#cobramod.Pathway). It can
display its members through Escher and, if given, their flux
distribution with the method [Pathway.visualize()](
module/cobramod/core/pathway/index.html#cobramod.core.pathway.Pathway.visualize):

In [25]:
test_model.groups.get_by_id("test_group").visualize()

Builder(reaction_styles=['text'])

As in the previous example, an extra sink reaction was created since there
is no hydrogen metabolite in the model:

In [22]:
test_model.optimize()

Unnamed: 0,fluxes,reduced_costs
ACALD,-3.464201e-14,9.660919e-19
ACALDt,-3.783165e-14,-0.000000e+00
ACKr,-2.383494e-14,-5.984944e-18
ACONTa,6.007250e+00,-1.436386e-16
ACONTb,6.007250e+00,-2.164046e-16
...,...,...
TPI,7.477382e+00,0.000000e+00
PEPDEPHOS_RXN_c,0.000000e+00,-6.938894e-18
PYRUVFORMLY_RXN_c,0.000000e+00,1.273121e-02
FHLMULTI_RXN_c,0.000000e+00,5.204170e-18



## Converting Group back to Pathway

When a model is reloaded or copied with the default [Model.copy()](
https://cobrapy.readthedocs.io/en/latest/autoapi/cobra/core/model/index.html#cobra.core.model.Model.copy),
the `Pathway` objects in the new model are converted to `Group` objects. To convert them back,
use the function [cobramod.core.pathway.model_convert()](
module/cobramod/core/pathway/index.html#model_convert):


In [31]:
from cobramod.core.pathway import model_convert

# Using previous example
new_model = test_model.copy()
# Will raise an error
try:
    new_model.groups.get_by_id("test_group").visualize()
except Exception as e:
    print("Error raised ->", e)
    
# Using method
new_model = test_model.copy()
model_convert(model=new_model)
new_model.groups.get_by_id("test_group").visualize()

Error raised -> 'NoneType' object has no attribute 'sort'


AttributeError: 'NoneType' object has no attribute 'sort'