# Usage example on model of _Corynebacterium tuberculostearicum (high GC Gram+)_.
In this notebook we show how to use the Mass Charge Curation Python package. As an example, we use a model of [_Corynebacterium tuberculostearicum (high GC Gram+)_, strain DSM 44922](https://www.ncbi.nlm.nih.gov/assembly/GCF_013408445.1/) created with [CarveMe Version 1.5.1](https://carveme.readthedocs.io/en/latest/index.html), which here is simply called _model.xml_.

For more verbose output, enable INFO-level logging with the following line (comment it out for quieter output):

In [1]:
import logging; logging.basicConfig(level=logging.INFO) 

## Dependencies
First we will check whether all dependencies of the curation package are fulfilled.

In [2]:
try: import numpy
except Exception as e: print("You seem to be missing numpy. You can usually install it via 'pip install numpy'"); raise e
try: import pandas
except Exception as e: print("You seem to be missing pandas. You can usually install it via 'pip install pandas'"); raise e
try: import cobra
except Exception as e: print("You seem to be missing cobrapy. See https://github.com/opencobra/cobrapy on how to install it."); raise e
try: import z3
except Exception as e: print("You seem to be missing z3 or the corresponding python bindings. See https://github.com/Z3Prover/z3 on how to install it."); raise e
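The same check can also be written as a loop over the required modules. The following is a minimal sketch using only the standard library; the helper name `find_missing` and the `REQUIRED` table are hypothetical and not part of the curation package, and the install hints simply repeat those given above.

```python
import importlib.util

# Import name -> install hint (hints repeated from the checks above).
REQUIRED = {
    "numpy": "pip install numpy",
    "pandas": "pip install pandas",
    "cobra": "see https://github.com/opencobra/cobrapy",
    "z3": "see https://github.com/Z3Prover/z3",
}

def find_missing(modules):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

for name in find_missing(REQUIRED):
    print(f"You seem to be missing {name}: {REQUIRED[name]}")
```

Note that `find_spec` only locates a module without importing it, so a broken installation can still fail at import time. The explicit `try`/`except` checks above remain the stricter test.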

Next we see if the Mass Charge Curation package is installed properly.

In [3]:
try: import MCC
except Exception as e: print("The mass charge curation package does not seem to be installed correctly. Make sure you have the correct python version installed and try running pip install -e ./.. in the folder of this notebook."); raise e

## Loading the model
Once all dependencies are installed, we can take a look at our example model.

First we will read in our model using cobrapy.

In [4]:
model = cobra.io.read_sbml_model("model.xml")
print(f"The model has {len(model.metabolites)} metabolites and {len(model.reactions)} reactions.")

The model has 1019 metabolites and 1481 reactions.


We can first take a look at how many reactions in our model are unbalanced.

In [5]:
unbalanced_count = sum([not MCC.is_balanced(reaction) for reaction in model.reactions])
print(f"There are {unbalanced_count} unbalanced reactions in the original model.")

There are 447 unbalanced reactions in the original model.
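
To make explicit what "balanced" means here: a reaction is balanced if, weighted by the stoichiometric coefficients, every element and the total charge sum to zero. The following is a minimal, self-contained sketch of that idea. It is not the implementation behind `MCC.is_balanced`; the helpers `parse_formula` and `imbalance` are hypothetical, the parser only handles simple formulas without brackets, and the formulas and charges are illustrative BiGG-style values rather than values read from the example model.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets or groups)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def imbalance(stoichiometry, formulas, charges):
    """Return the net element counts and net charge of a reaction.

    `stoichiometry` maps metabolite id -> coefficient (negative for reactants).
    An empty result means the reaction is mass and charge balanced.
    """
    net = Counter()
    for met, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas[met]).items():
            net[element] += coeff * n
        net["charge"] += coeff * charges[met]
    return {key: value for key, value in net.items() if value != 0}

# ATP hydrolysis: atp + h2o -> adp + pi + h
formulas = {"atp_c": "C10H12N5O13P3", "h2o_c": "H2O", "adp_c": "C10H12N5O10P2",
            "pi_c": "HO4P", "h_c": "H"}
charges = {"atp_c": -4, "h2o_c": 0, "adp_c": -3, "pi_c": -2, "h_c": 1}
atp_hydrolysis = {"atp_c": -1, "h2o_c": -1, "adp_c": 1, "pi_c": 1, "h_c": 1}

print(imbalance(atp_hydrolysis, formulas, charges))   # balanced: {}

# A different protonation state for phosphate breaks the balance.
formulas["pi_c"], charges["pi_c"] = "H2O4P", -1
print(imbalance(atp_hydrolysis, formulas, charges))   # {'H': 1, 'charge': 1}
```

The second call shows how a single differing protonation state leaves one hydrogen and one unit of charge unaccounted for. For reactions in a loaded model, cobrapy provides `Reaction.check_mass_balance()` for a comparable check.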


Of these, a model usually contains around 100-300 pseudo reactions, which cannot be balanced by definition.

In [6]:
pseudo_count = len(MCC.get_pseudo_reactions(model))
print(f"There are {pseudo_count} pseudo reactions in the original model.")
print(f"There are {unbalanced_count - pseudo_count} unbalanced non-pseudo reactions in the original model.")

There are 196 pseudo reactions in the original model.
There are 251 unbalanced non-pseudo reactions in the original model.
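
Why pseudo reactions such as exchange reactions cannot be balanced can be seen from their stoichiometry: they consume or produce a metabolite without any counterpart on the other side. The following is a minimal sketch of that observation, assuming a reaction is given as a plain dictionary of coefficients. The helper `is_one_sided` is hypothetical and is not how `MCC.get_pseudo_reactions` identifies pseudo reactions; biomass reactions, which are also commonly treated as pseudo reactions, are two-sided and would not be caught by it. The metabolite ids are illustrative.

```python
def is_one_sided(stoichiometry):
    """True if all coefficients share one sign, i.e. mass appears or vanishes."""
    coefficients = [c for c in stoichiometry.values() if c != 0]
    return bool(coefficients) and (
        all(c < 0 for c in coefficients) or all(c > 0 for c in coefficients)
    )

exchange = {"glc__D_e": -1}                   # glucose leaves the system
transport = {"glc__D_e": -1, "glc__D_p": 1}   # glucose moves between compartments
print(is_one_sided(exchange), is_one_sided(transport))  # True False
```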


Our goal will be to find the most plausible formula and charge assignments for the model's metabolites, such that this number of 251 unbalanced non-pseudo reactions is reduced to its minimum.

## Curating Mass and Charge
We can now instantiate a curation class. There are different ways to use the package, depending on how data should be gathered, how much curation has already been done, and how much data is available offline.

We will first show the simplest usage: downloading as many databases as possible and updating all [identifiers.org](https://identifiers.org/) identifiers the package needs to work optimally. This takes significantly longer (about 15 minutes) than running the algorithm on an already annotated model, but up-to-date identifiers are important if we want to include as much information as possible.

This will create a folder _/data_ in the current directory where all database information is downloaded to.

The arguments are as follows:
* **model**: Model we want to curate.
* **data_path**: Path to the directory containing database files. Defaults to _/data_. If the directory does not exist, it will be created. If a file cannot be found, we will try to download it. 
* **update_ids**: If this is set to _True_, we will first try to update all [identifiers.org](https://identifiers.org/) ids. This will take a while but is important to properly index the different databases. Defaults to _False_. 
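
The behaviour described for **data_path** (create the directory if needed, download a file only when it is missing) follows a common caching pattern. The following is a minimal sketch of that pattern using only the standard library. It is not the package's own code; the function `ensure_file` and its `download` callback are hypothetical names introduced for illustration.

```python
from pathlib import Path

def ensure_file(data_path, filename, download):
    """Return the path to `filename` inside `data_path`, fetching it if absent.

    `download` is a caller-supplied function that takes the target path and
    writes the file there; it is only called when the file is missing.
    """
    directory = Path(data_path)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = directory / filename
    if not target.exists():
        download(target)
    return target
```

With such a pattern, a second run against the same directory finds the files already present and skips the downloads, which is why later runs are expected to start faster than the first one.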

**Note**: It is expected to see _No objective coefficients in model. Unclear what should be optimized_ warnings here; they pose no problem for this package.

In [7]:
balancer = MCC.MassChargeCuration(model = model, data_path = "./data", update_ids = True)

INFO:root:1/1019: 10fthf_c
INFO:root:Found new ids ['10fthf'] in bigg.metabolite via id & name based search for 10fthf_c
INFO:root:2/1019: 12dgr140_c
INFO:root:Found new ids ['12dgr140'] in bigg.metabolite via id & name based search for 12dgr140_c
INFO:root:3/1019: 12dgr140_p
INFO:root:Found new ids ['12dgr140'] in bigg.metabolite via id & name based search for 12dgr140_p
INFO:root:4/1019: 12dgr141_c
INFO:root:Found new ids ['12dgr141'] in bigg.metabolite via id & name based search for 12dgr141_c
INFO:root:Found new id MNXM4940 in metanetx.chemical from ('bigg.metabolite', '12dgr141') for 12dgr141_c
INFO:root:5/1019: 12dgr141_p
INFO:root:Found new ids ['12dgr141'] in bigg.metabolite via id & name based search for 12dgr141_p
INFO:root:Found new id MNXM4940 in metanetx.chemical from ('bigg.metabolite', '12dgr141') for 12dgr141_p
INFO:root:6/1019: 12dgr160_c
INFO:root:Found new ids ['12dgr160'] in bigg.metabolite via id & name based search for 12dgr160_c
INFO:root:Found new id MNXM3132 in

INFO:root:44/1019: 2ahbut_c
INFO:root:Found new ids ['2ahbut'] in bigg.metabolite via id & name based search for 2ahbut_c
INFO:root:45/1019: 2ahhmd_c
INFO:root:Found new ids ['2ahhmd'] in bigg.metabolite via id & name based search for 2ahhmd_c
INFO:root:46/1019: 2ahhmp_c
INFO:root:Found new ids ['2ahhmp'] in bigg.metabolite via id & name based search for 2ahhmp_c
INFO:root:47/1019: 2aobut_c
INFO:root:Found new ids ['2aobut'] in bigg.metabolite via id & name based search for 2aobut_c
INFO:root:Found new id MNXM114087 in metanetx.chemical from ('bigg.metabolite', '2aobut') for 2aobut_c
INFO:root:48/1019: 2cpr5p_c
INFO:root:Found new ids ['2cpr5p'] in bigg.metabolite via id & name based search for 2cpr5p_c
INFO:root:Found new id MNXM1455 in metanetx.chemical from ('bigg.metabolite', '2cpr5p') for 2cpr5p_c
INFO:root:49/1019: 2dda7p_c
INFO:root:Found new ids ['2dda7p'] in bigg.metabolite via id & name based search for 2dda7p_c
INFO:root:Found new id MNXM1219 in metanetx.chemical from ('bigg

INFO:root:Found new id MNXM192 in metanetx.chemical from ('bigg.metabolite', '4abut') for 4abut_c
INFO:root:93/1019: 4abut_e
INFO:root:Found new ids ['4abut'] in bigg.metabolite via id & name based search for 4abut_e
INFO:root:Found new id MNXM192 in metanetx.chemical from ('bigg.metabolite', '4abut') for 4abut_e
INFO:root:94/1019: 4abut_p
INFO:root:Found new ids ['4abut'] in bigg.metabolite via id & name based search for 4abut_p
INFO:root:Found new id MNXM192 in metanetx.chemical from ('bigg.metabolite', '4abut') for 4abut_p
INFO:root:95/1019: 4abz_c
INFO:root:Found new ids ['4abz'] in bigg.metabolite via id & name based search for 4abz_c
INFO:root:Found new id MNXM421 in metanetx.chemical from ('bigg.metabolite', '4abz') for 4abz_c
INFO:root:96/1019: 4adcho_c
INFO:root:Found new ids ['4adcho'] in bigg.metabolite via id & name based search for 4adcho_c
INFO:root:Found new id MNXM1458 in metanetx.chemical from ('bigg.metabolite', '4adcho') for 4adcho_c
INFO:root:97/1019: 4ahmmp_c
INFO:

INFO:root:Found new id MNXM6020 in metanetx.chemical from ('bigg.metabolite', 'R_3hcddec5ecoa') for R_3hcddec5ecoa_c
INFO:root:141/1019: R_3hcmrs7e_c
INFO:root:Found new ids ['R_3hcmrs7e'] in bigg.metabolite via id & name based search for R_3hcmrs7e_c
INFO:root:142/1019: R_3hcmrs7ecoa_c
INFO:root:Found new ids ['R_3hcmrs7ecoa'] in bigg.metabolite via id & name based search for R_3hcmrs7ecoa_c
INFO:root:Found new id MNXM6018 in metanetx.chemical from ('bigg.metabolite', 'R_3hcmrs7ecoa') for R_3hcmrs7ecoa_c
INFO:root:143/1019: R_3hdcaa_c
INFO:root:Found new ids ['R_3hdcaa'] in bigg.metabolite via id & name based search for R_3hdcaa_c
INFO:root:144/1019: R_3hdcoa_c
INFO:root:Found new ids ['R_3hdcoa'] in bigg.metabolite via id & name based search for R_3hdcoa_c
INFO:root:Found new id MNXM4928 in metanetx.chemical from ('bigg.metabolite', 'R_3hdcoa') for R_3hdcoa_c
INFO:root:145/1019: R_3hdd5ea_c
INFO:root:Found new ids ['R_3hdd5ea'] in bigg.metabolite via id & name based search for R_3hdd

INFO:root:194/1019: acald_e
INFO:root:Found new ids ['acald'] in bigg.metabolite via id & name based search for acald_e
INFO:root:Found new id MNXM75 in metanetx.chemical from ('bigg.metabolite', 'acald') for acald_e
INFO:root:195/1019: acald_p
INFO:root:Found new ids ['acald'] in bigg.metabolite via id & name based search for acald_p
INFO:root:Found new id MNXM75 in metanetx.chemical from ('bigg.metabolite', 'acald') for acald_p
INFO:root:196/1019: accoa_c
INFO:root:Found new ids ['accoa'] in bigg.metabolite via id & name based search for accoa_c
INFO:root:197/1019: acetol_c
INFO:root:Found new ids ['acetol'] in bigg.metabolite via id & name based search for acetol_c
INFO:root:Found new id MNXM1744 in metanetx.chemical from ('bigg.metabolite', 'acetol') for acetol_c
INFO:root:198/1019: acg5p_c
INFO:root:Found new ids ['acg5p'] in bigg.metabolite via id & name based search for acg5p_c
INFO:root:Found new id MNXM1384 in metanetx.chemical from ('bigg.metabolite', 'acg5p') for acg5p_c
INF

INFO:root:252/1019: alaleu_c
INFO:root:Found new ids ['alaleu'] in bigg.metabolite via id & name based search for alaleu_c
INFO:root:253/1019: alaleu_e
INFO:root:Found new ids ['alaleu'] in bigg.metabolite via id & name based search for alaleu_e
INFO:root:254/1019: alaleu_p
INFO:root:Found new ids ['alaleu'] in bigg.metabolite via id & name based search for alaleu_p
INFO:root:255/1019: alathr_c
INFO:root:Found new ids ['alathr'] in bigg.metabolite via id & name based search for alathr_c
INFO:root:256/1019: alathr_e
INFO:root:Found new ids ['alathr'] in bigg.metabolite via id & name based search for alathr_e
INFO:root:257/1019: alathr_p
INFO:root:Found new ids ['alathr'] in bigg.metabolite via id & name based search for alathr_p
INFO:root:258/1019: alatrp_c
INFO:root:Found new ids ['alatrp'] in bigg.metabolite via id & name based search for alatrp_c
INFO:root:259/1019: alatrp_e
INFO:root:Found new ids ['alatrp'] in bigg.metabolite via id & name based search for alatrp_e
INFO:root:260/10

INFO:root:308/1019: cd2_c
INFO:root:Found new ids ['cd2'] in bigg.metabolite via id & name based search for cd2_c
INFO:root:309/1019: cd2_e
INFO:root:Found new ids ['cd2'] in bigg.metabolite via id & name based search for cd2_e
INFO:root:310/1019: cd2_p
INFO:root:Found new ids ['cd2'] in bigg.metabolite via id & name based search for cd2_p
INFO:root:311/1019: cdp_c
INFO:root:Found new ids ['cdp'] in bigg.metabolite via id & name based search for cdp_c
INFO:root:312/1019: cdpc16c19g_c
INFO:root:313/1019: cdpc19c19g_c
INFO:root:314/1019: cdpdhdecg_c
INFO:root:Found new ids ['cdpdhdecg'] in bigg.metabolite via id & name based search for cdpdhdecg_c
INFO:root:315/1019: cdpdodec11eg_c
INFO:root:Found new ids ['cdpdodec11eg'] in bigg.metabolite via id & name based search for cdpdodec11eg_c
INFO:root:316/1019: cdpdtdec7eg_c
INFO:root:Found new ids ['cdpdtdec7eg'] in bigg.metabolite via id & name based search for cdpdtdec7eg_c
INFO:root:317/1019: cgly_c
INFO:root:Found new ids ['cgly'] in bigg

INFO:root:Found new ids ['dad_5'] in bigg.metabolite via id & name based search for dad_5_c
INFO:root:363/1019: dadp_c
INFO:root:Found new ids ['dadp'] in bigg.metabolite via id & name based search for dadp_c
INFO:root:364/1019: dag181d9_e
INFO:root:Found new ids ['dag181d9'] in bigg.metabolite via id & name based search for dag181d9_e
INFO:root:365/1019: dag182d9d12_e
INFO:root:Found new ids ['dag182d9d12'] in bigg.metabolite via id & name based search for dag182d9d12_e
INFO:root:366/1019: damp_c
INFO:root:Found new ids ['damp'] in bigg.metabolite via id & name based search for damp_c
INFO:root:Found new id MNXM432 in metanetx.chemical from ('bigg.metabolite', 'damp') for damp_c
INFO:root:367/1019: datp_c
INFO:root:Found new ids ['datp'] in bigg.metabolite via id & name based search for datp_c
INFO:root:Found new id MNXM286 in metanetx.chemical from ('bigg.metabolite', 'datp') for datp_c
INFO:root:368/1019: db4p_c
INFO:root:Found new ids ['db4p'] in bigg.metabolite via id & name based

INFO:root:Found new id MNXM492 in metanetx.chemical from ('bigg.metabolite', 'duri') for duri_c
INFO:root:414/1019: dutp_c
INFO:root:Found new ids ['dutp'] in bigg.metabolite via id & name based search for dutp_c
INFO:root:Found new id MNXM452 in metanetx.chemical from ('bigg.metabolite', 'dutp') for dutp_c
INFO:root:415/1019: dxyl5p_c
INFO:root:Found new ids ['dxyl5p'] in bigg.metabolite via id & name based search for dxyl5p_c
INFO:root:Found new id MNXM622 in metanetx.chemical from ('bigg.metabolite', 'dxyl5p') for dxyl5p_c
INFO:root:416/1019: e4hglu_c
INFO:root:Found new ids ['e4hglu'] in bigg.metabolite via id & name based search for e4hglu_c
INFO:root:417/1019: e4p_c
INFO:root:Found new ids ['e4p'] in bigg.metabolite via id & name based search for e4p_c
INFO:root:Found new id MNXM258 in metanetx.chemical from ('bigg.metabolite', 'e4p') for e4p_c
INFO:root:418/1019: eig3p_c
INFO:root:Found new ids ['eig3p'] in bigg.metabolite via id & name based search for eig3p_c
INFO:root:Found n

INFO:root:Found new ids ['gam1p'] in bigg.metabolite via id & name based search for gam1p_c
INFO:root:471/1019: gam6p_c
INFO:root:Found new ids ['gam6p'] in bigg.metabolite via id & name based search for gam6p_c
INFO:root:472/1019: gam_e
INFO:root:Found new ids ['gam'] in bigg.metabolite via id & name based search for gam_e
INFO:root:473/1019: gar_c
INFO:root:Found new ids ['gar'] in bigg.metabolite via id & name based search for gar_c
INFO:root:474/1019: gcald_c
INFO:root:Found new ids ['gcald'] in bigg.metabolite via id & name based search for gcald_c
INFO:root:475/1019: gcald_e
INFO:root:Found new ids ['gcald'] in bigg.metabolite via id & name based search for gcald_e
INFO:root:476/1019: gdp_c
INFO:root:Found new ids ['gdp'] in bigg.metabolite via id & name based search for gdp_c
INFO:root:477/1019: gdpmann_c
INFO:root:Found new ids ['gdpmann'] in bigg.metabolite via id & name based search for gdpmann_c
INFO:root:478/1019: gdptp_c
INFO:root:Found new ids ['gdptp'] in bigg.metabolite

INFO:root:523/1019: gly_tyr_c
INFO:root:524/1019: gly_tyr_e
INFO:root:525/1019: glyald_c
INFO:root:Found new ids ['glyald'] in bigg.metabolite via id & name based search for glyald_c
INFO:root:Found new id MNXM435 in metanetx.chemical from ('bigg.metabolite', 'glyald') for glyald_c
INFO:root:526/1019: glyald_e
INFO:root:Found new ids ['glyald'] in bigg.metabolite via id & name based search for glyald_e
INFO:root:Found new id MNXM435 in metanetx.chemical from ('bigg.metabolite', 'glyald') for glyald_e
INFO:root:527/1019: glyb_c
INFO:root:Found new ids ['glyb'] in bigg.metabolite via id & name based search for glyb_c
INFO:root:Found new id MNXM289 in metanetx.chemical from ('bigg.metabolite', 'glyb') for glyb_c
INFO:root:528/1019: glyb_e
INFO:root:Found new ids ['glyb'] in bigg.metabolite via id & name based search for glyb_e
INFO:root:Found new id MNXM289 in metanetx.chemical from ('bigg.metabolite', 'glyb') for glyb_e
INFO:root:529/1019: glyb_p
INFO:root:Found new ids ['glyb'] in bigg.

INFO:root:579/1019: h2o_p
INFO:root:Found new ids ['h2o', 'h2o_'] in bigg.metabolite via id & name based search for h2o_p
INFO:root:Found new id oh1 in bigg.metabolite from ('seed.compound', 'cpd15275') for h2o_p
INFO:root:580/1019: h2s_c
INFO:root:Found new ids ['h2s'] in bigg.metabolite via id & name based search for h2s_c
INFO:root:581/1019: h2s_e
INFO:root:Found new ids ['h2s'] in bigg.metabolite via id & name based search for h2s_e
INFO:root:582/1019: h_c
INFO:root:Found new ids ['h_', 'h'] in bigg.metabolite via id & name based search for h_c
INFO:root:Found new id MNXM1 in metanetx.chemical from ('bigg.metabolite', 'h') for h_c
INFO:root:583/1019: h_e
INFO:root:Found new ids ['h_', 'h'] in bigg.metabolite via id & name based search for h_e
INFO:root:Found new id MNXM1 in metanetx.chemical from ('bigg.metabolite', 'h') for h_e
INFO:root:584/1019: h_p
INFO:root:Found new ids ['h_', 'h'] in bigg.metabolite via id & name based search for h_p
INFO:root:Found new id MNXM1 in metanetx.

INFO:root:632/1019: ipdp_c
INFO:root:Found new ids ['ipdp'] in bigg.metabolite via id & name based search for ipdp_c
INFO:root:Found new id MNXM83 in metanetx.chemical from ('bigg.metabolite', 'ipdp') for ipdp_c
INFO:root:633/1019: istfrnA_e
INFO:root:Found new ids ['istfrnA'] in bigg.metabolite via id & name based search for istfrnA_e
INFO:root:634/1019: istfrnB_e
INFO:root:Found new ids ['istfrnB'] in bigg.metabolite via id & name based search for istfrnB_e
INFO:root:635/1019: itp_c
INFO:root:Found new ids ['itp'] in bigg.metabolite via id & name based search for itp_c
INFO:root:636/1019: ivcoa_c
INFO:root:Found new ids ['ivcoa'] in bigg.metabolite via id & name based search for ivcoa_c
INFO:root:637/1019: k_c
INFO:root:Found new ids ['k'] in bigg.metabolite via id & name based search for k_c
INFO:root:Found new id MNXM95 in metanetx.chemical from ('bigg.metabolite', 'k') for k_c
INFO:root:638/1019: k_e
INFO:root:Found new ids ['k'] in bigg.metabolite via id & name based search for k

INFO:root:Found new ids ['met__L'] in bigg.metabolite via id & name based search for met__L_c
INFO:root:691/1019: met__L_e
INFO:root:Found new ids ['met__L'] in bigg.metabolite via id & name based search for met__L_e
INFO:root:692/1019: met__L_p
INFO:root:Found new ids ['met__L'] in bigg.metabolite via id & name based search for met__L_p
INFO:root:693/1019: methf_c
INFO:root:Found new ids ['methf'] in bigg.metabolite via id & name based search for methf_c
INFO:root:Found new id MNXM511 in metanetx.chemical from ('bigg.metabolite', 'methf') for methf_c
INFO:root:694/1019: metsox_R__L_c
INFO:root:Found new ids ['metsox_R__L'] in bigg.metabolite via id & name based search for metsox_R__L_c
INFO:root:Found new id MNXM2245 in metanetx.chemical from ('bigg.metabolite', 'metsox_R__L') for metsox_R__L_c
INFO:root:695/1019: metsox_R__L_e
INFO:root:Found new ids ['metsox_R__L'] in bigg.metabolite via id & name based search for metsox_R__L_e
INFO:root:Found new id MNXM2245 in metanetx.chemical fr

INFO:root:738/1019: nh4_e
INFO:root:Found new ids ['nh4'] in bigg.metabolite via id & name based search for nh4_e
INFO:root:Found new id nh3 in bigg.metabolite from ('seed.compound', 'cpd00013') for nh4_e
INFO:root:739/1019: nh4_p
INFO:root:Found new ids ['nh4'] in bigg.metabolite via id & name based search for nh4_p
INFO:root:Found new id nh3 in bigg.metabolite from ('seed.compound', 'cpd00013') for nh4_p
INFO:root:740/1019: ni2_c
INFO:root:Found new ids ['ni2'] in bigg.metabolite via id & name based search for ni2_c
INFO:root:741/1019: ni2_e
INFO:root:Found new ids ['ni2'] in bigg.metabolite via id & name based search for ni2_e
INFO:root:742/1019: ni2_p
INFO:root:Found new ids ['ni2'] in bigg.metabolite via id & name based search for ni2_p
INFO:root:743/1019: nicrns_c
INFO:root:Found new ids ['nicrns'] in bigg.metabolite via id & name based search for nicrns_c
INFO:root:744/1019: nicrnt_c
INFO:root:Found new ids ['nicrnt'] in bigg.metabolite via id & name based search for nicrnt_c
IN

INFO:root:Found new id MNXM373 in metanetx.chemical from ('bigg.metabolite', 'pan4p') for pan4p_c
INFO:root:788/1019: pant__R_c
INFO:root:Found new ids ['pant__R'] in bigg.metabolite via id & name based search for pant__R_c
INFO:root:Found new id MNXM593 in metanetx.chemical from ('bigg.metabolite', 'pant__R') for pant__R_c
INFO:root:789/1019: pap_c
INFO:root:Found new ids ['pap'] in bigg.metabolite via id & name based search for pap_c
INFO:root:790/1019: paps_c
INFO:root:Found new ids ['paps'] in bigg.metabolite via id & name based search for paps_c
INFO:root:791/1019: pdx5p_c
INFO:root:Found new ids ['pdx5p'] in bigg.metabolite via id & name based search for pdx5p_c
INFO:root:Found new id MNXM454 in metanetx.chemical from ('bigg.metabolite', 'pdx5p') for pdx5p_c
INFO:root:792/1019: pe120_c
INFO:root:Found new ids ['pe120'] in bigg.metabolite via id & name based search for pe120_c
INFO:root:793/1019: pe120_p
INFO:root:Found new ids ['pe120'] in bigg.metabolite via id & name based sear

INFO:root:839/1019: pppg9_c
INFO:root:Found new ids ['pppg9'] in bigg.metabolite via id & name based search for pppg9_c
INFO:root:Found new id MNXM351 in metanetx.chemical from ('bigg.metabolite', 'pppg9') for pppg9_c
INFO:root:840/1019: pppi_c
INFO:root:Found new ids ['pppi'] in bigg.metabolite via id & name based search for pppi_c
INFO:root:Found new id MNXM332 in metanetx.chemical from ('bigg.metabolite', 'pppi') for pppi_c
INFO:root:841/1019: pprdn_c
INFO:root:Found new ids ['pprdn'] in bigg.metabolite via id & name based search for pprdn_c
INFO:root:Found new id MNXM1438 in metanetx.chemical from ('bigg.metabolite', 'pprdn') for pprdn_c
INFO:root:842/1019: pqq_p
INFO:root:Found new ids ['pqq'] in bigg.metabolite via id & name based search for pqq_p
INFO:root:Found new id MNXM601 in metanetx.chemical from ('bigg.metabolite', 'pqq') for pqq_p
INFO:root:843/1019: pqqh2_p
INFO:root:Found new ids ['pqqh2'] in bigg.metabolite via id & name based search for pqqh2_p
INFO:root:Found new id

INFO:root:886/1019: rib__D_e
INFO:root:Found new ids ['rib__D'] in bigg.metabolite via id & name based search for rib__D_e
INFO:root:887/1019: rib__D_p
INFO:root:Found new ids ['rib__D'] in bigg.metabolite via id & name based search for rib__D_p
INFO:root:888/1019: ribflv_c
INFO:root:Found new ids ['ribflv'] in bigg.metabolite via id & name based search for ribflv_c
INFO:root:Found new id MNXM270 in metanetx.chemical from ('bigg.metabolite', 'ribflv') for ribflv_c
INFO:root:889/1019: ribflv_e
INFO:root:Found new ids ['ribflv'] in bigg.metabolite via id & name based search for ribflv_e
INFO:root:Found new id MNXM270 in metanetx.chemical from ('bigg.metabolite', 'ribflv') for ribflv_e
INFO:root:890/1019: rnam_c
INFO:root:Found new ids ['rnam'] in bigg.metabolite via id & name based search for rnam_c
INFO:root:891/1019: ru5p__D_c
INFO:root:Found new ids ['ru5p__D'] in bigg.metabolite via id & name based search for ru5p__D_c
INFO:root:Found new id MNXM145 in metanetx.chemical from ('bigg.m

INFO:root:939/1019: tartr__D_c
INFO:root:Found new ids ['tartr__D'] in bigg.metabolite via id & name based search for tartr__D_c
INFO:root:Found new id MNXM7948 in metanetx.chemical from ('bigg.metabolite', 'tartr__D') for tartr__D_c
INFO:root:940/1019: tartr__D_p
INFO:root:Found new ids ['tartr__D'] in bigg.metabolite via id & name based search for tartr__D_p
INFO:root:Found new id MNXM7948 in metanetx.chemical from ('bigg.metabolite', 'tartr__D') for tartr__D_p
INFO:root:941/1019: taur_c
INFO:root:Found new ids ['taur'] in bigg.metabolite via id & name based search for taur_c
INFO:root:Found new id MNXM282 in metanetx.chemical from ('bigg.metabolite', 'taur') for taur_c
INFO:root:942/1019: taur_e
INFO:root:Found new ids ['taur'] in bigg.metabolite via id & name based search for taur_e
INFO:root:Found new id MNXM282 in metanetx.chemical from ('bigg.metabolite', 'taur') for taur_e
INFO:root:943/1019: tdcoa_c
INFO:root:Found new ids ['tdcoa'] in bigg.metabolite via id & name based searc

INFO:root:992/1019: udpgal_c
INFO:root:Found new ids ['udpgal'] in bigg.metabolite via id & name based search for udpgal_c
INFO:root:993/1019: ugmd_c
INFO:root:Found new ids ['ugmd'] in bigg.metabolite via id & name based search for ugmd_c
INFO:root:994/1019: ugmda_c
INFO:root:Found new ids ['ugmda'] in bigg.metabolite via id & name based search for ugmda_c
INFO:root:995/1019: ump_c
INFO:root:Found new ids ['ump'] in bigg.metabolite via id & name based search for ump_c
INFO:root:996/1019: uppg3_c
INFO:root:Found new ids ['uppg3'] in bigg.metabolite via id & name based search for uppg3_c
INFO:root:Found new id MNXM414 in metanetx.chemical from ('bigg.metabolite', 'uppg3') for uppg3_c
INFO:root:997/1019: ura_c
INFO:root:Found new ids ['ura'] in bigg.metabolite via id & name based search for ura_c
INFO:root:Found new id MNXM158 in metanetx.chemical from ('bigg.metabolite', 'ura') for ura_c
INFO:root:998/1019: ura_e
INFO:root:Found new ids ['ura'] in bigg.metabolite via id & name based sea

INFO:root:71/1019: Getting information for 35dnta_e
INFO:root:72/1019: Getting information for 3c2hmp_c
INFO:root:73/1019: Getting information for 3c3hmp_c
INFO:root:74/1019: Getting information for 3c4mop_c
INFO:root:75/1019: Getting information for 3dhq_c
INFO:root:76/1019: Getting information for 3dhsk_c
INFO:root:77/1019: Getting information for 3hbcoa__R_c
INFO:root:78/1019: Getting information for 3hbcoa_c
INFO:root:79/1019: Getting information for 3hpcoa_c
INFO:root:80/1019: Getting information for 3hptcoa_c
INFO:root:81/1019: Getting information for 3ig3p_c
INFO:root:82/1019: Getting information for 3mb_c
INFO:root:83/1019: Getting information for 3mb_e
INFO:root:84/1019: Getting information for 3mb_p
INFO:root:85/1019: Getting information for 3mob_c
INFO:root:86/1019: Getting information for 3mop_c
INFO:root:87/1019: Getting information for 3opcoa_c
INFO:root:88/1019: Getting information for 3optcoa_c
INFO:root:89/1019: Getting information for 3pg_c
INFO:root:90/1019: Getting 

INFO:root:225/1019: Getting information for akg_c
INFO:root:226/1019: Getting information for ala_B_c
INFO:root:227/1019: Getting information for ala_B_e
INFO:root:228/1019: Getting information for ala_B_p
INFO:root:229/1019: Getting information for ala_L_asp__L_c
INFO:root:230/1019: Getting information for ala_L_asp__L_e
INFO:root:231/1019: Getting information for ala_L_glu__L_c
INFO:root:232/1019: Getting information for ala_L_glu__L_e
INFO:root:233/1019: Getting information for ala_L_thr__L_c
INFO:root:234/1019: Getting information for ala_L_thr__L_e
INFO:root:235/1019: Getting information for ala__D_c
INFO:root:236/1019: Getting information for ala__D_e
INFO:root:237/1019: Getting information for ala__L_c
INFO:root:238/1019: Getting information for ala__L_e
INFO:root:239/1019: Getting information for ala_gln_c
INFO:root:240/1019: Getting information for ala_gln_e
INFO:root:241/1019: Getting information for ala_his_c
INFO:root:242/1019: Getting information for ala_his_e
INFO:root:24

INFO:root:382/1019: Getting information for decdp_tb_c
INFO:root:383/1019: Getting information for dgdp_c
INFO:root:384/1019: Getting information for dgmp_c
INFO:root:385/1019: Getting information for dgsn_c
INFO:root:386/1019: Getting information for dgtp_c
INFO:root:387/1019: Getting information for dha_c
INFO:root:388/1019: Getting information for dhap_c
INFO:root:389/1019: Getting information for dhf_c
INFO:root:390/1019: Getting information for dhlam_c
INFO:root:391/1019: Getting information for dhlpro_c
INFO:root:392/1019: Getting information for dhna_c
INFO:root:393/1019: Getting information for dhnpt_c
INFO:root:394/1019: Getting information for dhor__S_c
INFO:root:395/1019: Getting information for dhpmp_c
INFO:root:396/1019: Getting information for dhpt_c
INFO:root:397/1019: Getting information for dhptd_c
INFO:root:398/1019: Getting information for dhptd_e
INFO:root:399/1019: Getting information for din_c
INFO:root:400/1019: Getting information for dmlz_c
INFO:root:401/1019: 

INFO:root:540/1019: Getting information for glygln_c
INFO:root:541/1019: Getting information for glygln_e
INFO:root:542/1019: Getting information for glygln_p
INFO:root:543/1019: Getting information for glyglu_c
INFO:root:544/1019: Getting information for glyglu_e
INFO:root:545/1019: Getting information for glyglu_p
INFO:root:546/1019: Getting information for glygly_c
INFO:root:547/1019: Getting information for glygly_e
INFO:root:548/1019: Getting information for glygly_p
INFO:root:549/1019: Getting information for glyglygln_c
INFO:root:550/1019: Getting information for glyglygln_e
INFO:root:551/1019: Getting information for glymet_c
INFO:root:552/1019: Getting information for glymet_e
INFO:root:553/1019: Getting information for glymet_p
INFO:root:554/1019: Getting information for glyphe_c
INFO:root:555/1019: Getting information for glyphe_e
INFO:root:556/1019: Getting information for glyphe_p
INFO:root:557/1019: Getting information for glyser_c
INFO:root:558/1019: Getting information 

INFO:root:697/1019: Getting information for metsox_S__L_e
INFO:root:698/1019: Getting information for mg2_c
INFO:root:699/1019: Getting information for mg2_e
INFO:root:700/1019: Getting information for mg2_p
INFO:root:701/1019: Getting information for mhpglu_c
INFO:root:702/1019: Getting information for mi1p__D_c
INFO:root:703/1019: Getting information for mi3p__D_c
INFO:root:704/1019: Getting information for micit_c
INFO:root:705/1019: Getting information for mlthf_c
INFO:root:706/1019: Getting information for mmcoa__S_c
INFO:root:707/1019: Getting information for mn2_c
INFO:root:708/1019: Getting information for mn2_e
INFO:root:709/1019: Getting information for mn2_p
INFO:root:710/1019: Getting information for mnl1p_c
INFO:root:711/1019: Getting information for mnl_e
INFO:root:712/1019: Getting information for mobd_c
INFO:root:713/1019: Getting information for mobd_e
INFO:root:714/1019: Getting information for mql7_c
INFO:root:715/1019: Getting information for mql8_c
INFO:root:716/10

INFO:root:856/1019: Getting information for prpncoa_c
INFO:root:857/1019: Getting information for prpp_c
INFO:root:858/1019: Getting information for psd5p_c
INFO:root:859/1019: Getting information for pser__L_c
INFO:root:860/1019: Getting information for pt2coa_c
INFO:root:861/1019: Getting information for pta_c
INFO:root:862/1019: Getting information for pta_e
INFO:root:863/1019: Getting information for pta_p
INFO:root:864/1019: Getting information for ptcoa_c
INFO:root:865/1019: Getting information for ptcys_c
INFO:root:866/1019: Getting information for ptrc_c
INFO:root:867/1019: Getting information for ptrc_e
INFO:root:868/1019: Getting information for ptrc_p
INFO:root:869/1019: Getting information for ptth_c
INFO:root:870/1019: Getting information for pyam5p_c
INFO:root:871/1019: Getting information for pydam_c
INFO:root:872/1019: Getting information for pydx5p_c
INFO:root:873/1019: Getting information for pydx_c
INFO:root:874/1019: Getting information for pydxn_c
INFO:root:875/101

INFO:root:1014/1019: Getting information for xu5p__D_c
INFO:root:1015/1019: Getting information for xyl__D_c
INFO:root:1016/1019: Getting information for xyl__D_e
INFO:root:1017/1019: Getting information for zn2_c
INFO:root:1018/1019: Getting information for zn2_e
INFO:root:1019/1019: Getting information for zn2_p
INFO:root:[21.491 s] Finished constructing model.
INFO:root:[25.209 s] Finished balancibility check. 13 non-pseudo reactions were unbalancable.
INFO:root:[33.299 s] Finished adherence optimization.
INFO:root:[25.866 s] Finished formula optimization.


If you have access to BioCyc (currently requires an explicit academic license or a subscription) and have downloaded the BioCyc flat files, you can pass the corresponding directory as well. Since all ids were already updated in the previous step, we can set _update_ids_ to _False_.

The additional argument is:
* **biocyc_path**: Directory containing BioCyc flat files.

In [None]:
balancer_BioCyc = MCC.MassChargeCuration(model = model, data_path = "./data", update_ids = False, biocyc_path = "./data/23.5/data")

INFO:root:1/1019: Getting information for 10fthf_c
INFO:root:2/1019: Getting information for 12dgr140_c
INFO:root:3/1019: Getting information for 12dgr140_p
INFO:root:4/1019: Getting information for 12dgr141_c
INFO:root:5/1019: Getting information for 12dgr141_p
INFO:root:6/1019: Getting information for 12dgr160_c
INFO:root:7/1019: Getting information for 12dgr160_e
INFO:root:8/1019: Getting information for 12dgr160_p
INFO:root:9/1019: Getting information for 12dgr180_c
INFO:root:10/1019: Getting information for 12dgr180_e
INFO:root:11/1019: Getting information for 12dgr180_p
INFO:root:12/1019: Getting information for 12dgr181_c
INFO:root:13/1019: Getting information for 12dgr181_p
INFO:root:14/1019: Getting information for 12ppd__S_c
INFO:root:15/1019: Getting information for 12ppd__S_e
INFO:root:16/1019: Getting information for 13dpg_c
INFO:root:17/1019: Getting information for 15dap_c
INFO:root:18/1019: Getting information for 1ag160_e
INFO:root:19/1019: Getting information for 1ag1

INFO:root:156/1019: Getting information for R_3hhxa_e
INFO:root:157/1019: Getting information for R_3hhxa_p
INFO:root:158/1019: Getting information for R_3hmrscoa_c
INFO:root:159/1019: Getting information for R_3hnonaa_c
INFO:root:160/1019: Getting information for R_3hnonacoa_c
INFO:root:161/1019: Getting information for R_3hocoa_c
INFO:root:162/1019: Getting information for R_3hocta_c
INFO:root:163/1019: Getting information for R_3hpba_c
INFO:root:164/1019: Getting information for R_3hpbcoa_c
INFO:root:165/1019: Getting information for R_3hpdeca_c
INFO:root:166/1019: Getting information for R_3hpdecacoa_c
INFO:root:167/1019: Getting information for R_3hphpa_c
INFO:root:168/1019: Getting information for R_3hphpcoa_c
INFO:root:169/1019: Getting information for R_3hphxa_c
INFO:root:170/1019: Getting information for R_3hphxacoa_c
INFO:root:171/1019: Getting information for R_3hpnona_c
INFO:root:172/1019: Getting information for R_3hpnonacoa_c
INFO:root:173/1019: Getting information for R_

INFO:root:311/1019: Getting information for cdp_c
INFO:root:312/1019: Getting information for cdpc16c19g_c
INFO:root:313/1019: Getting information for cdpc19c19g_c
INFO:root:314/1019: Getting information for cdpdhdecg_c
INFO:root:315/1019: Getting information for cdpdodec11eg_c
INFO:root:316/1019: Getting information for cdpdtdec7eg_c
INFO:root:317/1019: Getting information for cgly_c
INFO:root:318/1019: Getting information for cgly_e
INFO:root:319/1019: Getting information for cgly_p
INFO:root:320/1019: Getting information for chol_c
INFO:root:321/1019: Getting information for chol_e
INFO:root:322/1019: Getting information for chol_p
INFO:root:323/1019: Getting information for chor_c
INFO:root:324/1019: Getting information for cigam_c
INFO:root:325/1019: Getting information for cit_c
INFO:root:326/1019: Getting information for cit_e
INFO:root:327/1019: Getting information for citr__L_c
INFO:root:328/1019: Getting information for cl_c
INFO:root:329/1019: Getting information for cl_e
IN

INFO:root:470/1019: Getting information for gam1p_c
INFO:root:471/1019: Getting information for gam6p_c
INFO:root:472/1019: Getting information for gam_e
INFO:root:473/1019: Getting information for gar_c
INFO:root:474/1019: Getting information for gcald_c
INFO:root:475/1019: Getting information for gcald_e
INFO:root:476/1019: Getting information for gdp_c
INFO:root:477/1019: Getting information for gdpmann_c
INFO:root:478/1019: Getting information for gdptp_c
INFO:root:479/1019: Getting information for ggdp_c
INFO:root:480/1019: Getting information for glc__D_c
INFO:root:481/1019: Getting information for glc__D_e
INFO:root:482/1019: Getting information for glc__D_p
INFO:root:483/1019: Getting information for glcn__D_c
INFO:root:484/1019: Getting information for glcn__D_e
INFO:root:485/1019: Getting information for glcn_c
INFO:root:486/1019: Getting information for glcn_e
INFO:root:487/1019: Getting information for glcn_p
INFO:root:488/1019: Getting information for glcur_c
INFO:root:489

INFO:root:627/1019: Getting information for imp_c
INFO:root:628/1019: Getting information for indole_c
INFO:root:629/1019: Getting information for inost_c
INFO:root:630/1019: Getting information for inost_e
INFO:root:631/1019: Getting information for ins_c
INFO:root:632/1019: Getting information for ipdp_c
INFO:root:633/1019: Getting information for istfrnA_e
INFO:root:634/1019: Getting information for istfrnB_e
INFO:root:635/1019: Getting information for itp_c
INFO:root:636/1019: Getting information for ivcoa_c
INFO:root:637/1019: Getting information for k_c
INFO:root:638/1019: Getting information for k_e
INFO:root:639/1019: Getting information for k_p
INFO:root:640/1019: Getting information for lac__D_c
INFO:root:641/1019: Getting information for lac__D_e
INFO:root:642/1019: Getting information for lac__D_p
INFO:root:643/1019: Getting information for lac__L_c
INFO:root:644/1019: Getting information for lac__L_e
INFO:root:645/1019: Getting information for lac__L_p
INFO:root:646/1019: 

INFO:root:785/1019: Getting information for pa181_p
INFO:root:786/1019: Getting information for pa190190_c
INFO:root:787/1019: Getting information for pan4p_c
INFO:root:788/1019: Getting information for pant__R_c
INFO:root:789/1019: Getting information for pap_c
INFO:root:790/1019: Getting information for paps_c
INFO:root:791/1019: Getting information for pdx5p_c
INFO:root:792/1019: Getting information for pe120_c
INFO:root:793/1019: Getting information for pe120_p
INFO:root:794/1019: Getting information for pe141_c
INFO:root:795/1019: Getting information for pe141_p
INFO:root:796/1019: Getting information for pe160_c
INFO:root:797/1019: Getting information for pe160_p
INFO:root:798/1019: Getting information for pe161_c
INFO:root:799/1019: Getting information for pe161_p
INFO:root:800/1019: Getting information for pea_c
INFO:root:801/1019: Getting information for pea_e
INFO:root:802/1019: Getting information for pep_c
INFO:root:803/1019: Getting information for pg140_c
INFO:root:804/10

INFO:root:943/1019: Getting information for tdcoa_c
INFO:root:944/1019: Getting information for tdecoa_c
INFO:root:945/1019: Getting information for thdp_c
INFO:root:946/1019: Getting information for thf_c
INFO:root:947/1019: Getting information for thm_c
INFO:root:948/1019: Getting information for thm_e
INFO:root:949/1019: Getting information for thmmp_c
INFO:root:950/1019: Getting information for thmpp_c
INFO:root:951/1019: Getting information for thr__L_c
INFO:root:952/1019: Getting information for thr__L_e
INFO:root:953/1019: Getting information for thr__L_p
INFO:root:954/1019: Getting information for thrp_c
INFO:root:955/1019: Getting information for thym_c
INFO:root:956/1019: Getting information for thym_e
INFO:root:957/1019: Getting information for thymd_c
INFO:root:958/1019: Getting information for tnt_c
INFO:root:959/1019: Getting information for tnt_e
INFO:root:960/1019: Getting information for tnt_p
INFO:root:961/1019: Getting information for tntmdh_c
INFO:root:962/1019: Get

## Evaluating the result
To get a first overview of the results, we can generate a quick visual report, which tells us how many reactions are now balanced and how many assignments were changed.

In [None]:
balancer.generate_visual_report().show()

### Metabolite Report
The algorithm should be able to give a reason for every assignment that it chooses. We can have a look at these reasons in the metabolite report of the balancer.

The report holds valuable information on how the algorithm decided which assignment to choose and can be useful during further manual curation. The columns of the resulting DataFrame are the following:
* **Id**: Id of the metabolite in the model.
* **Name**: Name of the metabolite in the model.
* **Determined Formula**: Formula that was assigned by the algorithm.
* **Determined Charge**: Charge that was assigned by the algorithm.
* **Previous Formula**: Formula that was assigned before the algorithm.
* **Previous Charge**: Charge that was assigned before the algorithm.
* **Inferrence Type**: Category of how the assignment was determined.
    - Unconstrained: No information about the metabolite was found, or only incomplete (wildcard-containing) formulae were found, and no concrete formula could be inferred either. The assignment should contain a wildcard.
    - Inferred: No information about the metabolite was found, or only incomplete (wildcard-containing) formulae were found, but the algorithm nevertheless arrived at a concrete formula. The assignment should not contain a wildcard.
    - Clean: Information about the metabolite was found and used.
* **Reasoning**: Reasoning how the assignment was determined. Can contain:
    - database name:database identifier: The formula could be found in this database under this identifier.
    - (unconstrained) Target: This assignment was chosen because it is the same as in the original (target) model. Unconstrained means that the original (target) model seemed to be missing a wildcard symbol, which was therefore added.
    - Reaction_id (metabolite id -> Reasoning...): The assignment for this metabolite follows from other assignments. Given the listed reaction and the reasons for the listed metabolites, this metabolite must have the determined assignment for the model to be balanced.
* **Used Databases**: The databases which back up the determined assignment.
* **Previous Databases**: The databases which back up the previous assignment.

In [None]:
metabolite_report_df = balancer.generate_metabolite_report()
# Show every 200th row as a small sample of the report
metabolite_report_df[::200]

Usually we are interested in the assignments which differ from those in the original model. We can select them by filtering the DataFrame on its _Similarity_ column.

In [None]:
# If you would like to see the entire report, uncomment the following line
#pandas.set_option("display.max_rows", None)
metabolite_report_df[metabolite_report_df["Similarity"] != "Same"]

We might also be interested in all assignments which are not backed by a database:

In [None]:
metabolite_report_df[metabolite_report_df["Used Databases"] == ""]
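
The two filters above can also be combined to list the assignments that were changed and are not backed by any database, which are natural candidates for manual review. The sketch below uses a small toy DataFrame rather than the real report, so it runs without MCC. The column names _Similarity_ and _Used Databases_ are taken from the cells above, while the metabolite ids, the database name and the value `"Different"` are placeholders chosen for illustration only.

```python
import pandas as pd

# Toy stand-in for the metabolite report; all values are made up for illustration.
toy_report = pd.DataFrame({
    "Id": ["a_c", "b_c", "c_c", "d_c"],
    "Similarity": ["Same", "Different", "Different", "Same"],  # "Different" is a placeholder value
    "Used Databases": ["", "", "db1", "db1"],                  # "db1" is a placeholder name
})

changed = toy_report["Similarity"] != "Same"    # assignment differs from the original model
unbacked = toy_report["Used Databases"] == ""   # no database supports the new assignment

# Combine both boolean masks with & to keep rows fulfilling both conditions
needs_review = toy_report[changed & unbacked]
print(needs_review["Id"].tolist())  # ['b_c']
```

The same two masks can be applied to `metabolite_report_df` directly, since they only rely on the two columns used above.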

### Reaction Report
Finally, especially for further curation, we might be interested in the reactions that remain imbalanced. The algorithm can provide a report for these as well.

The report also includes reactions which are technically balanced but where many protons had to be added to arrive at that result.

The fields are as follows:
* **Id**: Id of the reaction in the model.
* **Unbalanced Reaction**: Name of the reaction in the model and corresponding equation.
* **Unbalanced Type**: Type of imbalance. Can be both mass and charge, only mass, only charge, or high proton count.
* **Reason**: Set of reactions which caused the reaction to be imbalanced, meaning that these reactions could not all be balanced together. The sets are minimal, but removing a single listed reaction does not necessarily resolve the conflict. For BTS2, for example, it would not help to remove HCYSMT: BTS2 would require protons to have no charge, and HCYSMT is only one of many reactions which would then be unbalanced.
* **Shared Metabolites**: Metabolites which are shared between the reactions which are listed in Reason. Can give an indication where the problem might lie.
* **Mass Difference**: Mass imbalance.
* **Charge Difference**: Charge imbalance.

In [None]:
balancer.generate_reaction_report()
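
To make the notions of mass and charge imbalance concrete, the following sketch computes both for a toy reaction by hand. It is only an illustration of the concept, not how MCC computes or formats the _Mass Difference_ and _Charge Difference_ columns; the functions `parse_formula` and `imbalance` as well as the toy metabolites are assumptions made for this example.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count elements in a simple formula such as 'C6H12O6' (no brackets, no wildcards)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reaction: dict, formulas: dict, charges: dict):
    """Return (element imbalance, charge imbalance) of a reaction.

    `reaction` maps metabolite ids to stoichiometric coefficients
    (negative for reactants, positive for products).
    """
    elements = Counter()
    charge = 0
    for met, coeff in reaction.items():
        for element, n in parse_formula(formulas[met]).items():
            elements[element] += coeff * n
        charge += coeff * charges[met]
    # Report only the elements that do not cancel out
    return {e: n for e, n in elements.items() if n != 0}, charge


# Toy data: water dissociating into a proton and a hydroxide ion
formulas = {"h2o": "H2O", "h": "H", "oh": "HO"}
charges = {"h2o": 0, "h": 1, "oh": -1}

balanced = {"h2o": -1, "h": 1, "oh": 1}
missing_proton = {"h2o": -1, "oh": 1}

print(imbalance(balanced, formulas, charges))        # ({}, 0)
print(imbalance(missing_proton, formulas, charges))  # ({'H': -1}, -1)
```

In the second reaction one hydrogen atom and one positive charge are missing on the product side, which corresponds to an imbalance in both mass and charge.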

### Writing Files
The mass charge curation modifies the model object that was given to it in place. Thus, to save the curated model, we can simply pass that same model object to cobrapy. If you want to keep your original model file, make sure not to overwrite it here.

In [None]:
cobra.io.write_sbml_model(model, f"{model.id}_MCC.xml")
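
If you run this notebook repeatedly, one way to avoid overwriting an earlier output file is to pick an unused filename first. The helper below is a hypothetical sketch using only the standard library; `unused_path` is not part of cobrapy or MCC, and the numbering scheme is just one possible choice.

```python
import tempfile
from pathlib import Path


def unused_path(path: str) -> Path:
    """Return `path` if no such file exists, else append _1, _2, ... before the suffix."""
    original = Path(path)
    candidate = original
    counter = 1
    while candidate.exists():
        candidate = original.with_name(f"{original.stem}_{counter}{original.suffix}")
        counter += 1
    return candidate


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "model_MCC.xml")
    first = unused_path(target)   # nothing exists yet, so the name is kept
    first.touch()                 # simulate a previously written model file
    second = unused_path(target)  # the first name is taken now

print(first.name, second.name)  # model_MCC.xml model_MCC_1.xml

# Possible use with the model from this notebook:
# cobra.io.write_sbml_model(model, str(unused_path(f"{model.id}_MCC.xml")))
```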

To save the reports, we can pass a filename to each report function: the visual report is written to a .png file, and the metabolite and reaction DataFrames are written to .csv files.

In [None]:
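# Passing a filename writes each report to disk
balancer.generate_visual_report(f"{model.id}_visual")
balancer.generate_metabolite_report(f"{model.id}_metabolites")
balancer.generate_reaction_report(f"{model.id}_reactions")
# `pass` keeps the notebook from displaying the last return value
pass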