# Hands-On Session I: First steps with GEMs and basic functions of the toolbox refineGEMs

In this notebook, you will gain first-hand experience in loading genome-scale metabolic models and extracting basic information from them.
Additionally, some of the basic functionalities of the toolbox ``refineGEMs`` are introduced. Some code blocks can be executed directly, others may require some input.

If you get stuck, feel free to ask or take a look at the provided solutions.

## Getting Data and Loading a GEM

Over the years, many genome-scale metabolic models have been published, many of which are only available in the supplementary material of their corresponding paper. However, there are two main databases for models of this type, [BioModels](https://www.ebi.ac.uk/biomodels/) and [BiGG Models](http://bigg.ucsd.edu). In this notebook, we will use a model from the BiGG Models database with the identifier [`iJN1463`](http://bigg.ucsd.edu/models/iJN1463).

`iJN1463` is a model of *Pseudomonas putida* KT2440, a pollutant-degrading bacterium that has become a focus of industrial biotechnology [^1].

As mentioned in the slides, there are two main libraries for loading models, COBRApy[^2] and libSBML[^3]. Each has its own object type for GEMs, with its own advantages and disadvantages. Since this notebook is about the basics and information extraction, it focuses on the COBRApy model object, from which information is easier to extract.

📝 **Task**

1. Download the model with the accession number `iJN1463` from BiGG Models using, e.g. the package `requests`.
2. Load the model with `refineGEMs` as a COBRApy model in a variable called `cobramodel`.


<font color="grey">

[^1]: Duque E, Monk J, Feist AM, Ramos JL, Niu W, Palsson BO. High-quality genome-scale metabolic modelling of Pseudomonas putida highlights its broad metabolic capabilities. Environ Microbiol. 2020 Jan;22(1):255-269. doi: 10.1111/1462-2920.14843. Epub 2019 Nov 11. PMID: 31657101; PMCID: PMC7078882.
[^2]: Ebrahim, A., Lerman, J.A., Palsson, B.O. et al. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst Biol 7, 74 (2013). https://doi.org/10.1186/1752-0509-7-74
[^3]: Bornstein, B. J., Keating, S. M., Jouraku, A., and Hucka, M. (2008) LibSBML: An API Library for SBML. Bioinformatics, 24(6):880-881. 
</font>

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
import requests # For model download
from refinegems.utility.io import load_model

# Get a model from BiGG
resp = requests.get('http://bigg.ucsd.edu/static/models/iJN1463.xml')
resp.raise_for_status()  # stop early if the download failed

# Write the model to disk and keep the path for loading
model_file = './data/iJN1463.xml'
with open(model_file, 'w') as file:
    file.write(resp.text)

# Load a model with COBRApy
cobramodel = load_model(model_file, 'cobra')

```

</details>

## What is in the model?

When working inside a Jupyter notebook, the easiest way to quickly get some information about a COBRApy model is to simply return it:

In [None]:
cobramodel

Another option is the `summary` method of the COBRApy model object:

In [None]:
cobramodel.summary()

To get a more refined and presentable format of the basic model content, one can use the `ModelInfoReport` from the `refinegems.classes.reports` module. Like all child classes of the report class, it provides methods for presenting the extracted data in table format or as a graphical representation. 

📝 **Task**

Generate a `ModelInfoReport` for the loaded model and visualise its content.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.classes.reports import ModelInfoReport

# Initialise model statistics report object
model_info = ModelInfoReport(cobramodel)
model_info

# Get visualisation of model statistics
model_info.visualise()
```

</details>

## Simulation

A good model does not need to be perfect but it should work well for its intended purpose - this is also true for GEMs.

In the case of GEMs, simulation usually means optimising the flux values of a model for an objective function. Depending on which biological goal should be prioritised, a different objective function has to be used. Here, we will focus on the overall growth of the bacterium, modelled by the so-called biomass objective function. The biomass objective function sums over all compounds a bacterium needs to grow and divide.
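
The optimisation described above is commonly formulated as a linear programme, known as flux balance analysis. As a sketch in notation introduced here (not taken from the toolbox): let $S$ be the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ a weight vector that selects the objective (for example, 1 for the biomass reaction and 0 elsewhere), and $v^{\min}$, $v^{\max}$ the flux bounds.

$$
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_j \le v_j \le v^{\max}_j \quad \text{for all reactions } j
\end{aligned}
$$

The constraint $S\,v = 0$ expresses the steady-state assumption: every metabolite is produced and consumed at the same rate.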

### Biomass objective function

The biomass reaction(s) of a model, which usually serve as its objective function, can be identified with:

In [None]:
from refinegems.utility.util import test_biomass_presence

# Test biomass presence & get biomass reactions from model
biomass_reacs = test_biomass_presence(cobramodel)
biomass_reacs

For the mathematical and chemical consistency of the model, the biomass function(s) should be normalised:

In [None]:
from refinegems.curation.biomass import check_normalise_biomass
from refinegems.utility.util import test_biomass_consistency

# Test biomass consistency
test_biomass_consistency(cobramodel, biomass_reacs[0])

# Fix biomass consistency
cobramodel = check_normalise_biomass(cobramodel)

# Test biomass consistency
test_biomass_consistency(cobramodel, biomass_reacs[0])
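
To make the idea of normalisation concrete, here is a minimal sketch in plain Python. It assumes the common convention that the biomass coefficients (in mmol/gDW), weighted by the molar masses of the compounds (in g/mol), should add up to 1 g per gDW. The component names and numbers are invented for illustration, and the helpers `biomass_mass` and `rescale_to_one_gram` are hypothetical; this is not a description of how `check_normalise_biomass` is implemented.

```python
# Hypothetical toy biomass reaction (invented values):
# component -> (coefficient in mmol/gDW, molar mass in g/mol)
toy_biomass = {
    "protein_precursor": (4.0, 120.0),
    "lipid_precursor": (0.2, 700.0),
    "nucleotide_precursor": (0.5, 330.0),
}


def biomass_mass(components):
    """Return the total mass of the biomass components in g per gDW."""
    # mmol/gDW * g/mol = mg/gDW, so divide by 1000 to obtain g/gDW
    return sum(coeff * mass for coeff, mass in components.values()) / 1000.0


def rescale_to_one_gram(components):
    """Scale all coefficients so that the total mass becomes 1 g per gDW."""
    total = biomass_mass(components)
    return {name: (coeff / total, mass) for name, (coeff, mass) in components.items()}


normalised = rescale_to_one_gram(toy_biomass)
print(f"mass before: {biomass_mass(toy_biomass):.3f} g/gDW")
print(f"mass after:  {biomass_mass(normalised):.3f} g/gDW")
```

Rescaling all coefficients by the same factor keeps the relative composition of the biomass unchanged and only fixes its total mass.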

### Media

Since a model does not grow in isolation, one needs a medium, which provides the nutrients for the model to grow on. The toolbox refineGEMs incorporates an *in silico* media database with multiple media to choose from.

Additionally, the `medium` module provides functionalities to load, manipulate and save media:

In [None]:
from refinegems.classes import medium

# load the M9 medium from the database 
m9ext = medium.load_medium_from_db("M9")

# change the default flux 
m9ext.set_default_flux(flux = 5.0, double_o2 = True)

# check the carbon source
# (single quotes inside the f-string keep this valid on Python < 3.12)
print(f"old carbon source: {m9ext.get_source('C')}")

# change the carbon source
m9ext.set_source("C", "2-Oxidopropane-1,2,3-tricarboxylate [Citrate]")
print(f"new carbon source: {m9ext.get_source('C')}")

# check, if the medium is aerobic
m9ext.is_aerobic()

📝 **Task**

Build (at least) two more media. 

* An anaerobic LB medium. This is a typical medium that can be tested in a lab.
    - default flux 5.0
* A mixture of basal medium (medium, BMS23) and artificial sweat (medium, artSw). This is based on the configuration for a skin medium for Pseudomonadota for the highest OD, see [Swaney et al.](https://doi.org/10.1128/spectrum.04180-22), Figure 4.
    - default flux 10.0, twice the amount of sweat compared to basal
    - make sure it is aerobic
* Feel free to create more, if you finish early!

You can either use a configuration file (see `../data/example_media_config.yaml`) for this task or directly build the media in this notebook.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

If you wrote & loaded a config:

```python
# Imports
from refinegems.utility.set_up import download_config

# Get media file
media_file_path = './test_files/media_config.yml'
media_list, suppl_list = download_config(media_file_path, 'media')
```

If you did it by hand:

```python
from refinegems.classes import medium

# LB_an

lban = medium.load_medium_from_db("LB")
lban.make_anaerobic()
lban.set_default_flux(flux = 5.0)

# SwSe_new
basw = medium.load_medium_from_db("BMS23")
basw.combine(medium.load_medium_from_db("artSw"), (1.0,2.0))
basw.set_default_flux(flux = 10.0)
basw.is_aerobic()
```

</details>

Finally, collect all media in a list.

In [None]:
media_list = [m9ext, ...] # add your media here

### Growth

Now we have everything ready to simulate the growth of the model on the different media:

In [None]:
with cobramodel: # changes made inside this block are reverted on exit
    # add one medium to model 
    medium.medium_to_model(cobramodel, m9ext)

    # optimise fluxes 
    solution = cobramodel.optimize()

    # print the growth rate (a bare expression inside a with-block is not displayed)
    print(solution.objective_value)

To produce these results quickly for multiple media, given either as a list of media or as a media configuration file, refineGEMs provides dedicated functionality.
In addition to the growth rates, it can report doubling times and generate graphical output, which makes it easy to share the results with other bioinformaticians as well as with collaborators in the laboratory.
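
Growth rate and doubling time are related by the standard formula for exponential growth, $t_d = \ln(2) / \mu$. The following minimal sketch illustrates the conversion; it assumes the growth rate is given in 1/h, and the helper `doubling_time` as well as the example rates are made up for illustration and are not part of refineGEMs.

```python
import math


def doubling_time(growth_rate: float) -> float:
    """Convert a growth rate (assumed unit: 1/h) into a doubling time in hours."""
    if growth_rate <= 0:
        # no growth: the population never doubles
        return math.inf
    return math.log(2) / growth_rate


# Invented example growth rates, not simulation results
for rate in (0.0, 0.35, 0.7):
    print(f"growth rate {rate:.2f} 1/h -> doubling time {doubling_time(rate):.2f} h")
```

A higher growth rate therefore corresponds to a shorter doubling time, and a model that does not grow on a medium has no finite doubling time.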

All functionalities for the growth analyses can be found in the `refinegems.analysis.growth` module. In this block, the function `growth_analysis` will be needed. It can perform a growth analysis on any number of media and models and returns a `GrowthSimulationReport`, containing all the information from the different simulations.

📝 **Task**

Perform a growth test using the (COBRApy) model used in this notebook and the previously generated media configuration file or list of media objects. 
Present the result both in table format and as a plot of the doubling times. 

What do the results show?

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.analysis.growth import growth_analysis

# A: Growth analysis using config
report = growth_analysis(cobramodel, media_file_path, retrieve="report")

# B: Growth analysis using list
report = growth_analysis(cobramodel, media_list, supplements = None, retrieve="report")
# supplements can be chosen differently

# Generate the results table
res_table = report.to_table()
res_table

# Visualise the information 
res_fig = report.plot_growth(unit="dt")
res_fig

```

</details>

## ⭐️ Bonus

### Extending the model 

As mentioned in the slides, growth simulations sometimes do not lead to the expected results. In other cases, one finds new evidence for additional functions of the modelled organism, or one simply wants to test whether an added reaction has an impact on the observed growth. In any case, the model needs to be extended.

`refineGEMs` provides functionalities for adding metabolites and/or reactions from an ID or from a reaction string for the BiGG, MetaNetX or KEGG databases. 
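
To illustrate what a reaction string encodes, the following sketch splits a KEGG-style string into signed stoichiometric coefficients using plain Python. The helper `parse_reaction_string` is hypothetical and only handles the simple `A + B <=> C + D` form with optional numeric coefficients; it is not the parser used by refineGEMs.

```python
def parse_reaction_string(reac_str: str) -> dict[str, float]:
    """Hypothetical helper: map compound IDs to signed coefficients.

    Substrates (left of '<=>') get negative values, products positive ones.
    """
    left, right = reac_str.split("<=>")
    stoichiometry: dict[str, float] = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split("+"):
            parts = term.split()
            # a term is either "ID" or "COEFFICIENT ID"
            coefficient = float(parts[0]) if len(parts) == 2 else 1.0
            compound_id = parts[-1]
            stoichiometry[compound_id] = (
                stoichiometry.get(compound_id, 0.0) + sign * coefficient
            )
    return stoichiometry


example = parse_reaction_string("C15673 + C00006 <=> C02780 + C00005 + C00080")
for compound_id, coefficient in example.items():
    print(compound_id, coefficient)
```

The KEGG compound IDs in such a string still have to be mapped to metabolites in the model, which is what the database-specific functions of the toolbox take care of.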

📝 **Task**

Pick some KEGG (KEGG organism ID: `ppu`), MetaNetX or BiGG IDs and/or reactions that you would like to add to the model. Alternatively, it is also possible to construct a reaction string in any of the three database formats by hand.

Add the reaction(s) to the model. Does this influence the statistics and/or growth of the model? 

*Note:* Iteratively extending the model based on literature research or laboratory results is very common practice in the manual curation of high-quality genome-scale metabolic models.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python

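# Imports
# NOTE: the module path for build_reaction_kegg is an assumption; check your refineGEMs version
from refinegems.utility.entities import build_reaction_kegg
from refinegems.classes.reports import ModelInfoReport
from refinegems.analysis.growth import growth_analysis

# Generate a reaction via ID, e.g. ID for UDP-glucose:NAD+ 6-oxidoreductase
new_reac1 = build_reaction_kegg(cobramodel, id='R00286')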

# Generate a reaction via reaction string, e.g. reaction string for 2-dehydro-L-idonate:NADP+ 5-oxidoreductase
new_reac2 = build_reaction_kegg(cobramodel, reac_str='C15673 + C00006 <=> C02780 + C00005 + C00080')

# Add reaction to model
cobramodel.add_reactions([new_reac1, new_reac2])

# Look at statistics, e.g. via ModelInfoReport
# Initialise model statistics report object
model_info = ModelInfoReport(cobramodel)
model_info

# Get visualisation of model statistics
model_info.visualise()

# Check growth, e.g. via 
# Growth analysis using list
report = growth_analysis(cobramodel, media_list, supplements = None, retrieve="report")

# Generate the results table
res_table = report.to_table()
res_table

# Visualise the information 
res_fig = report.plot_growth(unit="dt")
res_fig
```

</details>