# Hands-On Session II: Advanced functions of the toolbox refineGEMs
In this notebook you will get first-hand experience on more advanced functionalities provided by ``refineGEMs``.
Due to the large number of functionalities in the toolbox, this notebook will only focus on a small subset to give you an idea of what is possible.

The covered functionalities include:
- Advanced simulations
- Model extension by automated gap-filling
- Identifying energy-generating cycles
- Handling duplicates and pruning metabolites

Some more functionalities are included in the bonus tasks, hopefully encouraging you to explore more of the toolbox ``refineGEMs`` by yourself.
As in the first notebook, the hidden solutions may help you if you get stuck.


## Advanced Simulations
Now that we have covered basic simulations, we can move on to more advanced methods. 
These include auxotrophy tests and source tests. 

Auxotrophy tests show whether an organism can produce a given amino acid itself or depends on an external supply of it. 

Source tests identify which substances can serve as a source of a given element, e.g. nitrogen, carbon or sulfur.
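
The logic behind both tests can be illustrated without a metabolic model. The following is a minimal toy sketch and not the refineGEMs implementation: the reaction list, the metabolite names and the helpers `reachable`, `find_auxotrophies` and `usable_sources` are all hypothetical. It treats a compound as producible if it can be reached from the medium through the listed reactions, whereas the actual tests rely on simulations of the model.

```python
# Toy sketch with a hypothetical network; not the refineGEMs implementation.

# Each reaction maps a set of substrates to a set of products.
TOY_REACTIONS = [
    ({"glucose"}, {"pyruvate"}),
    ({"acetate"}, {"pyruvate"}),
    ({"pyruvate", "ammonium"}, {"glutamate"}),
    ({"glutamate"}, {"proline"}),
    ({"glutamate", "sulfate"}, {"cysteine"}),
]
AMINO_ACIDS = ["glutamate", "proline", "cysteine", "histidine"]


def reachable(medium, reactions):
    """Return all metabolites producible from the medium (network expansion)."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can run once all of its substrates are available
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def find_auxotrophies(medium, reactions, amino_acids):
    """Map each amino acid to True if it cannot be produced from the medium."""
    available = reachable(medium, reactions)
    return {aa: aa not in available for aa in amino_acids}


def usable_sources(base_medium, candidates, target, reactions):
    """Return the candidates that make the target producible when added to the base medium."""
    return [
        candidate
        for candidate in candidates
        if target in reachable(set(base_medium) | {candidate}, reactions)
    ]


# Auxotrophy test on a minimal medium without sulfate
auxotrophies = find_auxotrophies({"glucose", "ammonium"}, TOY_REACTIONS, AMINO_ACIDS)
print(auxotrophies)

# Carbon source test: glutamate serves as a stand-in for biomass
carbon_sources = usable_sources(
    {"ammonium"}, ["glucose", "acetate", "citrate"], "glutamate", TOY_REACTIONS
)
print(carbon_sources)
```

In this toy network, cysteine and histidine are reported as auxotrophies, and glucose and acetate, but not citrate, are usable carbon sources.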

In the following, you will use functions from refineGEMs to determine the auxotrophies and usable carbon sources of 
*Pseudomonas putida*.

In [None]:
# Imports
from refinegems.utility.io import load_model

# Loading the model from Notebook I
model_file = '../data/iJN1463.xml'

# Load the model with COBRApy
cobramodel = load_model(model_file, 'cobra')
cobramodel

### Auxotrophy test
📝 **Task**

Perform an amino acid auxotrophy test on the model. If you created a media configuration file in the previous notebook, feel free to use it for testing; otherwise, either create one or use one or more media from the database.

Generate an auxotrophy test report for the model and visualise it.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.analysis.growth import test_auxotrophies 
from refinegems.classes.medium import load_media, load_medium_from_db

# Load media ...
# ... either from a configuration file
media_list, suppl_list = load_media("./media_config.yaml")
# ... or from the database (this overwrites the lists loaded above)
media_list = [load_medium_from_db("LB"), load_medium_from_db("SNM3"), load_medium_from_db("M9")]
suppl_list = [None, "min", "std"] # one entry per medium: None, "min" or "std"

# Perform test and get report
auxo_report = test_auxotrophies(
    cobramodel,
    media_list,
    suppl_list
) 

# Visualise results
auxo_report.visualise_auxotrophies()
```

</details>

### Source test
📝 **Task**

Perform a source test for the model to identify possible carbon sources. 
To find a suitable minimal medium for growth, test the growth of the model on the M9 medium with minimal supplementation.
Create a medium from M9 and the supplemented compounds and use it for the source test.

Generate a source test report and visualise it.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.analysis.growth import test_growth_with_source, growth_analysis
from refinegems.classes import medium

# Growth analysis using list
report = growth_analysis(cobramodel, media_list, supplements = "min", retrieve="report")

# Generate the results table
res_table = report.to_table()
additives = res_table.additives[0]

# Generate new media
m9 = medium.load_medium_from_db("M9")
medium.medium_to_model(cobramodel, m9)
cobramodel.medium = cobramodel.medium + additives

# Get report
source_report = test_growth_with_source(
    cobramodel,
    element = "C"
) 

# Visualise results
source_report.visualise()
```

</details>

## Extending the model 
One of the major tasks in generating and curating models is finding gaps: some genes or reactions may still be missing from the model, 
or the information on them may be too sparse for automatic addition to the model.

Within refineGEMs, we have three GapFillers that tackle this problem based on database information or the GFF file. 
Some manual work remains in the end, but the task is completed faster.

You already saw how individual metabolites and reactions can be built with refineGEMs. 
The GapFillers use this functionality to add the genes and reactions identified as missing to the model. 
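
The core idea of database-driven gap finding can be sketched as two set comparisons. The following is a simplified illustration with made-up gene and reaction IDs and a hypothetical gene-to-reaction table; it is not how the GapFillers are implemented internally, which additionally have to map identifiers between the GFF file, the databases and the model.

```python
# Toy sketch of database-driven gap finding; all IDs are made up.

# Genes (e.g. locus tags) and reactions already present in the model
model_genes = {"PP_0001", "PP_0002"}
model_reactions = {"RXN_A", "RXN_B"}

# Hypothetical database table: gene -> reactions it catalyses
database_gene_to_reactions = {
    "PP_0001": ["RXN_A"],
    "PP_0003": ["RXN_B", "RXN_C"],
    "PP_0004": ["RXN_D"],
}

# Step 1: genes known to the database but absent from the model
missing_genes = sorted(set(database_gene_to_reactions) - model_genes)

# Step 2: reactions of those genes that the model does not contain yet
missing_reactions = sorted(
    {
        reaction
        for gene in missing_genes
        for reaction in database_gene_to_reactions[gene]
        if reaction not in model_reactions
    }
)

print(missing_genes)
print(missing_reactions)
```

Note that `RXN_B` is not reported as missing although its gene `PP_0003` is: the reaction already exists in the model, so only the gene association would need to be added.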

For all GapFillers, the corresponding data is provided in the data folder.

Approximate runtimes of the GapFillers in refineGEMs:
- GeneGapFiller: $\approx$ 5 min
- BioCycGapFiller: $\approx$ 3 s
- KEGGapFiller: $\approx$ 2 h 10 min

📝 **Task**

Fill gaps in the model with one of the GapFillers. 
For time reasons, it is best to use either the GeneGapFiller or the BioCycGapFiller. 
Please also generate the report.

Example calls for all three GapFillers with the corresponding command line results can be found in the notebook 
`gapfill.ipynb`.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.classes.gapfill import BioCycGapFiller

# Load model as libSBML model, required for most of the parts of a GapFiller
libmodel = load_model(model_file, 'libsbml')

# Get files required for the BioCycGapFiller
gffpath = '../data/GCA_000007565.2_ASM756v2_genomic.gff'
biocyc_gene_tbl_path = '../data/PputKT2440_Accession-22Reactions.tsv'
biocyc_reacs_tbl_path = '../data/PputKT2440_biocyc_rxns.tsv'
fasta = '../data/GCA_000007565.2_ASM756v2_translated_cds.faa'

# Initialise GapFiller
gfbc = BioCycGapFiller(biocyc_gene_tbl_path, biocyc_reacs_tbl_path, gffpath)

# Find missing genes (uses the libSBML model)
gfbc.find_missing_genes(libmodel)
# Table with missing genes stored in: gfbc.missing_genes

# Find missing reactions (uses the COBRApy model)
gfbc.find_missing_reactions(cobramodel)
# Table with missing reactions stored in: gfbc.missing_reactions

# Fill model
filled_model = gfbc.fill_model(libmodel)
# Can be written to file with: 
# from refinegems.utility.io import write_model_to_file
# write_model_to_file(filled_model, '../data/iJN1463_filled.xml')

# Get statistics
gfbc.report('../data/')
```

</details>

⭐️ **Bonus**

You can try to use the other GapFillers that you did not try so far.

*Note:* This will most likely take more time than the workshop provides if you use the KEGGapFiller.

In [None]:
# Your code goes here

## EGCs
Energy-generating cycles (EGCs) arise in a model from wrong reaction directions, missing information, or reactions wrongly added from a 
template/universal model, if the model was created with one. Since these cycles are thermodynamically infeasible, they need to be identified and removed.
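
What makes such a cycle infeasible can be shown with a tiny closed network. The sketch below is a toy illustration with a hypothetical network and hypothetical helper names, not the method used by refineGEMs. It only checks whether a given flux distribution is an EGC, i.e. whether all metabolites are balanced without any uptake while an energy dissipation reaction still carries flux. Actually searching for EGCs would require an optimisation over all possible flux distributions.

```python
# Toy sketch: check whether a flux distribution is an energy-generating cycle.
# The network is hypothetical and closed (no exchange reactions);
# water and protons are omitted for brevity.

# Stoichiometry per reaction: metabolite -> coefficient (negative = consumed)
REACTIONS = {
    "R1": {"A": -1, "ADP": -1, "Pi": -1, "B": 1, "ATP": 1},  # A + ADP + Pi -> B + ATP
    "R2": {"B": -1, "A": 1},  # B -> A, wrongly allowed without energy cost
    "ATP_dissipation": {"ATP": -1, "ADP": 1, "Pi": 1},  # ATP -> ADP + Pi
}


def net_production(fluxes, reactions):
    """Return the net production rate of every metabolite for the given fluxes."""
    net = {}
    for reaction_id, flux in fluxes.items():
        for metabolite, coefficient in reactions[reaction_id].items():
            net[metabolite] = net.get(metabolite, 0) + coefficient * flux
    return net


def is_egc(fluxes, reactions, dissipation_id="ATP_dissipation"):
    """True if all metabolites are balanced and energy is still dissipated."""
    balanced = all(rate == 0 for rate in net_production(fluxes, reactions).values())
    return balanced and fluxes.get(dissipation_id, 0) > 0


cycle = {"R1": 1, "R2": 1, "ATP_dissipation": 1}
no_cycle = {"R1": 1, "R2": 0, "ATP_dissipation": 1}

print(is_egc(cycle, REACTIONS))
print(is_egc(no_cycle, REACTIONS))
```

In the first flux distribution, `R1` and `R2` regenerate each other's substrates while producing ATP from nothing. Restricting `R2`, for example by correcting its direction or adding its energy cost, breaks the cycle.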

Within refineGEMs, the EGCSolver base class offers the functionality to find EGCs in a model. 
The child class GreedyEGCSolver additionally offers a method to solve EGCs greedily. However, it is restricted to EGCs that can be solved by changing at most one reaction for each type.

📝 **Task**

1. Download the model [`iJN746`](http://bigg.ucsd.edu/models/iJN746) from BiGG as in Notebook I (`01_refinegems_demo.ipynb`).
2. Identify EGCs in the models `iJN1463` and `iJN746` and determine possible reasons for them.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.classes.egcs import EGCSolver

# Initialise EGCSolver
egc_solver = EGCSolver()

# Find EGCs in iJN1463
egc_dict, reacs = egc_solver.find_egcs(cobramodel, with_reacs=True)

# Display energy metabolites for which EGCs exist
egc_dict

# Display reactions that cause/are involved in EGCs
reacs

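# Load iJN746 (assumes the download was saved as '../data/iJN746.xml'; adjust the path if needed)
cobramodel2 = load_model('../data/iJN746.xml', 'cobra')

# Find EGCs in iJN746
egc_dict2, reacs2 = egc_solver.find_egcs(cobramodel2, with_reacs=True)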

# Display energy metabolites for which EGCs exist
egc_dict2

# Display reactions that cause/are involved in EGCs
reacs2
```

</details>

⭐️ **Bonus**

You can use the greedy EGC solver to attempt to solve the EGCs. 

*Note:* This will take more time than the workshop provides, as the limited number of cores in the Codespace makes the runtime too long. If you want to try it, please copy the notebook and run it locally on your computer.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.classes.egcs import GreedyEGCSolver

# Initialise GreedyEGCSolver
egc_solver = GreedyEGCSolver()

# Find and try to solve EGCs
results = egc_solver.solve_egcs(cobramodel)

# Display results
results
```

</details>

## Duplicates 
Due to inconsistent naming within a database namespace, or identifiers from different databases being used within one model, 
the same metabolite or reaction can occur more than once in a model. 
To reduce these duplicates, refineGEMs offers three functions in its ``curation.curate`` module: 

- one for metabolites, `resolve_duplicate_metabolites`,
- one for reactions, `resolve_duplicate_reactions`, and 
- a combined one, `resolve_duplicates`. 

While checking and resolving reactions is a yes-or-no decision, the options for metabolites offer more flexibility. 
The removal can be skipped, performed with the default setting, or performed as an exhaustive search. The difference between the last two is a trade-off between performance and completeness. The default uses the MetaNetX IDs as a seed for the duplicate search and therefore only matches metabolites that have a MetaNetX ID. The exhaustive search iteratively uses every database as the seed, which drastically increases the runtime but potentially identifies more duplicates. 

Lastly, unused metabolites, i.e. metabolites that are no longer connected to any reaction, can be removed from the model. This is called *pruning of metabolites*.
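
The default matching strategy and the pruning step can be pictured with plain dictionaries. This is a toy sketch under simplifying assumptions, with made-up model and annotation IDs; it does not reproduce the refineGEMs functions, which operate on the annotations of an actual model.

```python
from collections import defaultdict

# Hypothetical metabolites: model ID -> MetaNetX annotation (None if missing).
# All IDs are made up for illustration.
metabolites = {
    "glc__D_c": "MNXM_x1",
    "glucose_c": "MNXM_x1",  # same compound under a different name
    "pyr_c": "MNXM_x2",
    "unknown_c": None,  # cannot be matched via MetaNetX
    "orphan_c": "MNXM_x3",  # not used by any reaction
}
# Hypothetical reactions: reaction ID -> metabolite IDs they use
reactions = {
    "R1": ["glc__D_c", "pyr_c"],
    "R2": ["glucose_c", "unknown_c"],
}

# 1) Group metabolites by their shared MetaNetX ID (the seed of the search)
groups = defaultdict(list)
for metabolite_id, mnx_id in metabolites.items():
    if mnx_id is not None:
        groups[mnx_id].append(metabolite_id)
duplicates = {mnx_id: ids for mnx_id, ids in groups.items() if len(ids) > 1}

# 2) Replace every duplicate by the first metabolite of its group
replacement = {dup: ids[0] for ids in duplicates.values() for dup in ids[1:]}
reactions = {
    reaction_id: [replacement.get(m, m) for m in used_metabolites]
    for reaction_id, used_metabolites in reactions.items()
}

# 3) Prune metabolites that are no longer connected to any reaction
used = {m for used_metabolites in reactions.values() for m in used_metabolites}
pruned = sorted(m for m in metabolites if m not in used)

print(duplicates)
print(reactions)
print(pruned)
```

The metabolite `unknown_c` illustrates the limitation of the default setting: without a MetaNetX ID it never enters the matching, which is the case the exhaustive search is meant to cover.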

📝 **Task**

1. Try to identify and remove duplicates from the model in a meaningful way.
2. Try pruning the metabolites of the model and check whether this has an effect (e.g. using `len(cobramodel.metabolites)` or a growth test).

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Import
from refinegems.curation.curate import resolve_duplicates

# Resolve duplicate metabolites and reactions with the default settings
cobramodel = resolve_duplicates(
    cobramodel,
    check_reac = True,
    check_meta = "default",
    replace_dupl_meta = True,
    remove_dupl_reac = True,
)

# Prune unused metabolites and compare the metabolite counts
with cobramodel as testmodel: # changes are not applied to the model
    print(len(testmodel.metabolites))
    testmodel = resolve_duplicates(
        testmodel,
        check_reac = False,
        check_meta = "skip",
        remove_unused_meta = True,
        replace_dupl_meta = False,
        remove_dupl_reac = False,
    )
    print(len(testmodel.metabolites))
```

</details>

## ⭐️ Bonus
If you have already finished the tasks and still want to go further, you can try other curation functions in the 
`curation` modules, like:

- `miriam.polish_annotations` to clean up annotations in a model
- `miriam.change_all_qualifiers` to clean up the RDF bags of the annotations in a model
- `pathways.set_kegg_pathways` to add KEGG Pathways to a model

📝 **Task**

- Clean up annotations in the model
- Add KEGG Pathways to the model

*Note:* Adding KEGG Pathways will most likely take more time than the workshop provides.

In [None]:
# Your code goes here

<details>
<summary>🔑 Click to see the answer 🔑</summary>

Here is the code for the task:

```python
# Imports
from refinegems.curation.miriam import change_all_qualifiers, polish_annotations
from refinegems.curation.pathways import set_kegg_pathways
from refinegems.utility.io import write_model_to_file

# Clean-up annotations in the model
model_w_cleaned_annots = polish_annotations(libmodel, new_pattern=True, outpath = '../data/')
model_w_cleaned_annots = change_all_qualifiers(model_w_cleaned_annots, lab_strain=False)

# If you want to save the model
write_model_to_file(model_w_cleaned_annots, '../data/iJN1463_cleaned.xml')

# Set KEGG Pathways
non_kegg_rxns = set_kegg_pathways(libmodel, viaEC=True, viaRC=True)
print(f'Number of reactions without KEGG Reaction ID: {len(non_kegg_rxns)}')

# If you want to save the model
write_model_to_file(libmodel, '../data/iJN1463_with_pathways.xml')
```

</details>