# How to use *GIMME* with *Troppo*

A typical workflow has two main steps. The first assigns a score to each reaction of the model, based on the omics data provided. The second applies an integration method that uses these scores to select the subset of reactions that make up the final model.

Integration scoring methods implemented in *Troppo* are:
- continuous: `ContinuousScoreIntegrationStrategy`
- threshold: `ThresholdSelectionIntegrationStrategy`
- default_core: `DefaultCoreIntegrationStrategy`
- adjusted_score: `AdjustedScoreIntegrationStrategy`
- custom: `CustomSelectionIntegrationStrategy`

Omics integration methods implemented in *Troppo* are:
- gimme: `GIMME`
- tinit: `tINIT`
- fastcore: `FastCORE`
- imat: `IMAT`
- swiftcore: `SWIFTCORE`
- corda: `CORDA`

The workflow in this example applies to all the omics integration methods implemented in this package. Note that the appropriate scoring method can differ between integration algorithms. For instance, a continuous scoring method suits *GIMME*, while a threshold scoring method is more appropriate for `fastcore`.

### Imports and Setup

In [1]:
import pandas as pd
import cobra
import re

from troppo.omics.readers.generic import TabularReader
from troppo.methods_wrappers import ModelBasedWrapper
from troppo.omics.integration import ContinuousScoreIntegrationStrategy, CustomSelectionIntegrationStrategy
from troppo.methods.reconstruction.gimme import GIMME, GIMMEProperties

COBRAModelObjectReader is available for cobra
FramedModelObjectReader could not be loaded for reframed


The wrappers.external_wrappers module will be deprecated in a future release in favour of the wrappers module. 
    Available ModelObjectReader classes can still be loaded using cobamp.wrappers.<class>. An appropriate model 
    reader can also be created using the get_model_reader function on cobamp.wrappers


Define the parsing rules for the GPRs that will be used later on.

In [2]:
patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')
replace_alt_transcripts = lambda x: patt.sub('', x)
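
To see what this rule does, it can be applied to a few gene identifiers. The identifiers below are made up for illustration, and the reading of `__COBAMPGPRDOT__` as a placeholder for the dot before a transcript or version suffix is an assumption, not something this tutorial states.

```python
import re

patt = re.compile('__COBAMPGPRDOT__[0-9]{1}')
replace_alt_transcripts = lambda x: patt.sub('', x)

# Hypothetical identifiers, for illustration only.
examples = [
    'ENSG00000000419__COBAMPGPRDOT__1',   # one-digit suffix: removed entirely
    'ENSG00000000460',                    # no suffix: left unchanged
    'ENSG00000000938__COBAMPGPRDOT__12',  # two-digit suffix: only the first digit is removed
]
cleaned = [replace_alt_transcripts(g) for g in examples]
print(cleaned)
# ['ENSG00000000419', 'ENSG00000000460', 'ENSG000000009382']
```

Because the pattern matches exactly one digit, a two-digit suffix leaves a stray digit behind. If such suffixes can occur in a model, `[0-9]+` would be a safer choice.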

### Read model and omics data

In [3]:
model = cobra.io.read_sbml_model('data/HumanGEM_Consistent_COVID19_HAM.xml')
model

| Attribute | Value |
|---|---|
| Name | HumanGEM |
| Memory address | 0x01b62320e788 |
| Number of metabolites | 6149 |
| Number of reactions | 10347 |
| Number of groups | 142 |
| Objective expression | 1.0*biomass_human - 1.0*biomass_human_reverse_fb2f2 |
| Compartments | Cytosol, Lysosome, Endoplasmic reticulum, Extracellular, Mitochondria, Peroxisome, Golgi apparatus, Nucleus, Inner mitochondria |


In [4]:
omics_data = pd.read_csv(filepath_or_buffer='data/Desai-GTEx_ensembl.csv', index_col=0)
omics_data.head()

ensembl_gene_id,ENSG00000000419,ENSG00000000460,ENSG00000000938,ENSG00000000971,ENSG00000001036,ENSG00000001084,ENSG00000001167,ENSG00000001461,ENSG00000001497,ENSG00000001561,...,ENSG00000271321,ENSG00000271605,ENSG00000272047,ENSG00000272325,ENSG00000272333,ENSG00000272414,ENSG00000272573,ENSG00000272968,ENSG00000273045,ENSG00000273079
Lung_Healthy,5.022368,0.584963,6.444601,6.213347,4.82273,3.0,3.776104,3.336283,4.343408,3.722466,...,0.137504,3.070389,1.847997,3.432959,2.944858,3.350497,5.074677,0.378512,0.847997,0.0
Lung_COVID19,2.988018,1.551051,5.77763,7.134232,4.429446,3.593211,4.770509,3.824891,3.566066,4.433298,...,0.0,4.669531,2.331411,3.326899,4.985126,4.696205,0.0,0.0,0.0,0.381678
Heart_Healthy,4.498251,0.263034,2.232661,7.360189,3.906891,2.035624,2.510962,2.485427,3.446256,2.765535,...,0.0,1.485427,1.807355,2.655352,1.678072,3.510962,6.238405,0.0,0.137504,0.0
Heart_COVID19,1.853724,0.0,3.443118,4.658543,2.425952,2.840368,1.938861,3.244538,1.761493,3.18375,...,0.0,2.119085,1.319589,3.271281,4.353896,5.244192,0.0,0.0,0.0,0.190673
Liver_Healthy,4.193772,0.584963,2.378512,8.166916,4.017922,3.765535,2.459432,1.263034,3.185867,1.678072,...,0.0,0.584963,1.321928,2.035624,1.632268,4.129283,2.432959,0.137504,0.263034,0.0


### Create a container for the omics data

The `TabularReader` class is used to read and store the omics data in a container that can then be used by *Troppo*. 

Relevant arguments from the `TabularReader` class:
- `path_or_df`: the omics data can be either a pandas dataframe or a path to a dataset file. The file can be in any format supported by pandas.
- `index_col`: the name of the column that contains the identifiers of the genes.
- `sample_in_rows`: a boolean indicating whether the samples are in rows or columns.
- `header_offset`: the number of rows to skip before reading the header.
- `omics_type`: a string containing the type of omics data. This is used to select the appropriate integration method.
- `nomenclature`: a string containing the nomenclature of the identifiers in the omics data. This is used to map the identifiers to the identifiers in the model.

The `to_containers()` method returns a list of containers, one for each sample of the dataset. This example uses only one sample, but the process can be repeated for every sample in the dataset.
The `get_integrated_data_map()` method maps the identifiers in the omics data to the identifiers in the model, using the function passed as the `gpr_gene_parse_function` argument of the `ModelBasedWrapper` class.

In [5]:
omics_container = TabularReader(path_or_df=omics_data, nomenclature='ensemble_gene_id', omics_type='transcriptomics').to_containers()[0]
omics_container

<troppo.omics.core.OmicsContainer at 0x1b656ab0408>

### Create a model wrapper

The `ModelBasedWrapper` class is used to wrap the model so that it can be used by *Troppo*.

Relevant arguments from this class include:
- `model`: the model to be wrapped.
- `ttg_ratio`: the ratio between the number of reactions to be selected and the total number of reactions in the model.
- `gpr_gene_parse_function`: a function that parses the GPRs of the model. This is used to map the identifiers in the omics data to the identifiers in the model.

Important attributes from this class include:
- `model_reader`: a COBRAModelObjectReader instance containing all the information of the model, such as, reaction_ids, metabolite_ids, GPRs, bounds, etc.
- `S`: the stoichiometric matrix of the model.
- `lb`: the lower bounds of the reactions in the model.
- `ub`: the upper bounds of the reactions in the model.

In [6]:
model_wrapper = ModelBasedWrapper(model=model, ttg_ratio=9999, gpr_gene_parse_function=replace_alt_transcripts)
model_wrapper



<troppo.methods_wrappers.ModelBasedWrapper at 0x1b656ab0fc8>
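
As a reminder of what `S`, `lb` and `ub` encode, here is a toy three-reaction network written with plain Python lists. It does not use *Troppo*; the network, the flux vectors and the helper `is_feasible` are all invented for illustration.

```python
# Toy network (hypothetical): R1: -> A, R2: A -> B, R3: B ->
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]
lb = [0.0, 0.0, 0.0]      # all reactions irreversible
ub = [10.0, 10.0, 10.0]

def is_feasible(v, tol=1e-9):
    """Check steady state (S.v = 0) and the bounds lb <= v <= ub."""
    steady = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(l - tol <= x <= u + tol for l, x, u in zip(lb, v, ub))
    return steady and bounded

print(is_feasible([2.0, 2.0, 2.0]))     # True
print(is_feasible([2.0, 1.0, 1.0]))     # False: metabolite A accumulates
print(is_feasible([11.0, 11.0, 11.0]))  # False: upper bounds exceeded
```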

### Map the identifiers in the omics data to the identifiers in the model

For this we can use the `get_integrated_data_map()` method of the omics container created above (an `OmicsContainer` instance). It maps the gene ids in the omics dataset to the reaction ids in the model through their GPRs, and assigns a score to each reaction according to the expression values of the associated genes. The method returns a data map holding the reaction ids and their scores; the later cells access these scores through `get_scores()`.

Important arguments from this method include:
- `model_reader`: a COBRAModelObjectReader instance containing all the information of the model. It can be accessed through the `model_wrapper.model_reader`.
- `and_func`: a function used to combine the scores of the genes joined by AND rules in a GPR. This example uses the minimum function, so the score of a reaction with AND in its GPR is the lowest score among the associated genes.
- `or_func`: a function used to combine the scores of the genes joined by OR rules in a GPR. This example uses the sum function, so the score of a reaction with OR in its GPR is the sum of the scores of the associated genes.

In [7]:
data_map = omics_container.get_integrated_data_map(model_reader=model_wrapper.model_reader, and_func=min, or_func=sum)
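
The effect of `and_func=min` and `or_func=sum` can be worked through by hand. The sketch below is not *Troppo*'s implementation: it evaluates a GPR written as a nested tuple (a hypothetical representation chosen for this example) against made-up expression values, and it assumes that unmeasured genes are simply skipped.

```python
def score_gpr(rule, expression, and_func=min, or_func=sum):
    """Score a GPR given as a gene id or a nested ('and'|'or', [children]) tuple."""
    if isinstance(rule, str):
        return expression.get(rule)  # None when the gene was not measured
    op, children = rule
    values = [score_gpr(c, expression, and_func, or_func) for c in children]
    values = [v for v in values if v is not None]  # assumption: skip unmeasured genes
    if not values:
        return None
    return and_func(values) if op == 'and' else or_func(values)

# Made-up expression values for three hypothetical genes.
expression = {'g1': 3.0, 'g2': 5.0, 'g3': 1.5}
rule = ('or', [('and', ['g1', 'g2']), 'g3'])   # (g1 and g2) or g3

reaction_score = score_gpr(rule, expression)
print(reaction_score)  # min(3.0, 5.0) + 1.5 = 4.5
```

With `min` for AND, the least expressed subunit of a complex limits the score; with `sum` for OR, isozymes add up.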

### Integrate scores

The `integrate()` method from the `ContinuousScoreIntegrationStrategy` class is used to integrate the scores of the reactions in the model. This method returns a dictionary with the reaction ids as keys and the integrated scores as values. In the case of this continuous scoring method, the resulting scores are the same as the scores in the data map. However, for other scoring methods, such as threshold scoring methods, the result will be a list of reactions with a score above the selected threshold. 

Moreover, an additional function can be applied to the scores, which is useful if you have protected reactions that must be in the final model, or if missing values need to be removed from the result. This is done by passing the function as the `score_apply` argument of the `ContinuousScoreIntegrationStrategy` class.

In this example, we will be using a function that replaces missing values (`None`) with 0 and returns a list with all the scores. This is the required format for the *GIMME* method.

In [8]:
def score_apply(reaction_map_scores):
    return [0 if v is None else v for k, v in reaction_map_scores.items()]

continuous_integration = ContinuousScoreIntegrationStrategy(score_apply=score_apply)
scores = continuous_integration.integrate(data_map=data_map)

print(scores)

[3.336283388, 7.301587646, 3.336283388, 3.336283388, 0.0, 16.317904841999997, 16.317904841999997, 10.289723824, 24.161542437999998, 14.972071356, 14.972071356, 12.94899207, 9.398209262, 0, 17.701664925, 12.370940183, 11.741142932999999, 8.185172776, 12.370940183, 3.765534746, 10.17729495, 12.94899207, 7.230933093, 6.704595348, 0, 20.330185487, 12.078519108000002, 5.649615459, 5.169925001, 3.201633861, 3.5360529, 16.472204492, 3.857980995, 6.704595348, 6.704595348, 0, 6.012121673, 10.412659603, 11.44493205, 4.336283388, 12.084110784, 1.201633861, 1.201633861, 12.084110784, 1.584962501, 0, 4.350497247, 5.070389328, 3.070389328, 3.070389328, 6.606442228000001, 0.0, 14.972071356, 14.972071356, 0, 0.0, 10.334977139, 0.0, 6.541483864, 3.070389328, 0, 12.850353762, 0, 6.541483864, 15.986777665000002, 4.078951341, 13.568817585999998, 14.238585064999999, 10.322108421, 3.485426827, 0, 2.925999419, 0, 15.986777665000002, 13.615888508, 0, 6.17990909, 5.626439136, 6.541483864, 0, 0, 0, 6.490249211,
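
A small stand-alone check makes the behaviour of `score_apply` concrete. The reaction ids and scores below are invented; the only assumption is that the function receives a dict mapping reaction ids to scores, with `None` for reactions without data, as the definition above implies.

```python
def score_apply(reaction_map_scores):
    return [0 if v is None else v for k, v in reaction_map_scores.items()]

# Hypothetical reaction scores; R2 has no expression data.
toy_scores = {'R1': 3.3, 'R2': None, 'R3': 0.0, 'R4': 7.1}
result = score_apply(toy_scores)
print(result)  # [3.3, 0, 0.0, 7.1]
```

The list follows the insertion order of the dict, so each position corresponds to one reaction. Note that only `None` is replaced: a `float('nan')` score would pass through unchanged and would need a separate check.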

Below is an example of how to use the `CustomSelectionIntegrationStrategy` class to integrate the scores of the reactions in the model. This is essentially a threshold scoring method that also lets us keep a set of protected reactions. It outputs a list of core reactions that can be used to build the final model with the *FastCORE* method.

In [9]:
threshold = 0.5
protected_reactions = ['biomass_human']

def integration_fx(reaction_map_scores):
    return [[k for k, v in reaction_map_scores.get_scores().items() if (v is not None and v > threshold) or k in protected_reactions]]

threshold_integration = CustomSelectionIntegrationStrategy(group_functions=[integration_fx])
threshold_scores = threshold_integration.integrate(data_map=data_map)

print(threshold_scores)

[['HMR_4097', 'HMR_4099', 'HMR_4108', 'HMR_4133', 'HMR_4281', 'HMR_4388', 'HMR_4283', 'HMR_8357', 'HMR_4379', 'HMR_4301', 'HMR_4355', 'HMR_4358', 'HMR_4363', 'HMR_4365', 'HMR_4368', 'HMR_4370', 'HMR_4371', 'HMR_4372', 'HMR_4373', 'HMR_4375', 'HMR_4377', 'HMR_4381', 'HMR_4394', 'HMR_4396', 'HMR_4521', 'HMR_6410', 'HMR_6412', 'HMR_7745', 'HMR_7746', 'HMR_7747', 'HMR_7748', 'HMR_7749', 'HMR_5395', 'HMR_5396', 'HMR_9727', 'HMR_5397', 'HMR_5398', 'HMR_5399', 'HMR_5400', 'HMR_5401', 'HMR_8585', 'HMR_4128', 'HMR_4130', 'HMR_4131', 'HMR_4132', 'HMR_4414', 'HMR_4774', 'HMR_4775', 'HMR_7674', 'HMR_8766', 'HMR_8767', 'HMR_4297', 'HMR_4316', 'HMR_4319', 'HMR_4383', 'HMR_4385', 'HMR_4386', 'HMR_4387', 'HMR_4399', 'HMR_4401', 'HMR_4490', 'HMR_4706', 'HMR_4590', 'HMR_4591', 'HMR_4592', 'HMR_8344', 'HMR_8352', 'HMR_8727', 'HMR_6537', 'HMR_1568', 'HMR_3853', 'HMR_3854', 'HMR_3855', 'HMR_3857', 'HMR_3859', 'HMR_4087', 'HMR_4089', 'HMR_4091', 'HMR_4101', 'HMR_4103', 'HMR_4143', 'HMR_4193', 'HMR_8497', 'H
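
The selection rule itself can be tried on a plain dict, without *Troppo*. In this sketch the helper `select_core` and the toy scores are hypothetical; only `threshold` and `protected_reactions` mirror the cell above.

```python
threshold = 0.5
protected_reactions = ['biomass_human']

def select_core(scores):
    """Keep reactions scoring above the threshold, plus protected ones."""
    return [k for k, v in scores.items()
            if (v is not None and v > threshold) or k in protected_reactions]

# Made-up scores: R2 has no data, R3 sits exactly on the threshold.
toy_scores = {'R1': 3.3, 'R2': None, 'R3': 0.5, 'biomass_human': None, 'R4': 0.9}
core = select_core(toy_scores)
print(core)  # ['R1', 'biomass_human', 'R4']
```

The comparison is strict, so a reaction scoring exactly 0.5 is dropped, while a protected reaction is kept even without a score.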

### Run the GIMME algorithm

The `GIMMEProperties` class is used to create the properties for the GIMME algorithm. This class contains the following arguments:
- `exp_vector`: a list of scores for each reaction in the model. This can be obtained from the `integrate()` method of the `ContinuousScoreIntegrationStrategy` class.
- `objectives`: a list of dictionaries with the reactions to be used as objectives. Each dictionary should have the reaction id as key and the coefficient as value.
- `preprocess`: a boolean indicating if the model should be preprocessed before running the GIMME algorithm. This is useful if you want to remove reactions that are not connected to the biomass reaction.
- `flux_threshold`: a threshold on flux values; reactions whose flux falls below it are removed.
- `obj_frac`: the flux fraction of the objective reactions to be used.
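
For orientation, the optimisation problem from the original GIMME publication can be written as below. The symbols $x_i$ (expression score of reaction $i$), $x_{\text{cutoff}}$ (expression cutoff), $c_{\text{obj}}$ (objective coefficients) and $z^{*}$ (maximum objective value obtained without expression constraints) are notation introduced here. How *Troppo* maps its parameters onto this formulation is an assumption rather than something this tutorial states; `obj_frac` is taken to play the role of the fraction $f$.

$$
\begin{aligned}
\min_{v} \quad & \sum_{i} c_i \, |v_i|, \qquad c_i = \max\left(0,\; x_{\text{cutoff}} - x_i\right) \\
\text{s.t.} \quad & S v = 0, \\
& lb \le v \le ub, \\
& c_{\text{obj}}^{\top} v \ge f \cdot z^{*}
\end{aligned}
$$

Reactions with scores above the cutoff carry no penalty, while poorly expressed reactions are used only as far as needed to sustain the required fraction of the objective.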

The `GIMME` class is used to run the GIMME algorithm. This class contains the following arguments:
- `S`: the stoichiometric matrix of the model. It can be accessed through the `model_wrapper.S`.
- `lb`: the lower bounds of the reactions in the model. It can be accessed through the `model_wrapper.lb`.
- `ub`: the upper bounds of the reactions in the model. It can be accessed through the `model_wrapper.ub`.
- `properties`: a `GIMMEProperties` instance containing the properties for the GIMME algorithm.

In the end, the `run()` method of the `GIMME` class returns the result of the algorithm. In this example, as the output below shows, it is a list with the indices of the reactions to keep in the final model; the indices refer to positions in `model_wrapper.model_reader.r_ids`. This list can be used to build the final model.

Moreover, to access the flux distribution determined by the algorithm, you can use the `sol` attribute of the `GIMME` class.

In [10]:
# Get the index of the biomass reaction in the model. This will be used as objective for the GIMME algorithm.
idx_objective = model_wrapper.model_reader.r_ids.index('biomass_human')

# Create the properties for the GIMME algorithm.
properties = GIMMEProperties(exp_vector=scores, obj_frac=0.8, objectives=[{idx_objective: 1}],
                             preprocess=True, flux_threshold=0.8, solver='CPLEX',
                             reaction_ids=model_wrapper.model_reader.r_ids,
                             metabolite_ids=model_wrapper.model_reader.m_ids)

# Run the GIMME algorithm.
gimme = GIMME(S=model_wrapper.S, lb=model_wrapper.lb, ub=model_wrapper.ub, properties=properties)

model_gimme = gimme.run()

print(model_gimme)

[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 52, 53, 56, 58, 59, 61, 63, 64, 65, 66, 67, 68, 69, 71, 73, 74, 76, 77, 78, 82, 83, 85, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 134, 135, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 151, 152, 153, 154, 155, 156, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 213, 214, 215, 216, 217, 222, 226, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 246, 247, 248, 254, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 26
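
One possible way to turn such an index list into a set of reactions to drop is sketched here with plain lists. The reaction ids and indices are hypothetical stand-ins; in this tutorial they would correspond to `model_wrapper.model_reader.r_ids` and `model_gimme`.

```python
# Hypothetical reaction ids and GIMME-style output (indices of kept reactions).
r_ids = ['R0', 'R1', 'R2', 'R3', 'R4', 'R5']
kept_idx = [0, 1, 2, 3, 5]

kept = set(kept_idx)  # set membership is faster than scanning the list
to_remove = [r for i, r in enumerate(r_ids) if i not in kept]
print(to_remove)  # ['R4']
```

The resulting ids could then be passed to COBRApy's `Model.remove_reactions` on a copy of the model. That last step is a suggestion and is not part of the cells shown here.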