from mewpy.omics.expression import ExpressionSet
import numpy as np
import pandas as pd
from src.integration import *
from mewpy import *
from mewpy.omics.integration.gimme import GIMME


At this stage the data is ready to be integrated into a metabolic model. MewPy, a Python library that supports integrating omics data into metabolic models, is used for this. Two methods were implemented: GIMME and eFlux. Both require a gene expression dataset (a TPM file), which is integrated into an in-house genome-scale metabolic model (GEM).
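
As a rough illustration of what the two methods do with expression values, the sketch below computes GIMME-style penalties (how far each reaction's expression falls below a cutoff) and eFlux-style bounds (flux bounds scaled by expression relative to the maximum). It is a simplified sketch under stated assumptions, not MewPy's implementation: the function names, reaction identifiers, and per-reaction expression values are hypothetical, and the step that maps gene expression to reactions through gene-protein-reaction rules is omitted.

```python
def gimme_penalties(reaction_expr, cutoff):
    """GIMME-style penalty per reaction: how far its expression lies below the cutoff."""
    return {rxn: max(0.0, cutoff - value) for rxn, value in reaction_expr.items()}


def eflux_bounds(reaction_expr, bounds):
    """eFlux-style bounds: scale each flux bound by expression relative to the maximum."""
    max_expr = max(reaction_expr.values())
    if max_expr <= 0:
        raise ValueError("at least one reaction needs positive expression")
    scaled = {}
    for rxn, (lb, ub) in bounds.items():
        # Assumption: reactions without expression data keep their original bounds.
        if rxn not in reaction_expr:
            scaled[rxn] = (lb, ub)
            continue
        level = reaction_expr[rxn] / max_expr  # normalized to [0, 1]
        scaled[rxn] = (lb * level, ub * level)
    return scaled


# Hypothetical per-reaction expression values and flux bounds.
reaction_expr = {"R1": 0.0, "R2": 50.0, "R3": 200.0}
bounds = {
    "R1": (0.0, 1000.0),
    "R2": (-1000.0, 1000.0),
    "R3": (0.0, 1000.0),
    "R4": (-1000.0, 1000.0),  # no expression data
}

penalties = gimme_penalties(reaction_expr, cutoff=75.0)
scaled_bounds = eflux_bounds(reaction_expr, bounds)
print(penalties)
print(scaled_bounds)
```

In GIMME the penalties weight the fluxes that an optimization then tries to minimize while keeping the objective near its optimum; in eFlux the scaled bounds directly constrain the flux balance problem.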

Load your model. Loading prints a brief summary of the model, which also lets you check that it was loaded properly.

In [5]:
from src.integration import Integration

my_model = Integration(model="data/inputs/model_ngaditana.xml")

Model loaded
Metabolites:
C_00001 191
C_00002 668
C_00003 484
C_00004 1189
C_00005 748
C_00006 186
C_00007 41
C_00008 108
C_00009 52
C_00010 1
C_00011 1

Reactions:
enzymatic 2664
transport 529
exchange 199
sink 0
other 124
None


To run GIMME or eFlux, you first need to load the expression data.

In [6]:
from mewpy.omics.expression import ExpressionSet
import numpy as np
import pandas as pd

expr = pd.read_csv('data/inputs/tpm.tsv', sep='\t') # load expression data

expr["Geneid"] = expr["Geneid"] + "_RA" # add a suffix to the geneid to avoid conflicts with the model
n_genes = expr.shape[0] # number of genes
print("Number of genes:", n_genes)
print("Number of samples:", expr.shape[1]-1)
print("Head of the expression data:")
print(expr.head())
print("Summary of expression data:")
print(expr.describe())

Number of genes: 11261
Number of samples: 1
Head of the expression data:
       Geneid  tpm
0  Ng00001_RA  0.0
1  Ng00002_RA  0.0
2  Ng00003_RA  0.0
3  Ng00004_RA  0.0
4  Ng00005_RA  0.0
Summary of expression data:
                tpm
count  11261.000000
mean      88.802060
std      417.656209
min        0.000000
25%        0.000000
50%        0.000000
75%       74.315983
max    12133.104496
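
Assuming the `_RA` suffix is there so that the expression identifiers line up with the gene identifiers used in the model, it can help to check the overlap after adding it. The sketch below does this on hypothetical identifier lists; `id_overlap` is an illustrative helper, and `model_genes` is a stand-in for the gene IDs of the loaded model (how to obtain them from the `Integration` object is not shown here).

```python
def id_overlap(expression_ids, model_genes):
    """Split model genes into those with expression data and those without."""
    expression_ids = set(expression_ids)
    covered = sorted(g for g in model_genes if g in expression_ids)
    missing = sorted(g for g in model_genes if g not in expression_ids)
    return covered, missing


raw_ids = ["Ng00001", "Ng00002", "Ng00003"]
expression_ids = [gene_id + "_RA" for gene_id in raw_ids]  # same suffix as above
model_genes = ["Ng00002_RA", "Ng00003_RA", "Ng09999_RA"]  # hypothetical model gene IDs

covered, missing = id_overlap(expression_ids, model_genes)
print(f"{len(covered)} of {len(model_genes)} model genes have expression data")
print("Missing:", missing)
```

A large `missing` list would suggest that the identifiers in the TPM file and in the model follow different naming conventions.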


In [15]:
identifiers = expr['Geneid'].tolist() # list of gene identifiers
conditions = ['tpm'] # list of conditions
expression = expr['tpm'].to_numpy()[:, np.newaxis] # 2-D numpy array of shape (n_genes, n_conditions)
set_expression = ExpressionSet(identifiers, conditions, expression)
if set_expression is not None:
    print("Expression data loaded successfully")

Expression data loaded successfully
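
The expression matrix has one row per gene and one column per condition, which is why the single `tpm` column is reshaped with `np.newaxis`. The sketch below shows the same layout with plain Python lists so that the shape is easy to inspect; `to_matrix` is a hypothetical helper, and the values are made up.

```python
def to_matrix(identifiers, columns):
    """Arrange per-condition value lists into rows of shape (n_genes, n_conditions)."""
    for name, values in columns.items():
        if len(values) != len(identifiers):
            raise ValueError(
                f"condition {name!r} has {len(values)} values, expected {len(identifiers)}"
            )
    conditions = list(columns)
    rows = [[columns[c][i] for c in conditions] for i in range(len(identifiers))]
    return conditions, rows


identifiers = ["Ng00001_RA", "Ng00002_RA", "Ng00003_RA"]
conditions, matrix = to_matrix(identifiers, {"tpm": [0.0, 12.5, 74.3]})  # made-up values

shape = (len(matrix), len(matrix[0]))
print(conditions)
print(matrix)
print(shape)
```

Passing `matrix` to `np.array` would give a 2-D array with the same layout as `expr['tpm'].to_numpy()[:, np.newaxis]`.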


Set the solver to use. In this case it is Gurobi.

In [16]:
import mewpy.solvers

mewpy.solvers.get_default_solver()
mewpy.solvers.set_default_solver('gurobi')

Now it is possible to run GIMME and eFlux.

In [17]:
my_model.gimme(set_expression, conditions)

UnboundLocalError: local variable 'values' referenced before assignment

In [18]:
my_model.eflux(set_expression)

TypeError: eFlux() takes from 2 to 7 positional arguments but 8 were given
