# Analiza obogatenosti metabolnih podsistemov

## Hipergeometrični test
Hipergeometrični test temelji na hipergeometrični porazdelitvi - gre za diskrentno verjetnostno porazdelitev, ki opisuje verjetnost, da v končni populaciji velikosti N z natančno K objekti v n poskusih k-krat izberemo enega izmed K objektov brez zamenjave (objekta po končanem poskusu ne vrnemo v vzorec).

Za razliko od hipergeometrične porazdelitve, binomska porazdelitev opisuje verjetnost k uspešnih izidov v populaciji velikosti n z zamenjavo.

Uporaba hipergeometričnega testa:
$$P(x \geq k) = 1 - hypergeom.cdf(k-1, M, n, N)$$

* k: število diferenčno izraženih reakcij v podsistemu,
* n: število diferenčno izraženih reakcij v modelu,
* N: število reakcij v podsistemu,
* M: število reakcij v modelu.

In [1]:
import pandas as pd
import numpy as np

from scipy.stats import hypergeom
#import statsmodels.stats.multitest as multi
from helpers import bh

import cobra

import os.path


## Osnovne nastavitve

In [10]:
require_biomass = True
folder_enrich = os.path.join('enrichment','biomass') if require_biomass else os.path.join('enrichment','no_biomass')

# df_control = pd.read_csv(os.path.join(f'{folder_samples}','samples_control_RECON3D.csv'))
# df_tumor = pd.read_csv(os.path.join(f'{folder_samples}','samples_tumor_RECON3D.csv'))

## Podatki o podsistemih
SBML predstavitev modela Recon3D ne vsebuje podatkov o podsistemih. Pridobimo jih lahko preko `mat` (Matlab) zapisa modela (glej zakomentirano kodo spodaj).

Podatke o podsistemih smo si pripravili v ločeni datoteki:

In [3]:
df_subsystems = pd.read_csv(os.path.join('models','subsystems.csv'))
subsystems = df_subsystems.subsystem.unique()
df_subsystems.head()

Unnamed: 0,subsystem,reaction
0,"Transport, mitochondrial",24_25DHVITD3tm
1,"Transport, extracellular",25HVITD3t
2,"Transport, lysosomal",COAtl
3,Extracellular exchange,EX_5adtststerone_e
4,Extracellular exchange,EX_5adtststerones_e


Podatki o diferenčni aktivnosti reakcij:

In [4]:
folder_enrich

'enrichment/biomass'

In [5]:
df_reactions = pd.read_csv(os.path.join(f'{folder_enrich}','reactions_RECON3D.csv'))

## Hipergeometrični test
Gremo čez podsisteme in izračunamo p-vrednosti:

In [11]:
def get_enrichment(df_reactions, df_subsystems, take_all=False):
    df_enrichment = pd.DataFrame(columns=["subsystem", "p_up", "p_down", "q_up", "q_down", "enrichment", "p_changed", "q_changed", "changed"])
    df_enrichment["subsystem"] = subsystems

    M = len(df_reactions) # number of different reactions in pairs of models
    n_up = sum(df_reactions.enrichment == 1) # number of upregulated reactions in models
    n_down = sum(df_reactions.enrichment == -1)  # number of downregulated reactions in models
    n_changed = sum(df_reactions.changed == 1)  # number of changed reactions in models

    for subsystem in subsystems:
        subsystem_reactions = df_subsystems.loc[df_subsystems.subsystem == subsystem,'reaction'].values
        df_sub = df_reactions[df_reactions['reaction'].isin(subsystem_reactions)]
        
        if not take_all:
            # option 1: take only remaining reactions
            N = len(df_sub) # number of reactions in a subsystem
        else:
            # option 2: take all reactions from the original model
            N = len(df_subs[df_subs.subsystem == subsystem])
        k_up = sum(df_sub.enrichment == 1)# number of upregulated reactions in a subsystem
        k_down = sum(df_sub.enrichment == -1)# number of downregulated reactions in a subsystem
        k_changed = sum(df_sub.changed == 1)# number of changed reactions in a subsystem
        
        if n_up:         
            p_up = 1 - hypergeom.cdf(k_up-1, M, n_up, N)                
        else:
            p_up = 1.0
            
        if n_down:         
            p_down = 1 - hypergeom.cdf(k_down-1, M, n_down, N)                
        else:
            p_down = 1.0
            
        if n_changed:
            p_changed = 1 - hypergeom.cdf(k_changed, M, n_changed, N)                
        else:
            p_changed = 1
            
        df_enrichment.loc[df_enrichment["subsystem"] == subsystem, 'p_up'] = p_up
        df_enrichment.loc[df_enrichment["subsystem"] == subsystem, 'p_down'] = p_down
        df_enrichment.loc[df_enrichment["subsystem"] == subsystem, 'p_changed'] = p_changed
    
    df_enrichment['q_up'] = bh(df_enrichment['p_up'])
    df_enrichment['q_down'] = bh(df_enrichment['p_down'])
    df_enrichment['q_changed'] = bh(df_enrichment['p_changed'])

        
    df_enrichment.loc[(df_enrichment['q_up']<0.05) & (df_enrichment['q_up']<df_enrichment['q_down']),'enrichment'] = 1
    df_enrichment.loc[(df_enrichment['q_down']<0.05) & (df_enrichment['q_down']<=df_enrichment['q_up']),'enrichment'] = -1
    df_enrichment.loc[(df_enrichment['q_changed']<0.05),'changed'] = 1

    df_enrichment = df_enrichment.fillna(0)

    return df_enrichment

In [12]:
df_enrichment = get_enrichment(df_reactions, df_subsystems, take_all=False)
df_enrichment

Unnamed: 0,subsystem,p_up,p_down,q_up,q_down,enrichment,p_changed,q_changed,changed
0,"Transport, mitochondrial",0.999128,1.603518e-04,1.000000,5.665762e-03,-1,0.019309,0.021774,1
1,"Transport, extracellular",0.000837,9.744668e-01,0.015323,1.000000e+00,1,0.000008,0.000009,1
2,"Transport, lysosomal",0.513549,8.348701e-01,1.000000,1.000000e+00,0,0.674175,0.744402,0
3,Extracellular exchange,0.999999,2.055747e-09,1.000000,2.179092e-07,-1,0.000022,0.000025,1
4,Vitamin D metabolism,1.000000,1.166009e-01,1.000000,6.606094e-01,0,0.000000,0.000000,1
...,...,...,...,...,...,...,...,...,...
101,N-glycan metabolism,0.391954,1.000000e+00,1.000000,1.000000e+00,0,0.000000,0.000000,1
102,Drug metabolism,0.372012,8.186131e-01,1.000000,1.000000e+00,0,0.000000,0.000000,1
103,Protein formation,1.000000,1.000000e+00,1.000000,1.000000e+00,0,0.000000,0.000000,1
104,Vitamin B12 metabolism,1.000000,1.000000e+00,1.000000,1.000000e+00,0,0.000000,0.000000,1


In [13]:
df_enrichment.to_csv(f"{folder_enrich}/subsystems_RECON3D.csv", index=False)