# Diferenčno izražanje reakcij
Primerjali bomo pare reakcij (kontrola, utišanje) na podlagi vzorčenih metabolnih pretokov.

Uporabimo lahko npr. test Kolmogorov-Smirnov, ki ne predpostavlja normalne porazdelitve v vzorcih, statistična značilnost razlik pa je ovrednotena s p-vrednostmi.

Dodatno bomo opazovali kako močno se reakcije med vzorci razlikujejo (fold-changes):

$$FC = \frac{\overline{R_{kd}} - \overline{R_{control}}}{\left|\overline{R_{kd}} + \overline{R_{control}}\right|}$$

In [1]:
import pandas as pd
import numpy as np

from scipy.stats import ks_2samp
#import statsmodels.stats.multitest as multi

import os.path

from helpers import bh

### Osnovne nastavitve

In [2]:
require_biomass = True
folder_samples = os.path.join('samples','biomass') if require_biomass else os.path.join('samples','no_biomass')
folder_enrich = os.path.join('enrichment','biomass') if require_biomass else os.path.join('enrichment','no_biomass')

### Branje iz datotek

In [3]:
df_control = pd.read_csv(os.path.join(f'{folder_samples}','samples_control_RECON3D.csv'))
df_tumor = pd.read_csv(os.path.join(f'{folder_samples}','samples_tumor_RECON3D.csv'))

In [4]:
reactions = sorted(list(set(df_control.columns) | set(df_tumor.columns)))
len(reactions) # število reakcij

2295

### Diferenčna aktivnost reakcij

In [5]:
df = pd.DataFrame(columns=['reaction', 'FC', 'p', 'q', 'enrichment', 'changed'])
df['reaction']=reactions

n_samples = df_control.shape[0]

# sprehodimo se čez vse reakcije
for reaction in reactions:
    if reaction in df_control.columns:
        control = df_control[reaction].values
    else:
        # če reakcije ni v kontrolni skupini, ji pripišemo same ničle
        control = np.zeros(n_samples)
        
    if reaction in df_tumor.columns:
        kd = df_tumor[reaction].values
    else:
        # če reakcije ni v kd skupini, ji pripišemo same ničle
        kd = np.zeros(n_samples)
        
    # iztračunamo sredino za kontrolo in kd
    mean_control = np.mean(control)
    mean_tumor = np.mean(kd)
    
    # izračunamo FC - fold change in signifikanco z uporabo 2 sample Kolmogorov-Smirnov testa
    if mean_control != 0 or mean_tumor != 0:
        FC = mean_tumor-mean_control/(abs(mean_tumor + mean_control))
        p = ks_2samp(control,kd)[1]
    else:
        FC = 0
        p = 1     
        
    df.loc[df['reaction']==reaction, 'FC'] = FC
    df.loc[df['reaction']==reaction, 'p'] = p
    
    
# korigiramo p vrednosti za večkratno testiranje - FDR korekcija
df['q'] = bh(df['p'])

# signifikanca zahteva vsaj 10-kratno up-/down-regulacijo
df.loc[(df['FC'] >= 0.82) & (df['q'] < 0.05),'enrichment'] = 1
df.loc[(df['FC'] <= -0.82) & (df['q'] < 0.05),'enrichment'] = -1
df.loc[~df['enrichment'].isna(),'changed'] = 1
df = df.fillna(0)
    
    

In [8]:
df.to_csv(f"{folder_enrich}/reactions_RECON3D.csv", index=False)

In [9]:
df[df.enrichment == -1]

Unnamed: 0,reaction,FC,p,q,enrichment,changed
0,12PPDRte,-121.113373,6.607762e-199,9.286475e-199,-1,1
1,1a_24_25VITD2Hm,-1.000000,0.000000e+00,0.000000e+00,-1,1
2,24_25VITD2Hm,-1.000000,0.000000e+00,0.000000e+00,-1,1
3,25HVITD2tin_m,-1.000000,0.000000e+00,0.000000e+00,-1,1
7,2MOPtm,-1.000000,0.000000e+00,0.000000e+00,-1,1
...,...,...,...,...,...,...
2282,r2434,-133.956475,0.000000e+00,0.000000e+00,-1,1
2286,r2506,-31.424564,0.000000e+00,0.000000e+00,-1,1
2287,r2510,-55.556888,0.000000e+00,0.000000e+00,-1,1
2290,r2520,-85.137898,1.140072e-69,1.373472e-69,-1,1
