##### About this notebook: This notebook generates functional module factors (FM-factors) for a given gene expression matrix, using the selected functional modules. In this example, we use the gene expression profiles of MCF7 cells with or without drug treatment and generate an FM-factor matrix for each of the two groups. Data are from the Connectivity Map (doi: 10.1016/j.cell.2017.10.049). For the gene expression profiles of MCF7 with drug treatment, only samples with drug sensitivity measured in the CTRP project (doi: 10.1038/nchembio.1986) are used. The transcription factor and functional module pairs come from the results of Example1-generate-TF-pairs.ipynb. Before running the following pipeline, make sure you have downloaded all essential input data from the shared directory https://osf.io/34xnm/?view_only=5b968aebebe14d4c97ff9d7ce4cb5070, which is described in the manuscript "Functional module states framework reveals cell states for drug and target prediction" by Guangrong Qin et al.

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import scipy 
import scipy.stats as ss
import statsmodels
from statsmodels import stats
from statsmodels.stats import multitest
sys.path.append('../Script/')
import FM_States
import FM_selection
import TF

ROOT_DIR = os.path.abspath("../")

##### Load the parameters

In [2]:
para_in = {
    'input_TF': ROOT_DIR+"/Sample_output/TF_pairs/TF_pairs.csv",     # TF-module pairs produced by Example1-generate-TF-pairs.ipynb
    'output_dir': ROOT_DIR+"/Sample_output/Example1/",
    'input_expr_file': os.path.join(ROOT_DIR, "Sample_input/Example1/Sample1_data_MCF7_drugs_CTRP2.csv"),
    'input_ctrl_file': os.path.join(ROOT_DIR, "Sample_input/Example1/ctl_MCF7_24h.csv"),
    'Label_UP': True,                    # Whether to calculate the up-regulation strength (True or False).
    'Label_DOWN': True,                  # Whether to calculate the down-regulation strength (True or False).
    'Label_ssGSEA': True,                # Whether to calculate the ssGSEA enrichment score (True or False).
    'Label_TF': True,                    # Whether to calculate the transcription regulation strength (True or False).
    'isAbsoluteValues': False,           # True if the gene expression matrix holds absolute FPKM or RPKM values; False if it holds values z-score transformed across samples.
    'sele_modules': ['Translation',      # Selection of functional modules
         'Nucleotide metabolism',
         'Signal transduction',
         'Amino acid metabolism',
         'Folding sorting and degradation',
         'Replication and repair',
         'Carbohydrate metabolism',
         'Membrane transport',
         'Cellular community - eukaryotes',
         'Lipid metabolism',
         'Metabolism of other amino acids',
         'Transcription',
         'Xenobiotics biodegradation and metabolism',
         'Signaling molecules and interaction',
         'Energy metabolism',
         'Transport and catabolism',
         'Glycan biosynthesis and metabolism',
         'Metabolism of cofactors and vitamins',
         'Cell motility',
         'Cell cycle', 
         'Apoptosis', 
         'Cellular senescence', 
         'p53 signaling pathway']
}
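
The `isAbsoluteValues` flag distinguishes absolute FPKM/RPKM values from values that were z-score transformed across samples. As a rough illustration of what such a transformation looks like, the sketch below z-scores a toy matrix with pandas. It assumes genes are rows and samples are columns, and the helper name `zscore_across_samples` is hypothetical; it is not taken from `FM_States` and may differ from how the input files were actually prepared.

```python
import numpy as np
import pandas as pd


def zscore_across_samples(expr: pd.DataFrame) -> pd.DataFrame:
    """Z-score each gene (row) across samples (columns).

    Assumption: rows are genes and columns are samples.
    Genes with zero variance are returned as NaN instead of dividing by zero.
    """
    mean = expr.mean(axis=1)
    std = expr.std(axis=1, ddof=0).replace(0, np.nan)
    return expr.sub(mean, axis=0).div(std, axis=0)


# Toy matrix with made-up values, for illustration only.
toy = pd.DataFrame(
    {"s1": [1.0, 5.0], "s2": [2.0, 5.0], "s3": [3.0, 5.0]},
    index=["GENE_A", "GENE_B"],
)
z = zscore_across_samples(toy)
print(z)
```

Each transformed row has mean zero across samples, so a matrix prepared this way would be passed with `'isAbsoluteValues': False`.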

In [3]:
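# output_dir already ends with "/", so join the paths instead of adding another "/"
para_out = {'output_fmf_file': os.path.join(para_in['output_dir'], "matrix_factor_mcf7.csv"),
            'output_fmf_ctrl_file': os.path.join(para_in['output_dir'], "ctrl_factor_mcf7.csv")
           }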
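
Because the pipeline depends on input files downloaded from the shared directory, it can help to confirm that every configured path exists before running the longer steps. The following is a minimal sketch; `find_missing_inputs` is a hypothetical helper, not part of the FM_States scripts, and the demonstration uses temporary files rather than the real data.

```python
import os
import tempfile


def find_missing_inputs(params, keys):
    """Return the keys in `params` whose configured path is not an existing file."""
    return [key for key in keys if not os.path.isfile(params[key])]


# Demonstration with a temporary directory: one file present, one absent.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "expr.csv")
    open(present, "w").close()
    demo_params = {
        "input_expr_file": present,
        "input_ctrl_file": os.path.join(tmp, "absent.csv"),
    }
    demo_missing = find_missing_inputs(
        demo_params, ["input_expr_file", "input_ctrl_file"]
    )

print(demo_missing)
```

In this notebook the same helper could be called as `find_missing_inputs(para_in, ['input_TF', 'input_expr_file', 'input_ctrl_file'])`; an empty list would mean all three inputs were found.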

In [4]:
## Generate the output directory
output_dir = para_in['output_dir']

if not os.path.exists(output_dir):
    try:
        os.makedirs(output_dir)  # also creates missing parent directories
    except OSError:
        print("Creation of the directory %s failed" % output_dir)
    else:
        print("Successfully created the directory %s " % output_dir)
else:
    print("INFO: %s already exists!" % output_dir)


INFO: /project/Sample_output/Example1/ already exists!


##### Calculate the functional module factors for the gene expression matrix. The gene expression matrix here contains the gene expression profiles of MCF7 cells 24 hours after drug treatment.


In [5]:
#1) Load genes from the selected functional modules from KEGG pathways
#dic_module, KEGG_level2, KEGG_level3, KEGG_modules = FM_selection.load_function_modules("KEGG")
File_FM = os.path.join(ROOT_DIR, "Dataset/Sample_FM.csv")
dic_module, KEGG_modules = FM_States.load_function_modules(File_FM)

module_selected_gmt = KEGG_modules.loc[KEGG_modules['name'].isin(para_in['sele_modules'])]

#2) Load the gene expression matrix 
data_matrix_MCF7_CTRP2 = pd.read_csv(para_in['input_expr_file'], index_col = 'Unnamed: 0')

#3) Load the TF-module pairs as estimated using the Example1-generate-TF-pairs.ipynb
TF_pairs = pd.read_csv(para_in['input_TF'], index_col = 'Unnamed: 0')

#4) Calculate the functional module factors for the gene expression matrix.
matrix_factor = FM_States.generate_factor(
    data_matrix_MCF7_CTRP2,
    para_in['sele_modules'],
    module_selected_gmt,
    TF_pairs,
    UP = para_in['Label_UP'],
    DOWN = para_in['Label_DOWN'],
    ssGSEA = para_in['Label_ssGSEA'],
    TF = para_in['Label_TF'],
    absolute = para_in['isAbsoluteValues'],
)

matrix_factor.to_csv(para_out['output_fmf_file'])

0
1
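
The module selection in step 1 filters the module table with `isin`, which silently drops any name in `sele_modules` that has no exact match in the `name` column (for example, because of a spelling difference). A small check like the sketch below can surface such names. The helper `split_found_missing` and the toy table are hypothetical; only the `name` column is taken from the notebook.

```python
import pandas as pd


def split_found_missing(selected, available):
    """Split `selected` names into those present in `available` and those absent."""
    available = set(available)
    found = [name for name in selected if name in available]
    missing = [name for name in selected if name not in available]
    return found, missing


# Toy module table standing in for KEGG_modules (illustrative values only).
toy_modules = pd.DataFrame({"name": ["Translation", "Cell cycle", "Apoptosis"]})

# "Cell motilty" is deliberately misspelled to show how a mismatch is reported.
found, missing = split_found_missing(
    ["Translation", "Apoptosis", "Cell motilty"], toy_modules["name"]
)
print("found:", found)
print("missing:", missing)
```

In the notebook, the equivalent call would be `split_found_missing(para_in['sele_modules'], KEGG_modules['name'])`.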


##### Calculate the functional module factors for the gene expression matrix of the control data. The control data are the transcription profiles of MCF7 cells after treatment with H2O or DMSO.

In [None]:
ctrl = pd.read_csv(para_in['input_ctrl_file'], index_col = 'Unnamed: 0')
ctrl_factor = FM_States.generate_factor(
    ctrl,
    para_in['sele_modules'],
    module_selected_gmt,
    TF_pairs,
    UP = para_in['Label_UP'],
    DOWN = para_in['Label_DOWN'],
    ssGSEA = para_in['Label_ssGSEA'],
    TF = para_in['Label_TF'],
    absolute = para_in['isAbsoluteValues'],
)
ctrl_factor.to_csv(para_out['output_fmf_ctrl_file'])
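
Since the treated and control matrices are generated with the same modules and the same flags, one would expect them to carry the same factor labels. The sketch below compares labels between two DataFrames. It assumes the factors lie along the columns (pass `axis=0` if they are rows instead); `compare_labels` and the toy labels are hypothetical and not part of `FM_States`.

```python
import pandas as pd


def compare_labels(treated, control, axis=1):
    """Compare the labels of two DataFrames along `axis` (1 = columns, 0 = rows)."""
    t_labels = list(treated.axes[axis])
    c_labels = list(control.axes[axis])
    t_set, c_set = set(t_labels), set(c_labels)
    return {
        "shared": [x for x in t_labels if x in c_set],
        "only_treated": [x for x in t_labels if x not in c_set],
        "only_control": [x for x in c_labels if x not in t_set],
    }


# Toy matrices with made-up factor labels, for illustration only.
toy_treated = pd.DataFrame([[0.1, 0.2, 0.3]], columns=["A_UP", "A_DOWN", "B_UP"])
toy_control = pd.DataFrame([[0.4, 0.5, 0.6]], columns=["A_UP", "B_UP", "C_UP"])

report = compare_labels(toy_treated, toy_control)
print(report)
```

Applied to this notebook, `compare_labels(matrix_factor, ctrl_factor)` returning empty `only_treated` and `only_control` lists would suggest the two matrices are directly comparable.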

#### The result files will be used in Example1-comparing-clustering-annotation.ipynb for further clustering and annotation.