## Instructions

This Jupyter notebook runs the MADRID pipeline to identify drug targets and repurposable drugs for user-defined complex human diseases. The process consists of four steps:
1. Download and analyze transcriptomics and proteomics data, and output a list of active genes.
2. Create tissue-specific models based on the list of active genes. If required, the user can manually refine these models and supply them in Step 4.
3. Identify differentially expressed genes from disease datasets.
4. Identify drug targets and repurposable drugs. This step consists of four substeps: (i) mapping drugs onto automatically created or user-supplied models, (ii) knock-out simulation, (iii) comparing the simulation results of perturbed and unperturbed models, and (iv) integrating the results with disease genes and scoring drug targets.
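As a toy illustration of substep (iii) only, comparing a perturbed (knocked-out) model with the unperturbed one can be pictured as comparing reaction fluxes one by one. The sketch below is not the logic of the pipeline's `knock_out_simulation.py`; the flux dictionaries, the relative threshold, and the function name `classify_flux_changes` are all hypothetical.

```python
from typing import Dict, List, Tuple


def classify_flux_changes(wild_type: Dict[str, float], knock_out: Dict[str, float],
                          threshold: float = 0.1,
                          eps: float = 1e-9) -> Tuple[List[str], List[str]]:
    """Toy comparison of perturbed vs. unperturbed fluxes (hypothetical helper).

    Returns (down, up): reaction IDs whose absolute flux ratio knock_out/wild_type
    falls below 1 - threshold or rises above 1 + threshold.
    Reactions with ~zero wild-type flux are skipped to avoid dividing by zero.
    """
    down, up = [], []
    for rxn, wt in wild_type.items():
        if abs(wt) < eps:
            continue
        ratio = abs(knock_out.get(rxn, 0.0)) / abs(wt)
        if ratio < 1 - threshold:
            down.append(rxn)
        elif ratio > 1 + threshold:
            up.append(rxn)
    return down, up
```

For example, with wild-type fluxes `{"R1": 10.0, "R2": 5.0}` and knock-out fluxes `{"R1": 2.0, "R2": 8.0}`, `R1` is classified as down and `R2` as up.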

Users need to create the input files for each step, upload them to the Docker container at `/root/pipelines/data/`, and specify them in this notebook. The original Docker image includes example input files for building metabolic models of naive, Th1, Th2, and Th17 subtypes and for identifying drug targets for rheumatoid arthritis. Follow the documentation and the format of these example files to create your own input files.
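Because every step reads its inputs from the data folder, it can help to confirm that the files are actually there before running anything. The sketch below is a hypothetical helper (`missing_inputs` is not part of the pipeline); it assumes the upload folder is `/root/pipelines/data/` as described above, which is presumably the folder `configs.datadir` points to.

```python
import os
from typing import Iterable, List


def missing_inputs(data_dir: str, filenames: Iterable[str]) -> List[str]:
    """Return the filenames that do not exist as regular files in data_dir (hypothetical helper)."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(data_dir, name))]


# Usage (assumes the Step 1 input filenames used in this notebook):
# print(missing_inputs('/root/pipelines/data/',
#                      ['transcriptomics_data_inputs.xlsx',
#                       'ProteomicsDataMatrix.xlsx',
#                       'proteomics_data_inputs.xlsx']))
```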

In [None]:
# import necessary python packages
import sys
import os
import pandas
import numpy
import json
from subprocess import call
from project import configs

# print root path of the project
print(configs.rootdir) 

## Step 1: Identifying gene activity by analyzing transcriptomics and proteomics datasets

*** Specify input files for step 1 here ***

If proteomics data are not available, use the dummy files:

`proteomics_data_file = 'dummy_proteomics_data.xlsx'`

`proteomics_config_file = 'dummy_proteomics_config.xlsx'`
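The choice between real and dummy proteomics files can also be automated. This is a minimal sketch, assuming that "not available" simply means either real file is missing from the data folder; `choose_proteomics_inputs` is a hypothetical helper, not part of the pipeline, and only the two dummy filenames come from this notebook.

```python
import os
from typing import Tuple

# Dummy filenames given in the instructions above
DUMMY_DATA = "dummy_proteomics_data.xlsx"
DUMMY_CONFIG = "dummy_proteomics_config.xlsx"


def choose_proteomics_inputs(data_dir: str, data_file: str,
                             config_file: str) -> Tuple[str, str]:
    """Return (data, config) filenames, falling back to the dummy pair if either real file is absent."""
    have_data = os.path.isfile(os.path.join(data_dir, data_file))
    have_config = os.path.isfile(os.path.join(data_dir, config_file))
    if have_data and have_config:
        return data_file, config_file
    return DUMMY_DATA, DUMMY_CONFIG
```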

In [None]:
# Specify input files for step 1

# config file for transcriptomics
transcriptomics_config_file = 'transcriptomics_data_inputs.xlsx' 

# data file for proteomics
proteomics_data_file = 'ProteomicsDataMatrix.xlsx' 

# config file for proteomics
proteomics_config_file = 'proteomics_data_inputs.xlsx' 

In [None]:
# Step 1.1 Download and analyze transcriptomics
cmd = ' '.join(['python3', 'transcriptomic_gen.py', 
      '-i', '"{}"'.format(transcriptomics_config_file)])
!{cmd}

In [None]:
# Step 1.2 Analyze proteomics
cmd = ' '.join(['python3', 'proteomics_gen.py', 
      '-d', '"{}"'.format(proteomics_data_file), 
      '-s', '"{}"'.format(proteomics_config_file)])
!{cmd}

In [None]:
# Step 1.3 Merge the gene lists of transcriptomics and proteomics, create a list of active gene IDs
cmd = ' '.join(['python3', 'merge_xomics.py', 
      '-t', '"{}"'.format(transcriptomics_config_file), 
      '-p', '"{}"'.format(proteomics_config_file)])
!{cmd}
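The cells above assemble each command as a single string, wrap the arguments in double quotes, and run it through IPython's `!` shell escape. An alternative sketch, not how the notebook itself does it, is to pass an argument list to `subprocess.run`, which needs no shell quoting at all; `shlex.join` (Python 3.8+) is used only to print a readable version of the command. The function names are hypothetical; the script name and flags are the ones used in Step 1.3.

```python
import shlex
import subprocess
from typing import List


def build_merge_command(transcriptomics_config: str, proteomics_config: str) -> List[str]:
    """Argument list equivalent to the Step 1.3 cell; no quoting needed because no shell is involved."""
    return ["python3", "merge_xomics.py",
            "-t", transcriptomics_config,
            "-p", proteomics_config]


def run_step(args: List[str]) -> None:
    """Echo the command, then run it; check=True raises CalledProcessError on a non-zero exit code."""
    print(shlex.join(args))
    subprocess.run(args, check=True)


# Usage inside the notebook (assumes the Step 1 input-file cell has been run):
# run_step(build_merge_command(transcriptomics_config_file, proteomics_config_file))
```

Raising on failure stops the notebook at the step that broke, whereas `!{cmd}` reports the error and lets the following cells run.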

## Step 2: Create tissue-specific or cell-type-specific models

In [None]:
# Load the output of step 1, which is a dictionary that specifies the merged list of active Gene IDs for each tissue

step1_results_file = os.path.join(configs.datadir, 'step1_results_files.json')
with open(step1_results_file) as json_file:
    tissue_gene_exp = json.load(json_file)
print(tissue_gene_exp)

*** Specify input files for step 2 here ***

In [None]:
# (input) filename of General Model, Recon3D_Teff_ver2
GeneralModelFile = 'GeneralModel.mat'

# (input) filename of Tissue Gene Expression
# genefile = 'merged_Th1.csv'

# (output) filename of Tissue Specific Model
# tissuefile = 'Th1_SpecificModel.mat'

In [None]:
# create tissue-specific models; the names of the output files are stored in the dictionary tissue_spec_model
tissue_spec_model = {}

for key,value in tissue_gene_exp.items():
    tissuefile = '{}_SpecificModel.json'.format(key)
    tissue_spec_model[key] = tissuefile
    tissue_gene_file = value.split('/')[-1]
    tissue_gene_folder = os.path.join(configs.datadir, key)
    os.makedirs(tissue_gene_folder, exist_ok=True)
    cmd = ' '.join(['python3', 'create_tissue_specific_model.py', 
                      '-m', '"{}"'.format(GeneralModelFile), 
                      '-g', '"{}"'.format(tissue_gene_file),
                      '-o', '"{}"'.format(tissuefile)])
    !{cmd}

print(tissue_spec_model)
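The loop above takes each value in `tissue_gene_exp` (the path of a tissue's merged gene file), keeps only the filename via `value.split('/')[-1]`, and names the output model `<tissue>_SpecificModel.json`. The same bookkeeping can be sketched with `os.path.basename`, the standard-library way to take the last path component. `plan_tissue_models` is a hypothetical helper, and the example path in the usage note is made up for illustration.

```python
import os
from typing import Dict, Tuple


def plan_tissue_models(tissue_gene_exp: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each tissue to (gene file name, output model file name), mirroring the Step 2 loop."""
    plan = {}
    for tissue, gene_path in tissue_gene_exp.items():
        gene_file = os.path.basename(gene_path)  # same as gene_path.split('/')[-1] on POSIX paths
        model_file = "{}_SpecificModel.json".format(tissue)
        plan[tissue] = (gene_file, model_file)
    return plan
```

For a hypothetical entry `{"Th1": "/root/pipelines/data/Th1/merged_Th1.csv"}` this returns `{"Th1": ("merged_Th1.csv", "Th1_SpecificModel.json")}`.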

## Step 3: Identifying disease-related genes by analyzing transcriptomics data of patients
This step performs a differential expression analysis.

Only one disease is analyzed per run; output files are written to the data folder.

*** Specify input files for step 3 here ***

In [None]:
# (input) filename of the disease transcriptomics data inputs
disease_gene_file = 'disease_transcriptomics_data_inputs.xlsx'

In [None]:
# Differential gene expression analysis
cmd = ' '.join(['python3', 'disease_analysis.py', 
              '-i', '"{}"'.format(disease_gene_file)])
!{cmd}

In [None]:
# load the results of step 3 into the dictionary 'disease_files'
step3_results_file = os.path.join(configs.datadir, 'step2_results_files.json')
with open(step3_results_file) as json_file:
    disease_files = json.load(json_file)
print(disease_files)
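The Step 4 cells read `disease_files['UP_Reg']` and `disease_files['DN_Reg']`, so a results file without those entries only fails much later. A minimal sketch of failing early instead, assuming the results file is plain JSON as loaded above; `load_disease_files` is a hypothetical helper, not part of the pipeline.

```python
import json
from typing import Dict

# Keys read by the Step 4 cells of this notebook
REQUIRED_KEYS = ("UP_Reg", "DN_Reg")


def load_disease_files(path: str) -> Dict[str, str]:
    """Load the Step 3 results JSON and raise KeyError if the up/down-regulated gene entries are missing."""
    with open(path) as json_file:
        disease_files = json.load(json_file)
    missing = [key for key in REQUIRED_KEYS if key not in disease_files]
    if missing:
        raise KeyError("Step 3 results lack keys: {}".format(missing))
    return disease_files
```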

## Step 4: Identification of drug targets and repurposable drugs
This step maps drug targets onto the metabolic models, performs knock-out simulations, compares the simulation results with disease genes, and identifies drug targets and repurposable drugs.

*** Specify input files for step 4 here ***

1. A processed drug-target file is included in `/root/pipelines/data/`. (Optional) For an updated version, download `Repurposing_Hub_export.txt` from the [Drug Repurposing Hub](https://clue.io/repurposing-app), remove all activators, agonists, and withdrawn drugs from the downloaded file, and then upload it to `/root/pipelines/data/`.

2. To use the automatically created tissue-specific models, run the following cells. Note: refined and validated models are recommended for further analysis; users can define customized models in the next substep.

In [None]:
# tissue specific models
tissue_spec_model

In [None]:
Disease_Down = disease_files['DN_Reg']
Disease_Up = disease_files['UP_Reg']
drug_raw_file = 'Repurposing_Hub_export.txt'
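Item 1 above asks for activators, agonists, and withdrawn drugs to be removed from a freshly downloaded `Repurposing_Hub_export.txt`. The pandas sketch below is one way to do that; it is not part of the pipeline, and the column names (`moa`, `clinical_phase`) and the `Withdrawn` label are assumptions about the export format, so check them against the file you actually download.

```python
import pandas as pd

# Assumption: the mechanism of action is in a 'moa' column and withdrawn drugs are
# labelled 'Withdrawn' in a 'clinical_phase' column. Adjust to the real export.
EXCLUDED_MOA = r"\b(?:activator|agonist)\b"


def filter_drug_targets(df: pd.DataFrame, moa_col: str = "moa",
                        phase_col: str = "clinical_phase") -> pd.DataFrame:
    """Drop activators, agonists and withdrawn drugs; keep all other rows."""
    # Word boundaries stop "agonist" from matching inside "antagonist"; missing MOA counts as no match.
    is_activating = df[moa_col].str.contains(EXCLUDED_MOA, case=False, regex=True, na=False)
    is_withdrawn = df[phase_col].astype(str).str.strip().str.lower() == "withdrawn"
    return df[~is_activating & ~is_withdrawn]


# Usage (assumes a tab-separated export in the data folder):
# raw = pd.read_csv('/root/pipelines/data/Repurposing_Hub_export.txt', sep='\t')
# filter_drug_targets(raw).to_csv('/root/pipelines/data/Repurposing_Hub_export.txt',
#                                 sep='\t', index=False)
```

The pattern is deliberately simple: a mechanism such as "plasminogen activator inhibitor" would also be dropped, so it is worth reviewing which rows were removed.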

3. To use customized models, specify `tissue_spec_model` manually, e.g. by uncommenting one of the `tissue_spec_model` definitions in the following cell.

In [None]:
# Manually specify up- and down-regulated genes for the disease. (Upload the manually created files to `/pipelines/data/` and use the filenames given below, or change them accordingly.)
# Disease_Down = 'Disease_DOWN.txt'
# Disease_Up = 'Disease_UP.txt'
# drug_raw_file = 'Repurposing_Hub_export.txt'

# Manually specify tissue-specific models fine-tuned by the user, changing the filenames accordingly. One or more models can be used here; simulation time increases with the number of models.
# tissue_spec_model = {'Th1':'Th1Model.mat',
#                      'Th2':'Th2Model.mat',
#                      'Th17':'Th17Model.mat',
#                      'Naive':'NaiveModel.mat'}

# Manually specify tissue-specific models created with the MATLAB COBRA Toolbox. For the example run, four models of CD4+ T cells (naive, Th1, Th2, and Th17) are provided; uncomment all of them or any specific model.
# tissue_spec_model = {'Th1':'Th1_SpecificModel_matlab.mat',
#                      'Th2':'Th2_SpecificModel_matlab.mat',
#                      'Th17':'Th17_SpecificModel_matlab.mat',
#                      'Naive':'Naive_SpecificModel_matlab.mat'}


In [None]:
# Knock-out simulation for each analyzed tissue
for key, value in tissue_spec_model.items():
    tissueSpecificModelfile = value
    tissue_gene_folder = os.path.join(configs.datadir, key)
    os.makedirs(tissue_gene_folder, exist_ok=True)
    inhibitors_file = '{}_inhibitors_Entrez.txt'.format(key)
    # quote every argument, as in the earlier steps, so filenames containing spaces survive the shell
    cmd = ' '.join(['python3', 'knock_out_simulation.py',
                  '-t', '"{}"'.format(tissueSpecificModelfile),
                  '-i', '"{}"'.format(inhibitors_file),
                  '-u', '"{}"'.format(Disease_Up),
                  '-d', '"{}"'.format(Disease_Down),
                  '-f', '"{}"'.format(key),
                  '-r', '"{}"'.format(drug_raw_file)])
    !{cmd}

    # copy the generated output folder to the output directory
    cmd = ' '.join(['cp', '-a', '"{}"'.format(os.path.join(configs.datadir, key)),
                    '"{}"'.format(configs.outputdir)])
    !{cmd}
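The loop above copies each tissue's result folder with `cp -a`. A pure-Python sketch of the same copy uses `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+), so re-running the loop merges into an existing folder instead of failing. `copy_tissue_output` is a hypothetical helper, and it is only roughly equivalent to `cp -a`, as the docstring notes.

```python
import os
import shutil


def copy_tissue_output(data_dir: str, output_dir: str, tissue: str) -> str:
    """Copy data_dir/<tissue> to output_dir/<tissue>, merging into an existing folder.

    Roughly what `cp -a <data_dir>/<tissue> <output_dir>` does when output_dir exists.
    copytree's default copy function (shutil.copy2) keeps timestamps and permission bits,
    but file ownership is not preserved, and symbolic links are followed unless
    symlinks=True is passed.
    """
    src = os.path.join(data_dir, tissue)
    dst = os.path.join(output_dir, tissue)
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Usage inside the loop (assumes configs.datadir and configs.outputdir as in this notebook):
# copy_tissue_output(configs.datadir, configs.outputdir, key)
```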
