## Instruction

This Jupyter notebook generates a list of candidate drugs for repurposing against a gene-related disease. The process has four steps:
1. Download and analyze transcriptomics and proteomics data, and output a list of active gene IDs.
2. Create tissue-specific models from the list of active gene IDs (optionally fine-tuned manually).
3. Run differential expression analysis for the disease genes.
4. Simulate knockouts, score genes, and generate the drug list.

Create the input files for each step, upload them to `/root/pipelines/data/` in the Docker container, and specify their names in this notebook. The original Docker image already contains example input files; follow the documentation and the format of those examples when creating your own.
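
Before running any step, it can help to confirm that the input files are actually present in the data directory. The following is a minimal sketch and not part of the pipeline: `find_missing_inputs` is a hypothetical helper, and it assumes the input files sit directly under the data directory (`configs.datadir` in this notebook).

```python
import os

def find_missing_inputs(data_dir, filenames):
    """Return the file names that do not exist directly under data_dir.

    Hypothetical helper; the flat directory layout is an assumption.
    """
    return [name for name in filenames
            if not os.path.isfile(os.path.join(data_dir, name))]

# Possible usage inside this notebook (names taken from the cells below):
# missing = find_missing_inputs(configs.datadir,
#                               [transcriptomics_config_file, proteomics_data_file])
# if missing:
#     print('Missing input files:', missing)
```

An empty list means every listed file was found.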

In [None]:
# import necessary python packages
import sys
import os
import pandas
import numpy
import json
from subprocess import call
from project import configs

# print root path of the project
print(configs.rootdir) 

## Step 1: Transcriptomics and Proteomics Generation

*** Specify input files for step 1 here ***

In [None]:
# Specify input files for step 1

# config file for transcriptomics
transcriptomics_config_file = 'GeneExpressionDataUsed.xlsx' 

# data file for proteomics
proteomics_data_file = 'ni.3693-S5.xlsx' 

# config file for proteomics
proteomics_supplementary_data_file = 'Supplementary Data 1.xlsx' 

In [None]:
# Step 1.1 Download and analyze transcriptomics
cmd = ' '.join(['python3', 'transcriptomic_gen.py', 
      '-i', '"{}"'.format(transcriptomics_config_file)])
!{cmd}

In [None]:
# Step 1.2 Analyze proteomics
cmd = ' '.join(['python3', 'proteomics_gen.py', 
      '-d', '"{}"'.format(proteomics_data_file), 
      '-s', '"{}"'.format(proteomics_supplementary_data_file)])
!{cmd}

In [None]:
# Step 1.3 Merge the gene lists of transcriptomics and proteomics, create a list of active gene IDs
cmd = ' '.join(['python3', 'merge_xomics.py', 
      '-t', '"{}"'.format(transcriptomics_config_file), 
      '-p', '"{}"'.format(proteomics_supplementary_data_file)])
!{cmd}
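
The cells above assemble each shell command by wrapping file names in double quotes by hand. As a hedged alternative, the sketch below uses `shlex.quote` from the Python standard library, which also copes with file names that contain quote characters. `build_command` is a hypothetical helper and not part of the pipeline scripts.

```python
import shlex

def build_command(script, options):
    """Build a shell command line for `python3 <script>` with quoted option values.

    options is a sequence of (flag, value) pairs, e.g. [('-i', 'file.xlsx')].
    Hypothetical helper for illustration only.
    """
    parts = ['python3', script]
    for flag, value in options:
        # shlex.quote adds quoting only when the value needs it
        parts.extend([flag, shlex.quote(value)])
    return ' '.join(parts)

cmd = build_command('proteomics_gen.py',
                    [('-d', 'ni.3693-S5.xlsx'),
                     ('-s', 'Supplementary Data 1.xlsx')])
print(cmd)
```

The resulting string can be passed to `!{cmd}` exactly as in the cells above.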

## Step 2: Create Tissue Specific Model

In [None]:
# Load the output of step 1, which is a dictionary that specifies the merged list of active Gene IDs for each tissue

step1_results_file = os.path.join(configs.datadir, 'step1_results_files.json')
with open(step1_results_file) as json_file:
    tissue_gene_exp = json.load(json_file)
print(tissue_gene_exp)

*** Specify input files for step 2 here ***

In [None]:
# (input) filename of General Model, Recon3D_Teff_ver2
GeneralModelFile = 'GeneralModel.mat'

# (input) filename of Tissue Gene Expression
# genefile = 'merged_Th1.csv'

# (output) filename of Tissue Specific Model
# tissuefile = 'Th1_SpecificModel.mat'

In [None]:
# create tissue specific model, the names of output files are stored in dictionary tissue_spec_model
tissue_spec_model = {}

for key,value in tissue_gene_exp.items():
    tissuefile = '{}_SpecificModel.json'.format(key)
    tissue_spec_model[key] = tissuefile
    tissue_gene_file = os.path.basename(value)  # file name without its directory
    tissue_gene_folder = os.path.join(configs.datadir, key)
    os.makedirs(tissue_gene_folder, exist_ok=True)
    cmd = ' '.join(['python3', 'create_tissue_specific_model.py', 
                      '-m', '"{}"'.format(GeneralModelFile), 
                      '-g', '"{}"'.format(tissue_gene_file),
                      '-o', '"{}"'.format(tissuefile)])
    !{cmd}

print(tissue_spec_model)
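
The loop above derives two names per tissue: the base name of the merged gene file and the name of the output model file. As a sketch, the same naming logic can be written as a pure function and inspected before any model is built. `plan_tissue_models` is a hypothetical helper, and the example dictionary is made up for illustration rather than real step 1 output.

```python
import os

def plan_tissue_models(tissue_gene_exp):
    """Map each tissue to (gene file base name, output model file name)."""
    plan = {}
    for tissue, gene_path in tissue_gene_exp.items():
        gene_file = os.path.basename(gene_path)
        model_file = '{}_SpecificModel.json'.format(tissue)
        plan[tissue] = (gene_file, model_file)
    return plan

# Hypothetical step 1 result, for illustration only
example = {'Th1': '/root/pipelines/data/merged_Th1.csv'}
print(plan_tissue_models(example))
```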

## Step 3: Disease Gene Analysis
Differential Expression Analysis

This step analyzes a single disease; the output files are written to the data folder.

*** Specify input files for step 3 here ***

In [None]:
# input file name of the disease gene data
disease_gene_file = 'GSE56649_RA.xlsx'

In [None]:
# Differential expression analysis
cmd = ' '.join(['python3', 'disease_analysis.py', 
              '-i', '"{}"'.format(disease_gene_file)])
!{cmd}

In [None]:
# load the results of step 3 into the dictionary 'disease_files'
# (the file name below says 'step2', but it is read here as the step 3 output)
step3_results_file = os.path.join(configs.datadir, 'step2_results_files.json')
with open(step3_results_file) as json_file:
    disease_files = json.load(json_file)
print(disease_files)
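
Step 4 reads the `'DN_Reg'` and `'UP_Reg'` entries from this dictionary. The sketch below shows one way to fail early with a clear message if either entry is missing. `get_regulated_gene_files` is a hypothetical helper, and the file names in the example are placeholders.

```python
def get_regulated_gene_files(disease_files):
    """Return (down_file, up_file) from the step 3 results dictionary.

    Raises KeyError naming the missing keys. Hypothetical helper.
    """
    missing = [k for k in ('DN_Reg', 'UP_Reg') if k not in disease_files]
    if missing:
        raise KeyError('step 3 results lack keys: {}'.format(', '.join(missing)))
    return disease_files['DN_Reg'], disease_files['UP_Reg']

# Placeholder values, for illustration only
down_file, up_file = get_regulated_gene_files(
    {'DN_Reg': 'RA_DOWN.txt', 'UP_Reg': 'RA_UP.txt'})
print(down_file, up_file)
```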

## Step 4: Knock Out Simulation
Knockout simulation and generation of the repurposing drug list.

*** Specify input files for step 4 here ***

1. Download `Repurposing_Hub_export.txt` from the [Drug Repurposing Hub](https://clue.io/repurposing-app) to `/root/pipelines/data/`.

2. To use the automatically created tissue specific models, run the next two cells.

In [None]:
# tissue specific models
tissue_spec_model

In [None]:
RA_Down = disease_files['DN_Reg']
RA_Up = disease_files['UP_Reg']
drug_raw_file = 'Repurposing_Hub_export.txt'

3. To use customized models instead, specify `tissue_spec_model` manually, e.g. by uncommenting one of the `tissue_spec_model` definitions in the following cell.

In [None]:
# Manually specify Up and Down Regulated Genes for Disease
# RA_Down = 'RA_DOWN.txt'
# RA_Up = 'RA_UP.txt'
# drug_raw_file = 'Repurposing_Hub_export.txt'

# Manually specify tissue specific models fine-tuned by user
# tissue_spec_model = {'Th1':'Th1_Cell_SpecificModel4manuscript.mat',
#                      'Th2':'Th2_Cell_SpecificModel4manuscript.mat',
#                      'Th17':'Th17_Cell_SpecificModel4manuscript.mat',
#                      'Naive':'Naive_Cell_SpecificModel4manuscript.mat'}

# Manually specify tissue specific model created by matlab cobratoolbox
# tissue_spec_model = {'Th1':'Th1_SpecificModel_matlab.mat',
#                      'Th2':'Th2_SpecificModel_matlab.mat',
#                      'Th17':'Th17_SpecificModel_matlab.mat',
#                      'Naive':'Naive_SpecificModel_matlab.mat',
#                      'Treg':'Treg_SpecificModel_matlab.mat'}


In [None]:
# Knock out simulation for the analyzed tissues
for key, value in tissue_spec_model.items():
    tissueSpecificModelfile = value
    # make sure the per-tissue data folder exists
    tissue_gene_folder = os.path.join(configs.datadir, key)
    os.makedirs(tissue_gene_folder, exist_ok=True)
    inhibitors_file = '{}_inhibitors_Entrez.txt'.format(key)
    # quote every argument, as in the earlier steps, so names with spaces survive the shell
    cmd = ' '.join(['python3', 'knock_out_simulation.py',
                  '-t', '"{}"'.format(tissueSpecificModelfile),
                  '-i', '"{}"'.format(inhibitors_file),
                  '-u', '"{}"'.format(RA_Up),
                  '-d', '"{}"'.format(RA_Down),
                  '-f', '"{}"'.format(key),
                  '-r', '"{}"'.format(drug_raw_file)])
    !{cmd}

    # copy generated output to output folder
    cmd = ' '.join(['cp', '-a', '"{}"'.format(tissue_gene_folder), '"{}"'.format(configs.outputdir)])
    !{cmd}
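
The copy step above shells out to `cp -a`. As a hedged alternative that stays in Python, the sketch below uses `shutil.copytree` with `dirs_exist_ok=True` (available from Python 3.8). It approximates `cp -a` rather than matching it exactly: timestamps and permission bits are preserved where possible, but ownership is not. `copy_tissue_outputs` is a hypothetical helper.

```python
import os
import shutil

def copy_tissue_outputs(data_dir, output_dir, tissue):
    """Copy data_dir/tissue into output_dir/tissue, merging with existing files."""
    src = os.path.join(data_dir, tissue)
    dst = os.path.join(output_dir, tissue)
    # dirs_exist_ok=True lets repeated runs overwrite earlier copies (Python 3.8+)
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst

# Possible usage inside the loop above:
# copy_tissue_outputs(configs.datadir, configs.outputdir, key)
```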
