[View in Colaboratory](https://colab.research.google.com/github/djinnome/jmvae/blob/master/MOMA.ipynb)

In [2]:
!git clone https://github.com/djinnome/MOMA.git

Cloning into 'MOMA'...
remote: Counting objects: 87, done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 87 (delta 43), reused 63 (delta 23), pack-reused 0
Unpacking objects: 100% (87/87), done.


# MOMA

## Steps to run MOMA
### Step 1: Install software requirements

* R 3.4.0 or above
* Python 3.6.3 or above
* Python TensorFlow package
* Python numpy package
* Python pandas package

Tip: once TensorFlow is installed, you can install the other required Python packages with
<code>pip install -r requirements.txt</code>.
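
Because `pip` and `python3` can belong to different interpreters, it may help to confirm that the interpreter that will run MOMA can actually import the packages. The following is a minimal sketch; the helper names `missing_packages` and `python_is_recent_enough` are hypothetical and not part of MOMA.

```python
import importlib.util
import sys

# Package names and minimum Python version taken from the requirements above.
REQUIRED_PACKAGES = ["tensorflow", "numpy", "pandas"]
MIN_PYTHON = (3, 6, 3)


def missing_packages(names):
    """Return the names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=MIN_PYTHON):
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info[:3] >= minimum


print("Python OK:", python_is_recent_enough())
print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```

Run it with the same `python3` that is used for `run_moma.py`; an empty list means all three packages were found.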


In [3]:
!cd MOMA && pip install -r requirements.txt

Collecting numpy==1.14.2 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/76/4d/418dda252cf92bad00ab82d6b2a856e7843b47a5c2f084aed34b14b67d64/numpy-1.14.2-cp27-cp27mu-manylinux1_x86_64.whl (12.1MB)
    100% |████████████████████████████████| 12.1MB 1.8MB/s 
Collecting pandas==0.20.3 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/77/61/222973b3373d127386124ce715dc9680111b74f2255d13384fcc4a6ff463/pandas-0.20.3-cp27-cp27mu-manylinux1_x86_64.whl (22.4MB)
    100% |████████████████████████████████| 22.4MB 1.2MB/s 
Installing collected packages: numpy, pandas
  Found existing installation: numpy 1.14.5
    Uninstalling numpy-1.14.5:
      Successfully uninstalled numpy-1.14.5
  Found existing installation: pandas 0.22.0
    Uninstalling pandas-0.22.0:
      Successfully uninstalled pandas-0.22.0
Successfully installed numpy-1.14.2 pandas-0.20.3



### Step 2: Download the dataset
Download the Ecomics transcriptome data from [here](https://www.dropbox.com/sh/t3zs3jbmq1efj3q/AAATQNlJimWT1bnTI9uK81S9a?dl=0) and place it in the <code>Dataset</code> folder.


In [4]:
!cd MOMA/Dataset && wget -c https://www.dropbox.com/sh/t3zs3jbmq1efj3q/AAD4KYrRuLfPFJBQXvbA1yuDa/Ecomics.transcriptome.no_avg.v8.txt

--2018-07-13 18:58:17--  https://www.dropbox.com/sh/t3zs3jbmq1efj3q/AAD4KYrRuLfPFJBQXvbA1yuDa/Ecomics.transcriptome.no_avg.v8.txt
Resolving www.dropbox.com (www.dropbox.com)... 162.125.9.1, 2620:100:601f:1::a27d:901
Connecting to www.dropbox.com (www.dropbox.com)|162.125.9.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /sh/raw/t3zs3jbmq1efj3q/AAD4KYrRuLfPFJBQXvbA1yuDa/Ecomics.transcriptome.no_avg.v8.txt [following]
--2018-07-13 18:58:17--  https://www.dropbox.com/sh/raw/t3zs3jbmq1efj3q/AAD4KYrRuLfPFJBQXvbA1yuDa/Ecomics.transcriptome.no_avg.v8.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc601df3b677309595e0dfbb3289.dl.dropboxusercontent.com/cd/0/inline/ALRXygVLFesfgb-igq0EOAm4fCSVHnuoML62Dlhsp2OQivJxRSIRIDjLrvw40-TyiFMdyGoTxGJYQ9o3_lYRAryLhcWDssEKOH5NVIp4LOF56jtiKasjet4vGOAwingWqd55cCiILFgjxeAaqnnfLLKQDmM933KUmaw1T8lfftXeld8szR45Wv-SpJdQTe4uWkg/file [following]
-

### Step 3: Preprocess the dataset
This step converts the original dataset into a format that MOMA can train on. To run it, type
```Rscript preprocess_dataset.R Dataset/Ecomics.transcriptome.no_avg.v8.txt Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt```

Note that the script reads metadata from <code>Dataset/Meta.txt</code>, <code>Dataset/Meta.Medium.txt</code>, and <code>Dataset/Meta.Strain.txt</code>. It saves the preprocessed dataset to <code>Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt</code>.
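
Since the preprocessing step depends on several input files, a quick check that they are all present can save a failed run. This is a minimal sketch: the file names come from the text above, while the helper `missing_inputs` and the flat directory layout are assumptions.

```python
import os
import tempfile

# File names taken from the text above; they are assumed to sit directly
# inside the repository's Dataset folder.
PREPROCESS_INPUTS = [
    "Ecomics.transcriptome.no_avg.v8.txt",
    "Meta.txt",
    "Meta.Medium.txt",
    "Meta.Strain.txt",
]


def missing_inputs(dataset_dir, names=PREPROCESS_INPUTS):
    """Return the input files that are not present in dataset_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(dataset_dir, n))]


# Demonstration on a throwaway directory that contains only Meta.txt.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "Meta.txt"), "w").close()
    print(missing_inputs(demo_dir))
```

In the notebook, the call would be `missing_inputs('MOMA/Dataset')`.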


In [17]:
!cd MOMA/Dataset && curl ftp://ftp.broadinstitute.org/outgoing/biocyc/Ecomics.transcriptome_with_meta.avg.v8.txt.bz2 > Ecomics.transcriptome_with_meta.avg.v8.txt.bz2

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.5M  100 13.5M    0     0  4632k      0  0:00:03  0:00:03 --:--:-- 4589k


In [0]:
!cd MOMA/Dataset && bunzip2 Ecomics.transcriptome_with_meta.avg.v8.txt.bz2


### Step 4: Run MOMA
You can then run MOMA, which predicts the transcriptomic response from the characteristics of an experimental condition, with

```python3 run_moma.py Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt Dataset/GRN.txt OPTIMIZATION_METHOD CONDITION_INDEX_TO_TEST```

* <code>Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt</code> is the dataset to be used for leave-one-condition-out cross-validation.
* <code>Dataset/GRN.txt</code> is the list of gene-regulatory relations (the gene-regulatory network). This information is used to constrain the recurrent weight matrix: weights on connections between genes that are not in the gene-regulatory network are forced to zero.
* <code>OPTIMIZATION_METHOD</code> can be SGD or RMSProp. RMSProp is recommended because it speeds up model training.
* <code>CONDITION_INDEX_TO_TEST</code> is the index of the condition to test. It is a row index from 0 to 492, since <code>Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt</code> contains 493 conditions (rows). The model is built on the remaining conditions and its prediction is tested on this one (leave-one-condition-out cross-validation; refer to [Kim et al., Nature Communications 2016](https://www.nature.com/articles/ncomms13090) for more information).
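
The gene-regulatory constraint described above can be illustrated with a binary mask applied to a weight matrix. This is a simplified sketch, not MOMA's implementation: the gene names, edges, and random weights are invented, and how `run_moma.py` maps genes to recurrent units is not shown in this notebook.

```python
import numpy as np

# Toy gene list and regulatory edges (regulator, target); both are invented
# for illustration and are not taken from Dataset/GRN.txt.
genes = ["geneA", "geneB", "geneC"]
edges = [("geneA", "geneB"), ("geneB", "geneC")]

index = {gene: i for i, gene in enumerate(genes)}

# mask[i, j] is 1 where a connection i -> j is in the network, else 0.
mask = np.zeros((len(genes), len(genes)))
for regulator, target in edges:
    mask[index[regulator], index[target]] = 1.0

# Stand-in for a learned recurrent weight matrix.
weights = np.random.RandomState(0).randn(len(genes), len(genes))

# The element-wise product zeroes every weight outside the network.
constrained = weights * mask
print(constrained)
```

Only the two entries that correspond to listed edges can remain nonzero; every other connection is removed.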

Please note the following before running the model:
* Some test conditions will not produce prediction results because they are not cross-validatable (for example, the strain of the test condition is JM109 but this strain does not appear in the training data).
* The prediction results are printed to the console as a Pearson correlation coefficient (PCC) between the predicted and known expression levels for the test condition, alongside the wildtype baseline (the PCC between the mean expression levels of wildtype profiles and the known expression levels for the test condition).
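
To make the two reported numbers concrete, here is a small sketch of the PCC metric and the wildtype baseline on a toy matrix. The values, the wildtype flags, and the placeholder prediction are invented. Computing the baseline from the training conditions only is also an assumption; `run_moma.py` may define it differently.

```python
import numpy as np

# Toy expression matrix: 4 conditions (rows) x 5 genes (columns).
# Values and wildtype flags are invented for illustration.
expression = np.array([
    [0.1, 0.4, 0.3, 0.9, 0.5],
    [0.2, 0.5, 0.2, 0.8, 0.6],
    [0.9, 0.1, 0.7, 0.2, 0.3],
    [0.3, 0.6, 0.1, 0.7, 0.4],
])
is_wildtype = np.array([True, True, False, True])


def pcc(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def wildtype_baseline(expression, is_wildtype, test_index):
    """PCC between the mean wildtype training profile and the held-out profile."""
    train = np.ones(len(expression), dtype=bool)
    train[test_index] = False  # leave the test condition out
    wt_mean = expression[train & is_wildtype].mean(axis=0)
    return pcc(wt_mean, expression[test_index])


test_index = 2
predicted = np.array([0.8, 0.2, 0.6, 0.3, 0.3])  # placeholder model output
print("PCC:", pcc(predicted, expression[test_index]))
print("WT Baseline:", wildtype_baseline(expression, is_wildtype, test_index))
```

A model is only useful for a condition when its PCC exceeds the wildtype baseline for that condition.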


In [25]:
!cd MOMA && python3 run_moma.py Dataset/Ecomics.transcriptome_with_meta.avg.v8.txt Dataset/GRN.txt SGD 455



Using TensorFlow backend.
[DATA READ] 493 samples, 4 timesteps, 521 features
[MODEL CREATED]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn_1 (SimpleRNN)     (None, 4096)              18915328  
Total params: 18,915,328
Trainable params: 18,915,328
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
[MODEL EVALUATION 455] MG1655.MD103.O2-starvation.na_WT PCC: -0.011275226568870889, WT Baseline: 0.34458203113416247
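
When running many test conditions, it can be convenient to collect the result lines programmatically. The sketch below parses the line printed above; the regular expression is written against this single example, so other runs may need a different pattern, and `parse_result` is a hypothetical helper.

```python
import re

# Pattern written against the single result line shown above; it assumes the
# condition name contains no spaces.
RESULT_PATTERN = re.compile(
    r"\[MODEL EVALUATION (?P<index>\d+)\] (?P<cond>\S+) "
    r"PCC: (?P<pcc>[-+0-9.eE]+), WT Baseline: (?P<baseline>[-+0-9.eE]+)"
)


def parse_result(line):
    """Return a dict with index, condition, PCC and baseline, or None."""
    match = RESULT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "index": int(match.group("index")),
        "condition": match.group("cond"),
        "pcc": float(match.group("pcc")),
        "wt_baseline": float(match.group("baseline")),
    }


line = ("[MODEL EVALUATION 455] MG1655.MD103.O2-starvation.na_WT "
        "PCC: -0.011275226568870889, WT Baseline: 0.34458203113416247")
result = parse_result(line)
print(result)
```

For this run, the model's PCC (about -0.01) is below the wildtype baseline (about 0.34).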


In [23]:
!ls -l MOMA/Dataset

total 301712
-rw-r--r-- 1 root root 270852299 Jul 13 18:58 Ecomics.transcriptome.no_avg.v8.txt
-rw-r--r-- 1 root root  37080542 Jul 13 19:33 Ecomics.transcriptome_with_meta.avg.v8.txt
-rw-r--r-- 1 root root    157143 Jul 13 18:54 GRN.txt
-rw-r--r-- 1 root root     39881 Jul 13 18:54 Meta.Medium.txt
-rw-r--r-- 1 root root     36321 Jul 13 18:54 Meta.Strain.txt
-rw-r--r-- 1 root root    770436 Jul 13 18:54 Meta.txt


In [38]:
import pandas as pd

# Load the raw (non-averaged) transcriptome table.
transcriptome = pd.read_table('MOMA/Dataset/Ecomics.transcriptome.no_avg.v8.txt')
# 'Cond' has the form Strain.Medium.Stress.GP; split it once and reuse the parts.
cond_parts = transcriptome['Cond'].str.split('.')
transcriptome['Strain'] = cond_parts.str.get(0)
transcriptome['Medium'] = cond_parts.str.get(1)
transcriptome['Stress'] = cond_parts.str.get(2)
transcriptome['GP'] = cond_parts.str.get(3)
transcriptome.set_index(['Strain','GP','Medium','Stress','ID','Cond'])




Strain,GP,Medium,Stress,ID,Cond,m.b4412,m.b1994,m.b2861,m.b4428,m.b0264,m.b1545,m.b4421,m.b4453,m.b4409,m.b4410,...,m.b2133,m.b2779,m.b0118,m.b0688,m.b1241,m.b1276,m.b2029,m.b3236,m.b0114,m.b2417
MG1655,na_WT,MD026,RP-overexpress,T0568,MG1655.MD026.RP-overexpress.na_WT,0.085563,0.130231,0.198816,0.087420,0.115444,0.078182,0.072614,0.085038,0.095487,0.070092,...,0.120548,1.000711,0.329642,0.065464,0.369252,0.105639,0.833678,0.560209,0.542496,0.872275
MG1655,na_WT,MD026,RP-overexpress,T0569,MG1655.MD026.RP-overexpress.na_WT,0.085563,0.130231,0.198816,0.087420,0.115444,0.078182,0.072614,0.085038,0.095487,0.070092,...,0.144902,1.017368,0.338428,0.054674,0.374115,0.107077,0.847071,0.569329,0.551248,0.887193
MG1655,na_WT,MD026,RP-overexpress,T0570,MG1655.MD026.RP-overexpress.na_WT,0.085563,0.130231,0.198816,0.087420,0.115444,0.078182,0.072614,0.085038,0.095487,0.070092,...,0.151406,1.031590,0.345845,0.041630,0.382057,0.114552,0.857121,0.579380,0.561100,0.900418
DH5alpha,na_WT,MD064,none,T1297,DH5alpha.MD064.none.na_WT,0.039532,0.130231,0.198816,0.062236,0.115444,0.078182,0.044316,0.030313,0.066920,0.048741,...,0.164769,1.011920,0.360810,0.225402,0.396012,0.123063,0.852014,0.583266,0.566177,0.887618
CG2,na_WT,MD064,none,T1301,CG2.MD064.none.na_WT,0.058716,0.130231,0.198816,0.061663,0.115444,0.078182,0.069920,0.057086,0.063326,0.057184,...,0.064437,1.620909,0.271331,0.185681,0.446123,0.097883,0.152928,0.779862,0.749163,1.369439
CG2,na_WT,MD064,none,T1305,CG2.MD064.none.na_WT,0.020921,0.130231,0.198816,0.046549,0.115444,0.078182,0.028799,0.028547,0.029025,0.034223,...,0.148782,0.998718,0.343995,0.204845,0.379294,0.104296,0.836800,0.567952,0.550803,0.873482
W3110,hns(delete)93-1_VAR,MD001,Indole,T1315,W3110.MD001.Indole.hns(delete)93-1_VAR,0.067421,0.130231,0.198816,0.052340,0.115444,0.078182,0.054862,0.046723,0.046673,0.058324,...,0.150606,0.274574,0.065469,0.260221,0.266876,0.179273,0.060648,0.290093,0.179846,0.048546
MG1655,r+m+_RM,MD018,none,T2781,MG1655.MD018.none.r+m+_RM,0.085563,0.130231,0.198816,0.087420,0.115444,0.078182,0.072614,0.085038,0.095487,0.070092,...,0.235772,0.247652,0.136907,0.310121,0.670411,0.140452,0.327610,0.057386,0.305324,0.052487
MG1655,r+m+_RM,MD018,none,T2782,MG1655.MD018.none.r+m+_RM,0.085563,0.130231,0.198816,0.087420,0.115444,0.078182,0.072614,0.085038,0.095487,0.070092,...,0.219810,0.210352,0.059361,0.310995,0.604744,0.134962,0.383991,0.166837,0.370596,0.049199
MG1655,r-m+_RM,MD018,none,T2783,MG1655.MD018.none.r-m+_RM,0.098442,0.130231,0.198816,0.080292,0.115444,0.078182,0.079884,0.105426,0.081951,0.080055,...,0.226585,0.121957,0.317395,0.307791,0.183950,0.114442,0.367570,0.357599,0.341868,0.081346
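
Several rows in the raw table share the same `Cond` (for example T0568 to T0570), and the file names (`no_avg` versus `avg`) suggest that the preprocessed file holds one averaged profile per condition. The sketch below shows that kind of averaging on invented data; it is an assumption about the preprocessing, and `preprocess_dataset.R` may do more than this.

```python
import pandas as pd

# Toy stand-in for the raw table: two genes, replicate profiles per condition.
# IDs and gene columns are invented; condition names follow the pattern above.
raw = pd.DataFrame({
    "ID": ["T0001", "T0002", "T0003"],
    "Cond": ["MG1655.MD026.RP-overexpress.na_WT",
             "MG1655.MD026.RP-overexpress.na_WT",
             "CG2.MD064.none.na_WT"],
    "m.b0001": [0.10, 0.30, 0.50],
    "m.b0002": [1.00, 2.00, 4.00],
})

# Average all gene columns within each condition.
gene_columns = [c for c in raw.columns if c.startswith("m.")]
averaged = raw.groupby("Cond")[gene_columns].mean()
print(averaged)
```

The result has one row per condition, indexed by `Cond`.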


In [28]:
!cd MOMA/Dataset && wget -c https://www.dropbox.com/s/nh4d2iuj2tgjhkf/ecomics.proteome.v5.csv

--2018-07-13 20:47:18--  https://www.dropbox.com/s/nh4d2iuj2tgjhkf/ecomics.proteome.v5.csv
Resolving www.dropbox.com (www.dropbox.com)... 162.125.82.1, 2620:100:6032:1::a27d:5201
Connecting to www.dropbox.com (www.dropbox.com)|162.125.82.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/nh4d2iuj2tgjhkf/ecomics.proteome.v5.csv [following]
--2018-07-13 20:47:19--  https://www.dropbox.com/s/raw/nh4d2iuj2tgjhkf/ecomics.proteome.v5.csv
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc796f937a702ac09400b82a89e8.dl.dropboxusercontent.com/cd/0/inline/ALTz2XNoErvmy-uAzntk66ZfTPlTi-0O9-Zss5Q4Cds1N08RHJF_Ez2GaB6zDcH9Q8BM3QrYhkB2lxyfVro2yd4qPPIqj9aSz71d-JQs_50hVWcRH2FON1GxKTgdHTXgzZBLsvgZu1IFTg35JmUK0NH0PyoeamJLxYTvIHxSP-L7c2KEQ4xzaFp9-hK-UZ1hgn4/file [following]
--2018-07-13 20:47:20--  https://uc796f937a702ac09400b82a89e8.dl.dropboxusercontent.com/cd/0/inline/ALTz2XNoErvmy-u

In [31]:
!ls -l MOMA/Dataset


total 301832
-rw-r--r-- 1 root root    115705 Jul 13 20:47 ecomics.proteome.v5.csv
-rw-r--r-- 1 root root 270852299 Jul 13 18:58 Ecomics.transcriptome.no_avg.v8.txt
-rw-r--r-- 1 root root  37080542 Jul 13 19:33 Ecomics.transcriptome_with_meta.avg.v8.txt
-rw-r--r-- 1 root root    157143 Jul 13 18:54 GRN.txt
-rw-r--r-- 1 root root     39881 Jul 13 18:54 Meta.Medium.txt
-rw-r--r-- 1 root root     36321 Jul 13 18:54 Meta.Strain.txt
-rw-r--r-- 1 root root    770436 Jul 13 18:54 Meta.txt


In [30]:
import pandas as pd
proteomics = pd.read_csv('MOMA/Dataset/ecomics.proteome.v5.csv',index_col=['Strain','MediumID','Medium','Stress','GP'])
proteomics

Strain,MediumID,Medium,Stress,GP,m.b0002,m.b0003,m.b0004,m.b0008,m.b0014,m.b0023,m.b0026,m.b0031,m.b0032,m.b0033,...,m.b4254,m.b4258,m.b4260,m.b4375,m.b4376,m.b4381,m.b4383,m.b4384,m.b4391,m.b4401
N3433,MD004,MOPS+Glu(0.4%),none,none,1.0,0.0,1.0,0.451,0.318,0.402,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.047,0.75,1.0,1.0,0.103,0.148,0.0
MG1655,MD002,M9+Gly(40%),none,none,0.415,0.551,0.283,0.24,0.094,0.242,0.398,0.427,0.77,0.667,...,0.404,0.678,0.325,0.165,0.348,0.775,0.397,0.329,0.775,0.75
MG1655,MD003,M9+Lac(40%),none,none,0.215,0.335,0.281,0.235,0.112,0.698,0.063,0.193,0.341,0.504,...,0.172,0.378,0.043,0.002,0.258,0.483,0.767,0.122,0.171,0.178
MG1655,MD001,M9+Glu(40%),none,none,0.422,0.602,0.217,0.395,1.0,0.62,0.539,0.556,0.983,0.835,...,0.65,0.86,0.361,0.216,1.0,0.917,0.101,0.382,0.776,0.819
MG1655,MD001,M9+Glu(40%),butanol,none,0.144,0.268,0.239,0.255,0.167,0.2,0.202,0.239,0.423,0.359,...,0.272,0.37,0.149,0.077,0.197,0.499,0.074,0.112,0.334,0.352
BW25113,MD066,synthetic+Glu,none,b0756(KO),0.206,0.31,0.218,0.06,0.119,0.293,0.257,0.309,0.546,0.464,...,0.275,0.478,0.196,0.174,0.279,0.439,0.253,0.136,0.432,0.455
BW25113,MD066,synthetic+Glu,none,b2388(KO),0.243,0.353,0.248,0.12,0.141,0.35,0.293,0.352,0.622,0.529,...,0.313,0.544,0.225,0.202,0.334,0.524,0.302,0.162,0.491,0.518
BW25113,MD066,synthetic+Glu,none,b0688(KO),0.165,0.264,0.185,0.052,0.095,0.231,0.219,0.263,0.464,0.395,...,0.234,0.406,0.165,0.144,0.221,0.347,0.2,0.107,0.367,0.387
BW25113,MD066,synthetic+Glu,none,b4025(KO),0.08,0.166,0.117,0.067,0.045,0.102,0.138,0.165,0.293,0.249,...,0.147,0.256,0.1,0.08,0.098,0.153,0.088,0.048,0.231,0.244
BW25113,MD066,synthetic+Glu,none,b3916(KO),0.012,0.089,0.062,0.034,0.005,0.0,0.073,0.088,0.156,0.133,...,0.079,0.137,0.049,0.029,0.0,0.0,0.0,0.0,0.123,0.13
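
Because the proteome table is indexed by five levels, rows can be selected by level name with `DataFrame.xs`. The following sketch uses a small invented frame with the same index levels as the table above, so it runs without the downloaded file.

```python
import pandas as pd

# Toy stand-in with the same index levels as the proteome table; values invented.
toy = pd.DataFrame({
    "Strain": ["MG1655", "MG1655", "BW25113"],
    "MediumID": ["MD001", "MD001", "MD066"],
    "Medium": ["M9+Glu(40%)", "M9+Glu(40%)", "synthetic+Glu"],
    "Stress": ["none", "butanol", "none"],
    "GP": ["none", "none", "b0756(KO)"],
    "m.b0002": [0.4, 0.1, 0.2],
}).set_index(["Strain", "MediumID", "Medium", "Stress", "GP"])

# All rows for one strain; the selected level is dropped from the result.
mg1655 = toy.xs("MG1655", level="Strain")

# Rows matching two levels at once.
butanol = toy.xs(("MG1655", "butanol"), level=["Strain", "Stress"])
print(mg1655)
print(butanol)
```

The same calls should work on `proteomics` as loaded above, for example `proteomics.xs('MG1655', level='Strain')`.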
