# Preparing transcriptomic integration datasets for deep learning

#### Because transcriptomic integration with iMAT is sensitive to the quantile, threshold and epsilon parameters, a range of parameter settings needs to be explored and their impact on prediction accuracy evaluated.

Different Settings:
1. UQ:10%, LQ:90%, epsilon:10, threshold:1
2. UQ:20%, LQ:80%, epsilon:10, threshold:1
3. UQ:40%, LQ:60%, epsilon:10, threshold:1
<br>
<br>
4. UQ:10%, LQ:90%, epsilon:20, threshold:10
5. UQ:20%, LQ:80%, epsilon:20, threshold:10
6. UQ:40%, LQ:60%, epsilon:20, threshold:10
<br>
<br>
7. UQ:10%, LQ:90%, epsilon:40, threshold:20
8. UQ:20%, LQ:80%, epsilon:40, threshold:20
9. UQ:40%, LQ:60%, epsilon:40, threshold:20


Downloading and reading in the model

In [1]:
from pyGSLModel import download_GSL_model

model = download_GSL_model()

print(f"Number of Reactions in model : {len(model.reactions)}")
print(f"Number of Metabolites in model : {len(model.metabolites)}")
print(f"Number of Genes in model : {len(model.genes)}")

print("Checking gene symbol conversion :")
model.genes.get_by_id("UGT8")

Downloading  and Reading in Model
Model succesfully downloaded and read in.
Number of Reactions in model : 2312
Number of Metabolites in model : 2015
Number of Genes in model : 2887
Checking gene symbol conversion :


| | |
|---|---|
| Gene identifier | UGT8 |
| Name | G_UGT8 |
| Memory address | 0x16117cf6cd0 |
| Functional | True |
| In 2 reaction(s) | MAR00920, MAR00919 |


Generating the list of parameters for automating simulations.

In [4]:
integration_params = [{"UQ":0.1,"LQ":0.9,"epsilon":10,"threshold":1},{"UQ":0.2,"LQ":0.8,"epsilon":10,"threshold":1},{"UQ":0.4,"LQ":0.6,"epsilon":10,"threshold":1},
                      {"UQ":0.1,"LQ":0.9,"epsilon":20,"threshold":10},{"UQ":0.2,"LQ":0.8,"epsilon":20,"threshold":10},{"UQ":0.4,"LQ":0.6,"epsilon":20,"threshold":10},
                      {"UQ":0.1,"LQ":0.9,"epsilon":40,"threshold":20},{"UQ":0.2,"LQ":0.8,"epsilon":40,"threshold":20},{"UQ":0.4,"LQ":0.6,"epsilon":40,"threshold":20}]
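
The same settings can also be generated rather than typed out, which makes it harder for the code to drift from the settings list at the top. The following is a minimal sketch, assuming the grid is the product of the three quantile pairs and the three (epsilon, threshold) pairs listed above. The names `quantile_pairs`, `eps_thresh_pairs` and `generated_params` are illustrative and not part of pyGSLModel.

```python
from itertools import product

# (UQ, LQ) pairs and (epsilon, threshold) pairs taken from the settings list.
quantile_pairs = [(0.1, 0.9), (0.2, 0.8), (0.4, 0.6)]
eps_thresh_pairs = [(10, 1), (20, 10), (40, 20)]

# product() varies its last argument fastest, so the quantile pairs cycle
# within each (epsilon, threshold) pair, matching the order of the settings list.
generated_params = [
    {"UQ": uq, "LQ": lq, "epsilon": eps, "threshold": thr}
    for (eps, thr), (uq, lq) in product(eps_thresh_pairs, quantile_pairs)
]

for n, p in enumerate(generated_params, start=1):
    print(n, p)
```

If this approach is used, `generated_params` could be passed to the simulation loop in place of the hand-written list.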

Performing simulations for each set of parameters and storing output dataframes in a dictionary

In [5]:
import pandas as pd
from pyGSLModel import TCGA_iMAT_sample_integrate

integrated_dataframes = {}
for i, params in enumerate(integration_params, start=1):
    iMAT_df = TCGA_iMAT_sample_integrate(model, tissue="Brain", datasets="TCGA",
                                         upper_quantile=params["UQ"],
                                         lower_quantile=params["LQ"],
                                         epsilon=params["epsilon"],
                                         threshold=params["threshold"])
    integrated_dataframes[f"TCGA_iMAT_integrated_df_{i}"] = iMAT_df
    print(f"--------------------------\nParameters set {i} completed\n--------------------------")

Simulations Performed:1/694
Simulations Performed:2/694
Simulations Performed:3/694
Simulations Performed:4/694
Simulations Performed:5/694
Simulations Performed:6/694
Simulations Performed:7/694
Simulations Performed:8/694
Simulations Performed:9/694
Simulations Performed:10/694
Simulations Performed:11/694
Simulations Performed:12/694
Simulations Performed:13/694
Simulations Performed:14/694
Simulations Performed:15/694
Simulations Performed:16/694
Simulations Performed:17/694
Simulations Performed:18/694
Simulations Performed:19/694
Simulations Performed:20/694
Simulations Performed:21/694
Simulations Performed:22/694
Simulations Performed:23/694
Simulations Performed:24/694
Simulations Performed:25/694
Simulations Performed:26/694
Simulations Performed:27/694
Simulations Performed:28/694
Simulations Performed:29/694
Simulations Performed:30/694
Simulations Performed:31/694
Simulations Performed:32/694
Simulations Performed:33/694
Simulations Performed:34/694
Simulations Performed:3

Saving Dataframes

In [None]:
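import os
os.makedirs("./iMAT_integrated_data", exist_ok=True)  # create the output directory if it is missing
for key, df in integrated_dataframes.items():
    df.to_csv(f"./iMAT_integrated_data/{key}.csv")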
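
The saved file names only carry the index of the parameter set, so it may help to store a small manifest that maps each file key back to the parameters that produced it. The following is a minimal sketch. `build_manifest` and the JSON layout are hypothetical helpers, not part of pyGSLModel, and the two example settings stand in for the full `integration_params` list.

```python
import json

def build_manifest(params_list, prefix="TCGA_iMAT_integrated_df"):
    """Map each output key (prefix_1, prefix_2, ...) to the parameters that produced it."""
    return {
        f"{prefix}_{i}": dict(params)
        for i, params in enumerate(params_list, start=1)
    }

# Two example settings; in the notebook, integration_params would be passed instead.
example_params = [
    {"UQ": 0.1, "LQ": 0.9, "epsilon": 10, "threshold": 1},
    {"UQ": 0.2, "LQ": 0.8, "epsilon": 10, "threshold": 1},
]

manifest = build_manifest(example_params)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The resulting JSON string could be written alongside the CSV files so that each dataset can be traced back to its iMAT settings when the deep learning results are compared.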