# Preparing transcriptomic integration datasets for deep learning

#### As transcriptomic integration with iMAT is sensitive to the quantile, threshold and epsilon parameters, a range of parameter settings needs to be explored and their impact on prediction accuracy evaluated.

Different Settings:
1. UQ:10%, LQ:90%, epsilon:1, threshold:0.1
2. UQ:20%, LQ:80%, epsilon:1, threshold:0.1
3. UQ:30%, LQ:70%, epsilon:1, threshold:0.1
4. UQ:40%, LQ:60%, epsilon:1, threshold:0.1
<br>
<br>
5. UQ:10%, LQ:90%, epsilon:10, threshold:1
6. UQ:20%, LQ:80%, epsilon:10, threshold:1
7. UQ:30%, LQ:70%, epsilon:10, threshold:1
8. UQ:40%, LQ:60%, epsilon:10, threshold:1
<br>
<br>
9. UQ:10%, LQ:90%, epsilon:20, threshold:10
10. UQ:20%, LQ:80%, epsilon:20, threshold:10
11. UQ:30%, LQ:70%, epsilon:20, threshold:10
12. UQ:40%, LQ:60%, epsilon:20, threshold:10
<br>
<br>
13. UQ:10%, LQ:90%, epsilon:40, threshold:20
14. UQ:20%, LQ:80%, epsilon:40, threshold:20
15. UQ:30%, LQ:70%, epsilon:40, threshold:20
16. UQ:40%, LQ:60%, epsilon:40, threshold:20  


Downloading the model

In [None]:
from pyGSLModel import download_GSL_model

model = download_GSL_model()

print(f"Number of Reactions in model : {len(model.reactions)}")
print(f"Number of Metabolites in model : {len(model.metabolites)}")
print(f"Number of Genes in model : {len(model.genes)}")

print("Checking gene symbol conversion:")
model.genes.get_by_id("UGT8")

Generating the list of parameters for automating simulations.

In [None]:
integration_params = [{"UQ":0.1,"LQ":0.9,"epsilon":1,"threshold":0.1},{"UQ":0.2,"LQ":0.8,"epsilon":1,"threshold":0.1},{"UQ":0.3,"LQ":0.7,"epsilon":1,"threshold":0.1},{"UQ":0.4,"LQ":0.6,"epsilon":1,"threshold":0.1},
                      {"UQ":0.1,"LQ":0.9,"epsilon":10,"threshold":1},{"UQ":0.2,"LQ":0.8,"epsilon":10,"threshold":1},{"UQ":0.3,"LQ":0.7,"epsilon":10,"threshold":1},{"UQ":0.4,"LQ":0.6,"epsilon":10,"threshold":1},
                      {"UQ":0.1,"LQ":0.9,"epsilon":20,"threshold":10},{"UQ":0.2,"LQ":0.8,"epsilon":20,"threshold":10},{"UQ":0.3,"LQ":0.7,"epsilon":20,"threshold":10},{"UQ":0.4,"LQ":0.6,"epsilon":20,"threshold":10},
                      {"UQ":0.1,"LQ":0.9,"epsilon":40,"threshold":20},{"UQ":0.2,"LQ":0.8,"epsilon":40,"threshold":20},{"UQ":0.3,"LQ":0.7,"epsilon":40,"threshold":20},{"UQ":0.4,"LQ":0.6,"epsilon":40,"threshold":20}]
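The sixteen settings listed at the top form a grid: four (UQ, LQ) pairs crossed with four (epsilon, threshold) pairs. As a minimal sketch, the same list could be built programmatically, which makes the grid easier to extend and less prone to copy-paste slips. The names `quantile_pairs`, `eps_threshold_pairs` and `generated_params` are illustrative only and are not part of pyGSLModel.

```python
from itertools import product

# (UQ, LQ) pairs and (epsilon, threshold) pairs copied from the settings list
quantile_pairs = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (0.4, 0.6)]
eps_threshold_pairs = [(1, 0.1), (10, 1), (20, 10), (40, 20)]

# (epsilon, threshold) varies slowest so the order matches settings 1 to 16
generated_params = [
    {"UQ": uq, "LQ": lq, "epsilon": eps, "threshold": thr}
    for (eps, thr), (uq, lq) in product(eps_threshold_pairs, quantile_pairs)
]

print(f"Generated {len(generated_params)} parameter sets")
```

If the hand-written `integration_params` list is kept, comparing it with `generated_params` (`integration_params == generated_params`) is a quick way to check that no entry was mistyped.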

Performing simulations for each set of parameters, storing the output dataframes in a dictionary and saving each one as a CSV file.

In [None]:
import os
import pandas as pd
from pyGSLModel import TCGA_iMAT_sample_integrate

# Make sure the output directory exists before any CSV is written
output_dir = "./iMAT_integrated_data"
os.makedirs(output_dir, exist_ok=True)

integrated_dataframes = {}
# Number the parameter sets from 1 so file names match the settings list
for i, params in enumerate(integration_params, start=1):
    iMAT_df = TCGA_iMAT_sample_integrate(model, tissue="Brain", datasets="TCGA",
                                         upper_quantile=params["UQ"],
                                         lower_quantile=params["LQ"],
                                         epsilon=params["epsilon"],
                                         threshold=params["threshold"])
    name = f"TCGA_iMAT_integrated_df_{i}"
    iMAT_df.to_csv(f"{output_dir}/{name}.csv")
    integrated_dataframes[name] = iMAT_df
    print(f"\n--------------------------\nParameter set {i} completed\n--------------------------\n")
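Because each integration run can take a long time, it may be convenient to rebuild the dictionary from the saved CSVs in a later session instead of re-running the simulations. The sketch below assumes the files were written with the default `to_csv` settings (index in the first column) and follow the naming pattern used in the loop above; `load_integrated_csvs` is a hypothetical helper, not a pyGSLModel function.

```python
from pathlib import Path

import pandas as pd


def load_integrated_csvs(directory, pattern="TCGA_iMAT_integrated_df_*.csv"):
    """Reload saved iMAT outputs into a dict keyed by file name without extension."""
    frames = {}
    for path in sorted(Path(directory).glob(pattern)):
        # index_col=0 restores the index column written by DataFrame.to_csv
        frames[path.stem] = pd.read_csv(path, index_col=0)
    return frames
```

Calling `load_integrated_csvs("./iMAT_integrated_data")` would then return a dictionary with the same keys as `integrated_dataframes`.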