AmandineGrosfils/Master-Thesis

Master-Thesis

This repository contains all the scripts and data files needed to use ComboFM and the Dose model together. The idea is to use ComboFM to predict the responses of drug pairs, which are then used as input to the Dose model, which in turn predicts the responses of higher-order combinations (drug triplets in this work).

To do so, the following scripts must be run in this order:

  1. makeFile.py: creates the NCI-ALMANAC.csv file, a subset of NCI-ALMANAC_full_data.csv from which the NaN values and the dose-response matrices that are not of size 3x3 are removed.
  2. CV-Dispatch-*.py: creates the folders for the cross-validation (see the meaning of * below), as well as the test set that is kept apart from the cross-validation.
  3. ComboFM.py: runs ComboFM and saves the predictions in a .txt file. It relies on the utils.py script and takes one argument: the name of the folder containing the CV folds.
  4. Input_Dose_ComboFM.py: creates the input CSV file for the Dose model from the ComboFM predictions. It takes two arguments: the name of the folder containing the ComboFM results and the name of the file containing the ComboFM predictions (which should be in the folder given as the first argument).
  5. mainDose.m: runs the Dose model and saves the predictions in a CSV file. The Dose model scripts were obtained by contacting the authors of the following paper: https://www.pnas.org/content/113/37/10442.short
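As an illustration of step 1, the filtering done by makeFile.py can be sketched in plain Python. This is a minimal sketch, not the repository's code: it assumes each row is a dict and that the relevant columns are named NSC1, NSC2, CELLNAME, CONC1, CONC2 and PERCENTGROWTH (assumed NCI-ALMANAC-style names; check them against the actual header). A dose-response matrix is taken to be all rows sharing one (drug 1, drug 2, cell line) triple, and it is kept only if it has exactly 3 concentrations per drug and all 9 concentration pairs.

```python
import math
from collections import defaultdict

# Assumed column names (hypothetical for this sketch); adapt them to the
# actual header of NCI-ALMANAC_full_data.csv.
KEY_COLS = ("NSC1", "NSC2", "CELLNAME")
CONC1, CONC2 = "CONC1", "CONC2"


def is_missing(value):
    """True for None, empty strings (as csv.DictReader yields) and NaN floats."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def filter_3x3(rows):
    """Drop rows with missing values, then keep only the (drug 1, drug 2,
    cell line) groups whose dose-response matrix is exactly 3x3."""
    clean = [r for r in rows if not any(is_missing(v) for v in r.values())]

    # Group the remaining rows by drug pair and cell line.
    groups = defaultdict(list)
    for r in clean:
        groups[tuple(r[c] for c in KEY_COLS)].append(r)

    kept = []
    for members in groups.values():
        conc1 = {r[CONC1] for r in members}
        conc2 = {r[CONC2] for r in members}
        pairs = {(r[CONC1], r[CONC2]) for r in members}
        # 3 concentrations per drug and all 9 combinations present.
        if len(conc1) == 3 and len(conc2) == 3 and len(pairs) == 9:
            kept.extend(members)
    return kept
```

In a real run the rows would come from `csv.DictReader` over the full data file; since `csv` returns strings, missing fields show up as empty strings, which `is_missing` also catches.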

Note that the Dose model in this work uses the modified Hill function:

The Input_Dose_validation.py script creates the input of the Dose model when ComboFM is not used beforehand. It creates a VALIDATION_DOSE.csv file.

The * stands for one of several possible scenarios for using ComboFM, each with its own script to split the dataset. See the image below for an explanation of the different scenarios. Color code: grey = training set, orange = test set. The names correspond to the * in the CV-Dispatch-*.py scripts.

SCENARIOS
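The CV-Dispatch-*.py scripts hold out a test set and split the rest into cross-validation folds. A minimal sketch of such a split follows, assuming a scenario is defined by the unit that must not be shared between partitions. The function `dispatch` and its parameters are hypothetical; the actual scenarios are the ones shown in the image.

```python
import random
from collections import defaultdict


def dispatch(rows, key, n_folds=5, test_fraction=0.2, seed=0):
    """Split rows into a held-out test set and n_folds CV folds.

    All rows sharing the same key(row) land in the same partition, so the
    choice of `key` defines the scenario (hypothetical examples: the drug
    pair for unseen combinations, the cell line for unseen cell lines).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    keys = sorted(groups)  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(keys)

    # The first keys form the test set, kept apart from the CV folds.
    n_test = round(len(keys) * test_fraction)
    test = [row for k in keys[:n_test] for row in groups[k]]

    # Deal the remaining keys round-robin into n_folds disjoint folds.
    cv_keys = keys[n_test:]
    folds = [[row for k in cv_keys[i::n_folds] for row in groups[k]]
             for i in range(n_folds)]
    return test, folds
```

Grouping by drug pair means no pair seen during training appears in the test set, whereas a key unique to each row gives a plain random split.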

The data files used are the following:

The set of files used by ComboFM is obtained with the R scripts available at https://zenodo.org/record/4129688#.YK5SaC2FBBY (Preprocessing section). These R scripts must be run on the NCI-ALMANAC.csv file and produce the following files:

-  cell_lines__gene_expression.csv
-  cell_lines__one-hot_encoding.csv
-  drug1__estate_fingerprints.csv
-  drug1__one-hot_encoding.csv
-  drug1_concentration__one-hot_encoding.csv
-  drug1_drug2_concentration__values.csv
-  drug2__estate_fingerprints.csv
-  drug2__one-hot_encoding.csv
-  drug2_concentration__one-hot_encoding.csv
-  drug2_drug1_concentration__values.csv
-  drugs__estate_fingerprints.csv
-  responses.csv
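Several of these files are one-hot encodings (of cell lines, drugs and concentrations): one column per category and a single 1 per row. The R scripts are not reproduced here; the following is only a sketch of the encoding itself, assuming one row per sample and sorted category names as columns (`one_hot` is a hypothetical helper, not part of the repository).

```python
def one_hot(values):
    """Return (categories, matrix) where matrix[i][j] == 1 iff values[i]
    equals categories[j]. Categories are sorted for a stable column order."""
    categories = sorted(set(values))
    index = {c: j for j, c in enumerate(categories)}
    matrix = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one active column per sample
        matrix.append(row)
    return categories, matrix


# Example: three samples measured on two cell lines.
categories, matrix = one_hot(["MCF7", "A549", "MCF7"])
# categories == ["A549", "MCF7"]
# matrix == [[0, 1], [1, 0], [0, 1]]
```

In a factorization-machine setting such as ComboFM, blocks like these are placed side by side so that each sample is described by its drugs, concentrations and cell line.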
