gtStyLab/SCOUR

SCOUR: A stepwise machine learning framework for predicting metabolite-dependent regulatory interactions


Folder Descriptions
-------------------
BiggerModelData: contains files to generate ODE data for the bigger synthetic model.

ChassData: contains files to generate ODE data for the E. coli model.

createRegSchemes: contains files to create lists of regulatory interactions tested within SCOUR.

dataPreparationFiles: contains files to autogenerate training data and generate triplicate noisy ODE data. The noisy data is smoothed and the median is taken from the triplicates.

extraFiles: contains extra files necessary for some features and plots.

featureGeneration: contains files to create feature matrices and calculate their features.

HynneData: contains files to generate ODE data for the yeast model.

plotFigures: contains files to plot the figures found in the manuscript.

results: contains compact results found in the manuscript and used for plotting. Large datasets (e.g. training datasets) have been removed.

SmallerModelData: contains files to generate ODE data for the smaller synthetic model.
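
As an illustration of the triplicate handling described for dataPreparationFiles, the sketch below smooths three noisy replicates and takes their element-wise median. The moving-average smoother, its window length, the noise level, and the toy signal are all assumptions made for illustration; the repository's scripts may smooth the data differently.

```matlab
% Toy example: three noisy replicates of one metabolite time course.
% All values and the smoother are hypothetical, not taken from SCOUR.
t = linspace(0, 10, 50);                 % 50 time points
truth = exp(-0.3 * t);                   % noiseless toy signal
rng(0);                                  % fixed seed for repeatability
reps = zeros(3, numel(t));
for s = 1:3
    % Multiplicative Gaussian noise, standard deviation 5% of the signal
    reps(s, :) = truth .* (1 + 0.05 * randn(1, numel(t)));
end

% Smooth each replicate along time (assumed: 5-point moving average).
smoothed = movmean(reps, 5, 2);

% Element-wise median across the three smoothed replicates.
medianCourse = median(smoothed, 1);
```

Taking the median rather than the mean keeps a single outlying replicate from dominating any one time point.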


Main File Descriptions
----------------------
SCOUR_Ecoli_noiseless.m: Predicts interactions in the E. coli model using SCOUR with noiseless datasets.
SCOUR_Ecoli_noisy.m: Predicts interactions in the E. coli model using SCOUR with noisy datasets.
SCOUR_Ecoli_random.m: Predicts interactions in the E. coli model using a random classifier with noisy datasets.
SCOUR_Synthetic_noiseless.m: Predicts interactions in the synthetic models using SCOUR with noiseless datasets.
SCOUR_Synthetic_noisy.m: Predicts interactions in the synthetic models using SCOUR with noisy datasets.
SCOUR_Synthetic_random.m: Predicts interactions in the synthetic models using a random classifier with noisy datasets.
SCOUR_Yeast_noiseless.m: Predicts interactions in the yeast model using SCOUR with noiseless datasets.
SCOUR_Yeast_noisy.m: Predicts interactions in the yeast model using SCOUR with noisy datasets.
SCOUR_Yeast_random.m: Predicts interactions in the yeast model using a random classifier with noisy datasets.


Instructions to reproduce noiseless results in SCOUR manuscript
---------------------------------------------------------------
1) Generate noiseless autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noiseless.m with num_IC = 15 (for 15 different initial conditions) and reps = 30 (for 30 different repetitions).
2) Generate noiseless testing data using either: SmallerModelData/driver_genDatasets_SmallerModel.m, BiggerModelData/driver_genDatasets_BiggerModel.m, ChassData/driver_genDatasets_chassV.m, or HynneData/driver_genDatasets_hynne.m.
3) Run SCOUR_*_noiseless.m with num_IC = 15, once for each repetition (rep = 1 to 30).
4) Run plotFigures/plot_Fig3.m to reproduce Fig. 3.
Note: results may vary slightly due to random autogenerated training data.
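
The wildcard in step 3 covers the three noiseless main scripts listed under Main File Descriptions. A minimal sketch that enumerates every (script, rep) pair is given below. It only builds a run list; how num_IC and rep are actually set in each script is not specified here, so that part is left to the user.

```matlab
% Model tags taken from the main script names listed above.
models = {'Ecoli', 'Synthetic', 'Yeast'};
num_IC = 15;      % number of initial conditions (from the instructions)
reps   = 1:30;    % repetition indices (from the instructions)

% Build one row per (script, rep) pair requested by step 3.
runs = cell(numel(models) * numel(reps), 2);
row = 0;
for m = 1:numel(models)
    script = sprintf('SCOUR_%s_noiseless.m', models{m});
    for rep = reps
        row = row + 1;
        runs(row, :) = {script, rep};
    end
end
fprintf('%d runs planned with num_IC = %d\n', size(runs, 1), num_IC);
% prints: 90 runs planned with num_IC = 15
```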


Instructions to reproduce noisy results in SCOUR manuscript
-----------------------------------------------------------
1) Generate noisy autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
2) Generate noisy testing data using dataPreparationFiles/dataPreparation_*_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
3) Run SCOUR_*_noisy.m and SCOUR_*_random.m with num_IC = 15, once for each repetition (rep = 1 to 30).
4) Run plotFigures/plot_Fig4.m to reproduce Fig. 4.
Note: results may vary slightly due to random autogenerated training data and random noise added to the testing data.
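
The nT and cov options in steps 1 and 2 can be laid out as a small grid. The sketch below assumes that all four (nT, cov) combinations are wanted, which the instructions above do not state explicitly; the variable names nT_values, cov_values, and settings are illustrative only.

```matlab
% Options quoted in steps 1 and 2.
nT_values  = [50 15];
cov_values = [5 15];
reps       = 30;          % repetitions per setting

% All (nT, cov) pairs, one per row (assumes the full 2 x 2 grid is used).
[nTg, covg] = ndgrid(nT_values, cov_values);
settings = [nTg(:), covg(:)];

for i = 1:size(settings, 1)
    fprintf('nT = %d, cov = %d: %d repetitions\n', ...
            settings(i, 1), settings(i, 2), reps);
end
```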

Instructions for using SCOUR on other systems
---------------------------------------------
Information needed:
- Stoichiometric matrix of the system, in a file named modelSTM.mat.
- Either single or triplicate samples of metabolomics and fluxomics data, placed in the modelData folder. Each sample file should contain information similar to that found in the simulated biological data (see HynneData/odeData or ChassData/odeData).

1) Generate noisy autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
2) Prepare user data using dataPreparationFiles/dataPreparation_framework.m. User data should be located in the modelData folder and be named sprintf('model_k-%02d_nT-%03d_cov-%02d_s%01d_rep-0%02d.mat',IC,nT,cov,s,rep), where IC is the index of the initial condition (1 to num_IC, with num_IC the number of initial conditions), nT and cov are the values used in step 1, s is the sample number within the triplicate (i.e. 1 to 3), and rep is the repetition number if there are multiple repetitions. If there are only single samples, the files should instead be named sprintf('model_k-%02d_nT-%03d_cov-%02d_rep-0%02d.mat',IC,nT,cov,rep).
3) Run SCOUR_framework where nT and cov are the values used in step 1, num_IC is the number of initial conditions, and rep is the repetition number if there are multiple repetitions. Results will be saved as sprintf('model_results_IC-%02d_nT-%03d_cov-%02d_rep-%02d.mat',num_IC,nT,cov,rep) in the main folder.
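
To make the naming patterns in steps 2 and 3 concrete, the sketch below evaluates the three sprintf format strings for one set of example values. The values are hypothetical, and IC is treated here as the index of a single initial condition, which is an assumption about the intended meaning.

```matlab
% Example values (hypothetical); nT and cov must match those used in step 1.
% Note: the variable name cov follows the README and shadows MATLAB's cov().
IC = 1; nT = 50; cov = 5; rep = 1; num_IC = 15;

% Triplicate input files: one file per sample s = 1..3.
tripNames = cell(1, 3);
for s = 1:3
    tripNames{s} = sprintf('model_k-%02d_nT-%03d_cov-%02d_s%01d_rep-0%02d.mat', ...
                           IC, nT, cov, s, rep);
end

% Single-sample input file (no sample tag).
singleName = sprintf('model_k-%02d_nT-%03d_cov-%02d_rep-0%02d.mat', IC, nT, cov, rep);

% Results file name produced in step 3.
resultName = sprintf('model_results_IC-%02d_nT-%03d_cov-%02d_rep-%02d.mat', ...
                     num_IC, nT, cov, rep);

disp(tripNames{1});   % model_k-01_nT-050_cov-05_s1_rep-001.mat
disp(singleName);     % model_k-01_nT-050_cov-05_rep-001.mat
disp(resultName);     % model_results_IC-15_nT-050_cov-05_rep-01.mat
```

The input patterns contain a literal 0 before the two-digit rep field, so rep = 1 appears as rep-001 in input file names but as rep-01 in the results file name.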

About

SCOUR: Stepwise Classification Of Unknown Regulation
