gtStyLab/SCOUR: Stepwise Classification Of Unknown Regulation

SCOUR: A stepwise machine learning framework for predicting metabolite-dependent regulatory interactions

Folder Descriptions
-------------------
BiggerModelData: contains files to generate ODE data for the bigger synthetic model.

ChassData: contains files to generate ODE data for the E. coli model.

createRegSchemes: contains files to create the lists of regulatory interactions tested by SCOUR.

dataPreparationFiles: contains files to autogenerate training data and to generate triplicate noisy ODE data. The noisy data are smoothed, and the median of the triplicates is taken.

extraFiles: contains additional files required by some features and plots.

featureGeneration: contains files to create feature matrices and calculate their features.

HynneData: contains files to generate ODE data for the yeast model.

plotFigures: contains files to plot the figures in the manuscript.

results: contains compact versions of the results reported in the manuscript and used for plotting (large datasets, e.g. training datasets, have been removed).

SmallerModelData: contains files to generate ODE data for the smaller synthetic model.
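The dataPreparationFiles description says noisy triplicates are smoothed and then reduced to their median. The sketch below shows that pipeline on a single time course. It is not the repository's MATLAB code. Two assumptions are made here and are not stated in this README: noise is treated as multiplicative Gaussian with a coefficient of variation of cov percent, and a centered moving average stands in for whatever smoother the repository actually uses. The function names are hypothetical.

```python
import random
import statistics


def noisy_triplicates(profile, cov_percent, n_samples=3, seed=None):
    """Return n_samples noisy copies of a noiseless time course.

    ASSUMPTION: multiplicative Gaussian noise whose coefficient of
    variation is cov_percent / 100 (the README does not define cov).
    """
    rng = random.Random(seed)
    sd = cov_percent / 100.0
    return [[x * (1.0 + rng.gauss(0.0, sd)) for x in profile]
            for _ in range(n_samples)]


def moving_average(series, window=3):
    """Centered moving average with truncated windows at the edges.

    ASSUMPTION: a stand-in for the repository's smoothing step.
    """
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def smoothed_median(replicates, window=3):
    """Smooth each replicate, then take the median across replicates per time point."""
    smoothed = [moving_average(r, window) for r in replicates]
    return [statistics.median(values) for values in zip(*smoothed)]


# Usage: three noisy copies of a short concentration profile at cov = 5
profile = [1.0, 1.2, 1.5, 1.9, 2.4]
replicates = noisy_triplicates(profile, cov_percent=5, seed=0)
estimate = smoothed_median(replicates)
```

Taking the median after smoothing matches the order given in the folder description. The median also resists a single outlying replicate better than the mean does.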

Main File Descriptions
----------------------
SCOUR_Ecoli_noiseless.m: Predicts interactions in the E. coli model using SCOUR with noiseless datasets.
SCOUR_Ecoli_noisy.m: Predicts interactions in the E. coli model using SCOUR with noisy datasets.
SCOUR_Ecoli_random.m: Predicts interactions in the E. coli model using a random classifier with noisy datasets.
SCOUR_Synthetic_noiseless.m: Predicts interactions in the synthetic models using SCOUR with noiseless datasets.
SCOUR_Synthetic_noisy.m: Predicts interactions in the synthetic models using SCOUR with noisy datasets.
SCOUR_Synthetic_random.m: Predicts interactions in the synthetic models using a random classifier with noisy datasets.
SCOUR_Yeast_noiseless.m: Predicts interactions in the yeast model using SCOUR with noiseless datasets.
SCOUR_Yeast_noisy.m: Predicts interactions in the yeast model using SCOUR with noisy datasets.
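SCOUR_Yeast_random.m: Predicts interactions in the yeast model using a random classifier with noisy datasets.
SCOUR_framework.m: Predicts interactions in a user-supplied system (see "Instructions for using SCOUR on other systems").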

Instructions to reproduce noiseless results in SCOUR manuscript
---------------------------------------------------------------
1) Generate noiseless autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noiseless.m with num_IC = 15 (for 15 different initial conditions) and reps = 30 (for 30 different repetitions).
2) Generate noiseless testing data using one of: SmallerModelData/driver_genDatasets_SmallerModel.m, BiggerModelData/driver_genDatasets_BiggerModel.m, ChassData/driver_genDatasets_chassV.m, or HynneData/driver_genDatasets_hynne.m.
3) Run SCOUR_*_noiseless.m with num_IC = 15, once for each repetition rep = 1 to 30.
4) Run plotFigures/plot_Fig3.m to reproduce Fig. 3.
Note: results may vary slightly due to random autogenerated training data.
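Step 3 means running each noiseless script once per repetition. The sketch below only lists the runs that make up that step: three SCOUR_*_noiseless.m scripts times 30 repetitions gives 90 runs. This README does not say how each MATLAB script receives num_IC and rep, so the sketch does not launch anything. MODELS, NUM_IC, REPS and noiseless_runs are illustrative names.

```python
from itertools import product

# Model families that have a SCOUR_*_noiseless.m script (the synthetic script covers both synthetic models)
MODELS = ["Synthetic", "Ecoli", "Yeast"]
NUM_IC = 15            # number of initial conditions, as in step 1
REPS = range(1, 31)    # rep = 1 to 30


def noiseless_runs():
    """List (script, num_IC, rep) triples for step 3."""
    return [(f"SCOUR_{m}_noiseless.m", NUM_IC, rep)
            for m, rep in product(MODELS, REPS)]


runs = noiseless_runs()
print(len(runs))  # 90
print(runs[0])    # ('SCOUR_Synthetic_noiseless.m', 15, 1)
```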

Instructions to reproduce noisy results in SCOUR manuscript
-----------------------------------------------------------
1) Generate noisy autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
2) Generate noisy testing data using dataPreparationFiles/dataPreparation_*_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
3) Run SCOUR_*_noisy.m and SCOUR_*_random.m with num_IC = 15, once for each repetition rep = 1 to 30.
4) Run plotFigures/plot_Fig4.m to reproduce Fig. 4.
Note: results may vary slightly due to random autogenerated training data and random noise added to the testing data.
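The noisy experiments add two settings, nT (50 or 15) and cov (5 or 15), to the repetition loop. The sketch below enumerates the run grid under one assumption: every (nT, cov) pair is run for every model, method, and repetition, i.e. a full factorial design. The README lists the values but does not state this outright. Under that assumption there are 3 models × 2 methods × 2 nT × 2 cov × 30 reps = 720 runs. All names are hypothetical.

```python
from itertools import product

MODELS = ["Synthetic", "Ecoli", "Yeast"]
METHODS = ["noisy", "random"]   # SCOUR_*_noisy.m and SCOUR_*_random.m
NT_VALUES = [50, 15]            # nT values from steps 1 and 2
COV_VALUES = [5, 15]            # cov values from steps 1 and 2
NUM_IC = 15
REPS = range(1, 31)


def noisy_runs():
    """One dict per run. ASSUMPTION: full factorial over nT and cov."""
    return [
        {"script": f"SCOUR_{m}_{meth}.m", "nT": nT, "cov": cov,
         "num_IC": NUM_IC, "rep": rep}
        for m, meth, nT, cov, rep in product(
            MODELS, METHODS, NT_VALUES, COV_VALUES, REPS)
    ]


runs = noisy_runs()
print(len(runs))  # 720
```

The training data from step 1 and the testing data from step 2 must use the same (nT, cov) pair for a given run. Grouping the grid by that pair is therefore a natural way to batch the work.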

Instructions for using SCOUR on other systems
---------------------------------------------
Information needed:
- Stoichiometric matrix of the system, in a file named modelSTM.mat.
- Metabolomics and fluxomics data, as either single or triplicate samples, in the modelData folder. Each sample file should contain the same kind of information as the simulated biological data (see HynneData/odeData or ChassData/odeData).
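The stoichiometric matrix S links reaction fluxes v to changes in metabolite concentrations through dX/dt = S·v. This is why the stoichiometric matrix is requested alongside metabolomics and fluxomics data. The toy example below is illustrative only. It assumes the common convention of one row per metabolite and one column per reaction, which this README does not confirm for modelSTM.mat. The variable name SCOUR expects inside modelSTM.mat is also not stated here, so the sketch does not write the file.

```python
# Toy network. ASSUMPTION: rows = metabolites, columns = reactions.
#   R1: -> A      R2: A -> B      R3: B ->
METABOLITES = ["A", "B"]
REACTIONS = ["R1", "R2", "R3"]
S = [
    [1, -1,  0],   # A: produced by R1, consumed by R2
    [0,  1, -1],   # B: produced by R2, consumed by R3
]


def net_rates(S, v):
    """Return dX/dt = S @ v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


print(net_rates(S, [2.0, 2.0, 2.0]))  # [0.0, 0.0]: steady state
print(net_rates(S, [3.0, 2.0, 1.0]))  # [1.0, 1.0]: A and B accumulate
```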

1) Generate noisy autogenerated training data by running dataPreparationFiles/dataPreparation_autogeneration_noisy.m with nT = 50 or 15, cov = 5 or 15, num_IC = 15 (for 15 different initial conditions), and reps = 30 (for 30 different repetitions).
2) Prepare user data using dataPreparationFiles/dataPreparation_framework.m. User data should be located in the modelData folder and named sprintf('model_k-%02d_nT-%03d_cov-%02d_s%01d_rep-0%02d.mat',IC,nT,cov,s,rep), where IC is the initial-condition index (num_IC is the number of initial conditions), nT and cov are the values used in step 1, s is the sample number within the triplicates (i.e. 1 to 3), and rep is the repetition number if there are multiple repetitions. If there are only single samples, name the files sprintf('model_k-%02d_nT-%03d_cov-%02d_rep-0%02d.mat',IC,nT,cov,rep).
3) Run SCOUR_framework.m, where nT and cov are the values used in step 1, num_IC is the number of initial conditions, and rep is the repetition number if there are multiple repetitions. Results are saved as sprintf('model_results_IC-%02d_nT-%03d_cov-%02d_rep-%02d.mat',num_IC,nT,cov,rep) in the main folder.
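Misnamed input files are an easy way for this workflow to fail. The sketch below reproduces the filename patterns from steps 2 and 3 using Python's printf-style formatting, which matches MATLAB sprintf for these %0Nd specifiers. It copies the patterns exactly as written, including the literal 0 in rep-0%02d. As a result, input files get a three-digit repetition field (rep-001) while the results file gets a two-digit one (rep-01). It also assumes the initial-condition index IC runs from 1 to num_IC, which is not stated explicitly. The function names are hypothetical.

```python
def user_data_filename(IC, nT, cov, rep, s=None):
    """Input filename from step 2; s=None means single samples."""
    if s is None:
        return 'model_k-%02d_nT-%03d_cov-%02d_rep-0%02d.mat' % (IC, nT, cov, rep)
    return 'model_k-%02d_nT-%03d_cov-%02d_s%01d_rep-0%02d.mat' % (IC, nT, cov, s, rep)


def results_filename(num_IC, nT, cov, rep):
    """Results filename written by SCOUR_framework.m (step 3)."""
    return 'model_results_IC-%02d_nT-%03d_cov-%02d_rep-%02d.mat' % (num_IC, nT, cov, rep)


def expected_user_files(num_IC, nT, cov, rep, triplicates=True):
    """All input files for one repetition. ASSUMPTION: IC runs from 1 to num_IC."""
    samples = [1, 2, 3] if triplicates else [None]
    return [user_data_filename(IC, nT, cov, rep, s)
            for IC in range(1, num_IC + 1) for s in samples]


print(user_data_filename(1, 50, 5, 1, s=2))  # model_k-01_nT-050_cov-05_s2_rep-001.mat
print(results_filename(15, 50, 5, 1))        # model_results_IC-15_nT-050_cov-05_rep-01.mat
```

Comparing expected_user_files(...) against the contents of the modelData folder before running step 2 catches missing or misnamed samples early.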