EnGRaiN

EnGRaiN is the first supervised machine learning method for constructing ensemble networks. To gain the accuracy advantages typical of supervised learning while accounting for the fact that true networks cannot be known for training, we devised a method that learns from small training datasets of true positive and true negative gene pairs.
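The supervised idea above can be illustrated with a minimal sketch. It assumes each gene pair is described by the scores it receives in several input networks, trains only on the small labeled set of positive and negative pairs, and then scores every pair. To stay self-contained it swaps the XGBoost model listed under Dependencies for a plain logistic regression written in pure Python. The function names and toy data are hypothetical and are not taken from engrain_ensemble.py.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_scores(edge_scores, positives, negatives):
    """Hypothetical helper: train on the labeled pairs only, then score all pairs."""
    train_pairs = list(positives) + list(negatives)
    labels = [1] * len(positives) + [0] * len(negatives)
    w, b = train_logistic([edge_scores[p] for p in train_pairs], labels)
    return {pair: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for pair, x in edge_scores.items()}


# Toy input (made up): each gene pair has one score from each of 3 networks.
edge_scores = {
    ("g1", "g2"): [0.9, 0.8, 0.7],
    ("g1", "g3"): [0.8, 0.9, 0.6],
    ("g2", "g3"): [0.1, 0.2, 0.3],
    ("g2", "g4"): [0.2, 0.1, 0.2],
    ("g3", "g4"): [0.7, 0.6, 0.9],   # unlabeled
    ("g1", "g4"): [0.15, 0.3, 0.1],  # unlabeled
}
scores = ensemble_scores(
    edge_scores,
    positives=[("g1", "g2"), ("g1", "g3")],
    negatives=[("g2", "g3"), ("g2", "g4")],
)
```

The unlabeled pairs receive ensemble scores from a model fitted only on four labeled pairs, which is the property that makes small training sets usable.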

Dependencies

EnGRaiN requires Python v3.1 or above and depends on the following Python libraries:

numpy
matplotlib
pandas
sklearn (scikit-learn)
xgboost

The libraries can be installed via pip or conda.
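Before running the script, it can help to confirm that the libraries are importable. The sketch below is a hypothetical helper, not part of the repository; it checks the interpreter version against the stated minimum and looks up each module with importlib without importing it. The helper itself uses importlib.util.find_spec and f-strings, so it needs Python 3.6 or newer to run.

```python
import importlib.util
import sys

# Import names of the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "matplotlib", "pandas", "sklearn", "xgboost"]


def check_dependencies(modules=REQUIRED_MODULES, min_python=(3, 1)):
    """Return (python_ok, {module: installed}) without importing the modules."""
    python_ok = sys.version_info[:2] >= min_python
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    return python_ok, status


if __name__ == "__main__":
    ok, status = check_dependencies()
    print(f"Python version OK: {ok}")
    for name, installed in status.items():
        print(f"{name:12s} {'found' if installed else 'MISSING'}")
```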

EnGRaiN script

The source code for EnGRaiN is available as the Python script engrain_ensemble.py in the src/ folder of this repository. The script has the following usage:

python src/engrain_ensemble.py -h 

usage: engrain_ensemble.py [-h]
          {sim,ravgu,ravg,xtissue,pred_stk,pred_ens,grids_rocpr,grids_tpfp}
          ...

Train with a subset of network and predict Ensemble

positional arguments:
  {sim,ravgu,ravg,xtissue,pred_stk,pred_ens,grids_rocpr,grids_tpfp}
  sim                 Run Simulated Datasets
  ravgu               Run Rank Avg.&ScaleSum of union networks for A.
                      thaliana Datasets
  ravg                Run Rank Avg.&ScaleSum of networks for A. thaliana
                      Datasets
  xtissue             Run Cross-tissue comparisions A. thaliana Datasets
  pred_stk            Stacked Predictions for A. thaliana Networks
  pred_ens            Ensemble Predictions for A. thaliana Networks
  grids_rocpr         Run Grid Search with XGBoost params for A. thaliana
                      Datasets
  grids_tpfp          Run Grid Search with XGBoost params for A. thaliana
                      Datasets

optional arguments:
  -h, --help            show this help message and exit
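The ravg and ravgu subcommands run rank-average (and ScaleSum) baselines. As a rough illustration of rank averaging only, the sketch below ranks the edge scores within each network (tied scores share their mean rank) and averages each gene pair's rank across networks. It is a hypothetical helper rather than the script's implementation; in particular, averaging only over the networks that contain a pair is an assumption about how a union of networks might be handled.

```python
def rank_scores(scores):
    """Map each key to its 1-based rank (1 = lowest score); ties share the mean rank."""
    ordered = sorted(scores, key=scores.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of keys whose score equals ordered[i]'s score.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average(networks):
    """Average each gene pair's rank over the networks that contain that pair."""
    totals, counts = {}, {}
    for net in networks:
        for pair, r in rank_scores(net).items():
            totals[pair] = totals.get(pair, 0.0) + r
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}
```

Higher averaged ranks indicate pairs that score consistently high across networks; unlike the supervised approach, no labeled pairs are used.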

Simulated Dataset Runs

In our paper, we demonstrate the effectiveness of EnGRaiN using simulated datasets. The smaller networks from the simulated datasets are available in the data/ sub-directory of this repository. The larger simulated networks used in the paper can be downloaded via the links provided in the data/README.md file.

The EnGRaiN script requires a JSON input file that specifies its input configuration. The configurations used for the results shown in the paper are available in the results/config folder.
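The configuration schema is not documented here, so a safe first step is to inspect the provided files. The sketch below is a hypothetical helper: it loads a JSON config, optionally fails early if caller-specified keys are missing, and summarizes the top-level keys of every *.json file in a directory such as results/config. Any key names you pass in are your own assumptions, not a documented EnGRaiN schema.

```python
import json
from pathlib import Path


def load_config(path, required_keys=()):
    """Load a JSON config; raise KeyError if any caller-specified key is absent."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [k for k in required_keys
               if not isinstance(config, dict) or k not in config]
    if missing:
        raise KeyError(f"{path}: missing keys {missing}")
    return config


def summarize_configs(config_dir):
    """Return {file name: sorted top-level keys, or the JSON type name if not an object}."""
    summary = {}
    for p in sorted(Path(config_dir).glob("*.json")):
        cfg = load_config(p)
        summary[p.name] = sorted(cfg) if isinstance(cfg, dict) else type(cfg).__name__
    return summary
```

For example, summarize_configs("results/config") gives a quick overview of which settings each paper run uses.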

To run the latest version of the simulated analysis:

cd to the results folder.
Link the data directory: ln -s ../data/
Download the yeast-edge-weights-v5.csv.gz dataset to the data folder.
Run the command python engrain_ensemble.py sim v5.
The AUROC/AUPR output will be generated as a table on the standard output.
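The steps above can be scripted. This is a hypothetical convenience sketch that mirrors the README commands: it creates the data symlink if it is missing, warns if the yeast dataset has not been downloaded yet, and builds the command. By default it only returns the command (dry run); pass dry_run=False to execute it. It invokes engrain_ensemble.py from the results folder exactly as the README does; adjust the path if the script lives only in src/.

```python
import os
import subprocess
import sys
from pathlib import Path


def prepare_sim_run(results_dir, data_dir="../data", version="v5",
                    dataset="yeast-edge-weights-v5.csv.gz", dry_run=True):
    """Hypothetical helper: set up results/ and optionally run the simulated analysis."""
    results = Path(results_dir)
    link = results / "data"
    # Equivalent of `ln -s ../data/` inside results/; skip if a link or dir exists.
    if not link.exists() and not link.is_symlink():
        os.symlink(data_dir, link)
    if not (link / dataset).exists():
        print(f"warning: {dataset} not found in {link}; download it first",
              file=sys.stderr)
    cmd = [sys.executable, "engrain_ensemble.py", "sim", version]
    if not dry_run:
        subprocess.run(cmd, cwd=results, check=True)
    return cmd
```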

A. thaliana Dataset Runs and Networks

To evaluate EnGRaiN, we also used a curated collection of A. thaliana datasets that we created from microarray datasets available in public repositories.

Tissue-specific network data used for evaluation are available in the data/athaliana_raw directory. Note that this data includes only the scores for the positive and negative gene pairs. AUROC/AUPR can be computed from this data using the input config files in the runs/ens_grid_search directory.
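Because only the scores of positives and negatives are needed, both metrics can be computed directly from two lists of scores. The sketch below is a generic illustration, not the repository's evaluation code: AUROC uses the rank-based Mann-Whitney formula (ties get mean ranks), and AUPR is computed as average precision, treating tied scores as a single threshold. How the data files in data/athaliana_raw map to these two lists is left as an assumption.

```python
def auroc(pos, neg):
    """Rank-based AUROC: probability a random positive outscores a random negative."""
    labeled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg], key=lambda t: t[0])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # tied scores share the mean 1-based rank
        rank_sum_pos += mean_rank * sum(lbl for _, lbl in labeled[i:j + 1])
        i = j + 1
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(pos, neg):
    """AUPR as average precision: sum of (recall step) * precision over thresholds."""
    scored = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg], key=lambda t: -t[0])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(scored):
        j = i
        # Consume every item tied at this score before updating precision/recall.
        while j < len(scored) and scored[j][0] == scored[i][0]:
            tp += scored[j][1]
            fp += 1 - scored[j][1]
            j += 1
        recall, precision = tp / len(pos), tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap
```

AUPR is sometimes computed by trapezoidal interpolation of the precision-recall curve instead, which can give slightly different values than average precision.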

The final Arabidopsis ensemble network constructed using EnGRaiN can be downloaded from here.

Runs on Simulated Data

The source code for the containers and aggregation scripts used in the simulated runs is in the GitHub repository AluruLab/ardmore.

Microarray Data Processing

The scripts for the microarray data processing workflow are available in the GitHub repository AluruLab/tanyard.
