EIR-auto-GP

EIR-auto-GP: Automated genomic prediction (GP) using deep learning models with EIR.

WARNING: This project is in the alpha phase. Expect backwards-incompatible changes, including changes to the API.

Overview

EIR-auto-GP is a comprehensive framework for genomic prediction (GP) tasks, built on top of the EIR deep learning framework. EIR-auto-GP streamlines the process of preparing data, training, and evaluating models on genomic data, automating much of the process from raw input files to results analysis. Key features include:

Support for .bed/.bim/.fam PLINK files as input data.
Automated data processing and train/test splitting.
Automated launching of a configurable number of deep learning training runs.
SNP-based feature selection based on GWAS, deep learning-based attributions, and a combination of both.
Ensemble prediction from multiple training runs.
Analysis and visualization of results.
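
Since the input is a PLINK fileset, it can help to confirm that all three files exist under the same prefix before starting a run. The following is a minimal sketch using only the Python standard library; the helper name `missing_plink_files` and the example prefix are illustrative assumptions and not part of EIR-auto-GP.

```python
from pathlib import Path

# The three files that together make up a PLINK binary fileset.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix: str) -> list[str]:
    """Return the fileset members that do not exist for the given prefix.

    Hypothetical helper for illustration; not an EIR-auto-GP function.
    """
    return [
        f"{prefix}{suffix}"
        for suffix in PLINK_SUFFIXES
        if not Path(f"{prefix}{suffix}").is_file()
    ]


if __name__ == "__main__":
    # "data/genotypes" is a placeholder prefix.
    missing = missing_plink_files("data/genotypes")
    if missing:
        print("Missing PLINK files:", ", ".join(missing))
    else:
        print("Complete .bed/.bim/.fam fileset found.")
```

An empty list means the fileset is complete; the check looks only at file presence, not at file contents.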

Installation

First, ensure that plink2 is installed and available in your PATH.
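
One way to confirm this from Python is to look the executable up on the PATH. This is a minimal sketch using only the standard library; the helper name `is_on_path` is illustrative and not part of EIR-auto-GP.

```python
import shutil


def is_on_path(executable: str) -> bool:
    """Return True if `executable` can be found on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if is_on_path("plink2"):
        print("plink2 found at:", shutil.which("plink2"))
    else:
        print("plink2 not found on PATH; install it before continuing.")
```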

Then, install EIR-auto-GP using pip:

pip install eir-auto-gp

Important: The latest version of EIR-auto-GP supports Python 3.11. Installing with an older version of Python will give you an outdated version of EIR-auto-GP, which will likely be incompatible with the current documentation and might contain bugs. Please ensure that you install EIR-auto-GP in a Python 3.11 environment.
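
To check the interpreter before installing, a small sketch like the following can be used. It only flags interpreters older than 3.11; whether versions newer than 3.11 are supported is not stated here, so that case is left unverified. The helper name `meets_minimum_python` is an illustrative assumption.

```python
import sys


def meets_minimum_python(version_info=sys.version_info, minimum=(3, 11)) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum_python():
        print(f"Python {current}: meets the 3.11 requirement.")
    else:
        print(f"Python {current}: too old, pip would install an outdated release.")
```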

Usage

Please refer to the Documentation for examples and information.

Workflow

The rough workflow is as follows:

Data processing: EIR-auto-GP processes the input .bed/.bim/.fam PLINK files and .csv label file, preparing the data for model training and evaluation.
Train/test split: The processed data is automatically split into training and testing sets, with the option of manually specifying splits.
Training: A configurable number of training runs is set up and executed using EIR's deep learning models.
SNP feature selection: GWAS-based feature selection, deep learning-based feature selection with Bayesian optimization, and mixed strategies are supported.
Test set prediction: Predictions are made on the test set using all training run folds.
Ensemble prediction: An ensemble prediction is created from the individual predictions.
Results analysis: Performance metrics, visualizations, and analysis are generated to assess the model's performance.
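
The train/test split and ensemble steps above can be sketched conceptually in plain Python. This is not EIR-auto-GP's implementation: the function names, the default `test_fraction`, the fixed seed, and the use of an unweighted mean to combine runs are all assumptions made for illustration.

```python
import random
from statistics import mean


def train_test_split_ids(sample_ids, test_fraction=0.1, seed=0):
    """Shuffle sample IDs and split them into (train, test) lists.

    Hypothetical helper; the fraction and seed are illustrative defaults.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def ensemble_mean(predictions_per_run):
    """Average per-sample predictions across training runs.

    `predictions_per_run` is a list of dicts mapping sample ID to a numeric
    prediction, one dict per run, all covering the same test samples.
    An unweighted mean is assumed here.
    """
    sample_ids = predictions_per_run[0].keys()
    return {
        sample_id: mean(run[sample_id] for run in predictions_per_run)
        for sample_id in sample_ids
    }


# Toy data: ten samples, two runs with constant predictions.
train_ids, test_ids = train_test_split_ids(
    [f"s{i}" for i in range(10)], test_fraction=0.2
)
runs = [
    {sample_id: 1.0 for sample_id in test_ids},
    {sample_id: 3.0 for sample_id in test_ids},
]
ensemble = ensemble_mean(runs)
```

Here every test sample receives the average of the two runs' predictions. Keeping the test IDs fixed across runs is what makes the per-run predictions directly comparable and combinable.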

Citation

If you use EIR-auto-GP in a scientific publication, we would appreciate it if you used the following citation:

@article{sigurdsson2021deep,
  title={Deep integrative models for large-scale human genomics},
  author={Sigurdsson, Arnor Ingi and Westergaard, David and Winther, Ole and Lund, Ole and Brunak, S{\o}ren and Vilhjalmsson, Bjarni J and Rasmussen, Simon},
  journal={bioRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory}
}
