Use cases

A major goal of EUGENe is to streamline end-to-end deep learning (DL) solutions in regulatory genomics. We want to make common tasks that have been published in the field accessible to a broad user base, and in doing so make it easy for users to adapt these solutions to their own data. The table below lists several common DL tasks in regulatory genomics that can be run end-to-end with EUGENe:

| Task | Examples | Potential insights gained | ETL | Training and evaluation | End-to-end currently available? | Interpretation analyses currently available | Example in EUGENe use cases |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Single task regression from a tabular file | DeepBind, ResidualBind | Identification and quantification of motif importance on continuous or binary events (e.g. RBP binding) | Yes | Yes | Yes | Filter interpretation, attribution analysis, evolution, GIA | DeepBind |
| Single track classification of peak regions from a single bed file | DeepBind | Identification and quantification of motif importance on binary events (e.g. TF binding) | Yes | Yes | Yes | Filter interpretation, attribution analysis, evolution, GIA | Kopp21 |
| Multitask track classification (ChIP, ATAC, DNase, etc.) of peak regions from multiple bed files | DeepSEA, DanQ, Basset, Sei, Satori | Identification and quantification of motif importance on biochemical activity (e.g. TF binding, transcription, DNA accessibility, etc.). Variant effects on biochemical activity | Yes | Yes | Yes | Filter interpretation, attribution analysis, evolution, GIA | Basset |
| Multitask track regression (ChIP, ATAC, DNase, etc.) at binned or base pair resolution | Basenji, Enformer, BPNet | Identification and quantification of motif importance on biochemical activity (e.g. transcription, DNA accessibility, etc.). Variant effects on biochemical activity. CRE syntax rules | Yes | Yes | Yes | Filter interpretation, GIA | BPNet |
| Single task and multitask CRE activity prediction (both regression and classification (multiclass and multilabel)) | DeepSTARR, MPRA-DragoNN | Identification and quantification of motif importance on CRE activity. Variant effects on CRE activity. CRE syntax rules | Yes | Yes | Yes | Filter interpretation, attribution analysis, evolution, GIA | DeepSTARR |
| Single cell ATAC-seq topic classification (multiclass classification) | DeepMEL, DeepMEL2, DeepFlyBrain | Identification and quantification of cell type specific motif importance. Cell type specific variant effect prediction. Cell type specific CRE syntax | Requires preprocessing with pycisTopic | Yes | Yes, with preprocessing performed by pycisTopic | Filter interpretation, attribution analysis, evolution, GIA | DeepMEL |
| Single cell ATAC-seq cell accessibility prediction* | scBasset | Single cell analysis (denoising, imputation, clustering, etc.). Identification and quantification of cell type specific motif importance | Requires preprocessing with ScanPy | Yes | Yes, with preprocessing performed by ScanPy | Filter interpretation, attribution analysis, evolution, GIA | scBasset |

The final column links to an implemented example of each task in EUGENe's accompanying "use cases" GitHub repository; these examples are described below. Many of them are works in progress, and we welcome contributions from the community to help us expand this list. We expect the list to grow as the field of regulatory genomics continues to develop and new DL solutions are published.