Epiphany

Epiphany: predicting Hi-C contact maps from 1D epigenomic signals

Epiphany is a neural network that predicts cell-type-specific Hi-C contact maps from widely available epigenomic tracks. It uses bidirectional long short-term memory (LSTM) layers to capture long-range dependencies and, optionally, a generative adversarial network architecture to encourage realistic, sharp contact maps. Epiphany generalizes well to held-out chromosomes within and across cell types, yields accurate TAD and interaction calls, and predicts structural changes caused by perturbations of epigenomic signals.

Model Input and Output

  • Input: any combination of epigenomic tracks for a cell type of interest.
  • Output: Hi-C contact map of the same cell type.

Epiphany connects 1D epigenomic signals to 3D chromatin structure, which makes it possible to interpret the importance of signals from specific tracks in relation to structural changes. Any combination of epigenomic tracks can be used as input. In our ablation analysis, we found that a two-track combination (ATAC + CTCF) alone yields good prediction quality. Furthermore, including ATAC or CTCF alongside other relevant epigenomic tracks in the input set significantly improves prediction quality.
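As an illustration of "any combination of tracks", the sketch below stacks a chosen subset of named tracks into a tracks-by-bins matrix. It is not the repository's preprocessing code: the helper `stack_tracks`, the track names, and the toy values are all hypothetical, and the real layout of GM12878_X.h5 may differ.

```python
from typing import Dict, List, Sequence


def stack_tracks(
    tracks: Dict[str, Sequence[float]], selected: Sequence[str]
) -> List[List[float]]:
    """Return an (n_tracks x n_bins) matrix with rows in the order of `selected`.

    Hypothetical helper for illustration; not part of the Epiphany codebase.
    """
    missing = [name for name in selected if name not in tracks]
    if missing:
        raise KeyError(f"tracks not available: {missing}")
    # Every selected track must cover the same bins to be stacked.
    lengths = {len(tracks[name]) for name in selected}
    if len(lengths) != 1:
        raise ValueError("all selected tracks must cover the same number of bins")
    return [list(tracks[name]) for name in selected]


# Toy per-bin signal values (made up), keyed by placeholder track names.
available = {
    "ATAC": [0.1, 0.9, 0.3, 0.0],
    "CTCF": [0.0, 1.2, 0.1, 0.7],
    "H3K27ac": [0.5, 0.4, 0.2, 0.1],
}

# The two-track combination mentioned above.
two_track_input = stack_tracks(available, ["ATAC", "CTCF"])
print(len(two_track_input), "tracks x", len(two_track_input[0]), "bins")
```

Keeping the track order explicit matters if a model is trained on one ordering and later evaluated on another, so the selection is passed as an ordered list rather than a set.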

Roadmap

This repo includes scripts and related files for the Epiphany model [preprint].

Resource repo: Zenodo DOI

  • Sample datasets: GM12878_X.h5 (input) and GM12878_y.pickle (target) for Epiphany training
  • Pretrained model weights:
      • pretrained_10kb.pt_model: pretrained weights of the 10kb model
      • pretrained_5kb.pt_model: pretrained weights of the 5kb model

Quick start training

Clone Repository

git clone https://github.com/arnavmdas/epiphany.git

Training

Move to training directory

cd epiphany/epiphany

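Download the dataset from Google Drive (both commands below point at the shared folder; if wget saves an HTML page instead of the data, download GM12878_X.h5 and GM12878_y.pickle from the folder manually)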

mkdir ./Epiphany_dataset
cd ./Epiphany_dataset
wget --no-check-certificate https://drive.google.com/drive/u/2/folders/1UJX6cp-4s0Jbud9jovzuaqnBeORg5R8x -O GM12878_X.h5
wget --no-check-certificate https://drive.google.com/drive/u/2/folders/1UJX6cp-4s0Jbud9jovzuaqnBeORg5R8x -O GM12878_y.pickle
cd ..
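Before training, it can be worth checking that the two downloaded files are real data rather than a saved web page. The sketch below looks only at the leading bytes of each file. The helper `sniff_download` is hypothetical, and it assumes the HDF5 signature sits at offset 0 (the usual case) and that the pickle was written with protocol 2 or later.

```python
import os

# First 8 bytes of an HDF5 file when no user block precedes the superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_download(path):
    """Guess the file type from its leading bytes (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(512)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    # Pickle protocol 2 and later starts with the PROTO opcode, 0x80.
    if head[:1] == b"\x80":
        return "pickle"
    lowered = head.lower()
    if b"<html" in lowered or b"<!doctype" in lowered:
        return "html"
    return "unknown"


# Paths as created by the commands above, relative to the training directory.
expected = {
    "Epiphany_dataset/GM12878_X.h5": "hdf5",
    "Epiphany_dataset/GM12878_y.pickle": "pickle",
}
for path, wanted in expected.items():
    if not os.path.exists(path):
        print(f"{path}: not found")
        continue
    found = sniff_download(path)
    status = "ok" if found == wanted else f"expected {wanted}, got {found}"
    print(f"{path}: {status}")
```

A result of `html` means the download saved a web page instead of the dataset, and the file needs to be fetched again.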

Run training script

python3 adversarial.py --wandb

Prediction using pretrained models

  • Generate a contact map of GM12878 chromosome 3 using the pretrained model at 10kb resolution: Google Colab
  • Generate a contact map of a region on chromosome 8 of H1ES cells [chr8:53167500-55167500] with original and perturbed epigenomic signals using the pretrained model at 5kb resolution: Google Colab
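To relate a region string to contact map size, the sketch below parses the region from the second example and counts how many bins it spans at a given resolution. The helpers `parse_region` and `count_bins` are hypothetical, and the sketch assumes bins are counted from the region start rather than aligned to a genome-wide grid, which may not match the notebooks.

```python
def parse_region(region):
    """Split 'chr8:53167500-55167500' into (chrom, start, end)."""
    chrom, span = region.split(":")
    start_text, end_text = span.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if end <= start:
        raise ValueError(f"empty or reversed region: {region}")
    return chrom, start, end


def count_bins(start, end, resolution):
    """Number of bins of width `resolution` needed to cover [start, end)."""
    # Ceiling division, so a partial trailing bin is still counted.
    return -(-(end - start) // resolution)


chrom, start, end = parse_region("chr8:53167500-55167500")
n_bins_5kb = count_bins(start, end, 5_000)
n_bins_10kb = count_bins(start, end, 10_000)
print(chrom, end - start, n_bins_5kb, n_bins_10kb)  # chr8 2000000 400 200
```

Under these assumptions the 2 Mb region corresponds to 400 bins along each axis of the contact map at 5kb resolution.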

Contact

If you have any questions, please feel free to contact Rui Yang (ruy4001@med.cornell.edu) or Arnav Das (arnavmd2@uw.edu).
