This repository is part of our study in Nature Biotechnology "Machine learning prediction of prime editing efficiency across diverse chromatin contexts" and the initial BioRxiv preprint.
Predict prime editing efficiency based on chromatin context of a genomic location in K562 cells.
Repository containing a Python package for running trained ePRIDICT (epigenetic PRIme editing preDICTion) models.
Models were trained in K562 cells. Prediction performance may vary in different cellular contexts. Check out our publication for further details.
- PRIDICT2.0: This model focuses on sequence-context-based prime editing efficiency prediction and pegRNA design. We recommend first selecting the most suitable pegRNA with PRIDICT2.0 and then assessing its overall endogenous targetability with ePRIDICT. Access PRIDICT2.0 GitHub Repository
- Supplementary Files: Access Here
- Web Application: For an online version of ePRIDICT, visit our webapp*.
*The default model for this repository and the online webapp is ePRIDICT-light. For running the full ePRIDICT model, check the description below.
For questions or suggestions, please either:
- Email us at nicolas.mathis@pharma.uzh.ch
- Open a GitHub issue
If you find our work useful for your research, please cite:
- Mathis et al., Nature Biotechnology, 2024 (ePRIDICT and PRIDICT2.0)
- Mathis & Allam et al., Nature Biotechnology, 2023 (PRIDICT)
📣 ePRIDICT can only be installed on Linux and Mac OS since the pybigwig package is not available for Windows 📣
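A quick check before installation can catch an unsupported operating system early. The sketch below is a hypothetical helper, not part of the ePRIDICT package; the function name is an assumption, and it relies only on Python's standard `platform` module, which reports macOS as "Darwin".

```python
import platform


def is_supported_platform(system_name: str) -> bool:
    """Return True for Linux and macOS ("Darwin"), False for anything else.

    Hypothetical helper: it mirrors the README statement that pybigwig
    is not available for Windows.
    """
    return system_name in {"Linux", "Darwin"}


if __name__ == "__main__":
    current = platform.system()
    if is_supported_platform(current):
        print(f"{current}: installation should be possible")
    else:
        print(f"{current}: not supported (pybigwig is unavailable)")
```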
The easiest way to install and manage Python packages on various OS platforms is through Anaconda. Once Anaconda is installed, any package (even if not available on an Anaconda channel) can be installed using pip.
- Install Anaconda.
- Start a terminal and run:
# clone ePRIDICT repository
git clone https://github.com/Schwank-Lab/epridict.git
# navigate into repository
cd epridict
# create conda environment and install dependencies for ePRIDICT
# (only has to be done before first run/install)
conda env create -f epridict_env.yml
# activate the created environment
conda activate epridict
- Next, download the ENCODE datasets required for prediction with ePRIDICT. Files will be downloaded into the bigwig folder. Note: For running the full model, 455 datasets will be downloaded, requiring 624 GB of storage space! For the light model, with near on-par performance, 6 datasets will be downloaded, requiring 5.3 GB of storage space.
# make download script executable:
chmod +x epridict_download_encode.sh
# run download script:
./epridict_download_encode.sh light
# or
./epridict_download_encode.sh full
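Because the full download is large, it can help to check free disk space before starting the download script. The following is a minimal sketch, not part of the repository: the helper name is hypothetical, the sizes are the ones stated above, and it assumes 1 GB = 10^9 bytes and that no extra temporary space is needed during the download.

```python
import shutil

# Storage requirements as stated in this README, in gigabytes.
REQUIRED_GB = {"light": 5.3, "full": 624.0}


def has_enough_space(model: str, free_bytes: int) -> bool:
    """Return True if free_bytes covers the stated size for the model.

    Assumption: 1 GB is treated as 10**9 bytes.
    """
    required_bytes = REQUIRED_GB[model] * 1e9
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Free space on the filesystem holding the current directory
    # (run from the repository root, where the bigwig folder is created).
    free = shutil.disk_usage(".").free
    for model, size_gb in REQUIRED_GB.items():
        status = "ok" if has_enough_space(model, free) else "not enough space"
        print(f"{model}: needs {size_gb} GB -> {status}")
```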
Arguments for manual mode (prediction for a single genomic position):
- --chromosome: Chromosome of desired location. Format: "chr1", "chr2", ... (human chromosomes; Y-chromosome not supported)
- --position_hg38: Position within chromosome (hg38). Example: "1192940"
- --use_full_model: Use the full model (455 ENCODE datasets) for prediction. Only possible after downloading all datasets with ./epridict_download_encode.sh full. Default: light model
python epridict_prediction.py manual --chromosome chr3 --position_hg38 44843504
# for full model:
# python epridict_prediction.py manual --chromosome chr3 --position_hg38 44843504 --use_full_model
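When manual mode is called from another script, validating the arguments first avoids a failed run. The sketch below builds the command shown above as an argument list. The helper name is hypothetical, and two points are assumptions rather than documented behavior: that the supported chromosomes are chr1 to chr22 plus chrX, and that positions must be positive integers.

```python
import re
import sys

# Assumption: "human chromosomes; Y-chromosome not supported" is read here
# as chr1..chr22 and chrX.
_CHROMOSOME_PATTERN = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X)")


def build_manual_command(chromosome, position_hg38, use_full_model=False):
    """Return the argument list for a manual-mode ePRIDICT prediction."""
    if not _CHROMOSOME_PATTERN.fullmatch(chromosome):
        raise ValueError(f"unsupported chromosome: {chromosome!r}")
    position = int(position_hg38)
    if position < 1:  # assumption: positions are positive integers
        raise ValueError(f"invalid hg38 position: {position_hg38!r}")
    command = [
        sys.executable, "epridict_prediction.py", "manual",
        "--chromosome", chromosome,
        "--position_hg38", str(position),
    ]
    if use_full_model:
        command.append("--use_full_model")
    return command


if __name__ == "__main__":
    # Print the command for the example position used in this README.
    print(" ".join(build_manual_command("chr3", 44843504)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root with the epridict environment activated.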
Arguments for batch mode (prediction for multiple positions from a file):
- input_filename: Input file name - name of the .csv file that has two columns [chromosome, position_hg38]. See sample_epridict_batch.csv in the ./input folder.
- --output-fname: Alternative output filename. Default is input_filename_output.csv (e.g. in the example below the default output will be sample_epridict_batch_output.csv).
- --use_full_model: Use the full model (455 ENCODE datasets) for prediction. Only possible after downloading all datasets with ./epridict_download_encode.sh full. Default: light model
# python epridict_prediction.py batch input_filename
python epridict_prediction.py batch sample_epridict_batch.csv
# for full model and alternative output filename:
# python epridict_prediction.py batch sample_epridict_batch.csv --output-fname alternative_output_filename_batch.csv --use_full_model
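A batch input file can also be generated programmatically. The sketch below writes a .csv with the two columns described above and derives the default output filename from the input name. Both helper names and the file name my_positions.csv are hypothetical; only the column names and the _output.csv naming rule come from this README.

```python
import csv
import tempfile
from pathlib import Path


def write_batch_input(rows, path):
    """Write (chromosome, position_hg38) pairs to a batch input .csv file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["chromosome", "position_hg38"])
        for chromosome, position in rows:
            writer.writerow([chromosome, int(position)])


def default_output_name(input_filename):
    """Return the default output name, e.g. x.csv -> x_output.csv."""
    return f"{Path(input_filename).stem}_output.csv"


if __name__ == "__main__":
    # Demo in a temporary directory; for a real run, write into ./input.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "my_positions.csv"  # hypothetical file name
        write_batch_input([("chr3", 44843504), ("chr1", 1192940)], target)
        print(target.read_text())
        print(default_output_name(target.name))
```

A file written this way into the ./input folder could then be passed to batch mode by name, in the same way as sample_epridict_batch.csv.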