Skip to content

Singh-Lab/dSPRINT

Repository files navigation

dSPRINT

A machine learning framework predicting interaction sites in human protein domains

Our dSPRINT predictor can run on any human Pfam protein domain, and return per-position predictions of interaction-sites for five different ligand types: DNA, RNA, ion, peptide, and small molecule.

All rules

If you use data or scripts from this repository, please cite:

Etzion-Fuchs A., Todd D.A. and Singh M., (2021) "dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains", Nucleic Acids Research https://doi.org/10.1093/nar/gkab356

Given a domain input file: 1) all the required external datasets are downloaded, 2) features are calculated for each of the input domain positions, 3) our trained predictors are run and return per-position prediction results.

This repository can be used as a computation pipeline, and uses Snakemake as the underlying engine.

Essentially, given a file input.hmm, with one or multiple domains which follow the syntax of a Pfam-A entry, the following computational graph of rules is run:

All rules

Output files are generated in the output folder, with the final result per-position ligand binding score generated in the file output/binding_scores.csv

        ligand_type     binding_score   domain  match_state
0       dna     0.9916359186172485      zf-C2H2 1
1       dna     0.9872528910636902      zf-C2H2 10
2       dna     0.997771143913269       zf-C2H2 11
3       dna     0.997983455657959       zf-C2H2 12
4       dna     0.9957016110420227      zf-C2H2 13
5       dna     0.9956439733505249      zf-C2H2 14

Read the Getting Started guide on how to run dSPRINT.

About

Per-position prediction of protein domains interaction-sites

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages