StackDPP

In this work

We propose a new benchmark dataset for DNA-binding protein (DNA-BP) prediction. The training set (UNIPROT1424) is available in uniprot1424.fasta, and the independent test set (UNIPROT356) is available in uniprot356.fasta.
We propose a stacking ensemble model for DNA-BP prediction, which we name StackDPP.

The resources attached to this repository are as follows:

Dataset: This folder contains the datasets used in this work. pdb1075.fasta, pdb1035.fasta, and pdb186.fasta are datasets from previous work; uniprot1424.fasta and uniprot356.fasta are the newly proposed benchmark datasets.
Features: This folder contains the final set of features selected for StackDPP.
Results: This folder contains the results of some of our experiments as CSV files. The complete experimental results are reported in the manuscript.
Scripts: Run the scripts in this folder to generate the results. The notebook and Python script versions execute the same logic.
Models/Uniprot1424: This folder contains some of the models trained on the UNIPROT1424 dataset. Models attributed to previous literature are our own implementations of the published methodologies.
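
The FASTA files in the Dataset folder can be loaded with a few lines of Python. The following is a minimal sketch that assumes standard FASTA layout (header lines starting with ">", followed by one or more sequence lines). The function name read_fasta and the two sample records are illustrative and are not part of the repository's scripts.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


# Hypothetical two-record sample, not taken from the repository's datasets.
sample = """>P1
MKV
LAA
>P2
GGS
""".splitlines()

records = read_fasta(sample)

# To load a real file (the path is an assumption about the folder layout):
# with open("Dataset/uniprot1424.fasta") as handle:
#     records = read_fasta(handle)
```

If two records share the same header, the later one overwrites the earlier one in this sketch; a list of (header, sequence) pairs would avoid that.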

Run the predictor on your protein sequences

You will need the protein sequence together with its PSSM file and its Spider output file. Two example files, example.pssm and example.spd33, are provided in this repository.
Set the variables sequence, pssmFile, and spiderFile in the TestANewSequence script (Python script or notebook version), then run it.
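
Before running the predictor, it can help to confirm that the PSSM file actually describes the sequence you set. The sketch below assumes the PSSM is in the ASCII layout written by PSI-BLAST, where each matrix row starts with a position number and a residue letter followed by 20 integer scores. The helper parse_pssm and the sample rows are hypothetical; they are not taken from TestANewSequence or from example.pssm.

```python
from typing import Iterable, List, Tuple


def parse_pssm(lines: Iterable[str]) -> Tuple[str, List[List[int]]]:
    """Return (residues, scores) read from PSI-BLAST style ASCII PSSM text."""
    residues: List[str] = []
    scores: List[List[int]] = []
    for raw in lines:
        parts = raw.split()
        # Matrix rows look like: "<position> <residue> <20 scores> ..."
        is_row = (
            len(parts) >= 22
            and parts[0].isdigit()
            and len(parts[1]) == 1
            and parts[1].isalpha()
        )
        if is_row:
            residues.append(parts[1])
            scores.append([int(value) for value in parts[2:22]])
    return "".join(residues), scores


# Made-up sample rows for illustration only.
sample = [
    "           A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V",
    "    1 M   -1 -2 -3 -4 -2 -1 -2 -3 -2  1  2 -2  6  0 -3 -2 -1 -2 -1  1",
    "    2 K   -1  2  0 -1 -3  1  1 -2 -1 -3 -3  5 -2 -4 -1  0 -1 -3 -2 -3",
]

sequence = "MK"  # the same value you would assign to the script's sequence variable
pssm_sequence, pssm_scores = parse_pssm(sample)
inputs_match = pssm_sequence == sequence
```

A mismatch between pssm_sequence and sequence usually means the PSSM was generated for a different protein or a different isoform, so it is worth checking before interpreting any prediction.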
