Proteins, structures and data from the manuscript

AlphaFold2: A role for disordered protein prediction? by Carter J. Wilson, Wing-Yiu Choy, and Mikko Karttunen

Manuscript and SI are available at: bioRxiv.

Data organization

The folder (structures/) contains the AlphaFold2 PDB structures used in this work; the file names correspond to their UniProt IDs.
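As a minimal sketch, the structure files can be indexed by UniProt ID with a few lines of Python. The helper name `index_structures` and the `.pdb` file extension are assumptions made for illustration; adjust the suffix if the files in structures/ are named differently.

```python
from pathlib import Path


def index_structures(folder="structures", suffix=".pdb"):
    """Map each UniProt ID (the file name without its extension) to its path.

    The ".pdb" suffix is an assumption, not something stated in this README.
    """
    return {path.stem: path for path in sorted(Path(folder).glob(f"*{suffix}"))}


# Example usage (run from the repository root):
# structures = index_structures()
# print(len(structures), "structures found")
```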

The file (combined.dat) contains all the raw predictions and sequence metrics. The organization of that file is as follows:

UniProtID|DisProtID
Amino acid sequence
DisProt-PDB annotation (1 = disordered, 0 = ordered, - = no data)
DisProt annotation (1 = disordered, 0 = ordered)
DSSP assignment (H = α-helix, G = 3-10 helix, I = π-helix, T = H-bonded turn, E = β-strand, B = β-bridge, S = bend, - = coil)
pLDDT assignment ('|' is used to separate the values for each amino acid)
AUCpreD disorder score
AUCpreD binary prediction
AUCpreD-np disorder score
AUCpreD-np binary prediction
DisoMine disorder score
DisoMine binary prediction
ESpritz-D disorder score
ESpritz-D binary prediction
fIDPnn disorder score
fIDPnn binary prediction
fIDPlr disorder score
fIDPlr binary prediction
Predisorder disorder score
Predisorder binary prediction
RawMSA disorder score
RawMSA binary prediction
SPOT-Disorder1 disorder score
SPOT-Disorder1 binary prediction
SPOT-Disorder-S disorder score
SPOT-Disorder-S binary prediction
SPOT-Disorder2 disorder score
SPOT-Disorder2 binary prediction

This 28-line block repeats for every protein considered. To extract a given field for all proteins, read its line in the first block and every 28th line after that.
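The every-28th-line rule can be sketched in Python as follows. This is an illustrative sketch, not the authors' own tooling: the function names are hypothetical, and it assumes that records follow one another with no blank lines in between and that the fields appear in the order listed above (so the pLDDT line is the sixth line of each record).

```python
LINES_PER_RECORD = 28  # one record = the 28 fields listed in this README
HEADER = 0             # 0-based position of "UniProtID|DisProtID" in a record
PLDDT = 5              # 0-based position of the pLDDT line (assumed from the list order)


def read_records(lines, period=LINES_PER_RECORD):
    """Split the lines of combined.dat into consecutive 28-line records."""
    rows = [line.rstrip("\n") for line in lines]
    # Ignore blank lines at the very end of the file, if any.
    while rows and not rows[-1].strip():
        rows.pop()
    if len(rows) % period:
        raise ValueError(f"expected a multiple of {period} lines, got {len(rows)}")
    return [rows[i:i + period] for i in range(0, len(rows), period)]


def read_field(lines, offset, period=LINES_PER_RECORD):
    """Return the line at 0-based position `offset` from every record."""
    return [record[offset] for record in read_records(lines, period)]


def parse_plddt(line):
    """Convert a '|'-separated pLDDT line into a list of floats."""
    return [float(value) for value in line.split("|") if value.strip()]


# Example usage (run from the repository root):
# with open("combined.dat") as handle:
#     lines = handle.readlines()
# ids = [header.split("|") for header in read_field(lines, HEADER)]
# plddt = [parse_plddt(line) for line in read_field(lines, PLDDT)]
```

Every field is returned as a raw string because the exact formatting of the score and binary-prediction lines is not described here; only the pLDDT line has a documented separator.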
