AlphaFold2: A role for disordered protein prediction? by Carter J. Wilson, Wing-Yiu Choy, and Mikko Karttunen
The manuscript and SI are available on bioRxiv.
The folder structures/ contains the AlphaFold2 PDB structures used in this work; each file is named after the corresponding UniProt ID.
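A minimal Python sketch for locating the structure of a given UniProt ID and reading its per-residue pLDDT back out of the PDB file. It assumes the files use the `.pdb` extension (not stated here) and follow the usual AlphaFold2 convention of writing each residue's pLDDT into the B-factor column; the helper names are hypothetical.

```python
from pathlib import Path


def structure_path(uniprot_id, folder="structures", suffix=".pdb"):
    """Path of the structure for one UniProt ID (the .pdb suffix is an assumption)."""
    return Path(folder) / f"{uniprot_id}{suffix}"


def ca_plddt_from_lines(lines):
    """Per-residue pLDDT taken from the B-factor field of each CA atom."""
    values = []
    for line in lines:
        # Fixed-column PDB records (1-based): atom name in columns 13-16,
        # B-factor (temperature factor) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def ca_plddt(uniprot_id, folder="structures"):
    """Per-residue pLDDT of one model, assuming it is stored in the B-factor column."""
    text = structure_path(uniprot_id, folder).read_text()
    return ca_plddt_from_lines(text.splitlines())
```

For example, `ca_plddt("P12345")` (a placeholder ID) would return the pLDDT profile of `structures/P12345.pdb`, if that file exists. Taking one atom (CA) per residue avoids counting the same residue-level value once per atom.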
The file combined.dat contains all the raw predictions and sequence metrics. Each protein occupies 28 consecutive lines, organized as follows:
- UniProtID|DisProtID
- Amino acid sequence
- DisProt-PDB annotation (1 = disordered, 0 = ordered, - = no data)
- DisProt annotation (1 = disordered, 0 = ordered)
- DSSP assignment (H = α-helix, G = 3₁₀-helix, I = π-helix, T = hydrogen-bonded turn, E = β-strand, B = β-bridge, S = bend, - = coil)
- pLDDT score (per-residue values separated by '|')
- AUCpreD disorder score
- AUCpreD binary prediction
- AUCpreD-np disorder score
- AUCpreD-np binary prediction
- DisoMine disorder score
- DisoMine binary prediction
- ESpritz-D disorder score
- ESpritz-D binary prediction
- fIDPnn disorder score
- fIDPnn binary prediction
- fIDPlr disorder score
- fIDPlr binary prediction
- Predisorder disorder score
- Predisorder binary prediction
- RawMSA disorder score
- RawMSA binary prediction
- SPOT-Disorder1 disorder score
- SPOT-Disorder1 binary prediction
- SPOT-Disorder-S disorder score
- SPOT-Disorder-S binary prediction
- SPOT-Disorder2 disorder score
- SPOT-Disorder2 binary prediction
This 28-line block repeats for every protein considered, so a given field can be extracted by taking its line in the first block and every 28th line after it.
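The layout above can be parsed with a short Python sketch. It assumes combined.dat has no header or blank lines, so that its first line is the first UniProtID|DisProtID line, and the short field names in `FIELDS` are hypothetical labels chosen here for the 28 lines listed above. Only the ID line and the pLDDT line are decoded; the other fields are returned as raw strings, since their internal separators are not documented here.

```python
from pathlib import Path

# Hypothetical short names for the predictors, in the order listed above.
PREDICTORS = [
    "aucpred", "aucpred_np", "disomine", "espritz_d", "fidpnn", "fidplr",
    "predisorder", "rawmsa", "spot_disorder1", "spot_disorder_s", "spot_disorder2",
]
# One name per line of a protein entry: 6 annotation lines + 11 predictors x 2.
FIELDS = ["ids", "sequence", "disprot_pdb", "disprot", "dssp", "plddt"] + [
    f"{name}_{kind}" for name in PREDICTORS for kind in ("score", "binary")
]
RECORD_LEN = len(FIELDS)  # 28 lines per protein


def field_lines(lines, field):
    """One field for every protein: its line in the first entry, then every 28th line."""
    return lines[FIELDS.index(field)::RECORD_LEN]


def parse_records(lines):
    """Split the file's lines into one dict per protein."""
    records = []
    # Step through complete 28-line entries; a trailing partial entry is ignored.
    for start in range(0, len(lines) - RECORD_LEN + 1, RECORD_LEN):
        entry = (line.strip() for line in lines[start:start + RECORD_LEN])
        rec = dict(zip(FIELDS, entry))
        # lstrip(">") only matters if the ID line has a FASTA-style marker (an assumption).
        uniprot_id, _, disprot_id = rec["ids"].lstrip(">").partition("|")
        rec["uniprot_id"], rec["disprot_id"] = uniprot_id, disprot_id
        # pLDDT values are '|'-separated; skip empty pieces from leading/trailing bars.
        rec["plddt"] = [float(v) for v in rec["plddt"].split("|") if v.strip()]
        records.append(rec)
    return records


def load(path="combined.dat"):
    """Read combined.dat and return one dict per protein."""
    return parse_records(Path(path).read_text().splitlines())
```

For example, `load()[0]["plddt"]` would give the per-residue pLDDT values of the first protein, while `field_lines(lines, "disprot")` applied to the file's raw lines would give the DisProt annotation of every protein, which is the every-28th-line rule written as a slice.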