Code to reproduce analyses and figures for the preprint:
Disordered but Different: The Unique Characteristics of Intrinsically Disordered Regions in Human Transcription Factors
Susie Song and Joshua Akey
Preprint: bioRxiv (2026) https://doi.org/10.64898/2026.01.27.700206
This repository contains scripts and workflows to:
- annotate intrinsically disordered regions (IDRs) across the human proteome
- map protein-level annotations to genomic coordinates
- integrate population variation (e.g., gnomAD) with IDR vs non-IDR intervals
- run downstream statistics and generate all manuscript figures
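
The protein-to-genome mapping step in the list above can be illustrated with a small sketch. This is not the repository's implementation: the function name `protein_to_genomic`, the 0-based half-open interval convention, and the assumption that the coding exons are supplied in full (start codon through the last codon of interest) are all illustrative choices made here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) genomic interval


def protein_to_genomic(
    aa_start: int,
    aa_end: int,
    cds_exons: List[Interval],
    strand: str = "+",
) -> List[Interval]:
    """Map a 1-based inclusive residue range to genomic intervals.

    `cds_exons` holds the coding portion of each exon (hypothetical input
    format). Returns intervals sorted by genomic position.
    """
    if strand not in {"+", "-"}:
        raise ValueError("strand must be '+' or '-'")
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")

    # Residue i (1-based) occupies CDS offsets [3*(i-1), 3*i).
    cds_start = 3 * (aa_start - 1)
    cds_end = 3 * aa_end

    # Walk exons in transcription order (reverse genomic order on '-').
    ordered = sorted(cds_exons, reverse=(strand == "-"))

    out: List[Interval] = []
    offset = 0  # CDS offset at the start of the current exon
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start
        lo = max(cds_start, offset)
        hi = min(cds_end, offset + length)
        if lo < hi:  # the residue range overlaps this exon
            if strand == "+":
                out.append((ex_start + (lo - offset), ex_start + (hi - offset)))
            else:
                out.append((ex_end - (hi - offset), ex_end - (lo - offset)))
        offset += length

    if cds_end > offset:
        raise ValueError("residue range extends past the end of the CDS")
    return sorted(out)


# Residues 3-4 of a 6-residue protein encoded by two 9-bp coding exons.
print(protein_to_genomic(3, 4, [(100, 109), (200, 209)], "+"))
# → [(106, 109), (200, 203)]
```

A residue range that spans an exon boundary yields more than one genomic interval, which is why the function returns a list rather than a single pair.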
.
├── data/ # input datasets (sources in Data Availability)
├── envs/ # conda env files / requirements
├── figures/ # all main and supplemental figures
├── notebooks/ # Jupyter notebooks to generate all analysis & figures
├── results/ # generated intermediate outputs
├── scripts/ # one-off analysis scripts
├── src/ # reusable modules
└── README.md
Install dependencies:

pip install -r envs/requirements.txt

Large data files are not included in this repository. The table below lists the required resources and where to download them.

| Input | Download | Description | Source | Version | Format | Example path |
|---|---|---|---|---|---|---|
| Proteome FASTA | Link | Human proteome sequences | UniProt | UP000005640 (2025_04) | .fasta | data/raw/uniprot/UP000005640_9606.fasta |
| UniProt ID mappings | Link | Reference genome build | GENCODE | GRCh38.p14 | .fa | data/proteome_annot/uniprot/UP000005640_9606.idmapping |
| TF list | Link | Human TF catalog | Lambert et al. | 2018 | .csv | data/proteome_annot/tfs/DatabaseExtract_v_1.01.csv |
| GenOrigin | Link | Gene age catalog | GenOrigin | 2021 | .csv | data/gene_age/genorigin/Homo_sapiens.csv |
| GenEra | Link | Gene age catalog | GenEra | - | .tsv | data/gene_age/genera/Homo_sapiens_gene_ages.tsv |
| PhyloP | Link | - | - | - | - | - |
| GO Association | Link | - | - | 2025_07_22 | - | - |
| GO OBO | Link | - | - | - | - | - |
| STRING | Link | PPI | STRING | v12.0 | - | - |
| GRNDB | Link | - | - | - | - | - |
| Human Protein Atlas | Link | HPA | HPA | v25.0 | - | - |
| GTEx Fine-mapped QTLs | Link | GTEx Portal | GTEx | v10 (2024_11) | - | - |
| Translational efficiency | Link | Supplementary Table 1 | Zheng et al. | 2025 | - | - |
| gnomAD | Link | Population variant sites | gnomAD | v4.1 exomes | - | - |
| 1KGP | Link | - | - | - | - | - |
| Lethal OMIM | Link | Lethal Phenotypes Portal | Lethal Phenotypes Portal | (2024_11) | - | - |
| HGMD | Link | Disease Variants | HGMD | Professional | - | - |
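
Before running the notebooks, it can help to confirm that downloaded files sit at the example paths given in the table. The following is a minimal sketch, not a script shipped with this repository: the names `EXPECTED_INPUTS` and `missing_inputs` are hypothetical, and only the five inputs whose example paths are listed in the table are covered.

```python
from pathlib import Path
from typing import Dict, List

# Example paths copied from the table; the dictionary name is illustrative.
EXPECTED_INPUTS: Dict[str, str] = {
    "Proteome FASTA": "data/raw/uniprot/UP000005640_9606.fasta",
    "UniProt ID mappings": "data/proteome_annot/uniprot/UP000005640_9606.idmapping",
    "TF list": "data/proteome_annot/tfs/DatabaseExtract_v_1.01.csv",
    "GenOrigin": "data/gene_age/genorigin/Homo_sapiens.csv",
    "GenEra": "data/gene_age/genera/Homo_sapiens_gene_ages.tsv",
}


def missing_inputs(repo_root: Path, expected: Dict[str, str] = EXPECTED_INPUTS) -> List[str]:
    """Return the names of inputs whose file is absent under repo_root."""
    return [
        name
        for name, rel_path in expected.items()
        if not (repo_root / rel_path).is_file()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_inputs(Path("."))
    if missing:
        print("Missing inputs:")
        for name in missing:
            print(f"  {name}: {EXPECTED_INPUTS[name]}")
    else:
        print("All listed inputs found.")
```

Inputs without an example path in the table would need to be added to the dictionary by hand once their locations are decided.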
If you use this repository, cite the following:
@article{song_akey_2026_disordered_but_different,
title = {Disordered but Different: The Unique Characteristics of Intrinsically Disordered Regions in Human Transcription Factors},
author = {Song, Susie and Akey, Joshua},
journal = {bioRxiv},
year = {2026},
doi = {10.64898/2026.01.27.700206},
url = {https://doi.org/10.64898/2026.01.27.700206}
}