
AkeyLab/tf_idr_paper


TF IDR Paper

Code to reproduce analyses and figures for the preprint:

Disordered but Different: The Unique Characteristics of Intrinsically Disordered Regions in Human Transcription Factors
Susie Song and Joshua Akey
Preprint: bioRxiv (2026) https://doi.org/10.64898/2026.01.27.700206


What this repository contains

This repository contains scripts and workflows to:

  • annotate intrinsically disordered regions (IDRs) across the human proteome
  • map protein-level annotations to genomic coordinates
  • integrate population variation (e.g., gnomAD) with IDR vs non-IDR intervals
  • run downstream statistics and generate all manuscript figures
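The first three steps can be illustrated with a minimal, self-contained sketch. It is not the repository's code, and its assumptions are labeled in the comments:

- IDRs are called by thresholding per-residue disorder scores at 0.5 with a 30-residue minimum length. The predictor and cutoffs used in the paper are not stated here.
- CDS exons are given 5'->3' on the plus strand, beginning at the start codon.
- Variants are (chrom, pos) pairs.

The names `Exon`, `call_idrs`, `residues_to_genome` and `count_by_region` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical CDS block: plus strand, 1-based inclusive genomic coordinates."""
    chrom: str
    start: int
    end: int


def call_idrs(scores, threshold=0.5, min_len=30):
    """Return 1-based inclusive residue intervals with score >= threshold
    for at least min_len consecutive residues (threshold/min_len are assumptions)."""
    idrs, run_start = [], None
    for i, s in enumerate(list(scores) + [float("-inf")]):  # sentinel closes the last run
        if s >= threshold and run_start is None:
            run_start = i
        elif s < threshold and run_start is not None:
            if i - run_start >= min_len:
                idrs.append((run_start + 1, i))
            run_start = None
    return idrs


def residues_to_genome(start_res, end_res, exons):
    """Map a 1-based residue interval to genomic (chrom, start, end) pieces.
    Assumes plus-strand exons listed 5'->3', CDS starting at the first exon base."""
    nt_start, nt_end = 3 * start_res - 2, 3 * end_res  # 1-based CDS positions
    out, offset = [], 0  # offset = CDS nucleotides in earlier exons
    for ex in exons:
        ex_len = ex.end - ex.start + 1
        lo, hi = max(nt_start, offset + 1), min(nt_end, offset + ex_len)
        if lo <= hi:  # part of the residue interval falls in this exon
            out.append((ex.chrom, ex.start + lo - offset - 1, ex.start + hi - offset - 1))
        offset += ex_len
    return out


def in_any(chrom, pos, intervals):
    return any(c == chrom and s <= pos <= e for c, s, e in intervals)


def count_by_region(variants, idr_ivs, cds_ivs):
    """Count coding variants inside vs outside IDRs; non-coding variants are skipped."""
    counts = {"IDR": 0, "non-IDR": 0}
    for chrom, pos in variants:
        if in_any(chrom, pos, idr_ivs):
            counts["IDR"] += 1
        elif in_any(chrom, pos, cds_ivs):
            counts["non-IDR"] += 1
    return counts


if __name__ == "__main__":
    # Toy 60-residue protein encoded by two 90-nt exons.
    scores = [0.2] * 10 + [0.8] * 35 + [0.3] * 15
    exons = [Exon("chr1", 1000, 1089), Exon("chr1", 2000, 2089)]
    idrs = call_idrs(scores)
    idr_ivs = [iv for a, b in idrs for iv in residues_to_genome(a, b, exons)]
    cds_ivs = residues_to_genome(1, len(scores), exons)
    variants = [("chr1", 1050), ("chr1", 1010), ("chr1", 2050), ("chr1", 1500)]
    print(idrs, idr_ivs, count_by_region(variants, idr_ivs, cds_ivs))
```

A real pipeline would also need to handle the following, which the sketch omits for brevity:

- minus-strand genes
- interval trees for genome-scale lookups
- per-site normalization (IDR and non-IDR regions differ in length)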

Repository layout

.
├── data/                 # input datasets (sources in Data Availability) 
├── envs/                 # conda env files / requirements
├── figures/              # all main and supplemental figures 
├── notebooks/            # Jupyter notebooks to generate all analysis & figures
├── results/              # generated intermediate outputs 
├── scripts/              # one-off analysis scripts
├── src/                  # reusable modules
└── README.md

Getting Started

Requirements

Install dependencies:

pip install -r envs/requirements.txt

Data Availability

Large data files are not included in this repository. The table below lists each required resource with its source, version, file format and an example local path under data/.

Required inputs

| Input | Download | Description | Source | Version | Format | Example path |
|---|---|---|---|---|---|---|
| Proteome FASTA | Link | Human proteome sequences | UniProt | UP000005640 (2025_04) | .fasta | data/raw/uniprot/UP000005640_9606.fasta |
| UniProt ID mappings | Link | Reference genome build | GENCODE | GRCh38.p14 | .fa | data/proteome_annot/uniprot/UP000005640_9606.idmapping |
| TF list | Link | Human TF catalog | Lambert et al. | 2018 | .csv | data/proteome_annot/tfs/DatabaseExtract_v_1.01.csv |
| GenOrigin | Link | Gene age catalog | GenOrigin | 2021 | .csv | data/gene_age/genorigin/Homo_sapiens.csv |
| GenEra | Link | Gene age catalog | GenEra | - | .tsv | data/gene_age/genera/Homo_sapiens_gene_ages.tsv |
| PhyloP | Link | - | - | - | - | - |
| GO Association | Link | - | - | 2025_07_22 | - | - |
| GO OBO | Link | - | - | - | - | - |
| STRING | Link | PPI | STRING | v12.0 | - | - |
| GRNDB | Link | - | - | - | - | - |
| Human Protein Atlas | Link | HPA | HPA | v25.0 | - | - |
| GTEx Fine-mapped QTLs | Link | GTEx Portal | GTEx | v10 (2024_11) | - | - |
| Translational efficiency | Link | Supplementary Table 1 | Zheng et al. | 2025 | - | - |
| gnomAD | Link | Population variant sites | gnomAD | v4.1 exomes | - | - |
| 1KGP | Link | - | - | - | - | - |
| Lethal OMIM | Link | Lethal Phenotypes Portal | Lethal Phenotypes Portal | (2024_11) | - | - |
| HGMD | Link | Disease Variants | HGMD | Professional | - | - |
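Before running the notebooks, it can help to confirm that the downloaded files sit where the table says. The sketch below is a hypothetical helper, not part of the repository. It assumes it is run from the repository root and covers only the resources whose example path appears in the table.

```python
from pathlib import Path

# Example paths copied from the "Required inputs" table above; resources
# without a listed path are omitted.
REQUIRED_INPUTS = {
    "Proteome FASTA": "data/raw/uniprot/UP000005640_9606.fasta",
    "UniProt ID mappings": "data/proteome_annot/uniprot/UP000005640_9606.idmapping",
    "TF list": "data/proteome_annot/tfs/DatabaseExtract_v_1.01.csv",
    "GenOrigin": "data/gene_age/genorigin/Homo_sapiens.csv",
    "GenEra": "data/gene_age/genera/Homo_sapiens_gene_ages.tsv",
}


def missing_inputs(repo_root="."):
    """Return {name: relative path} for each listed input not found under repo_root."""
    root = Path(repo_root)
    return {name: rel for name, rel in REQUIRED_INPUTS.items() if not (root / rel).is_file()}


if __name__ == "__main__":
    missing = missing_inputs()
    for name, rel in missing.items():
        print(f"missing: {name} -> {rel}")
    print("all listed inputs present" if not missing else f"{len(missing)} input(s) missing")
```

If you store data elsewhere, you could point `repo_root` at that directory instead. The relative layout under data/ is assumed to match the table.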

Citation

If you use this repository, cite the following:

BibTeX

@article{song_akey_2026_disordered_but_different,
  title   = {Disordered but Different: The Unique Characteristics of Intrinsically Disordered Regions in Human Transcription Factors},
  author  = {Song, Susie and Akey, Joshua},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.01.27.700206},
  url     = {https://doi.org/10.64898/2026.01.27.700206}
}

About

Analysis code and figure reproduction for the TF IDR preprint
