Repository containing notebooks to compute statistics in the paper "A unified approach to evolutionary conservation and population constraint in proteins".
Author: Stuart MacGowan (smacgowan@dundee.ac.uk)
The analysis is based on aggregated statistics we computed from data accessed from the following databases:
- Pfam-A database of protein families (version 31.0).
- gnomAD database of human genetic variation (version 2.1.1).
- ClinVar database of human genetic variants and their clinical significance.
- PDBe database of protein structures.
These were processed into a single dataset of aggregated statistics for each Pfam domain, which is provided in data/pfam-gnomAD-clinvar-pdb-colstats_c7c3e19.csv.gz.
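As a minimal sketch of how the aggregated dataset could be read outside the notebooks, the snippet below uses only the Python standard library. It assumes the file is a gzip-compressed, comma-separated table with a header row, as the .csv.gz extension suggests. The function name `load_colstats` is hypothetical, and no column names are assumed.

```python
import csv
import gzip


def load_colstats(path):
    """Read a gzip-compressed CSV into a list of dicts keyed by the header row.

    Assumption: the file has a single header line; all values are returned
    as strings and are not converted to numeric types.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `load_colstats("data/pfam-gnomAD-clinvar-pdb-colstats_c7c3e19.csv.gz")` from the repository root should return one dict per row. Inspecting the keys of the first row is a quick way to see which statistics are available.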
The figures in the manuscript are generated by the notebooks in the figure folders under manuscript-figures.
- Figure 1B: Frequency distribution of gnomAD missense variants across all amino acid residues in Pfam domains.
- Figure 1C: Frequency distributions of gnomAD missense variants over alignment columns of Pfam domains.
- Figure 1D: Total number of gnomAD missense or synonymous variants vs. the Shenkin diversity at each position across SH2 domains.
- Figure 2A: Cumulative distributions of the normalised missense enrichment score or normalised Shenkin for positions where the consensus relative solvent accessibility class is core, partially exposed, or surface.
- Figure 3A: The conservation plane: classifying residues in Pfam domains with evolutionary conservation and population constraint.
- Figure 4A: Odds ratios of the enrichment of protein-ligand interacting residues from BioLiP within sites in different conservation plane categories.
- Figure 4B: PPI site enrichments.
- Figure 4C: ClinVar Pathogenic site enrichments relative to the gnomAD missense background.
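The enrichment panels in Figure 4 are described in terms of odds ratios. As an illustration only, and not necessarily the calculation used in the notebooks, the sketch below computes an odds ratio from a 2x2 table of site counts together with a Wald confidence interval on the log scale. The function name `odds_ratio`, its argument layout, and the choice of a Wald interval are all assumptions.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: annotated sites inside the category
    b: unannotated sites inside the category
    c: annotated sites outside the category
    d: unannotated sites outside the category
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        # The plain estimator is undefined when any cell is zero.
        raise ValueError("all four counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

For example, `odds_ratio(10, 90, 5, 95)` uses made-up counts and returns the point estimate followed by the lower and upper interval bounds. An estimate above 1 indicates that annotated sites are over-represented in the category.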
Stuart A. MacGowan, Fábio Madeira, Thiago Britto-Borges et al. A unified approach to evolutionary conservation and population constraint in protein domains highlights structural features and pathogenic sites, 13 July 2023, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-3160340/v1]
This repository and its contents were created by Stuart A. MacGowan (@stuartmac) at the University of Dundee and are provided under the MIT License. See LICENSE for details.