cs975/protein_sectors

Global statistical models of protein coevolution reveal higher-order sectors beyond those obtained from structure alone

Recent methods have shown promise in using pairwise sequence coevolution predictions to illuminate physical interactions and functional relationships between pairs of protein residues. As a result, there has been increased interest in identifying higher-order correlations between sequence positions, in an effort to understand how the multiple sequence alignment (MSA) encodes the conserved biological properties of a protein family. To this end, we propose a robust and generalizable spectral clustering model that extracts interconnected networks of coevolving residues, termed "protein sectors", from the pairwise sequence coevolution predicted by a global statistical model. We assess the statistical and evolutionary origins of the protein sectors extracted from the MSAs of 120 protein families. We show that protein sectors arise from a subset of densely connected components in the sequence coevolution matrix. Many of these components are absent from the pairwise residue contact graph constructed from the protein crystal structure, which reveals networks of paired residues that are not necessarily in direct physical contact but are nonetheless evolutionarily coupled. We find that protein sectors form structurally connected entities in three-dimensional space, even though sector identification is independent of the protein crystal structure. Interestingly, protein families with high structural similarity do not share similar protein sectors, suggesting that nuances in the sequence coevolution matrix can differentiate between the evolutionary histories of structurally related protein families.
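To make the idea of spectral clustering on a coevolution matrix concrete, here is a minimal sketch. It is not the model from this repository: it assumes a generic normalized spectral clustering (symmetric normalized Laplacian, row-normalized eigenvector embedding, and a small k-means with farthest-point initialization), and the function name `spectral_sectors` and its parameters are hypothetical. The input is assumed to be a symmetric residue-by-residue matrix of coupling strengths, with the number of sectors chosen in advance.

```python
import numpy as np


def spectral_sectors(coupling, n_sectors, n_iter=100):
    """Assign each residue to one of n_sectors clusters (illustrative only)."""
    # Treat coupling strengths as non-negative edge weights without self-loops.
    W = np.abs(np.asarray(coupling, dtype=float))
    W = (W + W.T) / 2.0
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    L = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order, so the first columns
    # are the eigenvectors of the smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :n_sectors]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.where(norms > 0, norms, 1.0)

    # Farthest-point initialization keeps the k-means step deterministic.
    centers = [emb[0]]
    for _ in range(1, n_sectors):
        d = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[d.argmax()])
    centers = np.array(centers)

    # Plain Lloyd iterations on the spectral embedding.
    for _ in range(n_iter):
        dists = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            emb[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_sectors)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    # Toy coupling matrix: two groups of three residues, strongly coupled
    # within each group and weakly coupled across groups.
    toy = np.full((6, 6), 0.01)
    toy[:3, :3] = 1.0
    toy[3:, 3:] = 1.0
    print(spectral_sectors(toy, n_sectors=2))
```

On the toy matrix, residues 0 to 2 should fall into one cluster and residues 3 to 5 into the other. A real analysis would start from couplings inferred from an MSA and would need a principled way to choose the number of sectors, neither of which this sketch addresses.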

Source code is available here. You can play with the spectral clustering model using our Google Colab Notebook. We have provided example protein families here.
