Software pipeline for modeling the functional effects of mutations across an enzyme family

In brief, this software pipeline builds comparative models of the members of an enzyme family, chooses low-energy models, docks a transition state structure appropriate for the enzyme mechanism, models each point mutation to the protein, and generates features for each mutation. The pipeline is designed to handle thousands of input sequences and to run on Cabernet in a few days. Requirements:

phmmer (HMMER)
promals (Promals3D) and a Python 2 environment
Rosetta
PyRosetta

Most of the glue code is written in Bash and Python.

Application: a look at the GH1 family

The glycosyl hydrolase 1 (GH1) family consists of over 11,000 sequences found by searching genomic databases using Pfam's HMM. To choose which of these we can model, we first inspect the alignment for the known catalytic residues in this family (information that Pfam does not use) and select only proteins that have all three known catalytic residues in the alignment. This code is located in alignments, with the chosen sequences in target.fa.
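The filtering step can be sketched in a few lines of Python 3. This is a minimal illustration, not the code in the repository: the function names, the toy alignment, and the column positions and residue identities in `EXPECTED` are all placeholders chosen for the example.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into {name: aligned_sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def has_catalytic_residues(aligned_seq, expected):
    """True if every alignment column in `expected` holds the required residue."""
    return all(
        col < len(aligned_seq) and aligned_seq[col].upper() == residue
        for col, residue in expected.items()
    )


def select_targets(records, expected):
    """Keep sequences with all expected residues; return them ungapped."""
    return {
        name: seq.replace("-", "").replace(".", "")
        for name, seq in records.items()
        if has_catalytic_residues(seq, expected)
    }


# Hypothetical 0-based alignment columns and residues, for illustration only.
EXPECTED = {2: "E", 5: "Y", 8: "E"}

# Toy alignment, not real GH1 sequences.
TOY_ALIGNMENT = """>seq_a
MKE-LYAAEG
>seq_b
MKD-LYAAEG
>seq_c
MKEGLY--EG
"""

selected = select_targets(read_aligned_fasta(TOY_ALIGNMENT), EXPECTED)
for name, seq in selected.items():
    print(f">{name}\n{seq}")
```

Here `seq_b` is dropped because it carries a D where the first expected E should be. Checking alignment columns rather than raw sequence positions is what lets one set of reference positions apply across the whole family.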

Comparative modeling for overall protein fold

This family (TIM-barrel) is very well represented in structure databases. We first search each sequence against the PDB and select 10 template structures based on coverage and identity. Each target sequence is then computationally folded into a 3D structure using Rosetta's Hybridize protocol.
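One way to rank templates by coverage and identity is sketched below in Python 3. How the two criteria are actually combined is not stated here, so the rule used (drop hits below a coverage cutoff, then sort by identity) is an assumption, as are the `Hit` fields, the `min_coverage` value, and the template names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    template: str        # template identifier (hypothetical names below)
    aligned_length: int  # number of target residues aligned to the template
    identical: int       # number of identical aligned positions


def coverage(hit, target_length):
    """Fraction of the target sequence covered by the alignment."""
    return hit.aligned_length / target_length


def identity(hit):
    """Fraction of aligned positions that are identical."""
    return hit.identical / hit.aligned_length if hit.aligned_length else 0.0


def pick_templates(hits, target_length, n=10, min_coverage=0.8):
    """Drop low-coverage hits, then keep the n most identical templates."""
    eligible = [h for h in hits if coverage(h, target_length) >= min_coverage]
    eligible.sort(
        key=lambda h: (identity(h), coverage(h, target_length)), reverse=True
    )
    return [h.template for h in eligible[:n]]


# Toy hits for a 450-residue target; the numbers are made up.
hits = [
    Hit("tmplA", 440, 220),
    Hit("tmplB", 450, 180),
    Hit("tmplC", 200, 190),  # high identity but covers under half the target
    Hit("tmplD", 430, 301),
]
chosen = pick_templates(hits, target_length=450, n=2)
print(chosen)
```

Filtering on coverage before sorting keeps short, highly identical fragments such as `tmplC` from outranking templates that span the whole fold.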

Model enzyme active site with transition state structure

Next, the target substrate or substrates are created, parameterized for Rosetta, and then docked into the active site. To do this, we infer the identity of the catalytic residues from a multiple sequence alignment. A defined set of distance, angle, and dihedral restraints, derived from quantum mechanical modeling of the enzyme reaction, is used to place the modeled transition state structure in the binding pocket.
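To make the idea of a geometric restraint concrete, the Python 3.8+ sketch below measures a distance and an angle from atom coordinates and scores each against an ideal value with a tolerance. It is not Rosetta's constraint machinery: the flat-bottom penalty, the atom names, the coordinates, and the ideal values are all illustrative assumptions, and dihedral restraints are left out for brevity.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)


def angle_deg(a, b, c):
    """Angle in degrees at point b, formed by a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def flat_bottom_penalty(value, ideal, tolerance, weight=1.0):
    """Zero within `tolerance` of `ideal`, quadratic outside it."""
    excess = max(0.0, abs(value - ideal) - tolerance)
    return weight * excess ** 2


# Made-up coordinates (angstroms) for three hypothetical atoms.
nucleophile_o = (0.0, 0.0, 0.0)
anomeric_c = (2.0, 0.0, 0.0)
leaving_o = (4.0, 0.0, 0.0)

d = distance(nucleophile_o, anomeric_c)
theta = angle_deg(nucleophile_o, anomeric_c, leaving_o)

# Illustrative ideal values and tolerances, not taken from the QM model.
total = (
    flat_bottom_penalty(d, ideal=2.0, tolerance=0.2)
    + flat_bottom_penalty(theta, ideal=180.0, tolerance=10.0)
)
print(d, theta, total)
```

A placement that satisfies every restraint within tolerance scores zero, so the penalty only steers docking away from geometries that the reaction model rules out.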

Deep mutational scan to generate feature sets for point mutations

Once we have a complete model of the enzyme with the transition state structure, we can model all possible point mutations to the enzyme-substrate complex (including different chemical groups on the substrate) to predict function. For now, we perform a computational deep mutational scan and assess structural features of the mutated structures.
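The set of point mutations in such a scan can be enumerated as in the Python 3 sketch below. It covers only the sequence-level bookkeeping (19 substitutions per position); the structural modeling of each mutant is done with Rosetta and is not shown. The function names and the three-residue toy sequence are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def point_mutations(sequence):
    """Yield labels such as 'K2A': wild type, 1-based position, mutant."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


def apply_mutation(sequence, label):
    """Return the sequence carrying the single substitution named by `label`."""
    wild_type, position, mutant = label[0], int(label[1:-1]), label[-1]
    if sequence[position - 1] != wild_type:
        raise ValueError(f"{label}: position {position} is not {wild_type}")
    return sequence[: position - 1] + mutant + sequence[position:]


toy_sequence = "MKE"  # placeholder, not a real enzyme
labels = list(point_mutations(toy_sequence))
print(len(labels), labels[:3])
print(apply_mutation(toy_sequence, "K2A"))
```

For a protein of length L this yields 19 x L single mutants, which is why the scan is run as a batch job rather than interactively.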

Machine learning predictions of enzyme function

Our training data set consists of quantitatively determined enzyme kinetic constants for about 200 point mutants of BglB (a member of GH1). With it, we train a machine learning model to predict the functional effect of mutations across the enzyme family from the structural features generated by our molecular modeling.
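The shape of this learning problem (a feature vector per mutant, one measured value per mutant) can be sketched in plain Python 3.8+. The model type is not named here, so a k-nearest-neighbour regressor is swapped in purely for illustration; the feature vectors and target values are made-up toy numbers, not measurements.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return sum(target for _, target in neighbours) / len(neighbours)


def leave_one_out_mae(features, targets, k=3):
    """Mean absolute error when each mutant is predicted from all the others."""
    errors = []
    for i in range(len(features)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        errors.append(abs(knn_predict(rest_x, rest_y, features[i], k) - targets[i]))
    return sum(errors) / len(errors)


# Toy data: two hypothetical structural features per mutant and a
# hypothetical log-scale change in a kinetic constant.
features = [(0.0, 1.0), (0.1, 1.1), (1.0, 0.0), (1.1, 0.1)]
targets = [1.0, 1.2, -1.0, -0.8]

prediction = knn_predict(features, targets, (0.05, 1.05), k=2)
mae = leave_one_out_mae(features, targets, k=1)
print(prediction, mae)
```

With only about 200 training mutants, a held-out estimate such as leave-one-out error is a reasonable sanity check before applying a model to other family members.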

dacarlin/bglb_family
