# Quick and Dirty Point Mutation on a Chain

This is a temporary solution for point mutations on a chain. A dedicated module for this functionality is planned for a future release.

In [1]:
from crimm.Modeller import ResidueFixer
from crimm.Modeller.TopoFixer import fix_chain
from crimm import TopologyLoader, ParameterLoader
from crimm.Fetchers import fetch_rcsb
from Bio.Seq import Seq
from Bio.Data.PDBData import protein_letters_3to1, protein_letters_1to3



Function for removing all non-backbone atoms (side chain atoms and hydrogens)

In [2]:
def remove_nonbackbone_atoms(residue):
    """Remove every atom except backbone N, CA, C, and O (side chain atoms and all hydrogens)"""
    nonbackbone_atoms = []
    for atom in residue:
        if atom.name not in ('CA', 'N', 'C', 'O'):
            nonbackbone_atoms.append(atom)
    for atom in nonbackbone_atoms:
        residue.detach_child(atom.name)
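
To make the filter explicit without loading a structure, here is a minimal sketch that applies the same backbone test to a plain list of atom names. The `BACKBONE_NAMES` constant, the `split_backbone` helper, and the hand-written phenylalanine atom list are illustrative only and are not part of crimm.

```python
# Illustrative only: plain strings stand in for the atom objects of a residue.
BACKBONE_NAMES = frozenset({"N", "CA", "C", "O"})

def split_backbone(atom_names):
    """Split atom names into (backbone atoms to keep, atoms to strip)."""
    kept = [name for name in atom_names if name in BACKBONE_NAMES]
    stripped = [name for name in atom_names if name not in BACKBONE_NAMES]
    return kept, stripped

# Heavy atoms of a phenylalanine residue in PDB naming
phe_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"]
kept, stripped = split_backbone(phe_atoms)
print(kept)      # ['N', 'CA', 'C', 'O']
print(stripped)  # ['CB', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ']
```

Note that CB is stripped as well, so it has to be rebuilt together with the rest of the new side chain.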

Load the residue topology definitions and force field parameters for proteins

In [3]:
topo = TopologyLoader('protein')
param = ParameterLoader('protein')
param.fill_ic(topo)

In [4]:
structure = fetch_rcsb(
    '5IGV',
    include_solvent=False
)



Get Chain A of Model 1

In [5]:
chain = structure[1]['A']
chain

<Polypeptide(L) id=A Residues=298>
  Description: Macrolide 2'-phosphotransferase II


## Single Point Mutation
In this example, we mutate Phe-18 to Tyr

In [6]:
mutate_18 = chain[18]
mutate_18

<Residue PHE het=  resseq=18 icode= >


In [7]:
mutate_18.resname = 'TYR'
remove_nonbackbone_atoms(mutate_18)

As we can see, the side chain atoms are stripped

In [8]:
mutate_18

<Residue TYR het=  resseq=18 icode= >


Now we need to regenerate the residue topology before building the residue.

The function `topo.generate_residue_topology` will look up the `resname` attribute of a residue and generate the topology accordingly

In [9]:
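# Regenerate the topology definition from the new resname ('TYR')
topo.generate_residue_topology(mutate_18)
# Routines for fixing a residue based on its topology definition
fixer = ResidueFixer()
fixer.load_residue(mutate_18)
# Drop any atoms that are not defined in the new topology
fixer.remove_undefined_atoms()
# Build the missing heavy atoms of the new side chain
built_atoms = fixer.build_missing_atoms()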

At this point, *Phe-18* has become *Tyr-18*. Note that we did not build hydrogens here, but that can be done with `fixer.build_hydrogens()`.

In [10]:
mutate_18

<Residue TYR het=  resseq=18 icode= >


## Update the Canonical Sequence
Finally, we need to update the canonical sequence to match the mutation we just made.

As the `masked_seq` output shows, *Phe-18 (F)* is marked in red as missing because crimm still expects a *Phe* at that position.

This is cumbersome, but these steps will be wrapped and automated in future releases.
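
The update itself is plain string bookkeeping. As a minimal sketch on an ordinary Python string (the `substitute` helper and the 20-letter fragment are illustrative and not part of crimm), a 1-based residue number maps to index `position - 1`, and checking the expected original letter first guards against off-by-one errors:

```python
def substitute(seq, position, expected, new_code):
    """Return a copy of `seq` with the letter at 1-based `position` replaced.

    Raises ValueError if the current letter is not `expected`.
    """
    index = position - 1  # 1-based residue number -> 0-based string index
    if seq[index] != expected:
        raise ValueError(
            f"expected {expected!r} at position {position}, found {seq[index]!r}"
        )
    return seq[:index] + new_code + seq[index + 1:]

# First 20 letters of the canonical sequence of chain A
fragment = "MSKDIKQVIEIAKKHNLFLK"
mutated = substitute(fragment, 18, "F", "Y")
print(mutated)  # MSKDIKQVIEIAKKHNLYLK
```

This only works when residue numbers coincide with positions in the canonical sequence, which is the assumption made throughout this notebook.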

In [11]:
chain.masked_seq.show()

[91mM[0mSKDIKQVIEIAKKHNL[91mF[0mLKEETIQFNESGLDFQAVFAQDNNGIDWVLRLPRREDVMPRTKVEKQALDLVNKYAISFQAPNWIIYTEELIAYKKLDGVPAGTIDHNIGNYIWEIDINNVPELFHKSLGRVLAELHSIPSNKAAALDLVVHTPEEARMSMKQRMDAVRAKFGVGENLWNRWQAWLNDDDMWPKKTGLIHGDVHAGHTMIDKDANVTGLIDWTEAKVTDVSHDFIFNYRAFGEEGLEALILAYKEIGGYYWPKMKEHIIELNAAYPVSIAEFALVSGIEEYEQMAKEALEV[91mQ[0m[91mG[0m[91mS[0m


In [12]:
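resseq = mutate_18.id[1]
# One-letter code of the new residue ('Y' for TYR)
new_code = protein_letters_3to1[mutate_18.resname]
# Seq objects are immutable, so rebuild the canonical sequence letter by letter.
# This assumes the residue number (resseq) equals the 1-based position in can_seq.
new_can_seq = ''
for seq_num, code in enumerate(chain.can_seq, start=1):
    # update the sequence with point mutation 'F' to 'Y' at seq num 18
    if seq_num == resseq:
        code = new_code
    new_can_seq += code
chain.can_seq = Seq(new_can_seq)
# The reported_res attribute needs to be updated as well
chain.reported_res[resseq-1] = (resseq, mutate_18.resname)

# At this point, everything in the chain is consistent with the point mutation
chain.masked_seq.show()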

[91mM[0mSKDIKQVIEIAKKHNLYLKEETIQFNESGLDFQAVFAQDNNGIDWVLRLPRREDVMPRTKVEKQALDLVNKYAISFQAPNWIIYTEELIAYKKLDGVPAGTIDHNIGNYIWEIDINNVPELFHKSLGRVLAELHSIPSNKAAALDLVVHTPEEARMSMKQRMDAVRAKFGVGENLWNRWQAWLNDDDMWPKKTGLIHGDVHAGHTMIDKDANVTGLIDWTEAKVTDVSHDFIFNYRAFGEEGLEALILAYKEIGGYYWPKMKEHIIELNAAYPVSIAEFALVSGIEEYEQMAKEALEV[91mQ[0m[91mG[0m[91mS[0m


## Batch Mutation
In the example below, we mutate every isoleucine among the first 100 residues of the chain to leucine.

In [13]:
isoleucine_mutation = []
for residue in chain.residues[:100]:
    if residue.resname == 'ILE':
        isoleucine_mutation.append(residue)

for residue in isoleucine_mutation:
    residue.resname = 'LEU'
    remove_nonbackbone_atoms(residue)
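
As a cross-check that needs no structure, the same selection can be sketched on the canonical sequence itself. The `find_positions` helper is illustrative and not part of crimm; the string is the first 100 letters of the canonical sequence printed earlier. This scans sequence positions 1 to 100, whereas `chain.residues[:100]` presumably takes the first 100 residues actually present in the structure, so the two selections can differ when residues are missing.

```python
# First 100 letters of the canonical sequence of chain A, in blocks of ten
first_100 = (
    "MSKDIKQVIE" "IAKKHNLFLK" "EETIQFNESG" "LDFQAVFAQD" "NNGIDWVLRL"
    "PRREDVMPRT" "KVEKQALDLV" "NKYAISFQAP" "NWIIYTEELI" "AYKKLDGVPA"
)

def find_positions(seq, letter):
    """Return the 1-based positions at which `letter` occurs in `seq`."""
    return [pos for pos, code in enumerate(seq, start=1) if code == letter]

ile_positions = find_positions(first_100, "I")
print(ile_positions)
# [5, 9, 11, 24, 44, 75, 83, 84, 90]
```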

Generate the topology for all residues and fill in any missing IC

In [14]:
topo.generate_chain_topology(chain, first_patch='ACE', last_patch='CT3')
param.fill_ic(topo)



## Rebuild Mutated Residues
There is a utility function, `fix_chain`, that builds all missing heavy atoms as well as hydrogens for a chain.

In [15]:
built_atoms = fix_chain(chain)

As before, the mutated isoleucines have not yet been updated in the sequence, so we need to update the sequence information in the chain.

In [16]:
chain.masked_seq.show()

[91mM[0mSKD[91mI[0mKQV[91mI[0mE[91mI[0mAKKHNLYLKEET[91mI[0mQFNESGLDFQAVFAQDNNG[91mI[0mDWVLRLPRREDVMPRTKVEKQALDLVNKYA[91mI[0mSFQAPNW[91mI[0m[91mI[0mYTEEL[91mI[0mAYKKLDGVPAGTIDHNIGNYIWEIDINNVPELFHKSLGRVLAELHSIPSNKAAALDLVVHTPEEARMSMKQRMDAVRAKFGVGENLWNRWQAWLNDDDMWPKKTGLIHGDVHAGHTMIDKDANVTGLIDWTEAKVTDVSHDFIFNYRAFGEEGLEALILAYKEIGGYYWPKMKEHIIELNAAYPVSIAEFALVSGIEEYEQMAKEALEV[91mQ[0m[91mG[0m[91mS[0m


Function to batch update the sequence information for a list of mutated residues

In [17]:
def update_chain_seq_info(mutated_res_list, chain):
    """Update can_seq and reported_res of a chain for a list of mutated residues"""
    # mutated residues dict keyed by seq number
    mut_res_dict = {res.id[1]: res for res in mutated_res_list}
    new_can_seq = ''
    for seq_num, code in enumerate(chain.can_seq, start=1):
        if seq_num in mut_res_dict:
            res = mut_res_dict[seq_num]
            code = protein_letters_3to1[res.resname]
            chain.reported_res[seq_num-1] = (seq_num, res.resname)
        new_can_seq += code
    chain.can_seq = Seq(new_can_seq)
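
The same batch logic can be sketched on a plain string, which also makes it easy to report each change in the usual `I5L` style. The `mutate_sequence` helper and the short fragment are illustrative and not part of crimm; positions are 1-based and assumed to match residue numbers.

```python
def mutate_sequence(seq, substitutions):
    """Apply {1-based position: new one-letter code} to `seq`.

    Returns the new sequence and a list of labels such as 'I5L'.
    """
    letters = list(seq)
    labels = []
    for position, new_code in sorted(substitutions.items()):
        old_code = letters[position - 1]
        letters[position - 1] = new_code
        labels.append(f"{old_code}{position}{new_code}")
    return "".join(letters), labels

fragment = "MSKDIKQVIEIAKKHNLF"
new_fragment, labels = mutate_sequence(fragment, {5: "L", 9: "L", 11: "L"})
print(new_fragment)  # MSKDLKQVLELAKKHNLF
print(labels)        # ['I5L', 'I9L', 'I11L']
```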

In [18]:
update_chain_seq_info(isoleucine_mutation, chain)

In [19]:
chain.masked_seq.show()

[91mM[0mSKDLKQVLELAKKHNLYLKEETLQFNESGLDFQAVFAQDNNGLDWVLRLPRREDVMPRTKVEKQALDLVNKYALSFQAPNWLLYTEELLAYKKLDGVPAGTIDHNIGNYIWEIDINNVPELFHKSLGRVLAELHSIPSNKAAALDLVVHTPEEARMSMKQRMDAVRAKFGVGENLWNRWQAWLNDDDMWPKKTGLIHGDVHAGHTMIDKDANVTGLIDWTEAKVTDVSHDFIFNYRAFGEEGLEALILAYKEIGGYYWPKMKEHIIELNAAYPVSIAEFALVSGIEEYEQMAKEALEV[91mQ[0m[91mG[0m[91mS[0m


## At this point, all point mutations have been completed

In [20]:
isoleucine_mutation

[<Residue LEU het=  resseq=5 icode= >,
 <Residue LEU het=  resseq=9 icode= >,
 <Residue LEU het=  resseq=11 icode= >,
 <Residue LEU het=  resseq=24 icode= >,
 <Residue LEU het=  resseq=44 icode= >,
 <Residue LEU het=  resseq=75 icode= >,
 <Residue LEU het=  resseq=83 icode= >,
 <Residue LEU het=  resseq=84 icode= >,
 <Residue LEU het=  resseq=90 icode= >]

In [21]:
isoleucine_mutation[0]

<Residue LEU het=  resseq=5 icode= >


## Point Mutation from a Given Dictionary
In this example, we define a dictionary that maps the location and original identity of each residue to the residue name it should be mutated to

In [22]:
mutate_dict = {
    (32, 'ASP'): 'ASN',
    (93, 'LYS'): 'ARG'
}
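
A dictionary in this format can also be generated from the common shorthand notation such as `D32N`. The sketch below is an assumption about how one might do that, not part of crimm: the `parse_mutation` helper and the hand-written `ONE_TO_THREE` table of the 20 standard amino acids are illustrative.

```python
import re

# One-letter to three-letter codes of the 20 standard amino acids
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}

def parse_mutation(label):
    """Turn a label such as 'D32N' into ((32, 'ASP'), 'ASN')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a valid mutation label: {label!r}")
    old_code, position, new_code = match.groups()
    return (int(position), ONE_TO_THREE[old_code]), ONE_TO_THREE[new_code]

parsed_dict = dict(parse_mutation(label) for label in ["D32N", "K93R"])
print(parsed_dict)
# {(32, 'ASP'): 'ASN', (93, 'LYS'): 'ARG'}
```

Keeping the original residue name in the key lets the mutation loop verify that the residue found at that position is the one that was expected.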

In [23]:
mutated_res = []
for (resseq, org_resname), new_resname in mutate_dict.items():
    residue = chain[resseq]
    # Make sure the residue at this position is the one we expect to mutate
    assert residue.resname == org_resname
    residue.resname = new_resname
    remove_nonbackbone_atoms(residue)
    mutated_res.append(residue)

In [24]:
topo.generate_chain_topology(chain, first_patch='ACE', last_patch='CT3')
param.fill_ic(topo)



In [25]:
built_atoms = fix_chain(chain)
update_chain_seq_info(mutated_res, chain)

In [29]:
mutated_res

[<Residue ASN het=  resseq=32 icode= >, <Residue ARG het=  resseq=93 icode= >]

In [30]:
mutated_res[0]

<Residue ASN het=  resseq=32 icode= >


In [31]:
mutated_res[1]

<Residue ARG het=  resseq=93 icode= >


## DON'T FORGET TO MINIMIZE THE STRUCTURE AFTER THIS!!