## Rosetta models of each mutation in our data set 

First, we'll use Rosetta to model every mutant enzyme in the data set. All we need is the wild-type PDB structure, `bglb.pdb`, and `mutant_list`, a list of the mutations we have kinetic data for. 

In [5]:
protocols = [
    'protocols/benchmark', 
    'protocols/playground', 
]

In [6]:
! ls protocols/benchmark/mutant_list

protocols/benchmark/mutant_list


In [7]:
! head -5 protocols/benchmark/mutant_list

A192S
A227W
A236E
A249E
A356A


In [5]:
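# build per-run Rosetta arguments for MutateResidue 

from Bio.SeqUtils import IUPACData

nstruct = 100  # number of independent models per mutant

with open( 'mutant_list' ) as fn:
    # skip blank lines and strip the trailing newline from each mutation code
    mutants = [ i.strip() for i in fn if i.strip() ] 
    print( '{} mutants'.format( len( mutants ) ) )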

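# m[1:-1] is the residue number and m[-1] the new one-letter amino acid,
# e.g. 'A192S' -> target=192 new_res=SER; each mutant gets nstruct runs
runs = [
    '-parser:script_vars target={} new_res={} -suffix _{}_{:04d}\n'.format( 
        m[1:-1], IUPACData.protein_letters_1to3[ m[-1] ].upper(), m, i )
    for i in range( nstruct )
    for m in mutants 
]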

with open( 'list', 'w' ) as fn:
    fn.write( ''.join( runs ) )
    
! wc -l list 
! head -2 list 

200 mutants
20000 list
-parser:script_vars target=192 new_res=SER -suffix _A192S_0000
-parser:script_vars target=227 new_res=TRP -suffix _A227W_0000


In [9]:
# run the simulations on Cabernet w/ SLURM (echo only prints the command; drop it to actually submit) 
 
! echo sbatch sub.sh

sbatch sub.sh


In [3]:
# concatenate all the score files together 

import pandas 
from glob import glob 

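# score files are whitespace-delimited; the raw string avoids an invalid-escape warning
sfs = [ pandas.read_csv( i, sep=r'\s+' ) for i in glob( 'out/*sc' ) ]
sf = pandas.concat( sfs )
# the mutant name is the second '_'-separated field of each description
sf['name'] = sf.description.str.split( '_' ).str[ 1 ]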
sf.to_csv( 'data_sets/enzyme_design_features.csv' )

In [None]:
# things to add: 
# 
#  + how to recover mutants that the scheduler dropped 
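
One possible way to approach the recovery step noted above is to compare the runs that were requested against the descriptions that actually appear in the score files. The sketch below is an assumption-laden illustration, not the notebook's method: `find_missing_runs` and `SUFFIX_PATTERN` are hypothetical names, and it assumes each finished model's description contains the `_{mutant}_{replicate:04d}` suffix set in the arguments file (for example `bglb_A192S_0000_0001`).

```python
import re

# Matches the suffix written by the arguments file, e.g. "_A192S_0000".
# Assumption: this suffix appears verbatim in each score-file description.
SUFFIX_PATTERN = re.compile(r'_([A-Z]\d+[A-Z])_(\d{4})')


def find_missing_runs(mutants, nstruct, descriptions):
    """Return (mutant, replicate) pairs with no matching description."""
    finished = set()
    for description in descriptions:
        match = SUFFIX_PATTERN.search(description)
        if match:
            finished.add((match.group(1), int(match.group(2))))
    # same ordering as the original list: replicate outer, mutant inner
    return [
        (m, i)
        for i in range(nstruct)
        for m in mutants
        if (m, i) not in finished
    ]


# Made-up descriptions for illustration only
descriptions = [
    'bglb_A192S_0000_0001',
    'bglb_A227W_0000_0001',
    'bglb_A192S_0001_0001',
]
missing = find_missing_runs(['A192S', 'A227W'], 2, descriptions)
print(missing)
# prints: [('A227W', 1)]
```

The returned pairs could then be formatted with the same `-parser:script_vars` template into a new arguments file and resubmitted.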