# `sanity_checks`

Data consistency checks for the BglB data set.

In [1]:
from skbio import Protein 
import pandas
import numpy as np 

Collect data sets

In [2]:
df = pandas.read_csv( 'data_sets/data_set.csv', index_col=0 )
plos = pandas.read_csv( '/Users/alex/Documents/bagel-benchmark/data_sets/experimental/plos_2016.csv', index_col=0 )

### Basic sanity checks 

First, we check that the native residue given in each mutant name (for example, the `G` in `G12N`) is in fact the residue at that position in the native BglB sequence.
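The idea can be tried out without the reference file. The following is a minimal sketch, not the notebook's own implementation: `TOY_SEQUENCE` and `native_matches` are hypothetical names, the sequence is made up, and mutant names are assumed to follow the native residue, 1-based position, new residue pattern.

```python
import re

# Hypothetical toy sequence standing in for the native BglB sequence
TOY_SEQUENCE = "MSGNTA"

def native_matches(mutant_name, sequence):
    """Return True if the native residue in `mutant_name` matches `sequence`."""
    # Assumed name format: one letter, a 1-based position, one letter (e.g. 'G3N')
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutant_name)
    if match is None:
        return False
    native, position = match.group(1), int(match.group(2))
    # Positions outside the sequence cannot match anything
    if not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == native

print(native_matches("G3N", TOY_SEQUENCE))  # G is at position 3 of the toy sequence
print(native_matches("A3N", TOY_SEQUENCE))  # A is not at position 3
```

Unlike slicing with `[1:-1]`, the regular expression rejects names that do not fit the pattern (such as `BglB`) instead of raising an error.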

In [3]:
#collector
#func_list = []
#def sanity_check( func ):
#    func_list.append( func )
    
#sanity check functions 
#@sanity_check 
def check_name( dat ):
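    '''
    Check the native residue in the mutant name against the BglB sequence.
    All check functions share this interface.
    
    Input: 
        `dat`
        pandas Series with kcat, km, etc. params;
        `dat.name` is the mutant name, such as 'G12N'
        
    Returns:
        True 
        if the name checks out 
        
        False
        if the given native amino acid doesn't match the 
        amino acid in the native BglB sequence 
    '''
    
    # mutant names look like 'G12N': native residue, 1-based position, new residue
    pos = int( dat.name[1:-1] )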
    
    protein = Protein.read( './data_sets/reference/bglb.pep' )
    nat = str( protein[pos-1] )
    
    return nat == dat.name[0]
    
def true_check( dat ):
    return True

def false_check( dat ):
    return False

In [4]:
def translate_boolean_to_emoji( boo ):
    # every check returns a boolean, so truthiness is enough here
    return '✔️' if boo else '❌' #✅

In [5]:
func_list = [
    true_check, 
    false_check, 
    check_name,
]

In [6]:
def run_tests():
    print('Mutant\t' + '\t'.join( [ i.__name__ for i in func_list ] ))
    for mutant_name, dat in df.iterrows():
        if mutant_name != 'BglB':
            print( mutant_name, end='\t' ) 
            for f in func_list:
                print( translate_boolean_to_emoji(f(dat)), end='\t' )
            print()
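The commented-out collector in the first code cell hints at registering checks with a decorator rather than listing them by hand. Below is a minimal sketch of that pattern under stated assumptions: `CHECKS`, `collect_results`, and the two toy checks are hypothetical, and `SimpleNamespace` objects stand in for the pandas Series rows, since only `.name` is used here.

```python
from types import SimpleNamespace

CHECKS = []  # hypothetical registry, filled by the decorator below

def sanity_check(func):
    """Register `func` as a check and return it unchanged."""
    CHECKS.append(func)
    return func

@sanity_check
def native_is_letter(dat):
    # first character of the name should be an amino acid letter
    return dat.name[0].isalpha()

@sanity_check
def position_is_numeric(dat):
    # everything between the first and last character should be digits
    return dat.name[1:-1].isdigit()

def collect_results(rows, checks):
    """Map each row name to {check name: bool} instead of printing."""
    return {dat.name: {f.__name__: bool(f(dat)) for f in checks} for dat in rows}

# Stand-ins for pandas Series rows (made-up names)
toy_rows = [SimpleNamespace(name='G12N'), SimpleNamespace(name='Sx4A')]
results = collect_results(toy_rows, CHECKS)

# Names of rows that fail at least one check
failing = [name for name, outcome in results.items() if not all(outcome.values())]
print(failing)
```

Returning the results as a mapping, rather than printing them, makes it straightforward to list only the mutants that fail a check once the data set grows.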

### Results of sanity checks

### Discussion 

Wilson's idea: automate Justin's thought process and questions as an algorithm.

In [7]:
run_tests()

Mutant	true_check	false_check	check_name
G12N	✔️	❌	✔️	
S14A	✔️	❌	✔️	
T15A	✔️	❌	✔️	
S16N	✔️	❌	✔️	
S16A	✔️	❌	✔️	
S17A	✔️	❌	✔️	
S17E	✔️	❌	✔️	
Y18A	✔️	❌	✔️	
Q19P	✔️	❌	✔️	
Q19C	✔️	❌	✔️	
Q19A	✔️	❌	✔️	
Q19S	✔️	❌	✔️	
S32L	✔️	❌	✔️	
W34A	✔️	❌	✔️	
V52G	✔️	❌	✔️	
F72A	✔️	❌	✔️	
R76A	✔️	❌	✔️	
I91E	✔️	❌	✔️	
H101R	✔️	❌	✔️	
H119E	✔️	❌	✔️	
H119N	✔️	❌	✔️	
H119A	✔️	❌	✔️	
W120A	✔️	❌	✔️	
W120H	✔️	❌	✔️	
W120F	✔️	❌	✔️	
D121F	✔️	❌	✔️	
E154D	✔️	❌	✔️	
N163K	✔️	❌	✔️	
N163E	✔️	❌	✔️	
N163A	✔️	❌	✔️	
N163D	✔️	❌	✔️	
N163C	✔️	❌	✔️	
E164G	✔️	❌	✔️	
E164R	✔️	❌	✔️	
E164A	✔️	❌	✔️	
Y166P	✔️	❌	✔️	
C167A	✔️	❌	✔️	
C167Q	✔️	❌	✔️	
L171A	✔️	❌	✔️	
L171R	✔️	❌	✔️	
T175R	✔️	❌	✔️	
E177L	✔️	❌	✔️	
E177K	✔️	❌	✔️	
E177A	✔️	❌	✔️	
H178R	✔️	❌	✔️	
H178A	✔️	❌	✔️	
A192S	✔️	❌	✔️	
T218A	✔️	❌	✔️	
N220R	✔️	❌	✔️	
N220G	✔️	❌	✔️	
N220H	✔️	❌	✔️	
N220A	✔️	❌	✔️	
N220Y	✔️	❌	✔️	
M221A	✔️	❌	✔️	
E222Y	✔️	❌	✔️	
E222K	✔️	❌	✔️	
E222R	✔️	❌	✔️	
E222H	✔️	❌	✔️	
E222Q	✔️	❌	✔️	
E222A	✔️	❌	✔️	
A236E	✔️	❌	✔️	
R240E	✔️	❌	✔️	
R240D	✔️	❌	✔️	
R240K	✔️	❌	✔️	
R240A	✔️	❌	✔️	
I2