The purpose of my project is the computational generation of ssDNA aptamers for a given peptide, using Watson-Crick pseudo-pairing probability scores.

"A total of 446 crystal structures of nucleotide–protein complexes provide 18 types of direct pseudo pairs and five types of water-mediated pseudo pairs between nucleotide bases and amino acid side-chains. Compared with the previously observed pseudo pairs in DNA/RNA–protein complexes (26), eight direct and five water-mediated new types of pseudo pairs are clustered in our data set of nucleotide–protein complexes. In addition, several pseudo pairs between bases and the peptide backbone are observed in this study.", (Jiro Kondo, 2011). https://pubmed.ncbi.nlm.nih.gov/21737431/

This program has three purposes: to predict the optimal aptamer for a given peptide, to predict pseudo-pairing locations, and to approximate pseudo-pairing bond strength. The pseudo-pairing scores of the primer sequences shared by all aptamer candidates are used to establish a minimum pseudo-pairing score; an aptamer candidate that does not exceed this score is discarded.
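The first and third purposes can be illustrated with a minimal sketch: each amino acid in a fragment is mapped to a base by weighted random choice, and the frequency scores and hydrogen-bond counts of the chosen pairs are summed. The table values are copied from this notebook's `amino_predisposition_dict`; the names `PSEUDO_PAIR_TABLE` and `sample_base` and the five-residue fragment `"MSDQN"` are assumptions made up for the illustration, not part of the program below.

```python
import random

# Values copied from this notebook's amino_predisposition_dict:
# amino acid -> {base: (pseudo-pair frequency score, hydrogen-bond count)}
PSEUDO_PAIR_TABLE = {
    "Q": {"G": (2.4, 7), "A": (0.8, 4), "T": (16.3, 1)},
    "D": {"G": (34.1, 2), "A": (2.6, 10)},
    "N": {"T": (8.3, 2), "A": (3.8, 4)},
    "S": {"A": (0.4, 6)},
}


def sample_base(amino, rng):
    """Pick one base for an amino acid, weighted by pseudo-pair frequency.

    Returns (base, (frequency, h_bonds)). Amino acids without an entry in
    the table get the placeholder base 'N' with zero scores.
    """
    partners = PSEUDO_PAIR_TABLE.get(amino)
    if partners is None:
        return "N", (0.0, 0)
    bases = list(partners)
    weights = [partners[base][0] for base in bases]
    chosen = rng.choices(bases, weights=weights, k=1)[0]
    return chosen, partners[chosen]


rng = random.Random(0)  # seeded only so the illustration is repeatable
picks = [sample_base(amino, rng) for amino in "MSDQN"]  # hypothetical toy fragment

draft = "".join(base for base, _ in picks)
frequency_sum = sum(attributes[0] for _, attributes in picks)
hydrogen_bonds = sum(attributes[1] for _, attributes in picks)
print(draft, frequency_sum, hydrogen_bonds)
```

In this sketch, `M` has no entry and always yields `N`, while `S` always yields `A` because it has a single tabulated partner; the other positions vary with the random draw.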

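The primer-based cutoff can be sketched in a few lines, assuming each candidate is reduced to a `(sequence, score)` pair. The function name `filter_by_null_threshold` and all numbers here are hypothetical toy values chosen for illustration; they are not results from the program.

```python
def filter_by_null_threshold(candidates, null_score):
    """Keep candidates whose score strictly exceeds the null (primer) score,
    sorted from highest to lowest score."""
    kept = [candidate for candidate in candidates if candidate[1] > null_score]
    return sorted(kept, key=lambda candidate: candidate[1], reverse=True)


# Hypothetical (sequence, score) pairs and a hypothetical primer score.
toy_candidates = [("GNAN", 40.0), ("NNTA", 60.5), ("ANNG", 50.0)]
toy_null_score = 50.0

survivors = filter_by_null_threshold(toy_candidates, toy_null_score)
print(survivors)  # [('NNTA', 60.5)]
```

The comparison is strict, so a candidate that only ties the primer score is discarded along with those that fall below it.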


In [None]:
import numpy as np
import random



In [None]:
class interaction_approximator():

    def __init__(self):
        
        #holds the possible pseudo-pairing relationships between each nucleotide and a given amino 
        #base_predisposition_dict is keyed by the bases A, G, T; amino_predisposition_dict holds the same data keyed by amino
        #each key maps its prospective pseudo-pair partners to their scores relative to itself 
        #each partner holds a tuple of (bond frequency, number of hydrogen bonds)
        #the frequency score outweighs the number of H-bonds in significance when ordering predispositions 
        #C is omitted: results from the study concerning C -> amino bonding predispositions were inconclusive, with no test yielding both a probability score and a hydrogen bond count 

        #each value pertaining to a key holds a tuple of two tuples --
        #Watson-Crick pseudo-pairs between nucleotides and amino acid side chains -- first tuple
        #Watson Crick pseudopairing between nucleotide and peptide backbone -- second tuple


        self.base_predisposition_dict = {
            'A': {'N': (3.8, 4), 'D': (2.6, 10), 'Q': (0.8, 4), 'S': (0.4, 6)},
            'G': {'D': (34.1, 2), 'Q': (2.4, 7)},
            'T': {'N': (8.3, 2), 'Q': (16.3, 1)},
        }
        self.amino_predisposition_dict = {'Q':{'G':(2.4, 7), 'A':(0.8, 4), 'T':(16.3, 1)}, 'D':{'G':(34.1, 2),'A':(2.6,10)}, 'N':{'T':(8.3,2), 'A':(3.8,4)}, 'S':{'A':(0.4,6)}}


        #the value is in units of kcal/mol  
        self.hydrogen_bond_strength = 1e-10

        #the primer sequences must be scored to help reduce false positives that would obfuscate the results 
        #their score is used to develop a minimum score that candidate sequences are tested against for study validity 
        self.null_pseudopair_strength_and_peptide_fragment = (0, '')
        self.p5 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
        self.p7 = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
        self.null_model_aptamer_sequence = self.p5 + self.p7
        self.null_pseudopair_bond_strength = 0
        self.null_pseudopair_bond_probability = 0

        #the number of nucleotides to be selected:
        self.N = 50

        #list of attributes that went into the development of the given aptamers
        self.random_aptamer_attributes_list = []
        
        #GFP
        self.target_peptide = "MSKGEELFTG VVPILVELDG DVNGHKFSVS GEGEGDATYG KLTLKFICTT GKLPVPWPTL VTTFSYGVQC FSRYPDHMKQ HDFFKSAMPE GYVQERTIFF KDDGNYKTRA EVKFEGDTLV NRIELKGIDF KEDGNILGHK LEYNYNSHNV YIMADKQKNG IKVNFKIRHN IEDGSVQLAD HYQQNTPIGD GPVLLPDNHY LSTQSALSKD PNEKRDHMVL LEFVTAAGIT HGMDELYK"
        self.peptide_fragment_list = []

        self.aptamer_candidate_pool_list = []
        #aminos which have adequate pseudopairing statistics 

        #set to False to sort candidates by probability sum instead of bond strength
        self.sort_by_bond_strength = True

        self.programatc_iterations = 10

    def parse_and_score(self):
        """
        Develops stability scores for each ssDNA k-mer and peptide n-mer.
        These scores represent an expectation of molecular stability, as extrapolated 
        from the cited text's empirical values and bias results.

        Each peptide fragment is mapped to a candidate aptamer by weighted random base
        selection, so the k-mer's scores reflect its relationship to the protein n-mer.
        
        Input: dictionary of pseudo-pairing predispositions from the amino perspective
        Output: None; candidates that pass the null model threshold are appended to
                self.aptamer_candidate_pool_list
        
           """
        
        target_peptide_stripped = self.target_peptide.replace(" ", '')

        self.target_peptide = target_peptide_stripped
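        #parse the peptide, treating it as the empirical path -- the aptamer is unknown and to be established! 
        
        #develop list of all possible length-N fragments of the peptide, which will be used to develop aptamer candidates for comparison  
        #the list is rebuilt on every call so repeated iterations do not accumulate duplicate fragments
        #the + 1 includes the final fragment, which ends at the last residue
        self.peptide_fragment_list = [
            self.target_peptide[index:index + self.N]
            for index in range(len(self.target_peptide) - self.N + 1)
        ]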

        for peptide_fragment in self.peptide_fragment_list:
            #the aminos from the peptide fragment are used to develop an aptamer by weighted random selection
            self.random_aptamer_attributes_list = [] 
            for amino in peptide_fragment:
                #check if the amino is in the qualified amino alphabet 
                if amino in self.amino_predisposition_dict:
                    #(base, (frequency, H-bond count)) pairs for the bases this amino can pseudo-pair with
                    possible_base_list = list(self.amino_predisposition_dict[amino].items())

                    #pull the frequency weights of the possible bases for use in the weighted random selection 
                    base_probability_weights = [attributes[0] for _, attributes in possible_base_list]

                    #randomly select one (base, attributes) pair, weighted by pseudo-pair frequency 
                    selected_base, selected_attributes = random.choices(possible_base_list, weights=base_probability_weights, k=1)[0]
                    self.random_aptamer_attributes_list.append((selected_base, selected_attributes))

                else:
                    #if there is not sufficient pseudo-pairing information for the current amino, the placeholder base N is used
                    self.random_aptamer_attributes_list.append(('N', (0.0, 0)))
            
            #once a fragment has been processed, the current prospective aptamer is compared to the null model
            #if its probability sum is greater than the null model probability sum, the current aptamer
            #enters the final aptamer candidate pool 
            self.pseudopair_bondstrength_threshold_checker()
     
        return 

    def pseudopair_bondstrength_threshold_checker(self):
        """Adds the current aptamer to the candidate pool if its probability sum exceeds the null model threshold value."""
    
        aptamer_candidate_quantities_tuple = self.statistic_calculator()
        #if the threshold value is exceeded by the candidate's probability sum, 
        #append the aptamer candidate attributes to the candidate pool list
        if aptamer_candidate_quantities_tuple[2] > self.null_pseudopair_bond_probability:
            self.aptamer_candidate_pool_list.append(aptamer_candidate_quantities_tuple)
        return  


    def statistic_calculator(self):
        """Sums the H bonds and frequency scores present in the prospective 
        pseudo-pairing regions shared by the ssDNA sequence and the peptide 
        fragment's aminos.
        
        Input: access to data associated with nucleotide - amino interactions
        Output: tuple of (aptamer candidate sequence, hydrogen bond sum,
                occurrence prediction sum, pseudo-pair bond strength)
        """

        #the number of hydrogen bonds across all pseudo-pairs in the candidate
        hydrogen_bond_sum = sum(attributes[1] for _, attributes in self.random_aptamer_attributes_list)
        #the prediction score: sum of the pseudo-pair frequency scores
        occurrence_prediction_sum = sum(attributes[0] for _, attributes in self.random_aptamer_attributes_list)
        pseudopair_bond_strength = hydrogen_bond_sum * self.hydrogen_bond_strength
        
        #assemble the candidate sequence from the selected bases
        aptamer_candidate_sequence = ''.join(base for base, _ in self.random_aptamer_attributes_list)

        return (aptamer_candidate_sequence, hydrogen_bond_sum, occurrence_prediction_sum, pseudopair_bond_strength) 


    def null_model_aptamer_value_calculator(self):
        """Scores the bonds occurring only between the primer sequences and the peptide.
            This score is used as a null model to discard aptamer candidates. 

            The score is predicted from the perspective of the null model aptamer sequence, p5+p7, as mapped onto the peptide's fragments. 

        Input: null model aptamer sequence -- PCR primers
        Output: null model Watson-Crick pseudo-pairing probability sum value and bonding strength
        """
        target_peptide_stripped = self.target_peptide.replace(" ", '')

        null_peptide = target_peptide_stripped
     
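        #slide a window the length of the null model aptamer across the peptide
        #the + 1 includes the final fragment, which ends at the last residue
        null_aptamer_length = len(self.null_model_aptamer_sequence)
        null_peptide_fragment_list = [
            null_peptide[index:index + null_aptamer_length]
            for index in range(len(null_peptide) - null_aptamer_length + 1)
        ]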


        aptamer_score_list = []

        #relationships between null model and fragments are 1-1, as both base and amino sequences are known
        for peptide_fragment in null_peptide_fragment_list:
            null_aptamer_attribute_list = []
            for index in range(len(peptide_fragment)):
                
                #condition one -- the amino must be in the amino alphabet and the base must be in the base alphabet 
                if peptide_fragment[index] in self.amino_predisposition_dict.keys() and self.null_model_aptamer_sequence[index] in self.base_predisposition_dict.keys():  
                    
                    #condition 2 -- cross membership deterministic 1-1 relationship
                    if peptide_fragment[index] in self.base_predisposition_dict[self.null_model_aptamer_sequence[index]].keys() and self.null_model_aptamer_sequence[index] in self.amino_predisposition_dict[peptide_fragment[index]].keys():
                    
                        null_aptamer_attribute_list.append(self.base_predisposition_dict[self.null_model_aptamer_sequence[index]][peptide_fragment[index]])

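            #each entry is (H-bond sum, peptide fragment, probability sum)
            aptamer_score_list.append((sum([i[1] for i in null_aptamer_attribute_list]), peptide_fragment, sum([i[0] for i in null_aptamer_attribute_list])))

        #tuples compare element by element, so max() selects the fragment with the largest H-bond sum
        self.null_pseudopair_strength_and_peptide_fragment = max(aptamer_score_list)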
        
        self.null_pseudopair_bond_strength = round(self.null_pseudopair_strength_and_peptide_fragment[0] * self.hydrogen_bond_strength, 12)
        self.null_pseudopair_bond_probability = round(self.null_pseudopair_strength_and_peptide_fragment[2],3)
        print("calculated null model values:")
        print("null occurance probability sum:", self.null_pseudopair_bond_probability)
        print("null bond strength:", self.null_pseudopair_bond_strength, "kcal/mol")
    
    def final_candidate_display(self):
        """
        This function develops the final output, which is sorted by bond strength by default or by probability sum otherwise. 
        """

        print(" ")
        #sorting by bond strength if option activated
        if self.sort_by_bond_strength:
            self.aptamer_candidate_pool_list.sort(key = lambda x: x[3], reverse=True) 
        else:
            self.aptamer_candidate_pool_list.sort(key = lambda x: x[2], reverse=True)
        print("Aptamer Candidate Pool:")
        print("rank, ssDNA aptamer sequence, probability sum, number of H-bonds, pseudopair hydrogen bond strength kcal/mol")
        for rank, aptamer in enumerate(self.aptamer_candidate_pool_list[0:100], start=1):
            print(rank, aptamer[0], round(aptamer[2],3), round(aptamer[1],3), round(aptamer[3],12))
            
        return 


    def driver(self):
        """The purpose of this function is to drive iterations of the program.
        """
        print("programatic iterations: ", self.programatc_iterations)
        print(" ")
        #find bonding strength and probability scores for primer sequences p5 and p7        
        self.null_model_aptamer_value_calculator()

        
        for i in range(0, self.programatc_iterations):
            self.parse_and_score()

        self.final_candidate_display()



In [None]:
def main():

    class_access = interaction_approximator()

    class_access.driver()

if __name__ == '__main__':
    main()


programatic iterations:  10
 
calculated null model values:
null occurance probability sum: 53.3
null bond strength: 7.6e-09 kcal/mol
 
Aptamer Candidate Pool:
rank, ssDNA aptamer sequence, probability sum, number of H-bonds, pseudopair hydrogen bond strength kcal/mol
1 GNANNNNTNNNNNANNANANTNNANNGAANNNNGNNNNNNATNNNANTAN 109.3 86 8.6e-09
2 NNNANNANANGNNANNTTTNNNNGNNNNNNGTNNNANTANNANGNTNNNA 191.7 80 8e-09
3 NANAANANNNNNANTNANNNNTNNNNNTNNGNANGNNANNATTNNNNANN 118.6 79 7.9e-09
4 NNNANNNNNANNGNANGNNGNNATTNNNNANNNNNNAANNNANTANNANG 164.6 77 7.7e-09
5 NNTNNANANTNNANNTATNNNNGNNNNNNGANNNANTANNANGNTNNNAN 190.1 77 7.7e-09
6 NNNTNNANANTNNGNNGTTNNNNGNNNNNNGTNNNANGANNANANTNNNG 213.8 76 7.6e-09
7 NTNNNNNANNGNANTNNGNNGTANNNNGNNNNNNGTNNNANGANNANANA 206.0 74 7.4e-09
8 TNNNNNANNGNANGNNANNGTTNNNNGNNNNNNGANNNANTANNANGNAN 206.0 74 7.4e-09
9 NNNTNNGNANTNNANNTGTNNNNGNNNNNNGANNNANTANNANANANNNG 218.7 74 7.4e-09
10 NANNGNANTNNANNTTANNNNGNNNNNNATNNNANGANNANGNTNNNGNN 218.7 74 7.4e-09
11 NNGNANGNNGNNAGANNNNGNNNNNNGAN

References:



https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201857/
