# BIOINF529 Homework #3 - Winter 2022
This homework is worth **10% of your final grade**.

The exam is due before the next course module begins as enforced by Canvas.

## Coding by Contract
We (the Instructors) promise a fair, impartial, and objective means of grading such that you (the Students) follow the tenets of Coding by Contract:
1. You must not modify/delete any of the existing code in this document (besides the `pass` statements)
* Your functions must use the function signatures as written
* Your functions must return/print the expected results (as written)

If these are followed correctly, your submission should be compatible with the automated testing suite. Therefore, the more tests your code passes, the less scrutiny your code will be under by our review. We do not care *how* you get there, just that you get there *correctly*.

## Submission
Please rename this notebook to **homework3_uniqname.ipynb** for submission. 

For example:
> `homework3_apboyle.ipynb`

We will *only* grade the most recent submission of your exam.

## Late Policy
Each submission will receive a **25%** penalty per day (up to three days) that the assignment is late.

After that, the student will receive a **0** for the homework.

## Academic Honor Code
You may consult with others. However, all answers must be your own and code comparison software will be used to enforce this rule. You are allowed to ask questions at office hours but the answers given will be high-level/conceptual in nature.


### Baum-Welch
In class we have now implemented aspects of evaluating a hidden Markov model through Viterbi and the Forward, Backward, and Forward-Backward algorithms. Each of these algorithms requires prior knowledge of dataset labels in order to generate the state inititaion, transition, and emission probabilities. You will now implement the Baum-Welch algorithm which will learn these probabilities for your model in an unsupervised manner. For this assignement, you may use any of the previously provided class implementations as part of your submission.

### The Baum-Welch Algorithm
The Baum-Welch algorithm, much like the other algorithms in the last few classes, consists of a series of steps: Initialization, Expectation, Maximization, and Termination. It is a type of Expectation-Maximization algorithm and so you will loop between the Expectation and Maximization steps until you reach convergence.

Initialization: <br>
Set arbitrary parameters for $\Theta$ in $A$ transition probabilities, $E$ emission probabilities, and $B$ starting probabilities. <br>

Expectation:<br>
For each input sequence $x$ where sequence index $j = 1 \dots n $:<br>
Calculate $f_{k}(i)$ matrix for sequence $x$ using the Forward algorithm.<br>
Calculate $r_{k}(i)$ matrix for sequence $x$ using the Backward algorithm.<br>
Update transition matrix $A_{kl}$ by summing over all positions ($i=1\dots T-1$):<br>
$A_{kl} = \sum_{j}1/P(x^{j}) \sum_{i}f_{k}^{j}(i)a_{kl}e_{l}(x_{i+1}^{j})r_{l}^{j}(i+1)$ <br>
where $x^{j}$, $f^{j}$, and $r^{j}$ are sequence, forward matrix, and backward matrix for squence index $j$ respectively.<br>
Update emission matrix $E_{k}$ by summing over all positions ($i=1\dots T$):<br>
$E_{k}(\sigma) = \sum_{j}1/P(x^{j}) \sum_{i|x_{i}^{j}=\sigma}f_{k}^{j}(i)r_{k}^{j}(i)$ <br>
where the inner sum is only over positions $i$ that have emission $\sigma$. <br>
Update initial state matrix:<br>
$B_{k} = \sum_{j}1/P(x^{j}) * f_{k}^{j}(0)r_{k}^{j}(0)$

Maximization:<br>
Calculate new model parameters as we did with Markov Chains:<br>
$a_{kl} = A_{kl}/\sum_{l}A_{kl}$<br>
$e_{k}(\sigma) = E_{k}(\sigma) / \sum_{\sigma}E_{k}(\sigma)$<br>
$b_{k} = B_{k} / \sum_{k}{B_k}$

<Br>Termination: <br>
Stop at convergence as measured by log likelihood or is maximum number of iterations has been reached.

Please add to the HMM class a new function baum_welch(self, sequences, pseudocount=1e-100) that takes as input a list of sequences to train the model. I have provided empty implentations of Viterbi, Forward, Backward, and Forward-Backward, but I recommend you place my solutions from class into these functions.

In [1]:
import numpy as np
import json

In [11]:
class HMM(object):
    """Main class for HMM objects
    
    Class for holding HMM parameters and to allow for implementation of
    functions associated with HMMs
    
    Private Attributes:
        _alphabet (set): The alphabet of emissions
        _hidden_states (set): Hidden states in the model
        _transitions (dict(dict)): A dictionary of transition probabilities
        _emissions (dict(dict)): A dictionary of emission probabilities
        _initial (dict): A dictionary of initial state probabilities

    """

    __all__ = ['viterbi', 'forward', 'backward', 'forward_backward']

    def __init__(self, alphabet, hidden_states, A=None, E=None, B=None, seed=None):
        self._alphabet = set(alphabet)
        self._hidden_states = set(hidden_states)
        self._transitions = A
        self._emissions = E
        self._initial = B
        self._seed = seed
        if(self._transitions == None):
            self._initialize_random(self._alphabet, self._hidden_states, self._seed)
            
    def __str__(self):
        out_text = [f'Alphabet: {self._alphabet}',
                    f'Hidden States: {self._hidden_states}',
                    f'Initial Probabilities: {json.dumps(self._initial, sort_keys = True, indent=4)}',
                    f'Transition Probabilities: {json.dumps(self._transitions, sort_keys = True, indent=4)}',
                    f'Emission Probabilities: {json.dumps(self._emissions, sort_keys = True, indent=4)}']
        return '\n'.join(out_text)
    
    @classmethod
    def __dir__(cls):
        return cls.__all__
    
    def _emit(self, cur_state, symbol):
        return self._emissions[cur_state][symbol]
    
    def _transition(self, cur_state, next_state):
        return self._transitions[cur_state][next_state]
    
    def _init(self, cur_state):
        return self._initial[cur_state]

    def _states(self):
        for k in self._hidden_states:
            yield k
    
    
    def _get_alphabet(self):
        for sigma in self._alphabet:
            yield sigma
            
    def _initialize_random(self, alphabet, states, seed):
        alphabet = list(set(alphabet))
        alphabet.sort()
        states = list(set(states))
        states.sort()
        self._alphabet = alphabet
        self._hidden_states = states

        #Initialize empty matrices A and E with pseudocounts
        A = {}
        E = {}
        I = {}
        np.random.seed(seed=seed)
        I_rand = np.random.dirichlet(np.ones(len(self._hidden_states)))
        for i, state in enumerate(self._states()):
            E[state] = {}
            A[state] = {}
            I[state] = I_rand[i]
            E_rand = np.random.dirichlet(np.ones(len(self._alphabet)))
            A_rand = np.random.dirichlet(np.ones(len(self._hidden_states)))
            for j, sigma in enumerate(self._get_alphabet()):
                E[state][sigma] = E_rand[j]
            for j, next_state in enumerate(self._states()):
                A[state][next_state] = A_rand[j]
                
        self._transitions = A
        self._emissions = E
        self._initial = I
        return
    
    
    
    def viterbi(self, sequence):
        """ The viterbi algorithm for decoding a string using a HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (list): optimal path through HMM given the model parameters
                           using the Viterbi algorithm
        
        Pseudocode for Viterbi:
            Initialization (𝑖=0): 𝑣𝑘(𝑖)=𝑒𝑘(𝜎)𝑏𝑘.
            Recursion (𝑖=1…𝑇): 𝑣𝑙(𝑖)=𝑒𝑙(𝑥𝑖) max𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙); 
                                ptr𝑖(𝑙)= argmax𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙).
            Termination: 𝑃(𝑥,𝜋∗)= max𝑘(𝑣𝑘(𝑙)𝑎𝑘0); 
                             𝜋∗𝑙= argmax𝑘(𝑣𝑘(𝑙)𝑎𝑘0).
            Traceback: (𝑖=𝑇…1): 𝜋∗𝑖−1= ptr𝑖(𝜋∗𝑖).
        """
        #Taken from Alan's Class Solutions. 
        
        # Initialization (𝑖=0): 𝑣𝑘(𝑖)=𝑒𝑘(𝜎)𝑏𝑘.
        # Initialize trellis and traceback matrices
        # trellis will hold the vi data as defined by Durbin et al.
        # and trackback will hold back pointers
        trellis = {} # This only needs to keep the previous column probabilities
        traceback = [] # This will need to hold all of the traceback data so will be an array of dicts()
        for state in self._states():
            trellis[state] = np.log10(self._init(state)) + np.log10(self._emit(state, sequence[0])) # b * e(0) for all k
            
        # Next we do the recursion step:
        # Recursion (𝑖=1…𝑇): 𝑣𝑙(𝑖)=𝑒𝑙(𝑥𝑖) max𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙); 
        #                 ptr𝑖(𝑙)= argmax𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙).
        for t in range(1, len(sequence)):  # For each position in the sequence
            trellis_next = {}
            traceback_next = {}

            for next_state in self._states():    # Calculate maxk and argmaxk
                k={}
                for cur_state in self._states():
                    k[cur_state] = trellis[cur_state] + np.log10(self._transition(cur_state, next_state)) # k(t-1) * a
                argmaxk = max(k, key=k.get)
                trellis_next[next_state] =  np.log10(self._emit(next_state, sequence[t])) + k[argmaxk] # k * e(t)
                traceback_next[next_state] = argmaxk
                
            #Overwrite trellis 
            trellis = trellis_next
            #Keep trackback pointer matrix
            traceback.append(traceback_next)
            
        # Termination: 𝑃(𝑥,𝜋∗)= max𝑘(𝑣𝑘(𝑙)𝑎𝑘0); 
        #                  𝜋∗𝑙= argmax𝑘(𝑣𝑘(𝑙)𝑎𝑘0).
        max_final_state = max(trellis, key=trellis.get)
        max_final_prob = trellis[max_final_state]
                
        # Traceback: (𝑖=𝑇…1): 𝜋∗𝑖−1= ptr𝑖(𝜋∗𝑖).
        result = [max_final_state]
        for t in reversed(range(len(sequence)-1)):
            result.append(traceback[t][max_final_state])
            max_final_state = traceback[t][max_final_state]

        return result[::-1]

  

    def forward(self, sequence):
        """ The forward algorithm for calculating probability of sequence given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (float, list of dicts): P(x) and the f matrix as a list
        
        Pseudocode for Forward:
            Initialization (𝑖=0): 𝑓𝑘(0)=𝑒𝑘(𝜎0)𝑏𝑘.
            Recursion (𝑖=1…𝑇): 𝑓𝑙(𝑖)=𝑒𝑙(𝜎𝑖)∑𝑘(𝑓𝑘(𝑖−1)𝑎𝑘𝑙)
            Termination: 𝑃(𝑥)=∑𝑘𝑓𝑘(𝑇)
        """
        #Taken from Alan's Class Solutions. 
        
        # Initialization (𝑖=0): 𝑓𝑘(0)=𝑒𝑘(𝜎0)𝑏𝑘.
        # Initialize f
        f = [] # For this algorithm it is helpful to keep this entire matrix
        f.append({})
        for state in self._states():
            f[-1][state] = self._init(state) * self._emit(state, sequence[0]) # b * e(0) for all k

        # Next we do the recursion step:
        # Recursion (𝑖=1…𝑇): 𝑓𝑙(𝑖)=𝑒𝑙(𝜎𝑖)∑𝑘(𝑓𝑘(𝑖−1)𝑎𝑘𝑙) 
        for i in range(1, len(sequence)):  # For each position in the sequence
            f.append({})
            for next_state in self._states(): # For each state
                f[-1][next_state] = 0
                for cur_state in self._states():
                    f[-1][next_state] += f[i-1][cur_state] * self._transition(cur_state, next_state) # sum of f(i-1) * a
                f[-1][next_state] = self._emit(next_state, sequence[i]) * f[-1][next_state] # f * e(i)
        
        # Termination: 𝑃(𝑥)=∑𝑘𝑓𝑘(𝑇)
        Px = 0
        for state in self._states():
            Px += f[-1][state]
            
        return Px, f



    def backward(self, sequence):
        """ The backward algorithm for calculating probability of sequence given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (float, list of dicts): P(x) and the b matrix as a list
        
        Pseudocode for Backward:
            Initialization (𝑖=T): 𝑟𝑘(𝑇)=1.
            Recursion (𝑖=𝑇−1…1): 𝑟𝑘(𝑖)=∑𝑙𝑟𝑙(𝑖+1)𝑎𝑘𝑙𝑒𝑙(𝜎𝑖+1)
            Termination: 𝑃(𝑥)=∑𝑙𝑟𝑘(1)𝑒𝑙(𝜎1)𝑏𝑙
        """
        #Taken from Alan's Class Solutions. 
        
        # Initialization (𝑖=T): 𝑟𝑘(𝑇)=1.
        # Initialize r
        r = [] # For this algorithm it is helpful to keep this entire matrix
        r.insert(0, {})
        for state in self._states():
            r[0][state] = 1 # 1 for all k

        # Next we do the recursion step:
        # Recursion (𝑖=T-1…1): 𝑟𝑘(𝑖)=∑𝑙𝑟𝑙(𝑖+1)𝑎𝑘𝑙𝑒𝑙(𝜎𝑖+1)
        for i in range(len(sequence)-1, 0, -1):  # For each position in the sequence in reverse
            r.insert(0, {}) # append a new item at the beginning
            for prev_state in self._states(): # For each state
                r[0][prev_state] = 0
                for next_state in self._states():
                    r[0][prev_state] += r[1][next_state] * self._transition(prev_state, next_state) * self._emit(next_state, sequence[i])

        # Termination: 𝑃(𝑥)=∑𝑙𝑟𝑘(1)𝑒𝑙(𝜎1)𝑏𝑙
        Px = 0
        for state in self._states():
            Px += r[0][state] * self._init(state) * self._emit(state, sequence[0])
                        
        return Px, r
    
    
    def forward_backward(self, sequence):
        """ The forward-backward algorithm for calculating marginal posteriors given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            posterior (list of dicts): all posteriors as a list
        
        Pseudocode for Forward-Backward:
            Calculate f[] as forward algorithm
            Calculate r[] as backward algorithm
            for all i in sequence
                for all states
                    posterior[i][state] = f[i][state] * r[i][state] / Px
        """    
        #Taken from Alan's Class Solutions. 
        
        #Calculate forward and backward matrices
        f_Px, f_matrix = self.forward(sequence)
        r_Px, r_matrix = self.backward(sequence)
    
        posterior = []
        for i in range(0, len(sequence)):  # For each position in the sequence
            posterior.append({})
            for state in self._states(): # For each state
                posterior[i][state] = f_matrix[i][state] * r_matrix[i][state] / f_Px
                
        return posterior
    
    
    def baum_welch(self, sequences, pseudocount=1e-100):
        """ The baum-welch algorithm for unsupervised HMM parameter learning

        Args:
            sequence (list): a list of sequences containing valid emissions from the HMM
            pseudocount (float): small pseudocount value (default: 1e-100)

        Returns:
            None but updates the current HMM model parameters:
             self._transitions, self._emissions, self._initial
        
        """  
        #Initialization Step:
        #Set arbitrary parameters for alpha in A (transition), E (emission), and B (initial) prbabilitiess
        
        #initialize the empty dictionaries. 
        initial_prob = {}
        transition_prob = {}
        emission_prob = {}
        
        for current_state in self._hidden_states: #genome or island. 
            initial_prob[current_state] = pseudocount
            #creating the nested dictionaries for the other ones. 
            transition_prob[current_state] = {}
            emission_prob[current_state] = {}
            #second for loops to fill in the nested dicts. 
            for next_state in self._hidden_states:
                transition_prob[current_state][next_state] = pseudocount  
            #take care of the A,C,G,T for the emission_prob matrix. 
            for alph in self._alphabet:
                emission_prob[current_state][alph] = pseudocount
        # print(initial_prob, transition_prob, emission_prob)

        
        #Expectation Step:
        #For each input sequence  𝑥  where sequence index  𝑗=1…𝑛 :
        #Calculate  𝑓𝑘(𝑖)  matrix for sequence  𝑥  using the Forward algorithm.
        #Calculate  𝑟𝑘(𝑖)  matrix for sequence  𝑥  using the Backward algorithm.
        
        #for convergence; decided to do a maximum number of iterations and chose the arbitrary number 1000 for iterations. 
        for i in range(0, 1000): #don't know the global max, but can find the local max (unsupervised ML) 
            Pxj = 0 #setting Pxj to 0 before beginning summation. 
            
            #sequence in this function is a list of sequences. Need to iterate through each seq in the list. 
            for seq in sequences: 
                #want both outputs from the forward and reverse algorithms. 
                f_Px, f_matrix = self.forward(seq)
                r_Px, r_matrix = self.backward(seq)
                
                #do this for every sequence and continually add. 
                Pxj += (f_Px + r_Px) / 2 
                
                #need to now update our probability matrices. 
                for index, holder in enumerate(seq):
                    #note holder here is A,C,G,or T
                    for state in self._hidden_states: #Island or Genome     
                        if index == 0: #if at beginning 
                            # Update initial state matrix:
                            # 𝐵𝑘=∑𝑗1/𝑃(𝑥𝑗)∗𝑓𝑗𝑘(0)𝑟𝑗𝑘(0). 
                            initial_prob[state] += (f_matrix[index][state] * r_matrix[index][state])
                            # print(initial_prob)
                        if index != 0:
                            #Update emission matrix  𝐸𝑘  by summing over all positions ( 𝑖=1…𝑇 ):
                            #𝐸𝑘(𝜎)=∑𝑗1/𝑃(𝑥𝑗)∑𝑖|𝑥𝑗𝑖=𝜎𝑓𝑗𝑘(𝑖)𝑟𝑗𝑘(𝑖). 
                            emission_prob[state][holder] += (f_matrix[index][state] * r_matrix[index][state])       
                        #when we reach the end of our seq.  
                        if index == len(seq) - 1:               
                            break #break out of for loop. 
                        
                        #Update transition matrix  𝐴𝑘𝑙  by summing over all positions ( 𝑖=1…𝑇−1 ):
                        #𝐴𝑘𝑙=∑𝑗1/𝑃(𝑥𝑗)∑𝑖𝑓𝑗𝑘(𝑖)𝑎𝑘𝑙𝑒𝑙(𝑥𝑗𝑖+1)𝑟𝑗𝑙(𝑖+1). 
                        #where  𝑥𝑗 ,  𝑓𝑗 , and  𝑟𝑗  are sequence, forward matrix, and backward matrix for squence index  𝑗  respectively.
                        for next_state in self._hidden_states: #need next state. 
                            #I split this equation up onto separate lines since it was very long on ONE line. 
                            transition_prob[state][next_state] += (f_matrix[index][state]
                                                                * self._transitions[state][next_state]
                                                                * self._emissions[next_state][seq[index + 1]]
                                                                * r_matrix[index + 1][next_state])
             
            
            #want to continue to sum up the numerator and then divide by the one constant denomintor which is already summed.  
            #for initial prob matrix.
            for state in self._hidden_states: #I or G 
                initial_prob[state] = initial_prob[state] / Pxj
                
            #for emission prob matrix. 
            for state in self._hidden_states:
                for alph in self._alphabet:
                    emission_prob[state][alph] = emission_prob[state][alph] / Pxj 
                    
            #for transition prob matrix. 
            for state in self._hidden_states:
                for next_state in self._hidden_states:
                    transition_prob[state][next_state] = transition_prob[state][next_state] / Pxj

           
            #Maximization Step:
            #Calculate new model parameters as we did with Markov Chains:
            
            #getting all the summations or denominators for the maximization step. 
            #need to be out of the for seq in sequences loop for this to work. 
            sum_init = sum(initial_prob.values())
            #this was very tricky since we want to only sum within the nested dictionary. 
            #only want the values here (integers). take the sum of values for each index. 
            #then divide each value at each index/state by that respective sum. 
            sum_emission = {state: sum(emission_prob[state].values()) for state in self._hidden_states}
            sum_transition = {state: sum(transition_prob[state].values()) for state in self._hidden_states}
        
            #for initial prob matrix.
            #𝑏𝑘=𝐵𝑘/∑𝑘𝐵𝑘
            for state in self._hidden_states:
                initial_prob[state] = initial_prob[state] / sum_init 
            
            #for emission prob matrix. 
            #𝑒𝑘(𝜎)=𝐸𝑘(𝜎)/∑𝜎𝐸𝑘(𝜎)
            for state in self._hidden_states:
                for alph in self._alphabet:
                    emission_prob[state][alph] = emission_prob[state][alph] / sum_emission[state]
            
            #for transition prob matrix. 
            #𝑎𝑘𝑙=𝐴𝑘𝑙/∑𝑙𝐴𝑘𝑙 
            for state in self._hidden_states:
                for next_state in self._hidden_states:
                    transition_prob[state][next_state] = transition_prob[state][next_state] / sum_transition[state]
        
    
        #Termination 
        #Stop at convergence as measured by log likelihood or is maximum number of iterations has been reached.  
        #need .copy() to ensure we return the updated matrix. 
        self._initial = initial_prob.copy()
        self._emissions = emission_prob.copy()
        self._transitions = transition_prob.copy()
        
        #done
    
        

In [12]:
# This section of code will initialize your HMM with parameters as defined in the lecture slides
# for the identification of CpG Islands.
# All of this should be able to run whether or not you implement the Viterbi function!

hidden_states = ('I', 'G') # CpG Island or Genome
alphabet = ('A', 'C', 'G', 'T') # DNA Alphabet

model = HMM(alphabet, hidden_states, seed=70)

sequence = ["ACGCGATCATACTATATTAGCTAAATAGATACGCGCGCGCGCGCGATATATATATATAGCTAATGATCGATTACCCCCCCCCCCAATTA", "GCAGATCGATCGATATATTAGCTAAATAGATACGCGCGCGCGCGCGATATATGCATATAGCTAATGATGACCCCCGCGCA", "ACATCGATCTGATCGAAATAGATACGCGCGCGCGCGCGATATATATATATAGCTAATACTTGATCGATGCAA"]

model.baum_welch(sequence)
print(model)

Alphabet: ['A', 'C', 'G', 'T']
Hidden States: ['G', 'I']
Initial Probabilities: {
    "G": 0.480482653962524,
    "I": 0.519517346037476
}
Transition Probabilities: {
    "G": {
        "G": 0.6514755213356083,
        "I": 0.3485244786643918
    },
    "I": {
        "G": 0.6035452125544299,
        "I": 0.39645478744557
    }
}
Emission Probabilities: {
    "G": {
        "A": 0.33527124184256984,
        "C": 0.2841112364196943,
        "G": 0.1967270011087938,
        "T": 0.18389052062894198
    },
    "I": {
        "A": 0.2957068421180174,
        "C": 0.08628578837158679,
        "G": 0.23937315169824658,
        "T": 0.37863421781214923
    }
}


### After one iteration:
```Alphabet: ['A', 'C', 'G', 'T']
Hidden States: ['G', 'I']
Initial Probabilities: {
    "G": 0.480482653962524,
    "I": 0.519517346037476
}
Transition Probabilities: {
    "G": {
        "G": 0.6514755213356082,
        "I": 0.3485244786643918
    },
    "I": {
        "G": 0.6035452125544299,
        "I": 0.39645478744557
    }
}
Emission Probabilities: {
    "G": {
        "A": 0.3422919266954332,
        "C": 0.2811097197581482,
        "G": 0.1946505595358382,
        "T": 0.1819477940105804
    },
    "I": {
        "A": 0.3179475236592407,
        "C": 0.08356033398812515,
        "G": 0.23181760304768745,
        "T": 0.36667453930494665
    }
}```

### My Work for Baum-Welch Algorithm Below

In [68]:
stats = [{'a':1000, 'b':3000, 'c': 100}, {'a':10, 'b':400, 'c': 1000}, {'a':20, 'b':4, 'c': 10}]

In [82]:
test = [2,3,4,5]
sum(test)

14

In [74]:
for index, value in enumerate(stats):
    # print(index)
    test = max(stats[index].values())
    final = []
    final.append(test)
    print(type(final))

<class 'list'>
<class 'list'>
<class 'list'>


In [35]:
test = {}
state = ["Hi", "Now", "Time"]
count = 20

In [36]:
for i in state:
    test[i] = count

In [37]:
test

{'Hi': 20, 'Now': 20, 'Time': 20}

In [8]:
test

{'Hi': 20}

In [28]:
test2 = {}
main = "Family"
states = ["Lauth", "Smith"]
count = [0.6, 0.7]
count2 = 0.8

In [29]:
test2[main] = {}

In [30]:
test2

{'Family': {}}

In [33]:
for state in states:
    test2[state] = {c for c in count}

In [34]:
test2

{'Family': {}, 'Lauth': {0.6, 0.7}, 'Smith': {0.6, 0.7}}

In [2]:
seq = "AATTCCGG"
list(seq)

['A', 'A', 'T', 'T', 'C', 'C', 'G', 'G']

In [80]:
test = 20 * 40

test

800

In [11]:
test = (1 + 1) / 2

In [12]:
test

1.0