## **** Viterbi training and Baum-Welch ****

Submitted by:
*   Ilay Anais
*   Hadar Pur


# **** Imports ****

In [None]:
import numpy as np
import math

**Section c:**

Implement the two parameter inference algorithms we saw in class: Viterbi training and Baum-Welch. Write a single program for both algorithms. The program should receive the following arguments:

• an observed sequence X

• an indicator for the inference method – V Viterbi training; B Baum-Welch

• initial values for the free model parameters T_IG, T_GI, E_IA, E_IT, E_IC, E_GA, E_GT, E_GC (in this order).

Your program should run the specified inference algorithm on the input sequence X, starting from the initial parameter values. It should halt when the traced score improves by less than eps=0.00001. The traced score is the log-likelihood for Baum-Welch and the Viterbi score for Viterbi training. Use the natural log (ln).
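The halting rule above can be sketched as a small driver loop. This is only an illustration: `run_until_converged` and the `step` callable are hypothetical names that do not appear in the notebook, and `max_iters` is an added safety cap that the assignment does not ask for.

```python
def run_until_converged(step, initial_score, eps=1e-5, max_iters=1000):
    """Repeat `step` until the traced score improves by less than `eps`.

    `step` is a hypothetical callable that performs one parameter update
    and returns the new traced score (natural-log scale).
    """
    trace = [initial_score]
    for _ in range(max_iters):
        new_score = step()
        trace.append(new_score)
        # Stop when the improvement over the previous score is below eps.
        # This also stops if the score gets worse.
        if new_score - trace[-2] < eps:
            break
    return trace


# Toy stand-in for a training step: a fixed list of made-up scores.
scores = iter([-53.0, -51.0, -50.5, -50.5])
trace = run_until_converged(lambda: next(scores), initial_score=-65.0)
print(trace)
```

Seeding the comparison with the initial score, rather than with an arbitrary constant, means the first update is judged against a real value.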

The program should produce a trace of the eight free model parameters and the associated score. See execution examples on the next page.

Additional important implementation notes:

• You may assume that the initial state of the HMM is S1 with probability 100%. The HMM path may end with any state.

• Write your code clearly and provide detailed documentation for the various functions / code blocks. Submit the code as an appendix to this assignment.

• In both algorithms make sure to use the formulas you received in (b) above for the parameter update step.

• Make sure to hold log-probabilities in your dynamic programming matrices, and when necessary apply the “log-of-sum-of-exponents” technique we saw in class (lecture #6). Notice that when computing expected sufficient statistics for the Baum-Welch algorithm (transition and emission counts), you need to convert values from log-probabilities to actual probabilities. Make sure that you are exponentiating probabilities that are conditional on the data and not joint probabilities with the data (e.g. P(S | X, Θ) and not P(S, X | Θ)). This is because joint probabilities with the data are very small and could evaluate to 0, and the expected counts you need for the Baum-Welch algorithm require conditional probabilities, which are larger.

• When debugging the procedures for computing the forward / backward matrices, it is useful to ensure that the likelihood can be computed at every column of these matrices (as you did in HW #3).

• Another useful validation for the Baum-Welch algorithm is to make sure that the sum of expected transition and emission counts per position is 1.
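The log-of-sum-of-exponents note can be illustrated with a short sketch. The helper name `log_sum_exp` and the toy log-probabilities are assumptions made for this example and are not part of the assignment.

```python
import numpy as np

def log_sum_exp(log_values):
    """Return log(sum(exp(log_values))) without leaving log space.

    Uses log(sum_l exp(a_l)) = a_max + log(sum_l exp(a_l - a_max)).
    """
    log_values = np.asarray(log_values, dtype=float)
    a_max = np.max(log_values)
    if np.isneginf(a_max):
        # Every term has probability 0, so the sum is 0 as well.
        return -np.inf
    return a_max + np.log(np.sum(np.exp(log_values - a_max)))


# Toy joint log-probabilities log P(S = s, X) for three states (made-up values).
log_joint = np.array([-1000.0, -1001.0, -1002.0])

# Exponentiating the joint values directly underflows to 0 ...
print(np.exp(log_joint).sum())

# ... while subtracting the log-likelihood first gives usable posteriors P(S = s | X).
log_likelihood = log_sum_exp(log_joint)
posterior = np.exp(log_joint - log_likelihood)
print(posterior, posterior.sum())
```

The same subtraction is what turns the forward and backward log-values into conditional probabilities before exponentiating.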

After implementing the two algorithms, we wish to execute each algorithm on the three following DNA sequences extracted from the virus:

X1: AAATTTTATTACGTTTAGTAGAAGAGAAAGGTAAACATGATGGTTCAGTGGTGCTAGATGAACAAACAATTATAAAATAAAATGAAGTATTTGTATAGAA

X2: CCCCCCAGGGGGGGGGGGGTCCCCCCCCCCCCCCCCCCCCCCAGGGGGGGGGGGGGGGGGGTCCCAGGGGGGGGGGGGGGGTCCAGGGTCCCCCCCCCCC

X3: CGCACACGTCCTTGAGGGCAGTTTTTTTGTCGCCCCCACGATTTTTCTCGGCCGCAGTTCCCGTTTTTTTTTGTTTTTTTTGTTGGCCTCTGGTTTTCTACGAGGCCGGGGAGAGGCCGGGGCGGCAGATTTTCTTGTTTTTCAGGATTGCTGGTTTGCTCAGTGTTTTTCTTCTTTGTTTGGCTGTGCCGGAAGAGATG

## **** Global vars ****


In [None]:
T_IG = 0.4
T_II = 1 - T_IG
T_GI = 0.4
T_GG = 1 - T_GI

E_IA = 0.1
E_IT = 0.2
E_IC = 0.3
E_IG = 1 - E_IA - E_IT - E_IC

E_GA = 0.4
E_GT = 0.3
E_GC = 0.2
E_GG = 1 - E_GA - E_GT - E_GC

epsilon = 0.00001
training_mode = {'V': 0, 'B' : 1}

S = 'AAATTTTATTACGTTTAGTAGAAGAGAAAGGTAAACATGATGG'

X1 = 'AAATTTTATTACGTTTAGTAGAAGAGAAAGGTAAACATGATGGTTCAGTGGTGCTAGATGAACAAACAATTATAAAATAAAATGAAGTATTTGTATAGAA'

X2 = 'CCCCCCAGGGGGGGGGGGGTCCCCCCCCCCCCCCCCCCCCCCAGGGGGGGGGGGGGGGGGGTCCCAGGGGGGGGGGGGGGGTCCAGGGTCCCCCCCCCCC'

X3 = 'CGCACACGTCCTTGAGGGCAGTTTTTTTGTCGCCCCCACGATTTTTCTCGGCCGCAGTTCCCGTTTTTTTTTGTTTTTTTTGTTGGCCTCTGGTTTTCTACGAGGCCGGGGAGAGGCCGGGGCGGCAGATTTTCTTGTTTTTCAGGATTGCTGGTTTGCTCAGTGTTTTTCTTCTTTGTTTGGCTGTGCCGGAAGAGATG'

#Bases
Bases = {"A": 0, "C": 1, "G": 2, "T": 3}

#States
States = {"S1":0, "S2": 1, "S3": 2, "S4": 3, "S5": 4, "S6": 5}

#Emissions matrix
model_emissions_test = np.zeros((len(States),len(Bases)))

#Transitions matrix
model_transitions_test = np.zeros((len(States),len(States)))

## **** HMM Model ****


In [None]:
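def log_calc(prob):
  # Natural log with a floor: probabilities below 1e-200 (including 0) map to a
  # large negative constant instead of -inf, so later additions never produce NaN.
  if prob < 1e-200:
    return -100000000000000
  else:
    return np.log(prob)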

In [None]:
#Emissions matrix setup
def set_emission_table():
  global model_emissions_test

  model_emissions_test[States['S1']][Bases['A']] = log_calc(E_IA)
  model_emissions_test[States['S1']][Bases['C']] = log_calc(E_IC)
  model_emissions_test[States['S1']][Bases['G']] = log_calc(E_IG)
  model_emissions_test[States['S1']][Bases['T']] = log_calc(E_IT)

  model_emissions_test[States['S2']][Bases['A']] = log_calc(1)
  model_emissions_test[States['S2']][Bases['C']] = log_calc(0)
  model_emissions_test[States['S2']][Bases['G']] = log_calc(0)
  model_emissions_test[States['S2']][Bases['T']] = log_calc(0)

  model_emissions_test[States['S3']][Bases['A']] = log_calc(E_GA)
  model_emissions_test[States['S3']][Bases['C']] = log_calc(E_GC)
  model_emissions_test[States['S3']][Bases['G']] = log_calc(E_GG)
  model_emissions_test[States['S3']][Bases['T']] = log_calc(E_GT)

  model_emissions_test[States['S4']][Bases['A']] = log_calc(E_GA)
  model_emissions_test[States['S4']][Bases['C']] = log_calc(E_GC)
  model_emissions_test[States['S4']][Bases['G']] = log_calc(E_GG)
  model_emissions_test[States['S4']][Bases['T']] = log_calc(E_GT)

  model_emissions_test[States['S5']][Bases['A']] = log_calc(E_GA)
  model_emissions_test[States['S5']][Bases['C']] = log_calc(E_GC)
  model_emissions_test[States['S5']][Bases['G']] = log_calc(E_GG)
  model_emissions_test[States['S5']][Bases['T']] = log_calc(E_GT)

  model_emissions_test[States['S6']][Bases['A']] = log_calc(0)
  model_emissions_test[States['S6']][Bases['C']] = log_calc(0)
  model_emissions_test[States['S6']][Bases['G']] = log_calc(0)
  model_emissions_test[States['S6']][Bases['T']] = log_calc(1)

#Transitions matrix setup
def set_transition_table():
  global model_transitions_test
  
  model_transitions_test[States['S1']][States['S1']] = log_calc(T_II)
  model_transitions_test[States['S1']][States['S2']] = log_calc(T_IG)
  model_transitions_test[States['S1']][States['S3']] = log_calc(0)
  model_transitions_test[States['S1']][States['S4']] = log_calc(0)
  model_transitions_test[States['S1']][States['S5']] = log_calc(0)
  model_transitions_test[States['S1']][States['S6']] = log_calc(0)

  model_transitions_test[States['S2']][States['S1']] = log_calc(0)
  model_transitions_test[States['S2']][States['S2']] = log_calc(0)
  model_transitions_test[States['S2']][States['S3']] = log_calc(1)
  model_transitions_test[States['S2']][States['S4']] = log_calc(0)
  model_transitions_test[States['S2']][States['S5']] = log_calc(0)
  model_transitions_test[States['S2']][States['S6']] = log_calc(0)

  model_transitions_test[States['S3']][States['S1']] = log_calc(0)
  model_transitions_test[States['S3']][States['S2']] = log_calc(0)
  model_transitions_test[States['S3']][States['S3']] = log_calc(0)
  model_transitions_test[States['S3']][States['S4']] = log_calc(1)
  model_transitions_test[States['S3']][States['S5']] = log_calc(0)
  model_transitions_test[States['S3']][States['S6']] = log_calc(0)

  model_transitions_test[States['S4']][States['S1']] = log_calc(0)
  model_transitions_test[States['S4']][States['S2']] = log_calc(0)
  model_transitions_test[States['S4']][States['S3']] = log_calc(0)
  model_transitions_test[States['S4']][States['S4']] = log_calc(0)
  model_transitions_test[States['S4']][States['S5']] = log_calc(1)
  model_transitions_test[States['S4']][States['S6']] = log_calc(0)

  model_transitions_test[States['S5']][States['S1']] = log_calc(0)
  model_transitions_test[States['S5']][States['S2']] = log_calc(0)
  model_transitions_test[States['S5']][States['S3']] = log_calc(T_GG)
  model_transitions_test[States['S5']][States['S4']] = log_calc(0)
  model_transitions_test[States['S5']][States['S5']] = log_calc(0)
  model_transitions_test[States['S5']][States['S6']] = log_calc(T_GI)

  model_transitions_test[States['S6']][States['S1']] = log_calc(T_II)
  model_transitions_test[States['S6']][States['S2']] = log_calc(T_IG)
  model_transitions_test[States['S6']][States['S3']] = log_calc(0)
  model_transitions_test[States['S6']][States['S4']] = log_calc(0)
  model_transitions_test[States['S6']][States['S5']] = log_calc(0)
  model_transitions_test[States['S6']][States['S6']] = log_calc(0)
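A quick sanity check for the table above is that every row of the transition matrix sums to 1 in probability space. The sketch below rebuilds the same six-state layout in a compact form. The function name `build_transitions` is hypothetical, and unlike `log_calc` it lets zero probabilities become `-inf` when taking logs.

```python
import numpy as np

def build_transitions(t_ig, t_gi):
    """6-state transition matrix in probability space (rows: from, columns: to).

    State order is S1..S6, mirroring the table filled in above.
    """
    T = np.zeros((6, 6))
    T[0, 0], T[0, 1] = 1 - t_ig, t_ig   # S1 -> S1 / S2
    T[1, 2] = 1.0                       # S2 -> S3
    T[2, 3] = 1.0                       # S3 -> S4
    T[3, 4] = 1.0                       # S4 -> S5
    T[4, 2], T[4, 5] = 1 - t_gi, t_gi   # S5 -> S3 / S6
    T[5, 0], T[5, 1] = 1 - t_ig, t_ig   # S6 -> S1 / S2
    return T


T = build_transitions(0.4, 0.4)
row_sums = T.sum(axis=1)
print(row_sums)

# Move to log space; forbidden transitions become -inf (warning suppressed).
with np.errstate(divide="ignore"):
    log_T = np.log(T)
```

Running the same row-sum check after every parameter update is a cheap way to catch a mis-assigned entry.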

## **** Viterbi Algorithm ****


In [None]:
def reconstruct_max_prob_annotation(seq, viterbi_matrix, viterbi_matrix_pointers, emmision_count, transition_count):
  # Finding cell V[n,j] with highest value
  current_state = "S1"
  old_state = current_state

  max_prob_annotation = -np.inf

  for state in States:
    prob = viterbi_matrix[States[state], len(seq)-1]
    if prob > max_prob_annotation:
      max_prob_annotation = prob
      current_state = state
  
  #Traceback from max_prob_annotation
  under = "--"
  emission = "" + seq[-1]

  # Count the emission at the last position, then step back along the pointers
  emmision_count[States[current_state]][Bases[seq[len(seq)-1]]] += 1 
  if len(seq) > 1:
    old_state = current_state
    current_state = viterbi_matrix_pointers[States[current_state], len(seq)-1]
    # transition into the last position
    transition_count[States[current_state]][States[old_state]] += 1

  for i in range(len(seq) - 2, -1, -1):
    emmision_count[States[current_state]][Bases[seq[i]]] += 1 
    # position 0 has no predecessor, so no transition is counted there
    if i > 0:
      old_state = current_state
      current_state = viterbi_matrix_pointers[States[current_state], i]
      transition_count[States[current_state]][States[old_state]] += 1

    emission = seq[i] + "  " +  emission
    under = "---" + under
  
  return emission, under, max_prob_annotation


def viterbiAlg(seq, emissions_matrix, transitions_matrix, emmision_count, transition_count):
  # initialize viterbi matrix
  viterbi_matrix = np.zeros((len(States),len(seq)))
  viterbi_matrix_pointers = np.ndarray((len(States), len(seq)), dtype = object)

  for j in States:
    viterbi_matrix[States[j]][0] = -np.inf

  viterbi_matrix[States['S1']][0] = emissions_matrix[States['S1'], Bases[seq[0]]] 
  viterbi_matrix_pointers[States['S1']][0] = 'S1'

  for i in range(1, len(seq)):
    for j in States:
      viterbi_matrix[States[j], i] = max([viterbi_matrix[States[l], i-1] + transitions_matrix[States[l], States[j]] for l in States])
      viterbi_matrix[States[j], i] += emissions_matrix[States[j], Bases[seq[i]]]

      for k in States:
        viterb = viterbi_matrix[States[k], i-1]
        transition = transitions_matrix[States[k], States[j]]
        emmision = emissions_matrix[States[j], Bases[seq[i]]]
        
        if viterbi_matrix[States[j], i] == viterb + transition + emmision:
          viterbi_matrix_pointers[States[j]][i] = k

  return reconstruct_max_prob_annotation(seq, viterbi_matrix, viterbi_matrix_pointers, emmision_count, transition_count)

def forwardAlg(seq, emissions_matrix, transitions_matrix):
  # forward matrix
  forward_matrix = np.zeros((len(States),len(seq)))
  
  # f[j,0] = -∞ for every state j except S1 (the HMM starts in S1)
  for j in States:
    forward_matrix[States[j]][0] = -np.inf

  forward_matrix[States['S1']][0] = emissions_matrix[States['S1'], Bases[seq[0]]]

  for i in range(1, len(seq)):
    for j in States:
      a_max = max([forward_matrix[States[l], i-1] + transitions_matrix[States[l], States[j]] for l in States])

      if a_max == -np.inf:
        forward_matrix[States[j], i] = -np.inf
      else:
        for k in States:
          a_k = forward_matrix[States[k], i-1] + transitions_matrix[States[k], States[j]]
          b_k = a_k - a_max
          forward_matrix[States[j], i] += np.exp(b_k)

        #log(∑l exp(al)) = log(∑l exp(amax + bl )) = log(exp(amax)∑l exp(bl)) = amax + log(∑l exp(bl))
        forward_matrix[States[j], i] = log_calc(forward_matrix[States[j], i]) + a_max + emissions_matrix[States[j], Bases[seq[i]]] 
  
  return forward_matrix

def backwardAlg(seq, emissions_matrix, transitions_matrix):
  # backward matrix
  backward_matrix = np.zeros((len(States),len(seq)))

  for i in range(len(seq) - 2, -1, -1):
    for j in States:
      a_max = max([backward_matrix[States[l], i+1] + transitions_matrix[States[j], States[l]] + emissions_matrix[States[l], Bases[seq[i+1]]]for l in States])

      if a_max == -np.inf:
        backward_matrix[States[j], i] = -np.inf
      else:
        for k in States:
          a_k = backward_matrix[States[k], i+1] + transitions_matrix[States[j], States[k]] + emissions_matrix[States[k], Bases[seq[i+1]]]
          b_k = a_k - a_max
          backward_matrix[States[j], i] += np.exp(b_k)

        #log(∑l exp(al)) = log(∑l exp(amax + bl )) = log(exp(amax)∑l exp(bl)) = amax + log(∑l exp(bl))
        backward_matrix[States[j], i] = log_calc(backward_matrix[States[j], i]) + a_max
  
  return backward_matrix

def likelihoodAlgForward(seq, forward_matrix):
  # print("\nlikelihoodAlgForward")
  max_likelihoond = 0
  a_max = max([forward_matrix[States[j], len(seq) - 1] for j in States])

  for k in States:
    a_k = forward_matrix[States[k], len(seq) - 1]
    b_k = a_k - a_max
    max_likelihoond += np.exp(b_k)

  max_likelihoond = log_calc(max_likelihoond) + a_max
  return max_likelihoond

def likelihoodAlgBackward(seq, backward_matrix, transitions_matrix, emissions_matrix):
  # print("\nlikelihoodAlgBackward")
  # The HMM starts in S1 with probability 1, so only S1 contributes at column 0:
  # log P(X) = log e_S1(x_1) + b[S1, 0]. The backward matrix is left unmodified.
  return backward_matrix[States['S1'], 0] + emissions_matrix[States['S1'], Bases[seq[0]]]

def computeMaximumAPosterioriProbability(seq, forward_matrix, backward_matrix):
  # Objective: for a given sequence of observed of symbols X = X1...Xn and index i=1..n, compute the probability P(X , Si=sj |HMM) = ∑S|Si=sj P(X,S|HMM) for every sj
  # We want to sum over the probabilities of all paths in the decoding matrix that pass through cell [i,j]
  # print("\nMaximum A-Posteriori Probability:")

  emission = ""
  under = "--"

  for i in range(len(seq) - 1, -1, -1):
    max_state = 'S1'
    max_prob = -np.inf

    # find max
    for j in States:
      prob = forward_matrix[States[j]][i] + backward_matrix[States[j]][i]
      if prob > max_prob:
        max_prob = prob
        max_state = j

    emission = seq[i] + "  " + emission
    under = "---" + under

  likelihood = likelihoodAlgForward(seq, forward_matrix)

  return emission, under, likelihood
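As a debugging aid for the forward and backward procedures, the likelihood can be recomputed at every column and compared. The sketch below uses a made-up 2-state, 2-symbol HMM rather than the gene model, and relies on NumPy's `np.logaddexp.reduce` for the log-sum-exp step. The names `forward_log` and `backward_log` are hypothetical.

```python
import numpy as np

def forward_log(x, log_pi, log_T, log_E):
    """f[j, i] = log P(x_0..x_i, S_i = j)."""
    k, n = len(log_pi), len(x)
    f = np.full((k, n), -np.inf)
    f[:, 0] = log_pi + log_E[:, x[0]]
    for i in range(1, n):
        for j in range(k):
            f[j, i] = np.logaddexp.reduce(f[:, i - 1] + log_T[:, j]) + log_E[j, x[i]]
    return f

def backward_log(x, log_T, log_E):
    """b[j, i] = log P(x_{i+1}..x_{n-1} | S_i = j); the last column is log(1) = 0."""
    k, n = log_T.shape[0], len(x)
    b = np.zeros((k, n))
    for i in range(n - 2, -1, -1):
        for j in range(k):
            b[j, i] = np.logaddexp.reduce(log_T[j, :] + log_E[:, x[i + 1]] + b[:, i + 1])
    return b


# Toy HMM with made-up numbers (not the gene model from this notebook).
log_pi = np.log(np.array([0.6, 0.4]))
log_T = np.log(np.array([[0.7, 0.3], [0.2, 0.8]]))
log_E = np.log(np.array([[0.9, 0.1], [0.3, 0.7]]))
x = [0, 1, 1, 0, 1]

f = forward_log(x, log_pi, log_T, log_E)
b = backward_log(x, log_T, log_E)

# log P(X) recomputed from every column i: log sum_j exp(f[j, i] + b[j, i]).
column_log_likelihoods = np.logaddexp.reduce(f + b, axis=0)
print(column_log_likelihoods)
```

If the printed values are not all equal, one of the two recursions is indexing a transition or an emission incorrectly.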

## **** Viterbi training algorithm ****


In [None]:
def setInitialGuess(T_IG_INPUT = 0.4, T_GI_INPUT = 0.4, E_IA_INPUT = 0.1, E_IT_INPUT = 0.2, E_IC_INPUT = 0.3, E_GA_INPUT = 0.4, E_GT_INPUT = 0.3, E_GC_INPUT = 0.2):
  global T_IG, T_II, T_GI, T_GG, E_IA, E_IT, E_IC, E_IG, E_GA, E_GT, E_GC, E_GG 
  T_IG = T_IG_INPUT 
  T_II = 1 - T_IG

  # T_IG = max(0, T_IG)
  # T_II = max(0, T_II)

  T_GI = T_GI_INPUT
  T_GG = 1 - T_GI
  
  # T_GI = max(0, T_GI)
  # T_GG = max(0, T_GG)

  E_IA = E_IA_INPUT
  E_IT = E_IT_INPUT
  E_IC = E_IC_INPUT
  E_IG = 1 - E_IA - E_IT - E_IC

  # E_IA = max(0, E_IA)
  # E_IT = max(0, E_IT)
  # E_IC = max(0, E_IC)
  # E_IG = max(0, E_IG)

  E_GA = E_GA_INPUT
  E_GT = E_GT_INPUT
  E_GC = E_GC_INPUT
  E_GG = 1 - E_GA - E_GT - E_GC
  
  # E_GA = max(0, E_GA)
  # E_GT = max(0, E_GT)
  # E_GC = max(0, E_GC)
  # E_GG = max(0, E_GG)

  set_emission_table()
  set_transition_table()


In [None]:
def updateParameters(emmision_count, transition_count):
  global T_IG, T_II, T_GI, T_GG, E_IA, E_IT, E_IC, E_IG, E_GA, E_GT, E_GC, E_GG 
  
  # transition_count
  S5Transition = transition_count[States['S5'], States['S3']] + transition_count[States['S5'], States['S6']]
  if S5Transition != 0:
    T_GI = transition_count[States['S5'], States['S6']] / S5Transition
    T_GG = 1 - T_GI
    # T_GI = max(0, T_GI)
    # T_GG = max(0, T_GG)

  else:
    T_GI = 0
    T_GG = 1 - T_GI

  S1Transition = transition_count[States['S1'], States['S1']] + transition_count[States['S1'], States['S2']]
  S6Transition = transition_count[States['S6'], States['S1']] + transition_count[States['S6'], States['S2']]
  if S1Transition + S6Transition != 0:
    T_IG = (transition_count[States['S1'], States['S2']] + transition_count[States['S6'], States['S2']]) / (S1Transition + S6Transition)
    T_II = 1 - T_IG

    # T_IG = max(0, T_IG)
    # T_II = max(0, T_II)
  else:
    T_IG = 0
    T_II = 1 - T_IG

  # emmision_count
  S1Emission = emmision_count[States['S1'], Bases['A']] + emmision_count[States['S1'], Bases['T']] + emmision_count[States['S1'], Bases['C']] + emmision_count[States['S1'], Bases['G']]
  if S1Emission != 0:
    E_IA = emmision_count[States['S1'], Bases['A']] / S1Emission
    E_IT = emmision_count[States['S1'], Bases['T']] / S1Emission
    E_IC = emmision_count[States['S1'], Bases['C']] / S1Emission
    E_IG = 1 - E_IA - E_IT - E_IC 

    # E_IA = max(0, E_IA)
    # E_IT = max(0, E_IT)
    # E_IC = max(0, E_IC)
    # E_IG = max(0, E_IG)
  else:
    E_IA = 0
    E_IT = 0
    E_IC = 0
    E_IG = 1 - E_IA - E_IT - E_IC

  S3Emission = emmision_count[States['S3'], Bases['A']] + emmision_count[States['S3'], Bases['T']] + emmision_count[States['S3'], Bases['C']] + emmision_count[States['S3'], Bases['G']]
  S4Emission = emmision_count[States['S4'], Bases['A']] + emmision_count[States['S4'], Bases['T']] + emmision_count[States['S4'], Bases['C']] + emmision_count[States['S4'], Bases['G']]
  S5Emission = emmision_count[States['S5'], Bases['A']] + emmision_count[States['S5'], Bases['T']] + emmision_count[States['S5'], Bases['C']] + emmision_count[States['S5'], Bases['G']]
  if S3Emission + S4Emission + S5Emission != 0:
    E_GA = (emmision_count[States['S3'], Bases['A']] + emmision_count[States['S4'], Bases['A']]  + emmision_count[States['S5'], Bases['A']]) / (S3Emission + S4Emission + S5Emission)
    E_GT = (emmision_count[States['S3'], Bases['T']] + emmision_count[States['S4'], Bases['T']]  + emmision_count[States['S5'], Bases['T']]) / (S3Emission + S4Emission + S5Emission)
    E_GC = (emmision_count[States['S3'], Bases['C']] + emmision_count[States['S4'], Bases['C']]  + emmision_count[States['S5'], Bases['C']]) / (S3Emission + S4Emission + S5Emission)
    E_GG = 1 - E_GA - E_GT - E_GC

    # E_GA = max(0, E_GA)
    # E_GT = max(0, E_GT)
    # E_GC = max(0, E_GC)
    # E_GG = max(0, E_GG)
  else:
    E_GA = 0
    E_GT = 0
    E_GC = 0
    E_GG = 1 - E_GA - E_GT - E_GC

  # update
  set_emission_table()
  set_transition_table()


In [None]:
def viterbi_Training(seq, T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT, file):
  global model_emissions_test, model_transitions_test

  emmision_count = np.zeros((len(States),len(Bases)))
  transition_count = np.zeros((len(States),len(States)))

  # initiate vars and tables
  setInitialGuess(T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT)

  #get initial viterbi score
  emission, under, viterbi_score = viterbiAlg(seq, model_emissions_test, model_transitions_test, emmision_count, transition_count)

  print("\n\n"+emission)  
  print(under)
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t\t\tscore (Viterbi score)\t|")
  print(f"|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(viterbi_score, 4):.4f}\t\t|")

  if file != None:
    file.write("\n\n" + emission)
    file.write("\n" + under)
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t\t\tscore (Viterbi score)\t|")
    file.write(f"\n|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(viterbi_score, 4):.4f}\t\t|")
    
  # Compare each new score against the previous one, starting from the initial score
  old_viterbi_score = viterbi_score
  viterbi_score_less_then = True
  
  while viterbi_score_less_then:
    #update all parameters
    updateParameters(emmision_count, transition_count)

    #reset counts
    emmision_count = np.zeros((len(States),len(Bases)))
    transition_count = np.zeros((len(States),len(States)))

    emission, under, viterbi_score = viterbiAlg(seq, model_emissions_test, model_transitions_test, emmision_count, transition_count)
    print(f"|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(viterbi_score, 4):.4f}\t\t|")

    if file != None:
      file.write(f"\n|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(viterbi_score, 4):.4f}\t\t|")

    if abs(viterbi_score - old_viterbi_score) < epsilon:
      viterbi_score_less_then = False

    old_viterbi_score = viterbi_score

  return viterbi_score

In [None]:
def calculate_expected_transitions(seq, forward_matrix, backward_matrix, transition_count):
  global model_emissions_test, model_transitions_test

  # Subtract the log-likelihood before exponentiating, so that we exponentiate
  # P(S_i=j, S_i+1=k | X) rather than the much smaller joint probability with X.
  log_likelihood = likelihoodAlgForward(seq, forward_matrix)

  for i in range(0, len(seq) - 1):
    for j in States:      
      for k in States:
        forward = forward_matrix[States[j]][i]
        backward = backward_matrix[States[k]][i+1]
        transition = model_transitions_test[States[j]][States[k]]
        emission = model_emissions_test[States[k]][Bases[seq[i+1]]]

        transition_count[States[j]][States[k]] += np.exp(forward + transition + emission + backward - log_likelihood)

def calculate_expected_emmissions(seq, forward_matrix, backward_matrix, emmision_count):
  # Same normalization: exponentiate P(S_i=j | X), not P(S_i=j, X).
  log_likelihood = likelihoodAlgForward(seq, forward_matrix)

  for i in range(0, len(seq)):
    for j in States:      
      forward = forward_matrix[States[j]][i]
      backward = backward_matrix[States[j]][i]

      emmision_count[States[j]][Bases[seq[i]]] += np.exp(forward + backward - log_likelihood)

def baum_welch_Training(seq, T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT, file):
  global model_emissions_test, model_transitions_test
  emmision_count = np.zeros((len(States),len(Bases)))
  transition_count = np.zeros((len(States),len(States)))

  setInitialGuess(T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT)

  forwardMat = forwardAlg(seq, model_emissions_test, model_transitions_test) 
  backwardMat = backwardAlg(seq, model_emissions_test, model_transitions_test) 
  
  emission, under, baum_welch_score = computeMaximumAPosterioriProbability(seq, forwardMat, backwardMat)

  calculate_expected_emmissions(seq, forwardMat, backwardMat, emmision_count)
  calculate_expected_transitions(seq, forwardMat, backwardMat, transition_count)

  print("\n\n"+emission)  
  print(under)
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t\t\tscore (log likelihood)\t|")
  print(f"|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(baum_welch_score, 4):.4f}\t\t|")

  if file != None:
    file.write("\n\n" + emission)
    file.write("\n" + under)
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t\t\tscore (log likelihood)\t|")
    file.write(f"\n|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(baum_welch_score, 4):.4f}\t\t|")

  # Compare each new score against the previous one, starting from the initial score
  old_baum_welch_score = baum_welch_score
  baum_welch_score_less_then = True

  while baum_welch_score_less_then:
    updateParameters(emmision_count, transition_count)
    emmision_count = np.zeros((len(States),len(Bases)))
    transition_count = np.zeros((len(States),len(States)))

    forwardMat = forwardAlg(seq, model_emissions_test, model_transitions_test) 
    backwardMat = backwardAlg(seq, model_emissions_test, model_transitions_test) 

    baum_welch_score = likelihoodAlgForward(seq, forwardMat)
    calculate_expected_emmissions(seq, forwardMat, backwardMat, emmision_count)
    calculate_expected_transitions(seq, forwardMat, backwardMat, transition_count)
    print(f"|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(baum_welch_score, 4):.4f}\t\t|")

    if file != None:
      file.write(f"\n|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t\t\t{round(baum_welch_score, 4):.4f}\t\t|")

    if abs(baum_welch_score - old_baum_welch_score) < epsilon:
      baum_welch_score_less_then = False

    old_baum_welch_score = baum_welch_score

  return baum_welch_score
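One way to validate the expected counts used by Baum-Welch is to check that, at every position, the posterior state probabilities sum to 1 and the posterior transition probabilities sum to 1. The sketch below does this on a made-up 2-state HMM in vectorized form. All names (`log_forward_backward`, `posteriors`, `gamma`, `xi`) are illustrative and do not come from the notebook.

```python
import numpy as np

def log_forward_backward(x, log_pi, log_T, log_E):
    """Log-space forward (f) and backward (b) matrices, shape (states, positions)."""
    k, n = len(log_pi), len(x)
    f = np.empty((k, n))
    b = np.zeros((k, n))
    f[:, 0] = log_pi + log_E[:, x[0]]
    for i in range(1, n):
        f[:, i] = np.logaddexp.reduce(f[:, i - 1, None] + log_T, axis=0) + log_E[:, x[i]]
    for i in range(n - 2, -1, -1):
        b[:, i] = np.logaddexp.reduce(log_T + (log_E[:, x[i + 1]] + b[:, i + 1])[None, :], axis=1)
    return f, b

def posteriors(x, log_pi, log_T, log_E):
    """gamma[j, i] = P(S_i=j | X); xi[i, j, l] = P(S_i=j, S_{i+1}=l | X)."""
    f, b = log_forward_backward(x, log_pi, log_T, log_E)
    log_lik = np.logaddexp.reduce(f[:, -1])
    # Subtract log P(X) *before* exponentiating, so we never exponentiate joints.
    gamma = np.exp(f + b - log_lik)
    xi = np.exp(f[:, :-1].T[:, :, None] + log_T[None, :, :]
                + (log_E[:, x[1:]] + b[:, 1:]).T[:, None, :] - log_lik)
    return gamma, xi


# Toy HMM with made-up numbers (not the gene model from this notebook).
log_pi = np.log(np.array([0.6, 0.4]))
log_T = np.log(np.array([[0.7, 0.3], [0.2, 0.8]]))
log_E = np.log(np.array([[0.9, 0.1], [0.3, 0.7]]))
x = np.array([0, 1, 1, 0, 1])

gamma, xi = posteriors(x, log_pi, log_T, log_E)
print(gamma.sum(axis=0))     # one value per position
print(xi.sum(axis=(1, 2)))   # one value per adjacent pair of positions
```

Summing `gamma` over positions grouped by emitted symbol gives the expected emission counts, and summing `xi` over positions gives the expected transition counts.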

In [None]:
def gene_hmm_train(seq, mode, T_IG_INPUT = 0.4, T_GI_INPUT = 0.4, E_IA_INPUT = 0.1, E_IT_INPUT = 0.2, E_IC_INPUT = 0.3, E_GA_INPUT = 0.4, E_GT_INPUT = 0.3, E_GC_INPUT = 0.2, file = None):
  score = -np.inf

  if training_mode[mode] == 0:
    score = viterbi_Training(seq, T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT, file)
  elif training_mode[mode] == 1:
    score = baum_welch_Training(seq, T_IG_INPUT, T_GI_INPUT, E_IA_INPUT, E_IT_INPUT, E_IC_INPUT, E_GA_INPUT, E_GT_INPUT, E_GC_INPUT, file)

  return score

In [None]:
# viterbi
print("Viterbi:")
score = gene_hmm_train(S, 'V')
print("\n\n\nDone.")


Viterbi:


A  A  A  T  T  T  T  A  T  T  A  C  G  T  T  T  A  G  T  A  G  A  A  G  A  G  A  A  A  G  G  T  A  A  A  C  A  T  G  A  T  G  G
--------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (Viterbi score)	|
|	0.40	0.40	0.10	0.20	0.30	0.40	0.30	0.20			-64.9658		|
|	0.29	0.18	0.50	0.00	0.00	0.39	0.33	0.06			-53.1671		|
|	0.15	0.22	0.58	0.00	0.00	0.33	0.41	0.07			-51.9011		|
|	0.15	0.22	0.58	0.00	0.00	0.33	0.41	0.07			-51.9011		|



Done.


In [None]:
# baum welch
print("Baum_Welch:")
score = gene_hmm_train(S, 'B')
print("\n\n\nDone.")

Baum_Welch:


A  A  A  T  T  T  T  A  T  T  A  C  G  T  T  T  A  G  T  A  G  A  A  G  A  G  A  A  A  G  G  T  A  A  A  C  A  T  G  A  T  G  G  
-----------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (log likelihood)	|
|	0.40	0.40	0.10	0.20	0.30	0.40	0.30	0.20			-61.9375		|
|	0.30	0.28	0.42	0.24	0.00	0.39	0.29	0.07			-51.7691		|
|	0.24	0.30	0.52	0.17	0.00	0.35	0.33	0.08			-50.7574		|
|	0.19	0.32	0.56	0.11	0.00	0.31	0.38	0.08			-49.9532		|
|	0.16	0.30	0.58	0.09	0.00	0.28	0.42	0.09			-49.6298		|
|	0.14	0.28	0.58	0.08	0.00	0.27	0.44	0.09			-49.5454		|
|	0.13	0.27	0.58	0.08	0.00	0.27	0.45	0.09			-49.5224		|
|	0.13	0.26	0.58	0.08	0.00	0.27	0.45	0.09			-49.5152		|
|	0.13	0.26	0.58	0.08	0.00	0.27	0.45	0.09			-49.5119		|
|	0.12	0.26	0.58	0.08	0.00	0.26	0.46	0.09			-49.5091		|
|	0.12	0.26	0.57	0.08	0.00	0.26	0.46	0.09			-49.5058		|
|	0.12	0.26	0.57	0.08	0.00	0.26	0.46	0.

**Section d:**

Use your program to find for each of the three DNA sequences specified above the values of the parameters that maximize the Viterbi score (use Viterbi training). 

Run your program from multiple starting points for each input instance to increase your confidence. 

In your solution, write the best parameter set you found for each sequence, and describe your strategy – which starting points you chose, and how you chose them. 

Your strategy may use random sampling. 
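A possible way to draw random starting points is sketched below. It is an assumption of this example, not a requirement of the assignment, that each emission distribution is sampled from a flat Dirichlet, which is uniform over all valid four-way distributions. The function name `sample_start_point` and the fixed seed are hypothetical.

```python
import numpy as np

def sample_start_point(rng):
    """Draw one starting point in the order T_IG, T_GI, E_IA, E_IT, E_IC, E_GA, E_GT, E_GC."""
    t_ig, t_gi = rng.uniform(0.0, 1.0, size=2)
    # Flat Dirichlet over (A, T, C, G): uniform on the probability simplex.
    e_i = rng.dirichlet(np.ones(4))
    e_g = rng.dirichlet(np.ones(4))
    # The G emissions are implied by the other three, so they are not returned.
    return (t_ig, t_gi, e_i[0], e_i[1], e_i[2], e_g[0], e_g[1], e_g[2])


rng = np.random.default_rng(0)  # fixed seed so that the runs can be reproduced
starts = [sample_start_point(rng) for _ in range(5)]
for start in starts:
    print("\t".join(f"{value:.2f}" for value in start))
```

Drawing the first emission uniformly on [0, 1] and the following ones from whatever mass remains also gives valid distributions, but it tends to give the first base a larger share than the rest.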

Your solution should include a brief report of your experiments (~ one page) and a separate file named traces-viterbi.txt containing a well-marked list of the traces for the runs you made to support your final conclusion.

In [None]:
def viterbi_multiple_strategies(seq, num_of_strategies, file):
  print("Searching for the highest Viterbi score")
  max_viterbi_score = -np.inf

  for i in range(num_of_strategies):
    T_IG_RAND = np.random.uniform(0, 1)
    T_GI_RAND = np.random.uniform(0, 1)
    E_IA_RAND = np.random.uniform(0, 1)
    E_IT_RAND = np.random.uniform(0, 1 - E_IA_RAND)
    E_IC_RAND = np.random.uniform(0, 1 - E_IA_RAND - E_IT_RAND)
    E_GA_RAND = np.random.uniform(0, 1)
    E_GT_RAND = np.random.uniform(0, 1 - E_GA_RAND) 
    E_GC_RAND = np.random.uniform(0, 1 - E_GA_RAND - E_GT_RAND)
    score = gene_hmm_train(seq, 'V', T_IG_RAND, T_GI_RAND, E_IA_RAND, E_IT_RAND, E_IC_RAND, E_GA_RAND, E_GT_RAND, E_GC_RAND, file)
    if score > max_viterbi_score:
       max_viterbi_score = score
       best_strategy_start = f"|\t{round(T_IG_RAND, 2):.2f}\t{round(T_GI_RAND, 2):.2f}\t{round(E_IA_RAND, 2):.2f}\t{round(E_IT_RAND, 2):.2f}\t{round(E_IC_RAND, 2):.2f}\t{round(E_GA_RAND, 2):.2f}\t{round(E_GT_RAND, 2):.2f}\t{round(E_GC_RAND, 2):.2f}\t|"
       best_strategy_finish = f"|\t{round(T_IG, 2):.2f}\t{round(T_GI, 2):.2f}\t{round(E_IA, 2):.2f}\t{round(E_IT, 2):.2f}\t{round(E_IC, 2):.2f}\t{round(E_GA, 2):.2f}\t{round(E_GT, 2):.2f}\t{round(E_GC, 2):.2f}\t|"

  print("\n\nBest strategy:")
  # print("  ".join(seq))
  print("seq = ", seq)
  print("\nstart probs:")
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
  print(best_strategy_start)
  print("\nfinish probs:")
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
  print(best_strategy_finish)
  print("\nscore (Viterbi score) = ", max_viterbi_score)

  if (file != None):
    file.write("\n\nBest strategy:")
    file.write("\nseq = " + seq)
    file.write("\n\nstart probs:")
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
    file.write("\n" + best_strategy_start)  
    file.write("\n\nfinish probs:")
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
    file.write("\n" + best_strategy_finish)    
    file.write(f"\n\nscore (Viterbi score) = {max_viterbi_score}")


In [None]:
f = open("traces-viterbi.txt", "w")
f.write("Seq X1\n")
print("Seq X1\n")
viterbi_multiple_strategies(X1, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")

Seq X1

Searching for the highest Viterbi score


A  A  A  T  T  T  T  A  T  T  A  C  G  T  T  T  A  G  T  A  G  A  A  G  A  G  A  A  A  G  G  T  A  A  A  C  A  T  G  A  T  G  G  T  T  C  A  G  T  G  G  T  G  C  T  A  G  A  T  G  A  A  C  A  A  A  C  A  A  T  T  A  T  A  A  A  A  T  A  A  A  A  T  G  A  A  G  T  A  T  T  T  G  T  A  T  A  G  A  A
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (Viterbi score)	|
|	0.83	0.01	0.27	0.16	0.17	0.84	0.09	0.07			-214.5465		|
|	0.14	0.14	0.21	0.29	0.08	0.51	0.29	0.06			-128.3286		|
|	0.13	0.04	0.23	0.23	0.15	0.46	0.31	0.05			-124.8177		|
|	0.50	0.00	1.00	0.00	0.00	0.43	0.31	0.06			-120.1826		|
|	0.33	0.00	1.00	0.00	0.00	0.42	0.31	0.06			-120.2935		

In [None]:
print("Seq X2\n")
f.write("\n\n\nSeq X2\n")
viterbi_multiple_strategies(X2, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")

Seq X2

Searching for the highest Viterbi score


C  C  C  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  A  G  G  G  T  C  C  C  C  C  C  C  C  C  C  C
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (Viterbi score)	|
|	0.13	0.61	0.87	0.03	0.08	0.07	0.53	0.08			-194.3835		|
|	0.09	0.25	0.00	0.00	1.00	0.00	0.00	0.00			-22.6775		|
|	0.09	0.25	0.00	0.00	1.00	0.00	0.00	0.00			-22.6775		|



In [None]:
print("Seq X3\n")
f.write("\n\n\nSeq X3\n")
viterbi_multiple_strategies(X3, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")
f.close()

Seq X3

Searching for the highest Viterbi score


C  G  C  A  C  A  C  G  T  C  C  T  T  G  A  G  G  G  C  A  G  T  T  T  T  T  T  T  G  T  C  G  C  C  C  C  C  A  C  G  A  T  T  T  T  T  C  T  C  G  G  C  C  G  C  A  G  T  T  C  C  C  G  T  T  T  T  T  T  T  T  T  G  T  T  T  T  T  T  T  T  G  T  T  G  G  C  C  T  C  T  G  G  T  T  T  T  C  T  A  C  G  A  G  G  C  C  G  G  G  G  A  G  A  G  G  C  C  G  G  G  G  C  G  G  C  A  G  A  T  T  T  T  C  T  T  G  T  T  T  T  T  C  A  G  G  A  T  T  G  C  T  G  G  T  T  T  G  C  T  C  A  G  T  G  T  T  T  T  T  C  T  T  C  T  T  T  G  T  T  T  G  G  C  T  G  T  G  C  C  G  G  A  A  G  A  G  A  T  G
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

**Section E:**

For each of the three DNA sequences specified above, use your program to find the parameter values that maximize the log-likelihood (use Baum-Welch).

Run your program from multiple starting points for each input instance to increase your confidence. 

In your solution, write the best parameter set you found for each sequence, and describe your strategy – which starting points you chose, and how you chose them. 

Your strategy may use random sampling. 

Your solution should include a brief report of your experiments (~ one page) and a separate file named traces-baum-welch.txt containing a well-marked list of the traces for the runs you made to support your final conclusion.
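One possible way to choose random starting points is sketched below. It is an illustration under stated assumptions, not the method used in this notebook: the helper name `sample_start_point` is hypothetical, and it assumes that the emission probability not listed among the free parameters (G) takes whatever mass remains in each state. Drawing each emission vector from a flat Dirichlet distribution treats all four letters symmetrically, whereas chained uniform draws (A first, then T from what is left, then C) tend to give later letters smaller values.

```python
import numpy as np

def sample_start_point(rng):
    """Draw one random starting point for the eight free parameters.

    Hypothetical helper. Returns a tuple in the order
    (T_IG, T_GI, E_IA, E_IT, E_IC, E_GA, E_GT, E_GC).
    """
    # Transition probabilities are independent, each uniform on [0, 1].
    t_ig, t_gi = rng.uniform(0.0, 1.0, size=2)
    # Dirichlet(1, 1, 1, 1) is uniform over all 4-letter distributions.
    # Assumption: the fourth component (G) is implied, so only A, T, C are kept.
    e_i = rng.dirichlet(np.ones(4))[:3]
    e_g = rng.dirichlet(np.ones(4))[:3]
    return (t_ig, t_gi, *e_i, *e_g)

rng = np.random.default_rng(0)  # fixed seed so the runs can be reproduced
start = sample_start_point(rng)
print(" ".join(f"{p:.2f}" for p in start))
```

Seeding the generator makes it possible to regenerate the same list of starting points when the traces are reproduced for the report.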

In [None]:
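def baum_welch_multiple_strategies(seq, num_of_strategies, file):
  """Run Baum-Welch from several random starting points and report the best run.

  seq: observed DNA sequence (string); num_of_strategies: number of random
  restarts; file: open trace file, or None to skip writing the summary.
  """
  print("Searching for the max likelihood")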
  max_likelihood = -np.inf

  for i in range(num_of_strategies):
    # Draw a random starting point. Within each state the three free emission
    # probabilities are drawn one after another so that their sum stays <= 1.
    T_IG_RAND = np.random.uniform(0, 1)
    T_GI_RAND = np.random.uniform(0, 1)
    E_IA_RAND = np.random.uniform(0, 1)
    E_IT_RAND = np.random.uniform(0, 1 - E_IA_RAND)
    E_IC_RAND = np.random.uniform(0, 1 - E_IA_RAND - E_IT_RAND)
    E_GA_RAND = np.random.uniform(0, 1)
    E_GT_RAND = np.random.uniform(0, 1 - E_GA_RAND)
    E_GC_RAND = np.random.uniform(0, 1 - E_GA_RAND - E_GT_RAND)
    score = gene_hmm_train(seq, 'B', T_IG_RAND, T_GI_RAND, E_IA_RAND, E_IT_RAND, E_IC_RAND, E_GA_RAND, E_GT_RAND, E_GC_RAND, file)
    # Keep the run with the highest final log-likelihood.
    if score > max_likelihood:
      max_likelihood = score
      best_strategy_start = f"|\t{T_IG_RAND:.2f}\t{T_GI_RAND:.2f}\t{E_IA_RAND:.2f}\t{E_IT_RAND:.2f}\t{E_IC_RAND:.2f}\t{E_GA_RAND:.2f}\t{E_GT_RAND:.2f}\t{E_GC_RAND:.2f}\t|"
      # NOTE: T_IG ... E_GC are not defined inside this function; this line relies on
      # gene_hmm_train leaving the final parameter values in module-level variables.
      best_strategy_finish = f"|\t{T_IG:.2f}\t{T_GI:.2f}\t{E_IA:.2f}\t{E_IT:.2f}\t{E_IC:.2f}\t{E_GA:.2f}\t{E_GT:.2f}\t{E_GC:.2f}\t|"

  print("\n\nBest strategy:")
  # print("  ".join(seq))
  print("seq = ", seq)
  print("\nstart probs:")
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
  print(best_strategy_start)
  print("\nfinish probs:")
  print("|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
  print(best_strategy_finish)
  print("\nscore (log likelihood)  = ", max_likelihood)

  # Write the same summary to the trace file, if one was given.
  if file is not None:
    file.write("\n\nBest strategy:")
    file.write("\nseq = " + seq)
    file.write("\n\nstart probs:")
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
    file.write("\n" + best_strategy_start)
    file.write("\n\nfinish probs:")
    file.write("\n|\tT_IG\tT_GI\tE_IA\tE_IT\tE_IC\tE_GA\tE_GT\tE_GC\t|")
    file.write("\n" + best_strategy_finish)
    file.write(f"\n\nscore (log likelihood)  = {max_likelihood}")
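The restart loop can also be written so that the training routine hands back its final parameters, which avoids reading them from variables outside the function. The sketch below is only an illustration under assumptions: `best_of_restarts`, `train`, `sample_start` and `toy_train` are hypothetical names, and `toy_train` is a stand-in with a made-up score, not the Baum-Welch trainer from this notebook.

```python
import numpy as np

def best_of_restarts(train, sample_start, num_restarts, rng):
    """Run `train` from several random starts and keep the best run.

    Assumes `train(start)` returns (score, final_params); higher score is better.
    Returns (best_score, best_start, best_final), or None if num_restarts == 0.
    """
    best = None
    for _ in range(num_restarts):
        start = sample_start(rng)
        score, final = train(start)
        if best is None or score > best[0]:
            best = (score, start, final)
    return best

def toy_train(start):
    # Stand-in for a real trainer: the "score" is a made-up function of the
    # starting point and the parameters are returned unchanged.
    start = np.asarray(start)
    return -float(np.sum((start - 0.5) ** 2)), start

rng = np.random.default_rng(0)
score, start, final = best_of_restarts(
    toy_train, lambda r: r.uniform(0.0, 1.0, size=8), 10, rng)
```

Returning the starting point and the final parameters together with the score keeps everything needed for the summary in one place.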


In [None]:
f = open("traces-baum-welch.txt", "w")
f.write("Seq X1\n")
print("Seq X1\n")
baum_welch_multiple_strategies(X1, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")

Seq X1

Searching for the max likelihood


A  A  A  T  T  T  T  A  T  T  A  C  G  T  T  T  A  G  T  A  G  A  A  G  A  G  A  A  A  G  G  T  A  A  A  C  A  T  G  A  T  G  G  T  T  C  A  G  T  G  G  T  G  C  T  A  G  A  T  G  A  A  C  A  A  A  C  A  A  T  T  A  T  A  A  A  A  T  A  A  A  A  T  G  A  A  G  T  A  T  T  T  G  T  A  T  A  G  A  A  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (log likelihood)	|
|	0.79	0.19	0.90	0.01	0.06	0.18	0.59	0.05			-140.8959		|
|	0.57	0.11	0.89	0.00	0.06	0.41	0.30	0.07			-120.5812		|
|	0.57	0.07	0.92	0.00	0.04	0.41	0.31	0.06			-119.9817		|
|	0.58	0.05	0.96	0.00	0.02	0.42	0.31	0.06			-119.7259		|
|	0.59	0.04	0.98	0.00	0.01	0.42	0.31	0.06			-119.6039		|

In [None]:
f.write("\nSeq X2\n")
print("Seq X2\n")
baum_welch_multiple_strategies(X2, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")

Seq X2

Searching for the max likelihood


C  C  C  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  C  A  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  T  C  C  A  G  G  G  T  C  C  C  C  C  C  C  C  C  C  C  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|	T_IG	T_GI	E_IA	E_IT	E_IC	E_GA	E_GT	E_GC			score (log likelihood)	|
|	0.90	0.31	0.06	0.35	0.47	0.32	0.09	0.53			-199.8402		|
|	0.17	0.00	0.00	0.00	1.00	0.03	0.04	0.41			-73.8828		|
|	0.06	0.13	0.00	0.00	1.00	0.03	0.03	0.07			-30.3036		|
|	0.09	0.25	0.00	0.00	1.00	0.00	0.00	0.00			-22.6775		|
|	0.09	0.25	0.00	0.00	1.00	0.00	0.00	0.00			-22.6775		|



In [None]:
f.write("\nSeq X3\n")
print("Seq X3\n")
baum_welch_multiple_strategies(X3, 10, f)
f.write("\n\nDone.")
print("\n\nDone.")
f.close()

Seq X3

Searching for the max likelihood


C  G  C  A  C  A  C  G  T  C  C  T  T  G  A  G  G  G  C  A  G  T  T  T  T  T  T  T  G  T  C  G  C  C  C  C  C  A  C  G  A  T  T  T  T  T  C  T  C  G  G  C  C  G  C  A  G  T  T  C  C  C  G  T  T  T  T  T  T  T  T  T  G  T  T  T  T  T  T  T  T  G  T  T  G  G  C  C  T  C  T  G  G  T  T  T  T  C  T  A  C  G  A  G  G  C  C  G  G  G  G  A  G  A  G  G  C  C  G  G  G  G  C  G  G  C  A  G  A  T  T  T  T  C  T  T  G  T  T  T  T  T  C  A  G  G  A  T  T  G  C  T  G  G  T  T  T  G  C  T  C  A  G  T  G  T  T  T  T  T  C  T  T  C  T  T  T  G  T  T  T  G  G  C  T  G  T  G  C  C  G  G  A  A  G  A  G  A  T  G  
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------