<a href="https://colab.research.google.com/github/diegosanchezsanabria/DiegoSanSan/blob/master/DSS_Needleman_wunsch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table border="0" align="left" width="1850" height="144">
<tbody>
<tr>
<td width="120"><img width="100" src="https://www.pngrepo.com/png/143936/512/dna.png" /></td>
<td style="width: 600px; height: 67px;">
<h1 style="text-align: left;">Needleman Wunsch Algorithm Demo</h1>
</td>
</tr>
</tbody>
</table>



The Needleman-Wunsch algorithm is used in bioinformatics to compute global alignments of protein or nucleotide sequences. Alignments are a powerful way to compare related DNA or protein sequences. They can be used to capture various facts about the aligned sequences, such as common evolutionary descent or common structural function.



### **Guide to test the algorithm**

* STEP 1. Choose two sequences to compare in FASTA format. You can copy two of the example sequences below or search in a [database](https://subjectguides.lib.neu.edu/c.php?g=948457&p=6839134).

DETAILS | FASTA
--- | ---
`P01013 GENE X PROTEIN (OVALBUMIN-RELATED)` | **QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFP**
`Mitochondrial ribosomal protein L27 Sus scrofa (9823)` | **MALAVLALRTRAAVTALLSPPQKGHYVHAGNILATQRHFRWHPNKCLYALEEGVVRYTKEVYVPNPSNSEAVDLVTRLPQGAVLYKTFVHVVPAKPEGTFKLVAML**
`Phalloidin-stabilized F-actin, Homo sapiens` | **MHHHHHHGSLVPRSENLYFQGSDRDAEMPATEKAPWKKIQQNTRWCNEHLKCVSKRIANLQTQMQLENVSVALEFLPNVDKHSVMTYLSQFPKAKLKPGAPLRPK**
`Tubulin alpha-1B chain, Bos taurus` | **MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAAADQCTGLQY**
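
Records copied from a database usually carry a `>` header line and line breaks, while the form fields expect a bare residue string. A minimal sketch of a cleanup step follows; `clean_fasta` and the sample record are hypothetical names introduced here for illustration and are not part of the notebook.

```python
def clean_fasta(text):
    """Return the residue string from a FASTA record or a bare sequence.

    Hypothetical helper: assumes a single record in the pasted text.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    # Header lines start with '>' and carry the description, not residues.
    residues = [line for line in lines if line and not line.startswith(">")]
    return "".join(residues).upper()


# Made-up record built from the first residues of the P01013 example above.
record = """>example record (hypothetical header)
QIKDLLVSSS
TDLDTTLVLV"""

print(clean_fasta(record))  # QIKDLLVSSSTDLDTTLVLV
```

The returned string can then be pasted into `sequence_1` or `sequence_2`.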






In [1]:
#@title  { display-mode: "form" }
sequence_1 = '' #@param {type:"string"}
sequence_2 = '' #@param {type:"string"}


* STEP 2. Use the sliders to set your rewards and penalties for a match, mismatch or gap. A recommended scoring scheme is shown below; feel free to test any other.

Result | Score
--- | ---
Match | 1
Mismatch | -1
Gap | -2
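
To see how these values combine, the sketch below scores an alignment that has already been computed by summing one value per column. The function name `alignment_score` and the short example strings are assumptions made for illustration; the defaults mirror the recommended scheme in the table above.

```python
def alignment_score(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two gapped strings of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap        # one residue is paired with a gap
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score


# Five matches, one mismatch (A/C) and one gap: 5*1 + (-1) + (-2) = 2
print(alignment_score("GATTACA", "GC-TACA"))  # 2
```

Making the gap penalty harsher than the mismatch penalty, as in the table, tends to favor substitutions over insertions and deletions.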



In [2]:
#@title  { vertical-output: true, display-mode: "form" }
match_reward = 1 #@param {type:"slider", min:-5, max:5, step:1}
mismatch_penalty = -1 #@param {type:"slider", min:-5, max:5, step:1} 
gap_penalty = -2 #@param {type:"slider", min:-5, max:5, step:1} 

### When ready, press **CTRL + F9** to run the alignment!

In [3]:
#@title  { display-mode: "form" }
import numpy as np

# Score matrix (one extra row and column for leading gaps) and match/mismatch lookup matrix
main_matrix = np.zeros((len(sequence_1)+1, len(sequence_2)+1))
match_checker_matrix = np.zeros((len(sequence_1),len(sequence_2)))

# Fill matrix according to match or mismatch
for i in range(len(sequence_1)):
  for j in range(len(sequence_2)):
    if sequence_1[i] == sequence_2[j]:
      match_checker_matrix[i][j] = match_reward
    else:
      match_checker_matrix[i][j] = mismatch_penalty

# Initialization 
for i in range(len(sequence_1)+1):
  main_matrix[i][0] = i * gap_penalty
for j in range(len(sequence_2)+1):
  main_matrix[0][j] = j * gap_penalty

# Matrix filling
for i in range (1, len(sequence_1)+1):
  for j in range (1, len(sequence_2)+1):
    main_matrix[i][j] = max(main_matrix[i-1][j-1] + match_checker_matrix[i-1][j-1],
                            main_matrix[i-1][j] + gap_penalty, 
                            main_matrix[i][j-1] + gap_penalty)
# Traceback 

aligned_1 = ""
aligned_2 = ""

ti = len(sequence_1)
tj = len(sequence_2)

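# Walk back from the bottom-right cell until BOTH sequences are consumed.
# Using "or" keeps walking along the first row or column, so leading
# residues are paired with gaps instead of being dropped.
while ti > 0 or tj > 0:

  if (ti > 0 and tj > 0 and main_matrix[ti][tj] == main_matrix[ti-1][tj-1] + match_checker_matrix[ti-1][tj-1]):
    # Diagonal move: align the two residues (match or mismatch)
    aligned_1 = sequence_1[ti-1] + aligned_1
    aligned_2 = sequence_2[tj-1] + aligned_2
    ti = ti - 1
    tj = tj - 1

  elif (ti > 0 and main_matrix[ti][tj] == main_matrix[ti-1][tj] + gap_penalty):
    # Upward move: gap in sequence_2
    aligned_1 = sequence_1[ti-1] + aligned_1
    aligned_2 = "-" + aligned_2
    ti = ti - 1

  else:
    # Leftward move: gap in sequence_1
    aligned_1 = "-" + aligned_1
    aligned_2 = sequence_2[tj-1] + aligned_2
    tj = tj - 1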



In [None]:
#@title # **Results** { vertical-output: true, display-mode: "form" }
# Print the two aligned sequences, one per line
print(aligned_1)
print(aligned_2)


In [None]:
#@title  { display-mode: "form" }
import ipywidgets as widgets
from IPython.display import display
button = widgets.Button(description="Click me!", button_style = "warning")
output = widgets.Output()

def on_button_clicked(b):
  with output:
    print("Thanks for testing, have a nice day (:")

button.on_click(on_button_clicked)
display(button, output)

### There are also non-biological uses for this alignment algorithm

**Optimal matching:** A sequence analysis method used in social science to assess the dissimilarity of ordered arrays of tokens, which usually represent the time-ordered sequences of socio-economic states that two individuals have experienced. Optimal matching uses the Needleman-Wunsch algorithm.

**Historical and comparative linguistics:** Sequence alignment has been used to partially automate the comparative method by which linguists traditionally reconstruct languages.

**Business and marketing:** Research has also applied multiple sequence alignment techniques in analyzing series of purchases over time.
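
Because the algorithm only compares items for equality, the same procedure works on lists of arbitrary tokens, not just strings of residues. The following is a minimal sketch under that assumption: the function `needleman_wunsch`, its parameters, and the two made-up state histories are illustrative and do not come from the notebook or from any optimal-matching library.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2, gap_symbol="-"):
    """Globally align two token sequences; return (aligned_a, aligned_b, score)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: a prefix aligned entirely against gaps.
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    # Each cell takes the best of a diagonal, upward or leftward move.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell until both sequences are consumed.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(gap_symbol)
            i -= 1
        else:
            aligned_a.append(gap_symbol)
            aligned_b.append(seq_b[j - 1])
            j -= 1
    # Tokens were collected back to front, so reverse them.
    return aligned_a[::-1], aligned_b[::-1], score[-1][-1]


# Made-up yearly states for two individuals.
history_a = ["school", "school", "work", "work"]
history_b = ["school", "work", "work"]

aligned_a, aligned_b, total = needleman_wunsch(history_a, history_b)
print(aligned_a)
print(aligned_b)
print(total)  # 1: three matching states and one gap
```

Optimal matching in practice typically works with substitution and insertion/deletion costs rather than the fixed rewards used here, so this sketch only shows the shared dynamic-programming core.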

<table border="0" align="left" width="1850" height="144">
<tbody>
<tr>
<td width="120"><img width="100" src="https://www.pngrepo.com/png/143936/512/dna.png" /></td>
<td style="width: 600px; height: 67px;">
<h1 style="text-align: left;">Needleman Wunsch Algorithm Demo</h1>
<h4 style="text-align: left;">Diego Sánchez</h4>
</td>
</tr>
</tbody>
</table>