# Computational biology and bioinformatics - <span style="color:#1CA766">INFO-F-439</span>
# Assignment 1: <span style="color:#1CA766">aligning sequences and detecting motifs</span>
  	
  
> ## <span style="color:#2E66A7"> Alberto Parravicini</span>

## <span style="color:#2E66A7">Part 1:</span> implementing the sequence alignment algorithm

### Introduction:

Given two **sequences of amino acids**, we would like to *align* them to highlight their similarities. 
To do so, we can take inspiration from the well-known [**Wagner–Fischer**](https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm) algorithm, which computes the **edit distance** between two strings.

> <span style="color:#1CA766">**Edit distance:**</span> also known as *Levenshtein distance*, it is the minimum number of characters that have to be inserted, removed or substituted to transform a string into another desired string.
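
To make the definition concrete, here is a minimal sketch that transcribes it directly as a memoized recursion. The helper name `levenshtein` is hypothetical and not part of the assignment code; it assumes unit cost for every insertion, deletion and substitution, and is only meant for short strings.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # Distance between the prefixes a[:i] and b[:j].
        if i == 0:
            return j  # insert the j remaining characters of b
        if j == 0:
            return i  # delete the i remaining characters of a
        return min(
            dist(i - 1, j) + 1,  # delete a[i - 1]
            dist(i, j - 1) + 1,  # insert b[j - 1]
            dist(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # match or substitute
        )

    return dist(len(a), len(b))


print(levenshtein("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes two substitutions (k→s, e→i) and one insertion (g), hence a distance of 3.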

Before tackling the more complex problem of aligning amino acid sequences, let's build an algorithm to compute the edit distance between two strings, as a quick warm-up!
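
The warm-up implementation works with non-positive *scores* rather than positive costs, so it maximizes instead of minimizing. As a sketch of the recurrence it fills in (the symbols $E$, $g$, $c$, $n$ and $m$ are notation assumed here, standing for the score matrix, the gap penalty, the substitution score function and the lengths of the two strings $s_1$ and $s_2$):

$$
\begin{aligned}
E_{i,0} &= i \cdot g, \qquad E_{0,j} = j \cdot g, \\
E_{i,j} &= \max\bigl(E_{i-1,j} + g,\; E_{i,j-1} + g,\; E_{i-1,j-1} + c(s_1[i], s_2[j])\bigr), \quad 1 \le i \le n,\; 1 \le j \le m, \\
d(s_1, s_2) &= -E_{n,m}.
\end{aligned}
$$

With $g = -1$ and $c(a, b) = 0$ if $a = b$, $-1$ otherwise, the negated final score $d(s_1, s_2)$ is the usual Levenshtein distance.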

In [2]:
import numpy as np
import editdistance
import utils
import random

aminoacid_names = "ARNDCEQGHILKMFPSTWYV"
print("Number of aminoacids:", len(aminoacid_names))

gap_penalty = -1

min_string_size = 20
max_string_size = 70


def edit_levenshtein(c1, c2):
    """
    Default substitution score, following Levenshtein's unit-cost scheme.
    
    Parameters
    ----------
    c1, c2: any object on which equality is defined.
    
    Returns
    -------
    int
        0 if c1 == c2 (i.e. no substitution is needed),
        -1 otherwise (i.e. a substitution is needed)
    """
    return 0 if c1 == c2 else -1
    

def edit_distance(s1, s2, gap_penalty=-1, edit_function=edit_levenshtein):
    """
    Compute the edit distance between 2 strings "s1" and "s2",
    i.e. the minimum number of character deletions, insertions and
    substitutions required to turn "s1" into "s2".
    
    Parameters
    ----------
    s1, s2: array-like
        The two sequences to compare.
        
    gap_penalty: int, optional
        The (non-positive) score assigned to each character deletion or insertion.
        
    edit_function: function, optional
        The function used to compute the (non-positive) score of a character substitution.
        
    Returns
    -------
    int
        The edit distance between s1 and s2
    """
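    n_row = len(s1) + 1
    n_col = len(s2) + 1
    # Integer scores are assumed; an integer matrix keeps the result an int
    # and avoids returning -0.0 for identical strings.
    edit_matrix = np.zeros((n_row, n_col), dtype=int)
    
    # First column: aligning a prefix of s1 with the empty string costs one gap per character.
    for i in range(n_row):
        edit_matrix[i, 0] = i * gap_penalty
    
    # First row: aligning the empty string with a prefix of s2.
    for j in range(n_col):
        edit_matrix[0, j] = j * gap_penalty
    
    # Fill the matrix: each cell keeps the best (highest) score among
    # a gap in either string and a match/substitution.
    for i in range(1, n_row):
        for j in range(1, n_col):
            x_gap = edit_matrix[i - 1, j] + gap_penalty
            y_gap = edit_matrix[i, j - 1] + gap_penalty
            mut = edit_matrix[i - 1, j - 1] + edit_function(s1[i - 1], s2[j - 1])
            edit_matrix[i, j] = max(x_gap, y_gap, mut)
    
    # Scores are non-positive, so negate the final score to get a distance.
    return int(-edit_matrix[len(s1), len(s2)])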
