<a href="https://colab.research.google.com/github/AmitMandliya/multiple-sequence-alignment/blob/feature%2Fgoogle_collab_checkpoint/msa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Harware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.4.2" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools Plots"
JULIA_PACKAGES_IF_GPU="CuArrays"
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"'
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Success! Please reload this page and jump to the next section."
fi

Installing Julia 1.4.2 on the current Colab Runtime...
2020-11-02 23:10:02 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.4/julia-1.4.2-linux-x86_64.tar.gz [99093958/99093958] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
    Cloning default registries into `~/.julia`
    Cloning registry from "https://github.com/JuliaRegistries/General.git"
[2K[?25h      Added registry `General` to `~/.julia/registries/General`
  Resolving package versions...
  Installed Artifacts ─────── v1.3.0
  Installed VersionParsing ── v1.2.0
  Installed MbedTLS_jll ───── v2.16.8+1
  Installed ZeroMQ_jll ────── v4.3.2+5
  Installed SoftGlobalScope ─ v1.1.0
  Installed Parsers ───────── v1.0.11
  Installed IJulia ────────── v1.22.0
  Installed JLLWrappers ───── v1.1.3
  Installed MbedTLS ───────── v1.0.3
  Installed JSON ──────────── v0.21.1
  Installed Conda ─────────── v1.5.0
  Installed ZMQ ───────────── v1.2.1
Downloading artifact: MbedTLS
#####################################



# Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system:

In [None]:
versioninfo()

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2


In [None]:
function fun()
  print("Hello!!!")

end
fun()

SyntaxError: ignored

In [None]:
using BenchmarkTools

M = rand(2048, 2048)
@benchmark M^2

SyntaxError: ignored

In [None]:
if ENV["COLAB_GPU"] == "1"
    using CuArrays

    M_gpu = cu(M)
    @benchmark CuArrays.@sync M_gpu^2
else
    println("No GPU found.")
end

LoadError: ignored

# Need Help?

* Learning: https://julialang.org/learning/
* Documentation: https://docs.julialang.org/
* Questions & Discussions:
  * https://discourse.julialang.org/
  * http://julialang.slack.com/
  * https://stackoverflow.com/questions/tagged/julia

If you ever ask for help or file an issue about Julia, you should generally provide the output of `versioninfo()`.

Add new code cells by clicking the `+ Code` button (or _Insert_ > _Code cell_).

Have fun!

<img src="https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-logo-mask.png" height="100" />

In [None]:
using Random
using Printf
using Dates

In [None]:
# Generates t sequences all of length l, and returns an array of strings
function generate_sequences(t::Int64, l::Int64)
   # t is the number of sequences to create
    # l is the length of the sequences
    DNA = Array{String,1}(undef,0)
    base_arr = ["A", "T", "G", "C"]

    for t_index in 1:t
        push!(DNA, "")
        for l_value in 1:l
            r = convert(Int64, floor(Random.rand() * 4) + 1)
            DNA[t_index] = string(DNA[t_index], base_arr[r])
        end
    end
    return DNA    
end

generate_sequences (generic function with 1 method)

In [None]:
# test generate_sequences here
data = generate_sequences(10,1000)

10-element Array{String,1}:
 "ATTCAAGCTCCTCCGGTCCTCTGATCTTTGATTACCTGTATCGTTAGGGGTCTTCTCCCCGACCCAACTAAACGTTGAATAAGATCTTCCCGTGGGGCCGCTGGACGTTCACAGGAGTAGTTATCGACGCATTTGGCCAGAAGAGAGCAGTATAAATGCAGCCCTAACTCCTGCCAAACACCCCAACGGCTGCACAGCGGGGACGTGTGTCCTTAAGCAGAGGGAGCCGTCGGGCACGCGGGCTAAAACGTGCCTGAGGAGGATTGCCTTGATATACATGTTAACCAGCGCATTTATTATAAGGTGAAGAGTGTATTCTATGATTCTCCGTATACTAGTTAATAGGCATCCTAGTTTGATGCCCCTTGACCAGCGAACCCCCAGCGCCATCCGAGATACAGATCTGACTCAGACCATGTGCTGACCCTGCTCCACCCGTATCTGACCAAGGCCTTTAATCTCACAGCCTCATGATCATCCACCCAGATGCTAACCAGTCCTTTCGCCCGTAATTTACGCATAAGTTACCATGGCCCATCTTGCCCGAGGATCTAGTACACAGATTTAAAGGAAAGTAGAACAGAAGTCGGTGGGAACGGGGCTAGAGTATCCATTAACGATCAGCCGTAATGGGAAGAAGAAACCGGAGGGTCGGGTGTGATCGTCTGTGGTGTGAACTTCACCGCCAATGGAAACAAACCATCGAACCCTTACGCAAAGCAACAGACGGACCACGATACTCGCTCTGGAGACGTAATCCCAAACTCGGTTAGGTGCGTGACGGACGGTCAGATCCGGGGTCGACCCCACTCGCTAAGACATCTTAATGAGACCGGAGGTTGAATGGGTAGACAGTATAGGGTAGAGCCCCGGAATATATCTACGTTATCTTCGAACAACGTGTGACTGCAGATACTGATGCAAGTTTAATTAGTCGTCGTTGCTAGCAGAGCGCGGATCCAGCGACACA

In [None]:
function MSA_to_TSP(sequences)
  node = []
  for i in 1:length(sequences)
    push!(node,sequences[i])
  end

  graph = Array{Float64, 2}(undef, length(node), length(node))
  for i in 1:length(node)
    for j in i+1:length(node)
      score, align1, align2 = get_alignment_score(node[i],node[j]) 
      graph[i,j] = score
      graph[j,i] = score
    end
  end

  return graph
end

MSA_to_TSP (generic function with 1 method)

In [None]:
# test MSA_to_TSP here
MSA_to_TSP(data)

10×10 Array{Float64,2}:
   0.0   86.0        94.0       …  84.0       114.0        87.0
  86.0    2.5e-323  108.0          93.0        71.0       102.0
  94.0  108.0         2.0e-323     91.0        74.0        80.0
  94.0  103.0        83.0          81.0        78.0        73.0
  98.0   90.0        84.0          80.0       109.0       123.0
  83.0   81.0        83.0       …  83.0        80.0       101.0
  87.0  101.0       100.0          81.0       111.0       117.0
  84.0   93.0        91.0           5.0e-323   73.0        80.0
 114.0   71.0        74.0          73.0         6.4e-323   94.0
  87.0  102.0        80.0          80.0        94.0         7.4e-323

In [None]:
function get_alignment_score(v, w, match_penalty=1, mismatch_penalty=-1, deletion_penalty=-1)
    n1 = length(v)
    n2 = length(w)
    #if !use_preallocated_matrices
    s = zeros(Float64, n1+1, n2+1)
    b = zeros(Float64, n1+1, n2+1)
    #end

    for i in 1:(n1+1)
        s[i,1] = (i-1) * deletion_penalty
        b[i,1] = 2
    end
    for j in 1:(n2+1)
        s[1,j] = (j-1) * deletion_penalty
        b[1,j] = 3
    end

    for i in 2:(n1+1)
        for j in 2:(n2+1)
            if v[i-1] == w[j-1]
                ms = s[i-1,j-1] + match_penalty
            else
                # ignore cases where a letter is paired with a gap
                # do not consider this a mismatch
                # if v[i-1] != '-' && w[j-1] != '-'
                #     ms = s[i-1,j-1] + mismatch_penalty
                # else
                #     # if a letter is paired with a gap, add no penalty
                #     ms = s[i-1,j-1] #+ match_penalty # + 0.5 * mismatch_penalty
                # end
                ms = s[i-1,j-1] + mismatch_penalty
            end
            test = [ms, s[i-1,j] + deletion_penalty, s[i,j-1] + deletion_penalty]
            p = argmax(test)
            s[i,j] = test[p]
            b[i,j] = p
        end
    end

    i = n1+1
    j = n2+1
    sv = []
    sw = []
    while(i > 1 || j > 1)
        p = b[i,j]
        if (p == 1)
            i = i-1
            j = j-1
            push!(sv, v[i])
            push!(sw, w[j])
        elseif p == 2
            i=i-1
            push!(sv, v[i])
            push!(sw, "-")
        elseif p == 3
            j = j-1
            push!(sv, "-")
            push!(sw, w[j])
        else
            break
        end
    end

    return (s[n1+1,n2+1], join(reverse(sv)), join(reverse(sw)))
end

get_alignment_score (generic function with 4 methods)

In [None]:
# test get_alignment_score here
sequences = generate_sequences(10,1000)
score, align1, align2 = get_alignment_score(sequences[1],sequences[2])
println("Score = ",score)
println("Alignment of fir sequence = ",align1)
println("Alignment of sec sequence = ",align2)

Score = 73.0
Alignment of fir sequence = --GGAACGGT--GTG-AGA-CTAAA-A-CGGTGACCCCAGTGATGAGGCGAGCAAGTGCC-T-GA-ATAGGCCCTG-GAGATAGTAGTG-ACA-GGACGTCAA-AAGAAATTGTCGGTT--TA-TCGTTGGT-TTG-TAAAACCACGG-CTACCTCCTGGACAAGAGAATAGTCCCAGTAGCGTCGGAG-CGTGTAAACACTCAGGAGACACGGA-TT-CAGTGGCACCG-TCTTGCCACAGAGTTCCGTTTTTC-CCTCTGATCTCGCGCTACAA-GAGAGCGTCAAC--CCGCTCTAG--GGAAGAGTTACTCCT-GGGGGCTT-C---TCC----A--G---TT-C-AGCTA-AGACCGGGGTCACGTA-TGAAACCAAATGCCACCTGTACTGGGGGCGAAGAAAAATAGAA-GGTGGATAATGTTAAATTCTGAAGCCTGC-CAAAAACTTTCTGGCAGT-A-GGACCCGGTTAA-TTAAAACCGGGGACAA-CTTGTACGCCAGTCGAAATTTTGA-TTTTAGCGAACGTCTG--GGGTCGACT-TACGT-T--AA--C---AAC-TTGGC---CGGATACAGA-CTAG-TCGGACC-CATGGTAACTCCC--TC--TAAG-ATGGGTCCGAAATGTGGGACT--G-AAAAGTGT-TCGC--CCGT-CGA-GCTTGGCAC-CCA-TAC---A-GCGCAAAATGTGCATACTTCAACCAGACGTACAAGAGTGTATAC-ACCCCTCCTCATCGG--GGAAACTGT-ACTCGTGGATACTT-AGA-ATTAGGA-AATATAAATTTG--TAC--CTTACTACTTG-CTTAGTTCAT--CT-TTGCCTTTTGGGGTACGTCGTGGA--CATCCGC-CG-G-GCATAAACGTAGG-GAATCCCAGTAAATCT-CC-G-ATAGA-TCTGAAG-GTTGGGTACCGCACGCTTGTTTGCAC

In [None]:
# Return the ant colony with cities initialized to each ant

function create_colony(num_ants, num_nodes)
    colony = []
    for i in 1 : num_ants
       push!(colony, Dict("path"=>[rand(1:num_nodes)], "distance" => 0))
    end
    return colony
end

create_colony (generic function with 1 method)

In [None]:
create_colony(10, 4)

10-element Array{Any,1}:
 Dict{String,Any}("distance" => 0,"path" => [1])
 Dict{String,Any}("distance" => 0,"path" => [4])
 Dict{String,Any}("distance" => 0,"path" => [3])
 Dict{String,Any}("distance" => 0,"path" => [1])
 Dict{String,Any}("distance" => 0,"path" => [4])
 Dict{String,Any}("distance" => 0,"path" => [2])
 Dict{String,Any}("distance" => 0,"path" => [3])
 Dict{String,Any}("distance" => 0,"path" => [1])
 Dict{String,Any}("distance" => 0,"path" => [1])
 Dict{String,Any}("distance" => 0,"path" => [4])

In [None]:
function create_pheror_matrix(num_nodes)
    pheromone = zeros(Float64, num_nodes, num_nodes)
    for i in 1: num_nodes
        for j in 1: num_nodes
            pheromone[i,j] = 1/num_nodes
        end
    end
    return pheromone
end

create_pheror_matrix (generic function with 1 method)

In [None]:
create_pheror_matrix(4)

4×4 Array{Float64,2}:
 0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25
 0.25  0.25  0.25  0.25

In [191]:
function calculate_proba(num_nodes, pheromone, distance_matrix, alpha, beta)
    probability = zeros(Float64, num_nodes, num_nodes)
    for i in 1: num_nodes
        for j in 1: num_nodes
            probability[i,j] = (pheromone[i,j]^alpha) * (distance_matrix[i,j]^-beta)
            probability[j,i] = probability[i,j]
        end
    end
    return probability
end

calculate_proba (generic function with 1 method)

In [138]:
function calculate_proba_ant(pheromone, distance_matrix, unvisited_nodes, current_node, proba, alpha, beta)
  sigma = 0.0
  for unvisited_node in unvisited_nodes
    sigma += (pheromone[current_node,unvisited_node]^alpha) * (distance_matrix[current_node,unvisited_node]^-beta)
  end
  proba_ant = proba[current_node,:]/sigma
  return proba_ant
end

calculate_proba_ant (generic function with 2 methods)

In [122]:
function find_best_path(n_ants, colony)
  bpath = []
  best_distance = Inf32
  typeof(best_distance)
  idx_best = 0
  for i=1: n_ants
    if colony[i]["distance"] < best_distance
      best_distance = colony[i]["distance"]
      bpath = colony[i]["path"]
      idx_best = i
    end
  end
  best_path = Dict("path"=> bpath, "distance"=> best_distance, "ant"=> idx_best)
  return best_path
end

find_best_path (generic function with 1 method)

In [152]:
function update_pheror_matrix(num_nodes, n_ants, pheromone, distance_matrix, colony, Q, decay)
  depositpher = 0.0
  for i=1: n_ants
    ant = i
    for j= 1:(length(colony[ant]["path"])-1)
      src = colony[ant]["path"][j]
      dest = colony[ant]["path"][j+1]
      pheromone[src,dest] += Q/colony[i]["distance"]
    end
    depositpher += Q/colony[i]["distance"]
    for i= 1:num_nodes
      for j= 1:num_nodes
        pheromone[i,j] = (1-decay)*pheromone[i,j]*depositpher
        pheromone[j,i] = pheromone[i,j]
      end
    end
  end
  return pheromone
end

update_pheror_matrix (generic function with 2 methods)

In [153]:
function calculateDist_ant(ant, colony, distmatrix)
  dist = 0
  path = colony[ant]["path"]
  for i= 1:length(path)-1
    dist += distmatrix[path[i],path[i+1]]
  end
  return dist
end

calculateDist_ant (generic function with 1 method)

In [159]:
function traverse(ant, num_nodes, colony, pheromone, distance_matrix, proba, alpha, beta)
    unvisited = collect(1:num_nodes)
    current = colony[ant]["path"][1]
    print(current)
    deleteat!(unvisited, findfirst(isequal(current), unvisited))
    for j in 1: num_nodes
        if length(unvisited) > 1
            ant_probability = calculate_proba_ant(pheromone, distance_matrix, unvisited, current, proba, alpha, beta)
            prob = map((x) -> ant_probability[x] , unvisited)
            current = unvisited[findmax(prob)[2]]
            deleteat!(unvisited, findfirst(isequal(current), unvisited))
            push!(colony[ant]["path"], current)       
        else
            push!(colony[ant]["path"], unvisited[1])
        end
    end
    # complete the calculate_distance function
    colony[ant]["distance"] = calculateDist_ant(ant, colony, distance_matrix)
end

traverse (generic function with 2 methods)

In [192]:
function run(num_ants, num_nodes, distance_matrix, iterations, Q, decay, alpha, beta)
    pheromone = create_pheror_matrix(num_nodes)
    gbpath = Dict()
    for i= 1: iterations
        colony = create_colony(num_ants, num_nodes)
        probability = calculate_proba(num_nodes, pheromone, distance_matrix, alpha, beta)
        for ant in 1: num_ants
            traverse(ant, num_nodes, colony, pheromone, distance_matrix, probability, alpha, beta)
        end
        # complete update_pheromone_matrix
        pheromone = update_pheror_matrix(num_nodes, num_ants, pheromone, distance_matrix, colony, Q, decay)
        #complete find best path fucntion
        best_path = find_best_path(num_ants, colony)
        
        bpath = best_path
        if i == 1
            gbpath = bpath
        else
            if bpath["distance"] < gbpath["distance"]
                gbpath = bpath
            end
        end
        println("current best path = ",bpath["path"])
        println("current distance = ",bpath["distance"])
        println("global best path = ",gbpath["path"])
        println("global best path distance =",gbpath["distance"])
        println("iteration over")
    end    
    return pheromone
end

run (generic function with 3 methods)

In [202]:
#distance_matrix = MSA_to_TSP(generate_sequences(10,1000))
#distance_matrix = [0 5 7.07 5 10.44; 5 0 5 7.07 10.19; 7.07 5 0 5 5.38; 5 7.07 5 0 5.83; 10.44 10.19 5.38 5.83 0]
distance_matrix = [0 556.149 1160.79 786.014 556.149; 556.149 0 648.358 556.149 786.014; 1160.79 648.358 0 597.706 1131.61; 786.014 556.149 597.706 0 554.032; 556.149 786.014 1131.61 554.032 0]
typeof(distance_matrix)
run(50,5,distance_matrix,10,0.6,0.6,1,1)
#phero = run(20,10,distance_matrix,10,0.6,0.6,1,1)
phero

42451241214321442435311111325313532113235544443454current best path = [2, 1, 5, 4, 3, 3]
current distance = 2264.036
global best path = [2, 1, 5, 4, 3, 3]
global best path distance =2264.036
iteration over
22522432254222141322321453553111542435543113211455current best path = [4, 5, 1, 2, 3, 3]
current distance = 2314.688
global best path = [2, 1, 5, 4, 3, 3]
global best path distance =2264.036
iteration over
24525152521112251351115344215313235322255321512411current best path = [4, 5, 1, 2, 3, 3]
current distance = 2314.688
global best path = [2, 1, 5, 4, 3, 3]
global best path distance =2264.036
iteration over
45122141413524312532344132321315214521452253521511current best path = [3, 4, 5, 1, 2, 2]
current distance = 2264.036
global best path = [2, 1, 5, 4, 3, 3]
global best path distance =2264.036
iteration over
15532223232242135232334412541151552535323111455242current best path = [3, 4, 5, 1, 2, 2]
current distance = 2264.036
global best path = [2, 1, 5, 4, 3, 3]
global best path dist

10×10 Array{Float64,2}:
 2.20317e-17  0.0          1.07187e-21  …  0.0          0.0
 0.0          1.71389e-10  0.0             0.0          0.0
 1.07187e-21  0.0          1.26938e-91     0.0          0.0
 3.12506e-8   0.0          0.0             0.0          3.12506e-8
 0.0          1.07187e-21  0.0             1.07187e-21  0.0
 0.0          0.0          0.0          …  0.0          0.0
 0.0          3.12506e-8   0.0             0.0          0.0
 0.0          0.0          1.07187e-21     0.0          0.0
 0.0          0.0          0.0             2.96637e-8   1.07187e-21
 0.0          0.0          0.0             1.07187e-21  4.94424e-6

In [180]:
findmax(phero[10,:])

(2.956035478154747e-8, 5)

In [None]:
function ACO_on_TSP()
end

In [None]:
# test ACO_on_TSP here

In [None]:
function TSP_to_MSA()
end

In [None]:
# test TSP_to_MSA here

In [None]:
# test createPherorMatrix here
createPherorMatrix()

2×3 Array{Int64,2}:
 0  0  0
 0  0  0