<a href="https://colab.research.google.com/github/AmitMandliya/multiple-sequence-alignment/blob/feature%2Fgoogle_collab_checkpoint/msa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.4.2" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools Plots"
JULIA_PACKAGES_IF_GPU="CuArrays"
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z "$(which julia)" ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"'
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f "$KERNEL_NAME" "$KERNEL_DIR"/julia

  echo ''
  echo "Success! Please reload this page and jump to the next section."
fi

Installing Julia 1.4.2 on the current Colab Runtime...
2020-11-02 03:12:46 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.4/julia-1.4.2-linux-x86_64.tar.gz [99093958/99093958] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
    Cloning default registries into `~/.julia`
    Cloning registry from "https://github.com/JuliaRegistries/General.git"
[2K[?25h      Added registry `General` to `~/.julia/registries/General`
  Resolving package versions...
  Installed Artifacts ─────── v1.3.0
  Installed VersionParsing ── v1.2.0
  Installed MbedTLS_jll ───── v2.16.8+1
  Installed ZeroMQ_jll ────── v4.3.2+5
  Installed SoftGlobalScope ─ v1.1.0
  Installed Parsers ───────── v1.0.11
  Installed IJulia ────────── v1.22.0
  Installed JLLWrappers ───── v1.1.3
  Installed MbedTLS ───────── v1.0.3
  Installed JSON ──────────── v0.21.1
  Installed Conda ─────────── v1.4.1
  Installed ZMQ ───────────── v1.2.1
Downloading artifact: MbedTLS

# Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system:

In [None]:
versioninfo()

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 2


In [None]:
function fun()
  print("Hello!!!")

end
fun()

Hello!!!

In [None]:
using BenchmarkTools

M = rand(2048, 2048)
@benchmark M^2

In [None]:
if get(ENV, "COLAB_GPU", "0") == "1"  # treat a missing variable as "no GPU"
    using CuArrays

    M_gpu = cu(M)
    @benchmark CuArrays.@sync M_gpu^2
else
    println("No GPU found.")
end

No GPU found.


# Need Help?

* Learning: https://julialang.org/learning/
* Documentation: https://docs.julialang.org/
* Questions & Discussions:
  * https://discourse.julialang.org/
  * http://julialang.slack.com/
  * https://stackoverflow.com/questions/tagged/julia

If you ever ask for help or file an issue about Julia, you should generally provide the output of `versioninfo()`.

Add new code cells by clicking the `+ Code` button (or _Insert_ > _Code cell_).

Have fun!

<img src="https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-logo-mask.png" height="100" />

In [None]:
using Random
using Printf
using Dates

In [None]:
# Generates t random DNA sequences, each of length l, and returns an array of strings
function generate_sequences(t::Int64, l::Int64)
    # t is the number of sequences to create
    # l is the length of each sequence
    bases = ['A', 'T', 'G', 'C']
    DNA = Array{String,1}(undef, 0)
    for _ in 1:t
        # draw l bases uniformly at random and join them into one string
        push!(DNA, join(rand(bases, l)))
    end
    return DNA
end
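
`generate_sequences` draws from the global random number generator, so every run produces different data. A minimal sketch of a reproducible variant follows, assuming a seeded `MersenneTwister` is acceptable for testing. The helper name `random_dna` and the seed value are hypothetical and not part of the notebook.

```julia
using Random

# Hypothetical helper: one random DNA string of length l drawn from rng.
random_dna(rng::AbstractRNG, l::Integer) = join(rand(rng, ['A', 'T', 'G', 'C'], l))

rng = MersenneTwister(42)              # fixed seed, chosen arbitrarily
seqs = [random_dna(rng, 20) for _ in 1:3]

# Re-seeding with the same value reproduces exactly the same sequences.
rng2 = MersenneTwister(42)
seqs_again = [random_dna(rng2, 20) for _ in 1:3]
println(seqs == seqs_again)            # prints true
```

Passing the generator explicitly keeps test data stable across runs without touching the global RNG state.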

In [None]:
# test generate_sequences here
data = generate_sequences(10,1000)

10-element Array{String,1}:
 "ACGAGAGTGAGTACAGTACACGGTGGTATGTTTAGCGATATACAGAGCCCGCCGTGGCTATAGGTTCAATAGGCATAGCGCTCTACGCTTTGGGTGGGGAAATAGATGATAATCATGAGTCTGTGTGCTGTGCCGAGCAGGAAAAGCCATACCCCATGGTAATCCGCAAACGGCCTTGGCCCTTATATGGTTCAACCTAGGCCTGCACGCCTGCTAAACCCTCGCTGGTTAATGACAGTCCTCCATCTAAATATATGCCTACCCTACTGTTCAATGGGCATGGATTACCTCGAAACAAACTACGGCGTGAGTGCATGGCCGGGGGCCGAGAGCAATACAGAGGATGCATTAACGCCGTTTAGACGCGGCACCAAGCGGCCCCATACTTGTGGATTGCGGGCCGGCTCGTTCCGTGGTCGGCGAGTGGTCCAGCACTACATTCGCGCGCGTTACCTGTCAAGTGCTTAAGAAATCCGCAAACCTCCAGAGGCGTCATGTATTGGACTGTAGGCTCTCACCTATCGGACTGAGTTTTCCTGTCACAAGTTGCATGGTATACTTCAACCACGGGATGCTTGGCTCCATACCTATCCAAATAATAGCTCTATTGTGGAACTATTTGGGAAAGCAAATTACGGTCAAGCCTGGTTCGCCCAGGTTTACCTTACTCCATTGCTAAGCGAGTCAAGTACGACGAGAGAGATGCAAGGCCCTGAAACCCTTGTAGGCAATAAACGCTCATCCTGCAGACCACTACGTACCACACTGTGACCAAGTGAGGTCCATTCTGATGAAAGAGCAGCAACGCTAATTTAGAAAACAGACAGGACTCGTCCCGATTCGACGCGGCCTTCTTAAGATCTATAAGCAATCTGTAAGAAGGGCGAGGTTGTCGAAACTGTACCGCGCTTACCTCCAGGCTAGACTCATTACCAGGGTCGGCAGTCAAGTTGGATGGTAGAATGCTGAA

In [None]:
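# Converts an MSA instance into a TSP-style graph: every sequence is a node, and
# every pair (i, j) with i < j gets an edge weighted by its pairwise alignment score.
# get_alignment_score (defined in a later cell) must be loaded before calling this.
function MSA_to_TSP(sequences)
  n = length(sequences)
  graph = Array{Tuple{Int64,Int64,Float64},1}(undef,0)
  for i in 1:n
    for j in i+1:n
      # only the score is needed here; the two aligned strings are discarded
      score, _, _ = get_alignment_score(sequences[i], sequences[j])
      push!(graph, (i, j, score))
    end
  end

  return graph
end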

MSA_to_TSP (generic function with 1 method)

In [None]:
# test MSA_to_TSP here
MSA_to_TSP(data)

45-element Array{Tuple{Int64,Int64,Float64},1}:
 (1, 2, 114.0)
 (1, 3, 91.0)
 (1, 4, 93.0)
 (1, 5, 105.0)
 (1, 6, 96.0)
 (1, 7, 90.0)
 (1, 8, 109.0)
 (1, 9, 109.0)
 (1, 10, 93.0)
 (2, 3, 73.0)
 (2, 4, 106.0)
 (2, 5, 126.0)
 (2, 6, 73.0)
 ⋮
 (5, 9, 101.0)
 (5, 10, 92.0)
 (6, 7, 89.0)
 (6, 8, 93.0)
 (6, 9, 79.0)
 (6, 10, 102.0)
 (7, 8, 88.0)
 (7, 9, 87.0)
 (7, 10, 112.0)
 (8, 9, 103.0)
 (8, 10, 109.0)
 (9, 10, 98.0)
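
The edge list above stores similarity scores (higher means a better alignment), while a TSP solver usually minimizes distances. The sketch below shows one possible conversion to a symmetric distance matrix. It assumes that `best_score - score + 1` is an acceptable distance, which is an illustrative choice and not something the notebook specifies. The function name `edges_to_distance_matrix` is hypothetical.

```julia
# Edge list in the same (i, j, score) shape that MSA_to_TSP returns.
# The three values are copied from the output above, for illustration only.
edges = [(1, 2, 114.0), (1, 3, 91.0), (2, 3, 73.0)]

# Hypothetical helper: turn similarity edges into a symmetric distance matrix.
# Assumption: distance = best score - score + 1, so the best pair has distance 1
# and every off-diagonal entry stays strictly positive.
function edges_to_distance_matrix(edges, n)
    best = maximum(e[3] for e in edges)
    D = zeros(Float64, n, n)
    for (i, j, score) in edges
        D[i, j] = best - score + 1
        D[j, i] = D[i, j]          # the graph is undirected
    end
    return D
end

D = edges_to_distance_matrix(edges, 3)
println(D[1, 3])                   # prints 24.0
```

Keeping the off-diagonal entries positive matters for heuristics that divide by the distance.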

In [None]:
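# Global (Needleman-Wunsch style) alignment of v and w with a linear gap penalty.
# Returns (score, aligned_v, aligned_w), with '-' marking gaps.
# Note: match_penalty is added on a match, so it acts as a reward.
function get_alignment_score(v, w, match_penalty=1, mismatch_penalty=-1, deletion_penalty=-1)
    n1 = length(v)
    n2 = length(w)
    # s holds the best score for each pair of prefixes; b holds the backtrack move
    # (1 = diagonal, 2 = up, i.e. gap in w, 3 = left, i.e. gap in v)
    s = zeros(Float64, n1+1, n2+1)
    b = zeros(Int, n1+1, n2+1)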

    for i in 1:(n1+1)
        s[i,1] = (i-1) * deletion_penalty
        b[i,1] = 2
    end
    for j in 1:(n2+1)
        s[1,j] = (j-1) * deletion_penalty
        b[1,j] = 3
    end

    for i in 2:(n1+1)
        for j in 2:(n2+1)
            if v[i-1] == w[j-1]
                ms = s[i-1,j-1] + match_penalty
            else
                # ignore cases where a letter is paired with a gap
                # do not consider this a mismatch
                # if v[i-1] != '-' && w[j-1] != '-'
                #     ms = s[i-1,j-1] + mismatch_penalty
                # else
                #     # if a letter is paired with a gap, add no penalty
                #     ms = s[i-1,j-1] #+ match_penalty # + 0.5 * mismatch_penalty
                # end
                ms = s[i-1,j-1] + mismatch_penalty
            end
            test = [ms, s[i-1,j] + deletion_penalty, s[i,j-1] + deletion_penalty]
            p = argmax(test)
            s[i,j] = test[p]
            b[i,j] = p
        end
    end

    i = n1+1
    j = n2+1
    sv = []
    sw = []
    while(i > 1 || j > 1)
        p = b[i,j]
        if (p == 1)
            i = i-1
            j = j-1
            push!(sv, v[i])
            push!(sw, w[j])
        elseif p == 2
            i=i-1
            push!(sv, v[i])
            push!(sw, "-")
        elseif p == 3
            j = j-1
            push!(sv, "-")
            push!(sw, w[j])
        else
            break
        end
    end

    return (s[n1+1,n2+1], join(reverse(sv)), join(reverse(sw)))
end

get_alignment_score (generic function with 4 methods)

In [None]:
# test get_alignment_score here
sequences = generate_sequences(10,1000)
score, align1, align2 = get_alignment_score(sequences[1],sequences[2])
println("Score = ",score)
println("Alignment of fir sequence = ",align1)
println("Alignment of sec sequence = ",align2)

Score = 91.0
Alignment of fir sequence = -CA-ACACGCTCTTCCGCG-CATATA-AGGGGTTAAGGCCGCGGTAAGA-GA-CGCATTGCTGCAACAATTA-TATC--TA-G-CCATTTCTTGCCTGACACAGG-TC-TG---GCGTGCTTAG-CT-TAATC-CTCG-CC----CAA-TCCTTAGA-GTGGGCTTACT-TC-CG-A-CACCCGC-GCT-GGTCAATTGACG-C-GCTCA--TTACTT--CGAAGATGACATTTCTACGTGCCGAGGGACCCTT--GGTGCGAGTTGAG-GGAT-CTGAT---A--CGTAACTTCGCACCC-TT---TG---A--GTAACATTTTGCCAGG-GCTT-GTTTTGTAGTTCTGTCCGCC-GCGACAGTTAGATTGTCACA-A--CG-G--GTG--CGTACTAAA-GA-TT-CGTCTGGAGC--CA-ACGCCTG-A-CGTATA-CTCGCCCCCCCCCCCTATC-ATGCCGCA-CAATAAC--G-GTC-ACTTCGAGGACCCGTACAAATGTGTATCTGCAAATACGACTGCAGTGTACATCGCTCCGACCGG-ACAATC--AAATGCTGATAAGC-TTAATG-AT-A-CTGATGCAC-TCTGATAGAGCTGAGTTCTA-CGA--C---AACA-G--GGTGC-G-GATAGAGC-TAATCAACAGGCTCGTGCGGCTTCCGTCCGGGATCAG-C-TTGGGGTCCTCCCGGTGTCAGCTGATGTTC-CGTAGTAAAAAACTGTCGGGATCGGACCCCGCCATAACC-TCCCGTCCG--G--GAT--TCTTA-GTG---G--T-T-ACCCTATGTTAG---ACTA-CCATTGCGC-TATCAGCTCGCCGCGCAGACCTCACGACTTCATAAACAATTCTTTG-GT---TAACA-CTAC-TACTTGTAAA-CCGCGATTTT---CAATGTG-AGAGAAACG-GTGTTC-T-CA-AAATTCG-GAGTA
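
`MSA_to_TSP` only uses the score, so the full `(n1+1) × (n2+1)` matrices and the backtrack are not strictly needed there. Below is a minimal sketch of a score-only variant that keeps two rows of the table, assuming the same default scores as `get_alignment_score`. The name `alignment_score_only` and its keyword names are hypothetical, and it returns an integer instead of a `Float64`.

```julia
# Hypothetical score-only variant: same recurrence as the full alignment,
# but only the previous and current rows of the score table are stored.
function alignment_score_only(v, w; match_score=1, mismatch_score=-1, gap_score=-1)
    n1, n2 = length(v), length(w)
    prev = [j * gap_score for j in 0:n2]   # row for the empty prefix of v
    curr = similar(prev)
    for i in 1:n1
        curr[1] = i * gap_score            # v[1:i] aligned against nothing
        for j in 1:n2
            diag = prev[j] + (v[i] == w[j] ? match_score : mismatch_score)
            curr[j+1] = max(diag, prev[j+1] + gap_score, curr[j] + gap_score)
        end
        prev, curr = curr, prev            # the finished row becomes "previous"
    end
    return prev[n2+1]
end

println(alignment_score_only("GATTACA", "GCATGCU"))   # prints 0
```

The sketch indexes strings by position, which is fine for ASCII input such as DNA bases.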

In [None]:
function ACO_on_TSP()
end

In [None]:
# test ACO_on_TSP here

In [None]:
function TSP_to_MSA()
end

In [None]:
# test TSP_to_MSA here