## Coronavirus genome analysis

### Step 1: Data fetching

Download the coronavirus genome sequences listed in the paper [Complete Genome Sequence of a 2019 Novel Coronavirus (SARS-CoV-2) Strain Isolated in Nepal](https://mra.asm.org/content/9/11/e00169-20).

__List of genomes__:

* [__NC_045512.2__](https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2): Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
* [__MN988668.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN988668.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV WHU01, complete genome
* [__MN938384.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN938384.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV_HKU-SZ-002a_2020, complete genome
* [__MN975262.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN975262.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV_HKU-SZ-005b_2020, complete genome
* [__MN985325.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN985325.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-WA1/2020, complete genome
* [__MN988713.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN988713.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-IL1/2020, complete genome
* [__MN994467.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN994467.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-CA1/2020, complete genome
* [__MN994468.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN994468.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-CA2/2020, complete genome
* [__MN997409.1__](https://www.ncbi.nlm.nih.gov/nuccore/MN997409.1): Severe acute respiratory syndrome coronavirus 2 isolate 2019-nCoV/USA-AZ1/2020, complete genome

In [1]:
import os
import requests
import json
import time
from ete3 import Tree, TreeStyle,SeqMotifFace
from PIL import Image
os.environ['QT_QPA_PLATFORM']='offscreen'

In [2]:
def fetch_genome_fasta_from_ncbi(refseq_id,output_path='.'):
  '''
  A function for fetching the genome fasta sequences from NCBI
  
  :param refseq_id: NCBI nucleotide accession with version (RefSeq or GenBank), e.g. NC_045512.2
  :param output_path: Path to dump genome files, default '.'
  '''
  try:
    url = \
      'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id={0}&rettype=fasta'.\
        format(refseq_id)
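    r = requests.get(url, timeout=60)
    # Fail on HTTP errors so an error page is never written out as FASTA
    r.raise_for_status()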
    fasta_data = r.content.decode('utf-8')
    output_file = \
      os.path.join(
        os.path.abspath(output_path),
        '{0}.fa'.format(refseq_id))
    with open(output_file,'w') as fp:
      fp.write(fasta_data)
    print('Downloaded genome seq for {0}'.format(refseq_id))
  except Exception as e:
    print('Failed to download data for {0} from NCBI, error: {1}'.format(refseq_id,e))
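
The function above writes whatever text NCBI returns to disk. As a minimal sketch, assuming a hypothetical helper named `looks_like_fasta` (it is not part of the original notebook), the response text for a single record could be checked before it is saved:

```python
def looks_like_fasta(text):
    """Return True if text is one '>' header line followed by sequence lines.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Need a header plus at least one line of sequence
    if len(lines) < 2 or not lines[0].startswith('>'):
        return False
    # A single-record download should contain no further header lines
    return all(not line.startswith('>') for line in lines[1:])


print(looks_like_fasta('>NC_045512.2 example header\nATTAAAGGTT\nTATACCTTCC\n'))
print(looks_like_fasta('Error: invalid id'))
```

Calling such a check on `fasta_data` before the `open(...)` call would keep an empty or error response from ending up in a `.fa` file.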

In [3]:
genome_list = [
    'MN988668.1',
    'NC_045512.2',
    'MN938384.1',
    'MN975262.1',
    'MN985325.1',
    'MN988713.1',
    'MN994467.1',
    'MN994468.1',
    'MN997409.1'
]

In [None]:
for genome_id in genome_list:
    fetch_genome_fasta_from_ncbi(genome_id)
    # pause between requests to avoid overloading the NCBI server
    time.sleep(2)

### Step 2: MSA with MUSCLE

Merge all FASTA files into a single file and create a multiple sequence alignment.

In [None]:
%%bash
cat NC*.fa > corona.fa
cat MN*.fa >> corona.fa
grep '>' corona.fa|wc -l
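
Where a shell is not available, the merge can also be done in Python. The following is a rough sketch; the function name `merge_fasta_files` and the demo file contents are assumptions made for illustration. It concatenates the files matching each pattern in order and returns the number of header lines, which mirrors the `grep '>' | wc -l` check.

```python
import glob
import os
import tempfile


def merge_fasta_files(patterns, output_file):
    """Concatenate files matching each glob pattern and count '>' header lines.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    n_headers = 0
    with open(output_file, 'w') as out:
        for pattern in patterns:
            for path in sorted(glob.glob(pattern)):
                with open(path) as fp:
                    last_line = '\n'
                    for line in fp:
                        if line.startswith('>'):
                            n_headers += 1
                        out.write(line)
                        last_line = line
                    # Keep records apart if a file lacks a trailing newline
                    if not last_line.endswith('\n'):
                        out.write('\n')
    return n_headers


# Demo on two small made-up files in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, 'NC_demo.fa'), 'w') as fp:
        fp.write('>NC_demo\nACGT')
    with open(os.path.join(tmp_dir, 'MN_demo.fa'), 'w') as fp:
        fp.write('>MN_demo\nTTGA\n')
    merged = os.path.join(tmp_dir, 'corona.fa')
    count = merge_fasta_files(
        [os.path.join(tmp_dir, 'NC*.fa'), os.path.join(tmp_dir, 'MN*.fa')],
        merged)
    print(count)
```

For this notebook the call would be `merge_fasta_files(['NC*.fa', 'MN*.fa'], 'corona.fa')`, and the returned count should equal the number of genomes in `genome_list`.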

In [None]:
!muscle -in corona.fa -out muscle_out.afa
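
In a valid multiple sequence alignment every row has the same length, gaps included, so a quick check after this step can catch a truncated or failed run. The sketch below assumes two hypothetical helpers, `read_fasta` and `alignment_lengths`, and uses made-up sequences:

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence id: sequence}.

    Hypothetical helper for illustration; the id is the first word of the header.
    """
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            words = line[1:].split()
            header = words[0] if words else ''
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


def alignment_lengths(records):
    """Return the set of distinct sequence lengths found in the alignment."""
    return {len(seq) for seq in records.values()}


# Made-up aligned records, each wrapped over two lines
example = ['>seq1 demo', 'ACG-T', 'AA', '>seq2 demo', 'ACGTT', 'A-']
records = read_fasta(example)
print(alignment_lengths(records))
```

A single value in the returned set means all rows are equally long. For the real alignment, an open file handle on `muscle_out.afa` can be passed to `read_fasta` in place of the list.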

### Step 3: Build a phylogenetic tree with FastTree

In [None]:
!FastTree -nt -gtr < muscle_out.afa >corona.tree
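
FastTree writes the tree in Newick format. As a rough sketch, the leaf names can be pulled out of a Newick string with a regular expression and compared against the expected accessions. The helper name `newick_leaf_names` and the example tree (including its branch lengths and support value) are invented for illustration, and the pattern does not handle quoted names or bracketed comments.

```python
import re


def newick_leaf_names(newick):
    """Return leaf names from a simple Newick string.

    Hypothetical helper for illustration. A leaf name directly follows
    '(' or ',', whereas internal node labels follow ')'.
    """
    return re.findall(r'[(,]\s*([^(),:;\s]+)', newick)


# Made-up three-leaf tree; the numbers are not real results
example = '((MN988668.1:0.0001,NC_045512.2:0.0002)0.95:0.0003,MN938384.1:0.0004);'
leaves = newick_leaf_names(example)
print(leaves)

expected = {'MN988668.1', 'NC_045512.2', 'MN938384.1'}
print(set(leaves) == expected)
```

Reading `corona.tree` as text and comparing the result with `genome_list` would confirm that every genome made it into the tree.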

### Step 4: Plot tree

In [None]:
t = Tree("corona.tree")

In [None]:
## ascii tree
print(t)

In [None]:
## plain tree
t.render('a.png')
im = Image.open("a.png")
im

In [None]:
## circular tree
ts = TreeStyle()
ts.mode = "c"
ts.scale = 20
t.render("b.png",  tree_style=ts)
im = Image.open("b.png")
im

In [None]:
## tree with branch length
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_length = True
ts.show_branch_support = True
t.render("c.png", tree_style=ts)
im = Image.open("c.png")
im

In [None]:
## 180 deg circular tree
ts = TreeStyle()
ts.show_leaf_name = True
ts.mode = "c"
ts.arc_start = -180 # 0 degrees = 3 o'clock
ts.arc_span = 180
t.render("d.png", tree_style=ts)
im = Image.open("d.png")
im

In [None]:
## read aligned FASTA file
fasta_data = dict()
with open('muscle_out.afa','r') as fp:
    header = ''
    fasta_list = list()
    for line in fp:
        line = line.strip()
        if line.startswith('>'):
            if header != '':
                fasta_data.update({header:''.join(fasta_list)})
            header = line.split()[0].replace('>','')
            fasta_list = list()
        else:
            fasta_list.append(line)
    
    fasta_data.update({header:''.join(fasta_list)})

In [None]:
## tree with aligned seq
for seq_id,seq in fasta_data.items():
    print(seq_id)
    seqFace = SeqMotifFace(seq, gapcolor="red")
    (t & "{0}".format(seq_id)).add_face(seqFace, 0, "aligned")
ts = TreeStyle()
ts.tree_width = 100
t.render("e.png", tree_style=ts)
## check png using image viewer