# Crash Course - Protein Sequence Analysis of COVID-19 using Biopython

In this tutorial we take a closer look at some useful applications of the popular **Biopython** package for biological sequence analysis.
- Credit: based on the original [video tutorial by JCharisTech](https://www.youtube.com/watch?v=dxVKG2gNSos&ab_channel=JCharisTech).

Contents:
- DNA Sequence Manipulation
    - Join 2 sequences
    - Find position of a specific codon
    - Count number of nucleotides
    - Complement
    - Reverse complement
- Transcription
- Translation
    - DNA to Protein
    - RNA to Protein
- Reverse transcription
- Codon table
    - For DNA 
    - For RNA
- Sequence analysis of COVID-19

### DNA Sequence Manipulation

In [3]:
from Bio.Seq import Seq

#### Join 2 Sequences

In [19]:
dna1 = Seq('ATGGCTGGAAATCCTTCG')
dna2 = Seq('TCGGATGCAATCCCCGTT')
dna = dna1[0:10] + dna2[9:-1]
dna

Seq('ATGGCTGGAAATCCCCGT')

#### Find position of a specific codon

In [20]:
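# GGA codes for glycine; find() returns the 0-based index of the first match, or -1 if there is none
pos = dna.find('GGA')
print(f'The first GGA (glycine) codon starts at index {pos} (0-based).')

The first GGA (glycine) codon starts at index 6 (0-based).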
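
`find()` reports only the first match and ignores the reading frame, so a hit may straddle two codons. As a minimal sketch in plain Python strings (the helper `codon_positions` is a made-up name for this illustration, not a Biopython function), the following lists every occurrence that starts on a codon boundary of a chosen frame:

```python
def codon_positions(sequence, codon, frame=0):
    """Return the 0-based start indices of `codon` within one reading frame.

    Hypothetical helper for illustration; `frame` is 0, 1, or 2.
    """
    text = str(sequence)  # also accepts a Bio.Seq.Seq object
    return [
        start
        for start in range(frame, len(text) - 2, 3)
        if text[start:start + 3] == codon
    ]


dna_str = "ATGGCTGGAAATCCCCGT"
print(codon_positions(dna_str, "GGA"))           # in-frame glycine codons
print(codon_positions(dna_str, "TGG"))           # TGG occurs, but not in frame 0
print(codon_positions(dna_str, "TGG", frame=1))  # shifting the frame finds it
```

Stepping through the sequence three bases at a time is what keeps the search aligned with the codons that translation would actually read.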


#### Count number of nucleotides

In [21]:
print("Nucleotide counts:")
print(f'Count of A: {dna.count("A")}')
print(f'Count of T: {dna.count("T")}')
print(f'Count of G: {dna.count("G")}')
print(f'Count of C: {dna.count("C")}')

Nucleotide counts:
Count of A: 4
Count of T: 4
Count of G: 5
Count of C: 5
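
The four counts above are often summarized as GC content, the percentage of bases that are G or C. A minimal sketch using only the standard library, assuming the sequence is available as a plain string (`Bio.SeqUtils` also ships GC helpers, but their names differ between Biopython releases, so they are not used here):

```python
from collections import Counter

dna_str = "ATGGCTGGAAATCCCCGT"
counts = Counter(dna_str)  # tallies every base in a single pass

# GC content = (G + C) / sequence length, expressed as a percentage
gc_percent = 100 * (counts["G"] + counts["C"]) / len(dna_str)

for base in "ATGC":
    print(f"Count of {base}: {counts[base]}")
print(f"GC content: {gc_percent:.1f}%")
```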


#### Complement

In [25]:
dna_comp = dna.complement()
print(f'DNA: {dna}')
print(f'DNA complement: {dna_comp}')

DNA: ATGGCTGGAAATCCCCGT
DNA complement: TACCGACCTTTAGGGGCA


#### Reverse complement

In [26]:
dna_rev_comp = dna.reverse_complement()
print(f'DNA: {dna}')
print(f'DNA reverse complement: {dna_rev_comp}')

DNA: ATGGCTGGAAATCCCCGT
DNA reverse complement: ACGGGGATTTCCAGCCAT
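
To make the two operations concrete, here is a rough plain-Python equivalent for unambiguous DNA. It is a sketch for intuition only, not Biopython's implementation, and it does not handle ambiguity codes such as N or lowercase input:

```python
# Base-pairing rules for unambiguous DNA: A<->T, C<->G
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(dna_str):
    """Swap each base for its pairing partner, keeping the order."""
    return dna_str.translate(COMPLEMENT_TABLE)


def reverse_complement(dna_str):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna_str)[::-1]


dna_str = "ATGGCTGGAAATCCCCGT"
print(complement(dna_str))
print(reverse_complement(dna_str))
```

The reverse complement is the opposite strand written 5' to 3', which is why applying it twice returns the original sequence.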


### Transcription

In [24]:
# DNA to RNA

rna = dna.transcribe()
print(f'DNA: {dna}')
print(f'RNA: {rna}')

DNA: ATGGCTGGAAATCCCCGT
RNA: AUGGCUGGAAAUCCCCGU
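
Note that `transcribe()` only swaps T for U, so it expects the coding (sense) strand rather than the template strand. If you start from the template strand, a sketch of the extra step, assuming the template is written 5' to 3' (the variable names are illustrative), looks like this:

```python
from Bio.Seq import Seq

coding_strand = Seq("ATGGCTGGAAATCCCCGT")
# The template strand is the reverse complement of the coding strand
template_strand = coding_strand.reverse_complement()

# Starting from the coding strand: a direct T -> U substitution
mrna_from_coding = coding_strand.transcribe()

# Starting from the template strand: recover the coding strand first
mrna_from_template = template_strand.reverse_complement().transcribe()

print(mrna_from_coding)
print(mrna_from_template)
```

Both routes should print the same mRNA.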


### Translation

In [28]:
# DNA to Protein

protein = dna.translate()
print(f'DNA: {dna}')
print(f'Protein: {protein}')

DNA: ATGGCTGGAAATCCCCGT
Protein: MAGNPR


In [29]:
# RNA to Protein

protein = rna.translate()
print(f'RNA: {rna}')
print(f'Protein: {protein}')

RNA: AUGGCUGGAAAUCCCCGU
Protein: MAGNPR
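
By default `translate()` uses the standard genetic code and writes each stop codon as `*` before carrying on. The sketch below uses a made-up sequence with an in-frame stop codon (it is not part of the original tutorial) to show how `to_stop=True` ends the protein at the first stop instead:

```python
from Bio.Seq import Seq

# Hypothetical example: ATG GCT TAA GGA, with TAA as an in-frame stop codon
dna_with_stop = Seq("ATGGCTTAAGGA")

full_translation = dna_with_stop.translate()            # stop shown as '*'
until_stop = dna_with_stop.translate(to_stop=True)      # ends before the stop
explicit_table = dna_with_stop.translate(table="Standard", to_stop=True)

print(full_translation)
print(until_stop)
print(explicit_table)
```

The `table` argument accepts an NCBI table name or number, which matters for genomes that do not use the standard code.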


### Reverse transcription

In [30]:
cdna = rna.back_transcribe()
print(f'RNA: {rna}')
print(f'cDNA: {cdna}')

RNA: AUGGCUGGAAAUCCCCGU
cDNA: ATGGCTGGAAATCCCCGT


### Codon table

In [31]:
from Bio.Data import CodonTable

In [32]:
# for DNA
print("Codon Table for DNA:")
print(CodonTable.unambiguous_dna_by_name['Standard'])

Codon Table for DNA:
Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--

In [33]:
# for RNA
print("Codon Table for RNA:")
print(CodonTable.unambiguous_rna_by_name['Standard'])

Codon Table for RNA:
Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
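
In the printed tables, `(s)` marks codons that can act as start codons. The same information can be read programmatically from the table object; a short sketch using the standard DNA table:

```python
from Bio.Data import CodonTable

standard = CodonTable.unambiguous_dna_by_name["Standard"]

# forward_table maps each sense codon to its one-letter amino acid
print(standard.forward_table["GGA"])

# Start and stop codons are kept in separate lists
print(standard.start_codons)
print(standard.stop_codons)

# Stop codons are not keys of forward_table, so only sense codons are counted
print(len(standard.forward_table))
```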

### Sequence analysis of COVID-19

- In this tutorial, we will fetch the genome of SARS-CoV-2, the virus that causes COVID-19 (GenBank accession MN908947), from NCBI. It was sequenced from the bronchoalveolar lavage fluid of a patient in Wuhan, China on 26 December 2019.
- The FASTA file for analysis can be found [here]().

In [34]:
from Bio import SeqIO
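
`SeqIO.read()` parses a file that contains exactly one record and returns a `SeqRecord`, while `SeqIO.parse()` iterates over files with several records. As a minimal sketch that runs without the genome file, the example below reads a tiny made-up FASTA record from an in-memory string. The record name and sequence are hypothetical stand-ins, not the MN908947 data:

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content standing in for the downloaded genome file
fasta_text = """>demo_record hypothetical example, not the real genome
ATGGCTGGAAATCCCCGT
ATGGCTGGA
"""

# A real file would be read with SeqIO.read("<path to file>", "fasta")
record = SeqIO.read(StringIO(fasta_text), "fasta")

print(record.id)                    # first word of the header line
print(record.description)           # full header line without '>'
print(len(record.seq))              # sequence lines are joined together
print(record.seq[:9].translate())   # record.seq is a regular Seq object
```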