# Crash Course - Protein Sequence Analysis of COVID-19 using Biopython

In this tutorial we take a deep dive into some applications of the popular **Biopython** package for biological sequence analysis.

Credit: [JCharisTech](https://www.youtube.com/watch?v=dxVKG2gNSos&ab_channel=JCharisTech), author of the original tutorial.

Contents:

- Sequence Manipulation
    - Join 2 sequences
    - Find position of a specific codon
    - Count number of nucleotides


### DNA Sequence Manipulation

In [3]:
from Bio.Seq import Seq

#### Join 2 Sequences

In [19]:
dna1 = Seq('ATGGCTGGAAATCCTTCG')
dna2 = Seq('TCGGATGCAATCCCCGTT')
# Join the first 10 bases of dna1 with bases 9 up to (but excluding) the last base of dna2
dna = dna1[0:10] + dna2[9:-1]
dna

Seq('ATGGCTGGAAATCCCCGT')

#### Find position of a specific codon

In [20]:
pos = dna.find('GGA')  # GGA codes for glycine; find() returns a 0-based index, or -1 if absent
print(f'The GGA (glycine) codon starts at index {pos} (0-based).')

The GGA (glycine) codon starts at index 6 (0-based).
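
Note that `find` is a plain substring search, so a match is not guaranteed to fall on a codon boundary. The following is a minimal sketch, not part of the original tutorial, that only reports matches lying in a given reading frame. It works on a plain string (pass `str(dna)` for a `Seq`), and the helper name `find_codon_in_frame` is hypothetical.

```python
def find_codon_in_frame(sequence, codon, frame=0):
    """Return the 0-based index of the first in-frame occurrence of codon, or -1."""
    seq = str(sequence).upper()
    codon = codon.upper()
    # Step through the sequence three bases at a time, starting at the frame offset.
    for start in range(frame, len(seq) - 2, 3):
        if seq[start:start + 3] == codon:
            return start
    return -1


dna_str = "ATGGCTGGAAATCCCCGT"  # the joined sequence from above, as a plain string
index = find_codon_in_frame(dna_str, "GGA")
print(f"GGA starts at index {index}, i.e. codon number {index // 3 + 1} in frame 0")
# prints: GGA starts at index 6, i.e. codon number 3 in frame 0
```

Here the substring search and the in-frame search agree, because index 6 is a multiple of 3. A codon such as `TGG`, which occurs at index 1, would be found by `find` but not by the in-frame search with `frame=0`.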


#### Count number of nucleotides

In [21]:
print("Nucleotide counts:")
print(f'Count of A: {dna.count("A")}')
print(f'Count of T: {dna.count("T")}')
print(f'Count of G: {dna.count("G")}')
print(f'Count of C: {dna.count("C")}')

Nucleotide counts:
Count of A: 4
Count of T: 4
Count of G: 5
Count of C: 5
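
The per-base counts above can also be gathered in a single pass and used to derive the GC content, a common summary statistic. This is a small sketch using only the standard library on a plain string copy of the sequence. It is an illustration added here, not a step from the original tutorial.

```python
from collections import Counter

dna_str = "ATGGCTGGAAATCCCCGT"  # the joined sequence from above, as a plain string

# Count every base in one pass over the sequence.
counts = Counter(dna_str)

# GC content is the fraction of bases that are G or C.
gc_fraction = (counts["G"] + counts["C"]) / len(dna_str)

for base in "ATGC":
    print(f"Count of {base}: {counts[base]}")
print(f"GC content: {gc_fraction:.1%}")  # prints: GC content: 55.6%
```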


### Transcription

In [23]:
# transcribe() treats dna as the coding strand and replaces every T with U
rna = dna.transcribe()
print(f'DNA sequence: {dna}')
print(f'RNA sequence: {rna}')

DNA sequence: ATGGCTGGAAATCCCCGT
RNA sequence: AUGGCUGGAAAUCCCCGU
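
Because the input is treated as the coding strand, the result is the same sequence with every T replaced by U. The plain-Python sketch below mirrors that behaviour on strings to make the operation explicit. The function names are hypothetical stand-ins for illustration, not Biopython's implementation.

```python
def transcribe_coding_strand(dna):
    """Coding-strand DNA -> mRNA: replace thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def back_transcribe(rna):
    """mRNA -> coding-strand DNA: replace uracil (U) with thymine (T)."""
    return rna.upper().replace("U", "T")


dna_str = "ATGGCTGGAAATCCCCGT"  # the joined sequence from above, as a plain string
rna_str = transcribe_coding_strand(dna_str)
print(f"RNA sequence: {rna_str}")  # prints: RNA sequence: AUGGCUGGAAAUCCCCGU

# The round trip recovers the original DNA string.
print(back_transcribe(rna_str) == dna_str)  # prints: True
```

On a `Seq` object, the reverse direction is available as the `back_transcribe()` method.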


### Translation