<h1>Handling Sequences with BioPython - Alignment</h1>
<p>The question I am attempting to answer is: do two geographically distinct strains of the SARS-CoV-2 virus differ in gene expression?</p>
<p>Comparing two geographically distinct genomes of the same species can provide insight into the rate of mutation, and potentially into the different infection rates among geographically distinct communities.</p>

In [10]:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio import pairwise2

In [4]:
with open("covidWuhan.fasta", 'w') as aa_fa:
    for dna_record in SeqIO.parse("covidGenome1.fasta", 'fasta'):
        # consider both the forward strand and its reverse complement
        dna_seqs = [dna_record.seq, dna_record.seq.reverse_complement()]

        # translate all six reading frames, stopping each at its first stop codon
        aa_seqs = (s[i:].translate(to_stop=True) for i in range(3) for s in dna_seqs)

        # keep the longest translated peptide
        max_aa = max(aa_seqs, key=len)

        # write new record
        aa_record = SeqRecord(max_aa, id=dna_record.id, description="translated sequence")
        SeqIO.write(aa_record, aa_fa, 'fasta')
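
<p>The frame-generation step in the cell above can be sketched in plain Python to show which six slices are being translated. This is only an illustration, not Biopython's implementation: the helper names <code>reverse_complement</code> and <code>six_frames</code> are hypothetical, only the bases A, C, G and T are complemented, and each frame is trimmed to a whole number of codons, which the cell above does not do.</p>

```python
# Map each base to its complement; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return the six reading frames of a DNA string.

    Three offsets (0, 1, 2) are taken on the forward strand and three on
    the reverse complement. Each frame is trimmed so that its length is a
    multiple of three, i.e. it contains only complete codons.
    """
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frame = strand[offset:]
            usable = len(frame) - len(frame) % 3
            frames.append(frame[:usable])
    return frames


if __name__ == "__main__":
    for frame in six_frames("ATGAAAC"):
        print(frame)
```

<p>Trimming to complete codons is a design choice made here for clarity; the frames themselves are the same slices <code>s[i:]</code> used in the cell above, just visited strand by strand rather than offset by offset.</p>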

In [5]:
with open("covidJapan.fasta", 'w') as aa_fa:
    for dna_record in SeqIO.parse("covidGenome2.fasta", 'fasta'):
        # consider both the forward strand and its reverse complement
        dna_seqs = [dna_record.seq, dna_record.seq.reverse_complement()]

        # translate all six reading frames, stopping each at its first stop codon
        aa_seqs = (s[i:].translate(to_stop=True) for i in range(3) for s in dna_seqs)

        # keep the longest translated peptide
        max_aa = max(aa_seqs, key=len)

        # write new record
        aa_record = SeqRecord(max_aa, id=dna_record.id, description="translated sequence")
        SeqIO.write(aa_record, aa_fa, 'fasta')

<p>Note that the sequences below are only portions of the translated genome.</p>

In [11]:
japanGenome = "KVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
wuhanGenome = "LKVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
alignments = pairwise2.align.globalxx(japanGenome, wuhanGenome)

In [12]:
alignments

[('-KVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN',
  'LKVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN',
  43.0,
  0,
  44)]
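
<p>The score of 43.0 follows from how <code>globalxx</code> scores an alignment: identical characters score 1, everything else scores 0, and gaps carry no penalty. Under those rules the best global score equals the length of the longest common subsequence. The sketch below recomputes that score with a small dynamic-programming table; the function name <code>global_identity_score</code> is hypothetical and this is not how Biopython implements it internally.</p>

```python
def global_identity_score(seq_a, seq_b):
    """Best global alignment score with match = 1, mismatch = 0, gap = 0.

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(seq_b).
    """
    prev = [0] * (len(seq_b) + 1)
    for ch_a in seq_a:
        curr = [0]
        for j, ch_b in enumerate(seq_b, start=1):
            # Align ch_a with ch_b (scores 1 only if they are identical).
            diagonal = prev[j - 1] + (1 if ch_a == ch_b else 0)
            # Or place a gap in one of the sequences at no cost.
            curr.append(max(diagonal, prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    japan = "KVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
    wuhan = "LKVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
    print(global_identity_score(japan, wuhan))  # 43
```

<p>All 43 residues of the shorter sequence match, and the single leading gap costs nothing, which is why the score equals the length of the shorter sequence.</p>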

In [13]:
for alignment in alignments: 
    print(alignment)

('-KVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN', 'LKVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN', 43.0, 0, 44)
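
<p>The raw tuple is hard to read at a glance. As a rough sketch, the two aligned strings (the first two elements of the tuple) can be summarized column by column. The helper name <code>summarize_alignment</code> and the dictionary keys are hypothetical, and identity is computed here over all alignment columns, which is only one of several common conventions.</p>

```python
def summarize_alignment(aligned_a, aligned_b, gap="-"):
    """Count matches, mismatches and gap columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = mismatches = gaps = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            gaps += 1
        elif res_a == res_b:
            matches += 1
        else:
            mismatches += 1

    columns = len(aligned_a)
    identity = matches / columns if columns else 0.0
    return {
        "columns": columns,
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "identity": identity,
    }


if __name__ == "__main__":
    aligned_japan = "-KVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
    aligned_wuhan = "LKVYTFPGNKPTNFRSLVDLFSKRTLKSVWLSLGCMLSALTQYN"
    print(summarize_alignment(aligned_japan, aligned_wuhan))
```

<p>For the alignment above this reports 43 matching columns, no mismatches and one gap column: the two fragments differ only by the extra leading residue in the Wuhan sequence.</p>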
