# Taking a look at the source code of BioNTech's SARS-CoV-2 Vaccine

<img src="https://imgs.xkcd.com/comics/coronavirus_genome.png" alt="Coronavirus Genome - xkcd" style="width: 600px;"/>

### Objectives of this presentation/notebook

- Help you understand what the mRNA vaccine actually is
- Walk through the source code of the vaccine and discuss how it works (and draw some parallels to software)
- Convince you that biochemists are biological script kiddies (in the coolest way possible)

#### Fantastic article by Bert Hubert that inspired/guided this notebook:
[Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine](https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/)

## Some Background

#### DNA and RNA

- Chains of nucleotides that encode genetic information
- Nucleotides: Adenine (A), Cytosine (C), Guanine (G), and either Thymine (T) in DNA or Uracil (U) in RNA

<img src="https://knowgenetics.org/wp-content/uploads/2012/12/Bases-1-e1354322315291.png" alt="Nucleotides" style="width: 400px;"/>

- DNA/RNA are broken up into logical sections called genes
- Genes encode the synthesis of things like proteins
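
As a small illustration of the alphabet difference, the sketch below respells a DNA-style sequence in the RNA alphabet by swapping T for U. The name `to_rna` is made up for this example and is not part of the notebook's `dna` module.

```python
DNA_BASES = set("ACGT")
RNA_BASES = set("ACGU")

def to_rna(dna: str) -> str:
    """Spell a DNA coding-strand sequence with the RNA alphabet (T -> U)."""
    dna = dna.upper()
    unknown = set(dna) - DNA_BASES
    if unknown:
        raise ValueError(f"not DNA bases: {sorted(unknown)}")
    return dna.replace("T", "U")

print(to_rna("ATGTTTGTT"))  # AUGUUUGUU
```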

#### Proteins

- Fill many roles
    - catalyze metabolic reactions
    - respond to stimuli
    - provide structure to cells and organisms
    - transport molecules
    - replicate DNA
- Built from a sequence of amino acids
- The sequence of amino acids dictates a protein's 3D shape
- The physical structure of a protein determines its function

<img src="https://upload.wikimedia.org/wikipedia/commons/5/54/Protein_composite.png" alt="Proteins" style="width: 600px;"/>

*From left to right are: immunoglobulin G (IgG, an antibody), hemoglobin, insulin (a hormone), adenylate kinase (an enzyme), and glutamine synthetase (an enzyme).*


A tangentially interesting area of study: protein folding prediction.

> Check out [AlphaFold](https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology) by the DeepMind team from Google.

#### Viruses

- Viruses are biological entities (generally not considered alive) that replicate inside the living cells of organisms
- Viruses consist of
    - Genetic material (RNA/DNA): How to create more of itself
    - The capsid: A protective coat (proteins/lipids)
    - Proteins: Carry out the virus's function

#### Adaptive Immune System

- Creates immunological memory after an initial response to a specific pathogen, and leads to an enhanced response to future encounters with that pathogen.
- Antigens are molecules, such as those present on a pathogen, that the immune system can recognize
- Antibodies are proteins that are created after a previous encounter with a pathogen
- Antibodies bind to the specific antigen they were created to recognize, essentially tagging them

#### Vaccines

- Vaccines aim to teach your body to recognize some agent as a threat, and prepare for future encounters with it
- Inactivated and attenuated vaccines
    - Deliver "killed" or weakened pathogens
- mRNA vaccines
    - Work started in the early 1990s
    - Delivering genetic information to produce an immune response 


## Tozinameran
*The COVID-19 vaccine developed by BioNTech in cooperation with Pfizer*

- January 2020: Development began
- April 2020: Trials began
- November 2020: Tested on more than 40,000 people
- December 2nd, 2020: The UK is the first to issue emergency use authorization of Tozinameran
- December 11th, 2020: The FDA issues emergency use authorization
- December 20th, 2020: Over half a million people in the UK had received the vaccine

## What's the goal of this BioNTech mRNA vaccine?

Teach your cells how to produce a characteristic protein of the virus, and trigger your adaptive immune system to recognize and prepare for when/if you are exposed to the actual virus.

## The Source Code

[SARS-CoV-2 Virus Genome](https://www.ncbi.nlm.nih.gov/nuccore/NC_045512)

[BioNTech mRNA Vaccine Genome](https://mednet-communities.net/inn/db/media/docs/11889.doc) *(.doc file download, be warned)*

In [1]:
from dna import *

import re
from IPython.display import display, Markdown

vaccine_genome_file = 'res/vaccine_genome.txt'
virus_spike_protein_file = 'res/virus_spike_protein_gene.txt'

In [2]:
display(Markdown('### Let\'s take a look at what this genome looks like'))

# Print the first and last ten lines of the genome file
with open(vaccine_genome_file) as f:
    lines = f.readlines()

for line in lines[:10]:
    print(line, end='')
print('\n...\n')
for line in lines[-10:]:
    print(line, end='')

### Let's take a look at what this genome looks like

GAGAAΨAAAC ΨAGΨAΨΨCΨΨ CΨGGΨCCCCA CAGACΨCAGA GAGAACCCGC   50
CACCAΨGΨΨC GΨGΨΨCCΨGG ΨGCΨGCΨGCC ΨCΨGGΨGΨCC AGCCAGΨGΨG  100
ΨGAACCΨGAC CACCAGAACA CAGCΨGCCΨC CAGCCΨACAC CAACAGCΨΨΨ  150
ACCAGAGGCG ΨGΨACΨACCC CGACAAGGΨG ΨΨCAGAΨCCA GCGΨGCΨGCA  200
CΨCΨACCCAG GACCΨGΨΨCC ΨGCCΨΨΨCΨΨ CAGCAACGΨG ACCΨGGΨΨCC  250
ACGCCAΨCCA CGΨGΨCCGGC ACCAAΨGGCA CCAAGAGAΨΨ CGACAACCCC  300
GΨGCΨGCCCΨ ΨCAACGACGG GGΨGΨACΨΨΨ GCCAGCACCG AGAAGΨCCAA  350
CAΨCAΨCAGA GGCΨGGAΨCΨ ΨCGGCACCAC ACΨGGACAGC AAGACCCAGA  400
GCCΨGCΨGAΨ CGΨGAACAAC GCCACCAACG ΨGGΨCAΨCAA AGΨGΨGCGAG  450
ΨΨCCAGΨΨCΨ GCAACGACCC CΨΨCCΨGGGC GΨCΨACΨACC ACAAGAACAA  500

...

CΨGΨGGCAGC ΨGCΨGCAAGΨ ΨCGACGAGGA CGAΨΨCΨGAG CCCGΨGCΨGA 3850
AGGGCGΨGAA ACΨGCACΨAC ACAΨGAΨGAC ΨCGAGCΨGGΨ ACΨGCAΨGCA 3900
CGCAAΨGCΨA GCΨGCCCCΨΨ ΨCCCGΨCCΨG GGΨACCCCGA GΨCΨCCCCCG 3950
ACCΨCGGGΨC CCAGGΨAΨGC ΨCCCACCΨCC ACCΨGCCCCA CΨCACCACCΨ 4000
CΨGCΨAGΨΨC CAGACACCΨC CCAAGCACGC AGCAAΨGCAG CΨCAAAACGC 4050
ΨΨAGCCΨAGC CACACCCCCA CGGGAAACAG CAGΨGAΨΨAA CCΨΨΨAGCAA 4100
ΨAAACGAAAG ΨΨΨAACΨAAG CΨAΨACΨAAC C
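
The listing groups nucleotides in blocks of ten and ends most lines with a running position counter, so it needs a little cleanup before it can be treated as one string. Below is a minimal sketch of that cleanup, assuming the file looks exactly like the excerpt above. `parse_genome_lines` is a hypothetical helper, and the real `Gene(from_file=...)` loader may work differently.

```python
def parse_genome_lines(lines):
    """Join grouped sequence lines into one string, dropping spaces and position counters."""
    sequence = []
    for line in lines:
        for token in line.split():
            if token.isdigit():      # trailing counter such as "50" or "3850"
                continue
            sequence.append(token)
    return "".join(sequence)

sample = [
    "GAGAAΨAAAC ΨAGΨAΨΨCΨΨ CΨGGΨCCCCA CAGACΨCAGA GAGAACCCGC   50",
    "CACCAΨGΨΨC GΨGΨΨCCΨGG ΨGCΨGCΨGCC ΨCΨGGΨGΨCC AGCCAGΨGΨG  100",
]
genome_start = parse_genome_lines(sample)
print(len(genome_start))   # 100
print(genome_start[:10])   # GAGAAΨAAAC
```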

## The layout of the vaccine mRNA

<img src="res/imgs/mRNA_schematic.png" alt="mRNA Schematic" style="width: 400px;"/>
<img src="res/imgs/vaccine_table.png" alt="Vaccine Table of Features" style="width: 600px;"/>

This schematic is a more-or-less standard layout for mRNA.

When writing this code, the authors implemented numerous techniques to optimize its efficacy.
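
One way to picture this layout in code is as a table of named slices. The sketch below uses the 0-based, end-exclusive offsets that this notebook's `display` calls use for the first three sections, applied to just the first 102 nucleotides of the listing above. The `REGIONS` dict and `slice_regions` helper are illustrative names, not part of the `dna` module.

```python
# First 102 nucleotides of the vaccine sequence, copied from the listing above.
HEAD = (
    "GAGAAΨAAACΨAGΨAΨΨCΨΨCΨGGΨCCCCACAGACΨCAGAGAGAACCCGC"
    "CACCAΨGΨΨCGΨGΨΨCCΨGGΨGCΨGCΨGCCΨCΨGGΨGΨCCAGCCAGΨGΨG"
    "ΨG"
)

# (start, end) offsets, 0-based and end-exclusive like Python slices.
REGIONS = {
    "cap": (0, 2),
    "5' UTR": (2, 54),
    "signal peptide": (54, 102),
}

def slice_regions(sequence, regions):
    """Cut a sequence into its named regions."""
    return {name: sequence[start:end] for name, (start, end) in regions.items()}

for name, part in slice_regions(HEAD, REGIONS).items():
    print(f"{name:15} {len(part):3} nt  {part[:12]}")
```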

<b>Let's go through each section!</b>

***

In [13]:
vaccine_genome = Gene(from_file=vaccine_genome_file)

### The 5' Cap

A sequence of two nucleotides that are chemically different from the normal bases.

In [4]:
vaccine_genome.display(start=0, end=2)

[33;49mG[0m[32;49mA[0m



#### Functions include:

**Prevent degradation**

- Protects the mRNA from exonucleases in the cytoplasm

**Promote translation**

- Convinces the ribosomes to translate this mRNA into proteins

**Disguise the mRNA as coming from the nucleus**

- Marks the mRNA as if it were a normal sequence produced by the cell's nucleus


[Bert](https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/) compares it to the shebang (`#!`) in a UNIX script.

**TL;DR**: Makes the mRNA look like a real mRNA produced by the cell so the cell machinery will use it to create proteins

***

### The 5' Untranslated Region
Non-coding metadata that directs how the mRNA should be used.  

In [5]:
vaccine_genome.display(start=2, end=54, split_codons=False)

[33;49mG[0m[32;49mA[0m[32;49mA[0m[35;49mΨ[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[34;49mC[0m[35;49mΨ[0m[32;49mA[0m[33;49mG[0m[35;49mΨ[0m[32;49mA[0m[35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m[35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m[35;49mΨ[0m[33;49mG[0m[33;49mG[0m[35;49mΨ[0m[34;49mC[0m[34;49mC[0m[34;49mC[0m[34;49mC[0m[32;49mA[0m[34;49mC[0m[32;49mA[0m[33;49mG[0m[32;49mA[0m[34;49mC[0m[35;49mΨ[0m[34;49mC[0m[32;49mA[0m[33;49mG[0m[32;49mA[0m[33;49mG[0m[32;49mA[0m[33;49mG[0m[32;49mA[0m[32;49mA[0m[34;49mC[0m[34;49mC[0m[34;49mC[0m[33;49mG[0m[34;49mC[0m[34;49mC[0m[32;49mA[0m[34;49mC[0m[34;49mC[0m


RNA is read in the 5' (five prime) to 3' (three prime) direction.

There are untranslated regions (not coding for the amino acids that make up the protein) on either side of the coding sequence.

The UTR on the 5' side holds metadata that helps control when protein translation occurs and how much protein to synthesize.

<b>Geneticists are biological [script kiddies](https://en.wikipedia.org/wiki/Script_kiddie)!</b>

> an unskilled individual who relies heavily on third-party scripts or programs developed by others to attack computer systems and networks

<div class="alert alert-block alert-info">

##### Optimization technique!

Copy over the 5' UTR from a known sequence that produces a LOT of proteins.
</div>

BioNTech chose to use the 5' UTR of the human alpha globin gene for this vaccine.

[Optimization of mRNA untranslated regions for improved expression of therapeutic mRNA
](https://www.tandfonline.com/doi/full/10.1080/15476286.2018.1450054)

[Enhancing mRNA Stability through the Addition of Stabilizing Untranslated Regions](https://dspace.mit.edu/bitstream/handle/1721.1/68694/773197160-MIT.pdf?sequence=2)

***

I thought that the nucleotides consisted of `A, C, G, U`, so what is `Ψ`?

> `Ψ` is 1-methyl-3'-pseudouridylyl (N1-methylpseudouridine, m1Ψ), a modified form of uridine

<div class="alert alert-block alert-info">

##### Optimization technique!

Replace uracils (U) in the mRNA with a modified molecule that functions just like uracil, but suppresses the immune system's interest in the mRNA.
</div>

Karikó and Weissman, 2005: [Suppression of RNA recognition by Toll-like receptors: the impact of nucleoside modification and the evolutionary origin of RNA](https://pubmed.ncbi.nlm.nih.gov/16111635/)
> Groundbreaking paper that showed that using modified nucleosides suppresses the immune system's response to mRNA

2020: [N 1-Methylpseudouridine substitution enhances the performance of synthetic mRNA switches in cells](https://pubmed.ncbi.nlm.nih.gov/32090264/)
> Synthetic messenger RNA (mRNA) tools often use pseudouridine and 5-methyl cytidine as substitutions for uridine and cytidine to avoid the immune response and cytotoxicity induced by introducing mRNA into cells.

> Here we show that synthetic mRNA switches containing N1-methylpseudouridine (m1Ψ) as a substitution of uridine substantially out-performed all other modified bases studied, exhibiting enhanced microRNA and protein sensitivity, better cell-type separation ability, and comparably low immune stimulation.
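
Because Ψ stands in for U one-to-one, software that expects the plain RNA alphabet can map it back before doing any analysis. A minimal sketch, with `normalize` as a made-up helper name:

```python
def normalize(sequence: str) -> str:
    """Map the modified nucleoside symbol Ψ back to U for standard tooling."""
    return sequence.replace("Ψ", "U")

five_prime_start = "GAGAAΨAAACΨAGΨAΨΨCΨΨ"  # first 20 nt of the vaccine sequence
print(normalize(five_prime_start))          # GAGAAUAAACUAGUAUUCUU
print(five_prime_start.count("Ψ"), "of", len(five_prime_start), "positions are Ψ")
```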

***

## Onto the Coding Sequence

The coding sequence is the portion of the mRNA that is translated into protein.


### Signal Peptide
The first part of the coding sequence is the signal peptide.

In [6]:
vaccine_genome.display(start=54, end=102, show_aminos=True)

 M   F   V   F   L   V   L   L   P   L   V   S   S   Q   C   V 
[32;49mA[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[34;49mC[0m[35;49mΨ[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[34;49mC[0m[34;49mC[0m [32;49mA[0m[33;49mG[0m[34;49mC[0m [34;49mC[0m[32;49mA[0m[33;49mG[0m [35;49mΨ[0m[33;49mG[0m[35;49mΨ[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m


The signal sequence is "metadata" that tells the cell where the resulting protein should be delivered after it's constructed.
\[[Ref](https://en.wikipedia.org/wiki/Signal_peptide#Function_(translocation))\]

The signal sequence in this vaccine encodes the same amino acids as the signal sequence in the actual virus gene, even though many of the underlying nucleotides differ.

So what protein are we actually constructing here?

### The Spike Protein
<img src="https://images.theconversation.com/files/373899/original/file-20201209-19-1lujbpa.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=595&fit=crop&dpr=1" alt="What the virus looks like" style="width: 300px;"/>


#### The spike protein facilitates viral entry.

<img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41577-020-00480-0/MediaObjects/41577_2020_480_Fig1_HTML.png?as=webp" alt="Fusion" style="width: 500px;"/>

The spike protein binds to a protein called [ACE2](https://en.wikipedia.org/wiki/Angiotensin-converting_enzyme_2) on the surfaces of our cells, changes shape, and then pulls the virus into the cell.

**Research has shown that targeting the spike protein is most effective for vaccines.**

[Viral targets for vaccines against COVID-19](https://www.nature.com/articles/s41577-020-00480-0)



In [7]:
vaccine_spike = Gene(vaccine_genome.sequence[54:3879])
vaccine_spike.display(end=60)

[32;49mA[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[35;49mΨ[0m[34;49mC[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [34;49mC[0m[34;49mC[0m[35;49mΨ[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [35;49mΨ[0m[34;49mC[0m[34;49mC[0m [32;49mA[0m[33;49mG[0m[34;49mC[0m [34;49mC[0m[32;49mA[0m[33;49mG[0m [35;49mΨ[0m[33;49mG[0m[35;49mΨ[0m [33;49mG[0m[35;49mΨ[0m[33;49mG[0m [32;49mA[0m[32;49mA[0m[34;49mC[0m [34;49mC[0m[35;49mΨ[0m[33;49mG[0m [32;49mA[0m[34;49mC[0m[34;49mC[0m [32;49mA[0m[34;49mC[0m[34;49mC[0m


In [8]:
virus_spike = Gene(from_file=virus_spike_protein_file)
virus_spike.display(end=60)

[32;49mA[0m[35;49mT[0m[33;49mG[0m [35;49mT[0m[35;49mT[0m[35;49mT[0m [33;49mG[0m[35;49mT[0m[35;49mT[0m [35;49mT[0m[35;49mT[0m[35;49mT[0m [34;49mC[0m[35;49mT[0m[35;49mT[0m [33;49mG[0m[35;49mT[0m[35;49mT[0m [35;49mT[0m[35;49mT[0m[32;49mA[0m [35;49mT[0m[35;49mT[0m[33;49mG[0m [34;49mC[0m[34;49mC[0m[32;49mA[0m [34;49mC[0m[35;49mT[0m[32;49mA[0m [33;49mG[0m[35;49mT[0m[34;49mC[0m [35;49mT[0m[34;49mC[0m[35;49mT[0m [32;49mA[0m[33;49mG[0m[35;49mT[0m [34;49mC[0m[32;49mA[0m[33;49mG[0m [35;49mT[0m[33;49mG[0m[35;49mT[0m [33;49mG[0m[35;49mT[0m[35;49mT[0m [32;49mA[0m[32;49mA[0m[35;49mT[0m [34;49mC[0m[35;49mT[0m[35;49mT[0m [32;49mA[0m[34;49mC[0m[32;49mA[0m [32;49mA[0m[34;49mC[0m[34;49mC[0m


### Let's compare the spike protein between the virus and the vaccine genes

In [9]:
# Head
print('First codons:')
virus_spike.visual_compare(vaccine_spike, end=66)

# Tail
print('Last codons:')
virus_spike.visual_compare(vaccine_spike, start=3756)

First codons:
Comparing sequences between offsets [0:66]
 M   F   V   F   L   V   L   L   P   L   V   S   S   Q   C   V   N   L   T   T   R   T 
[32;49mA[0m[35;49mU[0m[33;49mG[0m [35;49mU[0m[35;49mU[0m[35;1;4;45;39mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [35;49mU[0m[35;49mU[0m[35;1;4;45;39mU[0m [34;49mC[0m[35;49mU[0m[35;1;4;45;39mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [35;1;4;45;39mU[0m[35;49mU[0m[32;1;4;42;39mA[0m [35;1;4;45;39mU[0m[35;49mU[0m[33;49mG[0m [34;49mC[0m[34;49mC[0m[32;1;4;42;39mA[0m [34;49mC[0m[35;49mU[0m[32;1;4;42;39mA[0m [33;49mG[0m[35;49mU[0m[34;1;4;44;39mC[0m [35;49mU[0m[34;49mC[0m[35;1;4;45;39mU[0m [32;49mA[0m[33;49mG[0m[35;1;4;45;39mU[0m [34;49mC[0m[32;49mA[0m[33;49mG[0m [35;49mU[0m[33;49mG[0m[35;49mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [32;49mA[0m[32;49mA[0m[35;1;4;45;39mU[0m [34;49mC[0m[35;49mU[0m[35;1;4;45;39mU[0m [32;49mA[0m[34;49mC

### What do we notice?

- Some bases are different
- The codons map to the same amino acids
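
That observation can be checked by translating both sequences with the standard genetic code. The sketch below does so for the first 16 codons (the signal peptide), copied from the outputs above. `CODON_TABLE`, `codons`, and `translate` are names invented here, and the notebook's `Gene` class may do this differently.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINOS)}

def codons(sequence):
    """Split a sequence into RNA codons, treating T and Ψ as U."""
    rna = sequence.replace(" ", "").upper().replace("T", "U").replace("Ψ", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]

def translate(sequence):
    """Translate a sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(sequence))

virus = "ATG TTT GTT TTT CTT GTT TTA TTG CCA CTA GTC TCT AGT CAG TGT GTT"
vaccine = "AΨG ΨΨC GΨG ΨΨC CΨG GΨG CΨG CΨG CCΨ CΨG GΨG ΨCC AGC CAG ΨGΨ GΨG"

changed = sum(a != b for a, b in zip(codons(virus), codons(vaccine)))
print(translate(virus))    # MFVFLVLLPLVSSQCV
print(translate(vaccine))  # MFVFLVLLPLVSSQCV
print(f"{changed} of {len(codons(virus))} codons differ")
```

Most codons change, yet the protein they spell out stays the same, because several codons map to each amino acid.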

#### What bases were changed?

> The vaccine gene has many more `C`s and `G`s

<div class="alert alert-block alert-info">

##### Optimization technique!

Sequences with more guanines (G) and cytosines (C) result in more productive protein synthesis.
</div>

2018: [Optimization of mRNA translation and stability](https://www.nature.com/articles/nrd.2017.243)
> Enrichment of G:C content constitutes another form of sequence optimization that has been shown to increase steady-state mRNA levels

2006: [High Guanine and Cytosine Content Increases mRNA Levels in Mammalian Cells](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1463026/)
> We performed transient and stable transfections of mammalian cells with GC-rich and GC-poor versions of Hsp70, green fluorescent protein, and IL2 genes. The GC-rich genes were expressed several-fold to over a 100-fold more efficiently than their GC-poor counterparts.
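
To put a number on that difference, GC content is the share of G and C among all bases. Below is a rough sketch over the first 60 nucleotides of each gene, copied from the outputs above. `gc_content` is an illustrative helper, and a 60-nucleotide window is only a small sample of the full gene.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (spaces between codons are ignored)."""
    bases = sequence.replace(" ", "").upper()
    return sum(base in "GC" for base in bases) / len(bases)

virus = "ATG TTT GTT TTT CTT GTT TTA TTG CCA CTA GTC TCT AGT CAG TGT GTT AAT CTT ACA ACC"
vaccine = "AΨG ΨΨC GΨG ΨΨC CΨG GΨG CΨG CΨG CCΨ CΨG GΨG ΨCC AGC CAG ΨGΨ GΨG AAC CΨG ACC ACC"

print(f"virus   GC: {gc_content(virus):.0%}")
print(f"vaccine GC: {gc_content(vaccine):.0%}")
```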


### Let's look at the entire gene comparison

In [10]:
virus_spike.visual_compare(vaccine_spike)

Comparing sequences between offsets [0:3822]
 M   F   V   F   L   V   L   L   P   L   V   S   S   Q   C   V   N   L   T   T   R   T   Q   L 
[32;49mA[0m[35;49mU[0m[33;49mG[0m [35;49mU[0m[35;49mU[0m[35;1;4;45;39mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [35;49mU[0m[35;49mU[0m[35;1;4;45;39mU[0m [34;49mC[0m[35;49mU[0m[35;1;4;45;39mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [35;1;4;45;39mU[0m[35;49mU[0m[32;1;4;42;39mA[0m [35;1;4;45;39mU[0m[35;49mU[0m[33;49mG[0m [34;49mC[0m[34;49mC[0m[32;1;4;42;39mA[0m [34;49mC[0m[35;49mU[0m[32;1;4;42;39mA[0m [33;49mG[0m[35;49mU[0m[34;1;4;44;39mC[0m [35;49mU[0m[34;49mC[0m[35;1;4;45;39mU[0m [32;49mA[0m[33;49mG[0m[35;1;4;45;39mU[0m [34;49mC[0m[32;49mA[0m[33;49mG[0m [35;49mU[0m[33;49mG[0m[35;49mU[0m [33;49mG[0m[35;49mU[0m[35;1;4;45;39mU[0m [32;49mA[0m[32;49mA[0m[35;1;4;45;39mU[0m [34;49mC[0m[35;49mU[0m[35;1;4;45;39mU[0m [32;49mA[0m[34;49mC[0m

### Let's zoom in on those changed amino acids

In [11]:
virus_spike.visual_compare(vaccine_spike, start=2949, end=2967)

Comparing sequences between offsets [2949:2967]
 L   D  [1;4;7m K [0m [1;4;7m V [0m  E   A 
[34;49mC[0m[35;49mU[0m[35;1;4;45;39mU[0m [33;49mG[0m[32;49mA[0m[34;49mC[0m [32;1;4;42;39mA[0m[32;1;4;42;39mA[0m[32;1;4;42;39mA[0m [33;1;4;43;39mG[0m[35;1;4;45;39mU[0m[35;49mU[0m [33;49mG[0m[32;49mA[0m[33;49mG[0m [33;49mG[0m[34;49mC[0m[35;1;4;45;39mU[0m
[34;49mC[0m[35;49mU[0m[33;1;4;43;39mG[0m [33;49mG[0m[32;49mA[0m[34;49mC[0m [34;1;4;44;39mC[0m[34;1;4;44;39mC[0m[35;1;4;45;39mU[0m [34;1;4;44;39mC[0m[34;1;4;44;39mC[0m[35;49mU[0m [33;49mG[0m[32;49mA[0m[33;49mG[0m [33;49mG[0m[34;49mC[0m[34;1;4;44;39mC[0m
 L   D  [1;4;7m P [0m [1;4;7m P [0m  E   A 




This segment shows the only amino acid changes from the virus: the **Lysine (K)** and **Valine (V)** were replaced by two **Prolines (P)**.
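
Substitutions like this are conventionally written as original residue, position, new residue. The sketch below derives those labels from the six codons in the comparison above, taking the nucleotide offset 2949 from that output and counting residues from 1. The helper names are invented for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINOS)}

def substitutions(original, modified, nucleotide_offset=0):
    """Yield labels such as 'K986P' for codons whose amino acid changed."""
    first_residue = nucleotide_offset // 3 + 1   # residues are numbered from 1
    pairs = zip(original.split(), modified.split())
    for i, (before, after) in enumerate(pairs):
        if CODON_TABLE[before] != CODON_TABLE[after]:
            yield f"{CODON_TABLE[before]}{first_residue + i}{CODON_TABLE[after]}"

virus = "CUU GAC AAA GUU GAG GCU"
vaccine = "CUG GAC CCU CCU GAG GCC"
print(list(substitutions(virus, vaccine, nucleotide_offset=2949)))
# ['K986P', 'V987P']
```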

<img src="https://acs-h.assetsadobe.com/is/image//content/dam/cen/98/38/WEB/09838-feature1-spike.jpg/?$responsive$&wid=700&qlt=90,0&resMode=sharp2" alt="Prefusion and Postfusion" style="width: 200px;"/>

<div class="alert alert-block alert-info">

##### Optimization technique!

Keep the protein from transforming by substituting a pair of amino acids with two prolines.
</div>

[Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation](https://science.sciencemag.org/content/367/6483/1260)

[Distinct conformational states of SARS-CoV-2 spike protein](https://science.sciencemag.org/content/369/6511/1586)

2017: [Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen](https://www.pnas.org/content/114/35/E7348)
> Thus, the introduction of two consecutive proline residues at the beginning of the central helix seems to be a general strategy for retaining betacoronavirus S proteins in the prototypical prefusion conformation.

***

### 3' UTR

In [16]:
# The 3' UTR begins at offset 3879, right where the coding-sequence slice [54:3879] ends
vaccine_genome.display(start=3879, end=4174, split_codons=False)

CΨCGAGCΨGGΨACΨGCAΨGCACGCAAΨGCΨAGCΨGCCCCΨΨΨCCCGΨCCΨGGGΨACCCCGAGΨCΨCCCCCGACCΨCGG

TODO

***

### Polyadenylation

In [12]:
vaccine_genome.display(start=4174, split_codons=False)

[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[33;49mG[0m[34;49mC[0m[32;49mA[0m[35;49mΨ[0m[32;49mA[0m[35;49mΨ[0m[33;49mG[0m[32;49mA[0m[34;49mC[0m[35;49mΨ[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0m[32;49mA[0

> The poly-A tail is a long chain of adenine nucleotides that is added to a messenger RNA (mRNA) molecule during RNA processing to increase the stability of the molecule. 

>  The poly-A tail makes the RNA molecule more stable and prevents its degradation. 

\[[Ref](https://www.nature.com/scitable/definition/poly-a-tail-276/)\]

The tail is shortened over time by exonucleases, and, when it is short enough, the mRNA is enzymatically degraded.


<b>What are the non-As in the poly(A)-tail?</b>

<div class="alert alert-block alert-info">

##### Optimization technique!

Adding a short "random" sequence in the poly-A tail increases vaccine stability.
</div>

##### 2014: BioNTech Patent

[Stabilization of poly(a) sequence encoding dna sequences](https://patents.google.com/patent/US20170166905A1/en)

> Introduction of a 10 nucleotide random sequence in this sensitive region led to an **increase of the poly(dA:dT) stability**. Constructs with 30 or 40 adenosine nucleotides, followed by the linker sequence and another 70 or 60 adenosines (A30L70 and A40L60) respectively, resulted in an poly(dA:dT) instability of only 3-4% in E. coli.

> A 10 nucleotide linker (L) was inserted in the poly(dA:dT) stretch in different positions of the poly(dA:dT) sequence. The linker sequence (GCATATGACT (SEQ ID NO: 2)) was chosen in a way to contain a **balanced contribution of all 4 nucleotides** (2×G, 2×C, 3×T and 3×A).

> Introduction of linker sequences in this sequence area led to a further increases of the poly(dA:dT) stability by at least **2-fold** as compared to the other constructs
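
A quick way to spot such a linker is to look for long runs of A and inspect whatever sits between them. The sketch below builds a tail that follows the A30L70 layout named in the patent quote, with the linker written using Ψ in place of T as in the vaccine listing, rather than reading it from the genome file. `split_tail` and its `min_run` threshold are hypothetical choices for this example.

```python
import re

def split_tail(tail: str, min_run: int = 20):
    """Return (lengths of the long poly-A runs, segments found between them)."""
    runs = [m.span() for m in re.finditer(rf"A{{{min_run},}}", tail)]
    lengths = [end - start for start, end in runs]
    linkers = [tail[prev_end:next_start]
               for (_, prev_end), (next_start, _) in zip(runs, runs[1:])]
    return lengths, linkers

LINKER = "GCAΨAΨGACΨ"                 # GCATATGACT from the patent, T written as Ψ
tail = "A" * 30 + LINKER + "A" * 70   # assumed A30-linker-A70 layout

lengths, linkers = split_tail(tail)
print(lengths)   # [30, 70]
print(linkers)   # ['GCAΨAΨGACΨ']
```

The threshold matters because the linker itself contains single A's, so splitting on every A would break it apart.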



## Conclusions

### Sections
<img src="res/imgs/mRNA_schematic.png" alt="mRNA Schematic" style="width: 400px;"/>

##### Cap
>Makes our synthesized mRNA look like real mRNA and protects it from mRNA predators.

##### 5' UTR
>Copied from another gene that is known to produce a lot of proteins.

##### Signal Peptide
>Encodes where to deliver the nascent proteins

##### Spike Protein
>Stolen from the virus itself, with the double-proline substitution

##### 3' UTR
>Copied from other genes that are known to produce a lot of proteins.

##### Poly-A Tail
>Extends the half-life of the mRNA

### Optimizations

- Copy over the 5' UTR from a known sequence that produces a LOT of proteins.
- Replace uracils (U) in the mRNA with a modified molecule that functions just like uracil, but suppresses the immune system's interest in the mRNA.
- Sequences with more guanines (G) and cytosines (C) result in more productive protein synthesis.
- Keep the protein from transforming by substituting a pair of amino acids with two prolines.
- Adding a short "random" sequence in the poly-A tail increases vaccine stability.


### Takeaways

- This stuff is really cool
- Looking in with a programmer's lens is elucidating
- There is so much more that bioengineers could do in the future
- mRNA vaccines basically exploit your body's poor input sanitization

<img src="https://imgs.xkcd.com/comics/exploits_of_a_mom.png"
     alt="Exploits of a Mom - xkcd" style="width: 600px;"/>

## FAQ

Does the vaccine reprogram your cells or change your DNA?
> No, it provides a recipe that is used by your cell's machinery to create proteins characteristic of the virus so that your adaptive immune system can prepare for the virus.

Why does the vaccine have to teach your cell how to create the spike protein, can't we just deliver the protein itself?
> By just delivering the source, your cells can produce far more protein than could be delivered in an injection. Additionally, the protein itself isn't enough to trigger a significant immune response, whereas the mRNA hijacking the cells' machinery to produce massive amounts of the protein is enough to do so.

What "editor" do the vaccine developers actually use? How do they go from an abstract sequence of letters to a piece of mRNA?
> They use "DNA printers" like the [Codex DNA BioXp 3200 DNA printer](https://codexdna.com/products/bioxp-system/). See [PCR](https://en.wikipedia.org/wiki/Polymerase_chain_reaction) for more.