# PCR Primer design
So far we have used SnapGene to design our primers. To explore open-source alternatives, this notebook uses `Biopython` together with Primer3.
## Installation
First, we need to install the Python wrapper `primer3-py`:

In [1]:
!pip install primer3-py



## Import
We can now import it along with the other libraries we need:

In [1]:
from Bio import SeqIO
import primer3
import os
os.chdir("..")

## Designing our first primer

In [3]:
dna_seq = str("AGCGCACATCTCTAATCTTTGATTTACGTATAATCGTTACACAACCATCCCCCTAATTCCTCTACAGAGGCGCGTCCTGCTGCCCCTTCCCATGGCACATGCTTCCAACTTACGATTTCTCACGTCCCACTCGTCCCAGGTTTCGCCTTTTCGGATTTCGTCCGTGTTGTGTTTTGTCTTCTTCCGTTGTCTTCCTCCGTTGTCTTCGATTTGATTCCCCACACCCGACTGCCTATCATTGTTCTTGCTTCGGATTCCTGGGCCACCTCGTTGAACTGGTGCTACTCAATAATTTATCGTAATTTACCATACGTTATCGATACATACTCCCCCCCATTCCGTTGTATTCCTTCGCTGTAGCTTTTGAGACCACCACCACCACCCTAGCCTCTGTTGTATAGTTTTCGACTTCGTCCGCCCGCCGTTGTCCTTGTGAACCATTTTTGTTGCTATCGCACACATACATACCAAGTACACTGCTTCTTGCTATCCTCCCTACTAGCTTTCTACATGCATTGCTACTATACAGCCCTCTACAGCACTTGAAATGAAGTAATTTGAGCCCGGAGCATCCACGAAACCTACGCGAGGAAGAGGTTCTTTAGGCGCTGGTCTGCATTTTCGCCCATACTCAGTCCCAAATTCCAGGGAACGAGGCCAGTAATACTACTGTATTGATTGTCGTTTTACAGCCACTGTCACTGTCATTCAATGTCAACGAACGGCGTCCAGACTTGTCAAGCATCGTGCCAACGATTTGTTTCCGCGTTCACAGCATCCGCTCCAACATCTCCCTATTTTCGACGTACAAATCAAACCCTAGTTGACAGCAAACCTAACGGTAGTGTTGTTGCGAGGTGGAGGTCCCTCAAGGTGGAGTTACCTTGAGGTCCCGTATTTGTGATGGTGGCGTTCTCAAGGTCCCTTGAGGTGGCCTTGATGTGTTTTCCCATCCCTGTAACGGAATCTCCGGGACGGTGATGTGGCAGCGAGTGCGAGC")

primers = primer3.bindings.design_primers(
    {'SEQUENCE_TEMPLATE': dna_seq},
    {
        'PRIMER_OPT_SIZE': 20, 'PRIMER_MIN_SIZE': 18, 'PRIMER_MAX_SIZE': 25,
        'PRIMER_OPT_TM': 60.0, 'PRIMER_MIN_TM': 57.0, 'PRIMER_MAX_TM': 63.0,
        'PRIMER_PAIR_MAX_DIFF_TM': 3.0,
        'PRIMER_PRODUCT_SIZE_RANGE': [[900, 1100]],
        'SEQUENCE_INCLUDED_REGION': [0, len(dna_seq)],
        # Forward primer must lie within the first 25 bases, reverse primer within the last 25
        'SEQUENCE_PRIMER_PAIR_OK_REGION_LIST': [[0, 25, len(dna_seq)-25, 25]],
    },
)

# Print the first primer pair
print("Forward primer:", primers['PRIMER_LEFT_0_SEQUENCE'])
print("Reverse primer:", primers['PRIMER_RIGHT_0_SEQUENCE'])
print("Product size:", primers['PRIMER_PAIR_0_PRODUCT_SIZE'])

Forward primer: AGCGCACATCTCTAATCTTTGA
Reverse primer: ACTCGCTGCCACATCACC
Product size: 994
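
To check how much of the template a primer pair actually spans, the primer positions in the result can be compared with the template length. The sketch below is not part of the original notebook: `uncovered_flanks` is a hypothetical helper, and it assumes that `PRIMER_LEFT_0` and `PRIMER_RIGHT_0` hold `[start, length]` with 0-based positions, where the right primer's start is its 5' base (the rightmost base of the amplicon). The dictionary is a mock with illustrative values, not real Primer3 output.

```python
def uncovered_flanks(result, template_len, pair_index=0):
    """Return (left_gap, right_gap): template bases lying outside the amplicon.

    Assumes 0-based [start, length] entries, with the right primer's start
    pointing at its 5' base, i.e. the last base of the amplicon.
    """
    left_start, _left_len = result[f"PRIMER_LEFT_{pair_index}"]
    right_start, _right_len = result[f"PRIMER_RIGHT_{pair_index}"]
    left_gap = left_start                         # bases before the forward primer
    right_gap = template_len - 1 - right_start    # bases after the reverse primer
    return left_gap, right_gap


# Mock result with illustrative values (NOT actual Primer3 output)
mock_result = {
    "PRIMER_LEFT_0": [0, 22],
    "PRIMER_RIGHT_0": [993, 18],
    "PRIMER_PAIR_0_PRODUCT_SIZE": 994,
}

left_gap, right_gap = uncovered_flanks(mock_result, template_len=1000)
print(f"Bases not amplified: {left_gap} upstream, {right_gap} downstream")
```

With a real result, `template_len` would be `len(dna_seq)` and `result` would be the `primers` dictionary returned above.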


As we can see, the amplified product is slightly shorter than the template: Primer3 optimizes primer placement within the allowed regions, even when strict positions are specified with the last argument. In this case we choose to ignore it, as it is unlikely to be a problem. However, if an important motif is left out, the promoter may not work well. To solve this, one could screen the input sequence for restriction sites, choose a non-cutter restriction sequence to append to the primer, and include a wider piece of the genome to ensure that the entire upstream region is extracted.
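
The screening step could be sketched as follows. This is only an illustration, not the notebook's method: `SITES`, `non_cutters`, and `add_site` are hypothetical names, the enzyme panel is a small hand-picked set of common palindromic recognition sites, and the 5' spacer is a placeholder. Because every site listed is palindromic, searching one strand is sufficient.

```python
# Recognition sequences of a few common enzymes (all palindromic)
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def non_cutters(template, sites=SITES):
    """Return the names of enzymes whose site does not occur in the template."""
    template = template.upper()
    return sorted(name for name, site in sites.items() if site not in template)


def add_site(primer, enzyme, sites=SITES, spacer="GCGC"):
    """Prepend a 5' spacer (placeholder value) and the enzyme's site to a primer."""
    return spacer + sites[enzyme] + primer


# Toy template containing an EcoRI and a BamHI site
toy_template = "AAGAATTCTTGGATCCAA"
candidates = non_cutters(toy_template)
print(candidates)
print(add_site("AGCGCACATCTCTAATCTTTGA", candidates[0]))
```

For a real design, the template would be the full region to amplify, and the chosen enzyme would also have to be compatible with the destination vector.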
## Designing primers for genomic extraction of upstream sequences

In [8]:
from src.utils import design_primers

pth = "data/primer_design/promoters_for_extraction.fasta"

design_primers(pth)

Record: PKG1_promoter
up primer: CGTGATTTCCTCTGCCTCGT
down primer: ACGGAATGGTTATCGCCCTT
product size: 935
primr melting temp:  60.108614558536146 59.45530340378093

Record: ADH1_promoter
up primer: CAGGGTCAAGCAGAGCAGAA
down primer: GGCACAGGGAGGTGTAAGTC
product size: 947
primr melting temp:  59.96411926156776 60.0362939165052

Record: TDH3_promoter
up primer: CGTTCCCACTTTGGACGTGT
down primer: GAGAGAAAAGCGCAGTTGGC
product size: 944
primr melting temp:  60.812412072024244 60.11000999450579

Record: ACT_promoter
up primer: TCGAGAAGAGAGGTAGGCGG
down primer: TGATAGAGCTGTAGGGCGGG
product size: 950
primr melting temp:  60.464013016904346 60.827924519505984

Record: TEF1_promoter
up primer: GGGCGAGTGTCCATTCATGA
down primer: CATCTCGCACCAGTGGATGA
product size: 926
primr melting temp:  60.10770297561146 59.82355063247462

Record: DED1_promoter
up primer: CCACAGATGCAAACGCAACA
down primer: ATCGCTGTGGATATCGGTGG
product size: 933
primr melting temp:  59.96949614280919 59.681559465123314

Record: gpdA_

We'll also generate primers for extraction of the downstream sequences used as terminators:

In [9]:
pth = "data/primer_design/terminators_for_extraction.fasta"

design_primers(pth)

Record: PKG1_terminator
up primer: TGATGATTATTAGTGAGAGCGTGGA
down primer: GCCCGATACCTCCAAGGAAA
product size: 951
primr melting temp:  59.69819491103988 59.45530340378093

Record: ADH1_terminator
up primer: TGTGACAGATGACGGACACA
down primer: AAGGCTTGGTTTACGACGGT
product size: 931
primr melting temp:  58.96329264587649 59.89273163212596

Record: TDH3_terminator
up primer: TCGTTACACAACCATCCCCC
down primer: ACACATCAAGGCCACCTCAA
product size: 912
primr melting temp:  59.67439756087623 59.52192805017313

Record: ACT_terminator
up primer: ACGACCTCCTTACGACCCTT
down primer: TCAACCAAGCCACAAGTCGT
product size: 992
primr melting temp:  60.25151896976945 60.107410468332944

Record: TEF1_terminator
up primer: TCACCACCTCGTTCTCGTTT
down primer: TGACTAGGCTGCCTTTGACC
product size: 942
primr melting temp:  59.251549596098016 59.67505708032144

Record: DED1_terminator
up primer: GCCCCTTCTCTTTTCGACGA
down primer: ATGTGATGCCAACATGCTGC
product size: 938
primr melting temp:  60.0376045227884 59.825155079357785

### A very specific function for generating a table in our report

In [10]:
from src.utils import design_primers_latex_table
#promoters
design_primers_latex_table("data/primer_design/promoters_for_extraction.fasta")

\textit{PKG1_promoter} & Up & 60 & 5'-CGTGATTTCCTCTGCCTCGT-3' \\
& Down & 59 & 5'-ACGGAATGGTTATCGCCCTT-3' \\
\textit{ADH1_promoter} & Up & 60 & 5'-CAGGGTCAAGCAGAGCAGAA-3' \\
& Down & 60 & 5'-GGCACAGGGAGGTGTAAGTC-3' \\
\textit{TDH3_promoter} & Up & 61 & 5'-CGTTCCCACTTTGGACGTGT-3' \\
& Down & 60 & 5'-GAGAGAAAAGCGCAGTTGGC-3' \\
\textit{ACT_promoter} & Up & 60 & 5'-TCGAGAAGAGAGGTAGGCGG-3' \\
& Down & 61 & 5'-TGATAGAGCTGTAGGGCGGG-3' \\
\textit{TEF1_promoter} & Up & 60 & 5'-GGGCGAGTGTCCATTCATGA-3' \\
& Down & 60 & 5'-CATCTCGCACCAGTGGATGA-3' \\
\textit{DED1_promoter} & Up & 60 & 5'-CCACAGATGCAAACGCAACA-3' \\
& Down & 60 & 5'-ATCGCTGTGGATATCGGTGG-3' \\
\textit{gpdA_promoter} & Up & 61 & 5'-CGTTCCCACTTTGGACGTGT-3' \\
& Down & 60 & 5'-GAGAGAAAAGCGCAGTTGGC-3' \\
\textit{pkiA_promoter} & Up & 60 & 5'-GAGGCAATGCTGGGTTTTCC-3' \\
& Down & 60 & 5'-GTGTCCCTTTAAGTGGCGGA-3' \\
\textit{mdhA_promoter} & Up & 60 & 5'-CCAGTACCGCGATCCTTTGT-3' \\
& Down & 60 & 5'-GAAGGTGGTGGTTGTGGAGA-3' \\


In [11]:
#terminators
pth = "data/primer_design/terminators_for_extraction.fasta"
design_primers_latex_table(pth)

\textit{PKG1_terminator} & Up & 60 & 5'-TGATGATTATTAGTGAGAGCGTGGA-3' \\
& Down & 59 & 5'-GCCCGATACCTCCAAGGAAA-3' \\
\textit{ADH1_terminator} & Up & 59 & 5'-TGTGACAGATGACGGACACA-3' \\
& Down & 60 & 5'-AAGGCTTGGTTTACGACGGT-3' \\
\textit{TDH3_terminator} & Up & 60 & 5'-TCGTTACACAACCATCCCCC-3' \\
& Down & 60 & 5'-ACACATCAAGGCCACCTCAA-3' \\
\textit{ACT_terminator} & Up & 60 & 5'-ACGACCTCCTTACGACCCTT-3' \\
& Down & 60 & 5'-TCAACCAAGCCACAAGTCGT-3' \\
\textit{TEF1_terminator} & Up & 59 & 5'-TCACCACCTCGTTCTCGTTT-3' \\
& Down & 60 & 5'-TGACTAGGCTGCCTTTGACC-3' \\
\textit{DED1_terminator} & Up & 60 & 5'-GCCCCTTCTCTTTTCGACGA-3' \\
& Down & 60 & 5'-ATGTGATGCCAACATGCTGC-3' \\
\textit{gpdA_terminator} & Up & 61 & 5'-CGTTCCCACTTTGGACGTGT-3' \\
& Down & 60 & 5'-GAGAGAAAAGCGCAGTTGGC-3' \\
\textit{pkiA_terminator} & Up & 60 & 5'-GCTTTCGCCATTCTACTCGC-3' \\
& Down & 60 & 5'-CCCTTGCCTGTCTATCGACC-3' \\
\textit{mdhA_terminator} & Up & 60 & 5'-GGCGGGTGGTTAGATGGTAG-3' \\
& Down & 60 & 5'-CCGATTTACCTCTCCCAGCG-3' 
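
The implementation of `design_primers_latex_table` is not shown here. As a rough idea of how rows in the format above could be produced, here is a minimal sketch with a hypothetical helper, `latex_primer_rows`. It rounds the melting temperature to whole degrees, as the table appears to do. Unlike the output above, it escapes underscores in the record name, on the assumption that the name is typeset in LaTeX text mode, where a bare `_` raises an error.

```python
def latex_primer_rows(name, up_seq, up_tm, down_seq, down_tm):
    """Return two LaTeX table rows (up and down primer) for one record."""
    safe_name = name.replace("_", r"\_")  # assumption: name is used in text mode
    up_row = rf"\textit{{{safe_name}}} & Up & {round(up_tm)} & 5'-{up_seq}-3' \\"
    down_row = rf"& Down & {round(down_tm)} & 5'-{down_seq}-3' \\"
    return [up_row, down_row]


# Values taken from the PKG1_promoter record printed earlier
rows = latex_primer_rows(
    "PKG1_promoter",
    "CGTGATTTCCTCTGCCTCGT", 60.108614558536146,
    "ACGGAATGGTTATCGCCCTT", 59.45530340378093,
)
print("\n".join(rows))
```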

## Generating primers for PCR validation of construct
To validate the inserts with PCR, we need primers annealing to known sequences on each promoter and terminator. This way, we can easily verify whether the assembly was successful by combining the products with restriction enzymes and running them on a gel. We can then assess whether the observed lengths match our expectations.
### Full construct validation
Each promoter and terminator needs its own primer. We design the primers so that they anneal at different distances from the end of the promoter. This allows us to identify which promoter or terminator was inserted from the length of the resulting band. Furthermore, we need both forward and reverse primers on each CDS, a forward primer on the upstream PABA, and a reverse primer on the downstream PABA. With this set of primers, we are able to verify each step in the final assembly. Please see the figure below:

![alt text](<Layer 1.png>)

Note that we will also need reverse primers on the promoters and forward primers on the terminators to verify assembly of the PABA ends. Also note that the figure shows the CDS primers near the middle. In reality, they are placed near the ends to keep the resulting products small, which makes the length differences distinguishable.
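
The band-length logic can be sketched in a few lines. Everything here is an assumption for illustration: the helper names, the offsets assigned to each promoter, and `cds_reach` (the distance from the CDS start to the 5' end of the CDS reverse primer) are hypothetical, and the promoter is assumed to abut the CDS directly with no linker in between.

```python
def expected_band(offset, primer_len=30, cds_reach=150):
    """Expected product length in bp for a promoter primer ending `offset` bases
    upstream of the promoter end, paired with a reverse primer in the CDS."""
    return offset + primer_len + cds_reach


def identify_promoter(observed_bp, offsets_by_promoter, tolerance=10, **band_kwargs):
    """Return the promoter whose expected band is closest to the observed length,
    or None if no expected band lies within `tolerance` bp."""
    best_name, best_diff = None, None
    for name, offset in offsets_by_promoter.items():
        diff = abs(expected_band(offset, **band_kwargs) - observed_bp)
        if best_diff is None or diff < best_diff:
            best_name, best_diff = name, diff
    if best_diff is not None and best_diff <= tolerance:
        return best_name
    return None


# Hypothetical assignment of one primer offset per promoter
offsets = {"PKG1_promoter": 20, "ADH1_promoter": 50, "TDH3_promoter": 80}
for name, offset in offsets.items():
    print(name, expected_band(offset), "bp")
print("Band at ~232 bp:", identify_promoter(232, offsets))
```

The spacing between offsets would need to be chosen so that neighbouring bands can actually be resolved on the gel being used.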
### Primers for promoters
Each promoter needs a forward primer annealing at a different position. The reverse primer does not need to vary in position, as the promoter can be identified from the product between promoter and CDS.

In [4]:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def create_end_primers(fasta_path, out_fasta=None, primer_len=30, first_offset=20, step=30, rc=False):
    """
    For each record in fasta_path, generate primers of length `primer_len`.
    The first primer ends `first_offset` bases upstream of the sequence end,
    subsequent primers move upstream by `step` (offsets = first_offset, first_offset+step, ...).
    If rc=True, primers are returned as reverse complements (useful if you want primers that bind the
    reverse strand so their 5'->3' sequence is shown).
    Writes primers to out_fasta if provided (FASTA), and returns a dict mapping record.id -> list of (offset, start, end, seq).
    """
    primers_by_record = {}
    for rec in SeqIO.parse(fasta_path, "fasta"):
        seq_len = len(rec.seq)
        primers = []
        offset = first_offset
        while True:
            start = seq_len - offset - primer_len  # primer spans [start, start+primer_len)
            end = start + primer_len
            if start < 0:
                break
            primer_seq = rec.seq[start:end]
            if rc:
                primer_seq = primer_seq.reverse_complement()
            primer_seq = str(primer_seq)
            primers.append((offset, int(start), int(end), primer_seq))
            offset += step
        primers_by_record[rec.id] = primers

    # optionally write to fasta
    if out_fasta:
        out_records = []
        for rec_id, plist in primers_by_record.items():
            for offset, start, end, pseq in plist:
                rid = f"{rec_id}_off{offset}"
                out_records.append(SeqRecord(Seq(pseq), id=rid, description=f"start={start};end={end};offset={offset}"))
        out_dir = os.path.dirname(out_fasta)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)
        SeqIO.write(out_records, out_fasta, "fasta")

    return primers_by_record

# Example usage:
pth = "data/promoter_terminator_library/promoter_library.fasta"
out = "data/primer_design/promoters_end_primers.fasta"
primers = create_end_primers(pth, out_fasta=out, primer_len=30, first_offset=20, step=30, rc=False)

# Print a quick summary
for rid, plist in primers.items():
    print(f"{rid}: {len(plist)} primers")
    for offset, start, end, seq in plist:
        print(f"  offset={offset:3d} start={start:5d} end={end:5d} seq={seq}")
    print()

PKG1_promoter: 32 primers
  offset= 20 start=  950 end=  980 seq=GCGATAACCATTCCGTTCCTTCACGATCCA
  offset= 50 start=  920 end=  950 seq=GAGATTGTCTATGGGATTCGGTATAAAAGG
  offset= 80 start=  890 end=  920 seq=AGGCAACCTGATTTGGGATGACACTGATGA
  offset=110 start=  860 end=  890 seq=GATTGGTGGATAGCCCCTAACCTTAGGTAA
  offset=140 start=  830 end=  860 seq=GGACAACGATGAGCAGAAAGGATCGTTGGT
  offset=170 start=  800 end=  830 seq=GAGAGAGAGCGAGGGGATGGTGGTCGTTGT
  offset=200 start=  770 end=  800 seq=AACCTGAGGAGGGGTCGGTTCTGCCATCTT
  offset=230 start=  740 end=  770 seq=TTGTTGGAAGTCAAGGCCGAGAGTATTCGA
  offset=260 start=  710 end=  740 seq=AGCTGGTTCTTCGTTGTCCTTGATGCTGAG
  offset=290 start=  680 end=  710 seq=AGGACTGGGTTGGGAAGGATCCACTTCTGG
  offset=320 start=  650 end=  680 seq=TTCTGTTTCTGTGGATTCAGGCTGGGTCGA
  offset=350 start=  620 end=  650 seq=ATTGACGTATGGCTTTGGTTTACCTCTCTT
  offset=380 start=  590 end=  620 seq=AGCGGCACCTCCAGTTTTGACGCGGTCGGG
  offset=410 start=  560 end=  590 seq=CAAGAGGTTGTATGTAGGTCGACTT