# Primer design

Primer design for BsaI, whether for Golden Gate cloning or other PCR applications, involves several key considerations. 
Primers should be 18-24 nucleotides long, 
have a GC content of 40-60%, 
and end in G or C at the 3' end (a "GC clamp") to enhance binding. 

Avoiding secondary structures and primer dimers is also crucial. Primers for a specific application carry extra requirements: for Golden Gate cloning with BsaI, each primer must also include the BsaI recognition sequence (GGTCTC) in its 5' tail. 
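As a minimal sketch of how a BsaI tail could be added to an annealing region: BsaI cuts one nucleotide downstream of GGTCTC on the top strand and leaves a 4-nt overhang, so the tail is laid out as flank + site + 1-nt spacer + overhang. The function names, the `TTT` flank, the `A` spacer, and the example sequences are illustrative assumptions, not part of the notebook.

```python
BSAI_SITE = "GGTCTC"

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[nt] for nt in reversed(seq.upper()))

def add_bsai_tail(annealing, overhang, flank="TTT", spacer="A"):
    """Return 5'-flank + BsaI site + spacer + overhang + annealing-3'.

    flank and spacer defaults are hypothetical placeholders.
    """
    if len(overhang) != 4:
        raise ValueError("BsaI leaves a 4-nt overhang")
    if len(spacer) != 1:
        raise ValueError("BsaI cuts 1 nt downstream of its site on the top strand")
    return flank + BSAI_SITE + spacer + overhang.upper() + annealing.upper()

def count_bsai_sites(seq):
    """Count BsaI sites in either orientation on the given strand."""
    seq = seq.upper()
    return seq.count(BSAI_SITE) + seq.count(reverse_complement(BSAI_SITE))

# Hypothetical annealing region and overhang, for illustration only
primer = add_bsai_tail("ATGGCTAGCAAAGGAGAAG", "AATG")
print(primer)
print("BsaI sites in primer:", count_bsai_sites(primer))
```

A site count above one would mean the annealing region or overhang introduces an extra BsaI site, which would be cut during assembly.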

#### Primer Design Guidelines:
Length: 18-24 nucleotides.   
GC Content: Aim for 40-60%, with 50-55% being ideal.   
Tm (Melting Temperature): Strive for a Tm between 50 and 65°C.   
GC Clamp: Consider ending the 3' end of the primer with a G or C.   
Avoid Repeats: Minimize long runs of Gs or Cs.   
Secondary Structure: Design primers to avoid regions of high secondary structure in the target sequence.   
Primer Dimers: Ensure primers do not form dimers, which can interfere with PCR.   
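The sequence-level guidelines above can be checked with a few lines of Python. This is a rough sketch: the function name, the example primer, and the run limit of 4 are assumptions, and the Tm uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate. Secondary structure and dimer checks need a thermodynamic tool and are not covered here.

```python
import itertools

def check_primer(seq, min_len=18, max_len=24, gc_range=(0.40, 0.60),
                 tm_range=(50, 65), max_run=4):
    """Check a primer against the basic guidelines; returns a dict of booleans."""
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    at_count = seq.count("A") + seq.count("T")
    gc = gc_count / len(seq)
    tm = 2 * at_count + 4 * gc_count  # Wallace rule, rough estimate in °C
    longest_run = max(len(list(g)) for _, g in itertools.groupby(seq))
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
        "tm_ok": tm_range[0] <= tm <= tm_range[1],
        "gc_clamp": seq[-1] in "GC",       # 3' end is G or C
        "runs_ok": longest_run <= max_run,  # no long homopolymer runs
    }

# Hypothetical 20-nt primer, for illustration only
report = check_primer("ATGGCTAGCAAAGGAGAAGC")
for name, ok in report.items():
    print(f"{name}: {ok}")
```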


CGCCGCGGCCGC-GGTCTC-C-NNNNNNNNNNN-C-GGTCTC-CGCGGGGCGGCG

In [3]:
! pip install pyswarms

In [2]:
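# Designing primers for BsaI
from oligopoolio import *


bsa1_site = 'GGTCTC'  # BsaI recognition sequence

primer_min, primer_max = 18, 24  # primer length range (nt)
gc_optimal = 0.5  # target GC fraction
tm_optimal = 60  # target Tm in °C (the objective function penalizes deviation from 60)
structure_dg = 0 # aim for greater than 0

# Probably best to use something like pyswarms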


In [5]:
import pyswarms as ps
import numpy as np
import itertools
from primer3 import calc_hairpin, calc_homodimer, calc_tm

# Map nucleotides to integers so a primer can be a point in the search space
nt_to_index = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
index_to_nt = {v: k for k, v in nt_to_index.items()}

def encode_primer(seq):
    return np.array([nt_to_index[nt] for nt in seq])

def decode_primer(arr):
    # Round continuous particle positions to the nearest nucleotide index
    rounded = np.clip(np.round(arr), 0, 3).astype(int)
    return ''.join(index_to_nt[i] for i in rounded)

def gc_content(seq):
    return (seq.count('G') + seq.count('C')) / len(seq)

def melting_temp(seq):
    # Basic Wallace rule (rough estimate; the objective uses primer3's calc_tm)
    return 2 * (seq.count('A') + seq.count('T')) + 4 * (seq.count('G') + seq.count('C'))

def objective_function(x):
    # x has shape (n_particles, primer_len); return one cost per particle
    penalties = []
    for row in x:
        seq = decode_primer(row)
        gc = gc_content(seq)

        # check_secondary_structure is assumed to come from the earlier
        # `from oligopoolio import *`. These dG values are computed but not
        # yet included in the cost.
        results = check_secondary_structure(seq)
        homodimer_dg = results['homodimer']['homodimer_dg']
        hairpin_dg = results['hairpin']['hairpin_dg']
        
        # Penalize if Tm deviates from 60
        primer_tm = calc_tm(seq)
        tm_penalty = (primer_tm - 60)**2
        
        # Penalize if GC content is far from 0.5
        gc_penalty = (gc - 0.5)**2
        
        # Example: simple penalty for homopolymer runs
        max_homopolymer = max(len(list(g)) for _, g in itertools.groupby(seq))
        homo_penalty = 10 * max(0, max_homopolymer - 4)
        
        penalties.append(tm_penalty + gc_penalty + homo_penalty)
    return np.array(penalties)

In [6]:
import pyswarms as ps

primer_len = 20
options = {'c1': 1.5, 'c2': 1.5, 'w': 0.7}
bounds = (np.zeros(primer_len), np.ones(primer_len) * 3)

optimizer = ps.single.GlobalBestPSO(
    n_particles=50,
    dimensions=primer_len,
    options=options,
    bounds=bounds
)

best_cost, best_pos = optimizer.optimize(objective_function, iters=200)
print("Best primer:", decode_primer(best_pos))
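To see which term drives the optimizer, it can help to break the cost of a single primer into its parts. The sketch below mirrors the three penalties in the objective function but swaps in the Wallace rule for the Tm so that it runs without primer3. The function name and the example sequence are assumptions for illustration.

```python
import itertools

def cost_terms(seq, tm_target=60.0, gc_target=0.5, max_run=4):
    """Return the individual penalty terms for one primer (Wallace-rule Tm)."""
    gc_count = seq.count("G") + seq.count("C")
    at_count = seq.count("A") + seq.count("T")
    gc = gc_count / len(seq)
    tm = 2 * at_count + 4 * gc_count  # rough Tm in °C
    longest_run = max(len(list(g)) for _, g in itertools.groupby(seq))
    return {
        "tm": (tm - tm_target) ** 2,
        "gc": (gc - gc_target) ** 2,
        "homopolymer": 10 * max(0, longest_run - max_run),
    }

# Hypothetical primer with long runs, for illustration only
terms = cost_terms("GGGGGGCCCCCCAAAATTTT")
for name, value in terms.items():
    print(f"{name}: {value:.4f}")
```

Because the GC penalty is a squared fraction, it can never exceed 0.25, while the Tm penalty is in squared degrees. If GC content should influence the search as much as Tm does, the GC term would need a larger weight.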