<a href="https://colab.research.google.com/github/Reclone-org/DNA-Scripts/blob/main/Reclone_Syntax_Primer_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧬 Type IIS Primer Design Tool

This notebook generates forward and reverse primers for **Type IIS cloning** (BsaI, BbsI, SapI, BsmBI) using the **Reclone syntax standard**.  
It applies the correct 5′ and 3′ overhangs (including scar bases) and outputs a ready-to-order primer list.

---

## 📂 Input: CSV file
Upload a CSV file with the following column headers:

| Name        | Seq                     | 5' Syntax | 3' Syntax | Enzyme |
|-------------|-------------------------|-----------|-----------|--------|
| MyGene1     | ATGCGTACGATCGATCGT...   | A         | B         | BsaI   |
| MyGene2     | ATGAGTCCGATCGTACGA...   | N1        | C         | BbsI   |
| MyGene3     | ATGCGGCTTACGAGTACG...   | D         | N5        | SapI   |

You can generate a template in Step 1 below.

### Column details:
- **Name** → A short identifier for the DNA part (e.g. `GeneX`, `Promoter1`).  
- **Seq** → The full DNA sequence (≥ 20 bp). Any character other than A, C, G or T (spaces, digits, ambiguity codes such as `N`) is stripped before design.  
- **5' Syntax** → The syntax tag for the **left end** of the part (e.g. `A`, `B`, `N1`, `N2`, `N5`).  
- **3' Syntax** → The syntax tag for the **right end** of the part.  
- **Enzyme** → Type IIS enzyme to use. The value must be exactly one of `BsaI`, `BbsI`, `SapI`, `BsmbI` (matching is case-sensitive).  
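
The pipeline does not check whether a part's own sequence contains a recognition site for its enzyme; an internal site would make the enzyme cut inside the part as well as at the primer-added ends. A minimal pre-check sketch, assuming the site sequences configured in Step 2 (the names `SITES` and `internal_sites` are hypothetical, not part of the notebook), scans both strands:

```python
# Recognition sites as configured in the notebook (top strand, 5'->3').
SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC", "SapI": "GCTCTTC", "BsmbI": "CGTCTC"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def internal_sites(seq: str, enzyme: str) -> list:
    """Return sorted 0-based start positions of the site on either strand."""
    seq = seq.upper()
    site = SITES[enzyme]
    hits = []
    # Use a set so a palindromic motif would only be searched once.
    for motif in {site, reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


print(internal_sites("ATGAGACCTTTGGTCTCAA", "BsaI"))  # [2, 11]
```

Any hit means the part would need to be domesticated (for example, with a silent mutation that removes the site) before it can be used with that enzyme.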

⚠️ **Important**:  
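- Sequences must be at least 20 bp after cleaning, because each primer anneals to the first or last 20 nt of the part.  
- Syntax tags must match the Reclone standard (A–F, N1–N5, or N); they are matched case-insensitively.  
- Some tags carry different scar bases at the 5′ and 3′ ends (for example, `N1` is `CCATg` on a 5′ end and `tCCAT` on a 3′ end; lowercase marks scar bases). The notebook applies the right version automatically.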
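
The checks above can be sketched in plain Python. `check_part` is a hypothetical helper that mirrors the notebook's rules, not a function the notebook exposes: it strips non-ACGT characters, enforces the 20-bp minimum, and accepts tags case-insensitively.

```python
# Syntax tags accepted by the notebook (A-F, N, N1-N5).
ALLOWED_TAGS = {"A", "B", "C", "D", "E", "F", "N", "N1", "N2", "N3", "N4", "N5"}
MIN_LEN = 20  # each primer anneals to the first/last 20 nt of the part


def check_part(seq: str, tag5: str, tag3: str) -> str:
    """Return the cleaned sequence, or raise ValueError if the row is invalid."""
    # Keep only A/C/G/T (either case) and uppercase the result.
    cleaned = "".join(c for c in seq if c in "ACGTacgt").upper()
    if len(cleaned) < MIN_LEN:
        raise ValueError(f"sequence is {len(cleaned)} bp; need >= {MIN_LEN} bp")
    for tag in (tag5, tag3):
        if tag.strip().upper() not in ALLOWED_TAGS:
            raise ValueError(f"unknown syntax tag {tag!r}")
    return cleaned


print(check_part("atg cgt acg atc gat cgt acg atc", "a", "N5"))  # ATGCGTACGATCGATCGTACGATC
```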

---

## ⚙️ What the code does
1. Reads your CSV input.  
2. Looks up the correct **overhangs and scar bases** for each part.  
3. Prepends the **landing pad, restriction site, spacer, and overhang** (in that 5′→3′ order) to each primer's 20-nt binding region.  
4. Generates both **forward** and **reverse** primers for each part.  
5. Exports all primers to a single **CSV file** (`typeIIS_primers.csv`).
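
Step 3 amounts to string concatenation. A minimal sketch for one forward primer, using the notebook's BsaI defaults (landing pad `ATGCG`, site `GGTCTC`, one `N` spacer) and the `A` overhang `GGAG`; the function name `assemble_forward_primer` is hypothetical:

```python
def assemble_forward_primer(landing_pad: str, site: str, spacer_len: int,
                            overhang: str, template: str, bind_len: int = 20) -> str:
    """Build a primer 5'->3': pad + site + spacer + overhang + binding region."""
    spacer = "N" * spacer_len       # filler bases between recognition site and cut
    binding = template[:bind_len]   # anneals to the start of the part
    return landing_pad + site + spacer + overhang + binding


part = "ATGCGTACGATCGATCGTACGATCGTAGCTAGCTAG"  # MyGene1 from the template
primer = assemble_forward_primer("ATGCG", "GGTCTC", 1, "GGAG", part)
print(primer)  # ATGCGGGTCTCNGGAGATGCGTACGATCGATCGTAC
```

The reverse primer is built the same way, except that its binding region is the reverse complement of the last 20 nt of the part.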

---

## 📤 Output
After running the notebook you’ll get:

- **`typeIIS_primers.csv`** → Downloadable file with two columns:  

| Name            | Seq                                              |
|-----------------|--------------------------------------------------|
| MyGene1-A-FWD   | ATGCGGGTCTCNGGAG...                              |
| MyGene2-N1-FWD  | GACTGGAAGACNNCCATg...                            |
| MyGene1-B-REV   | ATGCGGGTCTCNTACT...                              |
| MyGene2-C-REV   | GACTGGAAGACNNggAATG...                           |

- Each DNA part generates **two primers**: one forward and one reverse.  
- Primer names follow the pattern `PartName-SyntaxTag-FWD` (using the part's 5′ tag) and `PartName-SyntaxTag-REV` (using its 3′ tag).  
- All forward primers are listed first, followed by all reverse primers, each in input order.
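
The two-column layout can be mimicked with the standard `csv` module. In this sketch the records are hypothetical and the sequences are truncated, illustrative prefixes; it only demonstrates the naming pattern and the forward-then-reverse ordering.

```python
import csv
import io

# Hypothetical (name, 5' tag, fwd seq, 3' tag, rev seq) records; sequences truncated.
parts = [
    ("MyGene1", "A", "ATGCGGGTCTCNGGAG", "B", "ATGCGGGTCTCNTACT"),
    ("MyGene2", "N1", "GACTGGAAGACNNCCATg", "C", "GACTGGAAGACNNggAATG"),
]

rows = [(f"{n}-{t5}-FWD", f) for n, t5, f, _, _ in parts]   # forward primers first
rows += [(f"{n}-{t3}-REV", r) for n, _, _, t3, r in parts]  # then reverse primers

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Name", "Seq"])
writer.writerows(rows)
print(buf.getvalue())
```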


---


✅ **You can copy-paste the output sequences directly into your primer ordering system.** Lowercase letters are scar bases and `N` marks spacer positions.
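
Before ordering, you can confirm which bases a forward primer will expose after digestion: the enzyme cuts `spacer_len` nt past the end of its site on the primer strand, so the sticky end is the next 4 nt. A minimal sketch (the function `predicted_overhang` is hypothetical), shown with a BsaI forward primer carrying the `N1` 5′ spec `CCATg`:

```python
def predicted_overhang(primer: str, site: str, spacer_len: int,
                       overhang_len: int = 4) -> str:
    """Return the bases expected to form the sticky end after the cut.

    Assumes the site reads 5'->3' on the primer and the enzyme cuts
    `spacer_len` nt downstream of it on this strand (e.g. BsaI: GGTCTC N^NNNN).
    """
    idx = primer.upper().find(site)
    if idx == -1:
        raise ValueError("recognition site not found in primer")
    start = idx + len(site) + spacer_len
    return primer[start:start + overhang_len]


fwd = "ATGCGGGTCTCNCCATgATGCGTACGATCGATCGTAC"  # BsaI forward primer, N1 5' spec
print(predicted_overhang(fwd, "GGTCTC", 1))  # CCAT
```

The lowercase `g` after `CCAT` stays on the fragment as a scar base. For SapI, which leaves 3-nt overhangs, pass `overhang_len=3`.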


## Step 0: Install libraries and packages

In [None]:
# 🧬 Type IIS Primer Designer – Colab Version
!pip install biopython
import pandas as pd
from Bio.Seq import Seq
from google.colab import files
import io



## Step 1 [Optional]: Download a template CSV file

In [None]:
# Generate a sample CSV file for users to download
import pandas as pd

example_data = [
    {"Name": "MyGene1", "Seq": "ATGCGTACGATCGATCGTACGATCGTAGCTAGCTAG", "5' Syntax": "A",  "3' Syntax": "B",  "Enzyme": "BsaI"},
    {"Name": "MyGene2", "Seq": "ATGAGTCCGATCGTACGATCGTACGCTAGCTAGCTA", "5' Syntax": "N1", "3' Syntax": "C",  "Enzyme": "BbsI"},
    {"Name": "MyGene3", "Seq": "ATGCGGCTTACGAGTACGCTAGCTAGCTAACGTCGA", "5' Syntax": "D",  "3' Syntax": "N5", "Enzyme": "SapI"},
]

example_df = pd.DataFrame(example_data)
example_filename = "example_input.csv"
example_df.to_csv(example_filename, index=False)

from google.colab import files
files.download(example_filename)

print(f"📥 Example CSV generated: {example_filename}")

## Step 2: Upload your CSV and run the primer design pipeline
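
This step stops with an error if any required header is missing. A minimal sketch of the same header check using only the standard `csv` module (the function name `missing_columns` is hypothetical) lets you test a file before uploading. Like the notebook, it compares header names exactly, so a file with a single `Syntax` column instead of separate 5′/3′ columns is rejected.

```python
import csv
import io

REQUIRED = {"Name", "Seq", "5' Syntax", "3' Syntax", "Enzyme"}


def missing_columns(csv_text: str) -> set:
    """Return the required headers absent from the first row of csv_text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = set(next(reader, []))  # exact match, no whitespace stripping
    return REQUIRED - header


sample = "Name,Seq,5' Syntax,3' Syntax,Enzyme\nMyGene1,ATGC,A,B,BsaI\n"
print(missing_columns(sample))  # set()
print(missing_columns("Name,Seq,Syntax,Enzyme\n"))
```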

In [None]:
# Colab-ready script: upload a CSV with columns:
# Name, Seq, 5' Syntax, 3' Syntax, Enzyme
import io
import pandas as pd
from google.colab import files
from typing import Dict

# ---------- 1) Upload CSV ----------
print("📤 Please upload your input CSV with columns: Name, Seq, 5' Syntax, 3' Syntax, Enzyme")
uploaded = files.upload()
if not uploaded:
    raise RuntimeError("No file uploaded.")
for fn in uploaded:
    df = pd.read_csv(io.BytesIO(uploaded[fn]))
    break  # first file only

# ---------- 2) Overhang cores & end-specific specs ----------
# Canonical 4-nt overhang cores (reference only; primers are built from RECLONE_END_SPECS below)
RECLONE_CANON: Dict[str, str] = {
    "A":  "GGAG",
    "B":  "TACT",
    "N1": "CCAT",
    "N2": "GTCA",
    "N3": "TCCA",
    "C":  "AATG",
    "D":  "AGGT",
    "N4": "TTCG",
    "N5": "CGGC",
    "E":  "GCTT",
    "F":  "CGCT",
    "N":  "TCCA",  # optional alias
}

# Lowercase = extra scar bases outside the canonical 4-nt overhang.
# These strings represent the final fragment ends after digestion.
RECLONE_END_SPECS: Dict[str, Dict[str, str]] = {
    "A":  {"5": "GGAG",    "3": "GGAG"},
    "B":  {"5": "TACT",    "3": "TACT"},
    "N1": {"5": "CCATg",   "3": "tCCAT"},
    "N2": {"5": "GTCA",    "3": "ggGTCA"},
    "N3": {"5": "TCCAtg",  "3": "TCCA"},
    "C":  {"5": "AATG",    "3": "ggAATG"},
    "D":  {"5": "AGGT",    "3": "ggAGGT"},
    "N4": {"5": "TTCG",    "3": "ggTTCG"},
    "N5": {"5": "CGGC",    "3": "ggCGGC"},
    "E":  {"5": "GCTT",    "3": "GCTT"},
    "F":  {"5": "CGCT",    "3": "CGCT"},
    "N":  {"5": "TCCA",    "3": "TCCA"},
}

# ---------- 3) Type IIS enzyme configs ----------
# landing_pad is a lab preference and can be changed; spacer_len is fixed by
# where each enzyme cuts relative to its site, so leave it as is.
restriction_enzymes = {
    'BsaI':  {'site': 'GGTCTC',  'spacer_len': 1, 'landing_pad': 'ATGCG'},  # GGTCTC N ↓ (4-nt overhang)
    'SapI':  {'site': 'GCTCTTC', 'spacer_len': 1, 'landing_pad': 'GTTAC'},  # GCTCTTC N ↓ (note: SapI leaves a 3-nt overhang)
    'BbsI':  {'site': 'GAAGAC',  'spacer_len': 2, 'landing_pad': 'GACTG'},  # GAAGAC NN ↓ (4-nt overhang)
    'BsmbI': {'site': 'CGTCTC',  'spacer_len': 1, 'landing_pad': 'TGCAG'},  # CGTCTC N ↓ (4-nt overhang)
}

# Primer binding length (nt)
PRIMER_BIND_LEN = 20

# ---------- 4) Utilities ----------
def reverse_complement(seq: str) -> str:
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]

def normalize_token(tok: str) -> str:
    """Case-insensitive map to defined tokens (A–F, N, N1–N5)."""
    t = tok.strip().upper()
    if t not in RECLONE_END_SPECS:
        raise ValueError(f"Unknown syntax token '{tok}'. Allowed: {', '.join(RECLONE_END_SPECS.keys())}")
    return t

def build_fwd_primer(binding_20bp: str, left_token: str, enzyme_cfg: dict) -> str:
    pad   = enzyme_cfg['landing_pad']
    site  = enzyme_cfg['site']
    Ns    = 'N' * enzyme_cfg['spacer_len']
    ovhg5 = RECLONE_END_SPECS[left_token]["5"]   # 5′-end spec for left side
    return pad + site + Ns + ovhg5 + binding_20bp

def build_rev_primer(binding_20bp_rc: str, right_token: str, enzyme_cfg: dict) -> str:
    pad   = enzyme_cfg['landing_pad']
    site  = enzyme_cfg['site']
    Ns    = 'N' * enzyme_cfg['spacer_len']
    ovhg3 = RECLONE_END_SPECS[right_token]["3"]  # 3′-end spec for right side
    # Written 5′→3′ like the forward primer, so the site reads in the same orientation
    # and the enzyme cuts toward the insert. The 3′-end spec is used as stored in RECLONE_END_SPECS.
    return pad + site + Ns + ovhg3 + binding_20bp_rc

# ---------- 5) Row processor ----------
def generate_primers(row):
    name    = str(row['Name']).strip()
    seq_raw = str(row['Seq']).strip()
    left_in = str(row["5' Syntax"]).strip()
    right_in= str(row["3' Syntax"]).strip()
    enzyme  = str(row['Enzyme']).strip()

    # Clean and validate sequence
    sequence = ''.join([c for c in seq_raw if c in 'ACGTacgt']).upper()
    if len(sequence) < PRIMER_BIND_LEN:
        raise ValueError(f"{name}: sequence must be ≥ {PRIMER_BIND_LEN} bp; got {len(sequence)} bp")

    # Enzyme config
    if enzyme not in restriction_enzymes:
        raise ValueError(f"{name}: unsupported enzyme '{enzyme}'. "
                         f"Choose from: {', '.join(restriction_enzymes.keys())}")
    enz = restriction_enzymes[enzyme]

    # Normalize tokens
    left_tok  = normalize_token(left_in)
    right_tok = normalize_token(right_in)

    # 20-nt binding regions (uppercase); overhang specs keep their table casing (lowercase = scar).
    fwd_bind_20        = sequence[:PRIMER_BIND_LEN]
    rev_bind_20_rc     = reverse_complement(sequence[-PRIMER_BIND_LEN:])

    fwd_name = f"{name}-{left_tok}-FWD"
    rev_name = f"{name}-{right_tok}-REV"

    fwd_seq = build_fwd_primer(fwd_bind_20, left_tok, enz)
    rev_seq = build_rev_primer(rev_bind_20_rc, right_tok, enz)

    return pd.Series([fwd_name, fwd_seq, rev_name, rev_seq])

# ---------- 6) Run + export ----------
required = {"Name", "Seq", "5' Syntax", "3' Syntax", "Enzyme"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")

primers = df.apply(generate_primers, axis=1)
primers.columns = ['Name1', 'Seq1', 'Name2', 'Seq2']

final_df = pd.DataFrame({
    'Name': primers['Name1'].tolist() + primers['Name2'].tolist(),
    'Seq':  primers['Seq1'].tolist() + primers['Seq2'].tolist(),
})

out_fn = 'typeIIS_primers.csv'
final_df.to_csv(out_fn, index=False)
files.download(out_fn)

print("✅ Primer design complete. Output saved as", out_fn)

