# Universal Plasmid Maker — Example Run

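This notebook walks through a complete example run of the **Universal Plasmid Maker** pipeline.

It shows how two inputs:
- an input sequence in FASTA format (`Input.Fa`; in this run, `data/input/pUC19.fa`)
- a design specification (`Design.txt`; in this run, `data/input/Design_pUC19.txt`)

are combined to generate a fully assembled plasmid sequence, written out as both FASTA and GenBank.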


In [1]:
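import os
from Bio import SeqIO

# Absolute path to the local project checkout; adjust this for your own machine.
PROJECT_ROOT = "/home/suchir/Documents/Suchir/Programming/CollegeProjects/BBL434 Lab/Assignment1/Universal-Plasmid-Maker"

# All relative paths below (data/..., outputs/...) are resolved from the project root.
os.chdir(PROJECT_ROOT)

print("Working directory set to:", os.getcwd())

# Project imports, run once the working directory is set.
from plasmid_builder.pipeline import run_pipeline
from plasmid_builder.ori_finder import predict_ori, predict_ori_candidates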

Working directory set to: /home/suchir/Documents/Suchir/Programming/CollegeProjects/BBL434 Lab/Assignment1/Universal-Plasmid-Maker


In [2]:
host_fasta = "data/input/pUC19.fa"
design_file = "data/input/Design_pUC19.txt"

output_fasta = "outputs/pUC19_constructed.fa"
output_genbank = "outputs/pUC19_constructed.gb"

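# Only existence is checked here; the files themselves are read in later cells.
assert os.path.exists(host_fasta), f"Missing input FASTA: {host_fasta}"
assert os.path.exists(design_file), f"Missing design file: {design_file}"

print("Input files loaded")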


Input files loaded


In [3]:
host_record = SeqIO.read(host_fasta, "fasta")
host_seq = str(host_record.seq)

print("Host sequence length:", len(host_seq))


Host sequence length: 2686


In [4]:
ori = predict_ori(host_seq)

print("Predicted ORI:")
print("Start:", ori["ori_start"])
print("End:", ori["ori_end"])
print("Length:", ori["ori_end"] - ori["ori_start"])
print("Method:", ori["method"])


Predicted ORI:
Start: 500
End: 800
Length: 300
Method: GC skew + k-mer enrichment
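
The reported method combines GC skew with k-mer enrichment. To illustrate the GC skew half, here is a minimal sketch that computes windowed skew, (G - C) / (G + C), and finds where its running sum is lowest, a position that tends to lie near the replication origin in bacterial chromosomes. The function names, window size, and step are assumptions made for this example; they are not taken from `plasmid_builder.ori_finder`, which may work differently.

```python
def gc_skew(seq, window=100, step=50):
    """Return (start, skew) pairs, with skew = (G - C) / (G + C) per window."""
    seq = seq.upper()
    result = []
    # max(..., 1) guarantees one window even when seq is shorter than `window`.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew_minimum(seq, window=100, step=50):
    """Return the start of the window where the cumulative GC skew is lowest."""
    total = 0.0
    best_start, best_total = 0, float("inf")
    for start, skew in gc_skew(seq, window, step):
        total += skew
        if total < best_total:  # strict '<' keeps the first minimum on ties
            best_start, best_total = start, total
    return best_start


# Toy sequence: a C-rich half followed by a G-rich half.
toy = "C" * 200 + "G" * 200
print(cumulative_skew_minimum(toy))
```

On a short plasmid such as pUC19 the skew signal is weak, so a sketch like this is better suited to ranking candidate regions than to calling a single origin.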


In [5]:
candidates = predict_ori_candidates(host_seq, top_n=5)

for i, cand in enumerate(candidates, 1):
    print(f"\nCandidate {i}")
    print(f"Position: {cand['start']}–{cand['end']}")
    print("Sequence (first 60 bp):", cand["sequence"][:60])
    if cand.get("enriched_kmers"):
        print("Top enriched k-mers:")
        for kmer, score in cand["enriched_kmers"]:
            print(f"  {kmer}: {score:.2f}")



Candidate 1
Position: 500–800
Sequence (first 60 bp): TGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGCCCCGACACCCG
Top enriched k-mers:
  TGGTGCAC: 9.14
  GGTGCACT: 9.14
  GTGCACTC: 9.14
  TGCACTCT: 9.14
  GCACTCTC: 9.14
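
All five k-mers above share the same score and are consecutive one-base shifts of the start of the candidate. One way to read such scores is as a frequency ratio: how often a k-mer occurs in the candidate window relative to the whole sequence. The sketch below implements that ratio; the definition and the function names are assumptions for illustration, not the scoring used by `predict_ori_candidates`.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enriched_kmers(seq, start, end, k=8, top_n=5):
    """Rank k-mers in seq[start:end] by window frequency / whole-sequence frequency."""
    window = kmer_counts(seq[start:end], k)
    background = kmer_counts(seq, k)
    window_total = sum(window.values())
    background_total = sum(background.values())
    scores = {}
    for kmer, count in window.items():
        # Every window k-mer also occurs in the full sequence, so background[kmer] >= 1.
        window_freq = count / window_total
        background_freq = background[kmer] / background_total
        scores[kmer] = window_freq / background_freq
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


for kmer, score in enriched_kmers("AAAACCCCCCCC", 0, 4, k=2):
    print(f"{kmer}: {score:.2f}")
```

Under this assumed definition, an 8-mer that occurs once in a 300 bp window (293 8-mers) and once in the 2686 bp host (2679 8-mers) scores 2679 / 293 ≈ 9.14, the value printed above. If the package scores k-mers in a similar way, the listed k-mers may simply be unique ones rather than genuinely over-represented motifs.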


In [6]:
run_pipeline(
    input_fasta=host_fasta,
    design_file=design_file,
    output_fasta=output_fasta,
    output_genbank=output_genbank
)

print("Pipeline completed successfully. Outputs written to:", output_fasta, output_genbank)


Pipeline completed successfully. Outputs written to: outputs/pUC19_constructed.fa outputs/pUC19_constructed.gb




In [7]:
plasmid_fasta = SeqIO.read(output_fasta, "fasta")

print("Final plasmid length:", len(plasmid_fasta.seq))
print("FASTA header:", plasmid_fasta.id)


Final plasmid length: 9106
FASTA header: Assembled_Plasmid


In [8]:
plasmid_gb = SeqIO.read(output_genbank, "genbank")

print("Annotated features:")
for feat in plasmid_gb.features:
    print("-", feat.type, feat.qualifiers.get("note", ""))


Annotated features:
- CDS 
- CDS 
- CDS 
- rep_origin ['Predicted Origin of Replication']
- CDS ['antibiotic resistance marker']
- CDS ['antibiotic resistance marker']
- CDS ['antibiotic resistance marker']
- misc_feature ['Multiple Cloning Site']
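
The feature list confirms that a Multiple Cloning Site was annotated, but not that a given recognition site is actually present in the sequence. Because a plasmid is circular, a site can span the junction between the end and the start of the FASTA record. The helper below is a hypothetical sketch (it is not part of `plasmid_builder`) that searches one strand of a circular sequence, using the BamHI site `GGATCC` as the example.

```python
def find_sites_circular(seq, site):
    """Return 0-based start positions of `site` in a circular sequence (one strand)."""
    seq, site = seq.upper(), site.upper()
    if not site or len(site) > len(seq):
        return []
    # Append the first len(site) - 1 bases so matches spanning the junction are found.
    extended = seq + seq[:len(site) - 1]
    positions = []
    pos = extended.find(site)
    while pos != -1:
        positions.append(pos)
        pos = extended.find(site, pos + 1)
    return positions


# The site starts near the end of the record and wraps around to the start.
print(find_sites_circular("TCCAAAGGA", "GGATCC"))
```

In the notebook this could be called as `find_sites_circular(str(plasmid_fasta.seq), "GGATCC")`. Non-palindromic sites would also need a search on the reverse complement, which this sketch omits.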


In [9]:
from plasmid_builder.io import parse_design

mcs_list, antibiotic_markers = parse_design(design_file)

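# Each MCS entry is a (label, enzyme name or raw recognition sequence) tuple.
print("=== Restriction Enzymes / MCS Design ===")
for mcs in mcs_list:
    print(mcs)

# Each marker entry is a (sequence, label) tuple; note the order differs from the MCS entries.
print("\n=== Selection Markers Added ===")
for marker in antibiotic_markers:
    print(marker)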

=== Restriction Enzymes / MCS Design ===
('Multiple_Cloning_Site1', 'BamHI')
('Multiple_Cloning_Site2', 'GGATCC')
('Multiple_Cloning_Site3', 'KpnI')
('Multiple_Cloning_Site4', 'SmaI')
('Multiple_Cloning_Site5', 'XbaI')
('Multiple_Cloning_Site6', 'SAL1')

=== Selection Markers Added ===
('ATGCGTACGTAGCTAGCTAGCTAGCTAGCTAGCTAG', 'Ampicillin_Resistance')
('ATGCGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA', 'Chloramphenicol_Resistance')
('TCAGCTATGACCATGATTACG', 'Blue_White_Screening')
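
The MCS entries above mix enzyme names (`BamHI`, `KpnI`), a raw recognition sequence (`GGATCC`), and a nonstandard spelling (`SAL1`). The sketch below shows one way such entries could be normalized to recognition sequences. The lookup table holds the standard sites for these five enzymes; the `resolve_site` helper and the alias treating `SAL1` as SalI are assumptions for illustration, and the pipeline may resolve names differently.

```python
# Standard recognition sequences (5'->3') for the enzymes named in the design.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "SmaI": "CCCGGG",
    "XbaI": "TCTAGA",
    "SalI": "GTCGAC",
}

# Assumption: 'SAL1' in the design file is a misspelling of SalI.
ALIASES = {"SAL1": "SalI"}


def resolve_site(entry):
    """Return the recognition sequence for an enzyme name or a raw DNA string."""
    token = entry.strip()
    # A token made only of A/C/G/T is treated as a literal recognition sequence.
    if token and set(token.upper()) <= set("ACGT"):
        return token.upper()
    canonical = ALIASES.get(token.upper(), token)
    for name, site in RECOGNITION_SITES.items():
        if name.upper() == canonical.upper():
            return site
    raise KeyError(f"Unknown enzyme or sequence: {entry!r}")


for entry in ["BamHI", "GGATCC", "KpnI", "SmaI", "XbaI", "SAL1"]:
    print(entry, "->", resolve_site(entry))
```

Resolved this way, `BamHI` and the raw `GGATCC` entry map to the same site, so the design as printed would request that site twice.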
