I used Python with Biopython in Google Colab to analyse GenBank files of CRISPR plasmids designed for a project on BEM1 upregulation in *Saccharomyces cerevisiae* using CRISPR activation (CRISPRa) to model cancer therapeutics. This analysis allowed me to:
- Extract and visualise key features such as sgRNA guide sequences, PAM sites, promoters, dCas9-VP64 coding regions, and terminators.
- Validate the correct positioning of essential plasmid components, including sgRNA scaffolds, regulatory elements, and selectable markers, ensuring dCas9 functionality and efficient sgRNA expression.
- Ensure the plasmid sequences align with the experimental requirements for targeting BEM1, with high specificity and minimal off-target effects.
The outputs from Python were cross-referenced with Benchling annotations to confirm the plasmids' integrity and suitability for CRISPRa-mediated gene activation experiments in yeast.
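
As a rough illustration of the specificity point above, the sketch below counts exact occurrences of a guide on both strands of a target sequence. It is not the analysis run in this notebook: the function names `reverse_complement` and `count_exact_sites` and the toy sequences are hypothetical, and exact matching ignores the mismatch tolerance that a real off-target search would have to consider.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_exact_sites(target: str, guide: str) -> int:
    """Count exact, possibly overlapping, matches of the guide on both strands."""
    target = target.upper()
    total = 0
    # A set avoids double counting when the guide equals its reverse complement.
    for query in {guide.upper(), reverse_complement(guide)}:
        start = target.find(query)
        while start != -1:
            total += 1
            start = target.find(query, start + 1)
    return total


# Toy data only: one forward-strand site and one reverse-strand site.
toy_target = "CCGATTACACCGGTGTAATCAA"
toy_guide = "GATTACA"
print(count_exact_sites(toy_target, toy_guide))
```

A count above one for a real 20-nt guide would suggest the guide is worth re-examining, but a count of exactly one does not by itself establish specificity.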

In [1]:
from google.colab import files

# Upload your file
uploaded = files.upload()

Saving crispra-in-saccharomyces-cerevisiae-for-gene-bem1.gb to crispra-in-saccharomyces-cerevisiae-for-gene-bem1.gb


In [2]:
file_path = "/content/crispra-in-saccharomyces-cerevisiae-for-gene-bem1.gb"

In [3]:
# Install Biopython
!pip install biopython

# Import necessary libraries
from Bio import SeqIO

# Load the GenBank file
plasmid_record = SeqIO.read(file_path, "genbank")

# General information
print(f"Plasmid ID: {plasmid_record.id}")
print(f"Sequence Length: {len(plasmid_record.seq)} bp")
print(f"Description: {plasmid_record.description}")

# Extract and display features
print("\nFeatures:")
for feature in plasmid_record.features:
    print(f"Type: {feature.type}, Location: {feature.location}, Strand: {feature.location.strand}")
    if 'note' in feature.qualifiers:
        print("  Notes:", feature.qualifiers['note'])
    if 'gene' in feature.qualifiers:
        print("  Gene:", feature.qualifiers['gene'])
    if 'product' in feature.qualifiers:
        print("  Product:", feature.qualifiers['product'])

Collecting biopython
  Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading biopython-1.84-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Installing collected packages: biopython
Successfully installed biopython-1.84
Plasmid ID: CRISPRa_in_Saccharomyc
Sequence Length: 11470 bp
Description: 

Features:
Type: primer, Location: [0:21](+), Strand: 1
  Notes: ['sequence: TGGCTAGGTCTCCTTctacaggaattcgaagtacgg']
Type: source, Location: [0:10269](+), Strand: 1
Type: UAS, Location: [18:136](+), Strand: 1
  Notes: ['upstream activating sequence mediating Gal4-dependent induction']
Type: TPGI promoter, Location: [18:460](+), Strand: 1
  Notes: ['S. cerevisiae GAL1 promoter modified to contain two copies of the tet operator (Ellis et al., 2009)']
Type: tet operator, Location: [343:362](+), Strand
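
Biopython prints locations as 0-based, end-exclusive intervals (Python slice style), whereas GenBank flat files and most sequence editors display 1-based, inclusive coordinates. When cross-referencing the printed output with annotations elsewhere, a small conversion helper can avoid off-by-one confusion. The following is a minimal sketch; `parse_location` and `to_one_based` are hypothetical helper names, and the regular expression only handles simple single-span locations like the ones printed above.

```python
import re

# Matches simple locations as printed by Biopython, e.g. "[18:460](+)".
LOCATION_RE = re.compile(r"\[(\d+):(\d+)\]\(([+-])\)")


def parse_location(text):
    """Parse '[start:end](strand)' into (start, end, strand) with strand as +1 or -1."""
    match = LOCATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"Unsupported location format: {text!r}")
    start, end, strand = match.groups()
    return int(start), int(end), 1 if strand == "+" else -1


def to_one_based(start, end):
    """Convert a 0-based, end-exclusive span to 1-based, inclusive coordinates."""
    return start + 1, end


start, end, strand = parse_location("[18:460](+)")
print(to_one_based(start, end))  # 1-based, inclusive coordinates
print(end - start)               # feature length in bp
```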



| Feature | Location | Correct Positioning? |
| --- | --- | --- |
| sgRNA Guide | [11364:11384] | ✅ |
| PAM Site | [11384:11387] | ✅ |
| Chimeric Guide | [11387:11470] | ✅ |
| TPGI Promoter | [18:460] | ✅ |
| UAS | [18:136] | ✅ |
| LEU2 | [10269:11364] | ✅ |
| CYC1 Terminator | [4877:5067] | ✅ |
| AmpR | [6691:6711] | ✅ |
| ColE1 Origin | [5309:5898] | ✅ |
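
The positional checks summarised above can also be expressed as code. This is a sketch under stated assumptions rather than the procedure actually used: the coordinates are copied by hand from the table, the helper names `span_length` and `is_adjacent` are hypothetical, and the expected sizes (a 20-nt guide and a 3-nt PAM) assume the usual SpCas9-style design.

```python
# Coordinates copied from the table above (0-based, end-exclusive, as printed by Biopython).
SEQUENCE_LENGTH = 11470  # bp, as reported by len(plasmid_record.seq)

FEATURES = {
    "sgRNA Guide": (11364, 11384),
    "PAM Site": (11384, 11387),
    "Chimeric Guide": (11387, 11470),
    "TPGI Promoter": (18, 460),
    "UAS": (18, 136),
    "LEU2": (10269, 11364),
    "CYC1 Terminator": (4877, 5067),
    "AmpR": (6691, 6711),
    "ColE1 Origin": (5309, 5898),
}


def span_length(name):
    """Length of a feature in bp."""
    start, end = FEATURES[name]
    return end - start


def is_adjacent(upstream, downstream):
    """True if the downstream feature starts exactly where the upstream one ends."""
    return FEATURES[upstream][1] == FEATURES[downstream][0]


promoter_start, promoter_end = FEATURES["TPGI Promoter"]
uas_start, uas_end = FEATURES["UAS"]

checks = {
    "guide is 20 nt (assumed design)": span_length("sgRNA Guide") == 20,
    "PAM is 3 nt (assumed design)": span_length("PAM Site") == 3,
    "guide is followed directly by PAM": is_adjacent("sgRNA Guide", "PAM Site"),
    "PAM is followed directly by scaffold": is_adjacent("PAM Site", "Chimeric Guide"),
    "UAS lies inside the promoter": promoter_start <= uas_start and uas_end <= promoter_end,
    "all features within sequence": all(
        0 <= start < end <= SEQUENCE_LENGTH for start, end in FEATURES.values()
    ),
}

for name in FEATURES:
    print(f"{name}: {span_length(name)} bp")
for description, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'}: {description}")
```

Printing the lengths alongside the checks makes unusually short annotations easy to spot. For example, the AmpR span listed here is only 20 bp, far shorter than a full-length coding sequence, so that entry may be worth re-checking against the Benchling annotation.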