I used Python with Biopython in Google Colab to analyse the GenBank files of the CRISPR plasmids designed for Project 1: BRCA2 and PIK3CA sgRNA Validation for CRISPR Knockout. This analysis allowed me to:

Extract and visualise key features such as promoters, sgRNA guide sequences, Cas9 coding regions, and replication origins.

Validate the correct positioning of essential plasmid components, such as sgRNA scaffolds, PAM sites, and NLS sequences, for Cas9 functionality.

Ensure that the plasmid sequences align with the experimental requirements, focusing on the knockout of target genes like PIK3CA and BRCA2.

The outputs from Python were cross-referenced with Benchling annotations to confirm the plasmids' integrity and suitability for CRISPR-mediated gene editing.

In [18]:
from google.colab import files

# Upload your file
uploaded = files.upload()



Saving grch38_pik3ca_crispr_ko_-px4850_backbone (4).gb to grch38_pik3ca_crispr_ko_-px4850_backbone (4).gb


In [19]:
file_path = "/content/grch38_pik3ca_crispr_ko_-px4850_backbone (4).gb"


In [20]:
# Install Biopython
!pip install biopython

# Import necessary libraries
from Bio import SeqIO

# Load the GenBank file
plasmid_record = SeqIO.read(file_path, "genbank")

# General information
print(f"Plasmid ID: {plasmid_record.id}")
print(f"Sequence Length: {len(plasmid_record.seq)} bp")
print(f"Description: {plasmid_record.description}")

# Extract and display features
print("\nFeatures:")
for feature in plasmid_record.features:
    print(f"Type: {feature.type}, Location: {feature.location}, Strand: {feature.location.strand}")
    if 'note' in feature.qualifiers:
        print("  Notes:", feature.qualifiers['note'])
    if 'gene' in feature.qualifiers:
        print("  Gene:", feature.qualifiers['gene'])
    if 'product' in feature.qualifiers:
        print("  Product:", feature.qualifiers['product'])


Plasmid ID: GRCh38_PIK3CA_CRISPR_KO
Sequence Length: 9292 bp
Description: 

Features:
Type: promoter, Location: [0:249](+), Strand: 1
Type: primer, Location: [245:270](+), Strand: 1
  Notes: ['sequence: CACCGACCCGATGCGGTTAGAGCCG']
Type: primer, Location: [249:274](-), Strand: -1
  Notes: ['sequence: aaacCGGCTCTAACCGCATCGGGTC']
Type: misc_feature, Location: [250:270](+), Strand: 1
Type: misc_feature, Location: [270:274](+), Strand: 1
Type: Chimeric Gui..., Location: [270:346](+), Strand: 1
Type: U6 Terminator, Location: [346:352](+), Strand: 1
Type: CBh, Location: [442:1241](+), Strand: 1
Type: 3X FLAG, Location: [1253:1322](+), Strand: 1
Type: NLS, Location: [1322:1373](+), Strand: 1
Type: hSpCsn1, Location: [1373:5474](+), Strand: 1
Type: NLS, Location: [5474:5522](+), Strand: 1
Type: EcoRI, Location: [5522:5528](+), Strand: 1
Type: T2A, Location: [5528:5591](+), Strand: 1
Type: GFP, Location: [5591:6305](+), Strand: 1
Type: EcoRI, Location: [6305:6311](+), Strand: 1
Type: bGH polyA, 
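
The two primer features in the listing above carry the oligo pair for the guide insert, so their sequences can be checked against each other. This is a minimal sketch, assuming the top oligo is `CACC` + `G` + a 20-nt guide and the bottom oligo is `aaac` + the reverse complement of the guide + `C`. That layout is inferred from the sequences in the notes, not stated in the GenBank file, and the helper names `reverse_complement` and `check_oligo_pair` are illustrative.

```python
# Complement table for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_oligo_pair(top: str, bottom: str, guide_len: int = 20) -> str:
    """Return the guide if both oligos describe the same insert.

    Assumed layout (inferred from the primer notes, not stated in the file):
      top    = CACC + G + guide
      bottom = AAAC + reverse_complement(guide) + C
    """
    top, bottom = top.upper(), bottom.upper()
    if not top.startswith("CACCG"):
        raise ValueError("top oligo does not start with CACCG")
    if not (bottom.startswith("AAAC") and bottom.endswith("C")):
        raise ValueError("bottom oligo does not match AAAC...C")
    guide = top[5:]
    if len(guide) != guide_len:
        raise ValueError(f"guide is {len(guide)} nt, expected {guide_len}")
    if bottom[4:-1] != reverse_complement(guide):
        raise ValueError("bottom oligo is not the reverse complement of the guide")
    return guide


# Oligo sequences copied from the 'note' qualifiers printed above.
pik3ca_guide = check_oligo_pair(
    "CACCGACCCGATGCGGTTAGAGCCG",
    "aaacCGGCTCTAACCGCATCGGGTC",
)
print(pik3ca_guide)  # ACCCGATGCGGTTAGAGCCG
```

The returned 20-nt guide matches the span of the [250:270] feature, which starts five bases after the start of the [245:270] forward primer. The check only confirms that the two oligos agree with each other; it says nothing about whether the guide matches the intended genomic target.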


Feature	Location	Correct Positioning?
Promoter	[0:249]	✅
sgRNA Guide	[250:270]	✅
PAM Site	[270:274]	✅
Chimeric Guide	[270:346]	✅
U6 Terminator	[346:352]	✅
CBh Promoter	[442:1241]	✅
Cas9 Coding Sequence	[1373:5474]	✅
NLS	[1322:1373] & [5474:5522]	✅
3X FLAG	[1253:1322]	✅
T2A Sequence	[5528:5591]	✅
GFP	[5591:6305]	✅
bGH polyA	[6314:6546]	✅
Replication Origins	[6786:7093], [8619:9287]	✅
Ampicillin Resistance	[7611:8469]	✅
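
The positioning checks in the table can also be expressed as code. The sketch below uses the coordinates printed in the feature listing for the Cas9 expression cassette and tests two things: that each element's length is a multiple of 3, and that each element starts exactly where the previous one ends. The function name `check_cassette` is illustrative, and the coordinates are typed in by hand here; in the notebook they could instead be read from `feature.location.start` and `feature.location.end`.

```python
# (name, start, end) with 0-based, end-exclusive coordinates,
# copied from the PIK3CA feature listing above.
cas9_cassette = [
    ("3X FLAG", 1253, 1322),
    ("NLS", 1322, 1373),
    ("hSpCsn1", 1373, 5474),
    ("NLS", 5474, 5522),
    ("EcoRI", 5522, 5528),
    ("T2A", 5528, 5591),
    ("GFP", 5591, 6305),
]


def check_cassette(features):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Each element of a fused reading frame should span whole codons.
    for name, start, end in features:
        if (end - start) % 3 != 0:
            problems.append(f"{name}: length {end - start} is not a multiple of 3")
    # Neighbouring elements should touch with no gap and no overlap.
    for (prev_name, _, prev_end), (name, start, _) in zip(features, features[1:]):
        if start != prev_end:
            problems.append(f"{prev_name} -> {name}: offset of {start - prev_end} bp")
    return problems


problems = check_cassette(cas9_cassette)
print(problems or "cassette is contiguous and in frame")
```

This only checks coordinates. It does not translate the sequence, so it would not catch a premature stop codon or a point mutation inside an element.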

In [13]:
from google.colab import files

# Upload your file
uploaded = files.upload()


Saving brca2_region_grch37_ass.gb to brca2_region_grch37_ass.gb


In [16]:
file_path = "/content/brca2_region_grch37_ass.gb"

In [17]:
# Install Biopython
!pip install biopython

# Import necessary libraries
from Bio import SeqIO

# Load the GenBank file
plasmid_record = SeqIO.read(file_path, "genbank")

# General information
print(f"Plasmid ID: {plasmid_record.id}")
print(f"Sequence Length: {len(plasmid_record.seq)} bp")
print(f"Description: {plasmid_record.description}")

# Extract and display features
print("\nFeatures:")
for feature in plasmid_record.features:
    print(f"Type: {feature.type}, Location: {feature.location}, Strand: {feature.location.strand}")
    if 'note' in feature.qualifiers:
        print("  Notes:", feature.qualifiers['note'])
    if 'gene' in feature.qualifiers:
        print("  Gene:", feature.qualifiers['gene'])
    if 'product' in feature.qualifiers:
        print("  Product:", feature.qualifiers['product'])

Plasmid ID: BRCA2_Region_GRCh37_Ass
Sequence Length: 8509 bp
Description: 

Features:
Type: promoter, Location: [0:249](+), Strand: 1
Type: primer, Location: [245:270](+), Strand: 1
  Notes: ['sequence: CACCGCGTTTTGCCCGATTCCGTAT']
Type: primer, Location: [249:274](-), Strand: -1
  Notes: ['sequence: aaacATACGGAATCGGGCAAAACGC']
Type: sgRNA Guide, Location: [250:270](+), Strand: 1
Type: PAM Site, Location: [270:274](+), Strand: 1
Type: Chimeric gui..., Location: [270:346](+), Strand: 1
Type: U6 terminator, Location: [346:352](+), Strand: 1
Type: CBh, Location: [442:1241](+), Strand: 1
Type: 3XFLAG, Location: [1253:1322](+), Strand: 1
Type: misc_feature, Location: [1322:1373](+), Strand: 1
Type: hSpCsn1, Location: [1373:5474](+), Strand: 1
Type: D10A, Location: [1396:1399](+), Strand: 1
Type: NLS, Location: [5474:5522](+), Strand: 1
Type: bGH polyA, Location: [5531:5763](+), Strand: 1
Type: R-ITR, Location: [5771:5912](+), Strand: 1
Type: rep_origin, Location: [6003:6310](+), Strand: 1
Ty


Feature	Location	Correct Positioning?
Promoter	[0:249]	✅
sgRNA Guide	[250:270]	✅
PAM Site	[270:274]	✅
Chimeric Guide	[270:346]	✅
U6 Terminator	[346:352]	✅
CBh Promoter	[442:1241]	✅
Cas9 Coding Sequence	[1373:5474]	✅
NLS (Nuclear Localization)	[5474:5522]	✅
D10A Mutation	[1396:1399]	✅
bGH polyA	[5531:5763]	✅
Replication Origins	[6003:6310] & [7836:8504]	✅
Ampicillin Resistance	[6828:7686]	✅
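
Comparing the two summary tables, every feature up to the C-terminal NLS at [5474:5522] has the same coordinates in both plasmids, while the features after it are shifted. As a rough consistency check, the sketch below compares the start coordinates of the shared downstream features and tests whether they all shift by the same amount as the difference in total sequence length. The dictionary keys are labels chosen for this example, and the numbers are copied from the tables and the printed sequence lengths.

```python
# Start coordinates of the features downstream of the C-terminal NLS,
# copied from the two summary tables above.
pik3ca_starts = {
    "bGH polyA": 6314,
    "Replication origin 1": 6786,
    "Ampicillin Resistance": 7611,
    "Replication origin 2": 8619,
}
brca2_starts = {
    "bGH polyA": 5531,
    "Replication origin 1": 6003,
    "Ampicillin Resistance": 6828,
    "Replication origin 2": 7836,
}

# Sequence lengths printed for the two GenBank records.
length_difference = 9292 - 8509

# Shift of each shared feature between the two plasmids.
offsets = {name: pik3ca_starts[name] - brca2_starts[name] for name in pik3ca_starts}
consistent = set(offsets.values()) == {length_difference}

for name, offset in offsets.items():
    print(f"{name}: shifted by {offset} bp")
print(f"Length difference: {length_difference} bp, consistent: {consistent}")
```

A single constant shift equal to the length difference suggests that the two backbones differ at one location rather than at several. The BRCA2 listing shows no T2A or GFP feature between the NLS and bGH polyA, which would be consistent with that, although coordinates alone do not prove that the shared sequence is identical.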
