A. Restriction enzymes are proteins that cleave DNA at specific sequence motifs. Use the file posted under week 4 materials, pGL3.fa(a linearized plasmid sequence file) 

1. Write python code to read the plasmid sequence and returns the number and locations of matches to each the following restriction enzymes:

ApoI (motif 5' RAATTY 3'; note that R = A or G, Y = T or C), BsaI (motif 5' GGTCTC 3'), and DpnI (GATC)

Your script should search for both positive (sense) and negative (antisense) strand matches (NOTE: not all sites are palindromes).
Finally, include code to digest with all three enzymes simultaneously

This criterion is linked to a Learning OutcomeRestriction digest script - reads plasmid sequence file
Code should open a filehandle and read lines. This can use pyfaidx or biopython seqIO, or can be manually written

In [4]:
from Bio import SeqIO
from Bio.Restriction import RestrictionBatch

# Load the plasmid sequence file
plasmid_file = "Desktop/pGL3.fa"

# Read the plasmid sequence using Biopython SeqIO
plasmid_record = SeqIO.read(plasmid_file, "fasta")

# Define the restriction enzymes and their recognition sites
enzymes = {
    "ApoI": "RAATTY",
    "BsaI": "GGTCTC",
    "DpnI": "GATC"
}

# Create a RestrictionBatch object with the enzymes
batch = RestrictionBatch(enzymes)

# Perform the restriction digest on the plasmid sequence
digest_data = batch.search(plasmid_record.seq)

# Display the digest results
for enzyme, sites in digest_data.items():
    print(f"Results for {enzyme}:")
    for site in sites:
        print(f"Site: {site}")

# Total number of cuts
total_cuts = sum(len(sites) for sites in digest_data.values())
print(f"Total number of cuts: {total_cuts}")


Results for ApoI:
Site: 562
Site: 1918
Site: 4556
Site: 4567
Results for BsaI:
Site: 3285
Results for DpnI:
Site: 109
Site: 116
Site: 741
Site: 758
Site: 829
Site: 1282
Site: 1650
Site: 1770
Site: 1800
Site: 2077
Site: 2899
Site: 2974
Site: 2985
Site: 2993
Site: 3071
Site: 3083
Site: 3188
Site: 3529
Site: 3547
Site: 3593
Site: 3851
Site: 3868
Site: 3904
Site: 4639
Total number of cuts: 29


Restriction digest script - matches forward and reverse strands of all three enzyme sites.
Code should include a regex sequence to match the three restriction sites on both strands of the DNA. This can be accomplished by 1) having both forward and reverse motifs in the regular expression, 2) searching forward strand, reverse complementing, and searching again. (The code uses re.finditer method here.)

In [5]:
import re

# Read the plasmid sequence from the file
with open("pGL3.fa", "r") as file:
    plasmid_sequence = file.read().replace("\n", "")

# Define the restriction enzymes and their recognition sites
enzymes = {
    "ApoI": "RAATTY",
    "BsaI": "GGTCTC",
    "DpnI": "GATC"
}
# Function to find all matches of a regex motif in the plasmid sequence
def find_all_matches(regex_pattern):
    matches = [(match.start(), match.end()) for match in re.finditer(regex_pattern, plasmid_sequence)]
    return matches

# Find and display all matches for each restriction enzyme using regex
for enzyme, regex_pattern in enzyme_regex.items():
    matches = find_all_matches(regex_pattern)
    num_matches = len(matches)
    print(f"Number of matches for {enzyme}: {num_matches}")
    print(f"Locations of matches for {enzyme}:")
    for match in matches:
        print(f"Start: {match[0]}, End: {match[1]}")


Number of matches for ApoI: 4
Locations of matches for ApoI:
Start: 573, End: 579
Start: 1929, End: 1935
Start: 4567, End: 4573
Start: 4578, End: 4584
Number of matches for BsaI: 0
Locations of matches for BsaI:
Number of matches for DpnI: 24
Locations of matches for DpnI:
Start: 119, End: 123
Start: 126, End: 130
Start: 751, End: 755
Start: 768, End: 772
Start: 839, End: 843
Start: 1292, End: 1296
Start: 1660, End: 1664
Start: 1780, End: 1784
Start: 1810, End: 1814
Start: 2087, End: 2091
Start: 2909, End: 2913
Start: 2984, End: 2988
Start: 2995, End: 2999
Start: 3003, End: 3007
Start: 3081, End: 3085
Start: 3093, End: 3097
Start: 3198, End: 3202
Start: 3539, End: 3543
Start: 3557, End: 3561
Start: 3603, End: 3607
Start: 3861, End: 3865
Start: 3878, End: 3882
Start: 3914, End: 3918
Start: 4649, End: 4653


The code uses re.findall() method from the Python re module to find all matches of a regex motif in the plasmid sequence

In [6]:
import re

# Read the plasmid sequence from the file
with open("pGL3.fa", "r") as file:
    plasmid_sequence = file.read().replace("\n", "")

# Define the restriction enzymes and their recognition sites
enzymes = {
    "ApoI": "RAATTY",
    "BsaI": "GGTCTC",
    "DpnI": "GATC"
}
# Function to find all matches of a regex motif in the plasmid sequence
def find_all_matches(regex_pattern):
    return re.findall(regex_pattern, plasmid_sequence)

# Find and display all matches for each restriction enzyme using findall
for enzyme, regex_pattern in enzyme_regex.items():
    matches = find_all_matches(regex_pattern)
    num_matches = len(matches)
    print(f"Number of matches for {enzyme}: {num_matches}")
    print(f"Locations of matches for {enzyme}:")
    for match in matches:
        print(f"Start: {plasmid_sequence.find(match)}, End: {plasmid_sequence.find(match) + len(match)}")


Number of matches for ApoI: 4
Locations of matches for ApoI:
Start: 573, End: 579
Start: 573, End: 579
Start: 573, End: 579
Start: 4578, End: 4584
Number of matches for BsaI: 0
Locations of matches for BsaI:
Number of matches for DpnI: 24
Locations of matches for DpnI:
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123
Start: 119, End: 123


In [5]:
print(f"Start: {plasmid_sequence.find(match)}, End: {plasmid_sequence.find(match) + len(match)}")

Start: 119, End: 123
