In [3]:
# Define the transcript given in the question
transcript = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAA"


In [4]:
import re  # Import the regular expression module

def find_target_sites(transcript):
    pairings = ['AT', 'GT', 'AC', 'GC']  # Define the pairings we're interested in
    target_sites = []  # Create an empty list to store the target sites

    # Loop over each pairing
    for pairing in pairings:
        # Find all occurrences of the pairing in the transcript
        for match in re.finditer(pairing, transcript):
            start_index = match.start()  # Get the start index of the pairing

            # Try equal flank lengths of 7, 8, and 9 nucleotides on each side of the pairing
            for i in range(7, 10):
                # Check if the target site is within the transcript
                if start_index - i >= 0 and start_index + len(pairing) + i <= len(transcript):
                    # If it is, extract the target site
                    target_site = transcript[start_index - i : start_index + len(pairing) + i]
                    # And add it to the list of target sites, along with the pairing and the lengths
                    target_sites.append((pairing, target_site, len(target_site), i, i))

    return target_sites  # Return the list of target sites



# Call the function with the transcript
target_sites = find_target_sites(transcript)

# Loop over the target sites and print them
for site in target_sites:
    print(f"Pairing: {site[0]}, Target site: {site[1]}, total = {site[2]}, left (5') = {site[3]}, right (3') = {site[4]}")


Pairing: AT, Target site: ATGGTGCATCTGACTC, total = 16, left (5') = 7, right (3') = 7
Pairing: AT, Target site: AACGTGGATGAAGTTG, total = 16, left (5') = 7, right (3') = 7
Pairing: AT, Target site: GAACGTGGATGAAGTTGG, total = 18, left (5') = 8, right (3') = 8
Pairing: AT, Target site: TGAACGTGGATGAAGTTGGT, total = 20, left (5') = 9, right (3') = 9
Pairing: AT, Target site: TTTGGGGATCTGTCCA, total = 16, left (5') = 7, right (3') = 7
Pairing: AT, Target site: CTTTGGGGATCTGTCCAC, total = 18, left (5') = 8, right (3') = 8
Pairing: AT, Target site: CCTTTGGGGATCTGTCCACT, total = 20, left (5') = 9, right (3') = 9
Pairing: AT, Target site: ACTCCTGATGCTGTTA, total = 16, left (5') = 7, right (3') = 7
Pairing: AT, Target site: CACTCCTGATGCTGTTAT, total = 18, left (5') = 8, right (3') = 8
Pairing: AT, Target site: CCACTCCTGATGCTGTTATG, total = 20, left (5') = 9, right (3') = 9
Pairing: AT, Target site: TGCTGTTATGGGCAAC, total = 16, left (5') = 7, right (3') = 7
Pairing: AT, Target site: ATGCTGTTAT

The code above scans the transcript for each of four dinucleotide pairings ('AT', 'GT', 'AC', 'GC') and, for every occurrence, extracts candidate 'target sites': the pairing plus an equal number of flanking nucleotides (7, 8, or 9) on each side. The output shown covers the first pairing, 'AT'. The 'total' value is the full length of the target site, that is, the 2-nucleotide pairing plus both flanks, so it is always 16, 18, or 20. The 'left (5')' and 'right (3')' values are the numbers of nucleotides on the 5' and 3' sides of the pairing, respectively. A flank length is skipped whenever the window would run past either end of the transcript, which is why the 'AT' at index 7 yields only the 16-nucleotide site and the 'AT' at index 0 yields none. These target sites are of interest because they may have biological significance, for example as the binding site of a specific protein or the location of a mutation.
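To make the windowing and the boundary check concrete, here is a minimal sketch that applies the same logic to just the first 27 nucleotides of the transcript. The helper name `flanked_sites` and its `flank_sizes` parameter are hypothetical, introduced only for this illustration; they are not part of the original notebook.

```python
import re

def flanked_sites(sequence, pairing, flank_sizes=(7, 8, 9)):
    """Yield (start, site, flank) for each occurrence of `pairing`
    that has `flank` nucleotides available on both sides.

    Hypothetical helper for illustration; mirrors the bounds check
    used in find_target_sites.
    """
    for match in re.finditer(re.escape(pairing), sequence):
        start, end = match.start(), match.end()
        for flank in flank_sizes:
            # Keep the window only if it fits entirely inside the sequence
            if start - flank >= 0 and end + flank <= len(sequence):
                yield start, sequence[start - flank:end + flank], flank

# First 27 nucleotides of the transcript defined above
toy = "ATGGTGCATCTGACTCCTGAGGAGAAG"

sites = list(flanked_sites(toy, "AT"))
for start, site, flank in sites:
    print(f"start = {start}, site = {site}, total = {len(site)}, flank = {flank}")
# prints: start = 7, site = ATGGTGCATCTGACTC, total = 16, flank = 7
```

'AT' occurs at indices 0 and 7 in this fragment. The occurrence at index 0 has no nucleotides to its left, so every flank length is rejected. The occurrence at index 7 has exactly 7 nucleotides to its left, so only the 7-nucleotide flank fits, giving a total length of 2 + 2 × 7 = 16.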