# File handling – The Hidden Message

- The file "genome.fa" contains a 1 million bp fragment of a bacterial genome

- Find all open reading frames (ORFs) of at least 450 nucleotides / 150 amino acids (AA)
    - Remember an ORF can also be on the complementary strand!
    - An ORF starts with "ATG"
    - An ORF stops with "TAA", "TAG" or "TGA"

- Translate each ORF into a single-letter amino acid sequence
    - ATG --> M

- Sort the ORFs by length (longest to shortest)

- Take the 25th AA from each ORF, in sorted order

- What is the hidden message?

In [1]:
# Build the codon table: codon -> single-letter amino acid ("*" marks a stop codon)
bases = ["T","C","A","G"]
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(codons, amino_acids))
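
As a quick check of the table, the following minimal sketch translates a short toy sequence codon by codon. The helper name `translate` and the toy input are illustrative assumptions, not part of the exercise; the table itself is rebuilt from the cell above so the block runs on its own.

```python
# Rebuild the codon table from the cell above so this sketch is self-contained.
bases = ["T", "C", "A", "G"]
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(codons, amino_acids))

def translate(dna):
    """Translate an uppercase DNA string codon by codon (hypothetical helper)."""
    # Step through the sequence three bases at a time,
    # ignoring a trailing partial codon if the length is not a multiple of 3.
    usable = len(dna) - len(dna) % 3
    return "".join(codon_table[dna[i:i + 3]] for i in range(0, usable, 3))

# Toy input: ATG TTT TGG TAA
print(translate("ATGTTTTGGTAA"))  # MFW*
```

The sketch assumes uppercase input containing only A, C, G and T; any other character (for example `N`) would raise a `KeyError`.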

In [2]:
import re

# Open the genome file, skip the FASTA header line, and concatenate the sequence lines
with open("genome.fa", "r") as genome_file:
    header = genome_file.readline()
    sequence = "".join(line.rstrip() for line in genome_file)

# Append the reverse complement so that both strands are searched in one pass
try:
    from string import maketrans  # Python 2
except ImportError:
    maketrans = str.maketrans     # Python 3
complement = maketrans("acgtACGT", "tgcaTGCA")
sequence += sequence.translate(complement)[::-1]
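
To see what the reverse-complement step does, here is a small sketch on a toy sequence. It assumes Python 3 (where `maketrans` is a method on `str`), and the helper name `reverse_complement` is made up for illustration.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string (hypothetical helper, Python 3)."""
    # Map each base to its partner, then reverse the string so it reads 5' -> 3'.
    complement = str.maketrans("acgtACGT", "tgcaTGCA")
    return dna.translate(complement)[::-1]

forward = "ATGAAATGA"
reverse = reverse_complement(forward)
print(reverse)  # TCATTTCAT

# Applying the operation twice gives back the original strand.
print(reverse_complement(reverse) == forward)  # True
```

One design note: keeping the two strands as separate strings, instead of concatenating them, would also rule out an ORF that happens to run across the junction between the forward strand and its reverse complement.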

In [3]:
# Find the positions of all start codons
atg_start = [match.start() for match in re.finditer("ATG", sequence)]

# Find the positions of all stop codons
stop_start = [match.start() for match in re.finditer("TAA|TAG|TGA", sequence)]

In [4]:
# For every start codon, find the first in-frame stop codon after it
# and keep the ORF only if it is at least 450 nucleotides long
orf = {}
for start in atg_start:
    for stop in stop_start:
        if start < stop and (stop - start) % 3 == 0:
            if stop - start >= 450:
                orf[start] = stop
            break  # only the first in-frame stop matters
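
The nested loop above compares every start codon with every stop codon, which can be slow on a 1 million bp genome. A possible alternative, sketched here under the assumption that the same ORF definition applies, groups the stop positions by reading frame and uses a binary search to find the first in-frame stop. The function name `find_orfs` and the `min_len` parameter are illustrative, not part of the original solution.

```python
import re
from bisect import bisect_right

def find_orfs(seq, min_len=450):
    """Return (start, stop) pairs, where stop is the index of the first
    in-frame stop codon after start (hypothetical helper)."""
    # Group stop-codon positions by reading frame (position modulo 3).
    # finditer yields positions in ascending order, so each list is sorted.
    stops_by_frame = {0: [], 1: [], 2: []}
    for match in re.finditer("TAA|TAG|TGA", seq):
        stops_by_frame[match.start() % 3].append(match.start())

    orfs = []
    for match in re.finditer("ATG", seq):
        start = match.start()
        frame_stops = stops_by_frame[start % 3]
        # Binary search for the first stop in the same frame after the start.
        i = bisect_right(frame_stops, start)
        if i < len(frame_stops) and frame_stops[i] - start >= min_len:
            orfs.append((start, frame_stops[i]))
    return orfs

# Toy check with a lowered length threshold: ATG AAA AAA AAA TAA
print(find_orfs("ATGAAAAAAAAATAA", min_len=12))  # [(0, 12)]
```

As in the cell above, the length is measured from the start codon up to, but not including, the stop codon.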

In [None]:
# Sort the ORF start positions by ORF length, longest first
# (sorting the starts directly keeps ORFs that share the same length)
starts_by_length = sorted(orf, key=lambda start: orf[start] - start, reverse=True)

# Translate the 25th codon (nucleotides 72-74 of each ORF) in sorted order
message = ""
for start in starts_by_length:
    message += codon_table[sequence[start+72:start+75]]
print(message)
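
The offsets 72 and 75 used above follow from zero-based indexing: amino acid number n of an ORF is encoded by the three nucleotides starting at offset 3 × (n − 1) from the start codon, so the 25th amino acid starts at offset 72. A minimal sketch of that arithmetic, with a made-up helper name `codon_for_aa` and a toy ORF:

```python
def codon_for_aa(seq, orf_start, aa_number):
    """Return the codon encoding the aa_number-th amino acid (1-based)
    of the ORF that begins at orf_start (hypothetical helper)."""
    offset = 3 * (aa_number - 1)  # e.g. 25th amino acid -> offset 72
    return seq[orf_start + offset:orf_start + offset + 3]

# Toy ORF: ATG, then 23 x GCT, then TGG as the 25th codon, then a stop codon.
toy = "ATG" + "GCT" * 23 + "TGG" + "TAA"
print(codon_for_aa(toy, 0, 1))   # ATG
print(codon_for_aa(toy, 0, 25))  # TGG
```

Because every retained ORF is at least 450 nucleotides long, the 25th codon always lies inside the ORF.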