## Longest ORF Finder Across Reading Frames
Imported the FASTA parsing and sequence-cleaning functions (`fasta_parsing`, `clean`) from the earlier `ORF_separate_frames` notebook.

Implemented logic to find the longest open reading frame (ORF) in each of the three forward reading frames of a DNA sequence, and to record the frame and the 1-based start and end positions of every ORF found.

Ran a custom FASTA file (`practice3.txt`) through the workflow to obtain and validate the ORF results.

Exercise: Longest ORF Finder

Task:

1. Parse a multi-FASTA file and clean the sequences.
2. For each cleaned sequence, find all ORFs in all 3 reading frames (using a `find_orf_frames`-style function).
3. Among the ORFs found across all frames, identify the longest ORF for each sequence.
4. Print the header, start position, end position, frame, length, and ORF sequence of the longest ORF.
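
Step 1 relies on `fasta_parsing` and `clean`, which are imported from an earlier notebook and are not shown here. As a rough idea of what such helpers might do, here is a minimal sketch. The names `parse_fasta_lines` and `clean_sequence`, the choice to keep only A, C, G and T, and the toy records are all assumptions made for illustration, not the actual imported functions.

```python
def parse_fasta_lines(lines):
    """Map each FASTA header (without the leading '>') to its joined sequence.

    Hypothetical stand-in for the imported fasta_parsing function.
    """
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


def clean_sequence(seq):
    """Uppercase the sequence and drop anything that is not A, C, G or T.

    Hypothetical stand-in for the imported clean function.
    """
    return "".join(base for base in seq.upper() if base in "ACGT")


# Toy multi-FASTA text, made up for this example.
toy_fasta = """>toy_record_1
atgaaa tag
CATG
>toy_record_2
NNNACGT-acgt
"""

toy_records = {
    header: clean_sequence(seq)
    for header, seq in parse_fasta_lines(toy_fasta.splitlines()).items()
}
print(toy_records)
```

Working on an iterable of lines rather than a file path keeps the sketch easy to try without a file on disk; reading a real file would only need `open(path)` passed in place of the split string.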


In [12]:
from ORF_separate_frames import fasta_parsing, clean # imported my function from previous notebook

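def find_longest_orfs_in_frames(seq):
    """Return the longest ORF in each forward frame and the positions of every ORF.

    Positions are (frame, start, end) tuples: 1-based, with end including the stop codon.
    """
    start_codon = 'ATG'
    stop_codons = {'TAG', 'TGA', 'TAA'}
    longest_orf = {1: None, 2: None, 3: None}
    positions = []
    for frame in range(3):
        i = frame
        max_orf = ""
        while i < len(seq) - 2:
            codon = seq[i:i+3]
            if codon == start_codon:
                # Scan downstream, codon by codon, for the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    stop_codon = seq[j:j+3]
                    if stop_codon in stop_codons:
                        orf = seq[i:j+3]
                        positions.append((frame + 1, i + 1, j + 3))
                        if len(orf) > len(max_orf):
                            max_orf = orf
                            longest_orf[frame + 1] = max_orf
                        i = j + 3  # resume the search after this stop codon
                        break
                else:
                    # No in-frame stop codon: not a complete ORF, move to the next codon.
                    i += 3
            else:
                i += 3

    return longest_orf, positions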

sequences= fasta_parsing("practice3.txt")

cleaned_sequences= {}
for header, seq in sequences.items():
    cleaned_sequences[header]= clean(seq)

for header, seq in cleaned_sequences.items():
    print(f"Header: {header}")
    longest_orf, positions= find_longest_orfs_in_frames(seq)
    print(f"Longest Orfs : {longest_orf}\n")
    for f_index, start, end in positions:
        print(f"frame: {f_index}\nstart index: {start}\nstop index: {end}\n")
    

Header: Human_sequence
Longest Orfs : {1: 'ATGCTAGCTAGCTAA', 2: None, 3: 'ATGCTAGCTAGCTGA'}

frame: 1
start index: 1
stop index: 15

frame: 3
start index: 18
stop index: 32

Header: Mouse_sequence
Longest Orfs : {1: None, 2: None, 3: None}

Header: Plant_sequence
Longest Orfs : {1: None, 2: None, 3: None}
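
The output above lists the longest ORF per frame along with every ORF position. Steps 3 and 4 of the task ask for a single longest ORF per sequence, together with its frame, coordinates and length. One possible way to get there from the `positions` list is sketched below. The helper name `pick_longest_orf` and the toy sequence are assumptions made for this example; the `(frame, start, end)` tuples follow the 1-based convention used in the cell above.

```python
def pick_longest_orf(seq, positions):
    """Return a dict describing the longest ORF, or None if no ORF was found.

    positions: iterable of (frame, start, end) tuples, 1-based, with end
    including the stop codon, as produced by find_longest_orfs_in_frames.
    """
    positions = list(positions)
    if not positions:
        return None
    # max() keeps the first maximal item, so ties go to the ORF recorded first.
    frame, start, end = max(positions, key=lambda p: p[2] - p[1] + 1)
    return {
        "frame": frame,
        "start": start,
        "end": end,
        "length": end - start + 1,
        "orf": seq[start - 1:end],  # convert 1-based inclusive to a Python slice
    }


# Toy sequence, made up for this example: a 9-nt ORF in frame 1
# and a 15-nt ORF in frame 2.
toy_seq = "ATGAAATAGCATGCCCGGGTTTTAA"
toy_positions = [(1, 1, 9), (2, 11, 25)]

best = pick_longest_orf(toy_seq, toy_positions)
if best is None:
    print("No ORF found")
else:
    print(f"frame: {best['frame']}, start: {best['start']}, end: {best['end']}, "
          f"length: {best['length']}, ORF: {best['orf']}")
```

In the notebook loop, this would be called as `pick_longest_orf(seq, positions)` right after `find_longest_orfs_in_frames(seq)`, with the header printed alongside. Returning `None` for an empty list covers sequences such as the mouse and plant records above, where no ORF was reported.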

