# Simulating a Point Mutation and Translating the Longest Protein

## Description
This notebook demonstrates:
- Introducing point mutations into DNA sequences
- Translating the mutated sequence to protein
- Extracting the longest stop-free protein from the start of the translated sequence
- Combining file handling and bioinformatics operations

Challenge: Simulate a Point Mutation and Translate the Longest Protein

Problem Statement:
You have a DNA sequence stored in dna.txt. Your task is to:

1. Clean the sequence so that it contains only the valid bases A, T, G, and C.
2. Simulate a point mutation by applying two substitutions at every position:
   - Replace all 'A' with 'G'
   - Replace all 'T' with 'C'
3. Find the complement of the mutated DNA.
4. Reverse that complement to obtain the reverse complement.
5. Transcribe the reverse complement into RNA (T → U).
6. Split the RNA into codons.
7. Translate the codons into amino acids using the standard RNA codon table.
8. Extract the longest continuous protein (no stop codons) from the start of the sequence, that is, every residue before the first stop codon.

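Steps 2 to 5 can be traced by hand on a short sequence before running them on the real file. The sketch below is only an illustration: the input `AATGC` is made up (it is not the contents of dna.txt), and the helper name `mutate_and_transcribe` is hypothetical. It uses `str.maketrans` and `str.translate` in place of explicit loops.

```python
# Hypothetical helper that traces steps 2-5 on a toy input (not the contents of dna.txt).
MUTATION = str.maketrans("AT", "GC")        # step 2: A -> G, T -> C
COMPLEMENT = str.maketrans("ATGC", "TACG")  # step 3: base pairing

def mutate_and_transcribe(dna):
    mutated = dna.translate(MUTATION)
    complement = mutated.translate(COMPLEMENT)
    reverse_complement = complement[::-1]        # step 4
    rna = reverse_complement.replace("T", "U")   # step 5
    return mutated, complement, reverse_complement, rna

labels = ("Mutated DNA", "Complement", "Reverse Complement", "RNA")
for label, value in zip(labels, mutate_and_transcribe("AATGC")):
    print(f"{label}: {value}")
# Mutated DNA: GGCGC
# Complement: CCGCG
# Reverse Complement: GCGCC
# RNA: GCGCC
```

After step 2 no A or T remains, so the complement contains only C and G, and the T → U replacement in step 5 leaves the string unchanged.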

In [16]:
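# Read the raw sequence and normalise it to upper case.
with open("dna.txt", "r") as file:
    sequence = file.read().upper()

# Step 1: keep only valid DNA bases (drops whitespace, newlines, and ambiguity codes such as N).
cleaned_sequence = "".join(base for base in sequence if base in "ATGC")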

# Step 2: apply the substitutions A -> G and T -> C; other bases pass through unchanged.
mutation_bases = {'A': 'G', 'T': 'C'}
mutated_dna = "".join(mutation_bases.get(base, base) for base in cleaned_sequence)

# Step 3: complement each base.
complement_bases = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
complement_of_mutated_dna = "".join(complement_bases.get(base, base) for base in mutated_dna)

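# Step 4: reverse the complement to get the reverse complement.
reverse_complement_of_mutated_dna = complement_of_mutated_dna[::-1]

# Step 5: transcribe to RNA.
rna = reverse_complement_of_mutated_dna.replace('T', 'U')

# Step 6: split into complete codons; a trailing fragment of one or two bases is dropped.
codons = [rna[i:i+3] for i in range(0, len(rna) - 2, 3)]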

codon_table = {
    'UUU':'F', 'UUC':'F', 'UUA':'L', 'UUG':'L',
    'CUU':'L', 'CUC':'L', 'CUA':'L', 'CUG':'L',
    'AUU':'I', 'AUC':'I', 'AUA':'I', 'AUG':'M',
    'GUU':'V', 'GUC':'V', 'GUA':'V', 'GUG':'V',
    'UCU':'S', 'UCC':'S', 'UCA':'S', 'UCG':'S',
    'CCU':'P', 'CCC':'P', 'CCA':'P', 'CCG':'P',
    'ACU':'T', 'ACC':'T', 'ACA':'T', 'ACG':'T',
    'GCU':'A', 'GCC':'A', 'GCA':'A', 'GCG':'A',
    'UAU':'Y', 'UAC':'Y', 'UAA':'_', 'UAG':'_',
    'CAU':'H', 'CAC':'H', 'CAA':'Q', 'CAG':'Q',
    'AAU':'N', 'AAC':'N', 'AAA':'K', 'AAG':'K',
    'GAU':'D', 'GAC':'D', 'GAA':'E', 'GAG':'E',
    'UGU':'C', 'UGC':'C', 'UGA':'_', 'UGG':'W',
    'CGU':'R', 'CGC':'R', 'CGA':'R', 'CGG':'R',
    'AGU':'S', 'AGC':'S', 'AGA':'R', 'AGG':'R',
    'GGU':'G', 'GGC':'G', 'GGA':'G', 'GGG':'G'
}

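# Step 7: translate each codon; '?' marks a codon missing from the table.
aminoacids = [codon_table.get(codon, '?') for codon in codons]

# Step 8: collect residues until the first stop codon ('_').
protein = []
for aa in aminoacids:
    if aa == '_':
        break
    elif aa == '?':
        print("Unknown codon found")  # report the codon and skip it
    else:
        protein.append(aa)

longest_protein = "".join(protein)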

print(f"Mutated DNA: {mutated_dna}")
print(f"Complement: {complement_of_mutated_dna}")
print(f"Reverse Complement: {reverse_complement_of_mutated_dna}")
print(f"RNA: {rna}")
print(f"Codons: {codons}")
print(f"Longest Translated Protein: {longest_protein}")

Mutated DNA: GGCCGGGCGCGCCGGGCGC
Complement: CCGGCCCGCGCGGCCCGCG
Reverse Complement: GCGCCCGGCGCGCCCGGCC
RNA: GCGCCCGGCGCGCCCGGCC
Codons: ['GCG', 'CCC', 'GGC', 'GCG', 'CCC', 'GGC']
Longest Translated Protein: APGAPG

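The cell above stops at the first stop codon, so it reports the stop-free stretch that begins at the first codon. If the goal is instead the longest stop-free stretch anywhere in the translation, one option is to split the translated string at the stop symbol and keep the longest piece. The sketch below assumes the same `'_'` stop symbol used in `codon_table`; the function name `longest_stop_free` and the second sample string are hypothetical. It is not a full open-reading-frame search, since it neither looks for a start codon nor scans other reading frames.

```python
def longest_stop_free(amino_acids, stop="_"):
    """Return the longest run of residues that contains no stop symbol."""
    # Splitting on the stop symbol yields every stop-free segment;
    # max() returns the first of the longest segments when there is a tie.
    return max(amino_acids.split(stop), key=len)

print(longest_stop_free("APGAPG"))       # APGAPG
print(longest_stop_free("MK_APGAPG_R"))  # APGAPG
```

For the sequence in this notebook the two approaches agree. The substitution in step 2 removes every A and T, so the RNA contains only G and C, and all three stop codons (UAA, UAG, UGA) contain A or U.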