# Compare Mutated vs Wild-Type Protein

## Description
This notebook demonstrates:
- Comparing protein sequences translated from mutated and wild-type RNA
- Detecting differences in amino acid sequences
- Using Python for sequence comparison and basic bioinformatics analysis

## Challenge: Compare Mutated Protein to Wild-Type Protein

Task:

You already have the mutated DNA → RNA → protein sequence. Now:

1. Also translate the original (wild-type) DNA into a protein.
2. Compare the wild-type and mutated proteins.
3. Identify differences (amino acid substitutions).
4. Print both protein sequences and indicate where the mutation(s) occurred.
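Because step 1 repeats the same DNA → RNA → protein pipeline for a second sequence, one way to avoid duplicating it is to wrap it in a function. The sketch below is only an illustration under assumptions: the names `mutate_dna` and `dna_to_protein` are hypothetical and do not appear in the notebook, the codon table is built compactly from the standard genetic code, and the pipeline (reverse complement, then T → U) mirrors the convention this notebook uses.

```python
# Standard genetic code in UCAG order; '_' marks a stop codon, as in the notebook.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY__CC_W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def mutate_dna(dna):
    """Hypothetical helper: apply the notebook's mutation (A -> G, T -> C)."""
    return dna.upper().translate(str.maketrans("AT", "GC"))


def dna_to_protein(dna):
    """Hypothetical helper: clean DNA, reverse-complement it, transcribe, translate."""
    cleaned = "".join(base for base in dna.upper() if base in "ATGC")
    complement = cleaned.translate(str.maketrans("ATCG", "TAGC"))
    rna = complement[::-1].replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "_":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


example_dna = "AAACAT"  # made-up input, not the contents of dna.txt
print(dna_to_protein(example_dna))              # MF
print(dna_to_protein(mutate_dna(example_dna)))  # AP
```

With such a function, the wild-type and mutated proteins come from two calls on the same cleaned input, so the two translations cannot drift apart.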


In [14]:
with open("dna.txt", "r") as file:
    sequence= file.read().upper()

cleaned_sequence= "".join([base for base in sequence if base in 'ATGC'])

# Simulated mutation: every A becomes G and every T becomes C (C and G stay unchanged).
mutated_dna= ""
mutation_complement= {'A':'G', 'T':'C'}
for base in cleaned_sequence:
    if base in mutation_complement:
        mutated_dna += mutation_complement[base]
    else:
        mutated_dna += base

complement_of_mutated_dna = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
mutated_dna_complement= ""
for base in mutated_dna:
    if base in complement_of_mutated_dna:
        mutated_dna_complement += complement_of_mutated_dna[base]
    else:
        mutated_dna_complement += base

reverse_complement_of_mutated_dna= mutated_dna_complement[::-1]

rna= reverse_complement_of_mutated_dna.replace('T', 'U')

codon_table = {
    'UUU':'F', 'UUC':'F', 'UUA':'L', 'UUG':'L',
    'CUU':'L', 'CUC':'L', 'CUA':'L', 'CUG':'L',
    'AUU':'I', 'AUC':'I', 'AUA':'I', 'AUG':'M',
    'GUU':'V', 'GUC':'V', 'GUA':'V', 'GUG':'V',
    'UCU':'S', 'UCC':'S', 'UCA':'S', 'UCG':'S',
    'CCU':'P', 'CCC':'P', 'CCA':'P', 'CCG':'P',
    'ACU':'T', 'ACC':'T', 'ACA':'T', 'ACG':'T',
    'GCU':'A', 'GCC':'A', 'GCA':'A', 'GCG':'A',
    'UAU':'Y', 'UAC':'Y', 'UAA':'_', 'UAG':'_',
    'CAU':'H', 'CAC':'H', 'CAA':'Q', 'CAG':'Q',
    'AAU':'N', 'AAC':'N', 'AAA':'K', 'AAG':'K',
    'GAU':'D', 'GAC':'D', 'GAA':'E', 'GAG':'E',
    'UGU':'C', 'UGC':'C', 'UGA':'_', 'UGG':'W',
    'CGU':'R', 'CGC':'R', 'CGA':'R', 'CGG':'R',
    'AGU':'S', 'AGC':'S', 'AGA':'R', 'AGG':'R',
    'GGU':'G', 'GGC':'G', 'GGA':'G', 'GGG':'G'
}

translated = []
codons = [rna[i:i+3] for i in range(0, len(rna)-2, 3)]
# Translate codon by codon and stop at the first stop codon ('_').
for codon in codons:
    aa = codon_table.get(codon, '?')
    if aa == '_':
        break
    elif aa == '?':
        print(f"Unknown codon found: {codon}")
    else:
        translated.append(aa)

mutated_protein= "".join(translated)

# Translate the original (wild-type) DNA into protein

complement= {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
dna_complement= ""

for base in cleaned_sequence:
    if base in complement:
        dna_complement += complement[base]
    else:
        dna_complement +=base

reverse_complement = dna_complement[::-1]

original_rna = reverse_complement.replace('T', 'U')

original_codons= [original_rna[i:i+3] for i in range(0, len(original_rna)-2, 3)]
original_translated=[]

for codon in original_codons:
    aa = codon_table.get(codon,'?')
    if aa =='_':
        break
    elif aa =='?':
        print(f"Unknown codon found!!!: {codon}")
    else:
        original_translated.append(aa)

original_protein= "".join(original_translated)
    
print(f"Wild-type protein: {original_protein}\nMutated protein: {mutated_protein}")
    


Wild-type protein: TFSALD
Mutated protein: APGAPG
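
The cell above prints both proteins but does not yet report where they differ (steps 3 and 4 of the task). A minimal sketch of a position-by-position comparison follows. The helper names `find_substitutions` and `marker_line` are hypothetical, and the two sequences are copied from the output above. `zip_longest` pads the shorter protein with `-`, so a length difference (for example from an early stop codon) is also reported.

```python
from itertools import zip_longest


def find_substitutions(wild_type, mutated):
    """Return (position, wild_aa, mutated_aa) for each differing 1-based position."""
    pairs = zip_longest(wild_type, mutated, fillvalue="-")
    return [
        (position, wt_aa, mut_aa)
        for position, (wt_aa, mut_aa) in enumerate(pairs, start=1)
        if wt_aa != mut_aa
    ]


def marker_line(wild_type, mutated):
    """Build a line with '^' under every differing position and ' ' elsewhere."""
    pairs = zip_longest(wild_type, mutated, fillvalue="-")
    return "".join(" " if wt_aa == mut_aa else "^" for wt_aa, mut_aa in pairs)


original_protein = "TFSALD"  # wild-type protein from the output above
mutated_protein = "APGAPG"   # mutated protein from the output above

print(f"Wild-type protein: {original_protein}")
print(f"Mutated protein:   {mutated_protein}")
print(f"                   {marker_line(original_protein, mutated_protein)}")

for position, wt_aa, mut_aa in find_substitutions(original_protein, mutated_protein):
    # Conventional substitution notation: wild-type residue, position, new residue.
    print(f"Position {position}: {wt_aa}{position}{mut_aa}")
```

For these two sequences the sketch reports five substitutions (T1A, F2P, S3G, L5P, D6G); position 4 is alanine in both proteins.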
