# Genes and Mutations

Each DNA molecule contains many genes - specific regions that code for proteins.

A gene usually consists of:
- A start codon (`ATG` in DNA and `AUG` in mRNA)
- A series of codons
- A stop codon (`TAA`, `TAG`, or `TGA` in DNA; `UAA`, `UAG`, or `UGA` in mRNA)

However, a DNA strand can be read in **three possible reading frames**, depending on which base you start grouping the triplets from. Changing the frame changes every codon, and therefore the entire protein.

For example:
```yaml
DNA: ATGGGCTTTAAC
Reading Frame 1: ATG GGC TTT AAC -> AUG GGC UUU AAC -> Met Gly Phe Asn
Reading Frame 2: TGG GCT TTA AC -> UGG GCU UUA AC -> Trp Ala Leu
Reading Frame 3: GGG CTT TAA C -> GGG CUU UAA C -> Gly Leu Stop
```
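The frame split above can be reproduced with a few lines of plain Python. This is a minimal sketch that does not use the `dna` module; `split_frames` is a hypothetical helper name, and it assumes that incomplete trailing bases are simply dropped.

```python
def split_frames(dna: str) -> list[list[str]]:
    """Split a DNA string into complete codons for each of the three frames."""
    frames = []
    for offset in range(3):
        shifted = dna[offset:]
        # Keep only complete triplets; leftover bases at the end are dropped.
        usable = len(shifted) - len(shifted) % 3
        frames.append([shifted[i:i + 3] for i in range(0, usable, 3)])
    return frames


for number, codons in enumerate(split_frames("ATGGGCTTTAAC"), start=1):
    print(f"Frame {number}: {' '.join(codons)}")
```

Only frame 1 keeps the `ATG` start codon intact; frame 3 runs into the stop codon `TAA` after two codons.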

So a shift in the reading frame, caused for example by inserting or deleting a single base (a **frameshift mutation**), can completely change the resulting protein.

In [12]:
from dna import DNASequence, translate, transcribe, BASES

## Get Reading Frames

Let's create a helper to show all possible mRNA reading frames for a given DNA sequence.

In [13]:
def get_reading_frames(dna: DNASequence) -> list:
    """Returns all three reading frames for a DNA sequence (as RNA)"""
    rna = dna.transcribe()
    frames = [rna[i:] for i in range(3)]
    return frames


In [14]:
seq = DNASequence.random()

In [15]:
get_reading_frames(seq)

['UCCAUUAUACGC', 'CCAUUAUACGC', 'CAUUAUACGC']

In [16]:
def translate_all_frames(dna: DNASequence) -> list:
    """Translate each of the three reading frames of a DNA sequence"""
    # One translation per frame, in the order returned by get_reading_frames
    return [translate(frame) for frame in get_reading_frames(dna)]

In [17]:
seq = DNASequence.random(128)

In [18]:
translations = translate_all_frames(seq)
translations

['MEVLV', 'MEVLV', 'MEVLV']

In [19]:
for i, amino_seq in enumerate(translations, start=1):
    print(f"Frame {i}: {'-'.join(amino_seq) if amino_seq else '(no valid sequence)'}")

Frame 1: M-E-V-L-V
Frame 2: M-E-V-L-V
Frame 3: M-E-V-L-V
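All three frames print the same protein here. That would happen if `translate` starts at the first `AUG` it finds rather than at the first base of the string, but this is a guess about the `dna` module, not something the notebook confirms. For comparison, the sketch below translates each frame strictly from its first base using the standard genetic code. `CODON_TABLE` and `translate_frame` are hypothetical names introduced for this example.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def translate_frame(rna: str, offset: int = 0) -> str:
    """Translate from `offset`, codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(offset, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


rna = "AUGGGCUUUAAC"
for offset in range(3):
    print(f"Frame {offset + 1}: {translate_frame(rna, offset)}")
```

Read this way, the three frames of the earlier example sequence give three different peptides (`MGFN`, `WAL`, `GL`).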


## Mutations

Mutations are simply changes in the DNA sequence:
- `Insertion`: One or more bases added
- `Substitution`: One base replaced by another
- `Deletion`: One or more bases removed
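To make the three types concrete, here is a minimal sketch on plain strings. The helper names `substitute`, `insert`, `delete`, and `codons` are hypothetical and not part of the `dna` module; positions are zero-based.

```python
def substitute(dna: str, pos: int, base: str) -> str:
    """Replace the base at `pos` with `base`."""
    return dna[:pos] + base + dna[pos + 1:]


def insert(dna: str, pos: int, bases: str) -> str:
    """Insert `bases` before position `pos`."""
    return dna[:pos] + bases + dna[pos:]


def delete(dna: str, pos: int, count: int = 1) -> str:
    """Remove `count` bases starting at `pos`."""
    return dna[:pos] + dna[pos + count:]


def codons(dna: str) -> str:
    """Group a sequence into triplets from its first base, for display."""
    return " ".join(dna[i:i + 3] for i in range(0, len(dna), 3))


original = "ATGGGCTTTAAC"
print("Original:    ", codons(original))
print("Substitution:", codons(substitute(original, 4, "A")))
print("Insertion:   ", codons(insert(original, 3, "C")))
print("Deletion:    ", codons(delete(original, 3)))
```

The substitution changes a single codon (`GGC` becomes `GAC`), while the one-base insertion and deletion regroup every codon after the start codon.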

In [20]:
import random

In [21]:
def mutate(seq: str, rate: float = 0.01) -> str:
    """Randomly substitute bases in a DNA sequence; each base mutates with probability `rate`"""
    new_dna = []
    for base in seq:
        if random.random() < rate:
            # Substitution only: pick any base other than the current one
            new_base = random.choice([b for b in BASES if b != base])
            new_dna.append(new_base)
        else:
            new_dna.append(base)
    return ''.join(new_dna)

In [22]:
seq = DNASequence.random()
mutated = mutate(seq.sequence)

print("Original:", seq.sequence)
print("Mutated:", mutated)
print("\nProtein from mutated:")
print('-'.join(translate(transcribe(mutated))))

Original: TTGGAGTGGACG
Mutated: TTGGAGTGGACG

Protein from mutated:



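With the default `rate` of 0.01 and only 12 bases, most runs leave the sequence unchanged, as happened here. The protein line is empty, presumably because this sequence contains no start codon.

Even a single mutation can change the protein, with effects ranging from none to severe: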
- `Silent Mutation`: No change in amino acid
- `Missense Mutation`: One amino acid changes
- `Nonsense Mutation`: Premature stop codon
- `Frameshift Mutation`: An insertion or deletion shifts the reading frame, changing every downstream codon
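The first three categories can be told apart by comparing the amino acid before and after a substitution. The sketch below does this for a single DNA codon using the standard genetic code; `CODON_TABLE` and `classify_substitution` are hypothetical names, and the function deliberately ignores frameshifts and the loss of an existing stop codon.

```python
from itertools import product

# Standard genetic code for DNA codons, listed in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, mutated: str) -> str:
    """Classify a codon substitution as silent, missense, or nonsense."""
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    # Assumption: any other amino acid change is reported as missense.
    return "missense"


print(classify_substitution("GGC", "GGA"))  # Gly -> Gly
print(classify_substitution("GGC", "GAC"))  # Gly -> Asp
print(classify_substitution("TGG", "TGA"))  # Trp -> Stop
```

The three calls print `silent`, `missense`, and `nonsense` respectively.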