# Protein Structure and Codon Usage Bias

Proteins are chains of amino acids. Most amino acids can be encoded by more than one codon; this redundancy is known as the **degeneracy of the genetic code**.

Example:
```yaml
Amino Acid: Leucine (Leu)
Codons: UUA, UUG, CUU, CUC, CUA, CUG
```
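The degeneracy can be checked programmatically. The sketch below is self-contained and does not use the notebook's `CODON_TABLE`, whose exact layout is not shown here; instead it rebuilds the standard genetic code from the usual compact 64-letter string (bases in `UCAG` order, `*` for stop). The names `STANDARD_TABLE` and `synonymous` are illustrative only.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last base fastest: UUU, UUC, UUA, UUG, UCU, ...
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
synonymous = defaultdict(list)
for codon, aa in STANDARD_TABLE.items():
    synonymous[aa].append(codon)

for aa in ("L", "S", "M", "W"):
    print(aa, len(synonymous[aa]), synonymous[aa])
```

Leucine (`L`) and serine (`S`) each have six codons, while methionine (`M`) and tryptophan (`W`) have only one.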

Different organisms prefer certain codons over others, even when those codons encode the same amino acid. This preference is called **codon usage bias**, and it can influence how efficiently a protein is translated.

In [17]:
from dna import DNASequence, transcribe, CODON_TABLE

In [18]:
from collections import Counter

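def codon_usage(seq: str) -> dict:
    """Return the relative frequency of each codon in the DNA sequence."""
    rna = transcribe(seq)
    # Split into non-overlapping triplets from position 0; trailing bases
    # that do not fill a complete codon are ignored.
    codons = [rna[i:i+3] for i in range(0, len(rna) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    # Normalize counts so the frequencies sum to 1.
    usage = {codon: count / total for codon, count in counts.items()}
    return usage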
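To see how the triplet split behaves on a sequence whose length is not a multiple of three, here is a small standalone sketch. It assumes that transcription amounts to replacing `T` with `U` on the coding strand; the notebook's own `transcribe` may work differently. `codon_usage_sketch` is a hypothetical stand-in, not part of the `dna` module.

```python
from collections import Counter

def codon_usage_sketch(dna: str) -> dict:
    """Standalone variant of codon_usage for illustration."""
    # Assumption: stands in for transcribe(); simply swaps T for U.
    rna = dna.upper().replace("T", "U")
    # Drop trailing bases that do not form a complete codon.
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# 14 bases: four complete codons (ATG CTG CTG TAA) plus two leftover bases.
usage_demo = codon_usage_sketch("ATGCTGCTGTAAGG")
print(usage_demo)  # {'AUG': 0.25, 'CUG': 0.5, 'UAA': 0.25}
```

The two leftover bases are ignored, and an empty input yields an empty dictionary rather than a division by zero.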

In [19]:
seq = DNASequence.random(512)
seq

<DNASequence(length=512)>

In [20]:
usage = codon_usage(seq.sequence)

for codon, freq in usage.items():
    print(f'{codon}: {freq:.2%}')

CGA: 1.76%
UGU: 2.35%
GGU: 2.94%
CGG: 1.18%
UUU: 1.76%
CGU: 0.59%
GUG: 2.94%
GAA: 2.94%
CUC: 1.18%
UAU: 1.18%
ACC: 2.35%
AUU: 0.59%
AAG: 3.53%
AGU: 2.94%
CCC: 2.35%
GAG: 1.18%
CAA: 1.18%
GGG: 1.76%
GGA: 1.18%
AUG: 3.53%
GUU: 2.94%
GAC: 1.76%
CCG: 1.76%
UGA: 2.94%
UGG: 2.94%
CAU: 0.59%
CAC: 2.35%
CUA: 1.18%
GCG: 1.18%
UUC: 0.59%
AAA: 0.59%
CCA: 1.18%
CGC: 2.35%
ACU: 2.35%
CAG: 1.76%
AAC: 1.76%
CUG: 2.35%
GAU: 1.18%
GCA: 1.76%
AGA: 0.59%
GCU: 2.35%
GCC: 2.35%
GUA: 1.18%
UUG: 1.18%
UAG: 1.18%
GGC: 1.76%
ACG: 1.18%
AGC: 1.76%
UCC: 1.76%
AAU: 0.59%
UCA: 1.18%
AUA: 1.18%
UUA: 1.18%
UAA: 1.18%
CUU: 1.18%
UCU: 1.76%
AGG: 1.18%
UCG: 0.59%
CCU: 0.59%
ACA: 1.18%


## Visualize Codon Usage

In [21]:
import plotly.graph_objects as go

In [22]:
def plot_codon_usage(usage):
    codons = list(usage.keys())
    freqs = list(usage.values())

    fig = go.Figure(data=[go.Bar(x=codons, y=freqs)])
    fig.update_layout(
        title = "Codon Usage Frequency",
        xaxis_title = "Codon",
        yaxis_title = "Frequency",
        height = 400
    )

    fig.show()

In [23]:
plot_codon_usage(usage)

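Codon usage bias shows which codons a species "prefers". Note that the sequence plotted above is random, so its frequencies reflect sampling noise rather than a biological preference.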

Scientists use this information to:
- Optimize genes for expression in another organism (e.g. bacteria producing human insulin)
- Study evolutionary relationships
- Identify genes with abnormal codon patterns (a possible sign of horizontal gene transfer)
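The first use case, gene optimization, can be illustrated with a deliberately naive sketch: pick the most frequent synonymous codon in a reference coding sequence from the target organism, then re-encode a protein with those codons. Everything here is an assumption for illustration: the function names `preferred_codons` and `recode` are hypothetical, the reference sequence is a toy string rather than real genomic data, and real optimization tools weigh many more factors than raw codon frequency.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def preferred_codons(reference_rna: str) -> dict:
    """Map each amino acid to its most frequent codon in the reference."""
    counts = Counter(
        reference_rna[i:i + 3] for i in range(0, len(reference_rna) - 2, 3)
    )
    best = {}
    for codon, n in counts.items():
        aa = TABLE[codon]
        # On ties, the codon seen first in the reference is kept.
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best

def recode(protein: str, preferred: dict) -> str:
    """Re-encode a one-letter protein string using the preferred codons."""
    return "".join(preferred[aa] for aa in protein)

# Toy reference (not real data): Leu appears as CUG twice and UUA once,
# Lys as AAA twice and AAG once.
reference = "".join(["AUG", "CUG", "CUG", "UUA", "AAA", "AAA", "AAG", "UAA"])
preferred = preferred_codons(reference)
print(recode("MLK*", preferred))  # AUGCUGAAAUAA
```

An amino acid that never appears in the reference raises a `KeyError` in `recode`, which makes the gap explicit instead of silently guessing a codon.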

## Visualize Codon Usage by Amino Acids

Each amino acid is encoded by one to six codons.

For example:

|Amino Acid|Codons|
|---|---|
|Leucine (Leu)|UUA, UUG, CUU, CUC, CUA, CUG|
|Serine (Ser)|UCU, UCC, UCA, UCG, AGU, AGC|
|Methionine (Met)|AUG|

If an organism prefers one codon (say, CUG for leucine), this may reflect adaptation to the abundance of the matching tRNAs, which can improve expression efficiency.
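A preference like this is easier to see when each codon's count is expressed as a share of its own amino acid's total, rather than of all codons. The following sketch shows one way to do that normalization; `synonymous_share`, `toy_table`, and `toy_counts` are hypothetical names, and the counts are made-up toy values, not measurements.

```python
from collections import defaultdict

def synonymous_share(counts: dict, codon_table: dict) -> dict:
    """Return each codon's share among the codons for the same amino acid."""
    totals = defaultdict(float)
    for codon, n in counts.items():
        totals[codon_table[codon]] += n
    return {codon: n / totals[codon_table[codon]] for codon, n in counts.items()}

# Toy inputs for illustration only.
toy_table = {"CUG": "Leu", "CUA": "Leu", "UUA": "Leu", "AUG": "Met"}
toy_counts = {"CUG": 6, "CUA": 2, "UUA": 2, "AUG": 5}

shares = synonymous_share(toy_counts, toy_table)
for codon, share in shares.items():
    print(f"{toy_table[codon]} {codon}: {share:.0%}")
```

In this toy data CUG accounts for 60% of the leucine codons, while AUG is always 100% because methionine has no synonymous alternatives.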

In [24]:
from collections import defaultdict
import plotly.graph_objects as go

In [25]:
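def codon_usage_by_amino_acid(seq: str):
    """Return a mapping of amino acid -> {codon: frequency}."""
    usage = codon_usage(seq)
    aa_usage = defaultdict(dict)

    # Each frequency is relative to all codons in the sequence,
    # not normalized within its amino acid.
    for codon, freq in usage.items():
        # Codons missing from CODON_TABLE are grouped under "Unknown".
        aa = CODON_TABLE.get(codon, "Unknown")
        aa_usage[aa][codon] = freq

    return aa_usage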

In [26]:
def plot_codon_usage_by_amino_acid(seq: str):
    """Visualize codon usage grouped by amino acids"""
    aa_usage = codon_usage_by_amino_acid(seq)

    amino_acids = []
    codon_labels = []
    freqs = []

    # Flatten grouped data
    for aa, codons in aa_usage.items():
        for codon, freq in codons.items():
            amino_acids.append(aa)
            codon_labels.append(codon)
            freqs.append(freq)

    fig = go.Figure(data=[go.Bar(
        x = amino_acids,
        y = freqs,
        text = codon_labels,
        hovertemplate = "Amino Acid: %{x}<br>Codon: %{text}<br>Frequency: %{y}",
        marker=dict(color="rgba(0, 123, 255, 0.6)")
    )])

    fig.update_layout(
        title = "Codon Usage Grouped by Amino Acids",
        xaxis_title = "Amino Acid",
        yaxis_title = "Codon Frequency",
        height = 500
    )

    fig.show()

In [27]:
plot_codon_usage_by_amino_acid(seq.sequence)