# SynBiopython codon usage module

This notebook demonstrates the functionality of the codon usage module of [SynBiopython](https://synbiopython.org).

In [2]:
from collections import Counter

from synbiopython import codon
from synbiopython.codon import table, utils

## Organisms and codon usage tables

The codon module accepts both organism names and taxonomy ids, and can map between the two:

In [3]:
name = codon.get_name(4932)
tax_id = codon.get_tax_id("Saccharomyces cerevisiae")

print("Name:", name)
print("Taxonomy id:", tax_id)

Name: Saccharomyces cerevisiae
Taxonomy id: 4932


Codon usage tables can be retrieved through either name or taxonomy id:

In [4]:
name_table = table.get_table(name)
tax_id_table = table.get_table(tax_id)

These will be the same, irrespective of the method of retrieval:

In [5]:
assert name_table == tax_id_table

The table is a dictionary that maps each amino acid (by its single-letter code) to a nested dictionary of that amino acid's codons and their usage frequencies:

In [6]:
l_codons = name_table["L"]
print(l_codons)

{'CTC': 0.06, 'CTG': 0.11, 'CTT': 0.13, 'CTA': 0.14, 'TTA': 0.28, 'TTG': 0.29}
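
The nested structure can be explored with ordinary dictionary operations. The following is a minimal sketch that assumes a hand-copied table holding only the leucine entry printed above; the name `example_table` is illustrative and not part of the library. It picks the most frequent codon and sums the frequencies, which are rounded and so only add up to approximately 1.

```python
# Hand-copied from the output above (assumption: a real table has one
# such entry per amino acid; only leucine is reproduced here).
example_table = {
    "L": {"CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
          "CTA": 0.14, "TTA": 0.28, "TTG": 0.29},
}

leu = example_table["L"]

# Codon with the highest usage frequency for leucine in this table.
most_frequent = max(leu, key=leu.get)

# The stored values are rounded, so the total is close to, not exactly, 1.
total = sum(leu.values())

print(most_frequent, round(total, 2))
```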


## Utility methods

Utility methods are available for random sampling of codons, and for performing simple codon optimisations.

Repeated sampling of leucine-encoding codons yields sampled frequencies close to those expected from the supplied codon usage table:

In [7]:
sampled = [utils.sample(name_table, "L") for _ in range(10000)]

codons = Counter(sampled)

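# Loop variable is `cdn`, not `codon`, so the imported `codon` module is not shadowed.
# Columns: codon, sampled frequency, frequency in the usage table.
for cdn, count in codons.items():
    print(cdn, count / len(sampled), l_codons[cdn])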

TTA 0.2919 0.28
TTG 0.2748 0.29
CTT 0.1346 0.13
CTA 0.1361 0.14
CTC 0.0537 0.06
CTG 0.1089 0.11
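
Frequency-weighted sampling of this kind can be sketched with the standard library alone. The block below is an illustration under stated assumptions, not the implementation of `utils.sample`: the helper `sample_codon` and the hand-copied `leu` dictionary are hypothetical names introduced here. It relies on `random.choices`, which draws items with probability proportional to the supplied weights.

```python
import random
from collections import Counter

# Leucine codon frequencies, copied from the table output shown earlier.
leu = {"CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
       "CTA": 0.14, "TTA": 0.28, "TTG": 0.29}


def sample_codon(codon_freqs, rng=random):
    """Draw one codon with probability proportional to its usage frequency.

    Hypothetical helper for illustration; not part of SynBiopython.
    """
    codons = list(codon_freqs)
    weights = [codon_freqs[c] for c in codons]
    # random.choices normalises the weights, so they need not sum to 1.
    return rng.choices(codons, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
n_draws = 10000
draws = Counter(sample_codon(leu, rng) for _ in range(n_draws))

# Columns: codon, sampled frequency, frequency in the table.
for cdn, count in draws.items():
    print(cdn, count / n_draws, leu[cdn])
```

With enough draws, each sampled frequency should settle near the corresponding table value, which is the behaviour shown in the output above.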


A simple codon usage optimisation method builds on this probabilistic sampling. Because codons are sampled rather than fixed, the returned sequence may differ from run to run:

In [8]:
aa_seq = 'ACDEFGHIKLMNPQRSTVWY'
print(utils.optimise(name_table, aa_seq))

GCGTGTGACGAGTTCGGGCATATCAAACTAATGAACCCACAGAGGTCAACTGTTTGGTAT
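
One way to build such an optimiser on top of weighted sampling is to draw one codon per amino acid and concatenate the results. The sketch below is an assumption about the general approach, not a copy of `utils.optimise`: `optimise_sketch` and `toy_table` are hypothetical names, and the table covers only leucine (values copied from the output earlier) and methionine, whose single codon is ATG.

```python
import random

# Toy table for illustration only; a real table covers every amino acid.
toy_table = {
    "L": {"CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
          "CTA": 0.14, "TTA": 0.28, "TTG": 0.29},
    "M": {"ATG": 1.0},
}


def optimise_sketch(table, aa_seq, rng=random):
    """Encode aa_seq as DNA, sampling each codon by its usage frequency.

    Hypothetical helper for illustration; not part of SynBiopython.
    """
    picked = []
    for aa in aa_seq:
        freqs = table[aa]
        codons = list(freqs)
        weights = [freqs[c] for c in codons]
        picked.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(picked)


# A seeded generator makes the sampled sequence repeatable.
dna = optimise_sketch(toy_table, "MLLM", random.Random(1))
print(dna)
```

Sampling in proportion to usage, rather than always taking the most frequent codon, keeps the codon mix of the output close to that of the host organism while avoiding long runs of a single codon.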
