# SynBiopython codon usage module

This notebook demonstrates the functionality of the codon usage module of [SynBiopython](https://synbiopython.org).

In [1]:
from collections import Counter

from synbiopython.codon import table, taxonomy_utils, utils

The taxonomy_utils module supports both organism names and taxonomy ids, and can map from one identifier to the other:

In [1]:
name = taxonomy_utils.get_organism_name(4932)
tax_id = taxonomy_utils.get_tax_id("Saccharomyces cerevisiae")

print("Name:", name)
print("Taxonomy id:", tax_id)

Name: Saccharomyces cerevisiae
Taxonomy id: 4932


Codon usage tables can be retrieved through either name or taxonomy id:

In [1]:
name_table = table.get_table(name)
tax_id_table = table.get_table(tax_id)

These will be the same, irrespective of the method of retrieval:

In [1]:
assert name_table == tax_id_table

The table is a nested dictionary: each amino acid (one-letter code) maps to an inner dictionary, which in turn maps each codon for that amino acid to its usage frequency:

In [1]:
l_codons = name_table["L"]
print(l_codons)

{'CTC': 0.06, 'CTG': 0.11, 'CTT': 0.13, 'CTA': 0.14, 'TTA': 0.28, 'TTG': 0.29}
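Since the inner dictionary is plain Python, ordinary dictionary operations are enough to inspect it. The sketch below copies the leucine values from the output above into a literal instead of calling the library, then checks that the frequencies sum to roughly 1 and finds the most used codon. The variable names are illustrative and not part of SynBiopython.

```python
# Leucine entry copied from the printed output above (not fetched from the library).
leucine_codons = {
    "CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
    "CTA": 0.14, "TTA": 0.28, "TTG": 0.29,
}

# Frequencies for one amino acid should sum to about 1;
# rounding in the table can leave a small residual.
total = sum(leucine_codons.values())

# The preferred codon is the key with the largest frequency.
preferred = max(leucine_codons, key=leucine_codons.get)

print(round(total, 2), preferred)
```

The small deviation from 1 comes from the two-decimal rounding of the tabulated values.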


## Utility methods

Utility methods are available for random sampling of codons, and for performing simple codon optimisations.

Repeated sampling of leucine-encoding codons yields sampled frequencies close to those expected from the supplied codon usage table:

In [1]:
sampled = [utils.sample(name_table, "L") for _ in range(10000)]

codons = Counter(sampled)

for cdn, count in codons.items():
    print(cdn, count / len(sampled), l_codons[cdn])

CTT 0.1274 0.13
CTC 0.0624 0.06
TTA 0.2834 0.28
TTG 0.2823 0.29
CTA 0.1441 0.14
CTG 0.1004 0.11
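To make the idea of frequency-weighted sampling concrete, here is a minimal standard-library sketch. It is not necessarily how `utils.sample` is implemented; the function `sample_codon` and the hard-coded leucine frequencies (copied from the table output earlier in this notebook) are assumptions made for illustration.

```python
import random
from collections import Counter

# Leucine frequencies copied from the table output shown earlier.
leucine_codons = {
    "CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
    "CTA": 0.14, "TTA": 0.28, "TTG": 0.29,
}


def sample_codon(codon_freqs, rng=random):
    """Pick one codon with probability proportional to its usage frequency.

    Hypothetical helper; random.choices normalises the weights,
    so they do not need to sum to exactly 1.
    """
    codons = list(codon_freqs)
    weights = [codon_freqs[c] for c in codons]
    return rng.choices(codons, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 10000
counts = Counter(sample_codon(leucine_codons, rng) for _ in range(n_samples))

# Print sampled frequency next to the tabulated frequency.
for codon, count in sorted(counts.items()):
    print(codon, count / n_samples, leucine_codons[codon])
```

With enough samples, each observed frequency should approach the tabulated one, which is the behaviour shown in the output above.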


Based on this probabilistic sampling, a simple codon usage optimisation method is provided:

In [1]:
aa_seq = 'ACDEFGHIKLMNPQRSTVWY'
print(utils.optimise(name_table, aa_seq))

GCGTGTGATGAATTCGGTCATATCAAATTAATGAACCCGCAACGTTCTACCGTATGGTAT
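One simple way to build an optimiser on top of probabilistic sampling is to draw one codon per residue and concatenate the results. The sketch below illustrates that approach under stated assumptions: `optimise_by_sampling` is a hypothetical function, not the library's `utils.optimise`, and `toy_table` is a hand-built three-residue table (leucine values copied from the output earlier in this notebook; methionine and tryptophan each have a single codon in the standard genetic code).

```python
import random

# Toy table for illustration only; a real table covers all amino acids.
toy_table = {
    "M": {"ATG": 1.0},  # single codon in the standard genetic code
    "W": {"TGG": 1.0},  # single codon in the standard genetic code
    "L": {
        "CTC": 0.06, "CTG": 0.11, "CTT": 0.13,
        "CTA": 0.14, "TTA": 0.28, "TTG": 0.29,
    },
}


def optimise_by_sampling(codon_table, aa_seq, rng=random):
    """Back-translate a protein by sampling one codon per residue.

    Hypothetical helper: each codon is drawn with probability
    proportional to its usage frequency in codon_table.
    """
    dna = []
    for aa in aa_seq:
        codon_freqs = codon_table[aa]  # raises KeyError for unknown residues
        codons = list(codon_freqs)
        weights = list(codon_freqs.values())
        dna.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(dna)


dna_seq = optimise_by_sampling(toy_table, "MLW", random.Random(0))
print(dna_seq)
```

In this sketch the result depends on the random number generator's state, so passing a seeded `random.Random` instance is one way to make a run repeatable.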
