# Codon models

Sometimes we cannot distinguish the function or fitness of different codons encoding the same amino acid, but we still want to account for connectivity at the nucleotide level when visualizing the landscape, as in a codon model of evolution.

The table used in this example contains the fitness associated with each of the 20 amino acids.

In [3]:
# Import required libraries
from os.path import join

import pandas as pd
import numpy as np

from gpmap.src.space import SequenceSpace
from gpmap.src.settings import TEST_DATA_DIR

## 1. Serine landscape

We can start by using the protein data to define a protein space, in this case for sequences of length 1.

In [4]:
fpath = join(TEST_DATA_DIR, 'serine.protein.csv')
protein_data = pd.read_csv(fpath, index_col=0)
protein_data

protein,function
A,1
C,1
D,1
E,1
F,1
G,1
H,1
I,1
K,1
L,1


In [5]:
protein_space = SequenceSpace(X=protein_data.index.values, y=protein_data['function'].values)
print(protein_space)

Sequence Space:
	Type: protein
	Sequence length: 1
	Number of alleles per site: [20]
	Genotypes: [A,C,D,...,V,W,Y]
	Function y: [1,1,1,...,1,1,1]


In [6]:
nc_space = protein_space.to_nucleotide_space(codon_table='Standard', stop_y=0)
print(nc_space)

Sequence Space:
	Type: dna
	Sequence length: 3
	Number of alleles per site: [4, 4, 4]
	Genotypes: [AAA,AAC,AAG,...,TTC,TTG,TTT]
	Function y: [1.00,1.00,1.00,...,1.00,1.00,1.00]
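
Conceptually, this conversion assigns every codon the value of the amino acid it encodes and gives stop codons the value passed as `stop_y`. The following is a minimal, dependency-free sketch of that idea, not the library's implementation: the helper `expand_to_codons` and the example fitness values are hypothetical, and the standard genetic code is written out in the compact NCBI layout (bases ordered T, C, A, G).

```python
from itertools import product

# Standard genetic code in the compact NCBI layout: codons are enumerated
# with bases ordered T, C, A, G at each position; '*' marks a stop codon.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def expand_to_codons(protein_y, stop_y=0.0):
    """Hypothetical helper: map amino acid values onto the 64 codons."""
    return {
        codon: stop_y if aa == "*" else protein_y[aa]
        for codon, aa in sorted(CODON_TO_AA.items())
    }


# Illustrative values only (not the contents of serine.protein.csv):
# every amino acid gets 1.0 except serine, which gets 2.0.
protein_y = {aa: 1.0 for aa in set(STANDARD_AAS) - {"*"}}
protein_y["S"] = 2.0

codon_y = expand_to_codons(protein_y, stop_y=0.0)
print(len(codon_y))                     # number of codons
print(codon_y["TCA"], codon_y["AGC"])   # two serine codons
print(codon_y["TAA"])                   # a stop codon
```

All synonymous codons end up with identical values, so any structure in the resulting landscape comes only from how the genetic code arranges them in nucleotide space.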


### Using CodonSpace class

We also provide a more generic `CodonSpace` class that performs this operation for us, so we only need to provide the amino acid(s) that are under selection. This also makes it possible to visualize the structure of landscapes corresponding to amino acids with certain properties.

In [8]:
from gpmap.src.space import CodonSpace

In [9]:
space = CodonSpace(allowed_aminoacids=['S'], codon_table='Standard', add_variation=True, seed=0)
print(space)

Sequence Space:
	Type: dna
	Sequence length: 3
	Number of alleles per site: [4, 4, 4]
	Genotypes: [AAA,AAC,AAG,...,TTC,TTG,TTT]
	Function y: [1.18,1.04,1.10,...,0.96,0.92,0.83]
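
Serine is a useful example because, under the standard genetic code, its six codons fall into two groups (TCN and AGC/AGT) that cannot be reached from each other by a single nucleotide substitution. The sketch below checks this with plain Python. It does not use `gpmap`, and the `hamming` helper and variable names are illustrative only.

```python
from itertools import product

# Standard genetic code in the compact NCBI layout (bases ordered T, C, A, G)
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

serine_codons = sorted(c for c, aa in CODON_TO_AA.items() if aa == "S")


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Group codons into clusters connected by single-nucleotide substitutions
clusters = []
for codon in serine_codons:
    linked = [c for c in clusters if any(hamming(codon, other) == 1 for other in c)]
    merged = {codon}.union(*linked)
    clusters = [c for c in clusters if c not in linked] + [merged]

clusters = sorted(sorted(c) for c in clusters)

# Smallest number of substitutions separating the two clusters
min_gap = min(hamming(a, b) for a in clusters[0] for b in clusters[1])

print(clusters)  # [['AGC', 'AGT'], ['TCA', 'TCC', 'TCG', 'TCT']]
print(min_gap)   # 2
```

Moving between the two groups therefore requires passing through at least one codon that does not encode serine.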


Note that we could also test how these landscapes change under genetic codes other than the standard one. We use the Biopython module to translate nucleotide sequences into protein sequences, following the [NCBI reference](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) for the different codon tables or genetic codes.
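
To get a feel for how much a different genetic code can change codon assignments, the sketch below compares the standard code with the vertebrate mitochondrial code (NCBI translation table 2). It is a dependency-free illustration that writes both tables out in the compact NCBI layout rather than calling Biopython. The `codon_table` helper here is hypothetical, and whether the `codon_table` argument of `CodonSpace` accepts the name "Vertebrate Mitochondrial" is an assumption that should be checked against the library.

```python
from itertools import product

# Genetic codes in the compact NCBI layout (bases ordered T, C, A, G);
# '*' marks a stop codon.
BASES = "TCAG"
GENETIC_CODES = {
    "Standard": "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "Vertebrate Mitochondrial": "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}
CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]


def codon_table(name):
    """Hypothetical helper: return a codon -> amino acid dict for a named code."""
    return dict(zip(CODONS, GENETIC_CODES[name]))


standard = codon_table("Standard")
mito = codon_table("Vertebrate Mitochondrial")

# Codons whose meaning differs between the two codes: (standard, mitochondrial)
differences = {
    codon: (standard[codon], mito[codon])
    for codon in sorted(CODONS)
    if standard[codon] != mito[codon]
}
print(differences)
# {'AGA': ('R', '*'), 'AGG': ('R', '*'), 'ATA': ('I', 'M'), 'TGA': ('*', 'W')}
```

In this pair of codes the serine codons themselves are unchanged, but AGA and AGG, which neighbor the AGC/AGT serine codons, become stop codons in the mitochondrial code.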
