Use generative AI to create highly expressed synthetic genes
For now, install Espresso by cloning the repository and installing it with your favorite Python package manager. For example, with a regular virtual environment, first create and activate the environment (here I am using the Fish shell), then clone the repository and install it in editable mode:
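```shell
# Create and activate a virtual environment (Fish shell)
python -m venv .venv
source .venv/bin/activate.fish

# Clone the repository and install Espresso in editable mode
git clone https://github.com/dacarlin/espresso.git
cd espresso
python -m pip install -e .
```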
First, make sure you can import Espresso!
```python
import espresso
```
Next, let's try generating a coding sequence for E. coli from a protein sequence that we provide:
```python
my_protein = "MENFHHRPFKGGFGVGRVPTSLYYSLSDFSLSAISIFPTHYDQPYLNEAPSWYKYSLESGLVCLYLYLIYRWITRSF"
my_gene = espresso.design_coding_sequence(my_protein, model="ec")
```
The resulting sequence will be optimized for expression in E. coli. Read on to learn more about Espresso and how to perform more complicated tasks.
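One quick sanity check on any designed gene is to translate it back and compare the result with the input protein. The sketch below does this in plain Python using the standard genetic code. It assumes the designed gene is available as a plain DNA string, which the text above does not confirm, and `translate` is a hypothetical helper, not part of Espresso.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, dropping one trailing stop codon if present."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    return protein[:-1] if protein.endswith("*") else protein


# Hand-written stand-in for a designed gene (not real Espresso output).
example_gene = "ATGGAAAACTTTCATCATTAA"
example_protein = "MENFHH"
assert translate(example_gene) == example_protein
```

With Espresso installed, the same check could be applied to `my_gene` and `my_protein`, again assuming the result is (or converts to) a DNA string.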
For the independent model, learn the codon frequencies from a set of genes using the provided `learn.py` script. For example:
```shell
python learn.py espresso/data/cds/Saccharomyces_cerevisiae.R64-1-1.cds.all.fa.gz
```
This will output a JSON file that can be used with the `IndependentEncoder` class.
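To make the idea behind the independent model concrete, here is a minimal sketch of learning per-amino-acid codon frequencies and then encoding a protein by sampling each codon independently. This is not the code in `learn.py`, and the JSON layout it prints is an assumption rather than the format `IndependentEncoder` expects. The function names and the toy sequences are hypothetical.

```python
import json
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def learn_codon_frequencies(coding_sequences):
    """Return {amino_acid: {codon: frequency}} estimated from in-frame CDSs."""
    counts = defaultdict(Counter)
    for cds in coding_sequences:
        cds = cds.upper()
        # Ignore any trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
            if aa is not None:
                counts[aa][codon] += 1
    return {
        aa: {codon: n / sum(c.values()) for codon, n in c.items()}
        for aa, c in counts.items()
    }


def encode_independently(protein, frequencies, rng=random):
    """Pick a codon for each residue independently, weighted by its frequency."""
    codons = []
    for aa in protein:
        choices = frequencies[aa]
        codon = rng.choices(list(choices), weights=list(choices.values()))[0]
        codons.append(codon)
    return "".join(codons)


# Toy input (hypothetical sequences, far too small for real use).
toy_genes = ["ATGAAAAAAAAGTAA", "ATGAAAGAATAA"]
frequencies = learn_codon_frequencies(toy_genes)
print(json.dumps(frequencies, indent=2, sort_keys=True))
print(encode_independently("MKE", frequencies, random.Random(0)))
```

Because each codon is drawn without regard to its neighbors, this kind of model captures codon usage bias but nothing about sequence context.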
For the transformer models, whole-sequence design is learned from native sequences stored in a FASTA file.
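Since the training data are native sequences in FASTA format, a natural first step is reading them into memory. The sketch below uses only the standard library and handles both plain and gzip-compressed files. `read_fasta`, `open_fasta`, and `looks_like_cds` are hypothetical helpers, and the filtering rule is an assumption, not Espresso's actual preprocessing.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def looks_like_cds(sequence):
    """Assumed filter: whole codons, ATG start, and only unambiguous bases."""
    return (
        len(sequence) % 3 == 0
        and sequence.startswith("ATG")
        and set(sequence) <= set("ACGT")
    )


# In-memory demo so the example runs without any data files.
demo = io.StringIO(">gene1 demo\nATGAAA\nTAA\n>gene2 partial\nATGAA\n")
records = [(h, s) for h, s in read_fasta(demo) if looks_like_cds(s)]
print(records)
```

For a file on disk, the same loop can be run inside `with open_fasta(path) as handle:`.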