
Background

Jeffrey Barrick edited this page Aug 9, 2021 · 19 revisions

Gene Expression in Bacteria

Gene expression in bacteria begins with an RNA polymerase transcribing portions of the genome from DNA into RNA. For messenger RNAs (mRNAs) that encode proteins, a ribosome then binds to a start codon and translates its nucleotide sequence into an amino acid sequence until it encounters a stop codon and terminates. Because the pool of ribosomes in a cell is limited, start codons on different mRNAs compete for ribosome binding. This makes the rate of translation initiation at a gene's start codon an important determinant of how much protein it will produce relative to other genes.

Thermodynamic Model of Translation Initiation

OSTIR uses a thermodynamic model developed by Salis et al. (2009) to predict the relative rate of translation initiation from different start codons in bacterial mRNAs. This model assumes that the rate of translation initiation at a given start codon is determined by the free energy change when the ribosome binds to the mRNA at that start codon. This energy is broken down into several components that are either calculated from mRNA folding and interaction energies or derived from experiments. OSTIR uses the open-source ViennaRNA package for the necessary RNA secondary structure energy calculations (Lorenz et al. 2011).

The five energy components are:

1. Ribosome binding to the mRNA

The $\Delta G_{\text{mRNA:rRNA}}$ term is the energy of base pairing between the anti-Shine-Dalgarno sequence in the ribosome and the ribosome binding site located upstream of the start codon in the mRNA. The anti-Shine-Dalgarno sequence consists of the nine RNA bases located at the 3′ end of the 16S rRNA. This energy is calculated as the minimum free energy found for these nine bases pairing to the mRNA sequence.

2. Start codon binding to the initiator tRNA

The $\Delta G_{\text{start}}$ term is the energy of base pairing between the three bases in the anticodon of the initiator transfer RNA (tRNA) and the start codon in the mRNA.

3. Spacing between the ribosome binding site and the start codon

There is an optimal spacing in the mRNA between the start codon and the ribosome binding site that supports efficient translation initiation. Deviation from this optimal spacing incurs an energy penalty, $\Delta G_{\text{spacing}}$. The penalty follows a quadratic model fit to experimental data when the spacing is longer than optimal and a sigmoidal model fit to experimental data when it is shorter than optimal.

4. Standby site structure

The $\Delta G_{\text{standby}}$ term is the energy of any mRNA secondary structure that hides the "standby site", which consists of the four bases upstream of the ribosome binding site (RBS). The standby site is the location where the ribosome initially begins to interact with the mRNA during ribosome binding. It must be single-stranded for this interaction to occur. This energy is calculated by finding the difference between the minimum free energy of the mRNA subsequence [start-35, start+35] when the standby site subsequence [RBS-5, RBS-1] is allowed to base pair versus when it is forced to be single-stranded. Note that this RBS position is taken as the most 5′ base pair formed to the anti-Shine-Dalgarno sequence. (Its binding may not involve all nine of its bases.)

5. mRNA structure

The $\Delta G_{\text{mRNA}}$ term is the energy of the mRNA secondary structure that is initially present around the start codon. This structure must be unfolded for the ribosome to bind and translation to begin. This energy is calculated by finding the minimum free energy of the mRNA subsequence [start-35, start+35].
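The spacing penalty in component 3 can be sketched as a piecewise function: quadratic when the spacing is longer than optimal, sigmoidal when it is shorter. This is a minimal sketch, not OSTIR's implementation. The function name `spacing_penalty`, the optimal spacing of 5 nucleotides, the exact functional forms, and all coefficient values are assumptions chosen for illustration; the fitted parameters come from the calibration described later on this page.

```python
import math


def spacing_penalty(spacing, optimal=5,
                    stretch=(0.048, 0.24, 0.0),
                    compress=(12.2, 2.5, 2.0, 3.0)):
    """Illustrative spacing penalty in kcal/mol.

    All default values are placeholders for illustration only,
    not OSTIR's fitted parameters.
    """
    ds = spacing - optimal
    if ds >= 0:
        # Longer than (or equal to) optimal: quadratic in the deviation.
        a, b, c = stretch
        return a * ds * ds + b * ds + c
    # Shorter than optimal: sigmoidal penalty that rises steeply
    # as the spacing is compressed further.
    c1, c2, c3, c4 = compress
    return c1 / (1.0 + math.exp(c2 * (ds + c3))) ** c4


if __name__ == "__main__":
    for s in range(0, 13):
        print(s, round(spacing_penalty(s), 3))
```

With these placeholder coefficients the penalty is zero at the optimal spacing and grows in both directions, but much more sharply on the compressed side.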

Predicting the translation initiation rate

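These five energies are combined to calculate the total free energy change of ribosome binding to a given start codon, $\Delta G_{\text{tot}}$. The standby and mRNA structure energies are subtracted because these structures must be disrupted for translation initiation.

$$\Delta G_{\text{tot}} = \Delta G_{\text{mRNA:rRNA}} + \Delta G_{\text{start}} + \Delta G_{\text{spacing}} - \Delta G_{\text{standby}} - \Delta G_{\text{mRNA}}$$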
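As a rough illustration of how the signs work out, the sketch below sums the five components into a total. The class name `EnergyTerms`, its field names, and the example values are hypothetical and are not taken from OSTIR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """The five energy components, in kcal/mol (hypothetical container)."""
    mrna_rrna: float  # ribosome binding to the mRNA
    start: float      # start codon binding to the initiator tRNA
    spacing: float    # penalty for non-optimal spacing
    standby: float    # structure hiding the standby site
    mrna: float       # structure around the start codon

    def total(self):
        # Standby and mRNA structure energies are subtracted because
        # those structures must be disrupted before initiation.
        return (self.mrna_rrna + self.start + self.spacing
                - self.standby - self.mrna)


# Made-up values for illustration only.
example = EnergyTerms(mrna_rrna=-8.0, start=-1.2, spacing=0.5,
                      standby=-1.0, mrna=-6.0)

if __name__ == "__main__":
    print(example.total())
```

Because folding energies are negative, subtracting them makes the total less favorable: a stably folded mRNA (a more negative `mrna` value) raises the total and so lowers the predicted rate.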


Finally, translation initiation rates are predicted with an equation derived from the standard relationship between the Gibbs free energy change of the overall ribosome binding reaction and the equilibrium constant for binding of ribosomes to this particular start codon:

$$r = K \exp(-\beta \, \Delta G_{\text{tot}})$$

Here $r$ is the predicted translation initiation rate and $\Delta G_{\text{tot}}$ is the total free energy change of ribosome binding.


The proportionality constant $K$ and Boltzmann factor $\beta$ in this equation are fit using measurements of fluorescent protein expression from plasmids constructed to have different ribosome binding site contexts, as described in Salis et al. (2009). $K$ is proportional to the relative mRNA levels of different transcripts. (If a promoter is twice as strong, $K$ will double.) $\beta$ is the reciprocal of the universal gas constant $R$ times the effective temperature of the system $T_{\text{eff}}$, that is, $\beta = 1/(R \, T_{\text{eff}})$.
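A minimal sketch of this relationship follows. The function names and the default values `K=2500.0` and `beta=0.45` are assumptions used only for illustration; the values OSTIR uses are determined by the calibration described above.

```python
import math


def initiation_rate(dg_total, K=2500.0, beta=0.45):
    """Relative translation initiation rate for a total binding energy.

    dg_total is in kcal/mol and beta in mol/kcal.
    K and beta defaults are illustrative placeholders, not fitted values.
    """
    return K * math.exp(-beta * dg_total)


def fold_change(dg_a, dg_b, beta=0.45):
    """Ratio of predicted rates for two start codons on equally abundant mRNAs.

    K cancels in the ratio, which is why predictions are relative.
    """
    return math.exp(-beta * (dg_a - dg_b))


if __name__ == "__main__":
    print(initiation_rate(-2.0))
    print(fold_change(-2.0, 0.0))
```

Because the rate depends exponentially on the total energy, small energy differences have large effects: with the placeholder `beta`, a start codon whose total energy is 2 kcal/mol more favorable is predicted to initiate roughly 2.5 times as often.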

The input data and a script to fit these constants and the additional parameters in the spacing penalty model are provided in the calibration directory.

A full description of the thermodynamic model and some assumptions of this approach can be found in Salis (2011).

Results for start codons of E. coli genes

Although the general thermodynamic model used by OSTIR should apply for any bacterial species, it is specifically parameterized based on experiments in Escherichia coli.

Because OSTIR predictions are relative, it can be helpful to have a frame of reference for what value indicates a start codon with a strong ribosome binding site that is likely to drive significant gene expression in a bacterial cell versus a weak start codon that does not have appreciable activity.

The graph below shows the results of using OSTIR on the entire E. coli MG1655 genome sequence (GenBank:NC_000913.3). For the 4,249 annotated protein-coding genes, there is a peak in the predicted relative initiation rate around 1,000, and most values are in the range 100-10,000. The distribution for other start codons peaks around a value of 10, and most predicted rates are roughly from 1 to 100. Note that many potential start codons that do not begin annotated genes are still predicted to have relatively high translation initiation rates. However, they will not lead to translation if they are not in a region of the genome that is transcribed into mRNA.

References

Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. (2011) ViennaRNA Package 2.0. Algorithms Mol. Biol. 6:26. https://doi.org/10.1186/1748-7188-6-26.

Salis HM, Mirsky EA, Voigt CA. (2009) Automated Design of Synthetic Ribosome Binding Sites to Control Protein Expression. Nat. Biotechnol. 27:946–950. https://doi.org/10.1038/nbt.1568.

Salis HM. (2011) The Ribosome Binding Site Calculator. Methods Enzymol. 498:19–42. https://doi.org/10.1016/B978-0-12-385120-8.00002-4.