# **Variations on a Theme of Control**

## ***Learning Objectives:***


*   Become familiar with the negative autoregulatory transcription circuit
*   Understand the importance of learning different mathematical methods
*   Explore common experimental techniques used in systems biology

## ***Section 1.1: Variations:***

When I was a young piano player, my teacher gave me a new piece to learn that I found
fascinating. It was Mozart’s variations on a tune that I knew as “Twinkle, Twinkle, Little
Star.” The piece begins with a simple rendition of the tune, and then catapults into a fast
series of scales, followed by eleven other variations, including key changes, dramatic
shifts in tempo, and plenty else to keep my fingers busy.

What amazed me about this piece was how much I was able to learn by studying a simple
melody from almost every technical angle. Mozart took advantage of the tune’s ubiquity
to explore all kinds of techniques and styles. I was therefore able to focus on learning
these techniques without having to relearn a new melody. At the same time, the
variations offer a much deeper understanding of the original tune by presenting it simply
at first, but then with increasing levels of complexity and nuance. Each variation is a new
way of looking at the tune, and as I considered all of the variations together I realized that
I would never hear “Twinkle, Twinkle, Little Star” the same way again.

In the next several chapters I will present variations on a ubiquitous and fairly simple
theme in biology: the regulation of gene expression by proteins binding to DNA. We will
examine a relatively simple biological system, going over it again and again with
increasingly sophisticated mathematical approaches. My goals are inspired by Mozart:
first, I want you to grasp all these approaches without learning new biology, and second, I
want you to see this “simple” system in all of its beautiful complexity and nuance. You
will find that some of the methods we learn highlight certain aspects of the system, while
other methods yield very different insights.

## ***Section 1.2: Autoregulation:***

Our theme falls within the general topic of control, which plays a central role in
engineered systems. In fact, control systems deeply impact our existence without most
people ever being aware of it, which is why the following point by John Doyle, a
prominent control theorist at the California Institute of Technology, is useful:


> Without control systems there could be no manufacturing, no
vehicles, no computers, no regulated environment – in short, no technology.
(Doyle, J. C., Francis, B. A., Tannenbaum, A. R. Feedback Control Theory. Dover
Publications, 2009).

For example, if you look under the hood of your Honda Civic (just a guess), or examine the screen of a Toyota Prius, you’re going to find systems that monitor and control virtually every aspect of the car’s operation, from engine performance and battery charge to braking and airbag deployment.

Yet control doesn’t just permeate our daily lives; it is the foundation for life itself.  Without control systems in biology, our cells couldn’t maintain homeostasis, divide, or carry out any process that relies on **feedback**, control, or “memory” of earlier physical, chemical, biological, or environmental states.

Even in some of the least complicated organisms, biological control appears to be extensive. You are probably familiar with the so-called **central dogma** of molecular biology, scrawled by high school biology teachers on chalkboards all across the world:



> **DNA $\rightarrow$ RNA $\rightarrow$ Protein**

in which DNA, filled with information units called genes, can be transcribed to make molecules called messenger RNA (mRNA), which can further be translated to create proteins, molecules that perform many vital cellular functions (of course, there are many caveats to this simplified scheme, some of which we will touch on later).


> <u>Sidebar 1.1: On notation</u>

>For better or worse, each of the organisms that biologists like to focus on – the **model organisms** – is associated with its own, sometimes unique, nomenclature and notation systems. *Escherichia coli* (species names always appear in italics) has a long history of study, and thus its notation system is fairly well standardized. Gene names in *E. coli* (we can shorten the species name once it’s defined) are italicized and usually lower-case (*crp*, *arcA*), while protein names appear in roman font with a capital first letter. Sometimes these gene names are actually abbreviations of a description of what the protein does; for example, *arcA* stands for “aerobic respiration control, gene A.” Careful notation is part of the accurate and specific language of science, and helps us
avoid confusing one molecule for another – or writing unnecessarily long sentences!

![Figure 1.1](https://drive.google.com/uc?export=view&id=1bR3kqUFlZaokCRr50DDPX98yjBi67Q1q)

>**Figure 1.1. The transcription regulatory network of the bacterium *Escherichia coli*
exhibits extensive autoregulation.** Protein factors (ovals) bind DNA and
modulate the transcription of target genes. Links between transcription factors and
target genes are indicated by connecting lines; solid lines denote activation, dotted
lines represent repression, and dashed lines indicate both activation and repression.
Note the extensive presence of autoregulation (red), in which a protein activates or
represses transcription of its own gene. Modified from Herrgård, M. J., Covert, M.
W., Palsson, B. Ø. Reconciling gene expression data with known genome-scale
regulatory network structures. Genome Research. 2003. 13(11):2423-34, with
permission from Cold Spring Harbor Laboratory Press.

We will start by considering how cells regulate the transcription of DNA into RNA.  A cell’s ability to create active, operational proteins from DNA-encoded information at the right time and under the appropriate conditions is absolutely critical to cellular survival.  As a result, this process is carefully and extensively controlled by what is called a **transcriptional regulatory network.**  Figure 1.1 is a graphic depiction of the known transcriptional regulatory network in *Escherichia coli*, a gut bacterium that is probably the best studied of all organisms.  To give you an example of how extensive the control is, only about half of the roughly 4,400 genes in *E. coli* are expressed under typical laboratory growth conditions.  The network in Figure 1.1 contains most of what we know about *E. coli* transcriptional regulation; the picture includes 116 proteins that regulate transcription (often called **transcription factors**) and 577 **target genes.**

One-half of the transcription factors in Figure 1.1 (58 proteins) appear in red.  These transcription factors regulate their own expression; they are transcription factors, but also target genes!  This phenomenon is called **autoregulation.**  Hopefully you get a sense of how important autoregulation is to the cell simply by seeing how pervasive it is in Figure 1.1, but scientists have also quantified this extensiveness by comparing the *E. coli* network to randomly generated networks of the same size.  Specifically, they calculated how many cases of autoregulation you would get by chance if the links between transcription factors and target genes were shuffled randomly throughout the network; they found that autoregulation occurs 30-60 times less often in a random network of similar size than in the *E. coli* network (*An Introduction to Systems Biology: Design Principles of Biological Circuits*, which is listed in the Recommended Reading for this chapter, considers this calculation in more detail).  Interestingly, autoregulation is even more common for transcription factors that regulate the expression of large numbers of genes.  Approximately 70% of these transcription factors are autoregulated (Figure 1.1), suggesting that autoregulation plays a fundamental role in these cases.
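To make the shuffling calculation concrete, here is a minimal Python sketch.  It is not the analysis from the cited work: the toy network, its sizes (`n_tfs`, `n_genes`, `n_links`, `n_auto`), and all of the helper names are hypothetical, and the shuffle used here (permuting the targets among the existing links) is just one simple way to randomize a network.

```python
import random


def count_self_loops(edges):
    """Count links in which a regulator targets its own gene (autoregulation)."""
    return sum(1 for regulator, target in edges if regulator == target)


def shuffle_targets(edges, rng):
    """Randomize a network by permuting the targets among the existing links.

    Every regulator keeps its number of outgoing links and every gene keeps
    its number of incoming links; only who regulates whom changes.
    """
    regulators = [regulator for regulator, _ in edges]
    targets = [target for _, target in edges]
    rng.shuffle(targets)
    return list(zip(regulators, targets))


def mean_random_self_loops(edges, n_trials=1000, seed=0):
    """Average number of self-loops over many shuffled copies of the network."""
    rng = random.Random(seed)
    total = sum(count_self_loops(shuffle_targets(edges, rng))
                for _ in range(n_trials))
    return total / n_trials


def make_toy_network(n_tfs=20, n_genes=100, n_links=150, n_auto=10, seed=1):
    """Build a made-up network (NOT E. coli data) with n_auto autoregulated TFs."""
    rng = random.Random(seed)
    genes = [f"g{i}" for i in range(n_genes)]
    tfs = genes[:n_tfs]  # assumption: the first n_tfs genes encode transcription factors
    edges = [(tf, tf) for tf in tfs[:n_auto]]  # the autoregulatory links
    while len(edges) < n_links:
        regulator = rng.choice(tfs)
        target = rng.choice(genes)
        if target != regulator:  # add no further self-loops
            edges.append((regulator, target))
    return edges


toy_edges = make_toy_network()
observed = count_self_loops(toy_edges)
expected_by_chance = mean_random_self_loops(toy_edges)
print(f"self-loops in the toy network: {observed}")
print(f"mean self-loops after shuffling: {expected_by_chance:.2f}")
```

Comparing the two printed numbers is the whole idea: if the real network has many more self-loops than its shuffled copies, autoregulation is unlikely to be there by chance.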

Let’s focus on one example of autoregulation, the transcription factor FNR (fumarate and nitrate reduction, near the center of Figure 1.1).  This transcription factor’s activity depends on the presence of molecular oxygen, and it regulates *E. coli*’s transitions between aerobic and anaerobic environments, for example between the gut and fecal environments.  Figure 1.2 depicts how the FNR transcription factor regulates its own expression as a **dimer**.  The figure depicts a particular location on the *E. coli* chromosome (a “locus”).  The hinged arrow represents the **promoter** (where transcription starts) for the gene that’s being expressed (*fnr*), and the helix to the right of the promoter depicts the part of the gene that actually contains information for making the FNR protein.  In the upstream DNA (to the left of the arrow), the dark regions are places where the FNR protein can bind to the promoter of the *fnr* gene.

Binding of FNR to the promoter physically blocks access to its own gene, preventing *fnr* from being transcribed.  Whether or not FNR can bind to DNA depends on the presence or absence of molecular oxygen.  When there is no oxygen in the environment, FNR is active and it binds DNA.  When oxygen is present, the oxygen binds the FNR protein, which then undergoes a structural change that prevents FNR from binding DNA.  Once oxygen is gone and FNR becomes active again, it represses its own transcription.  Interestingly, this means that when this transcription factor becomes active, expression of its gene can go down!


![Figure 1.2](https://drive.google.com/uc?export=view&id=1ww9wMvepm4S0HOzdzVZ019J6F_3OJOFX)


>**Figure 1.2. The *fnr* promoter contains binding sites for its own protein
product.** In this autoregulatory circuit, pairs of FNR protein molecules, known as dimers, form in the absence of oxygen and are able to bind upstream of the *fnr* promoter (above), preventing RNA polymerase from transcribing mRNA from *fnr*. When oxygen is present (below), FNR dimers cannot form, and FNR can no longer repress its own transcription. The *fnr* promoter also contains binding sites for other binding proteins, which are not depicted here.

Active *E. coli* transcription factors can exert a positive, negative, or dual (positive in some cases, negative in others) influence on gene expression (Figure 1.1), but in regulating expression of their own genes (autoregulation), they strongly favor negative regulation. Roughly 70% of the autoregulatory transcription factors in *E. coli* exert a negative influence on their own gene expression.

## ***Section 1.3: Our theme: a typical negative autoregulatory circuit***

Our goal for the next several chapters is to consider a typical negative autoregulatory transcription unit somewhat analogous to FNR.  A schematic diagram of how this type of circuit works is shown in Figure 1.3.  A protein **complex**, RNA polymerase, binds the promoter region just before (“upstream of”) the gene.  It then travels along the gene, transcribing it to mRNA in the process.  Next, the free mRNA is bound by a **ribosome**, which travels along the mRNA and translates it into a protein, our transcription factor.  We’ll always use the word “transcribe” to describe the transition of information from DNA to mRNA, and the word “translate” for the transition from mRNA to protein.

![Figure 1.3](https://drive.google.com/uc?export=view&id=12pCpZkFsHFVbTOUNtrBiTwn2nRBDbddG)

>**Figure 1.3. Our typical negative autoregulatory circuit.** Our favorite gene is transcribed into mRNA by RNA polymerase, then translated into protein by the
ribosome. The protein is activated by an external signal; the active protein product binds the DNA directly upstream of the gene and blocks access by RNA
polymerase. The gene is thus shut off – no more mRNA or protein can be made –
and over time the amounts of protein and mRNA will be reduced.

Figure 1.3 also illustrates how expression of the gene is controlled.  The protein is activated by an external **stimulus** and becomes able to bind a specific site in the promoter, called an **operator** or cis-regulatory element (the darker region of the helix at the left side of the DNA molecule in Figure 1.3).  When the transcription factor is bound to the operator, RNA polymerase is no longer able to bind the promoter region for this gene and as a result, no more mRNA or protein is expressed.  After some time, the cellular amounts of both molecules will diminish.
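If it helps to see the circuit “run,” the following Python sketch steps through the behavior just described.  Treat it as an illustration under loud assumptions rather than as the model we will build in later chapters: the rate constants, the time step, the time at which the stimulus appears, and the all-or-nothing rule for repression are all made up for this example.

```python
def simulate_circuit(stimulus_on_at=50, n_steps=200, dt=0.1,
                     k_transcription=1.0, k_translation=1.0,
                     k_mrna_decay=0.5, k_protein_decay=0.1):
    """Step a toy negative autoregulatory circuit forward in time.

    All parameter values are hypothetical and in arbitrary units.
    Returns a list of (time, mrna, protein, repressed) tuples.
    """
    mrna, protein = 0.0, 0.0
    history = []
    for step in range(n_steps):
        # The external stimulus is absent at first, then present from
        # step `stimulus_on_at` onward.
        stimulus = step >= stimulus_on_at
        # Assumption: repression is all-or-nothing. If the stimulus is present
        # and any protein exists, the operator is occupied and RNA polymerase
        # is blocked.
        repressed = stimulus and protein > 0.0
        transcription = 0.0 if repressed else k_transcription
        # mRNA is made by transcription and lost by decay; protein is made by
        # translation of the mRNA and lost by decay.
        d_mrna = transcription - k_mrna_decay * mrna
        d_protein = k_translation * mrna - k_protein_decay * protein
        mrna += dt * d_mrna
        protein += dt * d_protein
        history.append((step * dt, mrna, protein, repressed))
    return history


history = simulate_circuit()
for time, mrna, protein, repressed in history[::25]:
    print(f"t={time:5.1f}  mRNA={mrna:6.3f}  protein={protein:6.3f}  "
          f"repressed={repressed}")
```

Before the stimulus arrives, mRNA and protein accumulate; once the gene is shut off, no new mRNA is made and the amounts of both molecules fall over time, as in Figure 1.3.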

Of course, people have been studying circuits like this for decades, and so many experimental and computational approaches have been developed for detailed analyses.  Sidebar 1.2 will give you a general flavor of the diversity of approaches currently available.  There are also many ways to study these systems computationally.  Using **bioinformatics**-based approaches, we could compare sequences of the gene itself or parts of the promoter region to find similar transcription factors in other organisms or genes that are likely to have similar regulation in the same organism, respectively.  Several interesting data-mining strategies and databases have also been developed to help you learn what is already known about any given gene; EcoCyc, which stores information such as that used to construct Figure 1.2, is just such a database.


> <u>Sidebar 1.2: Experimental measures we could use to interrogate our circuit</u>

> One of the great revolutions in modern biology came about through the recent development of techniques to make hundreds, thousands, and even millions of measurements simultaneously.  We refer to these methods as high-throughput or “global” approaches, in contrast to the classical “local” approaches first employed by biologists.  It’s important to remember that even though these modern approaches appear more powerful, they are also more expensive and need advanced analysis techniques incorporating computer programming and statistics – so if you only want to measure the abundances of a handful of molecules, it may be most efficient to consider classical approaches.

> In systems biology we commonly measure the abundances of RNAs and proteins. First let’s consider how we’d measure RNA abundance.  If you only wanted to measure the abundances of a few different RNAs, you’d probably use a technique such as **northern blotting** or fluorescence *in situ* **hybridization**, both of which “count” RNA molecules by binding them to other molecules carrying a detectable label.  Measuring a fairly large set of RNAs could be achieved with **quantitative polymerase chain reaction (qPCR)**, a highly sensitive but gene-specific technique.  Finally, if you wanted to count many thousands of RNAs – or all of the RNAs encoded by an organism – you could take advantage of gene **microarray analysis** or sequencing the RNA molecules themselves.

> Protein abundance is also a critical component of cellular behavior.  To measure the abundance of a small number of proteins, you could use **western blotting**, a technique involving **antibody** detection that is conceptually similar to northern blotting.  **Mass spectrometry** would provide measurements of a larger set of proteins, and is especially useful for cases in which you don’t have an antibody against your proteins.  Two-dimensional protein **gel electrophoresis** can provide simultaneous relative measurements of a large variety of proteins, and may even capture an organism’s entire protein repertoire.

> We also have tools to investigate the special case of **DNA-protein interactions**, which form an input to the circuits we consider in this book.  The most popular way to detect these interactions is to first cross-link the proteins to the DNA with formaldehyde, then gather either a particular set of DNA-protein interactions with an antibody (**chromatin immunoprecipitation, or ChIP**) or the entire repertoire of DNA-protein interactions from an organism.  How you analyze the recovered DNA depends on your needs.  If you expect to find, or are interested in, only a small number of binding sites, you would use qPCR of the DNA bound to the protein.  If you wanted to query many binding sites or the entire genome, you could hybridize the pool of DNA that was bound to your protein to a microarray (ChIP-chip) or sequence it (ChIP-seq).

Learn to love this autoregulatory circuit now!  You’re going to be seeing a lot of it, because how we model the integrated function of this circuit will be broadly applicable to systems biology as a whole.  We will use five different approaches on the circuit.  First, we’ll use **Boolean logic.**  Next, we will consider sets of **ordinary differential equations**, using **analytical solving techniques, graphical solving techniques,** and **numerical solving techniques** to solve these equations.  Finally, we will learn how to build **stochastic simulations.**  These methods will form the foundation of your systems biology toolkit.

On to Boolean!


## **Chapter Summary**
The goal of the first six chapters of this book is to explore several different modeling techniques, and to investigate how they can be applied to molecular systems in cells.  We will approach this goal by taking a simple and very important biological circuit, the negative autoregulatory circuit, and applying our techniques to analyze it.  This circuit is ubiquitous in *E. coli*, and although it is relatively simple in biological terms, you will see that it can behave in ways that are quite complex.

## **Recommended Reading**

>Alon, U.  An Introduction to Systems Biology: Design Principles of Biological Circuits.  Chapman & Hall/CRC, 2007.

>Alon, U.  E. coli transcriptional regulations. http://www.weizmann.ac.il/mcb/UriAlon/Network_motifs_in_coli/ColiNet-1.1/regInterFullFiltered.html (accessed April 22, 2013).

>Doyle, J. C., Francis, B. A., Tannenbaum, A. R.  Feedback Control Theory.  Dover Publications, 2009.

>Neidhardt, F. C., Ingraham, J. L., Schaechter, M.  Physiology of the Bacterial Cell: A Molecular Approach.  Sinauer Associates Inc., 1990.

>EcoCyc: Encyclopedia of Escherichia coli K-12 Genes and Metabolism.  www.ecocyc.org.

>Herrgård, M. J., Covert, M. W., Palsson, B. Ø.  Reconciling gene expression data with known genome-scale regulatory network structures.  Genome Research.  2003. 13(11):2423-34. https://genome.cshlp.org/content/13/11/2423.full
