# **Chapter 7: Transcriptional Regulation**

## ***Learning Objectives:***

* Identify the major motifs in gene transcriptional regulatory networks
* Calculate the dynamic properties associated with these motifs using multiple
methods
* Compare model outputs with corresponding experimental data
* Understand how simpler motifs combine to form larger, more complex motifs

You’ve seen most of the commonly used approaches to modeling biological networks,
and you’ve applied these approaches to some relatively simple circuits. Now it’s time to
extend these techniques to larger, more complicated biological systems. Section II will
lead you through the strategies that researchers have used to tackle three kinds of
biological networks at a larger scale: transcriptional regulation, or the modification of
transcription factor activity to affect gene expression; **signal transduction**, or how cells
sense their surrounding environments and initiate appropriate responses; and
carbon-energy **metabolism**, which breaks down nutrients from the environment to produce all of
the building blocks to make a new cell.

To address these topics, I’ll have to cover much more biology. We’ll also learn another
analysis method or two along the way, but by and large we will be applying methods that
you have already learned to more difficult problems. You will soon find that you are
already well equipped to understand and critique existing models as well as to create your
own!

## **Section 7.1. Transcriptional regulation and complexity**

As an example, let’s consider gene expression again. We’ve already looked at regulation
extensively in Section I, but this chapter is going to add a new layer of complexity,
including the interactions of multiple transcription factors to produce more complex
expression dynamics.

Let’s start with one of the most exciting events in the history of biology – the publication
of the human genome in 2001. David Baltimore, a preeminent biologist and Nobel
laureate, commented on the event with this quote: “I’ve seen a lot of exciting biology
emerge over the past 40 years. But chills still ran down my spine when I first read the
paper that describes the outline of our genome ...” (see Baltimore in Recommended
Reading; reprinted by permission from Macmillan Publishers Ltd: *Nature*, 2001).

At the time, I remember that a number of interesting aspects of the sequence had us
talking. As Baltimore wrote, “What interested me most about the genome? The number
of genes is high on the list ... it is clear that we do not gain our undoubted complexity
over worms and plants by using many more genes. Understanding what does give us our complexity... remains a challenge for the future ...” (see Baltimore in Recommended
Reading; reprinted by permission from Macmillan Publishers Ltd: *Nature*, 2001).

It’s not only the number of the genes in these genomes that was surprising; many genes
encode proteins that are essentially the same, and carry out the same functions. Many
scientists had previously assumed that the differences between species depended mostly
on different genes: a human had human genes, a mouse had mouse genes, a fish had fish
genes, and a sea urchin had sea urchin genes. However, the genomic sequences of all of
these organisms suggested that differences in the gene complement played a much
smaller role than first anticipated. For example, we share nearly all of our genes with
mice.

So what makes us different? The key is not primarily in the genes themselves, but how
they are expressed. Humans have approximately eight times as much DNA sequence as
the puffer fish, but essentially the same number of genes. That “extra” DNA used to be
called “junk” (honestly!), but it’s now clear that most of it is functional, and its primary
function appears to be the regulation of gene expression.

Gene expression can be regulated at several points, including transcription of DNA to
RNA; RNA processing, localization, and degradation; translation of the mRNA transcript
into a peptide chain; and activity of the final protein. In this chapter, we’ll concentrate on
transcriptional regulation (protein activity will appear in the next chapters). As you’ll
remember from Figure 1.2, we introduced this control with the example of a transcription
factor that is bound by a small molecule, changing the protein’s affinity for its binding
site on the DNA. The binding of the transcription factor in turn affects the recruitment of
the RNA polymerase complex and, subsequently, the expression of mRNA from the
gene.

![Figure 7.1](https://drive.google.com/uc?export=view&id=19-CCvFUodSrB3qmMosTz1OTcquvrR8Ux)

> **Figure 7.1. The E. coli transcription factors CRP and LacI interact on the
lac promoter to control gene expression.** (A) Schematic of the various inputs
to the lac promoter, the combinations of which determine the transcriptional
state of the lac operon. (B) The same system represented as a circuit diagram.
Pointed arrowheads denote positive regulation, while blunt arrowheads indicate
negative regulation.

## **Section 7.2. More complex transcriptional circuits**

Section 7.1 contained the simplest example we could have considered; now, let’s move
toward more complex modeling of transcriptional regulation by considering a pair of
transcription factors acting on the same gene promoter.

The classic real-world example of such regulation is the regulation of the lac genes,
whose gene products enable *E. coli* to grow on lactose, and whose expression depends on
two transcription factors (Figure 7.1A). E. coli prefers to eat glucose, and it won’t
metabolize anything else until the glucose is gone. In *E. coli*, this metabolic switching is accomplished with the transcription factor CRP (you encountered CRP in Figure 1.1).
CRP binds the promoters of hundreds of genes once it is bound to cAMP, a small
molecule whose presence indicates that none of *E. coli’s* favorite sugar sources are
available. The CRP-cAMP complex can then bind operator sites that control the
expression of genes that enable the uptake and metabolism of other carbon sources.

One of these carbon sources is lactose, a sugar characterized by a particular link between
galactose and glucose. Utilization of lactose depends on enzymes and a transporter, the
genes for which appear together on the *E. coli* chromosome as a single transcription unit
– an operon. The transcription of multiple genes as a single transcript is a more efficient
way for bacteria to coordinate gene expression. Operons are common in bacteria, but are
largely absent from more complex organisms, which tend to rely on complex
post-transcriptional regulatory processes.

The promoter of the *lac* operon contains a binding site for the CRP-cAMP complex.
Thus, the operon is only fully expressed in the presence of CRP-cAMP, which only
appears in the absence of glucose. However, what if there is no lactose in the
environment either? It doesn’t make sense to express the *lac* genes unless both glucose is
absent and lactose is present. *E. coli* addresses this problem with another transcription
factor: LacI (the “I” stands for “inhibitor”). Free LacI inhibits transcription by binding its
own operator in the *lac* operon promoter. However, when lactose is present in the
external environment, one of its metabolic products (allolactose) binds LacI, reducing
LacI’s binding affinity to the operator and enabling the transcription of the *lac* operon.

So now you know that there are two interactions between transcription factors and
operator sites upstream of the *lac* operon (Figure 7.1A). However, this information is not
sufficient to predict the transcriptional state of the *lac* operon, because you also have to know how these two sites interact. In our case, CRP-cAMP has to be bound to the DNA,
and LacI must be absent from the DNA for maximum expression (an AND relationship).
In other cases where two transcription factors regulate expression of the same
transcription unit, either one of the transcription factors may be sufficient to induce
expression – an OR relationship.

The biology that we’ve described in detail here is sometimes represented more compactly
(Figure 7.1B). Signals are shown as activating (arrow) or inhibiting (blunt arrow) the
activity of transcription factors, which then interact at the chromosome to determine the
expression of target genes. Based on this diagram, we can use a simple function (CRP
AND NOT LacI) to represent the interaction, but the representation of regulatory control
doesn’t have to be Boolean.
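The Boolean rule (CRP AND NOT LacI) can be written out as a short truth table. The sketch below is only an illustration: the function name `lac_expression` is hypothetical, and it assumes that CRP is active exactly when glucose is absent and that LacI leaves the operator exactly when lactose is present.

```python
from itertools import product

def lac_expression(glucose: bool, lactose: bool) -> bool:
    """Boolean sketch of the lac promoter: CRP AND NOT LacI."""
    crp_bound = not glucose    # assumption: CRP-cAMP forms only without glucose
    laci_bound = not lactose   # assumption: allolactose (from lactose) releases LacI
    return crp_bound and not laci_bound

# Evaluate the rule for all four combinations of the two sugars.
truth_table = {
    (glucose, lactose): lac_expression(glucose, lactose)
    for glucose, lactose in product([False, True], repeat=2)
}

for (glucose, lactose), expressed in truth_table.items():
    print(f"glucose={glucose}, lactose={lactose} -> lac operon expressed: {expressed}")
```

Only one of the four input combinations (no glucose, lactose present) switches the operon on, which is the AND relationship described above.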

As you can imagine, the regulation of transcription can be much more complicated in
mammalian systems, with some promoters more than 10 kb long (~100x longer than the
typical bacterial promoter), and significantly longer than the gene itself. There are also
many more factors that interact at the promoter, ~30 on average in some systems. We’ll
consider an example in Chapter 8.

## **Section 7.3. The transcriptional regulatory feed-forward motif**

To move toward such complex networks, let’s discuss more advanced regulation
architectures in E. coli. Earlier, we examined a representation of the E. coli
transcriptional regulatory network (Figure 1.1), and we spent a considerable amount of
time studying the most common motif of that network: simple autoregulation, where the
protein product of a given gene regulates the expression of that gene. This motif was the
only significant motif that was identified in the regulatory network when only a single
gene was considered; when two genes were considered, no additional motifs of
significance were found (see Rosenfeld et al. in Recommended Reading).


Consideration of three-gene combinations, however, revealed a very interesting motif
(Figure 7.2). Here, transcription factor X regulates the expression of two other proteins,
Y and Z. However, Y is also a transcription factor, and X and Y both regulate the
expression of Z. This motif was called a **feed-forward loop** (control theorists: remain
calm. This utilization of “feed-forward” does not correspond well to the usage in control
theory. That’s okay.).

![Figure 7.2](https://drive.google.com/uc?export=view&id=1qTKJCAM4RimwW3Xvt0Qmd5tJRdQIuhKc)

> **Figure 7.2. The feed-forward loop.** This motif was found much more
frequently in biological transcription networks than in randomly generated
networks. The transcription factor X regulates the expression of two proteins, Y
and Z. However, Y also regulates Z.

This motif occurred far more often in the E. coli regulatory network than would be
expected by random chance, as was demonstrated by performing the same analysis on
randomized networks that had the same nodes and the same number of connections as the
E. coli network, but the connections were randomly scrambled. In the randomized
networks, an average of only about two feed-forward motifs were identified, but in the
real E. coli network, there were 42 (Douglas Adams, are you reading this?!?).
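To make the comparison with randomized networks concrete, here is a minimal sketch of the counting procedure. The toy network, the node names, and the helper functions `count_ffl` and `scramble` are all hypothetical; the real analysis used the full *E. coli* network, and the randomization shown here (same nodes, same number of connections, no self-connections) is only one simple way to scramble the connections.

```python
import random
from itertools import permutations

def count_ffl(edges):
    """Count ordered triples (X, Y, Z) with connections X->Y, X->Z, and Y->Z."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    return sum(
        (x, y) in edge_set and (x, z) in edge_set and (y, z) in edge_set
        for x, y, z in permutations(nodes, 3)
    )

def scramble(edges, nodes, rng):
    """Random network with the same nodes and the same number of connections."""
    possible = [(a, b) for a in nodes for b in nodes if a != b]
    return rng.sample(possible, len(edges))

# Hypothetical toy network (not real E. coli data).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D"),
             ("E", "F"), ("E", "G"), ("F", "G"), ("G", "H")]
nodes = sorted({node for edge in toy_edges for node in edge})

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_random = 1000
random_counts = [count_ffl(scramble(toy_edges, nodes, rng)) for _ in range(n_random)]
mean_random = sum(random_counts) / n_random

print("feed-forward loops in toy network:", count_ffl(toy_edges))
print("mean in randomized networks:", mean_random)
```

A motif is considered overrepresented when the count in the real network is well above the average count in the scrambled networks.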

So what makes feed-forward loops special? The short answer is that they exhibit
fascinating and useful dynamics in terms of how they express their genes. To arrive at
this answer, we first have to recognize that there are many kinds of feed-forward loops.
Figure 7.3 illustrates all eight of the possible combinations of negative and positive
regulation. Each combination has a direct arm from X to Z (top) and an indirect arm
through Y (bottom), and either arm can exert a positive or negative influence on the
expression of Z.

Now, notice that for some of the combinations, both arms exhibit the same kind of
influence, whether positive or negative. For other combinations, the arms have different
influences: one arm regulates positively and the other negatively. When the direct and
indirect arms regulate in the same way, the combination is called **internally consistent**
(or sometimes, “coherent”), meaning that the direct regulation from X to Z matches the
indirect regulation through Y. For example, on the upper left loop in Figure 7.3, X has a
positive and direct effect on Z, but it also exerts a positive and indirect effect because X
positively regulates Y, and Y positively regulates Z. Keep in mind that if Y has a
negative effect on Z, and X has a negative effect on Y, then the overall indirect effect of
X on Z is positive. Four of the feed-forward loop combinations are internally consistent,
and the other half are internally inconsistent; the direct and indirect actions of X on Z are
opposite.
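The sign bookkeeping can be checked by brute force. In this sketch, each interaction is encoded as +1 (positive regulation) or -1 (negative regulation); the encoding and the function name `classify` are illustrative assumptions. The sign of the indirect arm is the product of its two steps, and the loop is internally consistent when that product matches the sign of the direct arm.

```python
from itertools import product

def classify(x_to_y, y_to_z, x_to_z):
    """Return 'coherent' if the indirect arm (X->Y->Z) matches the direct arm (X->Z)."""
    indirect = x_to_y * y_to_z   # two negatives make a positive
    return "coherent" if indirect == x_to_z else "incoherent"

# Enumerate all eight sign combinations for (X->Y, Y->Z, X->Z).
loops = {signs: classify(*signs) for signs in product([+1, -1], repeat=3)}

for (x_to_y, y_to_z, x_to_z), kind in loops.items():
    print(f"X->Y: {x_to_y:+d}, Y->Z: {y_to_z:+d}, X->Z: {x_to_z:+d} -> {kind}")

n_coherent = sum(kind == "coherent" for kind in loops.values())
print("coherent:", n_coherent, "incoherent:", len(loops) - n_coherent)
```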

![Figure 7.3](https://drive.google.com/uc?export=view&id=18lw6Vh4Qj0puiYokn21pytrknx1rLNRR)


> **Figure 7.3. Possible instances of regulatory feed-forward loops.** Both arms of
internally consistent feed-forward loops exert the same type of control, whether
positive or negative. Internally inconsistent loops have one positive arm and one
negative arm. As in Figure 7.2, pointed arrowheads represent positive regulation
and blunt arrowheads denote negative regulation. The red box highlights the two
instances that were specifically overrepresented in both *E. coli* and yeast. Alon, U.
*An Introduction to Systems Biology: Design Principles of Biological Circuits.*
Chapman & Hall/CRC, 2007. Reproduced with permission of TAYLOR &
FRANCIS GROUP LLC in the format Republish in a book via Copyright Clearance
Center.

## **Section 7.4. Boolean analysis of the most common internally consistent feed-forward motif identified in *E. coli***

Going back to the *E. coli* network, only two of the feed-forward loop submotifs, one
coherent and one incoherent, were actually found to significantly contribute to the
regulatory network (red box in Figure 7.3). First let’s consider the coherent feed-forward
loop (Figure 7.4). Here, a gene $g_x$ is transcribed to produce a protein $p_x$, which can then
be activated by a signal $s_x$. The active protein ${p_x}^*$ can bind the promoters for genes $g_y$
and $g_z$. The product of $g_y$ is $p_y$, which can be activated by $s_y$ to produce ${p_y}^*$. The
combination of ${p_x}^*$ and ${p_y}^*$ at the $g_z$ promoter leads to the production of $p_z$.

![Figure 7.4](https://drive.google.com/uc?export=view&id=1gtNEqyMdPTYPE62gxwsK4fnVE8Wu9EqE)

> **Figure 7.4. Gene regulatory circuit diagram for the most common
internally consistent feed-forward loop in *E. coli.*** Notation is as in Figure
7.1B, but notice the arrow (red), which highlights the positive regulation of
$p_y$ by $p_x$.

Let’s look at the dynamics of this circuit. A good place to start is with a Boolean
analysis, similar to Chapter 2. Our inputs will be $s_x$ and $s_y$, and we’ll assume that all of
the genes are present $(g_x = g_y = g_z = 1)$. Furthermore, as in earlier chapters, let’s assume
that sufficient activation of $p_x$ and $p_y$ occurs essentially instantaneously if $s_x$ and $s_y$ are
present. Our equations are then reduced to:

$\text{Activation}_{px}$ = IF $s_x$

${p_x}^*$ = IF $\text{Activation}_{px}$ AFTER SOME TIME

$\text{Activation}_{py}$ = IF ${p_x}^*$ AND $s_y$

${p_y}^*$ = IF $\text{Activation}_{py}$ AFTER SOME TIME

$\text{Expression}_{pz}$ = IF ${p_x}^*$ AND ${p_y}^*$

$p_z$ = IF $\text{Expression}_{pz}$ AFTER SOME TIME

Using these equations, we can draw the (partial) state diagram shown in Figure 7.5. We
begin with initial conditions of no active protein ${p_x}^*$ or ${p_y}^*$, no expressed protein $p_z$, and
sudden addition of signals $s_x$ and $s_y$. Addition of $s_x$ leads to the activation of $p_x$ to ${p_x}^*$,
followed by the expression and activation of ${p_y}^*$ (expression is the step that takes more
time; both processes are lumped into our equations). Finally, once ${p_x}^*$ and ${p_y}^*$ are active,
$p_z$ is expressed to yield the steady state of the circuit.
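The Boolean rules above can also be stepped through in a few lines of code. This is a minimal sketch, assuming that every AFTER SOME TIME rule takes exactly one synchronous update step (so all delays are treated as equal) and that the state is the tuple $({p_x}^*, {p_y}^*, p_z)$; the function names `step` and `simulate` are hypothetical.

```python
def step(state, s_x, s_y):
    """One synchronous update of (p_x*, p_y*, p_z) for the coherent AND circuit."""
    p_x, p_y, p_z = state
    activation_px = s_x              # Activation_px = IF s_x
    activation_py = p_x and s_y      # Activation_py = IF p_x* AND s_y
    expression_pz = p_x and p_y      # Expression_pz = IF p_x* AND p_y*
    # Each "AFTER SOME TIME" rule becomes the value at the next step.
    return (activation_px, activation_py, expression_pz)

def simulate(state, s_x, s_y, n_steps):
    """Return the list of states visited, starting from the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, s_x, s_y)
        trajectory.append(state)
    return trajectory

# Start with nothing active, then add both signals.
on_trajectory = simulate((False, False, False), s_x=True, s_y=True, n_steps=4)
for t, (p_x, p_y, p_z) in enumerate(on_trajectory):
    print(f"step {t}: p_x*={int(p_x)} p_y*={int(p_y)} p_z={int(p_z)}")
```

Under these assumptions, ${p_x}^*$ switches on first, ${p_y}^*$ follows one step later, and $p_z$ appears one step after that.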

![Figure 7.5](https://drive.google.com/uc?export=view&id=1FT2tpH9naHhuVRprzneoknorI7TVqzeg)

> **Figure 7.5. State matrix (A) and dynamics (B) of the coherent feed-forward
loop in Figure 7.4, in the presence of both signals.** After $p_x$ is activated to ${p_x}^*$,
$p_y$ can be expressed and subsequently activated, after which $p_z$ can be expressed.
Note that the increase in ${p_x}^*$ occurs after stimulus is added because the Boolean
rule is AFTER SOME TIME.

The dynamics of this circuit appear in Figure 7.5B. Notice that there are two periods of
expression between activation of the circuit and the expression of $p_z$. If the circuit were a
simple induction circuit with no feedback (if only ${p_x}^*$ regulated the expression of $p_z$),
then $p_z$ would be expressed in roughly one-half the time that it takes with the
feed-forward motif. The motif therefore increases the time required for expression of the
target gene.

Figure 7.5 is our analysis of what happens when the circuit is initially inactive (no
stimuli) and we then activate it; now, let’s contrast that behavior with a circuit that is
initially active and is then inactivated. Let’s begin with the state-matrix approach shown
in Figure 7.6A. When the stimulus is removed, ${p_x}^*$ becomes rapidly deactivated to $p_x$.
The expression of both $p_y$ and $p_z$ depends on ${p_x}^*$, and so neither protein can be produced.
As a result, both proteins decay over the same time period (Figure 7.6B).

Thus, when this system is activated, there are two time steps: one for the expression and
activation of ${p_y}^*$ and one for the expression of $p_z$. However, when the system is
deactivated, there is only a single time step, because ${p_y}^*$ and $p_z$ are removed
simultaneously. This coherent feed-forward loop creates a switch with different ON
and OFF times! Why is this strategy useful? Uri Alon, whose team at the Weizmann
Institute of Science originally identified these motifs, gives the example of the door of an
elevator. It’s very important that the door be safe, which means that it should start to
close slowly but stop closing quickly, for example at the instant that someone’s foot
triggers the safety mechanism.
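The difference between the ON and OFF times can be counted directly from the Boolean rules of Section 7.4. The sketch below again assumes one synchronous update step per AFTER SOME TIME rule, keeps $s_y$ present throughout, and uses hypothetical helper names (`step`, `steps_until`).

```python
def step(state, s_x, s_y=True):
    """One synchronous update of (p_x*, p_y*, p_z) for the AND-gated loop."""
    p_x, p_y, p_z = state
    return (s_x, p_x and s_y, p_x and p_y)

def steps_until(state, s_x, p_z_target, max_steps=10):
    """Count update steps after a change in s_x until p_z equals p_z_target."""
    for n in range(1, max_steps + 1):
        state = step(state, s_x)
        if state[2] == p_z_target:
            return n
    raise RuntimeError("p_z never reached the target value")

# Turning ON: start fully inactive and add s_x.
steps_on = steps_until((False, False, False), s_x=True, p_z_target=True)
# Turning OFF: start fully active and remove s_x.
steps_off = steps_until((True, True, True), s_x=False, p_z_target=False)

print("steps until p_z turns on: ", steps_on)
print("steps until p_z turns off:", steps_off)
```

In both cases the first step is the change in ${p_x}^*$ itself. After that, turning on takes two further steps, but turning off takes only one.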

![Figure 7.6](https://drive.google.com/uc?export=view&id=11EzQsMBNRpWK0drfFiuQyTIRjH7un9HS)

> **Figure 7.6. State matrix (A) and the dynamics (B) illustrate the
deactivation of the feed-forward loop in Figure 7.4.** In this case, $s_x$ is switched
to zero, which leads to deactivation of ${p_x}^*$ (red arrow 1), followed by a halt in
the expression of $p_y$ and $p_z$. Compare these figures with Figure 7.5.

## **Section 7.5. An ODE-based approach to analyzing the coherent feed-forward loop**

We’ve discussed how to analyze this circuit using a Boolean approach; a parallel
approach (favored by Uri Alon, see Recommended Reading) would be to write ODEs as
follows. We’ll begin with $d[p_y]/dt$ in the usual formulation:

> <h3> $\frac{d[p_y]}{dt} = prod - loss$

*(Equation 7.1)*

We still represent the loss term as proportional to the amount of $[p_y]$. For production,
we will invoke a **threshold concentration**: transcription of $p_y$ can only begin once $[{p_x}^*]$
reaches a certain value. We will indicate thresholds with the notation $K_{ab}$ where
a denotes the transcription factor and b the target gene; therefore, we will add a $K_{xy}$ term
to Equation 7.1. We further assume that the expression of $p_y$ is maximal when $[{p_x}^*]$ is
greater than the threshold constant $K_{xy}$, and that otherwise the expression of $p_y$ is equal to
zero. Adding these details to our equation, we obtain:

> <h3> $\frac{d[p_y]}{dt} = k_{yprod}\Theta([{p_x}^*] > K_{xy}) - k_{ydeg}[p_y]$

*(Equation 7.2)*

where the function $\Theta$(statement) is equal to one if the statement in parentheses is true,
and equal to zero if the statement is false. For the case in which $s_x$ has been added at a sufficient concentration for activation, $\Theta = 1$, and Equation 7.2 is reduced to:

> <h3> $\frac{d[p_y]}{dt} = k_{yprod} - k_{ydeg}[p_y]$

*(Equation 7.3)*

We’ve solved equations like this before (in Chapter 3, for example), and so you should be
able to show that the solution of this equation is:

> <h3> $[p_y](t) = [p_{y,ss}](1 - e^{-k_{ydeg}t})$

*(Equation 7.4)*

where $[p_{y,ss}]$ is determined by:

> <h3> $0 = k_{yprod} - k_{ydeg}[p_{y,ss}]$

*(Equation 7.5)*

and therefore:

> <h3> $[p_{y,ss}] = \frac{k_{yprod}}{k_{ydeg}}$

*(Equation 7.6)*
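If you’d like to check Equation 7.4 numerically, the sketch below integrates Equation 7.3 with a simple forward-Euler scheme and compares the result with the analytical solution. The parameter values are hypothetical and in arbitrary units, and $[p_y]$ is assumed to start at zero.

```python
import math

# Hypothetical parameter values (arbitrary units), for illustration only.
k_yprod = 2.0
k_ydeg = 0.5
p_y_ss = k_yprod / k_ydeg        # Equation 7.6

def p_y_analytic(t):
    """Equation 7.4: [p_y](t) = [p_y,ss] * (1 - exp(-k_ydeg * t))."""
    return p_y_ss * (1.0 - math.exp(-k_ydeg * t))

def p_y_euler(t_end, dt=1e-3):
    """Forward-Euler integration of Equation 7.3, starting from [p_y] = 0."""
    p_y = 0.0
    for _ in range(round(t_end / dt)):
        p_y += dt * (k_yprod - k_ydeg * p_y)
    return p_y

for t in (1.0, 5.0, 20.0):
    print(f"t={t:5.1f}  analytic={p_y_analytic(t):.4f}  euler={p_y_euler(t):.4f}")
```

The two columns should agree closely, and both approach $[p_{y,ss}]$ at long times.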

In general, the reactions that lead to a transcription factor being activated happen at a
significantly faster rate (on the order of seconds or less) than the rate of gene expression
(minutes). As a result, for our purposes here we’ll assume that the transition from $p_y$ to
${p_y}^*$ is very fast in the presence of $s_y$, and so in this case, $[p_y](t) = [{p_y}^*](t)$.

The equation for $p_z$ is similar to the equation for $p_y$, but in this case the production of $p_z$ is
based on two conditions occurring simultaneously:

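> <h3> $\frac{d[p_z]}{dt} = k_{zprod}\Theta([{p_x}^*] > K_{xz})\Theta([{p_y}^*] > K_{yz}) - k_{zdeg}[p_z]$

*(Equation 7.7)*

When $[{p_x}^*]$ and $[{p_y}^*]$ exceed their thresholds, the solution is similar to that for $[p_y]$ (with $t$ measured from the moment both thresholds are exceeded, and $[p_z]$ starting at zero):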

> <h3> $[p_z](t) = [p_{z,ss}](1 - e^{-k_{zdeg}t}), [p_{z,ss}] = \frac{k_{zprod}}{k_{zdeg}}$

*(Equation 7.8)*

The response of the system to a sudden addition of $s_x$ and $s_y$ is depicted in Figure 7.7.
After $s_x$ is added, the expression of $p_y$ and consequently ${p_y}^*$ increases until $[{p_y}^*]$ reaches the threshold $K_{yz}$. At this time, $p_z$ expression is induced.
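Equations 7.2 and 7.7 can be integrated together to reproduce this qualitative behavior. The following is a sketch under stated assumptions, not a fit to any data: all rate constants and thresholds are hypothetical (arbitrary units), ${p_x}^*$ is assumed to switch instantly between 0 and 1 with $s_x$, $s_y$ is assumed present so that $[p_y] = [{p_y}^*]$, and a simple forward-Euler scheme is used.

```python
import math

# Hypothetical parameter values (arbitrary units), chosen only for illustration.
K_YPROD, K_YDEG = 1.0, 1.0
K_ZPROD, K_ZDEG = 1.0, 1.0
K_XY = K_XZ = 0.5   # thresholds on [p_x*]
K_YZ = 0.5          # threshold on [p_y*]

def theta(statement):
    """Step function: 1 if the statement is true, 0 otherwise."""
    return 1.0 if statement else 0.0

def simulate(s_x_present, p_y0, p_z0, t_end=8.0, dt=0.001):
    """Forward-Euler integration of Equations 7.2 and 7.7."""
    p_x_star = 1.0 if s_x_present else 0.0   # assumed instantaneous activation
    p_y, p_z = p_y0, p_z0
    times, p_y_trace, p_z_trace = [0.0], [p_y0], [p_z0]
    for i in range(1, round(t_end / dt) + 1):
        dp_y = K_YPROD * theta(p_x_star > K_XY) - K_YDEG * p_y
        dp_z = (K_ZPROD * theta(p_x_star > K_XZ) * theta(p_y > K_YZ)
                - K_ZDEG * p_z)
        p_y += dt * dp_y
        p_z += dt * dp_z
        times.append(i * dt)
        p_y_trace.append(p_y)
        p_z_trace.append(p_z)
    return times, p_y_trace, p_z_trace

# Sudden addition of s_x, starting with no protein.
t_on_sim, py_on, pz_on = simulate(True, 0.0, 0.0)
onset = next(t for t, z in zip(t_on_sim, pz_on) if z > 0)
print(f"p_z expression begins at t = {onset:.3f}")

# Sudden removal of s_x, starting from the steady state.
t_off, py_off, pz_off = simulate(False, K_YPROD / K_YDEG, K_ZPROD / K_ZDEG)
print(f"after removal, p_y and p_z at t = 1: {py_off[1000]:.3f}, {pz_off[1000]:.3f}")
```

With these values, $p_z$ stays at zero until $[p_y]$ crosses $K_{yz}$ when $s_x$ is added, whereas both proteins begin to decay immediately when $s_x$ is removed.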

![Figure 7.7](https://drive.google.com/uc?export=view&id=1MEKG_JvFERUl6rE2DimTqWMDOkxDgmga)

> **Figure 7.7. Dynamic response of the coherent feed-forward loop in
Figure 7.4 to (A) sudden addition or (B) sudden removal of $s_x$ at time
zero.** Notice that there is a delay ($T_{on}$) between expression changes in ${p_y}^*$
and $p_z$ when $s_x$ is added, but not when $s_x$ is removed, as we also observed
in our Boolean analysis (Figures 7.5 and 7.6). Remember that $p_y$ = ${p_y}^*$
due to an assumption of rapid $p_y$ activation. Modified from Alon, U. *An
Introduction to Systems Biology: Design Principles of Biological Circuits.*
Chapman & Hall/CRC, 2007. Reproduced with permission of TAYLOR
& FRANCIS GROUP LLC in the format Republish in a book via
Copyright Clearance Center.

Let’s consider the time required for $[{p_y}^*]$ to reach the threshold $K_{yz}$, as this delay was found to be the most notable aspect of this circuit’s dynamic response. As shown in Figure 7.7, we’ll call this delay $T_{on}$, and determine it from Equation 7.4 (remembering
that $[p_y](t) = [{p_y}^*](t)$ if $s_y$ is present):

> <h3> $[{p_y}^*](T_{on}) = [p_{y,ss}](1 - e^{-k_{ydeg}T_{on}})$

*(Equation 7.9)*

Since $[{p_y}^*](T_{on}) = K_{yz}$, we can solve for $T_{on}$:

> <h3> $K_{yz} = [p_{y,ss}](1 - e^{-k_{ydeg}T_{on}})$

*(Equation 7.10)*



> <h3> $T_{on} = \frac{1}{k_{ydeg}}\ln\left(\frac{1}{1 - \frac{K_{yz}}{[p_{y,ss}]}}\right)$

*(Equation 7.11)*

From Equation 7.11, you can see that when $[p_{y,ss}]$ is much larger than $K_{yz}$, $K_{yz}/[p_{y,ss}]$
approaches zero, which means that $T_{on}$ will also reduce to zero for a given $k_{ydeg}$. As the value of $[p_{y,ss}]$ approaches the threshold value $K_{yz}$, the logarithm term of Equation 7.11 increases rapidly, meaning that the value of $T_{on}$ increases dramatically as well.
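A few numbers make the behavior of Equation 7.11 easier to see. In this sketch the function name `t_on` and the parameter values are hypothetical; the function simply evaluates Equation 7.11, and it returns infinity when the threshold is at or above the steady-state level, because in that case $[p_y]$ never reaches the threshold.

```python
import math

def t_on(k_ydeg, p_y_ss, K_yz):
    """Delay from Equation 7.11; infinite if the threshold is never reached."""
    if K_yz >= p_y_ss:
        return math.inf
    return math.log(1.0 / (1.0 - K_yz / p_y_ss)) / k_ydeg

k_ydeg = 1.0    # hypothetical degradation rate constant (arbitrary units)
p_y_ss = 1.0    # hypothetical steady-state level

for ratio in (0.01, 0.5, 0.9, 0.99):
    delay = t_on(k_ydeg, p_y_ss, ratio * p_y_ss)
    print(f"K_yz/[p_y,ss] = {ratio:4.2f}  ->  T_on = {delay:.3f}")
```

The delay is close to zero when the threshold is far below the steady state, and it grows quickly as the threshold approaches the steady-state level.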

## **Section 7.6. Robustness of the coherent feed-forward loop**

As you can see, the overall conclusions that we drew in Section 7.5 using ODEs were
very similar to the results of our Boolean-based approach in Section 7.4. However,
ODE-based formulation of the model also allows us to demonstrate another interesting
aspect of this coherent feed-forward circuit: its **robustness**. In this context, robustness
means that the system won’t change much in response to a small perturbation. As an
example, let’s say that our circuit is exposed only briefly to $s_x$. As before, $p_y$ begins to be
expressed, but it doesn’t reach the critical threshold for $p_z$ expression, and thus $p_z$ is never
expressed (Figure 7.8). The robustness of the system is tuned by the value of the
threshold: a higher threshold takes longer for the $p_y$ value to reach, and so expression of
$p_z$ would be robust to even longer stimulus times.
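The filtering of brief pulses can be sketched with the same threshold ODEs. Everything specific here is an assumption made for illustration: the parameter values (arbitrary units), the pulse durations, the instantaneous switching of ${p_x}^*$ with $s_x$, the presence of $s_y$, and the forward-Euler integration. The function name `peak_levels` is hypothetical.

```python
def theta(statement):
    """Step function: 1 if the statement is true, 0 otherwise."""
    return 1.0 if statement else 0.0

def peak_levels(pulse_duration, K_yz=0.5, k_prod=1.0, k_deg=1.0,
                t_end=10.0, dt=0.001):
    """Return the maximum [p_y] and [p_z] reached for a pulse of s_x."""
    p_y = p_z = 0.0
    peak_y = peak_z = 0.0
    for i in range(round(t_end / dt)):
        # p_x* is assumed to follow s_x instantly: on during the pulse only.
        p_x_star = 1.0 if i * dt < pulse_duration else 0.0
        dp_y = k_prod * theta(p_x_star > 0.5) - k_deg * p_y
        dp_z = k_prod * theta(p_x_star > 0.5) * theta(p_y > K_yz) - k_deg * p_z
        p_y += dt * dp_y
        p_z += dt * dp_z
        peak_y = max(peak_y, p_y)
        peak_z = max(peak_z, p_z)
    return peak_y, peak_z

short_peak_y, short_peak_z = peak_levels(pulse_duration=0.3)
long_peak_y, long_peak_z = peak_levels(pulse_duration=2.0)

print(f"short pulse: max p_y = {short_peak_y:.3f}, max p_z = {short_peak_z:.3f}")
print(f"long pulse:  max p_y = {long_peak_y:.3f}, max p_z = {long_peak_z:.3f}")
```

With these values, the short pulse ends before $[p_y]$ reaches $K_{yz}$, so $p_z$ is never made; the long pulse lasts long enough for $p_z$ expression to begin.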

![Figure 7.8](https://drive.google.com/uc?export=view&id=1RFNoZXQYwcR8vGa8DDvje2Tz-AGv1D1P)

> **Figure 7.8. Response of the feed-forward loop in Figure 7.4 to a brief pulse
of $s_x$.** The pulse of stimulus (highlighted by an arrow in the bottom plots) leads
to only brief activation of $p_x$, which in turn leads to a maximum expression of $p_y$
that is below the threshold $K_{yz}$ (dashed line). As a result, $p_z$ expression is never
induced. For contrast, a longer pulse of $s_x$ and its consequences are shown at
right (and in Figure 7.7A). Adapted by permission from Macmillan Publishers
Ltd: Shen-Orr, S. S., Milo, R., Alon, U. Network motifs in the transcriptional
regulation network of *Escherichia coli.* Nature Genetics. 2002. **31**(1):64-8.

## **Section 7.7. Experimental interrogation of the coherent feed-forward loop**

Having mathematically analyzed this coherent feed-forward circuit in detail, Alon’s team
decided to see whether the experimental data from a naturally occurring feed-forward
circuit in *E. coli* actually exhibited the dynamics that theory predicted. They focused on the *ara* genes, which are regulated by the transcription factors AraC and CRP. AraC is regulated transcriptionally by CRP, and is activated in the presence of arabinose (Figure
7.9A). By keeping the bacteria in an arabinose environment, and suddenly adding or
“removing” cAMP (see figure legend), the team replicated the situation that they
modeled. To enable comparison, the team simultaneously considered a control circuit
that responded directly to CRP but not to AraC (grey lines in Figure 7.9B and C).
Consistent with the theory shown in Figure 7.7, the expression of the *ara* genes was
delayed when cAMP was added (Figure 7.9B), but not when cAMP was “removed”
(Figure 7.9C) – an elegant demonstration of their theory!

![Figure 7.9](https://drive.google.com/uc?export=view&id=1u4aOJicLO1TQvv2ujP4HBvr00kvLBlVW)

> **Figure 7.9. Monitoring promoter states when turning on and off circuits with
(black) and without (grey) feed-forward components.** The feed-forward circuit
(A) and a control circuit (based on the *lac* operon) were (B) turned on by adding
saturating amounts of cAMP to growing cells, and (C) turned off by “removing”
cAMP (actually by adding saturating glucose, which inactivates CRP). The
promoters were transcriptionally fused to GFP, which acted as a reporter of
promoter activity. Note that turning on the circuit is delayed in the feed-forward
loop (B), but the two circuits have similar dynamics when turning off (C).
Mangan, S., Zaslaver, A., Alon, U. The coherent feedforward loop serves as a
sign-sensitive delay element in transcription networks. *Journal of Molecular
Biology.* 2003. **334**(2): 197-204. Reprinted with minor modifications with
permission from Elsevier.

## **Section 7.8. Changing the interaction from an AND to an OR relationship**

In our analysis of the coherent feed-forward circuit in Sections 7.4 through 7.7, we focused on an
AND relationship between the transcription factors controlling gene expression: both ${p_x}^*$ AND ${p_y}^*$ were required. Several other relationships are possible that can lead to
differences in network behavior. For example, let’s change our current circuit such that
the interaction at the promoter changes from an AND interaction to an OR interaction.
We can use our Boolean toolbox for a quick analysis. Our equations for $\text{Activation}_{px}$, ${p_x}^*$, $\text{Activation}_{py}$, ${p_y}^*$, and $p_z$ remain the same as in Section 7.4, but the equation for $p_z$
expression becomes:

$\text{Expression}_{pz} = \text{IF } {p_x}^* \text{ OR } {p_y}^*$

![Figure 7.10](https://drive.google.com/uc?export=view&id=14RrScCJRrL72qHohpW795u8-ApI5YWpj)

> **Figure 7.10. The state matrix (A) and dynamics (B) for the OR circuit.**
Note the delay when the stimulus $s_x$ is removed, but not when it is added.

The state matrix and dynamics for the OR circuit appear in Figure 7.10. Notice that in
this case, the delay occurs when the stimulus is removed! This simple change in the
interaction at the promoter therefore determines whether the delay occurs in the
expression of $p_z$ or in the decay of $p_z$.
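To see the swap in a side-by-side comparison, the sketch below counts Boolean update steps for both gate types. As before, it assumes one synchronous update step per AFTER SOME TIME rule and keeps $s_y$ present; the `gate` argument and the helper names are hypothetical.

```python
def step(state, s_x, gate, s_y=True):
    """One synchronous update of (p_x*, p_y*, p_z) for an AND or OR promoter."""
    p_x, p_y, p_z = state
    expression_pz = (p_x and p_y) if gate == "AND" else (p_x or p_y)
    return (s_x, p_x and s_y, expression_pz)

def steps_until(state, s_x, gate, p_z_target, max_steps=10):
    """Count update steps after a change in s_x until p_z equals p_z_target."""
    for n in range(1, max_steps + 1):
        state = step(state, s_x, gate)
        if state[2] == p_z_target:
            return n
    raise RuntimeError("p_z never reached the target value")

results = {}
for gate in ("AND", "OR"):
    steps_on = steps_until((False, False, False), True, gate, p_z_target=True)
    steps_off = steps_until((True, True, True), False, gate, p_z_target=False)
    results[gate] = (steps_on, steps_off)
    print(f"{gate:3}: steps to turn on = {steps_on}, steps to turn off = {steps_off}")
```

Under these assumptions, the AND circuit takes the extra step when turning on, and the OR circuit takes the extra step when turning off.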

Once again, Alon’s team set out to experimentally verify these predictions. They focused
their investigation on some of the genes that regulate expression of the bacterial
flagellum; these genes are naturally regulated by a coherent feed-forward circuit with an
OR interaction. As shown in Figure 7.11, their experimental results strongly agreed with
the theory we worked through above.

![Figure 7.11](https://drive.google.com/uc?export=view&id=10-p4ZZl4UrW2v2l8w5h8cbyuuOjPGbWA)

> **Figure 7.11. Experimental validation of feed-forward dynamics for the coherent
feed-forward loop with an OR interaction at the promoter.** (A) The *E. coli*
flagellar system, a control circuit with two inputs and an OR interaction. Both of
these circuit types occur naturally in *E. coli*. (B) For experimental validation, the
production of FlhDC is controlled with a promoter that is induced by the addition of
arabinose (not the native promoter), which serves as $s_x$. The signal $s_y$ comes from a
checkpoint system that monitors the production of a component of the flagellum. The
promoter controlling the *fli* sequences at the bottom of the circuit is fused to GFP as a
reporter. Here, the “on” step is similar to that for a feed-forward circuit, which occurs
when FliA is deleted. (C) Turning the circuit off (by shifting the cells into medium
without arabinose) is delayed for the feed-forward circuit when FliA is deleted (note
the similarity to turning off the circuit in Figure 7.9). Adapted with permission from
Macmillan Publishers Ltd: Kalir, S., Mangan, S., Alon, U. A coherent feed-forward
loop with a SUM input function prolongs flagella expression in *Escherichia coli.
Molecular Systems Biology.* Epub 2005 Mar 29. doi: 10.1038/msb4100010.

## <u> **Practice Problem 7.1** </u>

*Now that we’ve analyzed a coherent feed-forward loop from E. coli, let’s consider the
primary incoherent feed-forward loop in Figure 7.12. Draw a state diagram and
calculate the dynamic response for the case in which $s_x$ and $s_y$ are suddenly added.*

![Figure 7.12](https://drive.google.com/uc?export=view&id=10F7ApyC-4ERbSG-iTtce5ujGLIDS43hK)

> **Figure 7.12. An incoherent feed-forward loop, where $p_x$ exerts a
positive influence on $p_y$ and $p_z$, but $p_y$ has a negative influence on $p_z$.**

**Solution:** The regulatory rules for this feed-forward loop can be written as:


$\text{Activation}_{px}$ = IF $s_x$

${p_x}^*$ = IF $\text{Activation}_{px}$ AFTER SOME TIME

$\text{Activation}_{py}$ = IF ${p_x}^*$ AND $s_y$

${p_y}^*$ = IF $\text{Activation}_{py}$ AFTER SOME TIME

$\text{Expression}_{pz}$ = IF ${p_x}^*$ AND NOT ${p_y}^*$

$p_z$ = IF $\text{Expression}_{pz}$ AFTER SOME TIME


Using these rules, we draw a state matrix (Figure 7.13A) and plot the resulting dynamics
(Figure 7.13B).
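The same rules can be checked with a short simulation. This sketch assumes one synchronous update step per AFTER SOME TIME rule and represents the state as $({p_x}^*, {p_y}^*, p_z)$; the function name `step` is hypothetical.

```python
def step(state, s_x, s_y):
    """One synchronous update of (p_x*, p_y*, p_z) for the incoherent loop."""
    p_x, p_y, p_z = state
    activation_px = s_x                  # Activation_px = IF s_x
    activation_py = p_x and s_y          # Activation_py = IF p_x* AND s_y
    expression_pz = p_x and not p_y      # Expression_pz = IF p_x* AND NOT p_y*
    return (activation_px, activation_py, expression_pz)

# Start with nothing active, then add both signals.
state = (False, False, False)
trajectory = [state]
for _ in range(5):
    state = step(state, s_x=True, s_y=True)
    trajectory.append(state)

p_z_trace = [int(p_z) for _, _, p_z in trajectory]
for t, (p_x, p_y, p_z) in enumerate(trajectory):
    print(f"step {t}: p_x*={int(p_x)} p_y*={int(p_y)} p_z={int(p_z)}")
```

Here $p_z$ switches on once ${p_x}^*$ is active, and switches off again one step later, once ${p_y}^*$ has appeared.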

![Figure 7.13](https://drive.google.com/uc?export=view&id=1EMxjjQIpUr8_QVGFVEBUu11ujgXBnMxN)

> **Figure 7.13. The state matrix (A) and dynamics (B) for the incoherent
feed-forward loop in Figure 7.12.** Notice that the expression of $p_z$ rises, then falls
again, in a pulse.

This circuit creates a “pulse” of $p_z$ expression! You can imagine that this response is quite useful to the cell. When real instances of these feed-forward loops were examined in *E. coli*, a circuit was identified (Figure 7.14A) in which $[p_z]$ decreases to a new, non-zero steady-state level predicted by the ODEs (Figure 7.14B); experimental data confirmed this prediction (Figure 7.14C).
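One way to obtain a non-zero final level from threshold ODEs is to let repression be incomplete. The sketch below is only an illustration of that idea, not the published model: it assumes that once $[{p_y}^*]$ exceeds $K_{yz}$, production of $p_z$ drops to a fraction `beta` of its maximal rate rather than to zero. The parameter `beta`, the other parameter values (arbitrary units), and the function name are hypothetical; both signals are assumed present from time zero.

```python
def simulate_incoherent(beta=0.2, K_yz=0.5, k_prod=1.0, k_deg=1.0,
                        t_end=15.0, dt=0.001):
    """Forward-Euler sketch of an incoherent loop with partial repression."""
    p_y = p_z = 0.0
    p_z_trace = [p_z]
    for _ in range(round(t_end / dt)):
        # Assumption: repression by p_y* scales production by beta, not to zero.
        production_factor = beta if p_y > K_yz else 1.0
        dp_y = k_prod - k_deg * p_y
        dp_z = k_prod * production_factor - k_deg * p_z
        p_y += dt * dp_y
        p_z += dt * dp_z
        p_z_trace.append(p_z)
    return p_z_trace

p_z_trace = simulate_incoherent()
peak = max(p_z_trace)
final = p_z_trace[-1]
print(f"peak [p_z] = {peak:.3f}, final [p_z] = {final:.3f}")
```

With these values, $[p_z]$ rises until $[{p_y}^*]$ crosses its threshold and then relaxes to a lower, non-zero steady state set by `beta`.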

![Figure 7.14](https://drive.google.com/uc?export=view&id=1IKyJ1T6DNflk68ut-50o9y4dJIs1c3Ld)

> **Figure 7.14. Dynamics of the incoherent feed-forward loop (A) as determined
using ODEs (B) and experimentally (C).** Here, the *galE* promoter is fused to GFP
as a reporter. $s_x$ is cAMP, which activates CRP, and $s_y$ is galactose, which causes
GalS to unbind from the *galE* promoter. Mangan, S., Itzkovitz, S., Zaslaver, A.,
Alon, U. The incoherent feed-forward loop accelerates the response-time of the *gal*
system of *Escherichia coli. Journal of Molecular Biology.* 2006. **356**, 1073-1081.
Reprinted with minor modifications with permission from Elsevier.

## **Section 7.9. The single-input module**

The feed-forward loop was the only motif that Alon’s team found among all of the
possible three-node interaction sets (Figure 7.3). Now let’s talk about some of the other
motifs that were found when considering greater numbers of nodes, which are easy to
describe but exhibit behaviors that are a bit more complex.

A very common motif, the **single-input module**, has one regulator that is solely
responsible for the regulation of several genes, often including itself. The genes may be located in the same operon or the genes may be spread across the genome, in which case
the genes are said to be in the same **regulon** (meaning that they are regulated by the same transcription factor, but aren’t necessarily in physical proximity to the gene encoding that transcription factor). Single-input modules are very unlikely to occur in a random network, especially in the case of one transcription factor regulating 14 or 15 genes.

You can imagine that genes in the same regulon have coordinated expression, just as
genes in operons do. However, all of the genes in a given regulon are not necessarily
expressed at the same time in a single transcript, as is the general rule for genes in an
operon.

For example, consider the arginine biosynthesis single-input module in *E. coli* (Figure 7.15). These genes encode the enzymes required to synthesize the amino acid arginine
(we’ll discuss this metabolic process in more detail in Chapter 9), and they are only
expressed when arginine is not present in *E. coli’s* environment. The genes fall into
seven transcription units: the five individually expressed genes *argA, argD, argE, argF*, and *argI*, the operon encoding *argC, argB,* and *argH*, and the operon encoding *carA* and *carB*. All of these genes are regulated solely by the transcription factor ArgR, which is why they belong to the same single-input module. The gene encoding ArgR is also in the
regulon because it regulates its own expression. However, *argG* is technically not in the single-input module, even though its expression is regulated by ArgR, because CRP is known to also control the expression of *argG*.
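
The membership rule in the paragraph above (a transcription unit belongs to the module only if ArgR is its sole regulator) can be written as a short filter. This is a minimal sketch: the dictionary is transcribed from the text rather than from a complete regulatory database, and the names `regulators` and `single_input_targets` are hypothetical.

```python
# Regulators of each transcription unit, as listed in the paragraph above.
# This is NOT a complete regulatory map of E. coli.
regulators = {
    "argA": {"ArgR"},
    "argD": {"ArgR"},
    "argE": {"ArgR"},
    "argF": {"ArgR"},
    "argI": {"ArgR"},
    "argCBH": {"ArgR"},        # operon encoding argC, argB, and argH
    "carAB": {"ArgR"},         # operon encoding carA and carB
    "argR": {"ArgR"},          # ArgR regulates its own gene
    "argG": {"ArgR", "CRP"},   # also controlled by CRP
}


def single_input_targets(regulators, tf):
    """Return the transcription units whose only regulator is `tf`."""
    return sorted(unit for unit, tfs in regulators.items() if tfs == {tf})


module = single_input_targets(regulators, "ArgR")
print(module)
```

Because *argG* has a second regulator, the filter leaves it out, while the *argR* gene itself is kept.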

![Figure 7.15](https://drive.google.com/uc?export=view&id=1yKneBsN5AZsugwyw7YECRJb8bHOFehii)

> **Figure 7.15. The arginine biosynthesis single-input motif and just-in-time
expression.** At the top left is the transcriptional regulatory motif in which the
argR gene product controls its own expression as well as that of several genes that
encode metabolic enzymes. These enzymes are arranged into the metabolic
pathways that are responsible for arginine biosynthesis (metabolite names appear
in boxes; enzymes that catalyze the conversion of one metabolite to another are
represented by circles and arrows). The expression of each enzyme is shown in
greyscale, with the lightest grey (boxed in red) indicating the inflection point at
which expression of the gene reached half of its maximum. Comparing these
inflection points can give you an idea of the order of gene expression. The Lux
protein, which is luminescent, was used here as the reporter of promoter activity.
Adapted by permission from Macmillan Publishers Ltd: Zaslaver, A., Mayo, A. E.,
Rosenberg, R., Bashkin, P., Sberro, H., Tsalyuk, M., Surette, M. G., Alon, U.
Just-in-time transcription program in metabolic pathways. *Nature Genetics.* 2004.
**36**(5): 486-91.

Figure 7.15 illustrates the relationship between the genes in the single-input module, both in terms of transcriptional regulation and the metabolic pathway (boxes are metabolites, circles are enzymes). The figure also includes a time course detailing the expression of each gene when arginine is removed from *E. coli’s* surroundings.

## **Section 7.10. Just-in-time gene expression**

Interestingly, in Figure 7.15, the genes appear to be expressed roughly in the order
they’re needed, or **just in time**. For example, the enzymes encoded by *argB* and *argC* are induced just after the expression of *argA*, which catalyzes the step in the biosynthetic pathway immediately preceding them. This strategy seems to be a very efficient way to ensure that the enzymes are only made when absolutely required: in the absence of arginine. For a single-celled organism like *E. coli*, the energy and space required to produce and store extra proteins are extremely valuable, and so this efficiency, small as it seems, could yield important dividends in terms of growth rate.

How is just-in-time expression achieved? Let’s consider a transcription factor $p_x$ that
regulates the expression of three genes: $g_a, g_b$, and, you guessed it, $g_c$. We’ll use the same ODE-based framework as in Section 7.5, and we’ll assume as before that once $s_x$ is added, all of the $p_x$ is rapidly activated to ${p_x}^*$. The equations for gene expression are:

> <h3> $\frac{d[p_x]}{dt} = k_{\text{xprod}} - k_{\text{xdeg}} [p_x]$

*(Equation 7.12)*

> <h3> $\frac{d[p_a]}{dt} = k_{\text{aprod}} \theta \left( [p_x^*] > K_{xa} \right) - k_{\text{adeg}} [p_a]$

*(Equation 7.13)*

> <h3> $\frac{d[p_b]}{dt} = k_{\text{bprod}} \theta \left( [p_x^*] > K_{xb} \right) - k_{\text{bdeg}} [p_b]$

*(Equation 7.14)*

> <h3> $\frac{d[p_c]}{dt} = k_{\text{cprod}} \theta \left( [p_x^*] > K_{xc} \right) - k_{\text{cdeg}} [p_c]$

*(Equation 7.15)*

![Figure 7.16](https://drive.google.com/uc?export=view&id=1Icmd4a9ueJwVYH_Fy0dSYrFA7LD09KhB)

> **Figure 7.16. The temporal program of expression from a single-input
module.** As the activity of $p_x$ rises, it crosses the thresholds of activity for
each promoter ($K_{xa}$, $K_{xb}$, and $K_{xc}$) in order. When the activity of $p_x$ declines, it exhibits a “first in, last out” behavior. Adapted by permission from Macmillan
Publishers Ltd: Shen-Orr, S. S., Milo, R., Alon, U. Network motifs in the
transcriptional regulation network of *Escherichia coli. Nature Genetics*. 2002.
**31**(1):64-8.

The equations are identical in form, and as you already know, the production and decay terms determine the steady-state expression level of protein once the genes are induced. The thresholds $K_{xa}$, $K_{xb}$, and $K_{xc}$ determine how long it takes for gene expression to be induced after $p_x$ is expressed. For example, if $K_{xa} < K_{xb}$, then it will take longer for $p_x^*$ to be greater than $K_{xb}$ and so $p_b$ expression won’t be induced until later. Similarly, you can see in Figure 7.16 that if $K_{xa} < K_{xb} < K_{xc}$, expression of these genes will occur in the order $g_a$, then $g_b$, then $g_c$ – just as we saw with the arginine regulon (Figure 7.15).
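
To make the link between threshold and timing concrete, here is a small sketch that computes when each promoter switches on. It assumes that $[p_x]$ starts at zero and follows the solution of Equation 7.12, $[p_x](t) = \frac{k_{prod}}{k_{deg}}\left(1 - e^{-k_{deg} t}\right)$, and that all of $p_x$ is active. The rate constants and threshold values are illustrative only, and the function name `crossing_time` is hypothetical.

```python
import math


def crossing_time(K, k_prod=1.0, k_deg=1.0):
    """Time at which [p_x] first reaches the threshold K, starting from zero.

    Obtained by solving (k_prod / k_deg) * (1 - exp(-k_deg * t)) = K for t.
    """
    p_ss = k_prod / k_deg          # steady-state level of p_x
    if K >= p_ss:
        return math.inf            # the threshold is never reached
    return -math.log(1.0 - K / p_ss) / k_deg


# Illustrative thresholds satisfying K_xa < K_xb < K_xc (arbitrary units).
thresholds = {"g_a": 0.2, "g_b": 0.5, "g_c": 0.8}

induction_times = {gene: crossing_time(K) for gene, K in thresholds.items()}
induction_order = sorted(induction_times, key=induction_times.get)
print(induction_order)  # ['g_a', 'g_b', 'g_c']
```

A threshold at or above the steady-state level of $p_x$ is never crossed, so in this simple model that gene would never be induced.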

What does it mean biologically for these thresholds to be different from each other?
Most likely, the difference is related to the binding affinity of the transcription factor for
each gene’s promoter. A promoter with a high-affinity binding site would lead to a low
threshold: even at a low concentration of the transcription factor, binding occurs and
expression is induced.
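
One way to picture this is with simple equilibrium binding. The sketch below assumes a single binding site whose occupancy is $[p_x^*] / (K_d + [p_x^*])$ and treats the half-occupancy point as a stand-in for the activation threshold. Both of these are simplifying assumptions of mine rather than results from the chapter, and the $K_d$ values are made up for illustration.

```python
def occupancy(tf_conc, K_d):
    """Fraction of time a single binding site is occupied at equilibrium."""
    return tf_conc / (K_d + tf_conc)


# Hypothetical dissociation constants (arbitrary concentration units).
K_d_high_affinity = 0.1   # tight binding
K_d_low_affinity = 1.0    # weak binding

tf_conc = 0.1             # a low transcription factor concentration
high = occupancy(tf_conc, K_d_high_affinity)
low = occupancy(tf_conc, K_d_low_affinity)
print(f"high-affinity site: {high:.2f}, low-affinity site: {low:.2f}")
```

At the same low concentration, the high-affinity site is already half occupied while the low-affinity site is mostly empty, which is why a tighter-binding promoter behaves as if it has a lower threshold.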

The single-input module yields concurrent expression and has the potential for sequential induction of gene expression. Notice from Figure 7.16, however, that when expression of $p_x$ decays, $g_c$ is the first gene to respond, followed by $g_b$ and finally $g_a$. In other words, de-activation of the system occurs in reverse order: first in, last out.
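
The reverse ordering on shut-off follows from the same kind of arithmetic. As a sketch, assume that production of $p_x$ stops abruptly, so that $[p_x]$ decays exponentially from a starting level `p_start` with rate constant `k_deg`. The values below are illustrative, and the function name `shutoff_time` is hypothetical.

```python
import math


def shutoff_time(K, p_start=1.0, k_deg=1.0):
    """Time at which a decaying [p_x] = p_start * exp(-k_deg * t) drops to K."""
    if K >= p_start:
        return 0.0                 # already at or below the threshold
    return math.log(p_start / K) / k_deg


# Same illustrative thresholds as for induction: K_xa < K_xb < K_xc.
thresholds = {"g_a": 0.2, "g_b": 0.5, "g_c": 0.8}

shutoff_times = {gene: shutoff_time(K) for gene, K in thresholds.items()}
shutoff_order = sorted(shutoff_times, key=shutoff_times.get)
print(shutoff_order)  # ['g_c', 'g_b', 'g_a']
```

The gene with the highest threshold is the last to be induced and the first to lose its activator, so $g_a$, the first gene in, is the last one out.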

## **Section 7.11. Generalization of the feed-forward loop**

Are there any network structures that would give us **first in, first out** behavior? After all, you might expect that this strategy would be even more efficient in terms of enzyme production; enzymes are produced just in time, and only maintained as long as they are needed.

As it happens, another motif that was shown to be overrepresented in *E. coli’s* network
versus randomized networks can indeed produce “first in, first out” behavior. This motif
is essentially a generalization of the feed-forward loop. Whereas the feed-forward loops that we considered in Figure 7.3 had only one target gene, in the generalized feed-forward loop, two transcription factors – one under the transcriptional control of the other – control the expression of many target genes. In this sense, the generalized or multi-gene feed-forward loop may also be conceptualized as a hybrid between the feed-forward loop and the single-input module.

## <u> **Practice Problem 7.2** </u>

*Using the ODE approach we adopted for the single-input motif, write the equations for
the multi-gene feed-forward loop in Figure 7.17, and describe the conditions for which
first in, first out expression of the genes z1 and z2 will be achieved.*

![Figure 7.17](https://drive.google.com/uc?export=view&id=12qntOFoyzNGpu97zAhE2tbWo1D3AAaKe)


> **Figure 7.17. A two-node feed-forward loop with two gene targets.** The
transcription factors that control expression are common to both targets.

**Solution:** The equations appear below. Assume that $s_x$ and $s_y$ are sufficiently available such that $p_x = p_x^*$ and $p_y = p_y^*$.



$
\frac{d[p_x]}{dt} = k_{x_{\text{prod}}} - k_{x_{\text{deg}}} [p_x]
$

$
\frac{d[p_y]}{dt} = k_{y_{\text{prod}}} \theta \left( \left[ p_x^* \right] > K_{xy} \right) - k_{y_{\text{deg}}} [p_y]
$

$
\frac{d[p_{z1}]}{dt} = k_{z1_{\text{prod}}} \left[ \theta \left( \left[ p_x^* \right] > K_{xz1} \right) \text{ OR } \theta \left( \left[ p_y^* \right] > K_{yz1} \right) \right] - k_{z1_{\text{deg}}} [p_{z1}]
$

$
\frac{d[p_{z2}]}{dt} = k_{z2_{\text{prod}}} \left[ \theta \left( \left[ p_x^* \right] > K_{xz2} \right) \text{ OR } \theta \left( \left[ p_y^* \right] > K_{yz2} \right) \right] - k_{z2_{\text{deg}}} [p_{z2}]
$

From our work in Section 7.10, you should already have an intuition that first in, first out behavior depends on the threshold $K$ parameters. If $K_{xz1} < K_{xz2}$, then $p_{z1}$ will begin to be expressed earlier than $p_{z2}$ (Figure 7.18).

![Figure 7.18](https://drive.google.com/uc?export=view&id=1th9lYEs1Ghhh9LffUcGUFnU40XAhiCFI)

> **Figure 7.18. The output of the multi-node feed-forward loop in Figure
7.17, which exhibits “first in, first out” dynamics.** Due to the OR
relationship between the transcription factors, the expression of $p_x$, which
precedes $p_y$ expression, controls the initial induction of $p_{z1}$ (black) and $p_{z2}$
(red). $p_{z1}$ is expressed before $p_{z2}$ because $K_{xz1} < K_{xz2}$. The decrease in $p_y$
expression occurs after $p_x$ decreases, so $p_y$ controls the decreases in $p_{z1}$ and
$p_{z2}$. Here, $K_{yz1} > K_{yz2}$, and so $p_{z1}$ expression decreases before $p_{z2}$ expression
does. $p_{z1}$ levels go up first, then $p_{z2}$ levels rise; since the levels of $p_x$ and $p_y$
decrease, once again $p_{z1}$ levels decrease before $p_{z2}$ levels do. Modified from
Alon, U. *An Introduction to Systems Biology: Design Principles of
Biological Circuits.* Chapman & Hall/CRC, 2007. Reproduced with
permission of TAYLOR & FRANCIS GROUP LLC in the format Republish
in a book via Copyright Clearance Center.

Look at the right side of Figure 7.18 to find the requirements for “first out,” or in our case, $p_{z1}$’s expression dropping before that of $p_{z2}$. Notice that the removal of $p_x$ from the system does not affect $p_{z1}$ or $p_{z2}$ due to the OR relationship that is encoded into the expression of both of the corresponding genes. Instead, it is the removal of $p_y$ after $p_x$ has already begun to be lost (and passed the $K_{xy}$ threshold) that determines when $p_{z1}$ and $p_{z2}$ begin to be removed from the system. In this case, the requirement for $p_{z1}$ to be removed first is reflected by the fact that $K_{yz1} > K_{yz2}$. Taken together, these parameters lead to first in, first out expression.
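
The conditions above can be checked numerically. The sketch below integrates the $p_x$ and $p_y$ equations from the solution with a simple forward Euler scheme and records when the OR-gated production term for each target switches on and off. Several things here are assumptions of mine: production of $p_x$ is taken to stop when the signal is removed at `t_signal_off`, $p_y$ is given slower kinetics than $p_x$ so that it outlasts $p_x$, and all parameter values are illustrative. The function name `simulate_ffl` is hypothetical.

```python
def simulate_ffl(t_signal_off=30.0, t_end=50.0, dt=0.001):
    """Return the times at which production of z1 and z2 turns on and off."""
    # Illustrative rate constants; p_y is slower than p_x (an assumption).
    k_xprod, k_xdeg = 1.0, 1.0
    k_yprod, k_ydeg = 0.2, 0.2
    K_xy = 0.7
    K_xz = {"z1": 0.2, "z2": 0.6}   # K_xz1 < K_xz2  ("first in" for z1)
    K_yz = {"z1": 0.6, "z2": 0.2}   # K_yz1 > K_yz2  ("first out" for z1)

    p_x = p_y = 0.0
    active = {"z1": False, "z2": False}
    on_time, off_time = {}, {}

    for i in range(int(round(t_end / dt))):
        t = i * dt
        # OR gate: either transcription factor above its threshold suffices.
        for z in active:
            now_active = p_x > K_xz[z] or p_y > K_yz[z]
            if now_active and not active[z]:
                on_time.setdefault(z, t)
            if active[z] and not now_active:
                off_time[z] = t
            active[z] = now_active
        # Forward Euler update of the two transcription factors.
        x_production = k_xprod if t < t_signal_off else 0.0
        y_production = k_yprod if p_x > K_xy else 0.0
        p_x += dt * (x_production - k_xdeg * p_x)
        p_y += dt * (y_production - k_ydeg * p_y)
    return on_time, off_time


on_time, off_time = simulate_ffl()
print("on: ", on_time)
print("off:", off_time)
```

With these values, production of $p_{z1}$ switches on first and also switches off first. Swapping the two $K_{yz}$ values should reverse the shut-off order while leaving the induction order unchanged, which is a quick way to see that the two sets of thresholds control the two ends of the response separately.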

## **Section 7.12. An example of a multi-gene feed-forward loop: flagellar biosynthesis in *E. coli***

Now we’ve identified the network structure (multi-node feed-forward loop) and the
conditions (OR interactions at the promoter, and certain relationships between the
threshold parameters) that lead to first in, first out expression of target genes. But is this
kind of behavior ever exhibited by live *E. coli*?

An impressive real example that produces similar dynamics occurs during construction of
the bacterial flagellum, the long “tail” that *E. coli* uses to propel itself through its liquid
environment. When *E. coli* that grow flagella (some lab strains don’t) are placed in an
environment without much food, they grow flagella so that they can look for more
nutrient-rich environments. This system has been very well characterized; we know
which genes encode all of the parts of the flagellum (Figure 7.19A). As we saw with the
arginine biosynthesis single-input module (Figure 7.15), the flagellar genes are expressed
in the order in which they are required (Figure 7.19B). However, in this case, two
transcription factors regulate the expression of the flagellar biosynthesis genes: FlhDC
and FliA. Furthermore, as you saw in Section 7.8, FliA expression is also regulated by
FlhDC. The resulting multi-node feed-forward loop is shown in Figure 7.19C.

![Figure 7.19](https://drive.google.com/uc?export=view&id=1HipdIWX4TbY-KbIQR-a-vWppKEL1vQv6)

> **Figure 7.19. Regulation of the flagellar biosynthesis genes via a multi-node
feed-forward loop driven by FlhDC and FliA.** (A) The genes are linked to the
part of the flagellum that they encode. (B) Time course of gene expression for all of
the genes involved in the circuit. Promoter activity was monitored by transcriptional
fusions to GFP. As in Figure 7.15, locating the inflection point facilitates
visualization of the order of gene expression. (C) Schematic of the regulation of the system. A “+” indicates that only the first gene in the operon is listed. Modified
from Kalir, S., McClure, J., Pabbaraju, K., Southward, C., Ronen, M., Leibler, S.,
Surette, M. G., Alon, U. Ordering genes in a flagella pathway by analysis of
expression kinetics from living bacteria. Science. 2001. 292: 2080-3. Reprinted
with permission from AAAS.

Alon’s group was particularly interested in the structure of this multi-gene feed-forward
loop and subjected it to intense experimental scrutiny (see Kalir et al. 2001 in
Recommended Reading). They verified that FliA regulated the expression of even the
early-expressed genes (previously thought to be only under FlhDC control), probed the
nature of the interaction between FlhDC and FliA at the promoter regions (a SUM
relationship, which is very similar to, but slightly more complex than, an OR
relationship), and determined the dynamics of activation for FlhDC, FliA, and the target
transcription units.

Those dynamics can be seen in Figure 7.20. The activity of each transcription factor
(corresponding roughly to ${p_x}^*$ and ${p_y}^*$ in Practice Problem 7.2) is measured by the
expression of GFP, whose gene is fused to a promoter that is under the sole control of either
FlhDC or FliA (Figure 7.20A). The other seven promoter activities (Figure 7.20B) are
taken from the endogenous flagellar biosynthesis promoters, also linked to GFP
expression.

![Figure 7.20](https://drive.google.com/uc?export=view&id=1YPNC5KGz0ajrV6uhY1wVCxxdDzaN2Xza)

> **Figure 7.20. Dynamics from the flagellar biosynthesis network, determined
experimentally.** See Figure 7.19C for the regulatory schematic of this system. (A)
Black line, the abundance of the fluorescent reporter fused to a mutated *fliL* promoter
that does not bind FliA reflects the activity of the *flhDC* promoter. Grey line,
fluorescent protein abundance when the reporter is fused to a promoter that is only
responsive to FliA. (B) The activities of the various promoters change as the network
shifts from FlhDC-dominated to FliA-dominated. OD stands for optical density, a
measure of how much light passes through a bacterial culture; the more bacteria that
have grown over time, the denser the solution. Promoter activity is normalized by
dividing the measured promoter fluorescence by the OD. Normalization controls for
the fact that fluorescence increases not only as a result of promoter activity, but also
as a by-product of cell growth. Kalir, S. and Alon, U. Using a quantitative blueprint
to reprogram the dynamics of the flagella gene network. *Cell.* 2004. **117**:713-20.
Reprinted with minor modifications with permission from Elsevier.

Notice that FliA activity increases sharply after FlhDC reaches its full activity. This
makes sense, because FliA expression depends on FlhDC. Now look at the endogenous
promoter readouts in Figure 7.20B. During the first phase of expression, when FlhDC is
the predominantly active transcription factor, the promoter activities differ within an
order of magnitude. We already saw in Figure 7.19B that these genes are expressed in a
specific order. However, once FliA becomes the predominant transcription factor, some
promoter activities increase and others decrease, such that all of the promoters have the
same activity (Figure 7.20B).

In other words, these activities do not quite lead to “first-in, first-out” expression
patterns; it’s more like “first-in” followed by a constant rate of maintenance across the
transcription units. Such a pattern is interesting in its own right, and would not be
possible with a single-input module. In any event, the coordination of two transcription
factors can produce dynamics that are quite complex!

## **Section 7.13. Other regulatory motifs**

Two other motifs were found to be overrepresented in the *E. coli* and yeast transcriptional
regulatory networks: the **bi-fan motif** (Figure 7.21A) and the **dense overlapping region** (Figure 7.21B). The bi-fan motif was the only overrepresented four-node motif; it occurs when two transcription factors each regulate the same two target genes. The dense
overlapping region is a generalization of this four-node motif in which a set of
transcription factors regulates a common set of target genes. For this motif, it’s not
strictly necessary that all of the transcription factors regulate all of the target genes, only
that every transcription factor regulates, or is regulated by, more than one factor or gene, and that all of the transcription factors are connected. The properties of these last two motifs haven’t been investigated in detail, but I mention them here for completeness. Maybe
characterizing them is work for you to do someday!

![Figure 7.21](https://drive.google.com/uc?export=view&id=1QVJLu5ofXOwVCog8F9qriEnZ9lYtXcvL)

> **Figure 7.21. The last two motifs found to be overrepresented in *E. coli* and yeast.**
(A) A bi-fan motif. (B) A dense overlapping region.
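
The definition of the bi-fan (two transcription factors that each regulate the same two target genes) translates directly into a search over a list of regulatory edges. The sketch below is only an illustration: the toy network and the function name `find_bifans` are hypothetical, and it ignores the sign of each interaction.

```python
from itertools import combinations


def find_bifans(edges):
    """List (tf1, tf2, gene1, gene2) tuples in which both TFs regulate both genes."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)

    bifans = []
    for tf1, tf2 in combinations(sorted(targets), 2):
        # Shared targets, excluding the two regulators themselves.
        shared = sorted((targets[tf1] & targets[tf2]) - {tf1, tf2})
        for gene1, gene2 in combinations(shared, 2):
            bifans.append((tf1, tf2, gene1, gene2))
    return bifans


# A toy network of (regulator, target) pairs, invented for illustration.
edges = [
    ("X1", "Z1"), ("X1", "Z2"),
    ("X2", "Z1"), ("X2", "Z2"), ("X2", "Z3"),
]
bifans = find_bifans(edges)
print(bifans)  # [('X1', 'X2', 'Z1', 'Z2')]
```

Counting how often such tuples appear in the real network, compared with randomized networks, is the kind of comparison used to call a motif overrepresented.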

Now we’ve covered essentially all of the motifs that have been identified in a
transcriptional regulatory network! I hope you’ve come to appreciate how dynamics can
change with network structure, and that you’ve obtained a sense of how you could put
many of these motif models together to begin assembling networks representative of the
whole cell.

## **Chapter Summary**

To move toward whole-cell modeling, we need to progress from the simple regulatory
circuits we considered in the earlier chapters of the book to more complex regulatory
circuits. Motifs that occur significantly more often in the *E. coli* and yeast transcriptional
regulatory networks (as compared to randomly generated networks) have been identified:
autoregulation, the feed-forward loop, the single-input motif, multi-gene feed-forward
loops, bi-fans, and dense overlapping regions.

Feed-forward loops are motifs in which a gene’s expression is regulated by two
transcription factors, one of which is also regulated by the other. Such loops can be
internally consistent, or coherent, if both arms of the loop exert the same type of
regulation (are either both positive or both negative). Transcription factors can interact in
various ways to regulate gene expression; we considered AND and OR relationships, and
mentioned the slightly more complicated SUM relationship.

We modeled the dynamics of three instances of feed-forward loops, primarily using the
Boolean approaches developed in Chapter 2. First, we modeled a coherent feed-forward
loop in which all regulation was positive and there was an AND relationship between the
transcription factors. Expression of the target gene was delayed (relative to the
expression of the regulated transcription factor) during induction, but not during the
repression of expression. A similar feed-forward loop (only differing by an OR
interaction) exhibited a delay in expression change during repression but not induction.
We used the analogy of controlling an elevator door’s opening and closing to suggest
how such direction-sensitive delays could be useful to the cell (such utility has not been
established to date). Using ODEs, we derived an equation for the delay time, and showed
that these feed-forward loops also exhibit robustness to short perturbations. The third instance of a feed-forward loop was incoherent: one arm regulated target gene expression
positively and the other negatively. The most common type of incoherent feed-forward
loop in *E. coli* produces short pulses of target gene expression. Satisfyingly, examples of
all three of these feed-forward loops have been investigated in *E. coli*, and exhibit the dynamics predicted by theory.

The single-input motif has a single transcription factor that regulates the expression of
many transcription units, often including its own gene. Using an ODE-based approach,
we demonstrated that such networks can exhibit just-in-time dynamics in which gene
expression is induced sequentially rather than simultaneously. Just-in-time expression
dynamics have been observed in *E. coli* in metabolic biosynthesis pathways, where it
could be advantageous to produce the next metabolite in the pathway at sufficient
concentrations before expressing the enzyme that can bind and convert it to something
else.

According to our simple model, the first protein expressed in the single-input motif is
destined to be the last protein remaining after repression of expression. This first-in, last-out
expression may be useful in some cases, but intuition suggests that first-in, first-out
dynamics, which can be displayed by multi-node feed-forward loops under certain
conditions, may be more useful to the cell. Although first-in, first-out dynamics have not
been observed to my knowledge, we did consider a related case in the *E. coli* flagellar
biosynthesis transcriptional regulatory network.

**Recommended Reading**


* Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC, 2007.

* Baltimore, D. Our genome unveiled. Nature. 2001. 409:814-16.

* Davidson, E. H. The Regulatory Genome: Gene Regulatory Networks in Development
and Evolution. Academic Press, 2006.

* Human Genome Project Information,
http://web.ornl.gov/sci/techresources/Human_Genome/home.shtml. This website, from
the Oak Ridge National Laboratory and the Department of Energy, contains information
about the Human Genome Project and associated issues spanning bioethics, medicine,
education, and progress after the publication of the human genome sequence.

* Kalir, S., Alon, U. Using a quantitative blueprint to reprogram the dynamics of the
flagella gene network. Cell. 2004. 117:713-20.

* Kalir, S., Mangan, S., Alon, U. A coherent feed-forward loop with a SUM input function
prolongs flagella expression in Escherichia coli. Molecular Systems Biology. Epub 2005
Mar 29. doi: 10.1038/msb4100010

* Kalir, S., McClure, J., Pabbaraju, K., Southward, C., Ronen, M., Leibler, S., Surette, M.
G., Alon, U. Ordering genes in a flagella pathway by analysis of expression kinetics
from living bacteria. Science. 2001. 292: 2080-3.

* Mangan, S., Itzkovitz, S., Zaslaver, A., Alon, U. The incoherent feed-forward loop
accelerates the response-time of the gal system of Escherichia coli. Journal of Molecular
Biology. 2006. 356, 1073-1081.

* Mangan, S., Zaslaver, A., Alon, U. The coherent feedforward loop serves as a sign-sensitive
delay element in transcription networks. Journal of Molecular Biology. 2003. 334(2): 197-204.

* Rosenfeld, N., Elowitz, M. B., Alon, U. Negative autoregulation speeds the response
times of transcription networks. Journal of Molecular Biology. 2002. 323(5): 785-93.

* Shen-Orr, S. S., Milo, R., Alon, U. Network motifs in the transcriptional regulation
network of Escherichia coli. Nature Genetics. 2002. 31(1):64-8.

* Zaslaver, A., Mayo, A. E., Rosenberg, R., Bashkin, P., Sberro, H., Tsalyuk, M., Surette,
M. G., Alon, U. Just-in-time transcription program in metabolic pathways. Nature
Genetics. 2004. 36(5): 486-91.

## **Problems**

### **Problem 7.1. A coherent feed-forward loop with repression**

In Section 7.4, we examined the coherent type-1 feed-forward loop, which is the most abundant type of feed-forward loop in biological networks. Here, let’s use the notation we developed in Section 7.4 to look at a different type of coherent feed-forward loop in which ${p_x}^*$ induces $p_y$ expression, but both ${p_x}^*$ and $p_y^*$ repress $p_z$ expression. Lumping expression and activation of $p_x$ and $p_y$ into single equations, the Boolean rules that describe this circuit are:



$\text{Activation}_{px} = \text{IF } (s_x)$

$\text{Activation}_{py} = \text{IF } ({p_x}^*) \text{ AND } (s_y)$

$\text{Expression}_{pz} = \text{IF NOT } ({p_x}^*) \text{ AND NOT } ({p_y}^*)$

${p_x}^* = \text{IF } (\text{Activation}_{px}) \text{ AFTER SOME TIME}$

${p_y}^* = \text{IF } (\text{Activation}_{py}) \text{ AFTER SOME TIME}$

$p_z = \text{IF } (\text{Expression}_{pz}) \text{ AFTER SOME TIME}$

**(a)** Draw a diagram of the feed-forward loop for this system.

**(b)** Construct a state diagram for this motif, for the case in which $s_x = 1$ and $s_y = 1$. Start from $p_x^* = p_y^* = p_z = 0$ and fill in the rows in your matrix until you reach a stable state.

**(c)** What happens when you remove $s_x$ from the system in (b)? Add another column to your state diagram to answer this question.

**(d)** Use your matrix from (b) to graph the time dynamics of $p_x^*$, $p_y^*$, and $p_z$ in the feed-forward loop. Begin with the addition of $s_x$ and $s_y$, graph the progression of the system until it reaches a steady state, then remove $s_x$ and follow the system to the new steady state. Are there delays in the dynamics of $p_z$?


**(e)** Repeat your state-diagram analysis from (b), but change the motif so that the rule for $\text{Expression}_{pz}$ contains an OR instead of an AND gate. Are the dynamics of $p_z$ delayed now?








### **Problem 7.2: A synthetic edge-detection system**

Here, we will analyze a synthetic transcriptional regulatory circuit designed to detect the
edges of an image of light and produce a dark pigment in response (inspired by an actual
circuit described in Tabor, J. J., Salis, H. M., Simpson, Z. B., et al. A synthetic genetic
edge detection program. Cell. 2009. **137**:1272-81). The circuit takes two signals
as inputs: Darkness and Light. There are three proteins, $p_x$, $p_y$, and $p_z$, all of which also
have active forms. The Boolean rules that describe the system are:

$\text{Activation}_{p_x} = \text{IF (Darkness)}$

$\text{Activation}_{p_y} = \text{IF (}p_x^* \text{) AND (Light)}$

$\text{Expression}_{p_z} = \text{IF } p_x^* \text{ AND } p_y^*$

$p_x^* = \text{IF (Activation}_{p_x} \text{) AFTER SOME TIME}$

$p_y^* = \text{IF (Activation}_{p_y} \text{) AFTER SOME TIME}$

$p_z = \text{IF (Expression}_{p_z} \text{) AFTER SOME TIME}$

$\text{Pigment} = \text{IF (} p_z \text{)}$


**(a)** Draw a diagram of the feed-forward loop in this system.  


**(b)** Draw the state matrix for the system. Note that in this case, the system must have at least one of the signals (don’t make a column for Darkness = Light = 0). Circle any stable states.

**(c)** For the schematic below, indicate which numbered regions would produce Pigment. Justify your answer.

![Schematic for Problem 7.2(c)](https://drive.google.com/uc?export=view&id=1EDcFnMD9vjtuFdBM4XCnVqy9trvHHo1a)

### **Problem 7.3: Advanced analysis of an incoherent feed-forward loop**




Let's examine one of the most common regulatory motifs, the incoherent feed-forward loop shown below. Your goal is to analyze this system, first using ODEs (Chapter 3) and then using a stochastic simulation (Chapter 6). We will assume that $p_x$ and $p_y$ are always active (signals $s_x$ and $s_y$ are always present). We will also assume that the expression of $p_x$ depends on $s_x$ such that production rate = max production rate $\cdot s_x = \beta_x \cdot s_x$. The Boolean rules for this system are:


$$
p_x = \text{IF } s_x > 0
$$

$$
p_y = \text{IF } p_x > K_{xy}
$$

$$
p_z = \text{IF } p_x > K_{xz} \text{ AND NOT (} p_y > K_{yz} \text{)}
$$

Constants are defined as:
$$
\beta_x = \text{maximal production rate of } p_x \\
\beta_y = \text{maximal production rate of } p_y \\
\beta_z = \text{maximal production rate of } p_z \\
\alpha_x = \text{rate of dilution/degradation of } p_x \\
\alpha_y = \text{rate of dilution/degradation of } p_y \\
\alpha_z = \text{rate of dilution/degradation of } p_z
$$

Assume that you start with no $p_x$, $p_y$, or $p_z$.


**(a)** Write an ODE for the change in $[p_x]$ over time.

**(b)** Write an ODE for the change in $[p_y]$ over time.


**(c)** Assume that $s_x$ is always present at a maximum value of 1, and that the levels of $p_x$ are higher than the threshold for $p_y$ expression ($[p_x] > K_{xy}$). Write the analytical solution for $[p_y](t)$ for the ODE you wrote in (b).


**(d)** Assume that $[p_x] > K_{xy}$ and $[p_x] > K_{xz}$, but initially $[p_y] < K_{yz}$. Write the ODE for $[p_z]$ and solve it analytically for $[p_z](t)$.

**(e)** Continue to assume the same level of $p_x$ ($[p_x] > K_{xy}$ and $[p_x] > K_{xz}$), but the level of $p_y$ is now high enough to stop the expression of $p_z$ ($[p_y] > K_{yz}$). Write an ODE for the levels of $p_z$ over time for this case.

**(f)** Use `Python` and all of the expressions above to plot the levels of $p_x$, $p_y$, and $p_z$ over time. Use the following information:

$$
s_x = 0 \text{ for } -1 \leq t < 0 \\
s_x = 1 \text{ for } 0 \leq t < 10
$$

Constants:
$$
\beta_x = 1 \text{ mM/h} \\
\beta_y = 1 \text{ mM/h} \\
\beta_z = 1 \text{ mM/h} \\
\alpha_x = 1 \text{ /h} \\
\alpha_y = 1 \text{ /h} \\
\alpha_z = 1 \text{ /h} \\
K_{xz} = 0.4 \text{ mM} \\
K_{xy} = 0.4 \text{ mM} \\
K_{yz} = 0.5 \text{ mM}
$$

Initially, $p_x = p_y = p_z = 0$ mM.

**(g)** Previously, we assumed that the signal $s_x$ was available for full activation. Now, let’s consider sub-maximal signaling by $s_x$. Using the rate constants and activation thresholds in (f), plot the dynamics of $p_z$ for $s_x = [0.1:0.1:1]$ (that is, $s_x$ from 0.1 to 1 in steps of 0.1). What behavior do you see in the response of $p_z$ to different levels of $s_x$? What function might this behavior serve in a living cell?


**(h)** Increase $K_{xy}$ to 0.8 and run your simulation again for $s_x = [0.1:0.1:1]$. Now how does $p_z$ expression behave in response to various levels of $s_x$? How might this type of regulation be useful for biological systems?


**(i)** Build and fill a state diagram for the network and circle the stable states. Does this analysis agree with your results from (f)? Why or why not?


**(j)** Finally, let’s examine this network using a stochastic simulation. Again, we assume that $p_x$ and $p_y$ will always be active ($s_x$ and $s_y$ are abundant). The reactions are:

$$
\text{px expression: } DNA_x \rightarrow DNA_x + p_x \\
\text{py expression: } DNA_y + p_x \rightarrow DNA_y + p_x + p_y \\
\text{pz expression: } DNA_z + p_x \rightarrow DNA_z + p_x + p_z \\
\text{py:DNA association: } DNA_z + p_y \rightarrow p_y{:}DNA_z \\
\text{py:DNA dissociation: } p_y{:}DNA_z \rightarrow DNA_z + p_y \\
\text{px degradation: } p_x \rightarrow 0 \\
\text{py degradation: } p_y \rightarrow 0 \\
\text{pz degradation: } p_z \rightarrow 0
$$

Let the stochastic rate constants be:
$$
c_{px\text{Expression}} = 0.8/(\text{molecules } DNA_x \cdot \text{ min}) \\
c_{py\text{Expression}} = 0.8/(\text{molecules } DNA_y \cdot \text{ min}) \\
c_{pz\text{Expression}} = 0.8/(\text{molecules } DNA_z \cdot \text{ min}) \\
c_{py\text{BindingDNAz}} = 0.8/(\text{molecules } p_y \cdot \text{molecules } DNA_z \cdot \text{ min}) \\
c_{py\text{UnbindingDNAz}} = 0.4/(\text{molecules } py:DNA_z \text{ complex} \cdot \text{ min}) \\
c_{px\text{Degradation}} = 0.001/(\text{molecules } p_x \cdot \text{ min}) \\
c_{py\text{Degradation}} = 0.001/(\text{molecules } p_y \cdot \text{ min}) \\
c_{pz\text{Degradation}} = 0.001/(\text{molecules } p_z \cdot \text{ min})
$$

Assume that you begin with no $p_x$, $p_y$, or $p_z$, but you have one molecule each of $DNA_x$, $DNA_y$, and $DNA_z$. Use `Python` to write a Gillespie algorithm and run it for 500,000 steps (you may choose to only store every tenth timepoint). Describe the system’s behavior and interpret it in biological terms.
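As a starting point, the reaction list above can be encoded as a table of (rate constant, reactants, state change) entries and stepped with the direct Gillespie method. The sketch below is one possible layout, not a definitive implementation. The names `REACTIONS`, `gillespie`, and `DNAz_py` (the $p_y$:$DNA_z$ complex) are hypothetical, and each propensity is assumed to be the rate constant multiplied by the product of the reactant counts exactly as the reactions are written; adjust this if your interpretation of the rate constants differs.

```python
import math
import random

# (stochastic rate constant per min, reactant species, change in molecule counts)
REACTIONS = [
    (0.8,   ("DNAx",),       {"px": +1}),                              # px expression
    (0.8,   ("DNAy", "px"),  {"py": +1}),                              # py expression
    (0.8,   ("DNAz", "px"),  {"pz": +1}),                              # pz expression
    (0.8,   ("DNAz", "py"),  {"DNAz": -1, "py": -1, "DNAz_py": +1}),   # py:DNA association
    (0.4,   ("DNAz_py",),    {"DNAz": +1, "py": +1, "DNAz_py": -1}),   # py:DNA dissociation
    (0.001, ("px",),         {"px": -1}),                              # px degradation
    (0.001, ("py",),         {"py": -1}),                              # py degradation
    (0.001, ("pz",),         {"pz": -1}),                              # pz degradation
]


def gillespie(n_steps=500_000, store_every=10, seed=None):
    """Direct-method Gillespie simulation; returns (times, list of state dicts)."""
    rng = random.Random(seed)
    # One copy of each gene, no protein, no repressor bound to DNA_z.
    state = {"DNAx": 1, "DNAy": 1, "DNAz": 1, "DNAz_py": 0,
             "px": 0, "py": 0, "pz": 0}
    t = 0.0
    times, history = [t], [dict(state)]

    for step in range(1, n_steps + 1):
        # ASSUMPTION: propensity = c * product of reactant molecule counts.
        props = [c * math.prod(state[s] for s in reactants)
                 for c, reactants, _ in REACTIONS]
        total = sum(props)
        if total == 0:
            break  # no reaction can fire

        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)

        # Choose which reaction fires, weighted by its propensity.
        idx = rng.choices(range(len(REACTIONS)), weights=props)[0]
        for species, delta in REACTIONS[idx][2].items():
            state[species] += delta

        # Store only every `store_every`-th step to limit memory use.
        if step % store_every == 0:
            times.append(t)
            history.append(dict(state))

    return times, history


if __name__ == "__main__":
    times, history = gillespie(n_steps=5000, seed=0)
    print("final time (min):", times[-1], "final state:", history[-1])
```

Keeping the reactions in a table means that changing a single rate constant requires editing only one entry. A fixed `seed` makes a run reproducible, which helps when comparing parameter sets.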

**(k)** What happens if you change the relative protein activities such that $p_x$ promotes the production of $p_y$ with ten-fold less affinity than it does for $p_z$? (Hint: change a rate constant.) Describe the resulting behavior, and compare your results qualitatively with your results from (g), (i), and (j).

### **Problem 7.4: A single-input module**

Consider the following metabolic pathway, which converts glutamate to ornithine in *E.
coli* (slightly modified from Figure 7.15):

![Figure 7.14](https://drive.google.com/uc?export=view&id=1gPTvvDrnbA4svtrSpYFDZbYHhpO4EBVE)

The genes in this pathway are all regulated by the transcription factor ArgR via the
single-input motif depicted at right.

**(a)** List two main benefits to the cell of regulating all of these genes with ArgR.

**(b)** Assume that when the signal that activates ArgR ($s_{\text{ArgR}}$) is present, the change in
active ArgR ($\text{ArgR}^*$) can be defined as:

$$
\frac{d[\text{ArgR}^*]}{dt} = \beta_{\text{ArgR}} - \alpha_{\text{ArgR}} [\text{ArgR}^*]
$$

The change in all of the other gene products can be defined as:

$$
\frac{d[\text{Arg}_i]}{dt} = \beta_i \theta \left( [\text{ArgR}^*] > K_{\text{ArgR}, \text{Arg}_i} \right) - \alpha_i [\text{Arg}_i]
$$

where $i = \{ \text{A}, \text{B}, \text{C}, \text{D}, \text{E} \}$. Given that $s_{\text{ArgR}}$ is added at $t = 0$ min and removed at $t = 7$ min, find values for all $K_{\text{ArgR}, \text{Arg}_i}$ such that the $\text{Arg}_i$ genes will be induced at equal time intervals within the range $[0, t([\text{ArgR}^*] = [\text{ArgR}^*_{\text{ss}}]/2)]$, where ss denotes steady state. Note that the gene with the highest threshold parameter will obey $K_{\text{ArgR}, \text{Arg}_i} = [\text{ArgR}^*_{\text{ss}}]/2$. Use values of 1/min and 1 $\mu$M/min for all $\alpha$ and $\beta$ terms, respectively.
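The link between an induction time and a threshold can be sketched in a few lines of Python. The example assumes that $[\text{ArgR}^*] = 0$ at $t = 0$, so that the ODE above has the solution $[\text{ArgR}^*](t) = (\beta/\alpha)(1 - e^{-\alpha t})$, and it reads "equal time intervals" as induction times $t_i = i \, t_{1/2}/n$ for $i = 1, \dots, n$, where $t_{1/2}$ is the time to reach half of steady state. Both are assumptions, and all function names are hypothetical.

```python
import math


def argr_active(t, beta=1.0, alpha=1.0):
    """[ArgR*] in uM at time t (min) after the signal is added.

    ASSUMPTION: [ArgR*] = 0 at t = 0.
    """
    return (beta / alpha) * (1.0 - math.exp(-alpha * t))


def time_to_reach(level, beta=1.0, alpha=1.0):
    """Time (min) at which [ArgR*] first reaches `level` (< beta/alpha)."""
    return -math.log(1.0 - level * alpha / beta) / alpha


def evenly_spaced_thresholds(n_genes=5, beta=1.0, alpha=1.0):
    """Thresholds that induce n_genes at equal intervals up to t_half.

    ASSUMPTION: induction times are t_half * i / n_genes for i = 1..n_genes,
    so the last gene switches on exactly at half of steady state.
    """
    t_half = time_to_reach(0.5 * beta / alpha, beta, alpha)
    times = [t_half * i / n_genes for i in range(1, n_genes + 1)]
    return [argr_active(t, beta, alpha) for t in times]


if __name__ == "__main__":
    for gene, k in zip("ABCDE", evenly_spaced_thresholds()):
        print(f"K_ArgR,Arg{gene} = {k:.3f} uM")
```

A quick consistency check is that the largest threshold returned equals half of the steady-state level, as the note in (b) requires.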

**(c)** Use `Python` with the above information to plot the activity of ArgR* as well as
the expression of all Arg proteins over the time range [0:14] minutes (choose the
most appropriate time step). What can you say about the timing of the decay of
the various genes, as compared to the timing of induction? Specifically describe
the order and the duration of the decays.