# 1. Introduction to biological circuit design

<hr>

**Key concepts**

- Biological circuits:
    - We focus on biological circuits, defined as sets of interacting molecular components that work together to control cellular behaviors.
    - Circuit design principles relate a feature of a circuit to the function it provides for the cell. They often take the form *Feature X enables Function Y*.
    - Design principles can help explain why circuits use one design or architecture instead of another.  
    
- Methods:
    - Ordinary differential equations for protein production and removal allow analysis of simple gene expression processes.
    - Separation of time scales simplifies circuit analysis.
    - Gene regulation circuits can be analyzed in terms of binding of activators and repressors to binding sites.
    - Demand theory relates the fraction of time a gene is expressed to its mode of regulation.
    - Executable Jupyter notebooks (like this one) enable interactive exploration of key concepts.
<hr>

In [1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
# ------------------------------

import numpy as np
import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()

<hr>

## Biological circuit design

The living cell is a device unlike any other: It can sense its environment, search out nutrients, avoid threats, control its own division and growth, and keep track of time. Individual cells can switch into a variety of different states, coordinate with other cells to build complex multicellular tissues and organs (even brains!), develop into social organisms, and generate immune systems that can patrol those organisms to repair damage and destroy pathogens. Most of the time, they do these things reliably, with minimal external help, and without complaining. How are these incredible behaviors programmed within the cell? Can we understand these programs well enough to predict, control, or even reprogram cellular behaviors? 

This course seeks to establish key concepts, approaches, and examples that together provide a foundation for addressing these critical biological challenges. To do so, we focus on the **circuit** level. Circuits represent one of the most fascinating levels of cellular organization, midway between inanimate chemistry and a fully living cell. More specifically, we will analyze **molecular circuits** of interacting genes, proteins, and other biomolecules operating within cells, as well as **cell circuits** in which cell types interact through signals. As we explore different circuits, we will approach them from a **design** point of view, seeking to understand the tradeoffs among alternative designs and suggest reasons for using one design rather than another.


### Beyond molecular biology

Historically, the framework of molecular biology became dominant in the 1950s, with the central idea that we could understand biological phenomena in terms of the molecular structure and biochemical function of DNA, mRNA, and other molecules. ("The Eighth Day of Creation" by Horace Freeland Judson provides a riveting history of molecular biology.) Despite the focus on individual molecules, it was clear to the pioneers of molecular biology that all of these molecules functioned together in circuits. For example, in a 1962 essay, speaking about the organization of transcriptional regulatory systems, Jacob and Monod wrote, 

> It is obvious from the analysis of these [bacterial genetic regulatory] mechanisms that their known elements
could be connected into a wide variety of ‘circuits’ endowed with any desired degree of stability. *Jacob F, Monod J, "On the regulation of gene activity," Symposium on cellular regulatory mechanisms, 1962*. 

Around the 2000s, interest in the structure and function of circuits began to explode, fueled in part by new technologies that improved the ability to make quantitative, high-throughput measurements of biological systems. The new field took the name *systems biology*. 


<div style="width: 600px; margin: auto;">

![Hanahan and Weinberg](figs/Hanahan-Weinberg-2000.png)

</div>

*This image of the cell as a set of circuits comes from a classic review of cancer ([Hanahan and Weinberg, _Cell_, 2000](https://doi.org/10.1016/S0092-8674%2800%2981683-9)).*


### What is a biological circuit?

For this course, we will think about at least two levels of biological circuitry: 

**Molecular circuits**  consist of molecular species (genes, proteins, etc.) that interact with one another in specific ways. For example, a given gene can be transcribed to produce a corresponding mRNA, which can in turn be translated to produce a specific protein. That protein may be a repressor that turns off expression of a different gene, or even its own gene. Similarly, a kinase may phosphorylate a specific target protein, altering its ability to catalyze a reaction or modify another protein. The molecular specificity of these interactions—analogous to wires in electronics—is the key property that enables them to form molecular circuits. 

One level up, we will also consider **cell circuits**. In this case, cells in different states, of different types, or even from different species signal to one another to control each other's growth, death, proliferation, and differentiation. The key variables in these circuits are the population sizes and spatial arrangements of each type of cell. For example, the immune system, in which different cell types influence each other's proliferation and differentiation through cytokines and other signals, represents a collection of complex, interconnected cell circuits.

The two levels are not independent. The behavior of each cell type within a cell circuit is determined by the structure of its  molecular circuits. 

At either level, we can also distinguish between ***natural* circuits** and ***synthetic* circuits.** The former are discovered in microbes, plants, and animals, while the latter are designed and engineered in cells out of well-characterized, re-engineered, or de novo designed genes, proteins, and other molecular components. Here, we take the point of view that natural and synthetic circuits, despite their different origins, should share a common set of design principles, many yet to be discovered. Treating them in a unified way is helpful, because natural circuits can provide design inspiration for synthetic circuits, while the process of engineering synthetic circuits can provoke unexpected questions about natural circuits. 


### Biological circuit design

**Design problems** emerge whenever one can build many different products from arrangements of the same elements. Electronic circuits are composed of a handful of different kinds of elements (transistors, resistors, capacitors, etc.) that can be connected in different ways to produce circuits with different properties. Design requires comparing different circuit designs that may appear to perform similar functions, but in fact exhibit different tradeoffs, e.g. between power and performance, or speed and precision. Design problems are also prevalent outside of science and engineering. For example, to make a movie poster one has to choose and arrange graphical elements in relation to one another. 

We now have much information about the molecular components of cells (genes, RNAs, proteins, metabolites, and many other molecules) and their interactions. In many cases, we know the sites at which transcription factors bind genome-wide, which proteins chemically modify which others, and which proteins function together in complexes. It might seem as if we ought to already be able to understand, predict, and control cellular circuits with great precision. However, fundamental questions about the designs of these circuits still remain unclear. For example:

* What capabilities does each circuit provide for the cell? (function, design principles)
* How do these capabilities emerge from circuit architecture? (mechanism)
* How can we control cells in predictable ways using these circuits? (biomedical applications)
* How can we use circuit design principles to program predictable new behaviors in living cells? (synthetic biology and bioengineering)

In this course, we will address these questions for both *natural* and *synthetic* circuits, with the idea that the principles that allow a circuit to function effectively within or among cells do not necessarily depend on whether that circuit evolved naturally or is constructed in the lab. Having said that, we also recognize that evolution may be able to produce designs that are more complex or different from those we are currently able to construct, or even conceive. In fact, a major goal of the course is to see to what extent we can learn principles from natural circuits that will allow us to design synthetic circuits more effectively.


<div style="width: 600px; margin: auto;">

![electronics and plumbing](figs/electronics_software_plumbing.png)

</div>

*Electronics, software, and plumbing are great examples of human-designed systems that possess many properties analogous to biological circuits. These systems are based on known design principles that sometimes overlap with, and sometimes differ from, those of biological circuits. Plumbing picture by flickr user frozen-tundra, [CC-BY-2.0 licensed](https://creativecommons.org/licenses/by/2.0/deed.en).*

### Biological circuits differ from many other types of circuits or circuit-like systems

Is biological circuit design a solved problem? Electronics, software, plumbing, construction, and other human-designed systems are based on connections between modular components (see Figure). Can we not just apply the known principles of those systems to biological circuits? The answer is generally "no," or "only a little," because biological systems differ in fundamental ways from these systems:

* Natural circuits were not designed by people. They evolved. That means they are not "well-documented" and their function(s) are often totally unclear.
* Even synthetic circuits, which are designed by people, often use evolved components (such as transcription factors) that we do not fully understand.
* Biological circuits use fundamentally different designs than human-engineered counterparts. For example, molecular components in cells exhibit extensive many-to-many interactions ("crosstalk") with one another. This property is typically avoided in electronics but may provide unique capabilities to cells.
* Noise: While electronic circuits can function deterministically, biological circuits  function with high levels of stochastic (random) fluctuations in their own components. These fluctuations are often called "noise." And noise is not just a nuisance: some biological circuits take advantage of it to enable behaviors that would not be possible without it.
* Biological circuits use parallelism: The same circuit can operate in many different genetically identical individual cells, whether in a bacterial population or in a multicellular organism.
* Electrical systems use positive or negative voltages and currents, allowing for positive or negative effects. By contrast, biological circuits are built out of molecules (or cells) whose concentrations cannot be negative. That means they must use other mechanisms for "inverting" activities.
* From a more practical point of view, we have a very limited ability to construct, test, and compare designs. Even with recent developments such as CRISPR, our ability to rapidly and precisely produce cells with well-defined genomes remains limited compared to what is possible in more advanced disciplines. (This situation is rapidly improving!)
* *What other fundamental differences between biological circuits and human designed systems can you think of?* 

### Inspiration from electronics

In their classic book, [*The Art of Electronics*](https://artofelectronics.net), 3rd Ed., Horowitz and Hill explain something similar to the excitement many now feel about biological circuit design.

>The field of electronics is one of the great success stories of the 20th century. From the crude spark-gap transmitters and "cat's-whisker" detectors at its beginning, the first half-century brought an era of vacuum-tube electronics that developed considerable sophistication and found ready application in areas such as communications, navigation, instrumentation, control, and computation. The latter half-century brought "solid-state" electronics — first as discrete transistors, then as magnificent arrays of them within "integrated circuits" (ICs) — in a flood of stunning advances that shows no sign of abating. Compact and inexpensive consumer products now routinely contain many millions of transistors in VLSI (very large-scale integration) chips, combined with elegant optoelectronics (displays, lasers, and so on); they can process sounds, images, and data, and (for example) permit wireless networking and shirt-pocket access to the pooled capabilities of the Internet. Perhaps as noteworthy is the pleasant trend toward increased performance per dollar. The cost of an electronic microcircuit routinely decreases to a fraction of its initial cost as the manufacturing process is perfected....
>
>On reading of these exciting new developments in electronics, you may get the impression that you should be able to construct powerful, elegant, yet inexpensive, little gadgets to do almost any conceivable task — all you need to know is how all these miracle devices work.

Indeed, the marvelous progression of electronic circuit capabilities they describe could well describe biological circuits decades from now. As with electronics, we may soon be able to construct "gadgets" using biological materials to do myriad tasks. Our goal in this book is to help you know how the "miracle devices," the components of biological circuits and the emergent properties thereof when they are "wired" together, work.

### Premise and goals

This book explores foundational concepts needed to understand, predict, and control living systems with greater precision (systems biology), and to design synthetic circuits that provide new functions (synthetic biology). We will develop quantitative approaches for analyzing different circuit designs (tools), and also identify circuit design principles that provide insight and intuition into how different designs operate, and why they were selected by evolution or synthetic biologists. 


### Design principles relate circuit features to circuit functions

We will define a circuit **design principle** as a statement of the form: *Circuit feature X enables function Y*. Each module of the course will explore a different design principle. Here are some examples:

* Negative autoregulation of a transcription factor accelerates its response to a change in input.
* Kinases that also act as phosphatases (bifunctional kinases) provide tunable linear amplifiers in two-component signaling systems.
* Pulsing a transcription factor on and off at different frequencies (time-based regulation) can enable coordinated regulation of many target genes.
* Noise-excitable circuits enable cells to control the probability of transiently differentiating into an alternate state.
* Mutual inactivation of receptors and ligands in the same cell enables equivalent cells to signal unidirectionally.
* Independent tuning of gene expression burst size and frequency enables cells to control cell-cell heterogeneity in gene expression.
* Feedback on morphogen mobility allows tissue patterns to scale with the size of a tissue.

## Developing intuition: analyzing the simplest gene regulation circuits

We will start with the simplest possible "circuit"—hardly a circuit at all, really, just a single gene—coding for a single corresponding protein. This minimal example will allow us to develop intuition for the dynamics of the simplest gene regulation systems and lay out a procedure that we can further extend to analyze more complex circuits. 

How much protein will the gene *x* produce? We assume that the gene will be transcribed to mRNA and those mRNA molecules will in turn be translated to produce proteins, such that new proteins are produced at a total rate $\beta$ molecules per unit time. The $x$ protein does not simply accumulate over time. It is also removed through both active degradation and dilution as cells grow and divide. For simplicity, we will assume that both processes tend to reduce protein concentrations through a simple first-order process, with a rate constant $\gamma$. 

The approach we are taking can be described as "phenomenological modeling." We do not explicitly represent every underlying molecular step. Instead, we assume those steps give rise to "coarse grained" relationships that we can model in a manner that is independent of many underlying molecular details. The test of this approach is whether it allows us to understand and experimentally predict the behavior of real biological systems. See [Wikipedia's article on phenomenological models](https://en.wikipedia.org/wiki/Phenomenological_model) and [this article](https://doi.org/10.1186/1741-7007-12-29) by Jeremy Gunawardena for insights and commentary.

Thus, we can draw a diagram of our simple gene, *x*, with its protein being produced and removed (dashed circle):

<div style="width: 250px; margin: auto;">

![simplest_protein](figs/simplest_protein.png)

</div>

Here, protein production occurs at rate $\beta$ and degradation+dilution at rate $\gamma x$. We can then write down a simple ordinary differential equation describing these dynamics:

\begin{align} 
&\frac{dx}{dt} = \mathrm{production - (degradation+dilution)} \\[1em]
&\frac{dx}{dt} = \beta - \gamma x
\end{align}

where 

\begin{align}
\gamma = \gamma_\mathrm{dilution} + \gamma_\mathrm{degradation}
\end{align}

*A note on effective degradation rates*: When cells are growing, protein is removed through both  degradation and dilution. For stable proteins, dilution dominates. For very unstable proteins, whose half-life is much smaller than the cell cycle period, dilution may be negligible. In bacteria, mRNA half-lives (1-10 min, typically) are much shorter than protein half-lives. In eukaryotic cells this is not necessarily true (mRNA half-lives can be many hours in mammalian cells).
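To make the two contributions concrete, here is a minimal sketch that converts a cell doubling time and a protein degradation half-life into an effective removal rate constant, using the standard first-order relation $\gamma = \ln 2 / t_{1/2}$. The numbers (a 30 min doubling time, and degradation half-lives of 10 h and 2 min) and the helper name `effective_gamma` are illustrative assumptions, not values from this chapter.

```python
import numpy as np


def effective_gamma(doubling_time, degradation_half_life):
    """Total first-order removal rate constant: dilution plus degradation.

    Both arguments must be in the same time units (here, minutes).
    """
    gamma_dilution = np.log(2) / doubling_time
    gamma_degradation = np.log(2) / degradation_half_life
    return gamma_dilution + gamma_degradation


# Hypothetical values, for illustration only
doubling_time = 30.0  # min

gamma_stable = effective_gamma(doubling_time, 600.0)  # stable protein
gamma_unstable = effective_gamma(doubling_time, 2.0)  # unstable protein

# Effective half-life of the protein concentration in each case
t_half_stable = np.log(2) / gamma_stable
t_half_unstable = np.log(2) / gamma_unstable
print(f"stable protein:   effective half-life {t_half_stable:.1f} min")
print(f"unstable protein: effective half-life {t_half_unstable:.1f} min")
```

With these assumed numbers, the stable protein's effective half-life is close to the doubling time (dilution dominates), while the unstable protein's is close to its degradation half-life (dilution is nearly negligible).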

### Solving for the steady state

Often, one of the first things we would like to know is the concentration of protein under steady state conditions. To obtain this, we set the time derivative to 0, and solve:

\begin{align}
&\frac{dx}{dt} = \beta - \gamma x = 0 \\[1em]
&\Rightarrow x_\mathrm{ss} = \beta / \gamma 
\end{align}

In other words, the steady-state protein concentration depends on the ratio of production rate to degradation rate.
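As a quick numerical check of this result, the following sketch integrates $\mathrm{d}x/\mathrm{d}t = \beta - \gamma x$ with forward Euler steps from $x(0) = 0$ and compares the final value with $\beta/\gamma$. The parameter values, the step size, and the choice of forward Euler are assumptions made for illustration. The closed-form curve used for comparison, $x(t) = (\beta/\gamma)(1 - \mathrm{e}^{-\gamma t})$, is the standard solution of this linear equation for $x(0) = 0$.

```python
import numpy as np

# Illustrative parameter values (assumed, arbitrary units)
beta = 100.0  # production rate
gamma = 1.0   # removal rate constant

# Forward Euler integration of dx/dt = beta - gamma * x, starting from x = 0
dt = 0.001
t = np.arange(0, 10 + dt, dt)
x = np.zeros_like(t)
for i in range(1, len(t)):
    x[i] = x[i - 1] + dt * (beta - gamma * x[i - 1])

# Closed-form solution for x(0) = 0
x_exact = beta / gamma * (1 - np.exp(-gamma * t))

x_ss = beta / gamma
print(f"x at end of simulation: {x[-1]:.3f}; beta / gamma: {x_ss:.3f}")
print(f"largest |Euler - exact|: {np.abs(x - x_exact).max():.3f}")
```

Changing `beta` rescales the whole curve, whereas changing `gamma` alters both the plateau and how quickly it is approached.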

### Including transcription and translation as separate steps 

This description does not distinguish between transcription and translation. However, considering both processes separately can be important in the more dynamic and stochastic contexts that we will encounter later in the course. To do so, we can simply add an additional variable to represent the mRNA concentration, which is now transcribed, translated to protein, and degraded (and diluted), as shown schematically here:

<div style="width: 350px; margin: auto;">

![transcript_and_translation](figs/transcription_and_translation.png)

</div>

These reactions can be described by two coupled differential equations for the mRNA ($m$) and protein ($x$) concentrations:

\begin{align}
&\frac{dm}{dt} = \beta_m - \gamma_m m, \\[1em]
&\frac{dx}{dt} = \beta_p m - \gamma_p x. 
\end{align}

Now, we can  determine the steady state mRNA and protein concentrations straightforwardly, by setting both time derivatives to 0 and solving. We find:

\begin{align}
&m_\mathrm{ss} = \beta_m / \gamma_m, \\[1em]
&x_\mathrm{ss} = \frac{\beta_p m_\mathrm{ss}}{\gamma_p} = \frac{\beta_p \beta_m}{\gamma_p \gamma_m}.
\end{align}

From this, we see that the steady state protein concentration is proportional to the product of the two synthesis rates and inversely proportional to the product of the two degradation rates. 
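The same kind of numerical check works for the two-variable model. The sketch below integrates the coupled mRNA and protein equations with forward Euler steps from $m = x = 0$ and compares the end point with the steady-state expressions. All parameter values are assumed for illustration, and the integration time is chosen to be many protein lifetimes ($1/\gamma_p$), since the protein is the slower variable here.

```python
# Illustrative parameter values (assumed, arbitrary units)
beta_m, gamma_m = 10.0, 1.0  # mRNA production rate and removal rate constant
beta_p, gamma_p = 5.0, 0.1   # protein production per mRNA and removal rate constant

# Forward Euler integration of the coupled equations from m = x = 0
dt = 0.001
n_steps = 200_000  # total time of 200, i.e., 20 protein lifetimes
m, x = 0.0, 0.0
for _ in range(n_steps):
    dm_dt = beta_m - gamma_m * m
    dx_dt = beta_p * m - gamma_p * x
    m += dt * dm_dt
    x += dt * dx_dt

# Steady state predicted by setting both derivatives to zero
m_ss = beta_m / gamma_m
x_ss = beta_p * beta_m / (gamma_p * gamma_m)
print(f"simulated: m = {m:.3f}, x = {x:.3f}")
print(f"predicted: m_ss = {m_ss:.3f}, x_ss = {x_ss:.3f}")
```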

And this gives us our first **design puzzle**, which you can explore in an end-of-chapter problem: The cell could control protein expression level in at least four different ways. It could modulate (1) transcription, (2) translation, (3) mRNA degradation, or (4) protein degradation rates, or combinations thereof. Are there tradeoffs between these different options? Are they all used indiscriminately or is one favored in natural contexts? 

## From gene expression to gene regulation: Adding a repressor

In principle, genes could be left "on" all the time. In actuality, the cell actively regulates them, turning their expression levels lower or higher depending on environmental conditions and cellular state. **Repressors** provide a key mechanism for regulation. Repressors are proteins that bind to cognate specific sequences at or near a promoter, to change its expression. Often the strength of repressor binding depends on external inputs. For example, the LacI repressor normally turns off the genes for lactose utilization in *E. coli*. However, in the presence of lactose in the media, a modified form of lactose binds to LacI, inhibiting its ability to repress its target genes. Thus, a nutrient (lactose) can regulate expression of genes that allow the cell to use it. (The book [*The lac Operon*](https://doi.org/10.1515/9783110879476) by B. Müller-Hill provides the fascinating scientific and historical saga of this iconic system.)

In the following diagram, we label the repressor R.

<div style="width: 450px; margin: auto;">

![repressible_gene_2](figs/repressible_gene2.png)

</div>


Within the cell, the repressor binds and unbinds its target site. We assume that the expression level of the gene is lower when the repressor is bound and higher when it is unbound. The mean expression level of the gene is then proportional to the fraction of time that the promoter region is not bound with a repressor.

The process of binding and unbinding of a repressor can be represented as the chemical reaction

\begin{align}
\require{mhchem}
\ce{P + R <=>[k_+][k_-] P}_\mathrm{occ}.
\end{align}

We can model the dynamics of this chemical reaction using **mass action kinetics** in which the rate of a reaction is proportional to the product of the concentrations of its reactants. We therefore represent the "concentration" of target sites in occupied or unoccupied states. Within a single cell an individual site on the DNA is either bound or unbound, but averaged over a population of cells, we can talk about the mean occupancy of the site. Let $r$ be the concentration of repressor, $p$ be the concentration of unoccupied promoter, $p_\mathrm{occ}$ be the concentration of promoter occupied by a repressor. Then,

\begin{align}
\frac{\mathrm{d}p}{\mathrm{d}t} = -k_+\,p\,r + k_- p_\mathrm{occ}.
\end{align}

We can assume a **separation of timescales**: the binding and unbinding of the repressor to the DNA binding site are often both fast compared to the timescales over which mRNA and protein concentrations vary. (Be careful applying this assumption; in some contexts, such as mammalian cells, it is not true.) Then, on the time scale of variation in mRNA and protein concentrations, the repressor-promoter binding-unbinding reaction is essentially at steady state such that $\mathrm{d}p/\mathrm{d}t \approx 0$, giving

\begin{align}
-k_+\,p\,r + k_- p_\mathrm{occ} = 0.
\end{align}

If $p_\mathrm{tot}$ is the total concentration of promoters, occupied or not, then $p_\mathrm{tot} = p + p_\mathrm{occ}$, and we have

\begin{align}
-k_+\,p\,r + k_- (p_\mathrm{tot} - p) = 0,
\end{align}

which can be rearranged to give the fraction of promoters that are free to allow transcription,

\begin{align}
\frac{p}{p_\mathrm{tot}} = \frac{1}{1+r/K_\mathrm{d}},
\end{align}

where we have defined the **dissociation constant** for repressor-target binding

\begin{align}
K_\mathrm{d} = \frac{k_-}{k_+}
\end{align}
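
For completeness, the rearrangement can be spelled out in two steps. Collecting the terms containing $p$ in the steady-state condition and then dividing through by $k_-$ gives

\begin{align}
&k_-\,p_\mathrm{tot} = p\,(k_+\,r + k_-), \\[1em]
&\frac{p}{p_\mathrm{tot}} = \frac{k_-}{k_+\,r + k_-} = \frac{1}{1 + (k_+/k_-)\,r} = \frac{1}{1+r/K_\mathrm{d}}.
\end{align}

The occupied fraction follows from $p_\mathrm{occ} = p_\mathrm{tot} - p$, so that $p_\mathrm{occ}/p_\mathrm{tot} = (r/K_\mathrm{d})/(1 + r/K_\mathrm{d})$.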

Because we have a separation of time scales, the rate of production of gene product should be proportional to the probability of the promoter being unbound,

\begin{align}
\beta(r) = \beta_0 \frac{p}{p_\mathrm{tot}} = \frac{\beta_0}{1+r/K_\mathrm{d}}.
\end{align}
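
The quasi-steady-state result can also be checked numerically. The sketch below holds the repressor concentration fixed, integrates the mass-action equation for the free promoter to steady state with forward Euler steps, and compares the resulting free fraction with $1/(1 + r/K_\mathrm{d})$. The rate constants, concentrations, and function names are assumed for illustration; `k_on` and `k_off` stand for $k_+$ and $k_-$.

```python
def free_fraction_theory(r, K_d):
    """Fraction of promoters not bound by repressor, p / p_tot."""
    return 1 / (1 + r / K_d)


def free_fraction_simulated(r, k_on, k_off, dt=1e-3, n_steps=20_000):
    """Integrate dp/dt = -k_on * p * r + k_off * (p_tot - p) with p_tot = 1."""
    p = 1.0  # start with all promoters unoccupied
    for _ in range(n_steps):
        p += dt * (-k_on * p * r + k_off * (1.0 - p))
    return p


# Illustrative rate constants (assumed, arbitrary units)
k_on, k_off = 2.0, 4.0
K_d = k_off / k_on

for r_val in [0.0, 1.0, K_d, 10.0]:
    simulated = free_fraction_simulated(r_val, k_on, k_off)
    predicted = free_fraction_theory(r_val, K_d)
    print(f"r = {r_val:5.1f}: simulated {simulated:.4f}, predicted {predicted:.4f}")
```

Note that when $r = K_\mathrm{d}$, exactly half of the promoters are free, which is what makes $K_\mathrm{d}$ a natural concentration scale for the repressor.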

## Properties of the simple binding curve

This is our first encounter with a soon-to-be-familiar function. Note that this function has two parameters. $K_\mathrm{d}$ specifies the concentration of repressor at which the response is reduced to half its maximum value. The coefficient $\beta_0$ is simply the maximum expression level, and is a parameter that multiplies the rest of the function. Also notice that for small values of $r$, the curve falls off approximately linearly, with slope $-\beta_0/K_\mathrm{d}$ (a slope of $-1$ in the dimensionless units plotted below).

In [2]:
# Build theoretical curves for dimensionless r and beta
r = np.linspace(0, 20, 200)
beta = 1 / (1 + r)
init_slope = 1 - r

# Build plot
p = bokeh.plotting.figure(
    frame_height=225,
    frame_width=350,
    x_axis_label="r∕Kd",
    y_axis_label="β(r)∕β₀",
    x_range=[r[0], r[-1]],
    y_range=[0, 1],
)
p.line(r, init_slope, line_width=2, color="orange", legend_label="initial slope")
p.line(r, beta, line_width=2, legend_label="β(r)∕β₀")
p.legend.click_policy = "hide"
p.legend[0].items = p.legend[0].items[::-1]

bokeh.io.show(p)

## Gene expression can be "leaky"

In real life, many genes never get repressed all the way to zero expression, even when you add a lot of repressor. Instead, there is a baseline, or "basal", expression level that still occurs. A simple way to model this is by adding a constant term, $\alpha_0$, to the expression:

\begin{align}
\beta(r) = \alpha_0 + \beta_0 \frac{p}{p_\mathrm{tot}} = \alpha_0 + \frac{\beta_0}{1+r/K_\mathrm{d}}.
\end{align}

Where does such leaky expression come from? Although there are many molecular sources of leakiness, the fundamental reason is that the molecular interactions inside a cell are always probabilistic. Even if there are many more repressors than there are genes to repress, at equilibrium these repressors are always transiently binding and unbinding from their targets so there is always a chance for gene expression to occur from a transiently un-repressed promoter.

Given the ubiquity of leakiness, it is important to check that circuit behaviors do not depend on the absence of leaky expression.
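
One practical consequence of leakiness is that it caps how strongly a repressor can shut a gene down. In the leaky expression above, production falls from $\alpha_0 + \beta_0$ at $r = 0$ toward $\alpha_0$ as $r$ grows, so the largest achievable fold repression is $(\alpha_0 + \beta_0)/\alpha_0$. The following is a minimal sketch of that bound; the parameter values and the function name `beta_leaky` are assumptions chosen for illustration.

```python
import numpy as np


def beta_leaky(r, alpha_0, beta_0, K_d):
    """Production rate of a repressed gene with basal (leaky) expression."""
    return alpha_0 + beta_0 / (1 + r / K_d)


# Illustrative parameter values (assumed); alpha_0 is 25% of beta_0
alpha_0, beta_0, K_d = 0.25, 1.0, 1.0

r_vals = np.array([0.0, 1.0, 10.0, 1000.0])
rates = beta_leaky(r_vals, alpha_0, beta_0, K_d)

# Fold repression relative to the unrepressed gene (r = 0)
fold = rates[0] / rates
max_fold = (alpha_0 + beta_0) / alpha_0

for r_val, f in zip(r_vals, fold):
    print(f"r / K_d = {r_val:7.1f}: fold repression = {f:.2f}")
print(f"upper bound on fold repression: {max_fold:.2f}")
```

However much repressor is added, the fold repression approaches but never exceeds this bound, which is one reason to check circuit models with a nonzero $\alpha_0$.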



In [3]:
# Build the theoretical curves
r = np.linspace(0, 20, 200)
alpha_0 = 0.25
beta = alpha_0 + 1 / (1 + r)

# Build plot
p = bokeh.plotting.figure(
    frame_height=225,
    frame_width=350,
    x_axis_label="r∕Kd",
    y_axis_label="β(r)∕β₀",
    x_range=[r[0], r[-1]],
    y_range=[0, beta.max()],
)
p.line(
    [r[0], r[-1]],
    [alpha_0, alpha_0],
    line_width=2,
    color="orange",
    legend_label="basal expression, α₀∕β₀",
)
p.line(r, beta, line_width=2, legend_label="β(r)∕β₀")
p.legend.click_policy = "hide"
p.legend[0].items = p.legend[0].items[::-1]

bokeh.io.show(p)

## Activation

Genes can be regulated by **activators** as well as repressors. Treating the case of activation just involves switching the state that is actively expressing from the unbound state to the state where the promoter region is bound by the protein (now called an activator). And, just as the binding of a repressor to DNA can be modulated by small molecule inputs, so too can the binding of the activator be modulated by binding to small molecules. In bacteria, one of many examples is the [Lux-type quorum sensing system](https://en.wikipedia.org/wiki/LuxR-type_DNA-binding_HTH_domain), where the transcription factor LuxR acts as an activator in the presence of its cognate ligand.

<div style="width: 350px; margin: auto;">

![activation](figs/activation.png)

</div>

<br />

The rate of production of gene product as a function of activator concentration $a$ is

\begin{align}
\beta(a) = \beta_0 \frac{p_\mathrm{occ}}{p_\mathrm{tot}} = \beta_0\,\frac{a/K_\mathrm{d}}{1+a/K_\mathrm{d}}.
\end{align}

This produces the opposite, mirror image response compared to repression, shown below with no leakage.
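
The mirror-image relationship can be stated precisely. If the activator and repressor share the same $K_\mathrm{d}$ and $\beta_0$, the two production rates at equal regulator concentration sum to $\beta_0$, because $p + p_\mathrm{occ} = p_\mathrm{tot}$. A short numerical check, with illustrative function names and default parameter values that are assumptions:

```python
import numpy as np


def beta_rep(r, beta_0=1.0, K_d=1.0):
    """Production rate under simple repression (no leak)."""
    return beta_0 / (1 + r / K_d)


def beta_act(a, beta_0=1.0, K_d=1.0):
    """Production rate under simple activation (no leak)."""
    return beta_0 * (a / K_d) / (1 + a / K_d)


conc = np.linspace(0, 20, 200)  # regulator concentration, same units as K_d
total = beta_act(conc) + beta_rep(conc)

# The two curves are complementary at every concentration
print("sum equals beta_0 everywhere:", np.allclose(total, 1.0))

# Both curves cross at half-maximal expression when the concentration equals K_d
print("at K_d:", beta_act(1.0), beta_rep(1.0))
```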

In [4]:
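# Define both concentration axes here so this cell does not rely on r from an earlier cell
a = np.linspace(0, 20, 200)
r = np.linspace(0, 20, 200)
beta_A = a / (1 + a)
beta_R = 1 / (1 + r)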

# Build plot
p = bokeh.plotting.figure(
    frame_height=225,
    frame_width=350,
    x_axis_label="a∕Kd, r∕Kd",
    y_axis_label="β∕β₀",
    x_range=[r[0], r[-1]],
    y_range=[0, 1],
)

p.line(a, beta_A, line_width=2, legend_label="β(a)∕β₀")
p.line(r, beta_R, line_width=2, color="tomato", legend_label="β(r)∕β₀")
p.legend.location = "center_right"
bokeh.io.show(p)

## Hill functions and ultrasensitivity

While the activating and repressing functions we derived above indeed capture the behavior of this simple model of transcriptional regulation, in practice we find that many responses in gene regulation and protein-protein interactions have a switch-like shape, or **ultrasensitive** behavior. Ultrasensitivity can arise from many sources, but a major factor is often **cooperativity** in molecular interactions. For example, consider a situation in which binding of a protein at one DNA binding site increases the affinity for binding of a second protein at an adjacent site. Or, imagine a protein with an alternative molecular conformation that is stabilized by binding of multiple agonist effector molecules and, in that conformation, has a higher affinity for the same effectors. In these and many other situations, an increasing concentration of one species can have little effect for a while, and then suddenly have a large effect. 

The Hill function provides a way to analyze systems that exhibit ultrasensitive responses. While it can be derived from models of some processes, it is often used in a more generic way to analyze how a circuit would behave with different levels of ultrasensitivity.

An activating Hill function is defined by 

\begin{align}
f_\mathrm{act}(x) &= \frac{x^n}{k^n +x^n} = \frac{(x/k)^n}{1 + (x/k)^n}.
\end{align}

You can also make a mirror image repressive Hill function. 

\begin{align}
f_\mathrm{rep}(x) &= \frac{k^n}{k^n +x^n} = \frac{1}{1 + (x/k)^n}.
\end{align}

In these expressions, $k$ is often referred to as an **activation coefficient**; it represents the concentration at which the function attains half its maximal value. It is a measure of the concentration $x$ that is necessary to affect the regulation. The **Hill coefficient**, $n$, parametrizes how ultrasensitive the response is. When $n=1$, we recover the simple binding curves introduced earlier. When $n>1$, however, we achieve ever sharper, more ultrasensitive, responses. In the limit of $n=\infty$ we have a step function. 
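
One way to quantify "sharper" is to ask how large a fold change in input is needed to move the activating Hill function from 10% to 90% of its maximum. Solving $f_\mathrm{act}(x) = y$ for $x$ gives $x = k\,(y/(1-y))^{1/n}$, so the ratio of the two inputs is $81^{1/n}$: an 81-fold change for $n = 1$, 9-fold for $n = 2$, and only about 1.55-fold for $n = 10$. The 10%-to-90% measure is a common convention rather than something defined in this chapter, and the function names below are illustrative.

```python
import numpy as np


def hill_act(x, k, n):
    """Activating Hill function."""
    return (x / k) ** n / (1 + (x / k) ** n)


def input_for_output(y, k, n):
    """Input x at which hill_act(x, k, n) equals y, for 0 < y < 1."""
    return k * (y / (1 - y)) ** (1 / n)


k = 1.0
for n in [1, 2, 10]:
    x10 = input_for_output(0.1, k, n)
    x90 = input_for_output(0.9, k, n)
    # Confirm the inversion by plugging back into the Hill function
    assert np.isclose(hill_act(x10, k, n), 0.1)
    assert np.isclose(hill_act(x90, k, n), 0.9)
    print(f"n = {n:2d}: x90 / x10 = {x90 / x10:.2f}")
```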

The production rates of the products of genes under control of an activator and a repressor, respectively, with ultrasensitivity modeled by Hill functions, are

\begin{align}
&\beta(a) = \beta_0\,f_\mathrm{act}(a) = \beta_0 \,\frac{(a/k)^n}{1 + (a/k)^n},\\[1em]
&\beta(r) = \beta_0\,f_\mathrm{rep}(r) = \beta_0 \,\frac{1}{1 + (r/k)^n}.
\end{align}

Here we plot Hill functions for a few values of $n$, with activating Hill functions in blue and repressing Hill functions in red.

In [5]:
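# Explicitly import the Bokeh submodules used in this cell
import bokeh.layouts
import bokeh.models
import bokeh.palettes

# Compute response functions (x is measured in units of k)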
x = np.logspace(-2, 2, 200)
f_a = [x**n / (1 + x**n) for n in [1, 2, 10]]
f_r = [1 - f for f in f_a]

# Build plots
p_act = bokeh.plotting.figure(
    frame_height=225,
    frame_width=350,
    x_axis_label="x/k",
    y_axis_label="activating Hill function",
    x_range=[x[0], x[-1]],
    x_axis_type="log"
)
p_rep = bokeh.plotting.figure(
    frame_height=225,
    frame_width=350,
    x_axis_label="x/k",
    y_axis_label="repressing Hill function",
    x_range=[x[0], x[-1]],
    x_axis_type="log"
)

# Set up toggling between activating and repressing
p_act.visible = True
p_rep.visible = False

radio_button_group = bokeh.models.RadioButtonGroup(
    labels=["activating", "repressing"], active=0, width=100
)
col = bokeh.layouts.column(
    p_act, p_rep, bokeh.layouts.row(bokeh.models.Spacer(width=145), radio_button_group)
)
radio_button_group.js_on_click(
    bokeh.models.CustomJS(
        args=dict(p_act=p_act, p_rep=p_rep),
        code="""
if (p_act.visible == true) {
    p_act.visible = false;
    p_rep.visible = true;
}
else {
    p_act.visible = true;
    p_rep.visible = false;
}
"""))

# Populate plots
colors_act = bokeh.palettes.Blues5[1:-1][::-1]
colors_rep = bokeh.palettes.Reds5[1:-1][::-1]

for f_act, n, color in zip(f_a, [1, 2, 10], colors_act):
    p_act.line(x, f_act, line_width=2, color=color, legend_label=f"n = {n}")

for f_rep, n, color in zip(f_r, [1, 2, 10], colors_rep):
    p_rep.line(x, f_rep, line_width=2, color=color, legend_label=f"n = {n}")

p_act.legend.location = "center_right"
p_rep.legend.location = "center_right"

bokeh.io.show(col)

## Activator vs. Repressor—which to choose?

*And now at last we have reached our first true "design" question:* The cell has at least two different ways to regulate a gene: using an activator or using a repressor. Which should it choose? Which would you choose if you were designing a synthetic circuit? Why? Are they completely equivalent ways to regulate a target gene? Is one better in some or all conditions? How could we know?  

In particular, if we were to design a gene regulation system to turn on in the presence of a specific target molecule (we will call this the **inducer**), then we could design the gene to be regulated by an activator whose activity is turned on by the inducer (see left figure below), or alternatively, to be regulated by a repressor whose activity is turned off by the inducer (see right figure below). At first glance, these architectures may look equivalent in their output, but is that really the case?


<div style="width: 350px; margin: auto;">

![equivalent systems](figs/equivalent_systems.png)

</div>

Michael Savageau posed this question in the context of bacterial metabolic gene regulation in his paper, "Design of molecular control mechanisms and the demand for gene expression" ([_PNAS_, 1977](https://www.ncbi.nlm.nih.gov/pubmed/271992)). He focused on "demand" as a critical factor that influences the choice of activation or repression. *Demand* can be defined as the fraction of time that the gene is needed at the high end of its regulatory range in the cell's natural environments. Savageau made the empirical observation that high-demand genes are more frequently regulated by activators, while low-demand genes are more often regulated by repressors.

<div style="width: 500px; margin: auto;">

![low demand, high demand](figs/low_demand-high_demand3.png)

</div>

He suggested that this relationship could be explained by a **"use it or lose it"** rule of evolutionary selection. Recall that mutations, especially those that diminish the function of a gene, occur frequently. A high-demand gene controlled by an activator needs the activator to be on most of the time. Under these conditions, evolution would select against mutations that eliminate the activator. By contrast, if the same high-demand system were regulated by a repressor, the repressor would be inactive most of the time, so selection against mutations that removed it would be weaker, increasing the potential for evolutionary loss of the regulatory system. This reasoning assumes no direct fitness advantage for either regulation mode, just a difference in the average strength of the selection pressure that maintains them.

In 2009, [Gerland and Hwa](https://dx.doi.org/10.1073%2Fpnas.0808500106) formulated and analyzed a model to explore these ideas mathematically. They showed that the "use it or lose it" principle indeed dominates when timescales of switching between low and high demand environments are long and populations are small. However, the same model could select for the opposite demand rule in other regimes. 

[Shinar, et al.](https://doi.org/10.1073/pnas.0506610103) introduced a different explanation for the demand rules. The authors assumed that "naked" DNA binding sites, which are not bound to proteins, are susceptible to non-specific binding of transcription factors. These inadvertent encounters between transcription factors and DNA could lead to inappropriate activation or repression of adjacent genes, imposing a low, but non-zero fitness cost. "Intentionally" keeping these sites occupied most of the time minimizes the fraction of time that such events can occur. 

This explanation makes the same predictions as Savageau's, but for different reasons. Here, a high-demand gene should preferentially use an activator, since this arrangement keeps the binding site occupied most of the time. Conversely, a low-demand gene would preferentially use repression to maintain the binding site in an occupied state under most conditions. This argument can be generalized to other examples of seemingly equivalent regulatory systems and is described in [Uri Alon's book](https://www.amazon.com/gp/product/1439837171/ref=dbs_a_def_rwt_bibl_vppi_i0).

<div style="width: 250px; margin: auto;">

![Error load](figs/ErrorLoad.png)

</div>

*In this figure, a repressor, denoted R, binds strongly and specifically to its target site. In the absence of the repressor, that site could also be bound non-specifically, and inappropriately, by a range of other factors, denoted A-F.*
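The occupancy argument can be sketched with simple bookkeeping. The sketch below assumes an idealized picture in which an activator occupies its site exactly when the gene is needed and a repressor occupies its site exactly when the gene is not needed. This is a simplification for illustration, not the full model of Shinar et al., and the function names are hypothetical.

```python
def unoccupied_fraction(demand, mode):
    """Fraction of time the regulatory site is left unoccupied.

    demand: fraction of time the gene is needed at high expression (0 to 1).
    mode: "activator" (site bound while the gene is on) or
          "repressor" (site bound while the gene is off).
    """
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand must lie between 0 and 1")
    if mode == "activator":
        return 1.0 - demand
    if mode == "repressor":
        return demand
    raise ValueError("mode must be 'activator' or 'repressor'")


def preferred_mode(demand):
    """Regulation mode that leaves the site unoccupied least often."""
    act = unoccupied_fraction(demand, "activator")
    rep = unoccupied_fraction(demand, "repressor")
    if act < rep:
        return "activator"
    if rep < act:
        return "repressor"
    return "either"


for demand in [0.05, 0.5, 0.95]:
    print(
        f"demand = {demand:.2f}: "
        f"unoccupied with activator = {unoccupied_fraction(demand, 'activator'):.2f}, "
        f"with repressor = {unoccupied_fraction(demand, 'repressor'):.2f}, "
        f"preferred = {preferred_mode(demand)}"
    )
```

Under these assumptions, the mode that minimizes the time the site is exposed to non-specific binding is the activator for high demand and the repressor for low demand, in line with Savageau's demand rule.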

Remarkably, we still lack definitive experimental evidence to fully resolve this fundamental design question. As a challenge in the end-of-chapter problems, you are asked to devise an experimental way to discriminate among these potential explanations.

<hr>

## Computing environment


In [6]:
%load_ext watermark
%watermark -v -p numpy,bokeh,jupyterlab

Python implementation: CPython
Python version       : 3.9.7
IPython version      : 8.1.1

numpy     : 1.20.3
bokeh     : 2.4.2
jupyterlab: 3.3.2



<hr>

## Problems

- [1.1: Strategies for controlling protein expression](../problems/01/problem_1.1.ipynb)
- [1.2: Separation of time scales](../problems/01/problem_1.2.ipynb)
- [1.3: Rate of production of gene product by an activator](../problems/01/problem_1.3.ipynb)
- [1.4: Activators vs. repressors](../problems/01/problem_1.4.ipynb)
- [1.5: Bound and unbound promoter regions](../problems/01/problem_1.5.ipynb)