<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Nuclear-volume-scaling-with-cell-volume" data-toc-modified-id="Nuclear-volume-scaling-with-cell-volume-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Nuclear volume scaling with cell volume</a></span><ul class="toc-item"><li><span><a href="#Fitting-the-nuclear-fraction" data-toc-modified-id="Fitting-the-nuclear-fraction-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Fitting the nuclear fraction</a></span></li></ul></li><li><span><a href="#Model-for-how-transcription-scales-with-cell-size" data-toc-modified-id="Model-for-how-transcription-scales-with-cell-size-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Model for how transcription scales with cell size</a></span></li><li><span><a href="#calculate-the-RNA-concentration-assuming-that-transcription-is-proportional-to-bound-rna-pol-II-and-that-the-global-RNA-degradation-rate-is-proportional-to-the-mRNA-degradation-rates-$\beta$-measured-for-the-two-MET-genes-in-Fig-1-of-the-draft.-$C$-is-a-constant." data-toc-modified-id="calculate-the-RNA-concentration-assuming-that-transcription-is-proportional-to-bound-rna-pol-II-and-that-the-global-RNA-degradation-rate-is-proportional-to-the-mRNA-degradation-rates-$\beta$-measured-for-the-two-MET-genes-in-Fig-1-of-the-draft.-$C$-is-a-constant.-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>calculate the RNA concentration assuming that transcription is proportional to bound rna pol II and that the global RNA degradation rate is proportional to the mRNA degradation rates $\beta$ measured for the two MET genes in Fig 1 of the draft. $C$ is a constant.</a></span></li></ul></div>

In [None]:
import matplotlib.pyplot as plt
import numpy as np

from scipy.optimize import curve_fit

In [None]:
import pandas as pd

# Nuclear volume scaling with cell volume

In [None]:
vols = pd.read_csv("nuc_vol.txt", delimiter="\t", header=None)

In [None]:
vols[0] = vols[0]/1.6

In [None]:
plt.scatter(vols[1], vols[0])

In [None]:
plt.scatter(vols[0], vols[1]/vols[0])

## Fitting the nuclear fraction

In [None]:
def nuc_frac_model(x, alpha, beta, delta): 
    return alpha + beta * np.exp(-delta * (x-15))

In [None]:
# fit curve
(alpha, beta, delta), _ = curve_fit(nuc_frac_model, vols[0], vols[1]/vols[0])

In [None]:
xs = np.linspace(10, 200, 100)

plt.scatter(vols[0], vols[1]/vols[0])
plt.plot(xs, nuc_frac_model(xs, alpha, beta, delta), ls = '--', lw=4, c='red')

In [None]:
err = np.sqrt(np.mean(((vols[1]/vols[0]).values - nuc_frac_model(vols[0].values, alpha, beta, delta))**2))

In [None]:
err

In [None]:
alpha

In [None]:
beta

In [None]:
delta

# Model for how transcription scales with cell size

The number of RNA Polymerase II molecules is $PII_{tot}$, of which $PII_{DNA}$ are on the DNA and $PII_{nuc}$ are free in the nucleoplasm:

$PII_{tot} = PII_{DNA} + PII_{nuc}$

The number of RNA Polymerase II molecules is proportional to cell size $V$, so that:

$PII_{tot} = c V$,

where $PII_{tot} \approx$ 5,000-10,000 molecules for a 50 fl yeast cell, i.e. $c \approx 100$-$200$ molecules per fl.

The volume of the nucleus is about 10% of the cell volume so that:
$V_{nuc} = \frac{V}{10}$
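
As a quick numerical check of these two scalings, the sketch below converts the quoted molecule counts into a per-femtoliter constant. It assumes that the 5,000 to 10,000 molecules refer to the total Pol II count in a 50 fl cell, which is an interpretation rather than something stated in the data; the function names are illustrative only.

```python
# Assumption: 5,000-10,000 molecules is the total Pol II count in a 50 fl cell,
# so c = PII_tot / V has units of molecules per fl.
REFERENCE_VOLUME_FL = 50.0


def pol2_per_fl(total_molecules, volume_fl=REFERENCE_VOLUME_FL):
    """Proportionality constant c in PII_tot = c * V."""
    return total_molecules / volume_fl


def nuclear_volume(cell_volume_fl, nuclear_fraction=0.10):
    """Nuclear volume under the constant 10% approximation."""
    return cell_volume_fl * nuclear_fraction


c_low = pol2_per_fl(5_000)
c_high = pol2_per_fl(10_000)
print(c_low, c_high)         # molecules per fl
print(nuclear_volume(50.0))  # fl
```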

We assume that all binding takes place within the nuclear volume and that the nucleoplasmic $PII$ binds the DNA with simple first-order kinetics:

$\frac{\partial [PII_{DNA}]}{\partial t} = k_{on} [PII_{nuc}]* [DNA] - k_{off}[PII_{DNA}]$

The $k_{off}$ rate is ~1/minute, since it takes about a minute to transcribe a 1 kb budding yeast gene. The number of DNA-bound $PII$ molecules is given by multiplying the above by the nuclear volume, with $DNA = [DNA]\,V_{nuc}$ the amount of DNA:

$\frac{\partial PII_{DNA}}{\partial t} = k_{on} [PII_{nuc}]* DNA - k_{off}PII_{DNA}$

We now eliminate $[PII_{nuc}] = [PII_{tot}] - [PII_{DNA}]$ so that

$\frac{\partial PII_{DNA}}{\partial t} = k_{on} ([PII_{tot}] - [PII_{DNA}])* DNA - k_{off}PII_{DNA}$

We now assume that the on and off kinetics of $PII$ are fast relative to the volume growth of the cell, so that binding is at steady state and $\frac{\partial PII_{DNA}}{\partial t} = 0$:

$0 = k_{on} ([PII_{tot}] - [PII_{DNA}])* DNA - k_{off}PII_{DNA}$

which we can write in terms of the number of molecules by multiplying again by the nuclear volume $V_{nuc}$ so that

$0 = k_{on} (PII_{tot} - PII_{DNA})* DNA - k_{off}PII_{DNA}V_{nuc}$

which we can solve for the DNA bound pol II:

$PII_{DNA} = \frac{k_{on}PII_{tot} DNA}{k_{on}DNA + k_{off}V_{nuc}}$
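
The steady-state assumption can be checked numerically. The binding equation is linear in $PII_{DNA}$, so it relaxes at rate $k_{on}DNA/V_{nuc} + k_{off}$, which is at least $k_{off}$. The sketch below integrates the equation with a forward-Euler step and compares the result to the closed-form expression above. All parameter values are hypothetical placeholders chosen for illustration, not measurements.

```python
def steady_state_bound(p_tot, kon_dna, k_off, v_nuc):
    """Closed-form PII_DNA from the steady-state balance."""
    return kon_dna * p_tot / (kon_dna + k_off * v_nuc)


def integrate_bound(p_tot, kon_dna, k_off, v_nuc, t_end=10.0, dt=0.01):
    """Forward-Euler integration of dP/dt = kon_dna*(p_tot - P)/v_nuc - k_off*P."""
    bound = 0.0  # start with no polymerase on the DNA
    for _ in range(int(round(t_end / dt))):
        rate = kon_dna * (p_tot - bound) / v_nuc - k_off * bound
        bound += rate * dt
    return bound


# Hypothetical values: 5000 molecules, k_on*DNA = 4 fl/min,
# k_off = 1 per min, nuclear volume = 5 fl.
P_TOT, KON_DNA, K_OFF, V_NUC = 5000.0, 4.0, 1.0, 5.0

p_numeric = integrate_bound(P_TOT, KON_DNA, K_OFF, V_NUC)
p_closed = steady_state_bound(P_TOT, KON_DNA, K_OFF, V_NUC)
print(p_numeric, p_closed)
```

With these placeholder values the relaxation rate is 1.8 per minute, so ten minutes of simulated time is enough for the numerical solution to settle onto the closed-form value.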

and use the fact that the number of polymerases is proportional to the cell volume $PII_{tot} = c V$ and the nuclear volume is $V/10$:

$PII_{DNA} = \frac{k_{on}c V DNA}{k_{on}DNA + k_{off}V/10}$

We can use this expression to calculate the fraction of bound $PII$ as 

$\frac{PII_{DNA}}{PII_{tot}} = \frac{k_{on} DNA}{k_{on}DNA + k_{off}V/10}$
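
The bound fraction depends on cell volume only through the single combination $b = 10\,k_{on}DNA/k_{off}$, which has units of volume. A minimal sketch, assuming $b = 40$ fl as a placeholder (the approximate value used further down in this notebook):

```python
def bound_fraction(volume_fl, b_fl=40.0):
    """PII_DNA / PII_tot = 1 / (1 + V / b), with b = 10 * k_on * DNA / k_off."""
    return 1.0 / (1.0 + volume_fl / b_fl)


# b_fl = 40.0 is an assumed placeholder, not a fitted result.
for volume in (20.0, 40.0, 80.0):
    print(volume, round(bound_fraction(volume), 3))
```

At $V = b$ exactly half of the polymerase is bound, and the fraction falls as cells get larger.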

We can use these expressions to calculate the amount of DNA bound RNA pol II for a diploid, where $DNA_{diploid} = 2 * DNA_{haploid}$ so that $DNA \to 2 DNA$ and 

$PII_{DNA} = \frac{c V}{1 + \frac{k_{off}V}{{10 * k_{on} * 2 DNA}}}$

We can also see that if you reduce the amount of RNA Polymerase by half, you expect the amount bound to DNA to drop by half. Intuitively, this is because each pol II molecule binds independently of the others, so there is no competition for sites on the genome.
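
Both predictions can be read off the formula directly. The sketch below uses hypothetical values for $c$ and $b = 10\,k_{on}DNA/k_{off}$ to show that halving $c$ halves the bound amount at any size, and that doubling the DNA raises the bound amount by a size-dependent factor between 1 and 2.

```python
def bound_pol2(volume_fl, c=100.0, b_fl=40.0, ploidy=1):
    """PII_DNA = c V / (1 + V / (ploidy * b)), with b = 10 * k_on * DNA_haploid / k_off."""
    return c * volume_fl / (1.0 + volume_fl / (ploidy * b_fl))


# c = 100 molecules/fl and b = 40 fl are illustrative assumptions.
half_ratio = bound_pol2(60.0, c=50.0) / bound_pol2(60.0, c=100.0)
diploid_ratio_small = bound_pol2(40.0, ploidy=2) / bound_pol2(40.0, ploidy=1)
diploid_ratio_large = bound_pol2(4000.0, ploidy=2) / bound_pol2(4000.0, ploidy=1)
print(half_ratio, diploid_ratio_small, diploid_ratio_large)
```

The diploid-to-haploid ratio is $(1 + V/b)/(1 + V/2b)$, so it approaches 2 only when cells are much larger than $b$.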


In [None]:
# input data (entered inline) - to fit the haploid elutriation G1 arrest data

#G1 arrest replicates averaged mean size
x_values_av = [30.3561053, 41.0563371, 53.9679581, 69.5260199, 85.7878708, 109.8508, 139.284533]
#G1 arrest replicates averaged mean after replicates separate and each rescaled to t=1
y_values_av = [1.08808962, 1.281816857, 1.623124174, 1.78260975, 1.940886253, 2.160079661, 2.213655257]

#G1 arrest replicates separate mean size
x_values = [29.4901011950314, 41.5499478547111, 56.2999132223838, 72.3800582138898, 88.4701975863986, 117.9997262242110, 143.4001214406140, 28.1498966859752, 40.2099941573393, 53.6200084659693, 69.7000995971186, 87.1299029291209, 113.8999558721200, 151.5000801586290]

#G1 arrest replicates separate not re-scaled per experiment
y_values = [1.004635373,1.272252161,1.501839433,1.641307461,1.734718777,2.004981673,2.096972061,1.171543866,1.291381553,1.744408914,1.923912038,2.147053728,2.315177649,2.330338452]

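# objective function: saturating model with a constant nuclear fraction
def objective(x, a, b):
    return a*(x/b)/(1+x/b)

# same model with a linear nuclear volume, V_nuc = 0.038*x + 0.55;
# here b multiplies the nuclear volume, so it is not on the same scale as b above
def objective_varying_fraction(x, a, b):
    nuclear_volume = 0.038 * x + 0.55
    return a * (x/b) / (1 + b * nuclear_volume)

# same model with the exponential nuclear fraction fitted above
# (uses the global alpha, beta, delta from the nuc_frac_model fit)
def obj_exp_fraction(x, a, b):
    nuc_volume = x * (alpha + beta * np.exp(-delta * (x-15)))
    return a * (x/b) / (1 + b * nuc_volume)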

# fit curve
(a, b), _ = curve_fit(objective, x_values, y_values)

(a2, b2), _ = curve_fit(objective_varying_fraction, x_values, y_values)

(a3, b3), _ = curve_fit(obj_exp_fraction, x_values, y_values)

# define new input values
x_new = np.linspace(0,150,100)

# use optimal parameters to calculate new values
y_new = objective(x_new, a, b)

y_2 = objective_varying_fraction(x_new, a2, b2)

y3 = obj_exp_fraction(x_new, a3, b3)

fig = plt.figure()
plt.plot(x_values_av ,y_values_av, 'ro',x_new,y_new, 'r')
plt.plot(x_new, y_2, 'g--')
plt.plot(x_new, y3, 'b--')

plt.xlabel('cell size (fl)')
plt.ylabel('DNA bound RNA Pol II')

# show the plot
plt.show()
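
The fitted `objective` is the model expression $PII_{DNA} = cV/(1 + V/b)$ rewritten with $a = c\,b$, so `a` is the saturation level at large $V$ and `b` $= 10\,k_{on}DNA/k_{off}$ is the volume at which binding is half-saturated. The short check below confirms that the two forms agree. The function names and numerical values are arbitrary placeholders, not fit results.

```python
def saturating_fit(x, a, b):
    """Same functional form as the `objective` used for the fit."""
    return a * (x / b) / (1 + x / b)


def model_bound_pol2(volume, c, b):
    """Model form: PII_DNA = c V / (1 + V / b)."""
    return c * volume / (1 + volume / b)


a_demo, b_demo = 3.0, 40.0   # placeholder values
c_demo = a_demo / b_demo     # because a = c * b

max_difference = max(
    abs(saturating_fit(v, a_demo, b_demo) - model_bound_pol2(v, c_demo, b_demo))
    for v in (20.0, 40.0, 160.0)
)
half_saturation = saturating_fit(b_demo, a_demo, b_demo)
print(max_difference, half_saturation)
```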

In [None]:
# use the elutriation data fit to look at the single molecule data / there is no free fitting parameter
x_frac=[20.7154709145946, 32.7949322420818, 50.6573218458102, 90.4710354592766, 150.480403151359]
y_frac=[0.511800612594595, 0.483242974304699, 0.434892181601164, 0.362998958987166, 0.300707144719272]

x = np.linspace(0,180,100)
y = 1/(1+x/b)
# create the figure
fig = plt.figure()

# plot the function
plt.plot(x_frac,y_frac, 'ro', label="data")
plt.plot(x,y, 'r', label="constant nuclear fraction")

nuclear_volume = 0.038 * x + 0.55
plt.plot(x, 1/(1 + b2 * nuclear_volume), label="varying nuclear fraction")

nuc_volume = x * (alpha + beta * np.exp(-delta * (x-15)))
plt.plot(x, 1/(1 + b3 * nuc_volume), label="exponential nuclear fraction")

plt.legend()
plt.grid()
plt.xlabel('cell size (fl)')
plt.ylabel('RNAP II bound fraction')
plt.xlim([15, 175])


In [None]:
# input data (entered inline) - to fit the elutriation data

#G1 arrest replicates averaged mean size
x_values_haploid_av = [30.3561053, 41.0563371, 53.9679581, 69.5260199, 85.7878708, 109.8508, 139.284533]
#G1 arrest replicates averaged mean after replicates separate and each rescaled to t=1
y_values_haploid_av = [1.08808962, 1.281816857, 1.623124174, 1.78260975, 1.940886253, 2.160079661, 2.213655257]

#G1 arrest replicates averaged mean size
x_values_diploid_av = [44.2178124, 64.3088305, 83.7608578, 107.347687, 136.410443, 167.910268, 206.300094]
#G1 arrest replicates averaged mean after replicates separate and each rescaled to t=1
y_values_diploid_av = [1.934693045, 2.525299895, 2.897389887, 3.337740239, 3.580094407, 4.086145932, 4.055419358]

#G1 arrest replicates separate mean
x_values_haploid = [29.4901011950314, 41.5499478547111, 56.2999132223838, 72.3800582138898, 88.4701975863986, 117.9997262242110, 143.4001214406140, 28.1498966859752, 40.2099941573393, 53.6200084659693, 69.7000995971186, 87.1299029291209, 113.8999558721200, 151.5000801586290]
#G1 arrest replicates separate not re-scaled per experiment
y_values_haploid = [1.004635373,1.272252161,1.501839433,1.641307461,1.734718777,2.004981673,2.096972061,1.171543866,1.291381553,1.744408914,1.923912038,2.147053728,2.315177649,2.330338452]

#G1 arrest replicates separate mean
x_values_diploid = [40.9965266219634, 59.3518208843852, 79.8297119995648, 101.711003356412, 130.55158384503, 159.935134942701, 201.407658700946, 47.4390981862732, 69.2658400752731, 87.6920036174882, 112.984369891381, 142.269301571159, 175.885400925365, 211.192529642018]
#G1 arrest replicates separate not re-scaled per experiment
y_values_diploid = [1.866108552, 2.50025176, 2.716898491, 3.114691256, 3.163973993, 3.814225799, 4.026124967, 2.003277538, 2.55034803, 3.077881282, 3.560789221, 3.99621482, 4.358066064, 4.084713748]

# objective function
def objective(x, a, b): 
    return a*(x/b)/(1+x/(1*b))

# fit curve
popt, _ = curve_fit(objective, x_values_haploid, y_values_haploid)

# define new input values
x_new = np.linspace(0,250,100)
# unpack optimal parameters for the objective function
a, b = popt
# use optimal parameters to calculate new values
y_new = objective(x_new, a, b)

# calculate the result for twice the DNA (ie diploid) 
def objective_diploid(x,a2,b2):
    return a2*(x/b2)/(1+x/(2*b2))

y_diploid = objective_diploid(x_new, a, b)
 
    
fig = plt.figure()
plt.plot(x_values_haploid_av ,y_values_haploid_av, 'ro', x_values_diploid_av, y_values_diploid_av, 'bo',x_new,y_new, 'r',x_new,y_diploid,'b')

plt.xlabel('cell size (fl)')
plt.ylabel('DNA bound RNA Pol II; blue=diploid prediction')

# show the plot
plt.show()

In [None]:
b

# calculate the RNA concentration assuming that transcription is proportional to bound rna pol II and that the global RNA degradation rate is proportional to the mRNA degradation rates $\beta$ measured for the two MET genes in Fig 1 of the draft. $C$ is a constant.

$\frac{\partial{mRNA}}{\partial t} \sim PII_{DNA} - C \beta \, mRNA$ 

and we can assume steady state $\frac{\partial{mRNA}}{\partial t} = 0$ so that we can solve for the $mRNA$ to be

$ C\,mRNA \approx \frac{PII_{DNA}}{\beta}$

for $whi5\Delta$ cells $V \approx 32 fl$, $PII_{DNA} \approx 1.19$, $\beta \approx 0.2 \, min^{-1}$, so that
$C\, mRNA \approx 5.95$ and $C\,[mRNA] \approx 0.19$

for $cln3\Delta$ cells $V \approx 66 fl$, $PII_{DNA} \approx 1.67$, $\beta \approx 0.15 \, min^{-1}$, so that
$C\, mRNA \approx 11.1$ and $C\,[mRNA] \approx 0.17$

In both cases the $[mRNA]$ values agree to within about 10%, implying that most of the compensation that produces mRNA homeostasis could be taking place via regulation of mRNA decay rates. I used the $\beta$ from $MET3$ since that was easier to read off the figure, but the half-life change for $MET17$ seems similar.
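
The arithmetic behind these numbers can be reproduced in a few lines. The sketch below uses only the values quoted above; the dictionary keys and function names are illustrative.

```python
# Values quoted in the text: volume (fl), relative DNA-bound Pol II, decay rate (1/min).
strains = {
    "whi5_delta": {"volume_fl": 32.0, "pol2_bound": 1.19, "beta_per_min": 0.20},
    "cln3_delta": {"volume_fl": 66.0, "pol2_bound": 1.67, "beta_per_min": 0.15},
}


def scaled_mrna(pol2_bound, beta_per_min):
    """C * mRNA = PII_DNA / beta at steady state."""
    return pol2_bound / beta_per_min


def scaled_mrna_concentration(pol2_bound, beta_per_min, volume_fl):
    """C * [mRNA] = PII_DNA / (beta * V)."""
    return scaled_mrna(pol2_bound, beta_per_min) / volume_fl


results = {
    name: (
        scaled_mrna(s["pol2_bound"], s["beta_per_min"]),
        scaled_mrna_concentration(s["pol2_bound"], s["beta_per_min"], s["volume_fl"]),
    )
    for name, s in strains.items()
}
concentration_ratio = results["cln3_delta"][1] / results["whi5_delta"][1]
print(results, concentration_ratio)
```

The ratio of the two scaled concentrations is about 0.91, which is the basis for the "within about 10%" statement.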

I think this can be tested with a more finely resolved Pol II ChIP measurement across cell sizes to see whether the functional form really fits: if there were appreciable feedback regulation on $k_{on}$ for pol II, that should lead to a systematic deviation from the fit for DNA-bound pol II as a function of cell size.

If the transcription rate is proportional to the pol II on the DNA, with constant $\alpha$, we can write the following equation for the mRNA dynamics:

$\frac{\partial \, mRNA}{\partial t} = \alpha PII_{DNA} - \beta([mRNA],V)\,mRNA$

which we take at steady state and divide by the cell volume $V$ (using $mRNA = [mRNA]\,V$) to yield

$\beta([mRNA],V) = \frac{\alpha PII_{DNA}}{V\,[mRNA]}$

But until cells get really large, $[mRNA] = [mRNA]_0$, a constant, so that the degradation rate is just a function of the cell volume. Substituting the expression for the DNA-bound pol II yields:

$\beta(V) = \frac{\alpha PII_{DNA}}{V\,[mRNA]_0} = \frac{\alpha \, c }{[mRNA]_0}\frac{1}{1+\frac{k_{off}\,V}{10 k_{on}\,DNA}}\approx \frac{\alpha \, c }{[mRNA]_0}\frac{1}{1+\frac{V}{40\,fl}} $
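
Because the prefactor $\alpha c/[mRNA]_0$ cancels in a ratio, the expression predicts how the decay rate should change between two cell sizes without any further parameters. The sketch below evaluates that ratio for the two volumes quoted earlier (32 fl and 66 fl) and places it next to the ratio of the two $\beta$ values read from the figure. It takes the approximate 40 fl constant at face value and is only a rough comparison of two points.

```python
def relative_decay_rate(volume_fl, b_fl=40.0):
    """beta(V) divided by the prefactor alpha * c / [mRNA]_0."""
    return 1.0 / (1.0 + volume_fl / b_fl)


# Volumes and decay rates quoted earlier in the text.
predicted_ratio = relative_decay_rate(66.0) / relative_decay_rate(32.0)
measured_ratio = 0.15 / 0.20
print(predicted_ratio, measured_ratio)
```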

This can be tested experimentally, but we need more than 2 cell sizes to measure the degradation rates, and it would be good to measure the degradation rates of the mRNA globally rather than just for the 2 genes if possible. 

But then, how to explain figure 4, or the general phenomenon that mutations that alter RNA stability affect the transcription rate? It could be that there is also feedback from mRNA concentration to polymerase synthesis, stability, or the loading rate onto the genome. This would then increase the amount of pol II on the genome relative to that expected when, for example, 50% of the pol II is removed from the nucleus.

In [None]:
# perform error analysis using bootstrap fitting

#G1 arrest replicates separate mean size
x_values = [29.4901011950314, 41.5499478547111, 56.2999132223838, 72.3800582138898, 88.4701975863986, 117.9997262242110, 143.4001214406140, 28.1498966859752, 40.2099941573393, 53.6200084659693, 69.7000995971186, 87.1299029291209, 113.8999558721200, 151.5000801586290]

#G1 arrest replicates separate not re-scaled per experiment
y_values = [1.004635373,1.272252161,1.501839433,1.641307461,1.734718777,2.004981673,2.096972061,1.171543866,1.291381553,1.744408914,1.923912038,2.147053728,2.315177649,2.330338452]



# objective function
def objective(x, a, b): 
    return a*(x/b)/(1+x/b)

# fit curve
popt, _ = curve_fit(objective, x_values, y_values)

# define new input values
x_new = np.linspace(0,250,100)
# get optimal parameters for the objective function
a, b = popt
# use optimal parameters to calculate new values
y_new = objective(x_new, a, b)

print(a, b)


fig = plt.figure()
plt.plot(x_values ,y_values, 'ro',x_new,y_new, 'r')

plt.xlabel('cell size (fl)')
plt.ylabel('DNA bound RNA Pol II')

# show the plot
plt.show()

In [None]:
list1 = [i for i in range(14)]

xrand = [0 for i in range(14)]
yrand = [0 for i in range(14)]

# define new input values
# x_new = np.linspace(0,150,100)
# unpack optima parameters for the objective function
# a, b = popt
# use optimal parameters to calculate new values
# y_new_rand = objective(x_new, a, b)

# objective function

def objective(x, a, b): 
    return a*(x/b)/(1+x/b)

simulations = 10000

print(simulations)

a_rand = [0 for i in range(simulations)] 
b_rand = [0 for i in range(simulations)] 

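for sim in range(simulations):
    # draw 14 (size, signal) pairs with replacement from the data
    for x in range(14):
        y=np.random.choice(list1)
        xrand[x]=x_values[y]
        yrand[x]=y_values[y]

    # refit the model to the resampled data and store the parameters
    popt, _ = curve_fit(objective, xrand, yrand)
    a_rand[sim]=popt[0]
    b_rand[sim]=popt[1]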

# print(a_rand)
# print(b_rand)

plt.hist(a_rand)



In [None]:
# evaluation grid for the bootstrap percentile curves
x_new = np.linspace(0,250,100)


y_median = [0 for i in range(100)]
y_fifth = [0 for i in range(100)]
y_ninetyfifth = [0 for i in range(100)]

# print(simulations)
# print(y_median)
# print(x_new)


#y_new = objective(x_new, a_rand[0], b_rand[0])

#y_dist = [0 for i in range(simulations)] 

y_dist = [0 for i in range(simulations)] 

for xx in range(100):
    for x in range(simulations):
        y_dist[x]= objective(x_new[xx], a_rand[x], b_rand[x])
    y_median[xx]=np.percentile(y_dist, 50)
    y_fifth[xx]=np.percentile(y_dist, 5)
    y_ninetyfifth[xx]=np.percentile(y_dist, 95)



# print(y_median)

fig = plt.figure()
plt.plot(x_values ,y_values, 'ro',x_new,y_new, 'r',x_new,y_median,'b',x_new,y_fifth,'b',x_new,y_ninetyfifth,'b')

plt.xlabel('cell size (fl)')
plt.ylabel('DNA bound RNA Pol II')
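
The nested loops above can also be written with NumPy broadcasting, which evaluates every bootstrap curve on the whole grid at once. This is a sketch only: the parameter draws below are synthetic stand-ins for `a_rand` and `b_rand`, generated from an arbitrary normal distribution so that the block runs on its own.

```python
import numpy as np


def objective(x, a, b):
    return a * (x / b) / (1 + x / b)


rng = np.random.default_rng(0)
# Hypothetical stand-ins for the bootstrap parameter draws (a_rand, b_rand).
a_draws = rng.normal(3.0, 0.2, size=1000)
b_draws = rng.normal(60.0, 8.0, size=1000)

x_grid = np.linspace(0, 250, 100)

# Broadcasting gives an array of shape (n_draws, n_grid): one curve per draw.
curves = objective(x_grid[None, :], a_draws[:, None], b_draws[:, None])

# Percentiles across draws at each grid point.
y_fifth, y_median, y_ninetyfifth = np.percentile(curves, [5, 50, 95], axis=0)
print(y_fifth.shape, y_median.shape, y_ninetyfifth.shape)
```

With the real draws, replacing `a_draws` and `b_draws` by `np.asarray(a_rand)` and `np.asarray(b_rand)` should give the same percentile curves as the loop version.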



In [None]:
# calculate the result for twice the DNA (ie diploid) and generate 5, 50, and 95th percentiles 

def objective_diploid(size,a2,b2):
    return a2*(size/b2)/(1+size/(2*b2))


x_new = np.linspace(0,250,100)

ydip_median = [0 for i in range(100)]
ydip_fifth = [0 for i in range(100)]
ydip_ninetyfifth = [0 for i in range(100)]

ydip_dist = [0 for i in range(simulations)] 

for xx in range(100):
    for x in range(simulations):
        ydip_dist[x]= objective_diploid(x_new[xx], a_rand[x], b_rand[x])
    ydip_median[xx]=np.percentile(ydip_dist, 50)
    ydip_fifth[xx]=np.percentile(ydip_dist, 5)
    ydip_ninetyfifth[xx]=np.percentile(ydip_dist, 95)

fig = plt.figure()
plt.plot(x_values_haploid_av ,y_values_haploid_av, 'ro', x_values_diploid_av, y_values_diploid_av, 'bo',x_new,y_new, 'r',x_new,ydip_median,'b',x_new,ydip_fifth,'b',x_new,ydip_ninetyfifth,'b')

plt.xlabel('cell size (fl)')
plt.ylabel('DNA bound RNA Pol II')

print(x_new)

In [None]:
print(*ydip_fifth, sep = "\n")

In [None]:
print()
print(*ydip_median, sep = "\n")

In [None]:
print(ydip_ninetyfifth)
