# Equations

The $N$ experiments that report both activation and inactivation each provide:
- $\mu_{aj}$ and $\sigma_{aj}$ (activation), where $j = 1, 2, ..., N$
- $\mu_{ij}$ and $\sigma_{ij}$ (inactivation), where $j = 1, 2, ..., N$

For clarity, the index $j$ is omitted where possible.

## Projection along best fit line

Best-fit line:

\begin{align}
v_i = a + bv_a
\end{align}

Mean (of means):
\begin{align}
\bar{\mu}_a &= \frac{1}{N}\sum \mu_a \\
\bar{\mu}_i &= \frac{1}{N}\sum \mu_i
\end{align}

Mean-centered points:
\begin{align}
x &= \mu_a - \bar{\mu}_a \\
y &= \mu_i - \bar{\mu}_i
\end{align}

Best-fit line through mean-centered points:

\begin{align}
y = bx
\end{align}

Projection and orthogonal vector ("rejection"):

\begin{align}
\textbf{p}_1 = \frac{x + by}{1 + b^2} \begin{bmatrix}1\\b\end{bmatrix}
\end{align}

\begin{align}
\textbf{p}_2 = \frac{y - bx}{1 + b^2} \begin{bmatrix}-b\\1\end{bmatrix}
\end{align}


Magnitude of $\textbf{p}_1$ and $\textbf{p}_2$:

\begin{align}
\|\textbf{p}_1\| = \sqrt{\frac{(x + by)^2(1 + b^2)}{(1 + b^2)^2}} = \sqrt{\frac{(x + by)^2}{1 + b^2}}
\end{align}

\begin{align}
\|\textbf{p}_2\| = \sqrt{\frac{(y - bx)^2(1 + b^2)}{(1 + b^2)^2}} = \sqrt{\frac{(y - bx)^2}{1 + b^2}}
\end{align}

The square root discards the sign. To get signed (directional) "principal components", we choose the convention that $d_1$ is positive along $[1, b]^T$ and $d_2$ is positive along $[-b, 1]^T$:

\begin{align}
d_1 \equiv \frac{x + by}{\sqrt{1 + b^2}} &&
d_2 \equiv \frac{y - bx}{\sqrt{1 + b^2}}
\end{align}
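
As a minimal numerical sketch of the decomposition above, the projection, the rejection, and the signed components can be computed directly. The function name `project` and the sample point are illustrative only and not part of the analysis.

```python
import math


def project(x, y, b):
    """Split the mean-centered point (x, y) along and across the line y = b*x."""
    norm2 = 1.0 + b * b
    s1 = (x + b * y) / norm2  # coefficient of the direction vector (1, b)
    s2 = (y - b * x) / norm2  # coefficient of the normal vector (-b, 1)
    p1 = (s1, s1 * b)         # projection onto the line
    p2 = (-s2 * b, s2)        # rejection, orthogonal to the line
    d1 = (x + b * y) / math.sqrt(norm2)  # signed length of p1
    d2 = (y - b * x) / math.sqrt(norm2)  # signed length of p2
    return p1, p2, d1, d2


# Hypothetical point and slope, chosen only for illustration.
p1, p2, d1, d2 = project(3.0, 1.0, 0.5)
```

Here `p1` and `p2` add up to the original point, and `abs(d1)` and `abs(d2)` equal the lengths of `p1` and `p2`.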

### Is that right?

We can check this by comparing the (square of the) length $\|d_1, d_2\|$ with $\|x, y\|$:

\begin{align}
d_1^2 + d_2^2 
    &= \frac{(x + by)^2+(y - bx)^2}{1 + b^2} \\
    &= \frac{x^2 + b^2y^2 + 2bxy + y^2 + b^2x^2 - 2bxy}{1 + b^2} \\
    &= \frac{(x^2 + y^2)(1 + b^2)}{1 + b^2} = x^2 + y^2
\end{align}

### Approximation $b \approx 1$

\begin{align}
d_1 = \frac{x + y}{\sqrt{2}} &&
d_2 = \frac{y - x}{\sqrt{2}}
\end{align}


## Statistical model

The values within an experiment are given by
\begin{align}
V_{aj} \sim \mathcal{D}(\mu_{aj}, \sigma_{aj}^2) &= V_a + \Delta_j + \Delta_{aj} + \mathcal{D}(0, \sigma_{aj}^2) \\
V_{ij} \sim \mathcal{D}(\mu_{ij}, \sigma_{ij}^2) &= V_i + \Delta_j + \Delta_{ij} + \mathcal{D}(0, \sigma_{ij}^2)
\end{align}

where $\Delta_j$ is a shift affecting both parameters in experiment $j$, where $\Delta_{aj}$ and $\Delta_{ij}$ are shifts affecting activation and inactivation separately, and where $\mathcal{D}(m, s^2)$ is some unknown distribution with mean $m$ and variance $s^2$. The shifts all have mean zero:


\begin{align}
\Delta_j    &\sim \mathcal{D}(0, \sigma_{\Delta}^2) \\
\Delta_{aj} &\sim \mathcal{D}(0, \sigma_{\Delta_a}^2) \\
\Delta_{ij} &\sim \mathcal{D}(0, \sigma_{\Delta_i}^2)
\end{align}

**Having three shifts ($\Delta$, $\Delta_a$, and $\Delta_i$) instead of two makes the model unidentifiable by default. Still, it feels a bit unfair to blame only one of the variables by making one a random function of the other, so we leave it like this until we get past the conceptual part.**

The "true" values of the midpoints of activation and inactivation are $V_a$ and $V_i$.
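
To make the model concrete, the sketch below draws experiment means under the extra assumption that every $\mathcal{D}$ is normal. The function name, the parameter names, and all numerical values (including the "true" midpoints of -20 and -60) are hypothetical and chosen only for illustration. Only the means $\mu_a$ and $\mu_i$ are simulated, not the within-experiment spread.

```python
import random


def simulate_means(n, v_a, v_i, sd_shared, sd_a, sd_i, seed=0):
    """Draw n pairs (mu_a, mu_i), assuming normal shifts (an assumption)."""
    rng = random.Random(seed)
    mu_a, mu_i = [], []
    for _ in range(n):
        delta = rng.gauss(0.0, sd_shared)  # shift shared by both parameters
        mu_a.append(v_a + delta + rng.gauss(0.0, sd_a))  # activation-only shift
        mu_i.append(v_i + delta + rng.gauss(0.0, sd_i))  # inactivation-only shift
    return mu_a, mu_i


# Hypothetical values, for illustration only.
mu_a, mu_i = simulate_means(
    n=5000, v_a=-20.0, v_i=-60.0, sd_shared=4.0, sd_a=2.0, sd_i=3.0
)
```

Because the shared shift enters both means, the simulated $\mu_a$ and $\mu_i$ are positively correlated, with covariance close to the variance of the shared shift.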

Focussing only on the means, and approximating the true values with the mean-of-means:

\begin{align}
\mu_a &= \bar{\mu}_a + \Delta + \Delta_a \\
\mu_i &= \bar{\mu}_i + \Delta + \Delta_i
\end{align}
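
Spelling out the step that links this to the mean-centered coordinates from the first section (a short worked substitution, with the index $j$ omitted as before):

\begin{align}
x &= \mu_a - \bar{\mu}_a = \Delta + \Delta_a \\
y &= \mu_i - \bar{\mu}_i = \Delta + \Delta_i
\end{align}

so that $x + y = 2\Delta + \Delta_a + \Delta_i$ and $y - x = \Delta_i - \Delta_a$.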

### Relationship to principal components

\begin{align}
d_1 = \frac{x + y}{\sqrt{2}} = \frac{2 \Delta + \Delta_a +\Delta_i}{\sqrt{2}}
\end{align}

\begin{align}
d_2 = \frac{\Delta_i - \Delta_a}{\sqrt{2}}
\end{align}

To reduce it to two variables, we can write

\begin{align}
d_1 = \frac{2(\Delta + \Delta_a) + (\Delta_i - \Delta_a)}{\sqrt{2}} &= X + Y \\
d_2 = \frac{\Delta_i - \Delta_a}{\sqrt{2}} &= Y
\end{align}

But instead, let's assume independent normal distributions for the three shifts. Keeping the factor $1/\sqrt{2}$ from the definitions of $d_1$ and $d_2$ means each variance picks up a factor $1/2$:

\begin{align}
d_1 &\sim \mathcal{N}(0, 2 \sigma_\Delta^2) + \mathcal{N}\left(0, \tfrac{1}{2}(\sigma_{\Delta_a}^2 + \sigma_{\Delta_i}^2)\right) = \mathcal{N}\left(0, 2 \sigma_\Delta^2 + \tfrac{1}{2}(\sigma_{\Delta_a}^2 + \sigma_{\Delta_i}^2)\right) \\
d_2 &\sim \mathcal{N}\left(0, \tfrac{1}{2}(\sigma_{\Delta_a}^2 + \sigma_{\Delta_i}^2)\right)
\end{align}

#### Approximation: $\sigma_{\Delta_a}^2 = \sigma_{\Delta_i}^2$

\begin{align}
d_1 &\sim \mathcal{N}(0, 2\sigma_\Delta^2 + \sigma_{\Delta_a}^2) \\
d_2 &\sim \mathcal{N}(0, \sigma_{\Delta_a}^2)
\end{align}
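
A quick Monte Carlo sketch can check the variances of $d_1$ and $d_2$ when the factor $1/\sqrt{2}$ is kept. It assumes independent normal shifts and $b \approx 1$, and it checks $\mathrm{Var}(d_1) = 2\sigma_\Delta^2 + (\sigma_{\Delta_a}^2 + \sigma_{\Delta_i}^2)/2$ and $\mathrm{Var}(d_2) = (\sigma_{\Delta_a}^2 + \sigma_{\Delta_i}^2)/2$. All names and numerical values are hypothetical, and the shifts are centered on their true mean of zero rather than on a sample mean.

```python
import math
import random
import statistics


def sample_components(n, sd_shared, sd_a, sd_i, seed=1):
    """Sample d1 and d2 for b = 1, assuming independent normal shifts."""
    rng = random.Random(seed)
    d1, d2 = [], []
    for _ in range(n):
        delta = rng.gauss(0.0, sd_shared)
        delta_a = rng.gauss(0.0, sd_a)
        delta_i = rng.gauss(0.0, sd_i)
        x = delta + delta_a  # mean-centered activation midpoint
        y = delta + delta_i  # mean-centered inactivation midpoint
        d1.append((x + y) / math.sqrt(2))
        d2.append((y - x) / math.sqrt(2))
    return d1, d2


# Hypothetical standard deviations, for illustration only.
sd_shared, sd_a, sd_i = 4.0, 2.0, 3.0
d1, d2 = sample_components(20000, sd_shared, sd_a, sd_i)

var_d1 = statistics.pvariance(d1)
var_d2 = statistics.pvariance(d2)
expected_d1 = 2 * sd_shared**2 + (sd_a**2 + sd_i**2) / 2
expected_d2 = (sd_a**2 + sd_i**2) / 2
```

Under these assumptions `var_d1` and `var_d2` should land close to `expected_d1` and `expected_d2`, up to sampling noise.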






## Extra

### Solving for a

\begin{align}
\mu_i - \bar{\mu}_i &= b(\mu_a - \bar{\mu}_a) \\
\mu_i &= (\bar{\mu}_i - b \bar{\mu}_a) + b \mu_a \\
      &= a + b \mu_a \\
a     &= \bar{\mu}_i - b \bar{\mu}_a
\end{align}
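
As a small sketch of this result, the intercept can be recovered from the slope and the two means-of-means. The function name `intercept` and the data points are hypothetical and chosen only for illustration.

```python
def intercept(mu_a, mu_i, b):
    """Return a such that the line mu_i = a + b * mu_a passes through the means."""
    mean_a = sum(mu_a) / len(mu_a)
    mean_i = sum(mu_i) / len(mu_i)
    return mean_i - b * mean_a


# Hypothetical midpoints, for illustration only.
mu_a = [-22.0, -18.0, -20.0]
mu_i = [-63.0, -57.0, -60.0]
a = intercept(mu_a, mu_i, b=1.5)
```

By construction, the line with this intercept passes through the point $(\bar{\mu}_a, \bar{\mu}_i)$.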

### How do we attribute factors to $\Delta, \Delta_a, \Delta_i, \sigma_a, \sigma_i$?

Different factors will be spread over the five statistical parameters differently.
We could make complicated models for that, but only if we knew all experimental procedures exactly.

The hardest one is probably time.
Say we need 5 minutes to set up, then 1-3 minutes to run an activation protocol (we might need to restart, etc.), then 1-3 minutes to run an inactivation protocol.
We might try encoding this by saying that $\Delta$ incurs the 5 minutes, $\sigma_a$ some function of the 2-minute spread, $\Delta_i$ the 1 minute that activation takes _at minimum_, and $\sigma_i$ some function of its own 2-minute spread _and_ some dependency on the activation experiment time...
It all gets complicated and depends on data we don't have (and can't get).
So let's not bother too much and see what we can do.