# Overview of implementing Omega on a real experimental dataset

## Steps

1. Determine the biochemical process that generated the experimental data and program it into Omega
    1. One example is a Gillespie simulation (although the model does not need to be one)
2. This implementation can then run simulations that follow the same biochemical laws as the experiment
    1. For a given set of cells $X$, we can generate traces over some time period $T$
    2. For each time point $t_i \in T$, we can describe the cell abundance as $P(x_1, x_2, \dots, x_n \mid t_i)$
3. We can condition the implementation on our experimental data to recreate the experimental state, i.e. obtain the conditional distribution at every observed time point:
    $$P(x_1, x_2, \dots, x_n \mid \text{Data}, t_i), \quad t_i \in T$$
4. Use Omega's intervention functionality (i.e. `replace`) on the conditioned trace to ask counterfactual queries about the experiment
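Assuming "Omega" here is the Julia probabilistic programming library Omega.jl, the snippet below is not Omega code. It is a Python analogy of steps 1 to 4 under stated assumptions. The toy model `simulate`, its prior, and every other name are hypothetical. All randomness is drawn up front as exogenous noise. Conditioning is approximated by rejection sampling, which keeps the prior draws whose trace stays within a tolerance of the data. The counterfactual roughly mimics `replace`: it reruns the model with one variable changed while reusing each conditioned noise draw.

```python
import random


def simulate(rate, noise, x0=240.0):
    """Hypothetical toy model: each step adds `rate` plus one pre-drawn noise term."""
    trace = [x0]
    for eps in noise:
        trace.append(trace[-1] + rate + eps)
    return trace


def draw_prior(rng, n_steps):
    """Draw the parameter and all exogenous noise up front, so a run is fully determined by them."""
    rate = rng.uniform(0.0, 20.0)
    noise = [rng.gauss(0.0, 5.0) for _ in range(n_steps)]
    return rate, noise


def condition(data, n_samples, tol, rng):
    """Rejection sampling: keep prior draws whose simulated trace stays within `tol` of the data."""
    kept = []
    for _ in range(n_samples):
        rate, noise = draw_prior(rng, len(data) - 1)
        trace = simulate(rate, noise, x0=data[0])
        if max(abs(a - b) for a, b in zip(trace, data)) < tol:
            kept.append((rate, noise))
    return kept


def counterfactual(posterior, new_rate, x0):
    """Swap in a new rate but reuse each conditioned noise draw (a rough analogue of `replace`)."""
    return [simulate(new_rate, noise, x0) for _, noise in posterior]


rng = random.Random(0)
data = simulate(10.0, [0.0] * 4)   # synthetic "observed" trace: [240.0, 250.0, 260.0, 270.0, 280.0]
posterior = condition(data, n_samples=5000, tol=15.0, rng=rng)
cf_traces = counterfactual(posterior, new_rate=0.0, x0=data[0])
```

Reusing the conditioned noise is what makes the query counterfactual ("what would *this* run have looked like with a different rate?") rather than a fresh interventional simulation.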

## Biochemical model of organelle dynamics

A quick reminder of what our experimental data look like:

```python
import pandas as pd

# Load the day-1 measurements; .iloc[:, 1:] drops the first column of the file
day1 = pd.read_csv('../../../../../Research/Causal_Inference/SDE_inference/Experimental_Data/Data/Day1/all.dat', sep=',').iloc[:, 1:]
day1.head()
```

|   | time | count | Cell |
|---|------|-------|------|
| 0 | 0    | 240   | 1    |
| 1 | 336  | 269   | 1    |
| 2 | 672  | 258   | 1    |
| 3 | 1008 | 264   | 1    |
| 4 | 1344 | 263   | 1    |


Observations are taken at regular intervals of 336 time units, and `count` gives the measured count for each `Cell` at each time point.
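One quick way to confirm the regular sampling is to difference consecutive `time` values within each `Cell`. The sketch below uses only the standard library. Its input is the five rows printed above, inlined as a small sample. The helper name `sampling_intervals` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The five rows printed above, used as a small inline sample
SAMPLE = """time,count,Cell
0,240,1
336,269,1
672,258,1
1008,264,1
1344,263,1
"""


def sampling_intervals(rows):
    """Return the set of time gaps between consecutive observations of each cell."""
    times_by_cell = defaultdict(list)
    for row in rows:
        times_by_cell[row["Cell"]].append(int(row["time"]))
    gaps = set()
    for times in times_by_cell.values():
        times.sort()
        gaps.update(b - a for a, b in zip(times, times[1:]))
    return gaps


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
intervals = sampling_intervals(rows)
print(intervals)  # {336}
```

On the full `day1` frame, the equivalent pandas check would be `day1.groupby('Cell')['time'].diff().dropna().unique()`.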

#### Gillespie

Our model needs to be expressive enough to be conditioned on cell counts at *specific* times.

This rules out stochastic models that do not step at regular intervals, such as the Gillespie algorithm. From any time $t_i$, Gillespie steps to the next reaction time as follows:
$$t_{i+1} = t_i + \frac{1}{a_0}\ln\frac{1}{r}$$

where $a_0$ is the sum of the hazards (the propensities, i.e. rates, of each reaction) and $r$ is sampled from the uniform distribution on $(0,1]$. This is inverse-transform sampling, so the waiting time $\tau = t_{i+1} - t_i$ is exactly exponentially distributed with rate $a_0$ (density $a_0 e^{-a_0 \tau}$).
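Below is a minimal sketch of one step of Gillespie's direct method. It assumes the hazards arrive as a plain list of non-negative rates. The waiting time uses inverse-transform sampling, $\tau = \ln(1/r)/a_0$ with $r$ uniform on $(0,1]$. A second uniform draw picks reaction $j$ with probability $h_j / a_0$. The function name, its signature, and the hazard values in the example are illustrative only.

```python
import math
import random


def gillespie_step(t, hazards, rng):
    """One step of Gillespie's direct method.

    Returns (t_next, j): the time of the next reaction and the index of the
    reaction that fires, or (inf, None) if every hazard is zero.
    """
    a0 = sum(hazards)
    if a0 <= 0:
        return math.inf, None
    r1 = 1.0 - rng.random()          # uniform on (0, 1], so log(1 / r1) is finite
    tau = math.log(1.0 / r1) / a0    # exponential waiting time with rate a0
    target = rng.random() * a0       # pick reaction j with probability h_j / a0
    cumulative = 0.0
    for j, h in enumerate(hazards):
        cumulative += h
        if target < cumulative:
            return t + tau, j
    # Floating-point round-off guard: fall back to the last reaction with positive hazard
    return t + tau, max(j for j, h in enumerate(hazards) if h > 0)


rng = random.Random(1)
hazards = [1.0, 2.0, 3.0]            # hypothetical hazards, so a0 = 6
draws = [gillespie_step(0.0, hazards, rng) for _ in range(20000)]
mean_wait = sum(t for t, _ in draws) / len(draws)   # close to 1 / a0
```

The returned event times are continuous and land wherever the exponential draws put them, not on the 336-unit observation grid.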

In this model, time is also a random variable. This changes our posterior: for every observation $i$ we would need the joint posterior $P(x_i, t_i \mid \text{Data}_i)$. In other words, we would have to enforce both $x_i = \text{Data}_{x_i}$ and $t_i = \text{Data}_{t_i}$. Since $t_i$ is continuous, the event $t_i = \text{Data}_{t_i}$ has probability zero. This complicates the model and makes conditioning much more challenging.

#### From *Statistical Inference of Peroxisome Dynamics*

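Given the joint effect of three stochastic processes, the probability $p(x, t)$ that the count equals $x$ at time $t$ is governed by the following master equation. Its terms show the three processes: creation at constant rate $k_d$, proliferation at rate $k_f$ per existing organelle, and loss at rate $\gamma$ per organelle.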

$$\frac{dp(x,t)}{dt} = [k_d + k_f(x-1)]p(x-1,t) + [\gamma(x+1)]p(x+1,t)-[k_d+(k_f + \gamma)x]p(x,t)$$
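To make the master equation concrete, the sketch below evaluates its right-hand side term by term, with $p(-1, t) = 0$. The state space is truncated at the largest count stored in the list, so probability that would flow above it is simply dropped. That truncation is an assumption of the sketch. The function name and the rate values are placeholders, not fitted values.

```python
def cme_rhs(p, kd, kf, gamma):
    """Right-hand side dp/dt of the master equation on counts 0 .. len(p) - 1.

    p[x] is the probability that the count equals x. Probability that would
    flow above the largest tracked count is dropped (truncation).
    """
    n = len(p)
    dp = [0.0] * n
    for x in range(n):
        # Gain from x - 1 via creation (kd) and proliferation (kf per organelle)
        gain_below = (kd + kf * (x - 1)) * p[x - 1] if x >= 1 else 0.0
        # Gain from x + 1 via loss (gamma per organelle)
        gain_above = gamma * (x + 1) * p[x + 1] if x + 1 < n else 0.0
        # Loss out of state x by any of the three processes
        loss = (kd + (kf + gamma) * x) * p[x]
        dp[x] = gain_below + gain_above - loss
    return dp


p0 = [0.0] * 20
p0[5] = 1.0                                   # all probability mass on count = 5
dp = cme_rhs(p0, kd=1.0, kf=0.5, gamma=0.25)  # placeholder rates
```

With all mass at count 5, the outflow of $k_d + 5(k_f + \gamma) = 4.75$ splits into $k_d + 5k_f = 3.5$ towards count 6 and $5\gamma = 1.25$ towards count 4, so `sum(dp)` is zero.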

Under a diffusion (chemical Langevin) approximation, this can be reformulated as an SDE:

$$dx(t) = [k_d + (k_f - \gamma)x(t)]\,dt + \sqrt{k_d + (k_f + \gamma)x(t)}\,dW(t)$$

where $W(t)$ is Brownian motion. Finally, this equation can be integrated numerically with the Euler-Maruyama method:

$$x_{i+1} = x_i + [k_d + (k_f - \gamma)x_i]\Delta t + \sqrt{k_d + (k_f + \gamma)x_i}\,\sqrt{\Delta t}\,Z$$

$$Z \sim N(0,1)$$
$$\Delta t = t_{i+1} - t_i$$
$$k_d,\ k_f,\ \gamma:\ \text{rate parameters}$$
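Below is a minimal Euler-Maruyama sketch of this SDE. It assumes the chemical Langevin form, in which the noise amplitude is $\sqrt{k_d + (k_f + \gamma)x}$. Two further assumptions are not in the text. First, the count is clipped at zero after each step, because the diffusion approximation can otherwise go negative. Second, the rate values are placeholders. The function name is hypothetical.

```python
import math
import random


def euler_maruyama(x0, times, kd, kf, gamma, rng):
    """Simulate dx = [kd + (kf - gamma) x] dt + sqrt(kd + (kf + gamma) x) dW
    on the given time grid (which need not be evenly spaced)."""
    xs = [float(x0)]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        x = xs[-1]
        drift = kd + (kf - gamma) * x
        diffusion = math.sqrt(max(kd + (kf + gamma) * x, 0.0))
        z = rng.gauss(0.0, 1.0)                      # Z ~ N(0, 1)
        x_new = x + drift * dt + diffusion * math.sqrt(dt) * z
        xs.append(max(x_new, 0.0))                  # assumption: clip so the count stays non-negative
    return xs


rng = random.Random(0)
times = list(range(0, 1345, 336))                    # 0, 336, ..., 1344, matching the data grid
trace = euler_maruyama(240, times, kd=0.01, kf=0.002, gamma=0.002, rng=rng)  # placeholder rates
```

A finer grid, for example several substeps between observations, reduces discretisation error. The caller can still read off the state at the observed times.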

This model has the added bonus that the time step can be set by the user (for example, to the 336-unit sampling interval of the data), while the count itself remains a random variable.
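Because the time step can match the observation times, each Euler-Maruyama transition between consecutive observations is Gaussian. That gives a simple approximate log-likelihood for conditioning on observed counts. The sketch below makes three assumptions. It uses the chemical Langevin noise term, with variance $[k_d + (k_f+\gamma)x_i]\Delta t$. It treats the observations as exact, with no measurement noise. Its function name is hypothetical. It is one way to score rate parameters against a trace, not the method of the cited paper.

```python
import math


def em_log_likelihood(counts, times, kd, kf, gamma):
    """Approximate log p(x_1, ..., x_n | x_0, rates) from Euler-Maruyama transitions.

    Each step is treated as x_{i+1} | x_i ~ Normal(x_i + mu(x_i) dt, s2(x_i) dt) with
    mu(x) = kd + (kf - gamma) x and s2(x) = kd + (kf + gamma) x.
    """
    total = 0.0
    for i in range(len(counts) - 1):
        dt = times[i + 1] - times[i]
        x, x_next = counts[i], counts[i + 1]
        mean = x + (kd + (kf - gamma) * x) * dt
        var = (kd + (kf + gamma) * x) * dt
        if var <= 0:
            return -math.inf   # degenerate transition; treated as impossible in this sketch
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x_next - mean) ** 2 / var)
    return total


# First five observations of cell 1 (from the table above); the rates are placeholders
counts = [240, 269, 258, 264, 263]
times = [0, 336, 672, 1008, 1344]
ll = em_log_likelihood(counts, times, kd=0.01, kf=0.002, gamma=0.002)
```

Comparing `ll` across candidate rate values is one crude way to score them against a trace. It ignores measurement noise and the discretisation error of a step as large as 336 time units.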