## [Feb 22] Causal Inference and Transfer Learning II

Presenter: Yuchen Ge  
Affiliation: University of Oxford  
Contact Email: gycdwwd@gmail.com  
Website: https://yuchenge-am.github.io

---

### 1. Counting Process Preliminary

> A **counting process** is a stochastic process $(N(t): t \geqslant 0)$ taking values in $\mathbb{N}_{0}$ that is a.s. finite $\big($i.e. $\mathbb{P}(N(t)<\infty \text{ for all } t \geqslant 0)=1\big)$, satisfies $N(0)=0$, and is a right-continuous step function with increments of size $+1$.

It can equivalently be viewed as a **point process**: a sequence of random variables $T=\left\{T_{1}, T_{2}, \ldots\right\}$ taking values in $[0, \infty)$ such that $\mathbb{P}\left(0 \leqslant T_{1} \leqslant T_{2} \leqslant \ldots\right)=1$ and the number of points in any bounded region is a.s. finite.

> A process has **independent** increments if, for all choices of $0\leq t_{1}<t_{2}<\cdots<t_{n}$, the increments $N\left(t_{i}\right)-N\left(t_{i-1}\right)$, $2 \leq i \leq n$, are independent; it has **stationary** increments if the distribution of $N\left(t+h\right)-N\left(t\right)$ depends only on $h$, for all $t \geq 0$.

Now we may define 

> ( **Poisson Process** ) A counting process $(N(t): t \geqslant 0)$ is a homogeneous Poisson process with rate $\lambda>0$ if the number of arrivals $N(I)$ in any bounded interval $I$ satisfies $N(I) \sim \operatorname{Poi}(\lambda|I|)$, and $N\left(I_{1}\right), N\left(I_{2}\right), \ldots, N\left(I_{n}\right)$ are independent for all disjoint intervals $I_{1}, I_{2}, \ldots, I_{n}$.


Taking $I=(0, t]$, it follows that

$$ \mathbb{P}(N(t)=n)=\frac{(\lambda t)^{n}}{n !} \mathrm{e}^{-\lambda t}, \quad n=0,1,2, \ldots
$$
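
As a quick numerical check of this formula, the sketch below evaluates the probabilities for illustrative values $\lambda=2$, $t=1.5$ (these values and the function name are assumptions made for the example) and confirms that they sum to one with mean $\lambda t$.

```python
import math

def poisson_count_pmf(n: int, rate: float, t: float) -> float:
    """P(N(t) = n) for a homogeneous Poisson process with the given rate."""
    mean = rate * t
    return mean ** n / math.factorial(n) * math.exp(-mean)

rate, t = 2.0, 1.5  # illustrative values, so lambda * t = 3
# Truncate the infinite sum at n = 59; the remaining tail is negligible here.
pmf = [poisson_count_pmf(n, rate, t) for n in range(60)]
total = sum(pmf)
mean = sum(n * p for n, p in enumerate(pmf))
print(f"sum of probabilities = {total:.6f}, mean count = {mean:.6f}")
```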

There is a link between Bernoulli and Poisson processes. Fixing some $h>0$, partition time into disjoint intervals  $J_{1}^{(h)}=(0, h], J_{2}^{(h)}=(h, 2 h], \ldots$

Let $X_{n}$ be the number of arrivals in $J_{n}^{(h)}$. For small $h$ we approximate $X_{n} \sim \operatorname{Ber}(p)$ with $p=\lambda h$, independently across intervals, and define $Y_{n}=X_{1}+\cdots+X_{n}$, the number of arrivals in the interval $(0, n h]$.

> The random variables $ N(t)$  and  $Y_{n}$  have approximately the same distribution.

**Proof.** Fix a time $t>0$ and set $n=\lfloor t / h\rfloor$, so that $t \in[n h,(n+1) h)$ and $h \approx t / n$. Then $Y_{n} \sim \operatorname{Bin}(n, p)$ and

$$\begin{aligned}
\mathbb{P}\left(Y_{n}=m\right) & =\left(\begin{array}{l}
n \\
m
\end{array}\right)(\lambda h)^{m}(1-\lambda h)^{n-m} \approx\left(\begin{array}{l}
n \\
m
\end{array}\right)(\lambda t / n)^{m}(1-\lambda t / n)^{n-m} \\
& =\frac{n !}{(n-m) ! n^{m}} \frac{(\lambda t)^{m}}{m !}\left(1-\frac{\lambda t}{n}\right)^{n-m}.
\end{aligned}$$

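As $h \rightarrow 0$ (equivalently $n \rightarrow \infty$) with $m$ fixed, $\frac{n !}{(n-m) ! n^{m}} \rightarrow 1$ and $\left(1-\frac{\lambda t}{n}\right)^{n-m} \rightarrow \mathrm{e}^{-\lambda t}$, so $\lim _{n \rightarrow \infty} \mathbb{P}\left(Y_{n}=m\right)=\frac{(\lambda t)^{m}}{m !} \mathrm{e}^{-\lambda t}$, which is exactly $\mathbb{P}(N(t)=m)$.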
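
The convergence can also be observed numerically. The following sketch (illustrative values $\lambda=2$, $t=1.5$, $m=3$; helper names are assumptions for the example) compares the $\operatorname{Bin}(n, \lambda t / n)$ probability with the Poisson limit as $n$ grows.

```python
import math

def binomial_pmf(m: int, n: int, p: float) -> float:
    """P(Y_n = m) for Y_n ~ Bin(n, p)."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

def poisson_pmf(m: int, mean: float) -> float:
    """P(N = m) for N ~ Poi(mean)."""
    return mean ** m / math.factorial(m) * math.exp(-mean)

rate, t, m = 2.0, 1.5, 3  # illustrative values
errors = []
for n in (10, 100, 1000, 10000):
    p = rate * t / n  # success probability per slot, i.e. lambda * h with h = t / n
    err = abs(binomial_pmf(m, n, p) - poisson_pmf(m, rate * t))
    errors.append(err)
    print(f"n = {n:>5}: |Bin - Poi| = {err:.2e}")
```

The absolute gap shrinks roughly in proportion to $1/n$ for these values.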

> **Thm I.** The interarrival times of the Bernoulli process are geometric, and those of the Poisson process are exponential. Furthermore, both processes have independent interarrival times.
>
> **Thm II.** Both processes have the superposition and thinning properties.
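
Thm II can be illustrated by simulation. The sketch below is only a sanity check under stated assumptions: it builds Poisson arrivals from i.i.d. exponential gaps (as in Thm I), merges two independent processes with rates $1$ and $2$, and then keeps each merged point independently with probability $0.25$. The rates, horizon, seed, and function name are illustrative choices. It checks only the empirical rates ($\lambda_1+\lambda_2$ after superposition, $q(\lambda_1+\lambda_2)$ after thinning), not the full Poisson property.

```python
import random

def poisson_arrivals(rate: float, horizon: float, rng: random.Random) -> list:
    """Arrival times on (0, horizon], built from i.i.d. Exp(rate) gaps."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(0)  # fixed seed for reproducibility
horizon = 5000.0
first = poisson_arrivals(1.0, horizon, rng)
second = poisson_arrivals(2.0, horizon, rng)

# Superposition: merge the two independent streams into one.
merged = sorted(first + second)
# Thinning: keep each merged arrival independently with probability q.
q = 0.25
kept = [s for s in merged if rng.random() < q]

merged_rate = len(merged) / horizon  # expected to be close to 1 + 2 = 3
kept_rate = len(kept) / horizon      # expected to be close to 0.25 * 3 = 0.75
print(f"merged rate = {merged_rate:.3f}, thinned rate = {kept_rate:.3f}")
```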

**Proof.** We only prove Thm I for the Poisson process. Write $P(k ; \tau)=\mathbb{P}(N(\tau)=k)$ for the probability of $k$ arrivals in an interval of length $\tau$, and let $\Delta T_k = T_k - T_{k-1}$ (with $T_0=0$) be the interarrival times, where the $T_k$'s are the arrival times. For small $\delta>0$,

$$\begin{aligned}
\mathbb{P}\left(t_{1} \leq \Delta T_{1} \leq t_{1}+\delta,\right. & \left.t_{2} \leq \Delta T_{2} \leq t_{2}+\delta\right) \\
\approx & P\left(0 ; t_{1}\right) \cdot P(1 ; \delta) \cdot P\left(0 ; t_{2}-\delta\right) \cdot P(1 ; \delta) \\
\approx & e^{-\lambda t_{1}} \lambda \delta e^{-\lambda\left(t_{2}-\delta\right)} \lambda \delta.
\end{aligned}$$


We divide both sides by  $\delta^{2}$, and take the limit as  $\delta \downarrow 0$, to obtain

$$f_{\Delta T_{1}, \Delta T_{2}}\left(t_{1}, t_{2}\right)=\lambda e^{-\lambda t_{1}} \cdot \lambda e^{-\lambda t_{2}}, \quad t_{1}, t_{2}>0.$$


Since the joint density factorizes, $\Delta T_{2}$ is independent of $\Delta T_{1}$ and both have the same $\operatorname{Exp}(\lambda)$ distribution; the same argument extends to any finite number of interarrival times.
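
Thm I and the Bernoulli approximation can be checked together by simulation. The sketch below (illustrative values $\lambda=2$, $h=0.01$, a fixed seed, standard library only) simulates a Bernoulli process with $p=\lambda h$, rescales its interarrival gaps to continuous time, and compares them with what an $\operatorname{Exp}(\lambda)$ law would give: mean $1/\lambda$ and tail $\mathbb{P}(\Delta T>x)=e^{-\lambda x}$. It also reports the sample correlation between consecutive gaps, which should be near zero (a necessary but not sufficient sign of independence).

```python
import math
import random

rate, h = 2.0, 0.01  # illustrative values
p = rate * h         # per-slot arrival probability, here 0.02
rng = random.Random(1)

# Slot indices at which the Bernoulli process has an arrival.
arrival_slots = [k for k in range(1, 2_000_001) if rng.random() < p]
# Interarrival times in continuous-time units (number of slots times h).
gaps = [(b - a) * h for a, b in zip(arrival_slots, arrival_slots[1:])]

mean_gap = sum(gaps) / len(gaps)               # compare with 1 / rate
tail = sum(g > 1.0 for g in gaps) / len(gaps)  # compare with exp(-rate * 1.0)

# Sample correlation between consecutive gaps.
x, y = gaps[:-1], gaps[1:]
mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)
var_x = sum((u - mx) ** 2 for u in x) / len(x)
var_y = sum((v - my) ** 2 for v in y) / len(y)
corr = cov / math.sqrt(var_x * var_y)

print(f"mean gap = {mean_gap:.4f} (1/lambda = {1 / rate:.4f})")
print(f"P(gap > 1) = {tail:.4f} (exp(-lambda) = {math.exp(-rate):.4f})")
print(f"lag-1 correlation = {corr:.4f}")
```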


### 2. Stochastic Intensity from a Martingale Viewpoint

> $A \in \mathcal{V}_{T}$ (i.e. of finite variation) if it's an adapted cadlag process s.t. $\int_{0}^{T}\left|d A_{t}\right|<\infty$.
>
> $A \in \mathcal{A}_{T}$ (i.e. of integrable variation) if $A \in  \mathcal{V}_{T}$ s.t. $E\left[\int_{0}^{T}\left|d A_{t}\right|\right]<\infty$.

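The optional $\sigma$-algebra on $\mathbb{R}_{+} \times \Omega$ is generated by all processes of the form $Y_{t}(\omega)=Z(\omega) I\{r \leq t<s\}$, where $r<s$ and $Z$ is $\mathcal{F}_{r}$-measurable, with $(\mathcal{F}_{t})_{t \geq 0}$ the underlying filtration. Also,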

> The optional $\sigma$-algebra is generated by the class of adapted cadlag processes.

For “practical” purposes, **the difference between an adapted process and an optional process is very small and one may, without great risk, interpret the term “optional” as “adapted”.**


> Given the history $\mathcal{H}(u)$ up to the last arrival time $u$, the conditional cumulative distribution function (CDF) of the next arrival time $T_{k+1}$ is
>
> $$ \begin{aligned} F^*(t) & = F(t \mid \mathcal{H}(u))=\int_{u}^{t} \frac{\mathbb{P}\left(T_{k+1} \in(s, s+\mathrm{d} s] \mid \mathcal{H}(u)\right)}{\mathrm{d} s} \mathrm{d} s \\ & = \int_{u}^{t} f(s \mid \mathcal{H}(u)) \mathrm{d} s = \int_{u}^{t} f^*(s) \mathrm{d} s. \end{aligned}$$

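Then we define the **stochastic (conditional) intensity**, i.e. the hazard rate of the next arrival,

$$\lambda^{*}(t, \omega) =  \lambda^{*}(t)=\frac{f^{*}(t)}{1-F^{*}(t)},$$

which can equivalently be written as

$$\lambda^{*}(t)=\lim _{h \searrow 0} \frac{\mathbb{E}[N(t+h)-N(t) \mid \mathcal{H}(t)]}{h}.$$

Intuitively, $\lambda^{*}(t)\, \mathrm{d} t = \mathbb{E}\left[N(\mathrm{d} t) \mid \mathcal{H}(t)\right]$.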
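
As a sanity check of this definition, consider a worked example under the assumption of a homogeneous Poisson process with rate $\lambda$ whose last arrival occurred at time $u$. By Thm I the waiting time to the next arrival is $\operatorname{Exp}(\lambda)$ and independent of the past, so for $t \geq u$

$$f^{*}(t)=\lambda e^{-\lambda(t-u)}, \qquad F^{*}(t)=1-e^{-\lambda(t-u)}, \qquad \lambda^{*}(t)=\frac{\lambda e^{-\lambda(t-u)}}{e^{-\lambda(t-u)}}=\lambda.$$

The intensity is constant and does not depend on the history, as one would expect from the memoryless property of the exponential distribution.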

### 3. Hawkes Process

A **Hawkes Process** is a generalization of the Poisson process, as shown below.

> A **Hawkes Process** is   a counting process $(N(t): t \geqslant 0)$, with associated history  $(\mathcal{H}(t): t \geqslant 0)$, s.t.
>
> $$
\mathbb{P}(N(t+h)-N(t)=m \mid \mathcal{H}(t))=\left\{\begin{array}{ll}
1-\lambda^{*}(t) h+\mathrm{o}(h), & m=0 \\
\lambda^{*}(t) h+\mathrm{o}(h), & m=1 \\
\mathrm{o}(h), & m>1.
\end{array}\right. $$

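Here we suppose $\lambda^{*}(t)=\lambda+\int_{0}^{t} \mu(t-u) \mathrm{d} N(u)$, where $\lambda>0$ is called the background intensity and $\mu:(0, \infty) \rightarrow[0, \infty)$ the excitation function. Since $N$ jumps by one at each arrival time $T_{i}$, the integral is the sum $\sum_{T_{i}<t} \mu\left(t-T_{i}\right)$: every past arrival raises the current intensity.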

> The case $\mu(\cdot) \equiv 0$ recovers the homogeneous Poisson process with rate $\lambda$.
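
To make the definition concrete, here is a minimal simulation sketch. It assumes an exponential excitation function $\mu(s)=\alpha e^{-\beta s}$ with $\alpha<\beta$ (a common choice, not one fixed by these notes) and uses an Ogata-style thinning scheme; the function names, the argument `base` (standing for the background intensity $\lambda$), and all parameter values are illustrative.

```python
import math
import random

def hawkes_intensity(t, events, base, alpha, beta, include_now=False):
    """lambda*(t) = base + sum of alpha * exp(-beta * (t - s)) over past events s.

    With include_now=True an event exactly at time t is counted too, which
    gives the intensity just after t.
    """
    past = (s for s in events if s < t or (include_now and s == t))
    return base + sum(alpha * math.exp(-beta * (t - s)) for s in past)

def simulate_hawkes(base, alpha, beta, horizon, rng):
    """Thinning simulation on (0, horizon] for the assumed exponential kernel."""
    events, t = [], 0.0
    while True:
        # The kernel is decreasing, so the intensity just after t is an upper
        # bound for lambda* until the next accepted event.
        upper = hawkes_intensity(t, events, base, alpha, beta, include_now=True)
        t += rng.expovariate(upper)  # candidate point from a rate-`upper` process
        if t > horizon:
            return events
        # Accept the candidate with probability lambda*(t) / upper.
        if rng.random() * upper <= hawkes_intensity(t, events, base, alpha, beta):
            events.append(t)

rng = random.Random(42)  # fixed seed for reproducibility
horizon = 200.0
events = simulate_hawkes(base=1.0, alpha=0.5, beta=1.0, horizon=horizon, rng=rng)
print(f"{len(events)} events on (0, {horizon}]")
```

Setting `alpha=0.0` makes every candidate accepted at the constant rate `base`, which matches the remark that $\mu \equiv 0$ gives a homogeneous Poisson process.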






---

### Reference

1. Shuxiao Chen. Minimax Rates and Adaptivity in Combining Experimental and Observational Data.
2. Qingyuan Zhao. Lecture Notes on Causal Inference.
3. Joaquin Quiñonero-Candela. Dataset Shift In Machine Learning.
4. Geoff K. Nicholls. Bayes Methods.
5. Patrick J. Laub. Hawkes Processes.
6. Tomas Björk. An Introduction to Point Processes from a Martingale Point of View.