# Random Variables

$\newcommand{\ffrac}{\displaystyle \frac}
\newcommand{\Tran}[1]{{#1}^{\mathrm{T}}}
\newcommand{\d}[1]{\displaystyle{#1}}
\newcommand{\EE}[2][\,\!]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\Var}[2][\,\!]{\mathrm{Var}_{#1}\left[#2\right]}
\newcommand{\Cov}[2][\,\!]{\mathrm{Cov}_{#1}\left(#2\right)}
\newcommand{\Corr}[2][\,\!]{\mathrm{Corr}_{#1}\left(#2\right)}
\newcommand{\using}[1]{\stackrel{\mathrm{#1}}{=}}
\newcommand{\I}[1]{\mathrm{I}\left( #1 \right)}
\newcommand{\N}[1]{\mathrm{N} \left( #1 \right)}
\newcommand{\space}{\text{ }}
\newcommand{\QQQ}{\boxed{?\:}}
\newcommand{\SB}[1]{\left[ #1 \right]}
\newcommand{\P}[1]{\left( #1 \right)}
\newcommand{\abs}[1]{\left| #1 \right|}
\newcommand{\norm}[1]{\left\| #1 \right\|}
\newcommand{\CB}[1]{\left\{ #1 \right\}}$During an experiment, the quantities of interest, or more formally the real-valued functions defined on the sample space, are known as ***random variables***, abbreviated $r.v.$. Because the value of a random variable is determined by the outcome of the experiment, we can assign probabilities to its possible values.

A special case for this is the ***Indicator Variable***, denoted as $I$ for an event $E$.

$$I = \begin{cases}
1 & \text{if } E \text{ occurs} \\
0 & \text{if } E \text{ doesn't occur}
\end{cases}$$

**e.g.**  
Suppose that independent trials, each of which results in any of $m$ possible outcomes with respective probabilities $p_1, \dots, p_m$, $\sum p_i = 1$, are continually performed. Let $X$ denote the number of trials needed until each outcome has occurred at least once. What is $P\CB{X = n}$?

>Instead of solving that directly, we first calculate $P\CB{X > n}$. Let $A_i$ denote the event that outcome $i$ has not yet occurred after the first $n$ trials, $i = 1, \dots, m$.
>
>$\begin{align}
P\left\{X > n\right\} &= P\left( \bigcup_{i=1}^{m} A_i \right) \\
&= \sum_{i=1}^{m} P(A_i) - \underset{i<j}{\sum\sum} P(A_iA_j) \\
& \;\;\; + \underset{i<j<k}{\sum\sum\sum} P(A_iA_jA_k) - \cdots + (-1)^{m+1}P(A_1 A_2 \cdots A_m)
\end{align}$
>
>Now, $P(A_i)$ is the probability that each of the first $n$ trials results in a $\text{non-}i$ outcome, and so by independence
>
>$P(A_i) = (1 - p_i)^{n}$
>
>And similarly, $P(A_iA_j)$ is the probability that the first $n$ trials all result in a $\text{non-}i$ and $\text{non-}j$ outcome, and so
>
>$P(A_iA_j) = (1-p_i - p_j)^{n}$
>
>As all of the other possibilities are similar, we see that
>
>$\begin{align}
P\left\{X>n\right\} = \sum_{i=1}^{m}(1-p_i)^n - \underset{i<j}{\sum\sum} (1-p_i - p_j)^n + \underset{i<j<k}{\sum\sum\sum} (1 - p_i - p_j - p_k)^n - \cdots
\end{align}$
>
>Since $P\left\{X=n\right\} = P\left\{X>n-1\right\} -P\left\{X>n\right\}$ and $(1-a)^{n-1} - (1-a)^ n = a(1-a)^{n-1}$, it follows that
>
>$\begin{align}
P\left\{X=n\right\} &= \sum_{i=1}^{m} p_i (1-p_i)^{n-1} - \underset{i<j}{\sum\sum} (p_i + p_j) (1-p_i - p_j)^{n-1} \\ 
&\;\;\;+ \underset{i<j<k}{\sum\sum\sum} (p_i+p_j+p_k)(1 - p_i - p_j - p_k)^{n-1} - \cdots
\end{align}$
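As a sanity check on the inclusion-exclusion formula, here is a minimal Python sketch that evaluates $P\{X > n\}$ and $P\{X = n\}$ numerically. The function names `tail_prob` and `pmf_all_seen` and the example probabilities are illustrative choices, not part of the original notes.

```python
from itertools import combinations

def tail_prob(n, probs):
    """P{X > n}: at least one outcome is still missing after n trials."""
    total = 0.0
    for r in range(1, len(probs) + 1):
        sign = (-1) ** (r + 1)  # alternating inclusion-exclusion sign
        for subset in combinations(probs, r):
            # probability that none of the outcomes in `subset` occurred in n trials
            total += sign * (1.0 - sum(subset)) ** n
    return total

def pmf_all_seen(n, probs):
    """P{X = n} = P{X > n-1} - P{X > n}."""
    return tail_prob(n - 1, probs) - tail_prob(n, probs)

probs = [0.2, 0.3, 0.5]  # hypothetical example with m = 3 outcomes
print([round(pmf_all_seen(n, probs), 4) for n in range(1, 7)])
```

The loop visits all $2^m - 1$ nonempty subsets, so this is only practical for small $m$.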

***
Besides **discrete** $r.v.$s such as this one, there are also **continuous** $r.v.$s, such as the lifetime of a car. 

We also define the ***cumulative distribution function*** (cdf), $F(\cdot)$, of the $r.v.$ $X$ for any real number $b$, $-\infty < b < \infty$, by $F(b) = P\left\{X \leq b\right\}$. Some properties of the cdf $F$ are:

- $F(b)$ is a nondecreasing function of $b$.$\\[0.7em]$
- $\lim\limits_{b \to \infty} F(b) = F(\infty) = 1\\[0.7em]$
- $\lim\limits_{b \to -\infty} F(b) = F(-\infty) = 0$

Also, $P\left\{a < X \leq b\right\} = F(b) - F(a)$ for all $a < b$. To compute $P\left\{X<b\right\}$, we take a limit from the left:

$$\begin{align}
P\left\{X<b\right\}&= \lim_{h \to 0^+} P\left\{X \leq b-h\right\}\\
&= \lim_{h \to 0^+} F(b-h)
\end{align}$$

Keep in mind that $P\left\{X<b\right\}$ *need not* equal $F(b)$, since $F(b)$ also includes $P\left\{X = b\right\}$.
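To make the difference between $P\{X < b\}$ and $F(b)$ concrete, here is a small sketch using a fair six-sided die. The die is an assumed example, and `cdf` and `prob_less_than` are hypothetical helper names.

```python
from fractions import Fraction

# pmf of a fair die (illustrative assumption, not from the notes)
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(b):
    """F(b) = P{X <= b}."""
    return sum(p for x, p in die_pmf.items() if x <= b)

def prob_less_than(b):
    """P{X < b}, which is the left limit of F at b."""
    return sum(p for x, p in die_pmf.items() if x < b)

# At a point that carries mass the two differ by exactly p(b).
print(cdf(3), prob_less_than(3), cdf(3) - prob_less_than(3))
```

At a value with no probability mass, such as $b = 2.5$, the two quantities coincide.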

# Discrete $r.v.$

A $r.v.$ that can take on at most a **countable** number of possible values is said to be ***discrete***, say $X$. We can define its ***probability mass function*** $p(a)$ as: $p(a) = P\left\{X = a\right\}$.

It is easy to see that $p(a)$ is **positive** for at most a countable number of values of $a$. So if $X$ must assume one of the values $x_1, x_2, \dots$, then $p(x_i) > 0$ for $i = 1, 2, \dots$ and $p(x)=0$ for all other values of $x$.

Direct consequences are $\sum_{i=1}^{\infty}p(x_i) = 1$ and $F(a) = \sum_{x_i \leq a}p(x_i)$.

## The Bernoulli $r.v.$

A $r.v.$ $X$ is said to be a ***Bernoulli*** $r.v.$ if its probability mass function is given by

$$
\begin{cases}
p(0) = P\left\{X=0\right\} = 1-p\\[0.5em]
p(1) = P\left\{X=1\right\} = p
\end{cases}
$$

where $0 < p < 1$ is the probability that the trial is a success.

## The Binomial $r.v.$

Suppose that $n$ independent trials are performed, each of which results in a success with probability $p$ and a failure with probability $1-p$. Let $X$ denote the **number of successes** that occur in the $n$ trials; then $X$ is said to be a ***Binomial*** $r.v.$ with **parameters** $(n,p)$.

Its probability mass function is given by $p(i) = \d{\binom{n} {i}}p^i(1-p)^{n-i}$ for $i=0,1,\dots,n$, and by the binomial theorem the probabilities sum to $1$: 

$$\sum_{i=0}^{\infty}p(i) = \sum_{i=0}^{n} \binom{n} {i}p^i(1-p)^{n-i} = (p+(1-p))^{n} = 1$$

## The Geometric $r.v.$

Suppose that independent trials, each having probability $p$ of being a success, are performed until a success occurs, and let $X$ denote the number of trials required until the first success. Then $X$ is said to be a ***geometric*** $r.v.$ with parameter $p$. Its probability mass function is given by $p(n) = P\left\{X=n\right\} = (1-p)^{n-1}p$ for $n = 1,2,\dots$

And it's easy to verify that $\sum\limits_{n=1}^{\infty} p(n) = p\sum\limits_{n=1}^{\infty} (1-p)^{n-1} = 1$

## The Poisson Random Variable

A $r.v.$ $X$ taking on one of the values $i = 0,1,\dots$ is said to be a ***Poisson*** $r.v.$ with parameter $\lambda > 0$ if its probability mass function is given by

$$p(i) = P\left\{X=i\right\} = e^{-\lambda} \frac{\lambda^i} {i!}$$

And it's easy to verify that $\sum\limits_{i=0}^{\infty} p(i) = e^{-\lambda} \sum\limits_{i=0}^{\infty} \ffrac{\lambda^i} {i!} = e^{-\lambda}e^{\lambda} = 1$

One important application is to **approximate** a *binomial* $r.v.$ with large $n$ and small $p$, taking $\lambda = np$:

$$\begin{align}
P_{\text{binom}}\left\{X=i\right\} &= \binom{n} {i} p^i (1-p)^{n-i} \\
&= \frac{n!} {(n-i)!i!} \left( \frac{\lambda} {n} \right)^i \left( 1 - \frac{\lambda} {n} \right)^{n-i} \\[0.6em]
&= \frac{n(n-1)(n-2) \cdots (n-i+1)} {n^i} \frac{\lambda^i} {i!} \frac{(1-\lambda/n)^n} {(1-\lambda/n)^i} \\
& \approx 1 \cdot \frac{\lambda^i} {i!} \cdot \frac{e^{-\lambda}} {1} = P_{\text{poisson}}\left\{X=i\right\}
\end{align}$$
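The quality of this approximation can be checked numerically. The sketch below compares the two pmfs for one assumed parameter choice ($n = 1000$, $p = 0.003$); the helper names are hypothetical.

```python
from math import comb, exp, factorial

def binom_pmf(i, n, p):
    """Binomial pmf: C(n, i) p^i (1-p)^(n-i)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """Poisson pmf: e^(-lam) lam^i / i!."""
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.003   # large n, small p (illustrative values)
lam = n * p          # matching Poisson parameter

# largest pointwise gap between the two pmfs over i = 0..19
max_gap = max(abs(binom_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(20))
print(max_gap)
```

Repeating the comparison with a larger $p$ (say $p = 0.3$ with $n = 10$) should show a visibly larger gap, which is why the approximation is stated for large $n$ and small $p$.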

$Remark$

- $0!=1$
- For a Poisson distributed $X$, $\EE{X} = \lambda$

# Continuous $r.v.$

Now let $X$ be a $r.v.$ that can take on an uncountable set of values. We say $X$ is a ***continuous*** $r.v.$ if there exists a **nonnegative** function $f(x)$, defined for all real $x \in (-\infty, \infty)$, having the property that for any set $B$ of real numbers, 

$$P\left\{X \in B\right\} = \int_{B} f(x) \;\dd{x}$$

Here $f(x)$ is the ***probability density function*** (pdf) of $X$. It must satisfy $P\CB{X \in \left( -\infty, \infty\right)} = \d{\int_{-\infty}^{\infty} f(x)\;\dd{x} = 1}$. One notable consequence is that for any *particular value* $a$ that $X$ may assume, $P\CB{X = a} = \d{\int_{a}^{a}f(x)\;\dd{x}}=0$.

Also, we can use this to define the cumulative distribution $F(\cdot)$: $F(a) = \d{\int_{-\infty}^{a} f(x) \;\dd{x}}$, then we can differentiate both sides and it yields: $\ffrac{\dd{}} {\dd{a}}F(a) = f(a)$.

## The Uniform Random Variable

A $r.v.$ is said to be ***uniformly distributed*** over the interval $(0,1)$ if its pdf is given by

$$f(x) = \begin{cases}
1 & \text{if } 0 < x < 1 \\[0.6em]
0 & \text{otherwise}
\end{cases}$$

And in general, we say that $X$ is a uniform random variable on the interval $(\alpha, \beta)$ if its pdf is given by

$$f(x) = \begin{cases}
\ffrac{1} {\beta - \alpha} & \text{if } \alpha < x < \beta \\[0.6em]
0 & \text{otherwise}
\end{cases}$$

## Exponential Random Variables

A continuous $r.v.$ whose pdf is given, for some $\lambda > 0$, by, 

$$f(x) = \begin{cases}
\lambda e ^{-\lambda x} & \text{if }x\geq 0 \\[0.6em]
0 & \text{if } x<0
\end{cases}$$

is said to be an ***exponential*** $r.v.$ with parameter $\lambda$. And for its cdf, we have

$$F(a) = \begin{cases}
\d{\int_{0}^{a} \lambda e ^{-\lambda x} \;\dd{x}} = 1 - e^{-\lambda a} & \text{if } a\geq 0 \\[0.6em]
0 & \text{if } a<0
\end{cases}$$

And also it's easy to verify that $F(\infty) = \d{\int_{0}^{\infty} \lambda e ^{-\lambda x} \;\dd{x} = 1}$

## Gamma Random Variables

A continuous $r.v.$ whose pdf is given, for some $\lambda > 0$ and $\alpha > 0$, by 

$$f(x) = \begin{cases}
\ffrac{\lambda e ^{-\lambda x} (\lambda x)^{\alpha-1}} {\Gamma(\alpha)} & \text{if } x\geq 0 \\[0.6em]
0 & \text{if } x<0
\end{cases}$$

is said to be a ***gamma*** $r.v.$ with parameters $\alpha$ and $\lambda$, where the ***gamma function*** is defined by $\Gamma(\alpha) = \d{\int_{0}^{\infty} e^{-x} x^{\alpha - 1} \; \dd{x}}$.
 
$Remark$

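By induction we can show that $\Gamma(n) = (n-1)!$ for every positive integer $n$. Integrating by parts,

>$$\begin{align}
\Gamma(n) &= \int_{0}^{\infty} e^{-x} x^{n-1} \;\dd{x} \\
&= \int_{0}^{\infty} e^{-x}\; \ffrac{\dd{x^{n}}} {n}\\
&= \left.\ffrac{e^{-x}x^n} {n}\right|_{0}^{\infty} - \int_{0}^{\infty} -e^{-x} \ffrac{x^{n}} {n} \;\dd{x} \\
&= \ffrac{\Gamma(n+1)} {n}
\end{align}$$
>
>so $\Gamma(n+1) = n\,\Gamma(n)$, and since $\Gamma(1) = \d{\int_{0}^{\infty} e^{-x} \;\dd{x}} = 1$, induction gives $\Gamma(n) = (n-1)!$.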
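The identity $\Gamma(n) = (n-1)!$ is easy to confirm numerically with Python's standard library. The sketch below is only a spot check for small $n$, and `gamma_matches_factorial` is a hypothetical helper name.

```python
from math import gamma, factorial, isclose

def gamma_matches_factorial(max_n=10):
    """Check Gamma(n) == (n-1)! for n = 1..max_n, up to floating-point error."""
    return all(
        isclose(gamma(n), factorial(n - 1), rel_tol=1e-12)
        for n in range(1, max_n + 1)
    )

print(gamma_matches_factorial())

# The recursion Gamma(a + 1) = a * Gamma(a) also holds for non-integer a.
a = 2.5
print(isclose(gamma(a + 1), a * gamma(a), rel_tol=1e-12))
```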

## Normal Random Variables

We say that $X$ is a ***normal*** $r.v.$ with parameters $\mu$ and $\sigma^2$ if its pdf is given by

$$f(x) = \frac{1} {\sqrt{2\pi\sigma^2}} \exp\CB{-\ffrac{(x-\mu)^2} {2\sigma^2}}$$

with $x \in \mathbb{R}$. Its density function is a bell-shaped curve that is symmetric around $\mu$.

$Remark$

If $X$ is normally distributed with parameters $\mu$ and $\sigma^2$, then $Y = \alpha X + \beta$ with $\alpha \neq 0$ is also normally distributed, with parameters $\alpha \mu + \beta$ and $\alpha^2 \sigma^2$. The derivation below takes $\alpha > 0$; the case $\alpha < 0$ is similar.

>$$\begin{align}
F_Y(a) &= P(Y \leq a) = P(\alpha X + \beta \leq a)\\[0.6em]
&= F_X \left( \ffrac{a-\beta} {\alpha} \right) \\
&= \int_{-\infty}^{(a - \beta)/\alpha} \frac{1} {\sqrt{2\pi\sigma^2}} \exp\CB{-\ffrac{(x-\mu)^2} {2\sigma^2}} \;\dd{x} \\
&\stackrel{ y = \alpha x + \beta} {=} \int_{-\infty}^{a}\frac{1} {\sqrt{2\pi}\sigma\alpha} \exp\CB{-\ffrac{(y-(\alpha \mu + \beta))^2} {2\sigma^2\alpha^2}} \;\dd{y}
\end{align}$$

$Remark$

The previous result can be applied in reverse: any normally distributed $r.v.$ $X$ can be transformed into a standard normal $r.v.$, with parameters $0$ and $1$, by setting $Z = (X - \mu)/\sigma$.
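In practice, standardization means that any normal probability can be computed from the standard normal cdf. The sketch below expresses that cdf through the error function `math.erf`; the function names are illustrative.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) = P{Z <= z} for Z with parameters 0 and 1."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(a, mu, sigma):
    """P{X <= a} for X normal(mu, sigma^2), via Z = (X - mu) / sigma."""
    return std_normal_cdf((a - mu) / sigma)

# Example with assumed parameters mu = 10, sigma = 2
print(normal_cdf(12.0, 10.0, 2.0))  # same as Phi(1)
```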

# Expectation of a Random Variable
## The Discrete Case

If $X$ is a discrete $r.v.$ having a pmf $p(x)$, then the ***expected value*** of $X$ is defined by:

$\d{\EE{X} = \sum_{x:p(x)>0}xp(x)}$

**e.g.**  
Expectation of a **Bernoulli** $r.v.$

> $\EE{X} = 0 \cdot (1-p) + 1 \cdot p = p $

**e.g.**  
Expectation of a **Binomial** $r.v.$

> $\begin{align}
\EE{X} &= \sum_{i=0}^{n} i \cdot p(i) = \sum_{i=0}^{n} i \cdot \binom{n} {i} p^i (1-p)^{n-i} \\
&= \sum_{i=\mathbf{1}}^{n} \ffrac{n!} {(n-i)!(i-1)!} p^i (1-p)^{n-i} \\
&= np \sum_{i=\mathbf{1}}^{n} \ffrac{(n-1)!} {(n-i)!(i-1)!} p^{i-1} (1-p)^{n-i} \\
&\stackrel{k=i-1}{=} np \sum_{k=\mathbf{0}}^{n-1} \binom{n-1} {k} p^k (1-p)^{n-1-k} \\
&= np\left[p+(1-p)\right]^{n-1} = np
\end{align}$

**e.g.**  
Expectation of a **Geometric** $r.v.$

> $\begin{align}
\EE{X} &= \sum_{n=1}^{\infty} n \cdot p(1-p)^{n-1} \\
&\stackrel{q=1-p}{=} p \sum_{n=1}^{\infty}nq^{n-1} \\
&= p \sum_{n=1}^{\infty} \ffrac{\dd{}} {\dd{q}}q^{n} \\
&= p \ffrac{\dd{}} {\dd{q}} \left( \ffrac{q} {1-q} \right)\\
&= \ffrac{p} {(1-q)^2} = \ffrac{1} {p}
\end{align}$
>
> The shifted-subtraction method (subtracting $q$ times the series from itself) also works.

**e.g.**  
Expectation of a **Poisson** $r.v.$

> $\begin{align}
\EE{X} &= \sum_{i=0}^{\infty} i\cdot\ffrac{e^{-\lambda}\lambda^i} {i!} \\
&= \lambda e^{-\lambda} \sum_{i=\mathbf{1}}^{\infty} \ffrac{\lambda^{i-1}} {(i-1)!} \\
&= \lambda e^{-\lambda} e^{\lambda} = \lambda
\end{align}$
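The closed forms $np$, $1/p$ and $\lambda$ can be cross-checked by summing $x\,p(x)$ directly. This is a minimal sketch with assumed parameter values; the infinite sums are truncated where the remaining terms are negligible.

```python
from math import comb, exp, factorial

def mean(pmf, support):
    """E[X] = sum of x * p(x) over the given (possibly truncated) support."""
    return sum(x * pmf(x) for x in support)

n, p, lam = 10, 0.3, 2.5  # illustrative parameters

binom_mean = mean(lambda i: comb(n, i) * p**i * (1 - p) ** (n - i), range(n + 1))
geom_mean = mean(lambda k: (1 - p) ** (k - 1) * p, range(1, 2000))       # truncated
pois_mean = mean(lambda i: exp(-lam) * lam**i / factorial(i), range(100))  # truncated

print(binom_mean, n * p)
print(geom_mean, 1 / p)
print(pois_mean, lam)
```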

***

## The Continuous Case

For $r.v.$ $X$ with pdf $f(x)$, its expected value is defined by $\EE{X} = \d{\int_{-\infty}^{\infty} xf(x) \;\dd{x}}$.

**e.g.**  
Expectation of a **Uniform** $r.v.$

> $\begin{align}
\EE{X} &= \int_{\alpha}^{\beta} x \cdot \frac{1} {\beta - \alpha} \; \dd{x} \\
&= \frac{\beta^2 - \alpha^2} {2(\beta - \alpha)} = \frac{\beta + \alpha} {2}
\end{align}$

**e.g.**  
Expectation of an **Exponential** $r.v.$

> $\begin{align}
\EE{X} &= \int_{0}^{\infty} x \cdot \lambda e^{-\lambda x} \;\dd{x}  = \int_{0}^{\infty} -x\;\dd{e^{-\lambda x}}\\
&= \left. -xe^{-\lambda x}\right|_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x} \; \dd{x} \\
&= 0 - \left. \frac{e^{-\lambda x}} {\lambda} \right|_{0}^{\infty} \\
&= \frac{1} {\lambda}
\end{align}$

**e.g.**  
Expectation of a **Normal** $r.v.$

> $\begin{align}
\EE{X} &= \frac{1} {\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} x \cdot \exp\CB{-\frac{(x-\mu)^2} {2\sigma^2}} \;\dd{x} \\
&\stackrel{y=x-\mu}{=}\frac{1}{\sqrt{2\pi}\sigma}\left( \int_{-\infty}^{\infty} y \exp\CB{-\frac{y^2} {2\sigma^2}} \; \dd{y} + \mu \int_{-\infty}^{\infty} \exp\CB{-\frac{(x-\mu)^2} {2\sigma^2}} \; \dd{x} \right) \\
&= \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} y \exp\CB{-\frac{y^2} {2\sigma^2}} \; \dd{y} + \mu \int_{-\infty}^{\infty}  f(x) \; \dd{x}
\end{align}$

>The integrand of the first term is an odd function of $y$, so by symmetry that integral is $0$, and we conclude that $\EE{X} = \mu \cdot 1 = \mu$.

***

## Expectation of a Function of a $r.v.$

$Proposition$

For $X$ with pmf $p(x)$, and any real-valued function $g$, $\EE{g(X)} = \d{\sum_{x:p(x)>0}} g(x)p(x)$. And for those with pdf $f(x)$, we have $\EE{g(X)} = \d{\int_{-\infty}^{\infty}} g(x)f(x) \;\dd{x}$

$Corollary$

For constants $a$ and $b$, $\EE{aX+b} = a\EE{X} + b$. 

***

We also call the quantity $\EE{X^n}$ for $n \geq 1$ the $n\text{-th}$ ***moment*** of $X$. The ***variance*** of $X$ is defined by $\Var{X} = \EE{(X - \EE{X})^2}$.

**e.g.**  
Variance of the ***Normal*** $r.v.$

> $\begin{align}
\Var{X} &= \EE{(X - \mu)^2} \\
&= \frac{1} {\sqrt{2\pi}\sigma}\int_{-\infty}^{\infty} (x-\mu)^2 \exp\CB{-\frac{(x-\mu)^2} {2\sigma^2}}\;\dd{x} \\
&\stackrel{y=(x-\mu)/\sigma}{=} \frac{\sigma^2} {\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 \exp\CB{-\frac{y^2} {2}} \;\dd{y} \\
&= \frac{\sigma^2} {\sqrt{2\pi}} \int_{-\infty}^{\infty} -y \;\dd{e^{-y^2/2}}\\
&= \frac{\sigma^2} {\sqrt{2\pi}} \left( \left.-ye^{-y^2/2}\right|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-y^2/2} \;\dd{y} \right) \\
&= \frac{\sigma^2} {\sqrt{2\pi}} \cdot \int_{-\infty}^{\infty} e^{-y^2/2} \;\dd{y} \\
&= \sigma^2
\end{align}$

***

$Remark$

To prove that $\d{\int_{-\infty}^{\infty} e^{-y^2/2} \;\dd{y}} = \sqrt{2\pi}$, one can use the double-integral method: square the integral and evaluate the result in polar coordinates.
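The double-integral argument can be written out as follows. This is the standard polar-coordinates computation; $I$ is introduced here only as shorthand for $\d{\int_{-\infty}^{\infty} e^{-y^2/2} \;\dd{y}}$.

$$\begin{align}
I^2 &= \left( \int_{-\infty}^{\infty} e^{-x^2/2} \;\dd{x} \right) \left( \int_{-\infty}^{\infty} e^{-y^2/2} \;\dd{y} \right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2} \;\dd{x}\;\dd{y} \\
&\stackrel{x = r\cos\theta,\; y = r\sin\theta}{=} \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\, r \;\dd{r}\;\dd{\theta} \\
&= 2\pi \left. \left( -e^{-r^2/2} \right) \right|_{0}^{\infty} = 2\pi
\end{align}$$

Since $I > 0$, this gives $I = \sqrt{2\pi}$.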

$Remark$

Another formula connects the expectation and the variance: $\Var{X} = \EE{X^2} - (\EE{X})^2$, in both the continuous and the discrete case.

# Jointly Distributed Random Variables
##  Joint Distribution Functions

For any two $r.v.$s $X$ and $Y$, we can define the ***joint cumulative probability distribution function*** of $X$ and $Y$ by

$$F(a,b) = P\CB{X \leq a, Y \leq b}$$

for $a,b \in \mathbb{R}$. And with this we can find the ***marginal cumulative probability distribution*** like:

$$\begin{align}
F_X(a) &= P\CB{X \leq a} = P \CB{X \leq a, Y < \infty} = F(a, \infty)\\
F_Y(b) &= F(\infty, b)
\end{align}$$

In the case where $X$ and $Y$ are both discrete $r.v.$s, it is also convenient to define the ***joint probability mass function*** of $X$ and $Y$ by $p(x,y) = P\CB{X = x, Y=y}$, from which the ***marginal probability mass functions*** follow:

$$
p_X(x) = \sum_{y:p(x,y)>0} p(x,y), \qquad p_Y(y) = \sum_{x:p(x,y)>0} p(x,y)
$$

We say that $X$ and $Y$ are ***jointly continuous*** if there exists a function $f(x,y)$, called the ***joint probability density function*** of $X$ and $Y$, defined for all real $x$ and $y$, such that for all sets $A$ and $B$ of real numbers the following holds:

$$\d{P\CB{X \in A, Y \in B} = \int_B \int_A f(x,y) \; \dd{x} \; \dd{y}}$$ 

And the ***marginal*** part:

$$\begin{align}
P\CB{X\in A} &= P\CB{X \in A, Y \in (-\infty,\infty)} \\
&= \int_{-\infty}^{\infty} \int_A f(x,y) \;\dd{x} \; \dd{y} \\
&= \int_A f_X(x)\;\dd{x}
\end{align}$$

where $f_X(x) = \d{\int_{-\infty}^{\infty} f(x,y) \; \dd{y}}$, which is how we obtain the marginal pdf of $X$.

And because $F(a,b) = P\CB{X \leq a,Y \leq b}=\d{\int_{-\infty}^{a}\int_{-\infty}^{b} f(x,y) \;\dd{y} \;\dd{x}}$, differentiation yields:

$$\ffrac{\partial^2} {\partial a\;\partial b}F(a,b) = f(a,b)$$

The expectation can be calculated by

$$
\EE{g(X,Y)} = \begin{cases}
\d{\sum_y \sum_x g(x,y) p(x,y)} & \text{discrete case}\\
\d{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y) f(x,y) \;\dd{x}\;\dd{y}} &\text{continuous case}
\end{cases}
$$

A direct application of this is

$$\EE{\sum_{i=1}^{n}a_iX_i} = \sum_{i=1}^{n}a_i\EE{X_i}$$

for $n$ $r.v.$s and $n$ constants $a_1, \dots, a_n$.

$Remark$

This property applies only to linear combinations of the $X_i$; in general $\EE{g(X)} \neq g\left(\EE{X}\right)$ for nonlinear $g$.

**(V)e.g.**  
Choose $10$ letters at random from $A$ to $Z$, with replacement. Compute the expected number of different letters contained in the set of $10$.

> It's hard to calculate that directly, so we break $X$ into $26$ parts, one for each letter. We first define $X_i$, $i = 1, \dots, 26$, as

>$$X_i = \begin{cases}
1, & \text{if letter } i \text{ appears at least once in the set of } 10\\
0, & \text{otherwise}
\end{cases}$$

>Then $X$, the number of different letters in the set of $10$, satisfies $X = \sum_{i=1}^{26} X_i$, and we have

>$$\begin{align}
\EE{X_i} &= P\CB{X_i = 1} \\
&= 1 - P\CB{\text{letter }i\text{ does not appear in the set of }10} \\
&= 1 - \left(\ffrac{25} {26}\right)^{10}
\end{align}$$
>
>So that $\EE{X} = \sum\EE{X_i} = 26\left[1 - \left(\ffrac{25} {26}\right)^{10}\right]$
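The indicator argument can be checked against a simulation. The sketch below assumes, as the factor $(25/26)^{10}$ implies, that the letters are drawn independently and uniformly with replacement; the function names, trial count and seed are arbitrary choices.

```python
import random

def expected_distinct_exact(draws=10, alphabet=26):
    """Sum of the indicator expectations: alphabet * (1 - ((alphabet-1)/alphabet)^draws)."""
    return alphabet * (1 - ((alphabet - 1) / alphabet) ** draws)

def expected_distinct_simulated(draws=10, alphabet=26, trials=20_000, seed=0):
    """Monte Carlo estimate: average number of distinct letters per set of draws."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # draw with replacement and count the distinct letters seen
        total += len({rng.randrange(alphabet) for _ in range(draws)})
    return total / trials

print(expected_distinct_exact(), expected_distinct_simulated())
```

Note that the $X_i$ are not independent, yet the sum of their expectations still gives the right answer, because linearity of expectation does not require independence.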

***

## Independent $r.v.$

$X$ and $Y$ are said to be ***independent*** if for all $a$, $b$, we have $P\CB{X \leq a, Y \leq b} = P\CB{X \leq a}\cdot P\CB{Y \leq b}$. *In other words*, the events $E_a = \CB{X \leq a}$ and $F_b = \CB{Y \leq b}$ are independent.

In terms of the joint distribution function $F$, we have that $X$ and $Y$ are independent if for $\forall a,b$,

$$F(a,b) = F_X (a) \cdot F_Y(b)$$

which can also be reduced to 

$$\begin{cases}
p(x,y) &\!\!\!\!= p_X(x) \cdot p_Y(y) & \text{discrete case}\\[0.6em]
f(x,y) &\!\!\!\!= f_X(x) \cdot f_Y(y) & \text{continuous case}
\end{cases}$$

$Proposition$

If $X$ and $Y$ are independent, then for any functions $h$ and $g$: $\EE{g(X)\cdot h(Y)} = \EE{g(X)} \cdot \EE{h(Y)}$.

## Covariance and Variance of Sums of $r.v.$

The covariance of *any* two random variables $X$ and $Y$, denoted by $\Cov{X,Y}$, is defined by

$$\begin{align}
\Cov{X, Y} &= \EE{(X - \EE{X}) \cdot (Y - \EE{Y})} \\
&= \EE{XY - Y\EE{X} - X\EE{Y} + \EE{X}\EE{Y}} \\
&= \EE{XY} - \EE{X} \EE{Y}
\end{align}$$

Easy to see that if $X$ and $Y$ are independent, then $\Cov{X,Y} = 0$

$Remark$

In general it can be shown that a **positive** value of $\Cov{X,Y}$ is an **indication** that $Y$ tends to increase as $X$ does, whereas a negative value indicates that $Y$ tends to decrease as $X$ increases.

**e.g.**  

Given the joint density function of $X$ and $Y$, $f(x,y) = \ffrac{1} {y} \exp\CB{-y-\ffrac{x} {y}}$ for $0 < x,y < \infty$, verify that it is a valid joint density and find the covariance.

> For the verification:

>$$\begin{align}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y) \;\dd{y} \;\dd{x} &= \int_{0}^{\infty}\int_{0}^{\infty} \ffrac{1} {y} \exp\CB{-y-\frac{x} {y}} \;\dd{y} \;\dd{x} \\
&= \int_{0}^{\infty} e^{-y} \int_{0}^{\infty} \frac{1} {y} \exp\CB{-\frac{x} {y}} \;\dd{x} \;\dd{y}\\
&= \int_{0}^{\infty} e^{-y} \;\dd{y} \\
&= 1
\end{align}$$

>For the covariance we first need the expectations of the individual $r.v.$s. Two methods are available. For $\EE{X}$,

>$$
\begin{align}
\EE{X} &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\cdot f(x,y) \;\dd{y} \;\dd{x} \\
&= \int_{0}^{\infty} e^{-y} \int_{0}^{\infty} \frac{x} {y} \exp\CB{-\frac{x} {y}} \;\dd{x} \;\dd{y}
\end{align}$$

>Note that $\d{\int_{0}^{\infty} \frac{x} {y} \exp\CB{-\frac{x} {y}} \;\dd{x}}$ is the expected value of an exponential $r.v.$ with parameter $\ffrac{1}{y}$ and thus is equal to $y$. Consequently, $\EE{X} = \d{\int_{0}^{\infty} y e^{-y} \;\dd{y} = 1}$.

>***
>Then for $\EE{Y}$, we use another method. We first calculate the marginal density $f_Y(y)$.
>
>$f_Y(y) = e^{-y} \d{\int_{0}^{\infty} \ffrac{1} {y} \exp\CB{-\ffrac{x} {y}}\;\dd{x}} = e^{-y}$, then $\EE{Y} = 1$.
>
>***
>$$
\begin{align}
\EE{XY} &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy \cdot f(x,y) \;\dd{y} \;\dd{x} \\
&= \int_{0}^{\infty} y e^{-y} \int_{0}^{\infty} \frac{x} {y} \exp\CB{-\frac{x} {y}} \;\dd{x} \;\dd{y} \\
&= \int_{0}^{\infty} y^2 e^{-y} \;\dd{y} \\
&= \int_{0}^{\infty} -y^2 \;\dd{e^{-y}} \\
&= \left.-y^2 e^{-y}\right|_{0}^{\infty} + \int_{0}^{\infty} 2ye^{-y} \;\dd{y} = 2\EE{Y} = 2
\end{align}$$

>Consequently, $\Cov{X,Y} = \EE{XY} - \EE{X}\EE{Y} = 1$
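A Monte Carlo check of this covariance is sketched below. It relies on reading the joint density as $f(x,y) = e^{-y} \cdot \frac{1}{y} e^{-x/y}$, so that $Y$ is exponential with parameter $1$ and, given $Y = y$, $X$ is exponential with mean $y$. The sample size and seed are arbitrary, and `simulate_cov` is a hypothetical name.

```python
import random

def simulate_cov(samples=200_000, seed=1):
    """Estimate Cov(X, Y) = E[XY] - E[X]E[Y] by simulation."""
    rng = random.Random(seed)
    sx = sy = sxy = 0.0
    for _ in range(samples):
        y = rng.expovariate(1.0)      # marginal of Y: density e^{-y}
        x = y * rng.expovariate(1.0)  # given Y = y, X is exponential with mean y
        sx += x
        sy += y
        sxy += x * y
    ex, ey, exy = sx / samples, sy / samples, sxy / samples
    return exy - ex * ey

print(simulate_cov())  # should land near the value derived above
```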

$Remark$

>A covariance of $0$ does not imply that the two $r.v.$s are independent, although the converse is true: independence implies zero covariance.

$Other \space properties$

For any $r.v.$s $X$, $Y$, $Z$ and constant $c$, we have

- $\Cov{X,X} = \Var{X}\\[0.5em]$
- $\Cov{X,Y} = \Cov{Y,X}\\[0.5em]$
- $\Cov{cX,Y} = c \cdot\Cov{X, Y}\\[0.5em]$
- $\Cov{X,Y+Z} = \Cov{X,Y} + \Cov{X,Z}$

And the generalized fourth property: $\d{\Cov{\sum_{i=1}^{n}X_i,\sum_{j=1}^{m}Y_j}=\sum_{i=1}^{n}\sum_{j=1}^{m} \Cov{X_i,Y_j}}$.

And one more application for variance

$$
\begin{align}
\Var{\sum_{i=1}^{n} X_i} &= \Cov{\sum_{i=1}^{n}X_i,\sum_{j=1}^{n}X_j} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \Cov{X_i, X_j} \\
&= \sum_{i=1}^{n}\Cov{X_i, X_i} + \sum_{i=1}^{n} \sum_{j \neq i} \Cov{X_i, X_j} \\
&= \sum_{i=1}^{n}\Var{X_i} + 2 \sum_{i=1}^{n} \sum_{j < i} \Cov{X_i, X_j}
\end{align}$$

In particular, when the $X_i$ are independent, this reduces to $\d{\Var{\sum_{i=1}^{n}X_i} = \sum_{i=1}^{n} \Var{X_i}}$

$Def$

If $X_1, X_2, \dots, X_n$ are **independent** and **identically distributed**, we define the ***sample mean*** as

$$\bar{X} = \frac{1} {n}\sum_{i=1}^{n} {X_i}$$

$Proposition$

Suppose that $X_1, \dots, X_n$ are independent and identically distributed with expected value $\mu$ and variance $\sigma^2$. Then:

- $\EE{\bar{X}} = \mu$
- $\Var{\bar{X}} = \ffrac{\sigma^2}{n}\\[0.5em]$
- $\Cov{\bar{X}, X_i - \bar{X}} = 0$, $i = 1,2,\dots,n\\[0.5em]$

**e.g.**  
Variance of a **Binomial** $r.v.$

>We first break it up: $X = X_1 +\cdots+X_n$, where the $X_i$ are $n$ independent *Bernoulli* $r.v.$s. Then we have

>$$\Var{X} = \sum \Var{X_i}$$

>Since $\Var{X_i} = \EE{X_i^2} - \left(\EE{X_i}\right)^2 = p - p^2$, $\Var{X} = np(1-p)$

***

**e.g.** ***The Hypergeometric***  

Consider a population of $N$ individuals, a fraction $p$ of whom are in favor of a certain proposition while the rest are opposed, where $p$ is assumed to be *unknown* and is to be *estimated*. We randomly choose $n$ members of the population, without replacement, and determine their positions.

>We use the proportion of those in favor in the sample as an estimator of $p$. First we let
>
>$$X_i = \begin{cases}
1, &\text{if the }i\texttt{th}\text{ person chosen is in favor} \\[0.5em]
0, &\text{otherwise}
\end{cases}$$
>
>Then the estimator of $p$ is $\ffrac{1} {n}\sum_{i=1}^{n} X_i$. We now compute its mean and variance for a little comparison
>
>$\d{\EE{\ffrac{1} {n}\sum_{i=1}^{n} X_i} = \ffrac{1} {n}\sum_{i=1}^{n} \EE{X_i} } = p$
>
>$\d{\Var{\ffrac{1} {n}\sum_{i=1}^{n} X_i} = \ffrac{1} {n^2} \left(\sum_{i=1}^{n} \Var{X_i} + 2 \underset{i<j}{\sum\sum} \Cov{X_i, X_j}\right)}$
>
>Easy to see that $X_i$ is a **Bernoulli** $r.v.$ so that $\Var{X_i} = p(1-p)$, so now we get down to handling the covariance.
>
>$\begin{align}
\Cov{X_i,X_j} &= \EE{X_i \cdot X_j} - \EE{X_i} \cdot \EE{X_j} \\[0.5em]
&= P\CB{X_i = 1, X_j = 1} - p^2 \\[0.5em]
&= \ffrac{Np} {N} \cdot \ffrac{Np-1} {N-1} - p^2\\[0.5em]
\end{align}$
>
>$\begin{align}
\Var{\ffrac{1} {n}\sum_{i=1}^{n} X_i} &= \ffrac{1} {n^2} \left[ np(1-p) + 2\binom{n}{2} \left( \ffrac{Np} {N} \cdot \ffrac{Np-1} {N-1} - p^2 \right) \right] \\
&= \ffrac{p(1-p)} {n} - \ffrac{(n-1)p(1-p)} {n(N-1)} = \ffrac{p(1-p)(N-n)} {n(N-1)}
\end{align}$

$Remark$

As $N$ increases, the variance grows, and its limiting value as $N \to \infty$ is $p(1-p)/n$. This is not surprising, since for $N$ large enough the $X_i$ are approximately *independent* **Bernoulli** $r.v.$s, and thus $\sum X_i$ is approximately **binomial** with parameters $n$ and $p$.

$Remark$

The ***Hypergeometric*** $r.v.$ itself arises as follows: among $N$ individuals in total, a fraction $p$ have a certain feature and the rest do not. We select $n$ individuals from the $N$, without replacement, and let $X$ denote the number of selected individuals with that feature:

$$\d{P\CB{X=k}} = \ffrac{\d{\binom{Np} {k}\binom{N-Np} {n-k}}} {\d{\binom{N} {n}}}$$

An easy example is an urn containing $Np$ red balls and $N-Np$ blue balls. We take $n$ balls out, and this is the *distribution* of the number of red balls drawn.
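The variance formula $p(1-p)(N-n)/\big(n(N-1)\big)$ for the sample proportion can be verified exactly from the hypergeometric pmf. The sketch below uses exact rational arithmetic; `K` stands for $Np$ (the number of individuals with the feature), and the function names and test values are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P{X = k}: k marked items in a sample of n drawn from N items, K of them marked."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def sample_proportion_moments(N, K, n):
    """Exact mean and variance of X / n under the hypergeometric pmf."""
    ks = range(max(0, n - (N - K)), min(n, K) + 1)  # values of k with positive probability
    mean = sum(Fraction(k, n) * hypergeom_pmf(k, N, K, n) for k in ks)
    second = sum(Fraction(k, n) ** 2 * hypergeom_pmf(k, N, K, n) for k in ks)
    return mean, second - mean**2

N, K, n = 50, 20, 10  # hypothetical population, so p = K / N = 2/5
p = Fraction(K, N)
mean, var = sample_proportion_moments(N, K, n)
print(mean == p, var == p * (1 - p) * (N - n) / (n * (N - 1)))
```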
***

Another useful notion is the ***convolution*** of the distributions $F_X$ and $F_Y$: the distribution of $X+Y$ obtained from the distributions of $X$ and $Y$, given that they are **independent**. Writing $f$ and $g$ for the pdfs of $X$ and $Y$, respectively,

$$
\begin{align}
F_{X+Y}(a) &= P \CB{X+Y \leq a} \\
&= \iint_{x+y \leq a} f(x) g(y) \;\dd{x} \;\dd{y} \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{a-y} f(x) g(y) \;\dd{x} \;\dd{y}\\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{a-y} f(x) \;\dd{x} \right) g(y) \;\dd{y} \\
&= \int_{-\infty}^{\infty} F_X (a-y) g(y) \;\dd{y}
\end{align}$$ 

Differentiating both sides of the equation above gives the pdf:

$$
\begin{align}
\ffrac{\dd{}} {\dd{a}} F_{X+Y}(a) &= \ffrac{\dd{}} {\dd{a}} \int_{-\infty}^{\infty} F_X (a-y) g(y) \;\dd{y} \\[0.7em]
f_{X+Y}(a) &= \int_{-\infty}^{\infty} \ffrac{\dd{}} {\dd{a}} \big(F_X (a-y)\big)g(y) \;\dd{y} \\
&= \int_{-\infty}^{\infty} f(a-y) g(y) \;\dd{y}
\end{align}$$ 

**(V)e.g.** **Sum** of Two Independent **Uniform** $r.v.$  

Given $X$ and $Y$ are independent $r.v.$ both uniformly distributed on $(0,1)$, find the pdf of $X+Y$.

>First we have $f(z) = g(z) = \begin{cases}
1, & 0 < z < 1 \\[0.5em]
0, & \text{otherwise}
\end{cases}$, and with the previous formula we have: 
>
>$$f_{X+Y} (z) = \int_{-\infty}^{\infty} f(z-y)g(y) \;\dd{y} = \int_0^1f(z-y)\;\dd{y}$$
>
>Then for $0 \leq z  \leq 1$, this yields $\d{f_{X+Y}(z) = \int_{0}^{z} \;\dd{y} = z}$. For $1 < z < 2$, we get $\d{f_{X+Y} (z) = \int_{z-1}^{1}\;\dd{y} = 2-z}$. Hence we draw the conclusion as
>
>$$
f_{X+Y}(z) = \begin{cases}
z, & 0 \leq z \leq 1 \\[0.5em]
2-z, & 1 < z < 2 \\[0.5em]
0, & \text{otherwise}
\end{cases}$$
>
>Just for fun, applying the same formula once more gives the density of the sum of three independent uniform $(0,1)$ $r.v.$s (call them $X_1, X_2, X_3$):

>$$
f_{X_1+X_2+X_3}(w) = \begin{cases}
\frac{1} {2}w^2, & 0 \leq w \leq 1 \\[0.5em]
-w^2 + 3w - \frac{3} {2}, & 1 < w \leq 2 \\[0.5em]
\frac{1} {2}w^2 - 3w + \frac{9} {2} = \frac{1} {2}(w-3)^2, & 2 < w < 3 \\[0.5em]
0, & \text{otherwise}
\end{cases}$$
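Both piecewise densities can be checked by evaluating the convolution integral numerically. The sketch below uses a plain midpoint rule; `convolve_at`, `uniform_pdf` and `triangle_pdf` are hypothetical names, and the step count is an arbitrary choice.

```python
def uniform_pdf(x):
    """pdf of a uniform (0, 1) r.v."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def triangle_pdf(z):
    """pdf of the sum of two independent uniform (0, 1) r.v.s, as derived above."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z < 2.0:
        return 2.0 - z
    return 0.0

def convolve_at(z, f, g, lo, hi, steps=10_000):
    """Midpoint-rule estimate of the integral of f(z - y) g(y) over y in [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        y = lo + (j + 0.5) * h
        total += f(z - y) * g(y)
    return total * h

# two uniforms: compare with the triangular density
print(convolve_at(0.5, uniform_pdf, uniform_pdf, 0.0, 1.0), triangle_pdf(0.5))
# three uniforms: convolve one more uniform with the triangular density
print(convolve_at(1.5, uniform_pdf, triangle_pdf, 0.0, 2.0))
```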

**e.g.** **Sum** of Independent **Poisson** $r.v.$  

Let $X$ and $Y$ be independent **Poisson** $r.v.$s with respective means $\lambda_1$ and $\lambda_2$. Find the distribution of $X+Y$. 

>$$\begin{align}
P\CB{X + Y = n} &= \sum_{k=0}^{n} P \CB{X=k, Y = n-k} \\
&= \sum_{k=0}^{n} P\CB{X = k} \cdot P\CB{Y = n-k} \\
&= \sum_{k=0}^{n} e^{-\lambda_1} \ffrac{\lambda_1^k} {k!} \cdot e^{-\lambda_2} \ffrac{\lambda_2^{n-k}} {(n-k)!}\\
&= \ffrac{e^{-\lambda_1 - \lambda_2}} {n!} \sum_{k=0}^{n} \ffrac{n!} {k!(n-k)!} \lambda_1^k \lambda_2^{n-k} \\
&= \ffrac{e^{-\lambda_1 - \lambda_2}} {n!} (\lambda_1 + \lambda_2)^n
\end{align}$$

>In words, $X+Y$ has a **Poisson** distribution with mean $\lambda_1 + \lambda_2$.
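The same conclusion can be confirmed numerically by carrying out the discrete convolution term by term. This is a small sketch with assumed means $1.5$ and $2.0$; the function names are illustrative.

```python
from math import exp, factorial, isclose

def poisson_pmf(i, lam):
    """Poisson pmf: e^(-lam) lam^i / i!."""
    return exp(-lam) * lam**i / factorial(i)

def sum_pmf(n, lam1, lam2):
    """P{X + Y = n} as the sum over k of P{X = k} P{Y = n - k}."""
    return sum(poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) for k in range(n + 1))

lam1, lam2 = 1.5, 2.0  # illustrative means
agrees = all(
    isclose(sum_pmf(n, lam1, lam2), poisson_pmf(n, lam1 + lam2), rel_tol=1e-9)
    for n in range(15)
)
print(agrees)
```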

$Remark$

More generally, $X_1, X_2, \dots, X_n$ are said to be independent if, for all values $a_1, a_2, \dots, a_n$, we have

$$P\CB{X_1 \leq a_1, X_2 \leq a_2, \dots, X_n \leq a_n} = P\CB{X_1 \leq a_1} \cdot P\CB{X_2 \leq a_2} \cdot  \cdots \cdot P\CB{X_n \leq a_n} $$

The ***Order Statistics*** 

Let $X_1, \dots, X_n$ be $i.i.d.$ continuous $r.v.$s with cdf $F$ and pdf $f = F'$. Define $X_{(i)}$ as the $i\texttt{th}$ smallest of these $r.v.$s; then $X_{(1)}, \dots, X_{(n)}$ are called the ***Order Statistics***. To find their distributions, note that $X_{(i)} \leq x$ if and only if at least $i$ of the $n$ $r.v.$s are less than or equal to $x$, so

 $$P\CB{X_{(i)} \leq x} = \sum_{k=i}^{n} \binom{n} {k} \big(F(x)\big)^k \big( 1-F(x) \big)^{n-k}$$

Differentiation yields that the density function of $X_{(i)}$ is as follows:

$$
\begin{align}
f_{X_{(i)}} (x) &= f(x) \left( \sum_{k=i}^{n}\binom{n} {k} \cdot k \big(F(x)\big)^{k-1} \big( 1-F(x) \big)^{n-k} - \sum_{k=i}^{n}\binom{n} {k} \big(F(x)\big)^{k} \cdot (n-k) \big( 1-F(x) \big)^{n-k-1} \right) \\
&= f(x) \left( \sum_{k=i}^{n} \ffrac{n!} {(n-k)!(k-1)!} \big(F(x)\big)^{k-1} \big( 1-F(x) \big)^{n-k} - \sum_{k=i}^{n}\ffrac{n!} {(n-k-1)!k!} \big(F(x)\big)^{k} \big( 1-F(x) \big)^{n-k-1} \right) \\
&= f(x) \left( \sum_{k=i}^{n} \ffrac{n!} {(n-k)!(k-1)!} \big(F(x)\big)^{k-1} \big( 1-F(x) \big)^{n-k} - \sum_{j=i+1}^{n}\ffrac{n!} {(n-j)!(j-1)!} \big(F(x)\big)^{j-1} \big( 1-F(x) \big)^{n-j} \right) \\
&= \ffrac{n!} {(n-i)!(i-1)!} f(x) \big(F(x)\big)^{i-1}\big(1-F(x)\big)^{n-i}
\end{align}$$
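The cdf formula for $X_{(i)}$ can be compared with a simulation. The sketch below specializes to uniform $(0,1)$ samples, so that $F(x) = x$ on $[0,1]$; that choice, the function names, and the trial count are assumptions made for illustration.

```python
import random
from math import comb

def order_stat_cdf(x, i, n):
    """P{X_(i) <= x} for n i.i.d. uniform (0, 1) r.v.s, where F(x) = x on [0, 1]."""
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(i, n + 1))

def order_stat_cdf_simulated(x, i, n, trials=20_000, seed=2):
    """Fraction of simulated samples whose i-th smallest value is <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        hits += sample[i - 1] <= x  # i-th smallest is at index i - 1
    return hits / trials

print(order_stat_cdf(0.4, 2, 5), order_stat_cdf_simulated(0.4, 2, 5))
```

For $i = n$ the formula reduces to $F(x)^n$, the cdf of the maximum, which is a quick way to sanity-check an implementation.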

***

## Joint Probability Distribution of Functions of $r.v.$

Let $X_1$ and $X_2$ be jointly continuous $r.v.$s with joint pdf $f(x_1,x_2)$. We need to obtain the joint distribution of two new $r.v.$s $Y_1$ and $Y_2$ that arise as functions of $X_1$ and $X_2$, with $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$.

$Assumptions$

1. The equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ can be *uniquely* solved for $x_1$ and $x_2$ in terms of $y_1$ and $y_2$ with solutions given by $x_1 = h_1(y_1,y_2)$ and $x_2 = h_2(y_1,y_2)$.$\\[0.7em]$
2. The functions $g_1$ and $g_2$ have *continuous partial derivatives* at *all points* $(x_1, x_2)$ and are such that the following determinant$\\[0.6em]$
$$J(x_1, x_2) = \begin{vmatrix}
\ffrac{\partial g_1}{\partial x_1} & \ffrac{\partial g_1}{\partial x_2} \\ 
\ffrac{\partial g_2}{\partial x_1} & \ffrac{\partial g_2}{\partial x_2}
\end{vmatrix} \equiv \ffrac{\partial g_1}{\partial x_1} \cdot \ffrac{\partial g_2}{\partial x_2} \; \: - \; \: \ffrac{\partial g_1}{\partial x_2} \cdot \ffrac{\partial g_2}{\partial x_1} \neq 0\\[0.5em]$$
at all points $(x_1, x_2)$.

Under these, $Y_1$ and $Y_2$ are jointly continuous with their joint density function given by

$$f_{Y_1,Y_2}(y_1,y_2) = g(y_1,y_2) = f_{X_1,X_2}(x_1, x_2) \big| J(x_1, x_2) \big|^{-1}$$

where $x_1 = h_1(y_1,y_2)$ and $x_2 = h_2(y_1,y_2)$. This formula can be obtained by differentiating both sides of the following equation with respect to $y_1$ and $y_2$.

$$P\CB{Y_1 \leq y_1,Y_2 \leq y_2} = \iint\limits_{\d{\begin{array}{c}
(x_1,x_2): \\
g_1(x_1,x_2) \leq y_1\\
g_2(x_1,x_2) \leq y_2
\end{array}}} f_{X_1,X_2}(x_1, x_2) \;\dd{x_1}\;\dd{x_2}$$

**e.g.**  

Let $X$ and $Y$ be independent **gamma** $r.v.$s with parameters $(\alpha, \lambda)$ and $(\beta, \lambda)$, respectively. Find the joint density of $U = X + Y$ and $V = X/(X+Y)$.

> By independence, their joint density function is
>
> $$\begin{align}
f_{X,Y}(x,y) &= f_X(x) \cdot f_Y(y) \\
&= \ffrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}} {\Gamma(\alpha)} \cdot \ffrac{\lambda e^{-\lambda y} (\lambda y)^{\beta - 1}} {\Gamma(\beta)} \\
&= \ffrac{\lambda^{\alpha + \beta} } {\Gamma(\alpha)\Gamma(\beta)} e^{-\lambda(x+y)} x^{\alpha-1} y^{\beta -1}
\end{align}$$
>
>Given $g_1(x,y) = x+y, g_2(x,y) = x/(x+y)$, we have $\ffrac{\partial g_1} {\partial x} = \ffrac{\partial g_1} {\partial y} = 1$, $\ffrac{\partial g_2}{\partial x} = \ffrac{y} {(x+y)^2}$, and $\ffrac{\partial g_2} {\partial y}=- \ffrac{x} {(x+y)^2}$, and the solutions are $x = u\upsilon$ and $y=u(1-\upsilon)$, so that:
>
>$$J(x,y) = \begin{vmatrix}
1 & 1\\[0.6em]
\ffrac{y}{\left( x+y \right )^2} & \ffrac{-x}{\left( x+y \right )^2}
\end{vmatrix} = - \ffrac{1} {x+y}$$
>
>$$
\begin{align}
f_{U,V}(u,\upsilon) &= f_{X,Y}(x,y) \cdot (x+y) \\[0.6em]
&= f_{X,Y}(u\upsilon, u(1-\upsilon)) \cdot u \\[0.6em]
&= \ffrac{\lambda^{\alpha + \beta} } {\Gamma(\alpha)\Gamma(\beta)} e^{-\lambda(u\upsilon+u(1-\upsilon))} (u\upsilon)^{\alpha-1} (u(1-\upsilon))^{\beta -1} \cdot u \\[0.6em]
&= \ffrac{\lambda^{\alpha + \beta-1}\cdot \lambda} {\Gamma(\alpha)\Gamma(\beta)} \cdot e^{-\lambda u} \cdot u^{1 + \alpha-1 + \beta -1} \cdot \ffrac{\Gamma(\alpha + \beta)} {\Gamma(\alpha + \beta)} \cdot \upsilon^{\alpha-1} \cdot (1-\upsilon)^{\beta -1} \\[0.6em]
&= \ffrac{\lambda e^{-\lambda u} (\lambda u)^{\alpha + \beta -1}} {\Gamma(\alpha + \beta)} \cdot \ffrac{ \upsilon^{\alpha-1} (1-\upsilon)^{\beta -1} \Gamma(\alpha + \beta)} {\Gamma(\alpha)\Gamma(\beta)}
\end{align}$$

$Remark$

Later we will know that $X+Y$ is also a **gamma** $r.v.$ with parameter $(\alpha + \beta , \lambda)$, thus with a pdf: $\d{f_{U}(u)  = \ffrac{\lambda e^{-\lambda u} (\lambda u)^{\alpha + \beta -1}} {\Gamma(\alpha + \beta)}}$. 

Moreover, the joint density above factors into a function of $u$ alone times a function of $\upsilon$ alone, so $X+Y$ and $X/(X+Y)$ are independent, and the second factor is the density of $V$: $\d{f_V{(\upsilon)} = \ffrac{\upsilon^ {\alpha-1} (1-\upsilon)^{\beta -1} \Gamma(\alpha + \beta)} {\Gamma(\alpha)\Gamma(\beta)}}$, which is called the ***beta density*** with parameters $(\alpha, \beta)$, for $0<\upsilon<1$.

$\QQQ$ The last paragraph.
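
The factorization in this example can be sanity-checked by simulation. The sketch below is only an illustration: the function names and the parameter values used in the check are assumptions, not part of the notes. It draws independent gamma $r.v.$s, forms $U$ and $V$, and lets us compare sample statistics with the gamma and beta means. A near-zero sample correlation is consistent with independence but does not prove it.

```python
import random
from statistics import fmean


def simulate_uv(alpha, beta, lam, n_samples, seed=0):
    """Draw (U, V) = (X + Y, X / (X + Y)) for independent gamma X and Y."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n_samples):
        # random.gammavariate takes (shape, scale), and scale = 1 / rate.
        x = rng.gammavariate(alpha, 1.0 / lam)
        y = rng.gammavariate(beta, 1.0 / lam)
        us.append(x + y)
        vs.append(x / (x + y))
    return us, vs


def correlation(xs, ys):
    """Sample correlation coefficient of two equally long lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = fmean((x - mx) ** 2 for x in xs)
    vy = fmean((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

With $\alpha = 2$, $\beta = 3$, $\lambda = 1.5$ one would expect the sample mean of $U$ to be near $(\alpha+\beta)/\lambda$ and that of $V$ near $\alpha/(\alpha+\beta)$.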

***

The same method can be applied to more than $2$ $r.v.$s. Suppose the joint density function of the $n$ variables $X_1, X_2, \dots, X_n$ is given and we want to compute the joint density function of $Y_1, Y_2, \dots, Y_n$, where

$$Y_1 = g_1(X_1, X_2, \dots, X_n), Y_2 = g_2(X_1, X_2, \dots, X_n), \dots, Y_n = g_n(X_1, X_2, \dots, X_n)$$

The same assumptions are required: the functions $g_i$ have continuous partial derivatives, and the Jacobian determinant $J(x_1,x_2,\dots, x_n) \neq 0$ at all points $(x_1,x_2,\dots, x_n)$, where

$$J(x_1,x_2,\dots, x_n) = \begin{vmatrix}
\ffrac{\partial g_1} {\partial x_1} & \ffrac{\partial g_1} {\partial x_2} & \cdots & \ffrac{\partial g_1} {\partial x_n} \\ 
\ffrac{\partial g_2} {\partial x_1} & \ffrac{\partial g_2} {\partial x_2} & \cdots & \ffrac{\partial g_2} {\partial x_n} \\ 
\vdots & \vdots & \ddots & \vdots \\ 
\ffrac{\partial g_n} {\partial x_1} & \ffrac{\partial g_n} {\partial x_2} & \cdots & \ffrac{\partial g_n} {\partial x_n} 
\end{vmatrix}$$

and the system of equations $y_1 = g_1 (x_1,x_2,\dots,x_n), y_2 = g_2 (x_1,x_2,\dots,x_n), \dots, y_n = g_n (x_1,x_2,\dots,x_n)$ has a unique solution, $x_i = h_i(y_1,y_2,\dots,y_n)$. Under these assumptions, the joint density function of the $r.v.$s $Y_i$ is given by

$$f_{Y_1,Y_2,\dots, Y_n}(y_1,y_2,\dots,y_n) = f_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots, x_n)\big|J(x_1,x_2,\dots, x_n)\big|^{-1}$$

where $x_i = h_i(y_1,y_2,\dots,y_n)$.

***

# Moment Generating Functions

The ***moment generating function*** $\phi(t)$ of the $r.v.$ $X$ is defined for all values $t$ by

$$\phi(t) = \EE{e^{tX}} = \begin{cases}
\d{\sum_x e^{tx} \cdot p(x)}, & \text{if } X \text{ is discrete} \\[0.5em]
\d{\int_{-\infty}^{\infty} e^{tx} \cdot f(x) \;\dd{x}}, & \text{if } X \text{ is continuous}
\end{cases}$$

We can use this function to obtain all the moments of $X$ by successively differentiating $\phi(t)$.

$$\begin{align}
\phi'(t) &= \ffrac{\dd{}} {\dd{t}} \EE{e^{tX}} \\
&= \EE{\ffrac{\dd{}} {\dd{t}} e^{tX}} \\
&= \EE{Xe^{tX}} \\[0.8em]
\Longrightarrow \phi'(0) &= \EE{X}
\end{align}$$

Similarly, $\phi''(t) = \EE{X^2 e^{tX}} \; \Longrightarrow \; \phi''(0) = \EE{X^2}$. So in general, the $n\texttt{th}$ derivative of $\phi(t)$ evaluated at $t=0$ equals $\EE{X^n}$, for $n \geq 1$.

**e.g.** The **Binomial** Distribution

>$$\begin{align}
\phi(t) &= \EE{e^{tX}} \\
&= \sum_{k=0}^{n} e^{tk} \cdot \left( \binom{n} {k} p^k (1-p)^{n-k} \right)\\
&= \sum_{k=0}^{n} \binom{n} {k} (pe^t)^k (1-p)^{n-k} \\[0.5em]
&= (pe^t + 1 - p)^n
\end{align}$$

>Hence, $\EE{X} = \phi'(0) = n(pe^t + 1 - p)^{n-1} \cdot pe^t \big.\big|_{t=0} = np$ and $\EE{X^2} = \cdots = n(n-1) p^2 + np$. Thus we can also obtain the variance: $\Var{X} = \EE{X^2} - (\EE{X})^2 = \cdots = np(1-p)$.
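
The moment formulas above can be checked numerically by differentiating the mgf at $t=0$ with finite differences. This is a rough sketch; the helper names and the step size `h` are assumptions chosen for illustration.

```python
import math


def binomial_mgf(t, n, p):
    """Closed-form mgf of a binomial(n, p) r.v.: (p e^t + 1 - p)^n."""
    return (p * math.exp(t) + 1 - p) ** n


def first_two_moments(mgf, h=1e-4):
    """Approximate phi'(0) and phi''(0) with central differences."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2
```

For example, `first_two_moments(lambda t: binomial_mgf(t, 10, 0.3))` should land close to $np = 3$ and $n(n-1)p^2 + np = 11.1$, so that the difference $\EE{X^2} - (\EE{X})^2$ is close to $np(1-p) = 2.1$.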

**e.g.** The **Poisson** Distribution

>$$\begin{align}
\phi(t) &= \EE{e^{tX}} \\
&= \sum_{n=0}^{\infty} e^{tn} \cdot \ffrac{e^{-\lambda} \lambda^n} {n!}\\
&= e^{-\lambda} \sum_{n=0}^{\infty} \ffrac{\left( \lambda e^t \right)^n} {n!} \\
&= \QQQ e^{-\lambda} \cdot e^{\lambda e^t} = \exp\CB{\lambda \left(e^t - 1 \right)}
\end{align}$$

>Differentiation yields: $\phi'(t) = \lambda e^t \exp\CB{\lambda \left(e^t - 1 \right)}$ and $\phi''(t) = \left( \lambda e^t \right)^2 \exp\CB{\lambda \left(e^t - 1 \right)} + \lambda e^t \exp\CB{\lambda \left(e^t - 1 \right)}$

> and so $\EE{X} = \lambda$, $\EE{X^2} = \lambda^2 + \lambda$. And $\Var{X} = \lambda$

**e.g.** The **Exponential** Distribution

>$$\begin{align}
\phi(t) &= \EE{e^{tX}} \\
&= \int_{0}^{\infty} e^{tx} \cdot \lambda e^{-\lambda x} \;\dd{x} \\
&= \lambda \int_{0}^{\infty} e^{-\left(\lambda - t\right)x} \;\dd{x} \\
&\stackrel{t < \lambda} {=} \ffrac{\lambda} {\lambda - t}
\end{align}$$

>Differentiation of $\phi(t)$ yields $\phi'(t) = \ffrac{\lambda} {\left(\lambda - t\right)^2}, \phi''(t) = \ffrac{2\lambda} {\left(\lambda - t\right)^3}$. Thus, $\EE{X} = \phi'(0) = \ffrac{1} {\lambda}$, $\EE{X^2} = \phi''(0) = \ffrac{2} {\lambda^2}$ and the variance of $X$ is given by $\Var{X} = \EE{X^2} - \left(\EE{X}\right) ^2 = \ffrac{1} {\lambda^2}$

$Remark$

The integral converges only when $t < \lambda$, so this mgf is defined only for those values of $t$.

**e.g.** The **Normal** Distribution

>$$\begin{align}
\EE{e^{tZ}} &= \int_{-\infty}^{\infty} e^{tz} \cdot \ffrac{1} {\sqrt{2\pi}} \exp\CB{-\ffrac{z^2} {2}} \;\dd{z} \\
&= \ffrac{1} {\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\CB{-\ffrac{z^2 - 2tz} {2}} \;\dd{z} \\
&= \ffrac{e^{t^2/2}} {\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\CB{-\ffrac{(z-t)^2} {2}} \;\dd{z} = \exp\CB{ \ffrac{t^2}{2}}
\end{align}$$

$\space$

>Here $Z$ is a ***standard normal*** $r.v.$, so for any normal $r.v.$ $X = \sigma Z + \mu$ with parameters $\mu$ and $\sigma^2$, we have
>
>$$\phi(t) = \EE{e^{tX}} = \EE{e^{t\left(\sigma Z + \mu\right)}} = e^{t\mu} \EE{e^{t\sigma Z}} = \exp\CB{ \ffrac{\sigma^2 t^2} {2} + \mu t}$$
>
>And by differentiating we obtain $\phi'(t) = \left(\mu + t \sigma^2\right) \exp\CB{\ffrac{\sigma^2 t^2} {2} + \mu t}$, so $\EE{X} = \phi'(0) = \mu$, and $\phi''(t) = \left(\mu + t \sigma^2\right)^2 \exp\CB{\ffrac{\sigma^2 t^2} {2} + \mu t} + \sigma^2 \exp\CB{\ffrac{\sigma^2 t^2} {2} + \mu t}$, so $\EE{X^2} = \phi''(0) = \mu^2 + \sigma^2$, implying that $\Var{X} = \sigma^2$.

***

An important property of the **moment generating function** is that the mgf of a sum of *independent* $r.v.$s is just the product of the individual mgfs. Suppose $X$ and $Y$ are independent and have mgfs $\phi_X(t)$ and $\phi_Y(t)$, respectively. Then

$$\begin{align}
\phi_{X+Y}(t) &= \EE{e^{t\left(X+Y\right)}} \\
&= \EE{e^{tX} \cdot e^{tY}} \\
&\stackrel{\texttt{independence}} {=} \EE{e^{tX}}\cdot \EE{e^{tY}} = \phi_X(t)\phi_Y(t)
\end{align}$$

Another important property is that the mgf *uniquely* determines the distribution. It's a one-to-one correspondence.
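
Both properties can be illustrated with exact pmfs. In the sketch below (helper names are assumptions for illustration), two independent binomial $r.v.$s with the same $p$ are convolved; the mgf of the sum matches the product of the mgfs, and since that product is $(pe^t + 1 - p)^{n_1+n_2}$, uniqueness suggests the sum should itself be binomial with parameters $(n_1+n_2, p)$.

```python
import math


def binomial_pmf(n, p):
    """List of P{X = k} for k = 0..n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(pmf_x, pmf_y):
    """pmf of X + Y for independent X, Y taking values 0, 1, 2, ..."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py
    return out


def mgf(pmf, t):
    """E[e^{tX}] computed directly from a pmf on 0, 1, 2, ..."""
    return sum(math.exp(t * k) * pk for k, pk in enumerate(pmf))
```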

$Remark$

More about the **Poisson** Distribution: the ***Poisson paradigm*** states that the number of successes in $n$ trials that are either independent or at most weakly dependent is, when the trial success probabilities are all small, approximately a **Poisson** $r.v.$.

$Remark$

The ***Laplace transform*** of a nonnegative $r.v.$ $X$ is defined for $t \geq 0$ by $g(t) = \phi(-t) = \EE{e^{-tX}}$. Since $0 < e^{-tX} \leq 1$ when $X \geq 0$ and $t \geq 0$, its value always lies between $0$ and $1$.

We can also define the ***joint moment generating function*** of more than just two $r.v.$s. For any $n$ $r.v.$s $X_1, X_2, \dots, X_n$, and for all real values of $t_1, t_2, \dots, t_n$ we define:

$$\phi(t_1, t_2, \dots, t_n) = \EE{\exp\CB{t_1X_1 + t_2X_2 + \cdots + t_nX_n}}$$

and it can be shown that $\phi(t_1, t_2, \dots, t_n)$ uniquely determines the joint distribution of $X_1, X_2, \dots, X_n$.

**e.g.** The ***Multivariate Normal Distribution***

Let $Z_1,\dots,Z_n$ be a set of $n$ independent standard normal random variables. If, for some constants $a_{ij}$ and $\mu_i$, $1 \leq i \leq m$, $1 \leq j \leq n$,

$$
\begin{array}{rcl}
X_1\!\!\!\! &=&\!\!\!\!a_{11}Z_1 + \cdots + a_{1n}Z_n + \mu_1 \\
X_2 \!\!\!\!&=&\!\!\!\!a_{21}Z_1 + \cdots + a_{2n}Z_n + \mu_2 \\
& \vdots & \\
X_i \!\!\!\!&=&\!\!\!\!a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i \\
& \vdots & \\
X_m \!\!\!\!&=&\!\!\!\!a_{m1}Z_1 + \cdots + a_{mn}Z_n + \mu_m
\end{array}$$

then the $r.v.$s $X_1, X_2, \dots, X_m$ are said to have a **Multivariate Normal Distribution**.

> Easy to see that $\EE{X_i} = \mu_i$ and $\Var{X_i} = \sum\limits_{j=1}^{n}a_{ij}^2$. Then $\EE{\sum\limits_{i=1} ^{m} t_iX_i} = \sum\limits_{i=1}^{m} t_i\mu_i$ and

>$$\Var{\sum\limits_{i=1} ^{m} t_iX_i} = \Cov{\sum\limits_{i=1} ^{m} t_iX_i,\sum\limits_{j=1} ^{m} t_jX_j} = \sum\limits_{i=1}^{m}\sum\limits_{j=1}^{m} t_it_j\Cov{X_i,X_j}$$

>$$\phi\left(t_1,\dots,t_m\right) = \exp\CB{\sum\limits_{i=1}^{m} t_i\mu_i + \ffrac{1} {2} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{m} t_it_j\Cov{X_i,X_j}}$$

***

##  The Joint Distribution of the Sample Mean and Sample Variance from a Normal Population

Let $X_1,\dots,X_n$ be independent and identically distributed $r.v.$s, each with mean $\mu$ and variance $\sigma^2$. We now define the ***sample mean*** $\bar{X} =\ffrac{1} {n}\sum\limits_{i=1}^{n} X_i$ and ***sample variance***:

$$S^2 = \sum_{i=1}^{n}\ffrac{\left(X_i - \bar{X}\right)^2} {n-1}$$

With the fact that

$$\begin{align}
\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 &= \sum_{i=1}^{n} \left( X_i - \mu + \mu - \bar{X} \right)^2 \\
&= \left[\sum_{i=1}^{n} \left(X_i - \mu \right)^2\right] + n\left(\mu - \bar{X}\right)^2 + 2\left(\mu - \bar{X} \right)\sum_{i=1}^{n} \left(X_i - \mu \right) \\
&= \left[\sum_{i=1}^{n} \left(X_i - \mu \right)^2\right] + n\left(\mu - \bar{X}\right)^2 - 2n\left(\mu - \bar{X}\right)^2 = \left[\sum_{i=1}^{n} \left(X_i - \mu \right)^2\right] - n\left(\bar{X} - \mu\right)^2
\end{align}$$

we can calculate the expectation as

$$\begin{align}
\EE{S^2} &= \ffrac{1} {n-1} \left[\left(\sum_{i=1}^{n} \EE{(X_i - \mu)^2}\right)-n\EE{\left(\bar{X} - \mu\right)^2 }\right] \\
&= \ffrac{1} {n-1}\left(n\sigma^2 - n\Var{\bar{X}}\right) = \sigma^2
\end{align}$$

$Def$ ***Chi-Squared*** $r.v.$

If $Z_1,\dots,Z_n$ are *independent* **standard normal** $r.v.$s then the $r.v.$ $\sum Z_i^2$ is said to be a **chi-squared** $r.v.$ with $n$ ***degrees of freedom***.

We first compute its mgf, note that

$$\begin{align}
\EE{\exp\CB{tZ_i^2}} &= \int_{-\infty}^{\infty}\exp\CB{tx^2}\cdot\ffrac{1} {\sqrt{2\pi}} \exp\CB{-\ffrac{x^2} {2}} \;\dd{x} \\
&= \ffrac{1} {\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\underset{\;\;\begin{array}{c}
\uparrow \\
\sigma^2 = \left(1-2t\right)^{-1}
\end{array}}{\CB{-\ffrac{x^2} {2\sigma^2}}} \;\dd{x} \\[0.8em]
&= \sigma = \left(1-2t\right)^{-1/2}
\end{align}$$

Hence,

$$\EE{\exp\CB{t\sum_{i=1}^{n} Z_i^2}} = \prod_{i=1}^{n} \EE{\exp\CB{tZ_i^2}} = \left(1-2t\right)^{-n/2}$$
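
As a quick numerical check of the single-term mgf $\left(1-2t\right)^{-1/2}$, the integral can be approximated directly. This is a minimal sketch assuming $t < 1/2$ (otherwise the integral diverges); the function name, the truncation width and the number of steps are arbitrary choices made for illustration.

```python
import math


def mgf_z_squared(t, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of E[exp(t Z^2)] for standard normal Z, t < 1/2."""
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        # Integrand: exp(t x^2) times the unnormalized standard normal density.
        total += math.exp(t * x * x - x * x / 2)
    return total * dx / math.sqrt(2 * math.pi)
```

Comparing `mgf_z_squared(t)` with `(1 - 2 * t) ** -0.5` for a few values such as $t = -1, 0, 0.25$ should show close agreement.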

$Remark$

Now suppose the $X_i$ are **normal**. Let $Y$ be a **normal** $r.v.$ with mean $\mu$ and variance $\sigma^2/n$ that is independent of $X_1, \dots, X_n$; then the $r.v.$s $Y, X_1-\bar{X}, X_2-\bar{X},\dots,X_n-\bar{X}$ have a **multivariate normal** distribution. Since $Y$ is independent of the $X_i$, $\Cov{Y,X_i - \bar{X}} = 0$ for $i = 1,\dots, n$. Also, $\Cov{\bar{X},X_i - \bar{X}} = \Cov{\bar{X},X_i} - \Var{\bar{X}} = \ffrac{\sigma^2} {n} - \ffrac{\sigma^2} {n} = 0$, while $\EE{Y} = \EE{\bar{X}} = \mu$ and $\Var{Y} = \Var{\bar{X}} = \sigma^2/n$.

Since a **multivariate normal** distribution is *completely* and *uniquely* determined by its expected values and covariances, $\bar{X}, X_1-\bar{X}, \dots, X_n-\bar{X}$ have the same joint distribution as $Y, X_1-\bar{X}, \dots, X_n-\bar{X}$. We conclude that $\bar{X}$ is independent of the sequence of deviations $X_i - \bar{X}$, $i = 1,\dots, n$.

It follows that $\bar{X}$ is also independent of the **sample variance** $S^2 \equiv \ffrac{1} {n-1} \sum_{i=1}^{n}\left(X_i - \bar{X} \right)^2$, and we now determine the distribution of $S^2$.

$$\ffrac{n-1} {\sigma^2} S^2 = \left[ \sum_{i=1}^{n} \ffrac{\left(X_i - \mu \right)^2} {\sigma^2}\right] - \ffrac{n\left(\bar{X} - \mu\right)^2} {\sigma^2} \Rightarrow \ffrac{(n-1)S^2} {\sigma^2} + \left( \ffrac{\bar{X} - \mu} {\sigma / \sqrt{n}} \right)^2 = \sum_{i=1}^{n} \ffrac{\left(X_i - \mu \right)^2} {\sigma^2}$$

The key is to use the mgf. The right-hand side is a **chi-squared** $r.v.$ with $n$ degrees of freedom, and the second term on the left is the square of a **standard normal** $r.v.$, that is, a **chi-squared** $r.v.$ with $1$ degree of freedom; we have already computed both mgfs. Since the two terms on the left are independent, the mgf of their sum is the product of their mgfs, so that

$$\EE{\exp\CB{t\cdot \ffrac{(n-1)S^2} {\sigma^2}}}(1-2t)^{-1/2} = (1-2t)^{-n/2}$$

Thus, the mgf of $\ffrac{n-1} {\sigma^2} S^2$ is $(1-2t)^{-(n-1)/2}$, the same as that of a **chi-squared** $r.v.$ with $n-1$ degrees of freedom, which gives the following proposition.

$Proposition$

If $X_1,\dots,X_n$ are $i.i.d.$ **normal** $r.v.$s with mean $\mu$ and variance $\sigma^2$, then the **sample mean** $\bar{X}$ and **sample variance** $S^2$ are independent. $\bar{X}$ is a **normal** $r.v.$ with mean $\mu$ and variance $\sigma^2/n$; $(n-1)S^2/\sigma^2$ is a **chi-squared** $r.v.$ with $n-1$ degrees of freedom.
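
A small simulation can make the proposition more concrete. The sketch below is an illustration under assumed parameter values (the function name is hypothetical). It repeatedly draws normal samples and records $\bar{X}$ and $(n-1)S^2/\sigma^2$. A chi-squared $r.v.$ with $n-1$ degrees of freedom has mean $n-1$ and variance $2(n-1)$, so those are the values to compare against.

```python
import random
from statistics import fmean, variance


def sample_mean_and_scaled_variance(n, mu, sigma, reps, seed=0):
    """Return lists of X-bar and (n - 1) S^2 / sigma^2 over `reps` normal samples."""
    rng = random.Random(seed)
    means, scaled = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(fmean(xs))
        # statistics.variance divides by n - 1, matching the definition of S^2.
        scaled.append((n - 1) * variance(xs) / sigma**2)
    return means, scaled
```

For $n = 5$, $\mu = 2$, $\sigma = 3$ the scaled variances should average about $4$ with variance about $8$, and their sample covariance with the sample means should be near $0$.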

# The Distribution of the Number of Events that Occur

Consider arbitrary events $A_1,\dots,A_n$, and let $X$ denote the number of these events that occur. What's the pmf of $X$? We first define:

$$S_k = \sum_{\d{i_1 < \cdots < i_k}} P\left( A_{\d{i_1}} \cap \cdots \cap A_{\d{i_k}} \right)$$

as the sum of the probabilities of all the $\d{\binom{n} {k}}$ intersections ($\cap$) of $k$ distinct events, and note that the inclusion-exclusion identity states that

$$P\CB{X>0} = P\left(\bigcup_{i=1}^{n}A_i\right) = S_1 - S_2 + S_3 - \cdots + (-1)^{n+1} S_n$$

Now, to build intuition, we fix $h$ of the $n$ events, say $A_{1},\dots,A_{h}$, and let $A=\bigcap\limits_{j=1}^{h} A_{j}$ be the event that all $h$ of these events occur. Also, let $B=\bigcap\limits_{j\notin\CB{1, 2, \dots, h}} A_{j}^{c}$ be the event that none of the other $n-h$ events occur. Consequently, $A\cap B = AB$ is the event that $A_{1}, \dots,A_{h}$ are the *only* events to occur. Then, since $A = AB \cup AB^c$ is a disjoint union, we have $P(AB) = P(A) - P(AB^c)$.

While $B^c = \bigcup\limits_{j \notin \CB{1, 2, \dots, h}} A_j$, so that $P(AB^c) = P\left( A\bigcup\limits _{j\notin\CB{1,\dots,h}} A_j\right) = P\left( \; \bigcup\limits _{j\notin\CB{1,\dots,h}} AA_j \right)$.

Then we apply the inclusion-exclusion identity again:

$$
\begin{align}
P(AB^c) &= \sum_{\d{j\notin \CB{1, 2, \dots, h}}} P(AA_j) - \sum_{\d{j_1 <j_2 \notin \CB{1,\dots,h}}} P( AA_{\d{j_1}}A_{\d{j_2}} ) \\[1em]
&\;\;\;\;+ \sum_{\d{j_1 < j_2 < j_3 \notin \CB{1, 2, \dots, h}}} P( AA_{\d{j_1}}A_{\d{j_2}} A_{\d{j_3}} ) - \cdots
\end{align}$$

Substituting this into $P(A\cap B) = P(A) - P(AB^c) = P(A_1 \cap A_2 \cap \cdots \cap A_h) - P(AB^c)$ and summing over all choices of $k$ events, we arrive at the general answer,

$\;\;\;\;
\begin{align}
P\CB{X=k} &= \sum_{\d{i_1 <\dots<i_k}} \left[ P\left(A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}}\right) - \sum _{\d{j \notin \CB{i_1,\dots, i_k}}} P\left(A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \cap A_{\d{j}}\right)\right. \\[0.8em]
&\;\;\;\;+ \sum_{\d{j_1 < j_2 \notin \CB{i_1,\dots,i_k}}} P\left( A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \cap A_{\d{j_1}} \cap A_{\d{j_2}} \right) \\
&\;\;\;\;- \left.\sum_{\d{j_1 < j_2 < j_3 \notin \CB{i_1,\dots,i_k}}} P\left( A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \cap A_{\d{j_1}} \cap A_{\d{j_2}} \cap A_{\d{j_3}} \right)+\cdots \right]
\end{align}$

Kinda complex, how to simplify this expression?

First note that $S_k = \sum\limits_{\d{i_1 <\dots<i_k}} P\left( A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \right)$. Now consider 

$$\sum_{\d{i_1 < \cdots <i_k}} \; \sum_{\d{j \notin \CB{i_1,\dots,i_k}}}P\left( A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \cap A_j \right)$$

There is clearly repetition here, because the $k+1$ distinct events are chosen in two steps. Write the intersection as $A_{\d{m_1}} \cap A_{\d{m_2}} \cap \cdots \cap A_{\d{m_{k+1}}}$; each such intersection appears exactly $\d{\binom{k+1}{k}}$ times in this double summation, once for each way of choosing which $k$ of the $k+1$ indices play the role of $i_1,\dots,i_k$. Hence:

$\;\;\;\;
\begin{align}
&\sum_{\d{i_1 < \cdots <i_k}} \; \sum_{\d{j \notin \CB{i_1,\dots,i_k}}}P\left( A_{\d{i_1}} \cap A_{\d{i_2}} \cap \cdots \cap A_{\d{i_k}} \cap A_j \right) \\
=& \binom{k+1}{k} \sum_{\d{m_1 < \cdots < m_{k+1}}} P\left(A_{\d{m_1}} \cap A_{\d{m_2}} \cap \cdots \cap A_{\d{m_{k+1}}}\right)\\
=& \binom{k+1}{k} S_{k+1} \\[1em]
&\texttt{So mf easier!}
\end{align}$

Similarly, we can say

$\;\;\;\;P\CB{X=k} = S_k - \d{\binom{k+1} {k}}S_{k+1} + \cdots + (-1)^{n-k}\d{\binom{n} {k}}S_n = \d{\sum_{j=k}^{n}} (-1)^{k+j} \d{\binom{j} {k}} S_j$

Using this we will now prove that $P\CB{X \geq k} = \d{\sum_{j=k}^{n} (-1)^{k+j} \binom{j-1}{k-1}S_j}$. We will use a backwards mathematical induction that starts with $k=n$. Now, when $k=n$ the preceding identity states that

$$P\CB{X=n} = S_n$$

First step finished! So we assume that $P\CB{X \geq k+1} = \d{\sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1} {k}S_j}$. And then

$$\begin{align}
P\CB{X \geq k} &= P\CB{X = k} + P\CB{X \geq k+1} \\[0.8em]
&= \left[\sum_{j=k}^{n} (-1)^{k+j} \binom{j} {k} S_j\right] + \left[ \sum_{j=k+1}^{n} (-1)^{k+1+j} \binom{j-1} {k}S_j \right]\\
&= S_k + \left[ \sum_{j=k+1}^{n} (-1)^{k+j} \left[\binom{j} {k} - \binom{j-1} {k} \right] S_j \right] \\
&= S_k + \sum_{j=k+1}^{n} (-1)^{k+j} \binom{j-1} {k-1}S_j = \sum_{j=k}^{n} (-1)^{k+j} \binom{j-1} {k-1}S_j 
\end{align}$$

All done!
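
Both identities can be verified exactly on a small finite sample space. The sketch below assumes a uniform distribution over outcomes $0,\dots,N-1$ and represents each event as a Python `set` of outcomes; this setup and all function names are illustrative assumptions, not part of the derivation.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def s_values(events, omega_size):
    """Return [_, S_1, ..., S_n] for events (sets) on a uniform sample space."""
    n = len(events)
    s = [Fraction(0)] * (n + 1)  # index 0 is unused
    for j in range(1, n + 1):
        for subset in combinations(events, j):
            common = set.intersection(*subset)
            s[j] += Fraction(len(common), omega_size)
    return s


def pmf_from_s(s, k):
    """P{X = k} = sum_{j=k}^{n} (-1)^(k+j) C(j, k) S_j, for k >= 1."""
    n = len(s) - 1
    return sum((-1) ** (k + j) * comb(j, k) * s[j] for j in range(k, n + 1))


def tail_from_s(s, k):
    """P{X >= k} = sum_{j=k}^{n} (-1)^(k+j) C(j-1, k-1) S_j, for k >= 1."""
    n = len(s) - 1
    return sum((-1) ** (k + j) * comb(j - 1, k - 1) * s[j] for j in range(k, n + 1))


def count_distribution(events, omega_size):
    """Direct count of P{X = k}, X = number of events containing the outcome."""
    counts = [0] * (len(events) + 1)
    for w in range(omega_size):
        counts[sum(w in e for e in events)] += 1
    return [Fraction(c, omega_size) for c in counts]
```

Because everything is computed with `Fraction`, the formulas and the direct count can be compared with `==` rather than a tolerance.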

# Limit Theorems

First we prove **Markov's inequality**.

$Proposition$ ***Markov's inequality***

If $X$ is a $r.v.$ that takes only *nonnegative* values, then for any value $a > 0$,

$$P\CB{X \geq a} \leq \ffrac{\EE{X}} {a}$$

$Proof$

This proof is for the case where $X$ is continuous with density $f$.

$$\begin{align}
\EE{X} &= \int_{0}^{\infty} x\cdot f(x) \;\dd{x}\\
&= \int_{0}^{a} x\cdot f(x) \;\dd{x} + \int_{a}^{\infty} x\cdot f(x) \;\dd{x}\\
&\geq \int_{a}^{\infty} x\cdot f(x) \;\dd{x} \geq \int_{a}^{\infty} a\cdot f(x)\;\dd{x} \\
&= a \cdot P\CB{X \geq a}
\end{align}$$

$Remark$

The same argument, with sums in place of integrals, shows that the inequality also holds for discrete $r.v.$s, and an analogous bound can be derived if the $r.v.$ takes only nonpositive values.

$Proposition$ ***Chebyshev's Inequality***

If $X$ is a $r.v.$ with mean $\mu$ and variance $\sigma^2$, then, for any value $k > 0$,

$$P \CB{\left|X - \mu \right| \geq k} \leq \ffrac{\sigma^2} {k^2}$$

$Proof$

Since $\left(X - \mu\right)^2$ is a nonnegative $r.v.$, we can apply the previous proposition, **Markov's inequality** (with $a = k^2$), to obtain:

$$P\CB{\left(X - \mu\right)^2 \geq k^2} = P\CB{\left|X - \mu\right| \geq k} \leq \ffrac{\EE{\left(X-\mu\right)^2}} {k^2} = \ffrac{\sigma^2} {k^2}$$

$Remark$

These two propositions are important because they let us bound a probability given only limited information, such as the mean alone or the mean and the variance.

**e.g.**

The number of items produced in a factory during a week is a $r.v.$ with *mean* $500$.

What can be said about the probability that this week's production will be at least $1000$?

> Let $X$ be the number of items that will be produced in a week.
>
>$P\CB{X \geq 1000} \leq\ffrac{\EE{X}} {1000} = \ffrac{500} {1000} = 0.5$

If the variance is also given with value $100$, then what can be said about the probability that this week's production will be between $400$ and $600$?

>By **Chebyshev's inequality**, $P\CB{|X-500| \geq 100} \leq \ffrac{\sigma^2} {100^2} = \ffrac{1} {100}$, hence $P\CB{400 < X < 600} = P\CB{|X-500| < 100} \geq 1 - \ffrac{1} {100} = \ffrac{99} {100}$.
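
These are only bounds, but they cannot be improved without more information. The sketch below uses two hypothetical discrete distributions, invented here purely because they have the stated mean (and variance) and attain the bounds exactly; nothing suggests the factory's production actually follows them.

```python
from fractions import Fraction


def mean(dist):
    """Mean of a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


def prob(dist, predicate):
    """Probability that the value satisfies `predicate`."""
    return sum(p for x, p in dist.items() if predicate(x))


# Hypothetical distribution with mean 500 that attains the Markov bound at a = 1000.
markov_tight = {0: Fraction(1, 2), 1000: Fraction(1, 2)}

# Hypothetical distribution with mean 500, variance 100 that attains the
# Chebyshev bound at k = 100.
chebyshev_tight = {
    400: Fraction(1, 200),
    500: Fraction(99, 100),
    600: Fraction(1, 200),
}
```

Evaluating `prob(markov_tight, lambda x: x >= 1000)` and `prob(chebyshev_tight, lambda x: abs(x - 500) >= 100)` gives exactly the two bounds from the example.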

***

$Theorem$ ***Strong Law of Large Numbers***

Let $X_1,X_2,\dots$ be a sequence of *independent* $r.v.$s having a common distribution, and let $\EE{X_i} = \mu$. Then, **with probability** $1$ (later shortened to $wp1$),

$$\lim_{n \to \infty}\ffrac{X_1 + X_2 + \cdots + X_n} {n} = \mu$$
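
To see the strong law at work, one can track the running average of simulated draws. The sketch below is a simple illustration; the function name and the choice of an exponential distribution in the usage note are assumptions. A single simulated path can only suggest the convergence, not prove it.

```python
import random


def running_means(draw, n, seed=0):
    """Return the running averages (X_1 + ... + X_i) / i for i = 1..n.

    `draw` is any function that takes a random.Random and returns one sample.
    """
    rng = random.Random(seed)
    total, out = 0.0, []
    for i in range(1, n + 1):
        total += draw(rng)
        out.append(total / i)
    return out
```

For instance, `running_means(lambda rng: rng.expovariate(0.5), 50000)` uses exponential draws with mean $2$, and the later entries of the returned list should settle near $2$.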

$Theorem$ ***Central Limit Theorem***

Let $X_1,X_2,\dots$ be a sequence of $i.i.d.$ $r.v.$, each with mean $\mu$ and variance $\sigma^2$. Then 

$$\lim_{n \to \infty} P\CB{\ffrac{X_1 + X_2 + \cdots + X_n - n \mu} {\sigma \sqrt{n}} \leq a} = \ffrac{1} {\sqrt{2 \pi}} \int_{-\infty}^{a} e^{-x^2/2} \;\dd{x}$$

$Remark$

This holds for *any* distribution of the $X_i$s (with finite mean and variance)! Herein lies its power! Say $X$ is a **binomially** distributed $r.v.$ with parameters $n$ and $p$. Then $X$ can be seen as the sum of $n$ independent **Bernoulli** $r.v.$s, each with parameter $p$. Hence the distribution of

$$\ffrac{X- \EE{X}} {\sqrt{\Var{X}}} = \ffrac{X - np} {\sqrt{np(1-p)}}$$

approaches the **standard normal** distribution as $n$ approaches $\infty$. This normal approximation is generally quite good for values of $n$ satisfying $np(1-p) \geq 10$. See the next example.

**e.g.** From **Binomial** to **Normal**

$X$ is the number of times that a fair coin, flipped $40$ times, lands *heads*. What's the probability that $X = 20$? How does the normal approximation compare with the exact solution?

> How to approximate a discrete $r.v.$ using a continuous $r.v.$? Here's the *trick*.
>
>$$
\begin{align}
P\CB{X=20} &= P\CB{19.5 < X < 20.5} \\
&= P\CB{\ffrac{19.5-20} {\sqrt{10}} < \ffrac{X - 20} {\sqrt{10}} < \ffrac{20.5 - 20} {\sqrt{10}}} \\[0.6em]
&\approx P\CB{-0.16 < \ffrac{X - 20} {\sqrt{10}} < 0.16} \\[0.7em]
& \mathbf{\approx} \Phi(0.16) - \Phi(-0.16)
\end{align}$$

$Remark$

Here, $\Phi(z)$ is the probability that the **standard normal** is less than $z$ and is given by

$$\Phi(z) = \ffrac{1} {\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\;\dd{x} $$

>Thus by the symmetry of the **standard normal** distribution: $\Phi(-0.16) = P\CB{\N{0,1} > 0.16} = 1 - \Phi(0.16)$, where $\N{0,1}$ is a **standard normal** $r.v.$. Hence the answer is

>$$P\CB{X = 20} \approx 2\Phi(0.16) - 1 = 0.1272$$

>Then, the exact result is $P\CB{X=20} = \d{\binom{40} {20}\left(\frac{1} {2}\right)^{40}} \approx 0.1254$
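
Both numbers in this example are easy to recompute. The sketch below (function names are assumptions) evaluates $\Phi$ through the error function and applies the continuity correction without rounding the endpoints to two decimals, so the approximation it returns can differ slightly from a table-based value.

```python
import math


def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binomial_point_exact(n, p, k):
    """Exact P{X = k} for a binomial(n, p) r.v."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_point_normal(n, p, k):
    """Normal approximation of P{X = k} with the continuity correction."""
    mu = n * p
    sd = math.sqrt(n * p * (1 - p))
    return phi((k + 0.5 - mu) / sd) - phi((k - 0.5 - mu) / sd)
```

Calling both functions with `n=40, p=0.5, k=20` lets the two answers be compared side by side.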

The following will be a heuristic proof of the ***CLT***, central limit theorem. We first suppose that $X_i$ have mean $0$ and variance $1$, and their common mgf is $\EE{e^{tX}}$. Then the mgf of $\ffrac{\sum X_i} {\sqrt{n}}$ is:

$$\begin{align}
\EE{\exp\CB{t\cdot\left( \ffrac{X_1+\cdots+X_n} {\sqrt{n}} \right)}} &= \EE{e^{tX_1/\sqrt{n}} \cdots e^{tX_n/ \sqrt{n}}} \\
&= \left( \EE{e^{tX/\sqrt{n}}} \right)^n
\end{align}$$

From the **Taylor series expansion** of $e^y$, for large $n$, we have

$$e^{tX/\sqrt{n}} \approx 1 + \ffrac{tX} {\sqrt{n}} + \ffrac{t^2X^2} {2n} $$

and since $\EE{X} = 0$ and $\EE{X^2} = 1$, we have

$$
\begin{align}
\EE{\exp\CB{t\cdot\left( \ffrac{X_1+\cdots+X_n} {\sqrt{n}} \right)}} &= \left( \EE{e^{tX/\sqrt{n}}} \right)^n \\
&\approx \left(1 + \ffrac{t^2} {2n}\right)^n \\
&\to e^{t^2/2} \;\;\;\;\text{as } n \to \infty
\end{align}$$

This is the mgf of a **standard normal** $r.v.$ (mean $0$ and variance $1$). Since the mgf uniquely determines the distribution, the distribution function of $\ffrac{X_1 + \cdots + X_n} {\sqrt{n}}$ converges to the **standard normal** distribution function $\Phi$.

When the $X_i$ have mean $\mu$ and variance $\sigma^2$, we convert them to $\ffrac{X_i - \mu} {\sigma}$, which have mean $0$ and variance $1$. Thus the preceding shows that:

$$P\CB{\ffrac{X_1 - \mu + \cdots +X_n - \mu} {\sigma\sqrt{n}} \leq a} \to \Phi(a)$$

which proves the **CLT**.
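
The convergence of the mgf can be watched numerically in a case where it is known in closed form. The sketch below assumes each $X_i$ equals $+1$ or $-1$ with probability $1/2$ (mean $0$, variance $1$), whose mgf is $\cosh t$; this choice of distribution and the function name are illustrative assumptions.

```python
import math


def mgf_standardized_sum(t, n):
    """mgf of (X_1 + ... + X_n) / sqrt(n) when each X_i is +1 or -1 w.p. 1/2.

    Each X_i has mgf cosh(t), so the scaled sum has mgf cosh(t / sqrt(n)) ** n.
    """
    return math.cosh(t / math.sqrt(n)) ** n
```

Evaluating `mgf_standardized_sum(1.0, n)` for $n = 10, 100, 1000, 10000$ and comparing with `math.exp(0.5)` shows the gap to $e^{t^2/2}$ shrinking as $n$ grows.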

# Stochastic Processes

A ***stochastic process*** $\CB{X(t), t \in T}$ is a *collection* of $r.v.$s. The index $t$ is often interpreted as ***time*** and, as a result, we refer to $X(t)$ as the ***state*** of the process at time $t$. The collection usually contains *infinitely* many $r.v.$s.

The set $T$ is the ***index set*** of the process. When $T$ is a countable set, the stochastic process is said to be a ***discrete-time process***. If $T$ is an interval of the real line, the stochastic process is said to be a ***continuous-time process***.

The ***state space*** of a stochastic process is the set of *ALL possible values* that the $r.v.$ $X(t)$ can assume.

- State Space and Time Parameter are both Discrete (Random Walk)
- State Space is Continuous and Time Parameter is Discrete (Common)
- State Space is Discrete and Time Parameter is Continuous (Poisson Process)
- State Space and Time Parameter are both Continuous (Brownian Motion Process)

**e.g.**

Consider a particle that moves along a set of $m+1$ nodes, labeled from $0$ to $m$, that are arranged around a circle. At each step the particle is equally likely to move one position in either the clockwise or counterclockwise direction. That is, if $X_n$ is the position of the particle after its $n\texttt{th}$ step, then

$$P\CB{X_{n+1} = i+1 \mid X_n = i} = P \CB{X_{n+1} = i-1 \mid X_n = i} = \ffrac{1} {2}$$

where we let $i+1=0$ when $i=m$ and $i-1=m$ when $i=0$. Now the particle starts at $0$ and continues to move around according to the preceding rules until all the nodes have been visited. What is the probability that node $i$ is the last one visited?

![](./figs/e.g.2.53.png)

> Fix a node $i \neq 0$ and consider the first time that the particle is at one of the two neighbors of node $i$, say node $i-1$ (the other case is symmetric). Since neither node $i$ nor $i+1$ has yet been visited, it follows that $i$ will be the last node visited $iff$ $i+1$ is visited before $i$. This happens exactly when the particle progresses $m-1$ steps in a specified direction before progressing one step in the other direction.

>That is, it is equal to the probability that a gambler who starts with one unit, and wins one when a fair coin turns up heads and loses one when it turns up tails, will have his fortune go up by $m − 1$ before he goes broke. Hence, because the preceding $\QQQ$ implies that the probability that node $i$ is the last node visited is the same for all $i$, and because these probabilities must sum to $1$, we obtain

>$$P \CB{ i \text{ is the last node visited}} = 1/ m$$
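
The $1/m$ answer is surprising enough to be worth a simulation. The sketch below (function names and the number of trials are assumptions for illustration) runs the walk on the circle until every node has been visited and records which node was reached last.

```python
import random
from collections import Counter


def last_node_visited(m, rng):
    """Walk on nodes 0..m arranged in a circle, starting at 0.

    Returns the node that is visited last.
    """
    position, visited, last = 0, {0}, 0
    while len(visited) < m + 1:
        # Move one step clockwise or counterclockwise, wrapping around.
        position = (position + rng.choice((-1, 1))) % (m + 1)
        if position not in visited:
            visited.add(position)
            last = position
    return last


def last_node_frequencies(m, trials, seed=0):
    """Relative frequency with which each node 1..m is the last one visited."""
    rng = random.Random(seed)
    counts = Counter(last_node_visited(m, rng) for _ in range(trials))
    return {i: counts[i] / trials for i in range(1, m + 1)}
```

With $m = 5$ and a few thousand trials, each of the five frequencies should come out close to $0.2$.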

$Remark$

Consider that gambler again. The probability that he goes down $n$ before being up $1$ is $\QQQ$ $1/(n+1)$; or equivalently,

$$P\CB{\text{gambler is up }1\text{ before being down }n} = \ffrac{n} {n+1}$$

Then:

$$
\begin{align}
& P\CB{\text{gambler is up }2\text{ before being down }n} \\
=& P\CB{\text{up }2\text{ before down }n \mid \text{up }1\text{ before down }n} \cdot \ffrac{n} {n+1} \\
=& P\CB{\text{up }1\text{ before down }n+1}\cdot \ffrac{n} {n+1} \\
=& \ffrac{n+1} {n+2}\ffrac{n} {n+1} = \ffrac{n} {n+2}
\end{align}$$

Repeating this argument yields that

$$P\CB{\text{gambler is up }k\text{ before being down }n} = \ffrac{n} {n+k}$$
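
The formula $\ffrac{n} {n+k}$ can be checked against a direct simulation of the fair game. This is a minimal sketch; the function names, the seed and the number of trials are assumptions, and the estimate carries the usual Monte Carlo error.

```python
import random


def up_k_before_down_n(k, n, rng):
    """Play fair +1/-1 rounds from 0 until the fortune hits +k or -n.

    Returns True if +k is reached first.
    """
    fortune = 0
    while -n < fortune < k:
        fortune += rng.choice((-1, 1))
    return fortune == k


def estimate_up_before_down(k, n, trials, seed=0):
    """Monte Carlo estimate of P{gambler is up k before being down n}."""
    rng = random.Random(seed)
    wins = sum(up_k_before_down_n(k, n, rng) for _ in range(trials))
    return wins / trials
```

For example, `estimate_up_before_down(2, 3, 20000)` should be close to $3/5$.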