## Joint mass function

:::{.callout-note}
Most examples will involve two random variables, but everything can be generalized for more of them.
:::

### Definition
Suppose two discrete random variables $X$ and $Y$ are defined on a common probability space and take the values
$x_1, x_2, \dots$ and $y_1, y_2, \dots,$ respectively. Their joint probability mass function is defined as <br>
$$p(x_i , y_j ) = P\{X = x_i , Y = y_j\}, \qquad i = 1, 2, \dots , \; j = 1, 2, \dots .$$
This function contains all information about the joint distribution of $X$ and $Y$.<br>
Any joint mass function satisfies:
$$p(x,y)\ge 0, \; \forall x,y \in\mathbb{R}$$
$$\sum_{i,j}p(x_i,y_j)=1$$
Conversely, any function with these two properties is a joint probability mass function.

### Marginal mass functions 
The marginal mass functions are <br>
$$p_X(x_i):=P\{X=x_i\}$$ 
and 
$$p_Y(y_j):=P\{Y=y_j\}$$
They are obtained from the joint mass function by summing over the other variable:
$$p_X(x_i)=\sum_jp(x_i,y_j)$$
and
$$p_Y(y_j)=\sum_ip(x_i,y_j)$$ 

### Example
An urn contains $3$ red, $4$ white and $5$ black balls. Draw $3$ balls at once, and let $X$ be the number of red and $Y$ the number of white balls drawn.
The joint mass function, together with the marginals, is:

| $Y \downarrow$  $X \rightarrow$ | 0 | 1 | 2 | 3 |$p_Y(\cdot)$
|---------|:-----|------:|:------:|:------:|:------:|
| __0__    | $\displaystyle \frac{\binom{5}{3}}{\binom{12}{3}}$   |   $\displaystyle \frac{\binom{3}{1}\binom{5}{2}}{\binom{12}{3}}$  |    $\displaystyle \frac{\binom{3}{2}\binom{5}{1}}{\binom{12}{3}}$   |  $\displaystyle \frac{\binom{3}{3}\binom{5}{0}}{\binom{12}{3}}$ |  $\displaystyle \frac{\binom{8}{3}}{\binom{12}{3}}$
| __1__     |  $\displaystyle \frac{\binom{4}{1}\binom{5}{2}}{\binom{12}{3}}$   |    $\displaystyle \frac{\binom{4}{1}\binom{3}{1}\binom{5}{1}}{\binom{12}{3}}$  |   $\displaystyle \frac{\binom{4}{1}\binom{3}{2}}{\binom{12}{3}}$    | 0| $\displaystyle \frac{\binom{4}{1}\binom{8}{2}}{\binom{12}{3}}$ 
| __2__       |  $\displaystyle \frac{\binom{4}{2}\binom{5}{1}}{\binom{12}{3}}$    |      $\displaystyle \frac{\binom{4}{2}\binom{3}{1}}{\binom{12}{3}}$  |   0    |0| $\displaystyle \frac{\binom{4}{2}\binom{8}{1}}{\binom{12}{3}}$ 
| __3__       |  $\displaystyle \frac{\binom{4}{3}}{\binom{12}{3}}$     |     0 |   0    |  0|   $\displaystyle \frac{\binom{4}{3}}{\binom{12}{3}}$ 
| $p_X(\cdot)$       |   $\displaystyle \frac{\binom{9}{3}}{\binom{12}{3}}$    |      $\displaystyle \frac{\binom{3}{1}\binom{9}{2}}{\binom{12}{3}}$  |    $\displaystyle \frac{\binom{3}{2}\binom{9}{1}}{\binom{12}{3}}$   |  $\displaystyle \frac{\binom{3}{3}}{\binom{12}{3}}$ | 1

: Joint probability distribution of $X$ and $Y$, with the marginals in the last row and column.
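
The table can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`; the names `joint_pmf`, `table`, `p_X` and `p_Y` are illustrative and not part of the original notes. It rebuilds every cell with exact fractions and recovers the marginals by summing over the other variable.

```python
from fractions import Fraction
from math import comb

RED, WHITE, BLACK, DRAWN = 3, 4, 5, 3
TOTAL = comb(RED + WHITE + BLACK, DRAWN)  # number of equally likely draws

def joint_pmf(x, y):
    """P{X = x, Y = y}: x red, y white, and the remaining balls black."""
    blacks = DRAWN - x - y
    if blacks < 0:
        return Fraction(0)  # more than 3 balls would be needed
    return Fraction(comb(RED, x) * comb(WHITE, y) * comb(BLACK, blacks), TOTAL)

table = {(x, y): joint_pmf(x, y) for x in range(4) for y in range(4)}

# Marginals: sum the joint mass function over the other variable.
p_X = {x: sum(table[x, y] for y in range(4)) for x in range(4)}
p_Y = {y: sum(table[x, y] for x in range(4)) for y in range(4)}

print(sum(table.values()))  # total mass of the joint distribution
print(p_X[1], Fraction(comb(3, 1) * comb(9, 2), TOTAL))  # marginal vs. table entry
```

Exact fractions are used so that the comparison with the binomial-coefficient formulas in the table does not depend on floating-point rounding.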



## Conditional mass function

### Definition 

Suppose $p_Y(y_j)>0$. The conditional mass function of $X$ given $Y=y_j$ is defined by 
$$p_{X \mid Y}(x \mid y_j):= P\{X=x \mid Y=y_j\}=\frac{\overbrace{p(x,y_j)}^{\text{joint}} }{\underbrace{p_Y(y_j)}_{\text{marginal}} }$$
Since conditional probability is a proper probability, this is a proper mass function: for all $x$ and $y_j$,
$$p_{X \mid Y}(x \mid y_j) \ge 0, \qquad \sum_i p_{X \mid Y}(x_i \mid y_j)=1$$

#### Example

Let $X$ and $Y$ have joint mass function 

| $X \downarrow$  $Y \rightarrow$| 0 | 1 | 
|---------|:-----|------:|
| __0__      | 0.4   |  0.2 |
| __1__     | 0.1  | 0.3 |


: joint distribution

The conditional distribution of $X$ given $Y=0$ is
$$p_{X\mid Y}(0 \mid 0)= \frac{p(0,0)}{p_Y(0)}= \frac{p(0,0)}{p(0,0)+p(1,0)}=\frac{0.4}{0.4+0.1}=\frac{4}{5}$$ 
$$p_{X\mid Y}(1 \mid 0)= \frac{p(1,0)}{p_Y(0)}= \frac{p(1,0)}{p(0,0)+p(1,0)}=\frac{0.1}{0.4+0.1}=\frac{1}{5}$$ 
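
The same computation can be sketched in code. This is only an illustration: the dictionary layout and the helper name `conditional_x_given_y` are assumptions made here, not notation from the notes.

```python
from fractions import Fraction

# Joint mass function p(x, y) from the table above, keyed by (x, y).
joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10),
}

def conditional_x_given_y(joint, y):
    """Return the mass function of X given Y = y as a dict {x: probability}."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)  # marginal p_Y(y)
    if p_y == 0:
        raise ValueError("conditioning on an event of probability zero")
    return {x: p / p_y for (x, yj), p in joint.items() if yj == y}

given_y0 = conditional_x_given_y(joint, 0)
given_y1 = conditional_x_given_y(joint, 1)
print(given_y0)  # conditional distribution of X given Y = 0
print(given_y1)  # conditional distribution of X given Y = 1
```

Each conditional distribution is a column of the table divided by its column sum, so it sums to one by construction.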

## Independent Random Variable

### Definition

Random variables $X$ and $Y$ are independent if all events formulated with them are independent, that is, if for every $A,B\subseteq \mathbb{R}$
$$P\{X\in A,Y\in B\}=P\{X\in A\} \cdot P \{ Y \in B\} $$

:::{.callout-tip}
The abbreviation i.i.d. is used for independent and identically distributed random variables.
:::

Two discrete random variables $X$ and $Y$ are independent if and only if their joint mass function factorizes into the product of the marginals:
$$p(x_i,y_j)=p_X(x_i)\cdot p_Y(y_j), \qquad \forall x_i,y_j$$  

### Discrete Convolution

Let $X$ and $Y$ be independent, integer valued random variables with respective mass functions $p_X$ and $p_Y$ . Then
$$p_{X+Y}(k) = \sum_{i=-\infty}^\infty p_X(k - i) \cdot p_Y (i), \qquad (\forall k \in \mathbb{Z})$$
This formula is called the discrete convolution of the mass functions $p_X$ and $p_Y$.

- Proof 
  $$\begin{align*}
  p_{X+Y}(k)&=P\{X+Y=k\}\\
  &=\sum_{i=-\infty}^\infty P\{X=k-i,Y=i\}\\
  &=\sum_{i=-\infty}^\infty p_X(k-i)\cdot p_Y(i)
  \end{align*}$$
  The last step uses independence: $P\{X=k-i,Y=i\}=p_X(k-i)\cdot p_Y(i)$.
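
As a small illustration of the convolution formula, the sketch below computes the mass function of the sum of two fair dice. The function name `convolve` and the dictionary representation are choices made for this example. The code loops over all pairs of values instead of over $i$ for a fixed $k$, which accumulates the same products $p_X(k-i)\cdot p_Y(i)$.

```python
from fractions import Fraction

def convolve(p_x, p_y):
    """Mass function of X + Y for independent X, Y given as {value: probability}."""
    p_sum = {}
    for x, px in p_x.items():
        for y, py in p_y.items():
            # Independence: P{X = x, Y = y} = p_X(x) * p_Y(y) contributes to k = x + y.
            p_sum[x + y] = p_sum.get(x + y, 0) + px * py
    return p_sum

die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])             # probability that two dice sum to 7
print(sum(two_dice.values()))  # total mass of the convolved distribution
```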

Let $X \sim \textrm{Poi}(\lambda)$ and $Y \sim \textrm{Poi}(\mu)$ be independent. Then <br>
$X+Y \sim \textrm{Poi}(\lambda+\mu)$

- Proof (both mass functions vanish on the negative integers, so only the terms $0\le i\le k$ survive)
  $$\begin{align*}
  p_{X+Y}(k)&=\sum_{i=-\infty}^\infty p_X(k-i)\cdot p_Y(i) \\
  &= \sum_{i=0}^{k} \frac{\lambda^{k-i}}{(k-i)!}e^{-\lambda}\cdot \frac{\mu^i}{i!}e^{-\mu} \\
  &= e^{-\lambda - \mu}\frac{1}{k!}\sum_{i=0}^{k} \frac{k!}{(k-i)!\cdot i!}\lambda^{k-i} \cdot \mu^i \\
  &= e^{-\lambda - \mu}\frac{1}{k!}\sum_{i=0}^{k} \binom{k}{i} \lambda^{k-i} \cdot \mu^i \\
  &= e^{-(\lambda + \mu)}\frac{(\lambda + \mu)^k}{k!},
  \end{align*}$$
  which is the $\textrm{Poi}(\lambda+\mu)$ mass function. The last step is the binomial theorem.

Let $X,Y$ be i.i.d. $\textrm{Geom}(p)$ variables. Then <br>
$X+Y$ is not geometric.

- Proof (for $k \ge 2$)
  $$\begin{align*}
  p_{X+Y}(k)&=\sum_{i=-\infty}^\infty p_X(k-i)\cdot p_Y(i) \\
  &=\sum_{i=1}^{k-1}(1-p)^{k-i-1}p\cdot (1-p)^{i-1}p \\
  &=(k-1)(1-p)^{k-2}p^2 
  \end{align*}$$  
  Hence $X+Y$ is not geometric; _its distribution is called Negative Binomial._

Let $X \sim \textrm{Binom}(n,p)$ and $Y \sim \textrm{Binom}(m,p)$ be independent (__notice the same__ $\mathbf p$). Then <br>
$X+Y \sim \textrm{Binom}(n+m,p)$

- Proof
  $$\begin{align*}
  p_{X+Y}(k)&=\sum_{i=-\infty}^\infty p_X(k-i)\cdot p_Y(i) \\
  &=\sum_{i=0}^k \binom{n}{k-i}p^{k-i}(1-p)^{n-k+i}\cdot \binom{m}{i}p^i(1-p)^{m-i} \\
  &=p^k(1-p)^{m+n-k}\sum_{i=0}^k \binom{n}{k-i} \binom{m}{i}  \\
  &=\binom{m+n}{k}  p^k(1-p)^{m+n-k},
  \end{align*}$$  
  which is the $\textrm{Binom}(n+m,p)$ mass function. The last step uses the Vandermonde identity $\sum_{i=0}^k \binom{n}{k-i} \binom{m}{i}=\binom{m+n}{k}$.
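
The binomial result can be checked numerically. The sketch below is illustrative only: the parameter values and the helper names `binom_pmf` and `convolve` are assumptions made for this example. It applies the convolution formula directly and also verifies the identity $\sum_{i=0}^k \binom{n}{k-i} \binom{m}{i}=\binom{m+n}{k}$ for every $k$.

```python
from math import comb, isclose

def binom_pmf(n, p):
    """Mass function of Binom(n, p) as a list indexed by k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(p_x, p_y):
    """p_{X+Y}(k) = sum_i p_X(k - i) * p_Y(i) for mass functions on 0, 1, 2, ..."""
    out = [0.0] * (len(p_x) + len(p_y) - 1)
    for k in range(len(out)):
        for i in range(len(p_y)):
            if 0 <= k - i < len(p_x):  # outside this range p_X(k - i) = 0
                out[k] += p_x[k - i] * p_y[i]
    return out

n, m, p = 4, 6, 0.3
lhs = convolve(binom_pmf(n, p), binom_pmf(m, p))
rhs = binom_pmf(n + m, p)
pmfs_agree = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))

# The combinatorial identity used in the last step of the proof.
identity_holds = all(
    sum(comb(n, k - i) * comb(m, i) for i in range(k + 1)) == comb(n + m, k)
    for k in range(n + m + 1)
)
print(pmfs_agree, identity_holds)
```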

### Continuous convolution

Suppose $X$ and $Y$ are independent continuous random variables with respective densities $f_X$ and $f_Y$. Then their sum is a continuous random variable with density
$$f_{X+Y}(a)=\int_{- \infty} ^ \infty f_X(a-y)\cdot f_Y(y)dy, \qquad (\forall a \in \mathbb{R})$$

### Gamma distribution

Let $X$ and $Y$ be i.i.d. $\mathrm{Exp}(\lambda)$. The density of their sum is, for $a\ge 0$,

$$\begin{align*}
f_{X+Y}(a)&=\int_{-\infty}^\infty f_X(a-y) \cdot f_Y(y)dy \\
&=\int_{0}^a \lambda e^{-\lambda (a-y)}\cdot \lambda e^{-\lambda y}dy \\
&=\lambda^2 e^{-\lambda a}\cdot y \Big |_0^a \\
&= \lambda^2 a \cdot e^{-\lambda a}
\end{align*}$$
This density is called $\mathrm{Gamma}(2,\lambda)$
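
A rough numerical check of this density: the sketch below approximates the convolution integral with a midpoint rule and compares it with $\lambda^2 a e^{-\lambda a}$. The helper names, the values of `lam` and `a`, and the number of steps are arbitrary choices for illustration.

```python
from math import exp, isclose

def exp_density(x, lam):
    """Density of Exp(lam): lam * e^(-lam * x) for x >= 0 and zero otherwise."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def convolve_at(f, g, a, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f(a - y) * g(y) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        y = lo + (j + 0.5) * h
        total += f(a - y) * g(y)
    return h * total

lam, a = 1.5, 2.0
# Both densities vanish on the negative half-line, so integrating over [0, a] is enough.
numeric = convolve_at(lambda x: exp_density(x, lam), lambda y: exp_density(y, lam), a, 0.0, a)
closed_form = lam**2 * a * exp(-lam * a)
print(numeric, closed_form)
```

On $[0,a]$ the integrand is the constant $\lambda^2 e^{-\lambda a}$, which is why the two numbers agree up to rounding.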

<br><br>

Now let $X \sim \mathrm{Exp}(\lambda)$ and $Y \sim \mathrm{Gamma}(2,\lambda)$ be independent. Then

$$\begin{align*}
f_{X+Y}(a)&=\int_{-\infty}^\infty f_X(a-y) \cdot f_Y(y)dy \\
&=\int_{0}^a \lambda e^{-\lambda (a-y)}\cdot \lambda^2 y \cdot e^{-\lambda y}dy \\
&=\lambda^3 e^{-\lambda a}\cdot \frac{y^2}{2} \Big |_0^a \\
&= \frac{\lambda^3 a^2 \cdot e^{-\lambda a}}{2}
\end{align*}$$
This density is called $\mathrm{Gamma}(3,\lambda)$

<br><br>

Let $X \sim \mathrm{Exp}(\lambda)$ and $Y \sim \mathrm{Gamma}(3,\lambda)$ be independent. Then

$$\begin{align*}
f_{X+Y}(a)&=\int_{-\infty}^\infty f_X(a-y) \cdot f_Y(y)dy \\
&=\int_{0}^a \lambda e^{-\lambda (a-y)}\cdot\frac{\lambda^3 y^2 \cdot e^{-\lambda y}}{2}dy \\
&=\lambda^4 e^{-\lambda a}\cdot \frac{y^3}{2\cdot 3} \Big |_0^a \\
&= \frac{\lambda^4 a^3 \cdot e^{-\lambda a}}{2\cdot 3} \\
&= \frac{\lambda^4 a^3 \cdot e^{-\lambda a}}{3!}
\end{align*}$$
This density is called $\mathrm{Gamma}(4,\lambda)$


<br><br><br>
In general, the convolution of $n$ i.i.d. $\mathrm{Exp}(\lambda)$ densities results in the $\mathrm{Gamma}(n,\lambda)$ density:

$$f(x)=\frac{\lambda^n x^{n-1} \cdot e^{-\lambda x}}{(n-1)!},\qquad \forall x\ge 0 \tag{1}$$
and zero otherwise.
<br><br><br><br>
This is the density of the sum of $n$ i.i.d. $\mathrm{Exp}(\lambda)$ random variables. In particular, $\mathrm{Gamma}(1,\lambda) \equiv  \mathrm{Exp}(\lambda)$
<br><br><br><br>

Since $f$ is a density, its integral must equal $1$. The density vanishes for $x<0$, so
$$\begin{align*}
\int_{-\infty}^\infty f(x)\,dx &= \int_{0}^\infty \frac{\lambda^n x^{n-1} \cdot e^{-\lambda x}}{(n-1)!} \,dx \\
&=  \int_{0}^\infty \frac{ (\lambda x)^{n-1} \cdot e^{-\lambda x}}{(n-1)!} \lambda \,dx \\
&=1
\end{align*}$$
Substituting $z= \lambda x, \;dz=\lambda \,dx$ in the above equation gives
$$(n-1)! = \int_{0}^\infty  z^{n-1} \cdot e^{-z}  \,dz $$

The Gamma function is defined for every real number $\alpha > 0$ by
$$\Gamma(\alpha) :=\int_{0}^\infty  z^{\alpha-1} \cdot e^{-z}  \,dz$$
In particular, $\Gamma(n)=(n-1)!$ for every positive integer $n$.
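
The relation between the Gamma function and factorials can be checked with Python's standard library, where `math.gamma` evaluates $\Gamma$. The helper `gamma_integral` below is a hypothetical name for a crude midpoint-rule approximation of the defining integral, truncated at an arbitrary upper limit; it is a sketch for $\alpha\ge 1$, not a recommended way to evaluate $\Gamma$.

```python
from math import exp, factorial, gamma, isclose

def gamma_integral(alpha, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of z^(alpha-1) * e^(-z) over [0, upper].

    Intended for alpha >= 1, where the integrand is bounded near zero; the tail
    beyond `upper` is negligible for moderate alpha.
    """
    h = upper / steps
    total = 0.0
    for j in range(steps):
        z = (j + 0.5) * h
        total += z ** (alpha - 1) * exp(-z)
    return h * total

# Gamma(n) = (n - 1)! for positive integers n.
factorial_match = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 15))
approx_gamma_5 = gamma_integral(5)  # should be close to 4! = 24
print(factorial_match, approx_gamma_5)
```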
<br><br>
Replacing $(n-1)!$ by $\Gamma(\alpha)$ in equation $(1)$ extends the Gamma distribution to every real $\alpha>0$:

$$f(x)=\frac{\lambda^\alpha x^{\alpha-1} \cdot e^{-\lambda x}}{\Gamma(\alpha)}, \qquad \forall x\ge 0$$
and zero otherwise.

<br><br>
If $X \sim \mathrm{Gamma}(\alpha , \lambda),$ then

$$EX= \frac{\alpha}{\lambda}, \qquad \mathrm{Var}X=\frac{\alpha}{\lambda ^2}$$

## Expectation, covariance

### Expectation

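The expectation of a discrete random variable with mass function $p$, respectively of a continuous random variable with density $f$, is defined as 
$$EX = \sum_i x_i\cdot p(x_i), \qquad EX=\int_{-\infty}^\infty x f(x)\,dx$$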

#### Properties of Expectation

- Simple monotonicity property:
  If $a \le X \le b$, then $a \le EX \le b$ <br>
  Proof:<br>
  $$\begin{align*}
  & a =a\cdot 1 =a \sum_i p(x_i) \\
  & \le\sum_i x_i p(x_i) \le \\
  & b \sum_i p(x_i) =b \cdot 1=b
  \end{align*}$$
- Expectation of _functions of variables_<br>
  Let $X$ and $Y$ be random variables with joint mass function $p$ and let $g: \mathbb{R}\times \mathbb{R} \rightarrow \mathbb{R}$ be a function. Then 
  $$\mathbf{E}g(X,Y)=\sum_{i,j}g(x_i,y_j)\cdot p(x_i,y_j)$$
- Expectation of sums and differences:<br>
  - Let $X$ and $Y$ be random variables. Then $E(X+Y)=EX+EY$ and $E(X-Y)=EX-EY$<br>
    Proof:
    $$\begin{align*}
    E(X\pm Y) &= \sum _{i,j}(x_i \pm y_j)\cdot p(x_i, y_j) \\
    &=\sum_i \sum_j x_i \cdot p(x_i, y_j) \pm \sum_i \sum_j y_j \cdot p(x_i, y_j) \\
    &=\sum_i x_i\cdot p_X(x_i) \pm \sum_j y_j\cdot p_Y(y_j)\\
    &=EX \pm EY
    \end{align*}$$
  - Let $X$ and $Y$ be random variables such that $X \le Y$. Then $EX \le EY$<br>
    Proof: <br>
    The difference $Y-X$ is non-negative, so by monotonicity its expectation is non-negative as well:
    $$\begin{align*}
    & &E(Y-X) &\ge0 \\
    & \Rightarrow & EY-EX &\ge 0 \\
    & \Rightarrow & -EX &\ge -EY \\
    &\Rightarrow & EX &\le EY \\
    \end{align*}$$

  - __Example__ (sample mean) <br>
    Let $X_1,X_2, \dots , X_n$ be identically distributed random variables with mean $\mu$. Their sample mean is 
    $$\bar{X}:=\frac{1}{n}\sum_{i=1}^nX_i$$
    Its expectation is 
    $$\begin{align*}
    E\bar{X}&=E\left(\frac{1}{n}\sum_{i=1}^nX_i\right)\\
    &=\frac{1}{n}E\left(\sum_{i=1}^nX_i\right)\\
    &=\frac{1}{n}\sum_{i=1}^nE\left(X_i\right)\\
    &=\frac{1}{n}\sum_{i=1}^n\mu\\
    &=\mu
    \end{align*}$$
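
The formula for $\mathbf{E}g(X,Y)$ and the linearity of expectation can be illustrated on a small joint table. The sketch below reuses the $2\times 2$ joint mass function from the conditional mass function example; the helper name `expect` is an assumption of this illustration.

```python
from fractions import Fraction

# Joint mass function p(x, y), keyed by (x, y).
joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10),
}

def expect(g, joint):
    """E g(X, Y) = sum over all (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x, joint)
e_y = expect(lambda x, y: y, joint)
e_sum = expect(lambda x, y: x + y, joint)
e_diff = expect(lambda x, y: x - y, joint)
print(e_x, e_y, e_sum, e_diff)
```

Note that $X$ and $Y$ are not independent in this table, yet $E(X\pm Y)=EX\pm EY$ still holds, as the proof above shows.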
  

### Covariance

__Independence__<br>
Let $X$ and $Y$ be independent random variables, and $g,h$ be functions. Then
$$\mathbf{E}(g(X)\cdot h(Y))= \mathbf{E}g(X)\cdot \mathbf{E}h(Y) \tag{2}$$
Proof (written for jointly continuous variables with joint density $p_{XY}$; in the discrete case replace the integrals by sums): <br>
$$\begin{align*}
\mathbf{E}(g(X)\cdot h(Y))&=\int \int g(x)h(y)p_{XY}(x,y)\,dx\,dy \\
&=\int \int g(x)h(y)p_{X}(x)p_{Y}(y)\,dx\,dy \\
&=\int  g(x)p_{X}(x)\,dx\int h(y)p_{Y}(y)\,dy \\
&=\mathbf{E}g(X)\cdot \mathbf{E}h(Y)
\end{align*}$$
Here we used the fact that the joint density factorizes, $p_{XY}(x,y)=p_{X}(x)p_{Y}(y)$, because $X$ and $Y$ are independent.

<br><br>
__Covariance__ <br>
The covariance of the random variables $X$ and $Y$ is 
$$\mathbf{Cov}(X,Y)=\mathbf{E}[(X-\mathbf{E}X)\cdot (Y-\mathbf{E}Y)]$$ 
An equivalent form of the above formula is
$$\mathbf{Cov}(X,Y)=\mathbf{E}(XY)-\mathbf{E}X\mathbf{E}Y \tag{3}$$
Proof:
$$\begin{align*}
\mathbf{E}[(X-\mathbf{E}X)\cdot (Y-\mathbf{E}Y)]&=\mathbf{E}[XY-Y\mathbf{E}X-X\mathbf{E}Y+\mathbf{E}X\mathbf{E}Y]\\
&=\mathbf{E}(XY)-\mathbf{E}Y\mathbf{E}X-\mathbf{E}X\mathbf{E}Y+\mathbf{E}X\mathbf{E}Y \\
&= \mathbf{E}(XY)-\mathbf{E}X\mathbf{E}Y
\end{align*}$$

Combining equations $(2)$ and $(3)$: for independent random variables $X$ and $Y$ we have $\mathbf{E}(XY)=\mathbf{E}X\mathbf{E}Y$, hence
$$\mathbf{Cov}(X,Y)=\mathbf{E}(XY)-\mathbf{E}X\mathbf{E}Y=0$$
The converse is not true: _zero covariance does not necessarily mean that the random variables are independent_. 

:::{.callout-tip}
- If random variables $X$ and $Y$ are independent, then $\mathbf{Cov}(X,Y)=0$ <br>
- But $\mathbf{Cov}(X,Y)=0$ does not necessarily mean that $X$ and $Y$ are independent. 
:::
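
A standard counterexample (not taken from the notes, added here as an illustration) is $X$ uniform on $\{-1,0,1\}$ with $Y=X^2$: the covariance is zero although $Y$ is a function of $X$. The sketch below verifies both claims with exact fractions; all names are illustrative.

```python
from fractions import Fraction

third = Fraction(1, 3)
# X is uniform on {-1, 0, 1} and Y = X^2, so the joint mass sits on three points.
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expect(g):
    """E g(X, Y) computed from the joint mass function."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require p(0, 0) = p_X(0) * p_Y(0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_joint00 = joint.get((0, 0), Fraction(0))
print(cov, p_joint00, p_x0 * p_y0)
```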

<br>

- Properties  <br>
  - Covariance is positive semidefinite: $\mathbf{Cov}(X,X)=\mathbf{Var}(X)\ge0,$
  - Covariance is symmetric: $\mathbf{Cov}(X,Y)=\mathbf{Cov}(Y,X)$
  - Almost bilinear (additive constants drop out):<br>
    For fixed real numbers $a_i , b, c_j , d$,
    $$\mathbf{Cov}\left(\sum_ia_iX_i+b,\sum_jc_jY_j+d \right)=\sum_{i,j}a_ic_j\mathbf{Cov}(X_i,Y_j)$$

### Variance

Let $X_1, X_2, \dots , X_n$ be random variables. Then
$$\mathbf{Var}\sum_{i=1}^n X_i =\sum_{i=1}^n \mathbf{Var}\left(X_i \right)+2\sum_{1\le i< j\le n} \mathbf{Cov}\left(X_i ,X_j \right)$$
In particular, variances of __independent__ random variables are additive, because all the covariance terms vanish.
<br><br>
Proof:
$$\begin{align*}{}
\mathbf{Var}\sum_{i=1}^n X_i &=\mathbf{Cov}\left(\sum_{i=1}^n X_i ,\sum_{j=1}^n X_j \right)\\
&=\sum_{i=1,j=1}^n \mathbf{Cov}\left(X_i ,X_j \right)\\
&=\sum_{i=1}^n \mathbf{Cov}\left(X_i ,X_i \right)+\sum_{i\not= j} \mathbf{Cov}\left(X_i ,X_j \right)\\
&=\sum_{i=1}^n \mathbf{Var}\left(X_i \right)+\sum_{i<j} \mathbf{Cov}\left(X_i ,X_j \right)+\sum_{i>j} \mathbf{Cov}\left(X_i ,X_j \right)\\
&=\sum_{i=1}^n \mathbf{Var}\left(X_i \right)+2\sum_{1\le i< j\le n} \mathbf{Cov}\left(X_i ,X_j \right)
\end{align*}$$
The last step uses the symmetry of the covariance.

Notice that for __independent__ variables the variance of a difference is also the sum of the variances:
$$\begin{align*}{}
\mathbf{Var}\left(X-Y\right)&=\mathbf{Var}\left(X+\left(-Y\right)\right)\\
&=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(-Y\right)+2\mathbf{Cov}\left(X,-Y\right)\\
&=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(Y\right)-2\mathbf{Cov}\left(X,Y\right)\\
&=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(Y\right)
\end{align*}$$

The computation above used the fact that <br>
$\mathbf{Var}\left(X+Y\right)=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(Y\right)+2\mathbf{Cov}\left(X,Y\right)$. Here is the proof:

$$\begin{align*}{}
\mathbf{Var}\left(X+Y\right)&=\mathit{\mathbf{E}}\left\lbrack {\left(X+Y\right)}^2 \right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X+Y\right\rbrack \right)}^2 \\
&=\mathit{\mathbf{E}}\left\lbrack X^2 +Y^2 +2\mathrm{XY}\right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X\right\rbrack +\mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)}^2 \\
&=\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack +\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack +2\mathit{\mathbf{E}}\left\lbrack X Y\right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \right)}^2 -{\left(\mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)}^2 -2\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \\
&=\left(\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \right)}^2 \right)+\left(\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)}^2 \right)+2\left(\mathit{\mathbf{E}}\left\lbrack XY\right\rbrack -\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)\\
&=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(Y\right)+2\mathbf{Cov}\left(X,Y\right)
\end{align*}$$

- __Example__ (variance of the sample mean)<br>
  Suppose that the $X_i$'s are i.i.d., each with variance $\sigma^2$. The sample mean is <br>
  $$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$
  Its variance is 
  $$\begin{align*}{}
  \mathbf{Var}\left(\bar{X} \right)&=\mathbf{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i \right)\\
  &=\frac{1}{n^2 }\mathbf{Var}\left(\sum_{i=1}^n X_i \right)\\
  &=\frac{1}{n^2 }\left(\sum_{i=1}^n \mathbf{Var}\left(X_i \right)\right)\\
  &=\frac{1}{n^2 }n\sigma^2 \\
  &=\frac{\sigma^2 }{n}
  \end{align*}$$
  _Variance of the sample mean decreases with $n$, that’s why we like sample averages._<br>
  
  

- __Example__ (unbiased sample variance) <br>
  Suppose we are given the values $X_1,X_2,\dots, X_n$ of an i.i.d. sequence of random variables with mean $\mu$ and variance $\sigma^2$.<br>
  We know that the sample mean $\bar{X}$ <br>
  has mean $\mu$ and <br>
  small variance $\frac{\sigma^2}{n}$. <br>
  Therefore it serves as a good estimator for the value of $\mu$. But what should we use to estimate the variance $\sigma^2$? The answer is the unbiased sample variance:
  $$S^2 :=\frac{1}{n-1}\sum_{i=1}^n {\left(X_i-\bar{X} \right)}^2$$
  Now we compute the expected value of the sample variance. The $X_i$ are identically distributed, so every term of the sum has the same expectation; write $X$ for any one of them (say $X=X_1$):
  $$\begin{align*}{}
  \mathit{\mathbf{E}}\left\lbrack S^2 \right\rbrack &=\mathit{\mathbf{E}}\left\lbrack \frac{1}{n-1}\sum_{i=1}^n {\left(X_i-\bar{X} \right)}^2 \right\rbrack \\
  &=\frac{1}{n-1}\sum_{i=1}^n \mathit{\mathbf{E}}\left\lbrack {\left(X_i-\bar{X} \right)}^2 \right\rbrack \\
  &=\frac{n}{n-1}\mathit{\mathbf{E}}\left\lbrack {\left(X-\bar{X} \right)}^2 \right\rbrack 
  \end{align*}$$
  so we get
  $$\mathit{\mathbf{E}}\left\lbrack S^2 \right\rbrack=\frac{n}{n-1}\mathit{\mathbf{E}}\left\lbrack {\left(X-\bar{X} \right)}^2 \right\rbrack \tag{4}$$
  Next notice that
  $$\mathit{\mathbf{E}}\left\lbrack X-\bar{X} \right\rbrack =\mathit{\mathbf{E}}\left\lbrack X\right\rbrack -\mathit{\mathbf{E}}\left\lbrack \bar{X} \right\rbrack =\mu -\mu =0$$
  also
  $$\begin{align*}{}
  \mathbf{Var}\left(X-\bar{X} \right)&=\mathit{\mathbf{E}}\left\lbrack {\left(X-\bar{X} \right)}^2 \right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X-\bar{X} \right\rbrack \right)}^2 \\
  \mathbf{Var}\left(X-\bar{X} \right)&=\mathit{\mathbf{E}}\left\lbrack {\left(X-\bar{X} \right)}^2 \right\rbrack \\
  \Rightarrow \mathit{\mathbf{E}}\left\lbrack {\left(X-\bar{X} \right)}^2 \right\rbrack &=\mathbf{Var}\left(X-\bar{X} \right)\\
  &=\mathbf{Var}\left(X\right)+\mathbf{Var}\left(\bar{X} \right)-2\mathbf{Cov}\left(X,\bar{X} \right)
  \end{align*}$$
  Now, since $X$ is independent of every $X_j$ other than itself, only one term of the sum below survives:
  $$\begin{align*}{}
  \mathbf{Cov}\left(X,\bar{X} \right)&=\mathbf{Cov}\left(X,\frac{1}{n}\sum_j X_j \right)\\
  &=\frac{1}{n}\sum_j \mathbf{Cov}\left(X,X_j \right)\\
  &=\frac{1}{n}\mathbf{Cov}\left(X,X\right)\\
  &=\frac{\sigma^2 }{n}
  \end{align*}$$
  we have now
  $$\mathbf{Var}\left(X\right)=\sigma^2 ,\mathbf{Var}\left(\bar{X} \right)=\frac{\sigma^2 }{n},\mathbf{Cov}\left(X,\bar{X} \right)=\frac{\sigma^2 }{n}$$    
  Putting it all back in equation $(4)$ we get 
  $$\begin{align*}{}
  \mathit{\mathbf{E}}\left\lbrack S^2 \right\rbrack &=\frac{n}{n-1}\left(\sigma^2 +\frac{\sigma^2 }{n}-2\frac{\sigma^2 }{n}\right)\\
  &=\frac{n}{n-1}\left(\sigma^2 -\frac{\sigma^2 }{n}\right)\\
  &=\frac{n}{n-1}\frac{\left(n-1\right)\sigma^2 }{n}\\
  &=\sigma^2 
  \end{align*}$$

- __Example__ (Binomial distribution)<br>
  Suppose that $n$ independent trials are made, each succeeding with probability $p$. Define $X_i$ as the indicator of success in the $i^{\text{th}}$ trial, $i=1,2,\dots ,n$. Then
  $$X=\sum_{i=1}^n X_i$$
  counts the total number of successes, therefore $X \sim \mathrm{Binom}(n,p)$. Its variance is

  $$\begin{align*}{}
  \mathbf{Var}\left(X\right)&=\mathbf{Var}\left(\sum_{i=1}^n X_i \right)\\
  &=\sum_{i=1}^n \mathbf{Var}\left(X_i \right)\\
  &=\sum_{i=1}^n p\left(1-p\right)\\
  &=n\cdot p\left(1-p\right)
  \end{align*}$$

- __Example__ (Gamma distribution) <br>
  Let $n$ be a positive integer, $\lambda > 0$ real, and $X \sim \mathrm{Gamma}(n,\lambda)$. Then we know <br>
  $$X\overset{d}{=} \sum_{i=1}^n X_i$$
  where $X_1,X_2,\dots, X_n$ are i.i.d. $\mathrm{Exp}(\lambda)$. Therefore
  $$\begin{align*}{}
  \mathbf{Var}\left(X\right)&=\mathbf{Var}\left(\sum_{i=1}^n X_i \right)\\
  &=\sum_{i=1}^n \mathbf{Var}\left(X_i \right)\\
  &=\sum_{i=1}^n \frac{1}{\lambda^2 }\\
  &=\frac{n}{\lambda^2 }
  \end{align*}$$
  Here $\overset{d}{=}$ means “equal in distribution”.
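
The sample mean and sample variance examples above can be verified exactly for a small case. The sketch below assumes Bernoulli($p$) observations with $p=1/3$ and $n=4$ (arbitrary choices for illustration), enumerates all $2^n$ outcomes with their probabilities, and computes $\mathbf{E}[\bar X]$, $\mathbf{Var}(\bar X)$ and $\mathbf{E}[S^2]$ without any simulation.

```python
from fractions import Fraction
from itertools import product

p, n = Fraction(1, 3), 4   # Bernoulli(p) observations, sample size n
pmf = {0: 1 - p, 1: p}
sigma2 = p * (1 - p)       # variance of a single Bernoulli(p) variable

def sample_mean(xs):
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Unbiased sample variance: divide by n - 1, not by n."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

e_mean = e_mean_sq = e_s2 = Fraction(0)
for xs in product(pmf, repeat=n):      # every possible sample (x_1, ..., x_n)
    prob = Fraction(1)
    for x in xs:
        prob *= pmf[x]                 # independence: probabilities multiply
    e_mean += prob * sample_mean(xs)
    e_mean_sq += prob * sample_mean(xs) ** 2
    e_s2 += prob * sample_variance(xs)

var_mean = e_mean_sq - e_mean ** 2
print(e_mean, var_mean, e_s2)
```

Dividing by $n$ instead of $n-1$ in `sample_variance` would give expectation $\frac{n-1}{n}\sigma^2$, which is why the factor $\frac{1}{n-1}$ is used.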

### Cauchy-Schwarz inequality 

For all random variables $X$ and $Y$ with finite second moments,
$$|\mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack |\le \sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }$$
with equality iff $Y=\text{const} \cdot X$ a.s.

Proof:

$$\begin{align*}{}
0 &\le \mathit{\mathbf{E}}\left\lbrack {\left(\frac{X}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }}\pm \frac{Y}{\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right)}^2 \right\rbrack \\
&=\mathit{\mathbf{E}}\left\lbrack \frac{X^2 }{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }\right\rbrack +\mathit{\mathbf{E}}\left\lbrack \frac{Y^2 }{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }\right\rbrack \pm 2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack \\
&=2\pm 2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack 
\end{align*}$$

For $-$ case:

$$\begin{align*}{}
&&0 &\le 2-2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack \\
&\Rightarrow & -2 &\le -2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack \\
&\Rightarrow & -\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack } & \le -\mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack \\
&\Rightarrow & \sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }& \ge \mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack \\
&\Rightarrow & \mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack &\le \sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }
\end{align*}$$

For $+$ case:
$$\begin{align*}{}
&&0 &\le 2+2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack \\
&\Rightarrow & -2 &\le +2\mathit{\mathbf{E}}\left\lbrack \frac{\mathrm{XY}}{\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }}\right\rbrack \\
&\Rightarrow & -\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack } &\le \mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack \\
&\Rightarrow & \mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack &\ge -\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack \;}\sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }
\end{align*}$$

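Combining the two cases gives $|\mathit{\mathbf{E}}\left\lbrack \mathrm{XY}\right\rbrack |\le \sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }$, which proves the inequality.

Using the Cauchy-Schwarz inequality we get the following important relations: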

- $\mathit{\mathbf{E}}\left\lbrack |\mathrm{XY}|\right\rbrack \le \sqrt  {\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }$<br>
  Proof:
  
  $$\begin{align*}{}
  \mathit{\mathbf{E}}\left\lbrack |\mathrm{XY}|\right\rbrack &=\mathit{\mathbf{E}}\left\lbrack |X|\cdot |Y|\right\rbrack \\
  &\le \sqrt{\mathit{\mathbf{E}}\left\lbrack {|X|}^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack {|Y|}^2 \right\rbrack }\\
  &=\sqrt{\mathit{\mathbf{E}}\left\lbrack X^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack Y^2 \right\rbrack }
  \end{align*}$$

- $\big|\mathbf{Cov}\left(X,Y\right)\big|\le \mathrm{SD}\;X\cdot \mathrm{SD}\;Y$ <br>
  Proof:
  $$\begin{align*}{}
  \big|\mathbf{Cov}\left(X,Y\right)\big|&=\big|\mathit{\mathbf{E}}\left\lbrack \left(X-\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \right)\left(Y-\mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)\right\rbrack \big|\\
  &\le \sqrt{\mathit{\mathbf{E}}\left\lbrack {\left(X-\mathit{\mathbf{E}}\left\lbrack X\right\rbrack \right)}^2 \right\rbrack }\cdot \sqrt{\mathit{\mathbf{E}}\left\lbrack {\left(Y-\mathit{\mathbf{E}}\left\lbrack Y\right\rbrack \right)}^2 \right\rbrack }\\
  &=\mathrm{SD}\;X\cdot \mathrm{SD}\;Y
  \end{align*}$$

## Correlation

The correlation coefficient of random variables $X$ and $Y$ is
$$\rho \left(X,Y\right):=\frac{\mathbf{Cov}\left(X,Y\right)}{\mathrm{SD}\;X\cdot \mathrm{SD}\;Y}$$

- $-1 \le \rho(X,Y) \le 1$<br>
  Proof :
  $$\begin{align*}{}
  &&\big|\mathbf{Cov}\left(X,Y\right)\big| &\le \mathrm{SD}\;X\cdot \mathrm{SD}\;Y\\
  \Rightarrow &&\frac{\big|\mathbf{Cov}\left(X,Y\right)\big|}{\mathrm{SD}\;X\cdot \mathrm{SD}\;Y}&\le 1\\
  \Rightarrow &&-1\le \frac{\mathbf{Cov}\left(X,Y\right)}{\mathrm{SD}\;X\cdot \mathrm{SD}\;Y}&\le 1\\
  \Rightarrow &&-1 \le \rho(X,Y) &\le 1
  \end{align*}$$
  The “equality iff” part of Cauchy-Schwarz, applied to the centered variables, implies that $|\rho(X,Y)|=1$ iff $Y-\mathbf{E}Y = a\,(X-\mathbf{E}X)$ a.s., that is, $Y = aX +b$ for some fixed $a\ne 0$ and $b$.
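
To make the bounds concrete, the sketch below computes $\rho$ for two joint mass functions: the $2\times 2$ table from the conditional mass function example, and a made-up distribution concentrated on the line $Y=-2X+3$. The function name `correlation` and the second distribution are assumptions of this illustration.

```python
from math import isclose, sqrt

def correlation(joint):
    """Correlation coefficient of (X, Y) with joint mass function {(x, y): p}."""
    def expect(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())

    mean_x = expect(lambda x, y: x)
    mean_y = expect(lambda x, y: y)
    cov = expect(lambda x, y: (x - mean_x) * (y - mean_y))
    sd_x = sqrt(expect(lambda x, y: (x - mean_x) ** 2))
    sd_y = sqrt(expect(lambda x, y: (y - mean_y) ** 2))
    return cov / (sd_x * sd_y)

table = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
linear = {(x, -2 * x + 3): 0.25 for x in range(4)}  # Y = -2X + 3 exactly

rho_table = correlation(table)
rho_linear = correlation(linear)
print(rho_table, rho_linear)
```

The first value lies strictly between $-1$ and $1$, while the exact linear relation with negative slope gives $\rho=-1$ up to rounding.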

## Conditional expectation

The conditional expectation of $X$, given $Y = y_j$, is 
$$\mathbf{E}(X \mid Y=y_j):=\sum_i x_i \times \underbrace{ p_{X\mid Y}(x_i\mid y_j)}_{\text{conditional mass function}}$$

- Example :<br>
  Let $X$ and $Y$ be independent $\text{Poi}(\lambda)$ and $\text{Poi}(\mu)$ variables, and $Z = X + Y$. Find the conditional expectation $\mathbf {E}(X \mid Z = k)$.<br>
  Conditional mass function for $0\le i\le k$ is : <br>
  $$p_{X\mid Z} \left(i\mid k\right)=\frac{p\left(i,k\right)}{p_Z \left(k\right)}$$
  where $p(i, k)$ is the joint mass function of $X$ and $Z$ at $(i, k)$.<br>
  Now,
  $$\begin{align*}{}
  p\left(i,k\right)&=P\left\lbrace X=i,Z=k\right\rbrace \\
  &=P\left\lbrace X=i,X+Y=k\right\rbrace \\
  &=P\left\lbrace X=i,Y=k-i\right\rbrace \\
  &=e^{-\lambda } \frac{\lambda^i }{i!}\cdot e^{-\mu } \frac{\mu^{\left(k-i\right)} }{\left(k-i\right)!}
  \end{align*}$$
  We know that 
  $$Z=X+Y\sim \mathrm{Poi}\left(\lambda +\mu \right)$$
  so,
  $$p_Z \left(k\right)=e^{-\left(\lambda +\mu \right)} \frac{{\left(\lambda +\mu \right)}^k }{k!}$$
  Now using these values we can compute the conditional mass function
  $$\begin{align*}{}
  p_{X\mid Z} \left(i\mid k\right)&=\frac{p\left(i,k\right)}{p_Z \left(k\right)}\\
  &=e^{-\lambda } \frac{\lambda^i }{i!}\cdot e^{-\mu } \frac{\mu^{k-i} }{\left(k-i\right)!}\times \frac{k!}{e^{-\left(\lambda +\mu \right)} {\left(\lambda +\mu \right)}^k }\\
  &=\frac{k!}{\left(k-i\right)!\cdot i!}\times \frac{\lambda^i \mu^{k-i} }{{\left(\lambda +\mu \right)}^k }\\
  &=\binom{k}{i}\times \frac{\lambda^i \mu^{k-i} }{{\left(\lambda +\mu \right)}^i {\left(\lambda +\mu \right)}^{k-i} }\\
  &=\binom{k}{i}\times {\left(\frac{\lambda }{\lambda +\mu }\right)}^i \times {\left(\frac{\mu }{\lambda +\mu }\right)}^{k-i} \\
  &=\binom{k}{i}\times p^i \times \left(1-p\right)^{k-i}
  \end{align*}$$
  where $p=\frac{\lambda }{\lambda +\mu }$.<br>
  We conclude $(X \mid Z = k) \sim \text{Binom}(k, p)$.<br>
  Therefore 
  $$\mathbf {E}(X \mid Z = k)=kp=k\cdot \frac{\lambda }{\lambda +\mu }$$
  We abbreviate the above expression as 
  $$\mathbf {E}(X \mid Z )=Z\cdot \frac{\lambda }{\lambda +\mu }$$
  Notice that $\mathbf {E}(X \mid Z )$ is a random variable that depends only on $Z$, not on $X$.
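
The conclusion of this example can be checked numerically. The sketch below uses arbitrary illustrative values for $\lambda$, $\mu$ and $k$, computes the conditional mass function directly from the Poisson mass functions, and compares it with the $\text{Binom}(k,p)$ mass function for $p=\frac{\lambda}{\lambda+\mu}$.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(k, rate):
    """Mass function of Poi(rate) at the non-negative integer k."""
    return exp(-rate) * rate**k / factorial(k)

lam, mu, k = 2.0, 3.0, 6
p = lam / (lam + mu)

# p_{X|Z}(i | k) = P{X = i, Y = k - i} / P{Z = k}, using Z ~ Poi(lam + mu).
conditional = [
    poisson_pmf(i, lam) * poisson_pmf(k - i, mu) / poisson_pmf(k, lam + mu)
    for i in range(k + 1)
]
binomial = [comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k + 1)]

cond_mean = sum(i * q for i, q in enumerate(conditional))
print(cond_mean, k * p)
```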

## Tower rule

$\mathit{\mathbf{E}}\left\lbrack \mathit{\mathbf{E}}\left\lbrack X\mid Y\right\rbrack \right\rbrack =\mathit{\mathbf{E}}\left\lbrack X\right\rbrack$<br>
Intuition on [Math Stack Exchange.](https://math.stackexchange.com/a/41543/738892)<br>
Proof:

$$\begin{align*}{}
\mathit{\mathbf{E}}\left\lbrack \mathit{\mathbf{E}}\left\lbrack X\mid Y\right\rbrack \right\rbrack &=\sum_j E\left\lbrack X\mid Y=y_j \right\rbrack {\cdot p}_Y \left(y_j \right)\\
&=\sum_j \left(\sum_i x_i \cdot p_{X\mid Y} \left(x_i \mid y_j \right)\right){\cdot p}_Y \left(y_j \right)\\
&=\sum_j \sum_i x_i\cdot p_{X\mid Y} \left(x_i \mid y_j \right){\cdot p}_Y \left(y_j \right)\\
&=\sum_j \sum_i x_i p\left(x_i ,y_j \right)\\
&=\sum_i x_i \sum_j p\left(x_i ,y_j \right)\\
&=\sum_i x_i p_X \left(x_i \right)\\
&=\mathit{\mathbf{E}}\left\lbrack X\right\rbrack 
\end{align*}$$
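
The tower rule can be checked on the $2\times 2$ joint table from the conditional mass function example. The sketch below is illustrative only; the helper name `cond_expectation_x` is an assumption.

```python
from fractions import Fraction

joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(3, 10),
}

# Marginal mass function of Y.
p_Y = {}
for (_, y), p in joint.items():
    p_Y[y] = p_Y.get(y, Fraction(0)) + p

def cond_expectation_x(y):
    """E[X | Y = y] = sum_i x_i * p(x_i, y) / p_Y(y)."""
    return sum(x * p for (x, yj), p in joint.items() if yj == y) / p_Y[y]

# Outer expectation: average E[X | Y = y] over the distribution of Y.
tower = sum(cond_expectation_x(y) * p_Y[y] for y in p_Y)
e_x = sum(x * p for (x, _), p in joint.items())
print(tower, e_x)
```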

## TODO
The following topics are skipped for now and will be completed later:

- Conditional variance
- Random sums
- Moment generating functions
- Independent sums

## Concentration Bounds

### Markov’s inequality

Let $X$ be a __non-negative__ random variable. Then for all real $a > 0$,
$$\mathit{\mathbf{P}}\left\lbrace X\ge a\right\rbrace \le \frac{\mathit{\mathbf{E}}\left\lbrack X\right\rbrack }{a}$$
Of course this inequality is useless for $a \le \mathit{\mathbf{E}}\left\lbrack X\right\rbrack$, because then the bound is at least $1$.<br>
Proof: <br>
Let indicator random variable $I$ be defined as 
$$I = \begin{cases}
   1 &\text{if } X\ge a \\
   0 &\text{if } X<a
\end{cases}$$
Then 
$$\begin{align*}{}
\mathit{\mathbf{E}}\left\lbrack I\right\rbrack &=1\cdot \mathit{\mathbf{P}}\left(X\ge a\right)+0\cdot P\left(X<a\right)\\
&=\mathit{\mathbf{P}}\left(X\ge a\right)
\end{align*}$$
Also, the indicator satisfies $I\le \frac{X}{a}$: if $X\ge a$ then $I=1\le\frac{X}{a}$, and if $0\le X<a$ then $I=0\le\frac{X}{a}$.


Hence, by monotonicity of expectation,

$$\begin{align*}{}
\mathit{\mathbf{E}}\left\lbrack \mathrm{I}\right\rbrack &\le \frac{\mathit{\mathbf{E}}\left\lbrack X\right\rbrack }{a}\\
\Rightarrow P\left\lbrace X\ge a\right\rbrace &\le \frac{\mathit{\mathbf{E}}\left\lbrack X\right\rbrack }{a}
\end{align*}$$

- Example <br>
  A fair coin is flipped $n$ times. What is the probability of getting at least $90\%$ heads?
  Let $X$ be the number of heads, so $\mathit{\mathbf{E}}\left\lbrack X\right\rbrack=\frac{n}{2}$. Using Markov’s inequality with $a=0.9n$,
  $$P\left\lbrace X\ge 0.9n\right\rbrace \le \frac{\mathit{\mathbf{E}}\left\lbrack X\right\rbrack }{a}=\frac{\frac{n}{2}}{0.9n}=\frac{5}{9}$$

### Chebyshev’s inequality

Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$, both
finite. Then for all real $b > 0$,
$$\mathit{\mathbf{P}}\left\lbrace |X-\mu |\ge b\right\rbrace \le \frac{\mathbf{Var}\left(X\right)}{b^2 }$$
Of course this inequality is useless for $b \le \mathrm{SD}\;X$, because then the bound is at least $1$.<br>
Proof:<br>
Apply Markov’s inequality to the random variable $(X - \mu)^2 \ge 0$

$$\begin{align*}{}
\mathit{\mathbf{P}}\left\lbrace |X-\mu |\ge b\right\rbrace &=\mathit{\mathbf{P}}\left\lbrace {\left(X-\mu \right)}^2 \ge b^2 \right\rbrace \\
&\le \frac{\mathit{\mathbf{E}}\left\lbrack {\left(X-\mu \right)}^2 \right\rbrack }{b^2 }=\frac{\mathbf{Var}\left(X\right)}{b^2 }
\end{align*}$$

- Example <br>
  A fair coin is flipped $n$ times. What is the probability of getting at least $90\%$ heads?<br>
  Using Chebyshev’s inequality:<br>
  Consider $X_1,X_2,\dots,X_n,$ where $X_i=1$ if the $i^{\text{th}}$ toss is a head and $X_i=0$ if the $i^{\text{th}}$ toss is a tail.<br>
  $$\mathit{\mathbf{E}}\left\lbrack X_i \right\rbrack =\frac{1}{2}$$
  $$\mathit{\mathbf{E}}\left\lbrack X_i^2 \right\rbrack =\frac{1}{2}$$
  Now we find the variance 
  $$\mathbf{Var}\left(X_i \right)=\mathit{\mathbf{E}}\left\lbrack X_i^2 \right\rbrack -{\left(\mathit{\mathbf{E}}\left\lbrack X_i \right\rbrack \right)}^2 =\frac{1}{2}-\frac{1}{4}=\frac{1}{4}$$
  The number of heads is $X=\sum_{i=1}^n X_i$, with mean $\mu=\frac{n}{2}$ and, by independence, variance $\mathbf{Var}\left(X\right)=\frac{n}{4}$. Now,
  $$\begin{align*}{}
  X &\ge 0.9n\\
  \Rightarrow X-\mu &\ge 0.9n-0.5n\\
  \Rightarrow |X-\mu |&\ge 0.4n
  \end{align*}$$
  so the event $\{X\ge 0.9n\}$ is contained in the event $\{|X-\mu|\ge 0.4n\}$. Now we can use Chebyshev’s inequality with $b=0.4n$:

  $$\begin{align*}{}
  \mathit{\mathbf{P}}\left\lbrace |X-\mu |  \ge b\right\rbrace &\le \frac{\mathbf{Var}\left(X\right)}{b^2 }\\
  \mathit{\mathbf{P}}\left\lbrace |X-\mu | \ge 0.4n\right\rbrace &\le \frac{\frac{1}{4}n}{{\left(0.4n\right)}^2 }\\
  &=\frac{\frac{1}{4}n}{0.16n^2 }=\frac{25}{16n}
  \end{align*}$$
  So 
  $$\mathit{\mathbf{P}}\left\lbrace X\ge 0.9n\right\rbrace \le \frac{25}{16n}$$
  We can see that Chebyshev’s inequality provides a better bound than Markov’s inequality (as soon as $n\ge 3$), and the bound improves as $n$ increases.
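
For a concrete comparison, the sketch below evaluates the exact tail probability and both bounds for $n=20$ tosses of a fair coin (an arbitrary choice for illustration). The helper name `exact_tail` is hypothetical; the exact value comes from summing binomial coefficients.

```python
from fractions import Fraction
from math import comb

def exact_tail(n, threshold):
    """P{X >= threshold} for X ~ Binom(n, 1/2), the number of heads in n fair tosses."""
    return Fraction(sum(comb(n, k) for k in range(threshold, n + 1)), 2**n)

n = 20
threshold = 18                         # 0.9 * n heads
exact = exact_tail(n, threshold)
markov = Fraction(n, 2) / threshold    # E[X] / a with a = 0.9 n
chebyshev = Fraction(n, 4) / (threshold - Fraction(n, 2)) ** 2  # Var(X) / b^2 with b = 0.4 n

print(float(exact), float(chebyshev), float(markov))
```

Both bounds hold, but they are far from tight: the exact probability is orders of magnitude smaller than either of them.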

<br><br><br>
$\tiny  {\textcolor{#808080}{\boxed{\text{Reference: Dr. Subruk, IIT Hyderabad }}}}$