## Expected values

`Definition`: Let $X\sim f(x)$. The *expected value* of $g(X)$ is defined as
$$\operatorname{E}(g(X))=\begin{cases}\int_{-\infty}^{\infty}g(x) f_{X}(x)\,dx&\text{if }X\text{ is continuous}\\\sum_{x\in\mathcal{X}}g(x) f_{X}(x)=\sum_{x\in\mathcal{X}}g(x)P(X=x)&\text{if }X\text{ is discrete}\end{cases}$$
- If $\mathrm{E}(|g(X)|)=\int_{-\infty}^{\infty}|g(x)|f(x)dx=\infty$, we say that $E(g(X))$ does not exist.
- In particular, the mean of $X$ is:
$$E(X)=\int_{-\infty}^{\infty}xf(x)dx,\text{ if }X\text{ is continuous}$$
$$E(X)=\sum_{x}xf(x),\text{ if }X\text{ is discrete}$$
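
As a minimal sketch of the definition, the snippet below computes $E(g(X))$ for a discrete variable by summing $g(x)P(X=x)$, and for a continuous variable by a midpoint-rule approximation of the integral. The fair die, the Expo(2) density, the helper names and the truncation point 60 are illustrative assumptions, not part of the notes.

```python
import math

def expect_discrete(g, pmf):
    """E(g(X)) for a discrete X given as a dict {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())

def expect_continuous(g, pdf, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of g(x) f(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h

# Fair six-sided die (illustrative choice).
die = {x: 1 / 6 for x in range(1, 7)}
mean_die = expect_discrete(lambda x: x, die)               # exact value 3.5
second_moment_die = expect_discrete(lambda x: x * x, die)  # exact value 91/6

# Expo(beta = 2) density, truncated at 60 where the tail is negligible.
beta = 2.0

def expo_pdf(x):
    return math.exp(-x / beta) / beta

mean_expo = expect_continuous(lambda x: x, expo_pdf, 0.0, 60.0)  # close to beta = 2
```

The continuous case only approximates the integral, so the result agrees with $\beta$ up to a small numerical error.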

### Exponential distribution
Exponential distribution $X\sim\mathrm{Expo}(\beta)$ with $\beta>0$ and pdf

$$f(x)=\begin{cases}\frac{1}{\beta}e^{-x/\beta},&x\geq0\\0,&x<0\end{cases}$$

### Gamma distribution
Gamma distribution Gamma($\alpha,\beta$) with $\alpha>0, \beta>0$ and pdf

$$f(x)=\left\{\begin{array}{ll}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{(\alpha-1)}e^{-x/\beta}&,x\geq0\\0&,x<0\end{array}\right.$$
- **Note**: Gamma($\alpha=1,\beta$)=Expo($\beta$)

Let $X\sim$ Gamma($\alpha,\beta$)
- $E(X)=\alpha\beta$
- $E(X^2)=(\alpha+1)\alpha\beta^2$
- $Var(X)=\alpha\beta^2$
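
The three moment formulas can be checked numerically. The sketch below integrates $x^k f(x)$ with a midpoint rule for the illustrative choice $\alpha=3$, $\beta=2$; the function names, the upper limit 120 and the step count are assumptions made for this example.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma(alpha, beta) density in the scale parameterization used above."""
    if x < 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def gamma_moment(k, alpha, beta, hi=120.0, steps=200_000):
    """Midpoint-rule approximation of E(X^k), truncating the integral at `hi`."""
    h = hi / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * gamma_pdf(x, alpha, beta)
    return total * h

alpha, beta = 3.0, 2.0
m1 = gamma_moment(1, alpha, beta)   # compare with alpha * beta = 6
m2 = gamma_moment(2, alpha, beta)   # compare with (alpha + 1) * alpha * beta^2 = 48
var = m2 - m1 ** 2                  # compare with alpha * beta^2 = 12
```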

## Some useful facts

Gamma function $\Gamma(\alpha)=\int_0^\infty t^{\alpha-1}e^{-t}dt, \alpha>0$
- If $n$ is a positive integer: $\Gamma(n)=(n-1)!$
- $\Gamma(\alpha)=(\alpha-1)\Gamma(\alpha-1)$ for $\alpha>1$
- $\Gamma(0.5)=\sqrt{\pi}$
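
These identities are easy to sanity-check with Python's standard `math.gamma`. The test point $\alpha=3.7$ is an arbitrary choice for illustration.

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
factorial_check = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

# Recursion Gamma(a) = (a - 1) * Gamma(a - 1), checked at a = 3.7.
a = 3.7
recursion_gap = abs(math.gamma(a) - (a - 1) * math.gamma(a - 1))

# Gamma(1/2) = sqrt(pi).
half_gap = abs(math.gamma(0.5) - math.sqrt(math.pi))
```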

Integration by parts: $\int uv^{\prime}=uv-\int u^{\prime}v$

Taylor series for $e^{\lambda}$: $e^\lambda=\sum_{x=0}^\infty\frac{\lambda^x}{x!}$

$\int\frac{1}{1+x^2}dx=\tan^{-1}(x)+C$

### Poisson distribution
Poisson($\lambda$) with $\lambda>0$ and pmf
$f(x)=\begin{cases}\frac{e^{-\lambda}\lambda^{x}}{x!},&x=0,1,2,\ldots\\0,&\mathrm{otherwise}\end{cases}$
- $\sum_{x=0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}=e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^{x}}{x!}=e^{-\lambda}e^{\lambda}=1$
- $E(X) = \lambda$
- $E(X^2)=\lambda^2+\lambda$
- $Var(X)=\lambda$
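
A short numerical check of these four facts, summing the pmf directly. The value $\lambda=4$ and the truncation of the infinite sum at 200 terms are assumptions for the example; the neglected tail is far below floating-point precision.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), computed on the log scale for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

lam = 4.0
support = range(0, 200)  # truncated support (assumption)

total = sum(poisson_pmf(x, lam) for x in support)           # should be close to 1
mean = sum(x * poisson_pmf(x, lam) for x in support)        # should be close to lam
second = sum(x * x * poisson_pmf(x, lam) for x in support)  # lam^2 + lam
variance = second - mean ** 2                               # lam
```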

### Cauchy distribution
$f(x)=\frac1{\pi(1+x^2)}\quad\mathrm{~,~}x\in\mathbb{R}$

$\begin{aligned}&\int_{-\infty}^{\infty}\frac{1}{\pi}\frac{1}{(1+x^{2})}dx=\frac{1}{\pi}\tan^{-1}(x)|_{x=-\infty}^{\infty}=\frac{1}{\pi}(\frac{\pi}{2}-(-\frac{\pi}{2}))=\frac{\pi}{\pi}=1\end{aligned}$

$E(X)$ does not exist, because $\int_{-\infty}^{\infty}|x|\frac{1}{\pi}\frac{1}{(1+x^{2})}dx=\infty$.
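
One way to see the divergence is to truncate the integral of $|x|f(x)$ at $\pm T$ (the cutoff $T>0$ is introduced here only for illustration) and let $T$ grow:

$$\int_{-T}^{T}|x|\frac{1}{\pi(1+x^{2})}dx=\frac{2}{\pi}\int_{0}^{T}\frac{x}{1+x^{2}}dx=\frac{1}{\pi}\ln(1+T^{2})\to\infty\quad\text{as }T\to\infty$$

The growth is only logarithmic, but it is unbounded, so $E(|X|)=\infty$.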

## A note on methods
We can approach $E(g(X))$ in two ways:
1. Using the pdf of $X$:
$E(g(X))=\int_{-\infty}^{\infty}g(x)f(x)dx$
2. Using the pdf of $Y=g(X)$:
$E(g(X))=E(Y)=\int_{-\infty}^{\infty}yf_Y(y)dy$
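
As a small worked example (the Uniform(0,1) distribution is an assumption chosen for illustration), let $X$ have pdf $f(x)=1$ on $(0,1)$ and take $g(x)=x^2$. Both routes give the same answer.

$$\text{Method 1: }E(X^2)=\int_0^1x^2\,dx=\frac13$$

For method 2, $P(Y\leq y)=P(X\leq\sqrt{y})=\sqrt{y}$ for $0<y<1$, so $f_Y(y)=\frac{1}{2\sqrt{y}}$ on $(0,1)$ and

$$\text{Method 2: }E(Y)=\int_0^1y\frac{1}{2\sqrt{y}}\,dy=\frac12\int_0^1y^{1/2}\,dy=\frac12\cdot\frac23=\frac13$$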

## Properties of expectation
**Theorem**
Let $X$ be a random variable, let $a, b, c$ be constants, and suppose that $E(g_1(X))$ and $E(g_2(X))$ exist.
- $E(ag_1(X)+bg_2(X)+c)=aE(g_1(X))+bE(g_2(X))+c$
- $\mathrm{If~}g_1(x)\geq0\text{ for all }x\mathrm{~then~}E(g_1(X))\geq0$
- $\mathrm{If~}g_1(x)\geq g_2(x)\text{ for all }x\mathrm{~then~}E(g_1(X))\geq E(g_2(X))$
- $\mathrm{If~}a\leq g(x)\leq b\text{ for all }x\mathrm{~then~}a\leq E(g(X))\leq b$
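
For instance, linearity makes the following calculation immediate. The polynomial is an arbitrary illustration; the moments are those of $X\sim$ Poisson($\lambda$) listed earlier.

$$E(2X^2+3X+1)=2E(X^2)+3E(X)+1=2(\lambda^2+\lambda)+3\lambda+1=2\lambda^2+5\lambda+1$$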

## Moments - Section 2.3
Let $n$ be a *positive integer* and $X$ be a random variable
- The $n$-th moment of $X$ is $\mu_n^{\prime}=E\left(X^n\right)$.
- The $n$-th central moment of $X$ is $\mu_n=E\left((X-\mu)^n\right)$.

## Variance
The mean or expected value of a r.v. $X$ is the 1st moment:
$\mu_1^{\prime}=\operatorname{E}(X)\equiv\mu$

The variance of a r.v. $X$ is the 2nd central moment: $\mathrm{Var}(X)=E\left((X-\mu)^2\right)\equiv\sigma^2$

The standard deviation of $X$ is defined as $\sigma=\sqrt{\mathrm{Var}(X)}$

- $\mathrm{Var}(aX+b)=a^2\mathrm{Var}(X)$
- $\mathrm{Var}(X)=E(X^{2})-(E(X))^{2}$
- $E(X^{2})=\int x^{2}f(x)dx$
- $\mathrm{Var}(X)=\int(x-\mu_{X})^{2}f(x)dx$
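
The sketch below checks the first two identities on a small discrete distribution, where sums replace the integrals. The pmf and the constants $a=3$, $b=-2$ are made up for illustration.

```python
def expect(g, pmf):
    """E(g(X)) for a discrete X given as a dict {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())

pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative pmf
mu = expect(lambda x: x, pmf)

# Variance as the second central moment, and via the shortcut E(X^2) - E(X)^2.
var_central = expect(lambda x: (x - mu) ** 2, pmf)
var_shortcut = expect(lambda x: x * x, pmf) - mu ** 2

# Variance of Y = aX + b, computed from the pmf of Y itself.
a, b = 3.0, -2.0
pmf_y = {a * x + b: p for x, p in pmf.items()}
mu_y = expect(lambda y: y, pmf_y)
var_y = expect(lambda y: (y - mu_y) ** 2, pmf_y)  # should equal a^2 * Var(X)
```

Here $\mathrm{Var}(X)=1.7-1.1^2=0.49$, and the shift $b$ has no effect on the variance of $Y$.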

## Moment generating functions (mgf)

The mgf is a very useful theoretical tool:
- to characterize a distribution
- for limits
- to prove (a version of) the Central Limit Theorem!

### Definition
Let $X\sim F(x)$. The moment generating function (mgf) of $X$ is defined as

$$M_X(t)=E\left(e^{tX}\right)$$

if the expectation exists for $t$ in a neighborhood of 0.

### Moment generating functions
`Theorem`: If a r.v. $X$ has mgf $M_X(t)$ then

$$E\left(X^{n}\right)=\left.\frac{d^{n}}{dt^{n}}M_{X}(t)\right|_{t=0}$$

Find the mgf for the Gamma and Poisson distributions and use it to generate the first two moments.

#### Example for Gamma distribution
Let $X\sim Gamma(\alpha,\beta)$, $f(x)=\left\{\begin{array}{ll}\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{(\alpha-1)}e^{-x/\beta}&,x\geq0\\0&,x<0\end{array}\right.$, with $\alpha>0, \beta>0$

$$
\begin{aligned}
M_X(t) &= E(e^{tX}) \\
       &= \int_{-\infty}^{\infty} e^{tx} f(x) \, dx \\
       &= \int_{0}^{\infty} e^{tx} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta} \, dx &&(\text{since } f(x)=0 \text{ for } x<0) \\
       &= \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-x/\beta + tx} \, dx &&\left(\text{let } -\frac{x}{\beta} + tx = -\frac{x}{\hat{\beta}}, \text{ i.e. } \hat{\beta} = \frac{\beta}{1 - t\beta}, \, t < \frac{1}{\beta}\right) \\
       &= \frac{\hat{\beta}^{\alpha}}{\beta^{\alpha}} \int_{0}^{\infty} \frac{1}{\Gamma(\alpha)\hat{\beta}^{\alpha}} x^{\alpha-1} e^{-x/\hat{\beta}} \, dx \\
       &= \frac{\hat{\beta}^{\alpha}}{\beta^{\alpha}} &&(\text{the integrand is the Gamma}(\alpha,\hat{\beta})\text{ pdf, which integrates to 1 when }\hat{\beta}>0) \\
       &= \frac{\beta^{\alpha}}{\beta^{\alpha}(1 - t\beta)^{\alpha}} \\
       &= (1 - \beta t)^{-\alpha}
\end{aligned}
$$

The step that recognizes a Gamma($\alpha,\hat{\beta}$) pdf requires $\hat{\beta}>0$. Since $\hat{\beta} = \frac{\beta}{1 - t\beta}$ and $\beta>0$, this means $1 - t\beta>0$, that is, $t<\frac{1}{\beta}$. The mgf therefore exists for all $t<\frac{1}{\beta}$, which includes a neighborhood of 0.

We saw before that $E(X)=\alpha\beta$ and $E(X^2)=\alpha(\alpha+1)\beta^2$. Let's verify this using the mgf.

$E(X)=\frac{d}{dt}M(t)|_{t=0}=\frac{d}{dt}(1-\beta t)^{-\alpha}|_{t=0}=-\alpha(1-\beta t)^{-(\alpha+1)}(-\beta)|_{t=0}=\alpha\beta(1-\beta t)^{-(\alpha+1)}|_{t=0}=\alpha\beta$

$E(X^2)=\frac{d^2}{dt^2}M(t)|_{t=0}=\frac{d}{dt}\alpha\beta(1-\beta t)^{-(\alpha+1)}|_{t=0}=-\alpha\beta(\alpha+1)(1-\beta t)^{-(\alpha+2)}(-\beta)|_{t=0}=\alpha(\alpha+1)\beta^2$
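
The same moments can be recovered numerically by differentiating the closed-form mgf at $t=0$ with central finite differences. This is only a sketch: the step size $h=10^{-4}$, the helper names and the parameter values $\alpha=3$, $\beta=2$ are assumptions for the example.

```python
def gamma_mgf(t, alpha, beta):
    """M_X(t) = (1 - beta * t)^(-alpha), defined only for t < 1/beta."""
    if t >= 1 / beta:
        raise ValueError("the Gamma mgf is only defined for t < 1/beta")
    return (1 - beta * t) ** (-alpha)

def first_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation of f'(0)."""
    return (f(h) - f(-h)) / (2 * h)

def second_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation of f''(0)."""
    return (f(h) - 2 * f(0.0) + f(-h)) / (h * h)

alpha, beta = 3.0, 2.0

def M(t):
    return gamma_mgf(t, alpha, beta)

m1 = first_derivative_at_zero(M)    # compare with alpha * beta = 6
m2 = second_derivative_at_zero(M)   # compare with alpha * (alpha + 1) * beta^2 = 48
```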

#### Example for Poisson distribution

Poisson($\lambda$) with $\lambda>0$ and pmf
$f(x)=\begin{cases}\frac{e^{-\lambda}\lambda^{x}}{x!},&x=0,1,2,\ldots\\0,&\mathrm{otherwise}\end{cases}$

`Pre-knowledge`

- The pmf sums to 1: $\sum_{x=0}^{\infty}\frac{e^{-\lambda}\lambda^{x}}{x!}=1$
- $\begin{aligned}\sum_{x=0}^\infty\frac{a^x}{x!}=e^a\end{aligned}$
- $e^{tx}=(e^t)^x$
- $E(X) = \lambda$
- $E(X^2)=\lambda^2+\lambda$
- $(uv)'=u'v+uv'$
- $\left(\frac{u}{v}\right)'=\frac{v\cdot u'-u\cdot v'}{v^2}$

$$\begin{aligned}
M(t)&=E(e^{tX}) \\
    &=\sum_{x=0}^{\infty}\frac{e^{tx}e^{-\lambda}\lambda^{x}}{x!} \\
    &=e^{-\lambda}\sum_{x=0}^{\infty}\frac{(e^{t}\lambda)^x}{x!}\quad(\text{the series for }e^a\text{ with }a = e^{t}\lambda) \\
    &=e^{-\lambda}e^{e^{t}\lambda} \\
    &=e^{\lambda e^{t} - \lambda}=\exp(\lambda\exp(t) - \lambda)
\end{aligned}$$

$$\begin{aligned}
E(X)&=\left.\frac{d}{dt}M(t)\right|_{t=0} \\
    &=\left.\frac{d}{dt}\exp(\lambda\exp(t)-\lambda)\right|_{t=0} \\
    &=\left.\exp(\lambda\exp(t)-\lambda)\cdot\lambda\exp(t)\right|_{t=0} \\
    &=\lambda
\end{aligned}$$

$$\begin{aligned}
E(X^2)&=\left.\frac{d^2}{dt^2}M(t)\right|_{t=0} \\
      &=\left.\frac{d}{dt}\left[\exp(\lambda\exp(t)-\lambda)\cdot\lambda\exp(t)\right]\right|_{t=0} \\
      &=\left.\lambda\exp(\lambda\exp(t)-\lambda+t)(\lambda\exp(t)+1)\right|_{t=0} \\
      &=\lambda^2+\lambda
\end{aligned}$$
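
As a quick check on the closed form, the sketch below evaluates $E(e^{tX})$ directly from the pmf (truncated after 200 terms, an assumption) and compares it with $\exp(\lambda e^t-\lambda)$ at a few arbitrary values of $t$, for the illustrative choice $\lambda=2.5$.

```python
import math

def poisson_mgf_series(t, lam, terms=200):
    """E(e^{tX}) summed directly over the Poisson pmf, truncated after `terms` terms."""
    return sum(
        math.exp(t * x - lam + x * math.log(lam) - math.lgamma(x + 1))
        for x in range(terms)
    )

def poisson_mgf_closed(t, lam):
    """Closed form exp(lam * e^t - lam) derived above."""
    return math.exp(lam * math.exp(t) - lam)

lam = 2.5
gaps = [
    abs(poisson_mgf_series(t, lam) - poisson_mgf_closed(t, lam))
    for t in (-1.0, -0.3, 0.0, 0.4, 1.0)
]
```

All entries of `gaps` should be at the level of floating-point rounding error.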

## Mgfs uniquely define a distribution

Mgfs (**not moments**) **uniquely** characterize a distribution

### Theorem

Let $F_X(x)$ and $F_Y(y)$ be cdfs for **which all moments exist**.
- (a) If $X$ and $Y$ have **bounded support**, then
    
    $F_X(u)=F_Y(u),\forall u\quad\mathrm{iff}\quad E(X^k)=E(Y^k)\mathrm{~}\forall k=0,1,2,\ldots $
    
    
- (b) If mgfs exist and $M_X(t) = M_Y(t)$ for all $t$ in a neighborhood of 0, then $F_X(u) = F_Y(u)$ for all $u$
    - Remember: If $F_X(u)=F_Y(u),\forall u$ then $X\overset{D}{\operatorname*{=}}Y$.

## More on the Theorem

1. Note: Moments $E(X^k)$ can exist even when the mgf does not
2. Part b): If both $M_X(t)$ and $M_Y(t)$ exist

    - $M_X(t)=M_Y(t)$ for all $t$ in a neighborhood of 0 $\quad\Leftrightarrow\quad X\overset{D}{\operatorname*{=}}Y$
    
    - So, just like the cdf and the pdf, the moment generating function (if it exists) uniquely determines the distribution of $X$
 
 
3. Generally the moments themselves $E(X^k)$ do not uniquely determine a distribution
    - Can have $X$ and $Y$ with same moments for all $k$ but different distribution (and different mgfs)

4. Part a): If $X$ and $Y$ have bounded support we have
    - $\text{All moments equal}\Leftrightarrow\quad X\overset{\mathrm{D}}{\operatorname*{=}}Y$
    - So in that special case, the infinite sequence of moments does uniquely determine the distribution.
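
A standard textbook illustration of point 3 (quoted here as a known example, not derived in these notes) uses the lognormal density

$$f_1(x)=\frac{1}{\sqrt{2\pi}\,x}e^{-(\ln x)^2/2},\quad x>0$$

and the perturbed density

$$f_2(x)=f_1(x)\left[1+\sin(2\pi\ln x)\right],\quad x>0$$

Both are valid pdfs and both have $E(X^k)=e^{k^2/2}$ for every $k=0,1,2,\ldots$, yet they are different distributions. Neither has bounded support and neither has an mgf, since $E(e^{tX})=\infty$ for every $t>0$, so the example is consistent with both parts of the theorem.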

## Convergence of mgfs

Convergence of mgfs implies convergence of cdfs.

### Theorem
Let $X_1,X_2,X_3,...$ be a sequence of random variables with mgfs $M_{X_i}(t)$, $i=1,2,3,...$ and suppose that

$$\lim_{i\to\infty}M_{X_i}(t)=M_X(t),\,\forall t\text{ in a neighborhood of 0}$$

and that $M_X(t)$ is an mgf. Then there exists a unique cdf $F_X$ whose moments are determined by $M_X(t)$ and 

$$\lim_{i\to\infty}F_{X_i}(x)=F_X(x)$$

for all $x$ where $F_X(x)$ is continuous. That is, $X_i$ converges in distribution to $X$.

## Poisson approximation to a Binomial

Let $X_1, X_2, X_3,...$ be a sequence of random variables where

$$X_n\sim\mathrm{Binomial}\left(n,\frac\lambda n\right)\quad n\in\mathbb{N},\lambda>0$$

As $n\to\infty$ the distribution of $X_n$ approaches the Poisson distribution.
- So for large $n$ we can approximate the Binomial$(n, p)$ distribution with a Poisson$(np)$ distribution ($*$)

####  Pre-knowledge
The limit identity: $\lim_{n\to\infty}\left(1+\frac{x}{n}\right)^n=e^x$

Let $X\sim\mathrm{binomial}(n,p)$, that is, $P(X=x)=\binom{n}{x}p^{x}(1-p)^{n-x},\quad x=0,1,\ldots,n.$

- $E(X) = np$
- $E(X^2) = n(n-1)p^2+np$
- $Var(X) = np(1-p)$

#### Proof: $(*)$

Let $Y\sim\mathrm{Poisson}(\lambda)$. Its mgf is

$$\begin{aligned}
M_Y(t)&=E(e^{tY}) \\
    &=\sum_{x=0}^{\infty}\frac{e^{tx}e^{-\lambda}\lambda^{x}}{x!} \\
    &=e^{-\lambda}\sum_{x=0}^{\infty}\frac{(e^{t}\lambda)^x}{x!}\quad(\text{the series for }e^a\text{ with }a = e^{t}\lambda) \\
    &=e^{-\lambda}e^{e^{t}\lambda} \\
    &=e^{\lambda e^{t} - \lambda}=\exp(\lambda\exp(t) - \lambda)
\end{aligned}$$

For $X\sim\mathrm{Binomial}(n,p)$, the mgf is

$$\begin{aligned}
M_{X}(t)&=\sum_{x=0}^{n}e^{tx}\binom{n}{x}p^{x}(1-p)^{n-x} \\
        &=\sum_{x=0}^{n}\binom{n}{x}(pe^{t})^{x}(1-p)^{n-x}
\end{aligned}$$

Using the binomial theorem: $\sum_{x=0}^{n}\binom{n}{x} u^{x}v^{n-x}=(u+v)^{n}.$

Therefore, $M_X(t)=\left[pe^t+(1-p)\right]^n$

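Now take $X_n\sim\mathrm{Binomial}(n,p)$ with $p=\lambda/n$, so that $np=\lambda$. Since $\lim_{n\to\infty}\left(1+\frac{a_n}n\right)^n=e^a$ whenever $a_n\to a$, we have

$$\begin{aligned}
\lim_{n\to\infty}M_{X_n}(t)&=\lim_{n\to\infty}\left[pe^t+(1-p)\right]^n \\
      &=\lim_{n\to\infty}\left[1+\frac{1}{n}(e^{t}-1)(np)\right]^{n} \\
      &=\lim_{n\to\infty}\left[1+\frac{1}{n}(e^{t}-1)\lambda\right]^{n} \\
      &=e^{\lambda e^{t} - \lambda} \\
      &=M_Y(t)
\end{aligned}$$

Here $M_Y(t)$ is the Poisson($\lambda$) mgf. The mgfs converge for every $t$, so by the convergence theorem for mgfs, $X_n$ converges in distribution to Poisson($\lambda$).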
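
The approximation can also be seen numerically. The sketch below compares the Binomial$(n,\lambda/n)$ pmf with the Poisson($\lambda$) pmf for growing $n$ and records the largest pointwise gap. The value $\lambda=3$, the range of $x$ inspected and the function names are assumptions for this example; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(Y = x) for Y ~ Poisson(lam), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def max_pmf_gap(n, lam, upto=30):
    """Largest |Binomial(n, lam/n) pmf - Poisson(lam) pmf| over x = 0..min(n, upto)."""
    p = lam / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
        for x in range(min(n, upto) + 1)
    )

lam = 3.0
gaps = {n: max_pmf_gap(n, lam) for n in (10, 100, 1000, 10000)}
```

The gaps shrink as $n$ grows, which is what convergence in distribution predicts for an integer-valued limit.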

## Some useful facts

Binomial Theorem

$$(x+y)^n=\sum_{i=0}^n\binom nix^iy^{n-i},\,\forall x, y\in\mathbb{R},n\in\mathbb{N}$$

A useful limit. If $\lim_{n\to\infty}a_n=a$ then

$$\lim_{n\to\infty}\left(1+\frac{a_n}n\right)^n=e^a$$
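
A quick numerical illustration of this limit, using the made-up sequence $a_n=2+\frac1n$, which tends to $a=2$:

```python
import math

def power_term(n):
    """(1 + a_n / n)^n for the illustrative sequence a_n = 2 + 1/n."""
    a_n = 2.0 + 1.0 / n
    return (1.0 + a_n / n) ** n

target = math.exp(2.0)
gaps = [abs(power_term(n) - target) for n in (10, 1_000, 100_000, 10_000_000)]
```

The distance to $e^2$ shrinks roughly in proportion to $1/n$.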

## More on mgfs

### Theorem

Let $X$ be a random variable, $a,b$ constants and $Y = aX + b$. Then

$$M_Y(t)=e^{bt}M_X(at)$$

`Proof:`

$$\begin{aligned}
M_Y(t)&=E(e^{tY})\\
      &=E(e^{t(aX+b)})\\
      &=E(e^{taX}e^{tb})\\
      &=e^{tb}E(e^{(at)X})\\
      &=e^{tb}M_X(at)
\end{aligned}$$
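
As an example of how this combines with uniqueness of mgfs, let $X\sim$ Gamma($\alpha,\beta$) and $Y=cX$ for a constant $c>0$ (the constant $c$ is introduced here for illustration). Taking $a=c$ and $b=0$ in the theorem,

$$M_Y(t)=M_X(ct)=(1-c\beta t)^{-\alpha},\quad t<\frac{1}{c\beta}$$

This is the mgf of a Gamma($\alpha,c\beta$) distribution, so $Y\sim$ Gamma($\alpha,c\beta$): rescaling a Gamma variable only changes the parameter $\beta$.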

## Other special moments
- Mean: First moment, $\mu=E(X)$
- Variance: Second central moment, $\mu_2=E((X-\mu)^2)=\sigma^2$
- Skewness: $\alpha_3=\frac{\mu_3}{(\mu_2)^{3/2}}=\frac{\mu_3}{\sigma^3},\,\text{ where }\mu_3=E((X-\mu)^3)$
    - Measures lack of symmetry
    - A pdf $f(x)$ is symmetric about $a$ if $f(a-\epsilon)=f(a+\epsilon)\quad\forall\epsilon>0$
    - $f$ symmetric $\Rightarrow$ $\alpha_3=0$ (when $\mu_3$ exists); the converse does not hold in general
    - $f$ left skewed $\Leftrightarrow$ $\alpha_3<0$
    - $f$ right skewed $\Leftrightarrow$ $\alpha_3>0$
- Kurtosis: $\alpha_4=\frac{\mu_4}{\mu_2^2}=\frac{\mu_4}{\sigma^4}$
    - Measures "flatness" versus "peakedness" of $f(x)$
- Mode of a distribution is a value $a$ such that $f(a)\geq f(x)$ for all $x$
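
To make the definitions concrete, the sketch below computes skewness and kurtosis from central moments for a Poisson($\lambda$) pmf. For the Poisson distribution the known values are $\alpha_3=1/\sqrt{\lambda}$ and $\alpha_4=3+1/\lambda$. The choice $\lambda=4$, the truncation at 200 terms and the helper names are assumptions for this example.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def standardized_moments(pmf):
    """Return (mean, variance, skewness, kurtosis) for a pmf given as {x: P(X = x)}."""
    mu = sum(x * p for x, p in pmf.items())

    def central(k):
        # k-th central moment E((X - mu)^k)
        return sum((x - mu) ** k * p for x, p in pmf.items())

    mu2, mu3, mu4 = central(2), central(3), central(4)
    return mu, mu2, mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

lam = 4.0
pmf = {x: poisson_pmf(x, lam) for x in range(200)}  # truncated support (assumption)
mean, variance, skewness, kurtosis = standardized_moments(pmf)
```

With $\lambda=4$ the skewness is positive ($0.5$), matching the long right tail of the Poisson pmf.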

## Quantiles of a distribution

If $X$ is a r.v. and $0 < p < 1$ then the value $u_p$ is called the $p$-th quantile of $X$ if

$$P(X\leq u_p)\geq p\text{ and }P(X\geq u_p)\geq 1-p$$

If $X$ is discrete we can define

$$u_p=\min\{x:F(x)\geq p\}$$

Special cases:
- 1st quartile $Q_1 = u_{0.25}$
- Median $Q_2 = m = u_{0.50}$
- 3rd quartile $Q_3 = u_{0.75}$
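
For a discrete distribution the quartiles can be read off the cdf. The sketch below takes the smallest support point $x$ with $F(x)\geq p$, which is one common convention and is assumed here; the pmf, the function name and the floating-point tolerance are made up for illustration.

```python
def quantile(pmf, p):
    """Smallest support point x with F(x) >= p, for a pmf given as {x: P(X = x)}."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= p - 1e-12:  # small tolerance for floating-point accumulation
            return x
    return max(pmf)  # only reached if rounding keeps the total just under p

pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}  # illustrative pmf
q1 = quantile(pmf, 0.25)
median = quantile(pmf, 0.50)
q3 = quantile(pmf, 0.75)
```

Here the cdf takes the values $0.1, 0.3, 0.7, 0.9, 1$ at $x=0,\ldots,4$, so the three quartiles are $1$, $2$ and $3$.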