# Expectation (Random Variable)

The expectation of a random variable is a number that attempts to capture the center of that random variable's distribution. It can be interpreted as the long-run average of many independent samples from the given distribution. More precisely, for a discrete random variable it is defined as the probability-weighted sum of all possible values in the random variable's support $\mathcal{X}$:

$$
\text{E}[X] = \sum_{x \in \mathcal{X}}xP(x)
$$
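As a small illustration of both readings of the definition, the sketch below computes the probability-weighted sum for a fair six-sided die and compares it with the average of many simulated rolls. The helper name `expectation`, the die example, and the fixed seed are assumptions made for this example only.

```python
from fractions import Fraction
import random


def expectation(pmf):
    """Probability-weighted sum of the values in the support."""
    return sum(x * p for x, p in pmf.items())


# Fair six-sided die: each face has probability 1/6.
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
exact = expectation(die_pmf)  # exact rational value of E[X]

# Long-run average of many independent rolls (seed fixed for repeatability).
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_average = sum(rolls) / len(rolls)

print(float(exact), sample_average)
```

The exact value is 7/2, and the simulated average should land close to it.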

# Variance
Whereas expectation provides a measure of centrality, the variance of a random variable quantifies the spread of that random variable's distribution. The variance is the expected value of the squared difference between the random variable and its expectation. It can be written in two equivalent forms:

$$
\text{Var}(X) = \text{E}[(X - \text{E}[X])^2] \\
\text{Var}(X) = \text{E}[X^2] - \text{E}[X]^2
$$
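The second form follows from the first by expanding the square and using linearity of expectation, treating $\text{E}[X]$ as a constant. A short derivation, included here as a worked step:

$$
\text{E}[(X - \text{E}[X])^2] = \text{E}[X^2 - 2X\,\text{E}[X] + \text{E}[X]^2] \\
= \text{E}[X^2] - 2\,\text{E}[X]\,\text{E}[X] + \text{E}[X]^2 \\
= \text{E}[X^2] - \text{E}[X]^2
$$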

- Example: compute the variance of a die roll, a uniform random variable over the sample space $\Omega = \{1,2,3,4,5,6\}$.

$$
\text{Var}(X) = \text{E}[(X - \text{E}[X])^2] \\
       = \text{E}[X^2] - \text{E}[X]^2 \\
       = \left(\sum_{k=1}^{6}k^2 \cdot \frac{1}{6}\right) - (3.5)^2 \\
       = \frac{1}{6}\cdot(1+4+9+16+25+36) - 3.5^2 \\
       = \frac{1}{6}\cdot 91 - 3.5^2 \\
       \approx 2.92
$$
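The same arithmetic can be checked in code. This is a minimal sketch using exact fractions; the variable names are chosen for this example. It evaluates the variance both from the definition and from the shortcut form, and the two should agree.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

mean = sum(k * p for k in faces)              # E[X] = 7/2
second_moment = sum(k**2 * p for k in faces)  # E[X^2] = 91/6

var_definition = sum((k - mean) ** 2 * p for k in faces)  # E[(X - E[X])^2]
var_shortcut = second_moment - mean**2                    # E[X^2] - E[X]^2

print(var_definition, float(var_definition))
```

Both expressions give 35/12, which rounds to 2.92.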


# Discrete Random Variable

[probability-distributions](https://seeing-theory.brown.edu/probability-distributions/index.html)

- A discrete random variable has a finite or countably infinite number of possible values.
- If X is a discrete random variable, then there exist unique nonnegative functions f(x) and F(x) such that the following are true:

$$
P(X = x) = f(x)\\
P(X \le x) = F(x)
$$

- f(x) is the probability mass function.
- F(x) is the cumulative distribution function.
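To make the two functions concrete, here is a minimal sketch for a fair six-sided die, using the usual convention that the cumulative distribution function is F(x) = P(X ≤ x). The function names `pmf` and `cdf` and the die example are assumptions for illustration.

```python
from fractions import Fraction


def pmf(x):
    """f(x) = P(X = x) for a fair six-sided die."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)


def cdf(x):
    """F(x) = P(X <= x): add up the mass of every face not exceeding x."""
    return sum(pmf(k) for k in range(1, 7) if k <= x)


print(pmf(3), cdf(3), cdf(6))
```

The cumulative function is just a running total of the mass function, so it reaches 1 at the largest value in the support.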

### Bernoulli distribution 

$$
f(x;p) = \begin{cases} p & \text{if } x = 1 \\ 1-p & \text{if } x = 0 \end{cases}
$$

### Binomial distribution
A binomial random variable is the sum of n independent Bernoulli random variables with parameter p. It is frequently used to model the number of successes in a specified number of identical binary experiments, such as the number of heads in five coin tosses.

$$
f(x; n,p) = \binom{n}{x}p^{x}(1-p)^{n-x}
$$
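The formula can be evaluated directly with the standard library. The sketch below, with a function name chosen for this example, tabulates the distribution of the number of heads in five fair coin tosses.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# Number of heads in five fair coin tosses, for x = 0, 1, ..., 5.
heads = [binomial_pmf(x, 5, 0.5) for x in range(6)]
print(heads)
```

The six probabilities sum to 1, and two or three heads are the most likely outcomes at 10/32 each.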

### Poisson distribution
A Poisson random variable counts the number of events occurring in a fixed interval of time or space, given that these events occur with an average rate λ. This distribution has been used to model events such as meteor showers and goals in a soccer match.

$$
f(x;\lambda) = \dfrac{\lambda^{x}e^{-\lambda}}{x!}
$$
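A quick numerical check of this formula, as a sketch: the rate `lam = 2.5` is a made-up value, and the sum is truncated at 60 terms, beyond which the remaining mass is negligible for a rate this small.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with average rate lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)


lam = 2.5  # hypothetical average rate, chosen only for illustration
probs = [poisson_pmf(x, lam) for x in range(60)]

total = sum(probs)                              # should be very close to 1
mean = sum(x * q for x, q in enumerate(probs))  # should be very close to lam

print(total, mean)
```

The probabilities sum to about 1 and the mean comes out at about the rate λ, as expected for this distribution.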

# Continuous Random Variable

If X is a continuous random variable, then there exist unique nonnegative functions f(x) and F(x) such that the following are true:
$$
P(a \le X \le b) = \int_{a}^{b}f(x)\,dx\\
P(X \le x) = F(x)
$$

### Uniform distribution

$$
f(x;a,b) = \left\{\begin{array}{ll} \dfrac{1}{b-a} \text{ for } x \in [a,b]\\ 0 \qquad \text{ otherwise } \end{array}\right.
$$
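For a continuous random variable, probabilities are areas under the density. The sketch below approximates such an area for a uniform density with a simple midpoint rule. The helper `integrate`, the interval [0, 2], and the step count are assumptions for this example, not a recommended integration method.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# P(0.5 <= X <= 1.5) for X uniform on [0, 2]; the exact answer is 1/2.
prob = integrate(lambda x: uniform_pdf(x, 0.0, 2.0), 0.5, 1.5)
print(prob)
```

Because the density is constant, the area is just the interval length times 1/(b-a), which the numerical result reproduces.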

### Normal (Gaussian) distribution

$$
f(x;\mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}}
$$
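The density can be coded directly from the formula. The cumulative distribution function has no elementary closed form, but it can be written with the error function from the standard library. A minimal sketch with function names chosen for this example; the value 1.96 is used because it is the quantile commonly paired with 95% probability.

```python
import math


def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of the normal distribution with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def normal_cdf(x, mu=0.0, sigma2=1.0):
    """P(X <= x), expressed through the error function."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * sigma2)))


# Probability that a standard normal variable lies between -1.96 and 1.96.
central_mass = normal_cdf(1.96) - normal_cdf(-1.96)
print(central_mass)
```

The central mass comes out at about 0.95.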

### Beta distribution
The beta distribution is a general family of continuous probability distributions bounded on the interval [0, 1]. It is frequently used as a conjugate prior distribution in Bayesian statistics.

$$
f(x;\alpha,\beta) = \dfrac{\Gamma(\alpha + \beta)x^{\alpha - 1}(1-x)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)}
$$
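A direct transcription of the density, as a sketch. It assumes positive parameters and, to avoid dividing by zero when a parameter is below 1, returns 0 outside the open interval (0, 1); that endpoint handling is a simplification made for this example.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of the beta distribution, evaluated on the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0  # simplification: endpoints are treated as outside the support
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


print(beta_pdf(0.3, 1, 1), beta_pdf(0.5, 2, 2))
```

With α = β = 1 the density is flat at 1, so the uniform distribution on [0, 1] is a special case. With α = β = 2 it peaks at x = 0.5 with height 1.5.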

# Confidence Intervals

Suppose that during the presidential election, we were interested in the proportion p of the population that preferred Hillary Clinton to Donald Trump. It wouldn't be feasible to call every single person in the country and write down who they prefer. Instead, we can take a bunch of samples, $X_{1}, \dots, X_{N}$, where

$$
X_{i} = \left\{\begin{array}{ll} 1 \text{ if person i prefers Hillary } \\ 0 \qquad \text{ otherwise } \end{array}\right.
$$

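- The sample mean is

$$
\hat{X} = \frac{1}{N}\sum_{i=1}^{N}X_{i}
$$

$E(\hat{X}) = p$, since each $X_{i}$ is 1 with probability p and 0 with probability 1-p.

By the central limit theorem (CLT), for large N the standardized sample mean is approximately standard normal, where $\sigma$ is the standard deviation of each $X_{i}$:
$$
\frac{(\hat{X} - p)}{(\sigma/\sqrt{N})} \sim N(0,1)
$$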


- We don't know $\sigma$.
- But for large N it can be approximated by the sample standard deviation:

$$
\hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_{i}-\hat{X})^2}
$$

$$
P\left(-1.96\le \frac{\hat{X} - p}{\hat{\sigma}/\sqrt{N}} \le 1.96\right) \approx 0.95\\
P\left(\hat{X}-1.96\frac{\hat{\sigma}}{\sqrt{N}} \le p \le \hat{X}+1.96\frac{\hat{\sigma}}{\sqrt{N}}\right) \approx 0.95
$$

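- The formula above is a special case of the general form below, where $\hat{\mu}$ is the sample mean and $z_{left}$, $z_{right}$ are the lower and upper standard normal quantiles (-1.96 and 1.96 for a 95% interval):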
$$
[\hat{\mu}+z_{left}\frac{\hat{\sigma}}{\sqrt{N}},\hat{\mu}+z_{right}\frac{\hat{\sigma}}{\sqrt{N}}]\\
\hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\hat{\mu})^2}
$$
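The general form above translates almost line by line into code. This is a minimal sketch: the function name and the tiny data set are made up, and a sample this small is far too short for the CLT approximation to be trusted, so it only illustrates the arithmetic.

```python
import math


def mean_confidence_interval(samples, z_left=-1.96, z_right=1.96):
    """Approximate interval for the mean, using the 1/N standard deviation estimate."""
    n = len(samples)
    mu_hat = sum(samples) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in samples) / n)
    standard_error = sigma_hat / math.sqrt(n)
    return mu_hat + z_left * standard_error, mu_hat + z_right * standard_error


# Made-up 0/1 responses, used only to show the calculation.
low, high = mean_confidence_interval([1, 0, 1, 1, 0, 1, 0, 1])
print(low, high)
```

With the default quantiles the interval is symmetric around the sample mean, and it narrows in proportion to one over the square root of N as more samples are collected.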

#### Bernoulli Confidence Interval
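- Replace the Gaussian symbols with Bernoulli symbols: the sample mean $\hat{\mu}$ becomes $\hat{p}$, and since $\text{Var}(X) = p(1-p)$, the estimate $\hat{\sigma}$ becomes $\sqrt{\hat{p}(1-\hat{p})}$.

The 95% CI is approximately
$$
\left[\hat{p}+z_{left}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}},\hat{p}+z_{right}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}\right]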
$$
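For 0/1 data only the number of successes and the sample size are needed. A sketch under the same normal approximation; the function name and the poll counts (520 of 1000) are invented for illustration.

```python
import math


def bernoulli_confidence_interval(successes, n, z_left=-1.96, z_right=1.96):
    """Approximate interval for a proportion, using sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat + z_left * standard_error, p_hat + z_right * standard_error


# Hypothetical poll: 520 of 1000 respondents prefer the candidate.
low, high = bernoulli_confidence_interval(520, 1000)
print(low, high)
```

For these made-up counts the interval runs from roughly 0.489 to 0.551.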

## Hypothesis Testing

Suppose we suspect that the proportion of voters who prefer Hillary Clinton is greater than $\frac{1}{2}$, and that we take n samples, denoted $\{X_{i}\}_{i=1}^{n}$, from the U.S. population. Based on these samples, can we support or reject our hypothesis that Hillary Clinton is more popular? And how confident are we in our conclusion?

- The __alternative hypothesis__, denoted $H_{a}$, is a claim we would like to support. In our previous example, the alternative hypothesis was $p$ > 0.5.
- The __null hypothesis__, denoted $H_{0}$, is the opposite of the alternative hypothesis. In this case, the null hypothesis is $p$ ≤ 0.5, i.e. that at most half of the population supports Hillary.
- The __test statistic__ is a function of the sample observations. Based on the test statistic, we will either reject or fail to reject (loosely, "accept") the __null hypothesis__. In the previous example, the test statistic was the sample mean $\hat{X}$. The sample mean is the test statistic for many hypothesis tests.
- The __rejection region__ is a subset of our sample space $\Omega$ that determines whether or not to reject the null hypothesis. If the test statistic falls in the rejection region, then we reject the null hypothesis. Otherwise, we do not reject it. In the presidential election example, the rejection region would be

    - RR: $\{(x_{1},...,x_{n}) : \hat{X} > k\}$
    - This notation means we reject if $\hat{X}$ falls in the interval $(k, \infty)$, where __k__ is some number which we must determine. k is determined by the Type I error, which is defined in the next section. Once k is computed, we reject or fail to reject the null hypothesis depending on the value of our test statistic, and our test is complete.
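The decision rule itself is a one-line comparison once k is known. In the sketch below the threshold `K = 0.55` is a placeholder chosen arbitrarily for illustration; in a real test it would be derived from the Type I error, which these notes have not yet defined. The function names and sample data are likewise hypothetical.

```python
def sample_mean(samples):
    """Test statistic: the average of the observations."""
    return sum(samples) / len(samples)


def reject_null(samples, k):
    """Return True when the sample mean falls in the rejection region (k, infinity)."""
    return sample_mean(samples) > k


K = 0.55  # placeholder threshold, NOT derived from a Type I error rate

print(reject_null([1, 1, 1, 0], K))  # sample mean 0.75
print(reject_null([1, 0, 1, 0], K))  # sample mean 0.50
```

The first sample has a mean above the placeholder threshold and is rejected; the second does not exceed it, so the null hypothesis is not rejected.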

### Types of Error

There are two fundamental types of errors in hypothesis testing. They are denoted Type I and Type II error.

# Reference
[basic-probability](https://seeing-theory.brown.edu/doc/basic-probability.pdf)