# Expectation(Random Variable)

The expectation of a random variable is a number that attempts to capture the center of that random variable's distribution. It can be interpreted as the long-run average of many independent samples from the given distribution. More precisely, it is defined as the probability-weighted sum of all possible values in the random variable's support:

$$
\text{E}[X] = \sum_{x \in \mathcal{X}}xP(x)
$$
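
The definition above can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the example distribution; the helper name `expectation` and the dictionary representation of the PMF are illustrative choices, not part of the original notes.

```python
from fractions import Fraction
import random


def expectation(pmf):
    """Probability-weighted sum of all values in the support."""
    return sum(x * p for x, p in pmf.items())


# Assumed example: a fair die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expectation(die_pmf)  # Fraction(7, 2), i.e. 3.5

# Long-run average of many independent rolls (seeded for reproducibility).
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_average = sum(rolls) / len(rolls)

print(die_mean, sample_average)
```

The exact value comes from the weighted sum, while the simulated average should land close to it as the number of rolls grows.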

# Variance
Whereas expectation provides a measure of centrality, the variance of a random variable quantifies the spread of that random variable's distribution. The variance is the average value of the squared difference between the random variable and its expectation:

$$
\text{Var}(X) = \text{E}[(X - \text{E}[X])^2] \\
\text{Var}(X) = \text{E}[X^2] - (\text{E}[X])^2
$$

- Example: compute the variance of a die roll, a uniform random variable over the sample space $\Omega = \{1,2,3,4,5,6\}$

$$
\text{Var}(X) = \text{E}[(X - \text{E}[X])^2] \\
       = \text{E}[X^2] - (\text{E}[X])^2 \\
       = \left(\sum_{k=1}^{6}k^2 \cdot \frac{1}{6}\right) - (3.5)^2 \\
       = \frac{1}{6}\cdot(1+4+9+16+25+36) - 3.5^2\\
       = \frac{1}{6}\cdot 91 - 3.5^2\\
       \approx 2.92
$$
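
The die-roll calculation can be reproduced with exact fractions. This is a small sketch; the function names `variance_shortcut` and `variance_definition` are hypothetical and simply mirror the two formulas for the variance given above.

```python
from fractions import Fraction


def mean(pmf):
    """E[X] as a probability-weighted sum."""
    return sum(x * p for x, p in pmf.items())


def variance_shortcut(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2


def variance_definition(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
var_a = variance_shortcut(pmf=die_pmf)    # 91/6 - 49/4 = 35/12
var_b = variance_definition(pmf=die_pmf)  # same value, computed directly

print(var_a, float(var_a))
```

Both formulas give 35/12, which rounds to the 2.92 obtained by hand.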


# Discrete Random Variable

[probability-distributions](https://seeing-theory.brown.edu/probability-distributions/index.html)

- A discrete random variable has a finite or countably infinite number of possible values.
- If X is a discrete random variable, then there exist unique nonnegative functions f(x) and F(x) such that the following are true:

$$
P(X = x) = f(x)\\
P(X \le x) = F(x)
$$

- f(x) is the probability mass function.
- F(x) is the cumulative distribution function.

### Bernoulli distribution 

$$
f(x;p) = \begin{cases} p & \text{if } x = 1 \\ 1-p & \text{if } x = 0 \end{cases}
$$

### A binomial random variable
is the sum of n independent Bernoulli random variables with parameter p. It is frequently used to model the number of successes in a specified number of identical binary experiments, such as the number of heads in five coin tosses.

$$
f(x; n,p) = \binom{n}{x}p^{x}(1-p)^{n-x}
$$
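
As an illustration, the Bernoulli and binomial mass functions can be written directly from their formulas. This is a sketch using only the standard library; the function names and the choice of five fair coin tosses are assumptions made for the example.

```python
from math import comb


def bernoulli_pmf(x, p):
    """f(x; p): p if x == 1, 1 - p if x == 0, and 0 elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0


def binomial_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x) for x in 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Number of heads in five fair coin tosses.
heads_pmf = [binomial_pmf(x, n=5, p=0.5) for x in range(6)]
print(heads_pmf)
```

With `n=1` the binomial mass function reduces to the Bernoulli one, which matches the description of a binomial variable as a sum of independent Bernoulli variables.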

### A Poisson random variable
counts the number of events occurring in a fixed interval of time or space, given that these events occur with an average rate λ. This distribution has been used to model events such as meteor showers and goals in a soccer match.

$$
f(x;\lambda) = \dfrac{\lambda^{x}e^{-\lambda}}{x!}
$$
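
The Poisson mass function can be sketched the same way. The rate `lam = 2.5` below is an arbitrary value chosen for illustration, and the sum is truncated at 50 terms, where the remaining tail is negligible for this rate.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """f(x; lambda) = lambda^x * e^(-lambda) / x! for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)


lam = 2.5  # assumed average rate, for illustration only
probs = [poisson_pmf(x, lam) for x in range(50)]

total = sum(probs)                                 # should be very close to 1
mean = sum(x * p for x, p in enumerate(probs))     # should be very close to lam

print(total, mean)
```

The check that the mean equals the rate ties the formula back to the interpretation of λ as the average number of events per interval.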

# Continuous Random Variable

If X is a continuous random variable, then there exist unique nonnegative functions f(x) and F(x) such that the following are true:
$$
P(a \le X \le b) = \int_{a}^{b}f(x)dx\\
P(X \le x) = F(x)
$$

### Uniform distribution

$$
f(x;a,b) = \left\{\begin{array}{ll} \dfrac{1}{b-a} \text{ for } x \in [a,b]\\ 0 \qquad \text{ otherwise } \end{array}\right.
$$

### Normal (Gaussian) distribution

$$
f(x;\mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\dfrac{(x-\mu)^{2}}{2\sigma^{2}}}
$$
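
To connect the density with the integral definition of probability, the sketch below integrates the normal density numerically and compares the result with the closed-form CDF expressed through `math.erf`. The midpoint-rule helper `integrate` is a hypothetical utility written for this example.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma2=1.0):
    """f(x; mu, sigma^2) for the normal distribution."""
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


def normal_cdf(x, mu=0.0, sigma2=1.0):
    """F(x) = P(X <= x), written with the error function."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# P(-1.96 <= X <= 1.96) for a standard normal, computed two ways.
area = integrate(normal_pdf, -1.96, 1.96)
exact = normal_cdf(1.96) - normal_cdf(-1.96)

print(area, exact)
```

Both values come out close to 0.95, which is where the constant 1.96 used for 95% intervals comes from.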

### Beta distribution
The beta distribution is a general family of continuous probability distributions defined on the interval [0, 1]. It is frequently used as a conjugate prior distribution in Bayesian statistics.

$$
f(x;\alpha,\beta) = \dfrac{\Gamma(\alpha + \beta)x^{\alpha - 1}(1-x)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)}
$$
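
The beta density can be evaluated with `math.gamma`. This is a sketch under a simplifying assumption: the endpoints 0 and 1 are treated as outside the support so that shape parameters below 1 do not cause a division by zero. The parameter values are arbitrary examples.

```python
from math import gamma


def beta_pdf(x, alpha, beta):
    """f(x; alpha, beta) on the open interval (0, 1), and 0 elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0  # simplification: endpoints are excluded
    coef = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return coef * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Beta(1, 1) is the uniform distribution on (0, 1).
flat = beta_pdf(0.3, 1, 1)

# Any beta density should integrate to 1 over (0, 1).
total = integrate(lambda x: beta_pdf(x, 2, 5), 0.0, 1.0)

print(flat, total)
```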

# Confidence Intervals

Suppose that during the presidential election, we were interested in the proportion p of the population that preferred Hillary Clinton to Donald Trump. It wouldn't be feasible to call every single person in the country and write down who they prefer. Instead, we can take a bunch of samples, $X_{1}, \ldots, X_{N}$, where

$$
X_{i} = \left\{\begin{array}{ll} 1 \text{ if person i prefers Hillary } \\ 0 \qquad \text{ otherwise } \end{array}\right.
$$

- The sample mean is

$$
\hat{X} = \frac{1}{N}\sum_{i=1}^{N}X_{i}
$$

$E(\hat{X}) = p$, since each $X_{i}$ is 1 with probability p and 0 with probability 1-p.

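By the central limit theorem (CLT), for large N the standardized sample mean is approximately standard normal, where $\sigma$ is the standard deviation of each $X_{i}$: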
$$
\frac{(\hat{X} - p)}{(\sigma/\sqrt{N})} \sim N(0,1)
$$


- We don't know $\sigma$.
- But for large N it is a valid approximation to replace it with the sample standard deviation:

$$
\hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_{i}-\hat{X})^2}
$$

$$
P\left(-1.96\le \frac{\hat{X} - p}{\hat{\sigma}/\sqrt{N}} \le 1.96\right) \approx 0.95\\
P\left(\hat{X}-1.96\frac{\hat{\sigma}}{\sqrt{N}} \le p \le \hat{X}+1.96\frac{\hat{\sigma}}{\sqrt{N}}\right) \approx 0.95
$$

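- In general, the confidence interval has the form below, where $\hat{\mu}$ is the sample mean and $z_{left}$, $z_{right}$ are the lower and upper critical values of the standard normal distribution (for a 95% interval, $-1.96$ and $1.96$):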
$$
[\hat{\mu}+z_{left}\frac{\hat{\sigma}}{\sqrt{N}},\hat{\mu}+z_{right}\frac{\hat{\sigma}}{\sqrt{N}}]\\
\hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\hat{\mu})^2}
$$

#### Bernoulli Confidence Interval
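- For Bernoulli samples, replace the Gaussian symbols with their Bernoulli counterparts: $\hat{\mu}$ becomes the sample proportion $\hat{p}$, and since $\text{Var}(X) = p(1-p)$, $\hat{\sigma}$ becomes $\sqrt{\hat{p}(1-\hat{p})}$.

The 95% CI is approximately

$$
\left[\hat{p}+z_{left}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}},\ \hat{p}+z_{right}\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}\right]
$$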
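
The Bernoulli interval is straightforward to compute from 0/1 samples. The sketch below assumes a made-up poll of 1000 respondents, 540 of whom prefer the candidate; the function name `bernoulli_ci` and the data are hypothetical.

```python
from math import sqrt


def bernoulli_ci(samples, z=1.96):
    """Approximate confidence interval for a proportion (z=1.96 gives 95%)."""
    n = len(samples)
    p_hat = sum(samples) / n                        # sample proportion
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # z * sigma_hat / sqrt(N)
    return p_hat - half_width, p_hat + half_width


# Hypothetical poll: 540 ones (prefer the candidate) and 460 zeros.
samples = [1] * 540 + [0] * 460
low, high = bernoulli_ci(samples)

print(low, high)  # about (0.509, 0.571)
```

Here `z_left = -1.96` and `z_right = 1.96` are folded into a single symmetric half-width, which is a design simplification for two-sided intervals.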

## Hypothesis Testing

Suppose we suspect that the proportion of voters who prefer Hillary Clinton is greater than $\frac{1}{2}$, and that we take n samples, denoted $\{X_{i}\}_{i=1}^{n}$, from the U.S. population. Based on these samples, can we support or reject our hypothesis that Hillary Clinton is more popular? And how confident are we in our conclusion?

- The __alternative hypothesis__, denoted $H_{a}$, is a claim we would like to support. In our example, the alternative hypothesis is $p > 0.5$.
- The __null hypothesis__, denoted $H_{0}$, is the opposite of the alternative hypothesis. In this case, the null hypothesis is $p \le 0.5$, i.e. that at most half of the population supports Hillary.
- The __test statistic__ is a function of the sample observations. Based on the test statistic, we will either accept or reject the __null hypothesis__. In our example, the test statistic is the sample mean $\hat{X}$. The sample mean is the test statistic for many hypothesis tests.
- The __rejection region__ is a subset of our sample space $\Omega$ that determines whether or not to reject the null hypothesis. If the test statistic falls in the rejection region, then we reject the null hypothesis. Otherwise, we accept it. In the presidential election example, the rejection region would be

    - RR: $\{(x_{1},\ldots,x_{n}) : \hat{X} > k\}$  
    - This notation means we reject if $\hat{X}$ falls in the interval $(k, \infty)$, where __k__ is some number which we must determine. k is determined by the Type I error, which is defined in the next section. Once k is computed, we reject or accept the null hypothesis depending on the value of our test statistic, and our test is complete.

### Types of Error

There are two fundamental types of errors in hypothesis testing. They are denoted Type I and Type II error.

- A Type I error is made when we reject $H_{0}$ when it is in fact true. The probability of Type I error is typically denoted as $\alpha$. In other words, $\alpha$ is the probability of a false positive.
- A Type II error is made when we accept $H_{0}$ when it is in fact false. The probability of Type II error is typically denoted as $\beta$. In other words, $\beta$ is the probability of a false negative.
- In the context of hypothesis testing, $\alpha$ will determine the rejection region. If we restrict the probability of a false positive to be at most 0.05, then we have  
$P(\hat{X} \in RR|H_{0}) \le 0.05$   
- That is, our test statistic falls in the rejection region (meaning we reject $H_{0}$), given that $H_{0}$ is true, with probability at most 0.05. Continuing along our example of the presidential election, the rejection region was of the form $\hat{X} \gt k$, and the null hypothesis was that p ≤ 0.5. Our above expression then becomes  
$P(\hat{X} \gt k | p \le 0.5) \le 0.05$  
- if n > 30, we can apply the CLT to say:
$$
P(\frac{\hat{X}-p}{S/\sqrt{n}} \gt \frac{k-p}{S/\sqrt{n}} | p \le 0.5) = P(Y \gt \frac{k-p}{S/\sqrt{n}} | p \le 0.5)
$$

- where Y is an N(0,1) random variable. Since $p \le 0.5$ implies  
$\frac{k-p}{S/\sqrt{n}} \geq \frac{k-0.5}{S/\sqrt{n}}$  
- hence
$$
P(Y \gt \frac{k-p}{S/\sqrt{n}}  | p \le 0.5) \le P(Y \gt \frac{k-0.5}{S/\sqrt{n}} )
$$

- We can look up a z table to find the critical value $z_{0.05} \approx 1.64$, which satisfies
$$
P(Y \gt 1.64) = P(Y  \lt -1.64) \approx 0.05
$$

- letting $\frac{k-0.5}{S/\sqrt{n}} = 1.64$, we can solve for k to determine our rejection region

$$
k = 0.5 + 1.64\cdot \frac{S}{\sqrt{n}}
$$

- Since our rejection region was of the form $\hat{X} > k$, we simply check whether $\hat{X} \gt 0.5 + 1.64\cdot\frac{S}{\sqrt{n}}$.
- If this is true, then we reject the null, and conclude that more than half the population favors Hillary Clinton.
- Since we set $\alpha$ = 0.05, the probability that such a rejection is a false positive is at most 0.05. Informally, the conclusion holds at the 1 - $\alpha$ = 0.95 confidence level.

- In the above example, we determined the rejection region by plugging in 0.5 for p, even though the null hypothesis was p ≤ 0.5. It is almost as though our null hypothesis was $H_{0}: p = 0.5$ instead of $H_{0}: p \le 0.5$. In general, we can simplify $H_{0}$ and assume the boundary case (p = 0.5 in this case) when we are determining the rejection region.
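
The test procedure described above can be sketched in a few lines. Two assumptions are made here that the notes do not fix: S is computed with the 1/n normalization (matching the earlier formula for the estimated standard deviation), and the poll data are invented for illustration. The names `sample_std` and `one_sided_test` are hypothetical.

```python
from math import sqrt


def sample_std(samples):
    """Sample standard deviation S, using the 1/n normalization (assumption)."""
    n = len(samples)
    mean = sum(samples) / n
    return sqrt(sum((x - mean) ** 2 for x in samples) / n)


def one_sided_test(samples, p0=0.5, z_alpha=1.64):
    """Test H0: p <= p0 against Ha: p > p0 using the boundary case p = p0.

    Returns the sample mean, the threshold k, and whether H0 is rejected.
    """
    n = len(samples)
    x_bar = sum(samples) / n
    k = p0 + z_alpha * sample_std(samples) / sqrt(n)
    return x_bar, k, x_bar > k


# Hypothetical poll A: 540 of 1000 prefer the candidate.
x_bar_a, k_a, reject_a = one_sided_test([1] * 540 + [0] * 460)

# Hypothetical poll B: 510 of 1000 prefer the candidate.
x_bar_b, k_b, reject_b = one_sided_test([1] * 510 + [0] * 490)

print(x_bar_a, k_a, reject_a)
print(x_bar_b, k_b, reject_b)
```

In poll A the sample mean exceeds the threshold (about 0.526), so the null is rejected at the 0.05 level; in poll B it does not, so the null is accepted.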

![](https://s3-us-west-2.amazonaws.com/courses-images/wp-content/uploads/sites/1888/2017/05/11170656/3159.png)
![](http://financetrain.com/assets/cip3.gif)
![](https://www.researchgate.net/profile/Avijit_Hazra/publication/320742650/figure/download/tbl1/AS:669269071237158@1536577589946/Critical-z-values-used-in-the-calculation-of-confidence-intervals.png)

### p-value

As we saw in the previous section, a selected $\alpha$ determined the rejection region so that the probability of a false positive was at most $\alpha$. Now suppose we observe some test statistic, say, the sample proportion of voters $\hat{X}$ who prefer Hillary Clinton. We then ask the following question. Given $\hat{X}$, what is the smallest value of $\alpha$ such that we still reject the null hypothesis? This leads us to the following definition.

- The p-value, denoted p, is defined as  
$p = \min\{\alpha \in (0, 1) : \text{reject } H_{0} \text{ using an } \alpha\text{-level test}\}$

- This definition isn't that useful for computing p-values. In fact, there is a more intuitive way of thinking about them. Suppose we observe some sample mean $\hat{X}_{1}$.

- Now suppose we draw a new sample mean, $\hat{X}_{2}$. The p-value is just the probability that our new sample mean is more extreme than the one we first observed, assuming the null hypothesis is true. By "extreme" we mean more different from our null hypothesis.

Suppose that we sampled n people and asked which candidate they preferred. As we did before, we can represent each person as an indicator function,

$$
X_{i} = \left\{\begin{array}{ll} 1 \text{ if person i prefers Hillary } \\ 0 \qquad \text{ otherwise } \end{array}\right.
$$

- Then $\hat{X}$ is the proportion of the sample that prefers Hillary. After taking the n samples, suppose we observe that $\hat{X}$ = 0.7. If we were to set up a hypothesis test, our hypotheses, test statistic, and rejection region would be

$$
H_{0} : q \le 0.5 \\
H_{a} : q \gt 0.5 \\
\text{Test statistic} : \hat{X} \\
RR : \{(x_{1},\ldots,x_{n}) : \hat{X} \gt k\}
$$

- where q is the true proportion of the entire U.S. population that favors Hillary. Using the intuitive definition, the p value is the probability that we observe something more extreme than 0.7. Since the null hypothesis is that q ≤ 0.5, “more extreme” in this case means, “bigger than 0.7”. 

- So the p-value is the probability that, given a new sample, we observe the new $\hat{X}$ is greater than 0.7, assuming the null, i.e. that q ≤ 0.5. Normalizing $\hat{X}$, we have



$$
P(\hat{X} \gt 0.7 | H_{0}) = P\left(\frac{\hat{X}-0.5}{S/\sqrt{n}} \gt \frac{0.7-0.5}{S/\sqrt{n}} \;\Big|\; H_{0}\right) \approx P\left(Y \gt \frac{0.7-0.5}{S/\sqrt{n}}\right) = p \tag{1}
$$

- where $Y \sim N(0, 1)$ and p on the right-hand side is the p-value. We would then compute the value $z_{p}=\frac{0.7-0.5}{S/\sqrt{n}}$ by plugging in the sample standard deviation, S, and the number of samples we took, n.
- We would then look up a z table and find the probability corresponding to $z_{p}$, denoted p (this is our p-value).
- For comparison, in the previous example we fixed $\alpha$ = 0.05 and solved for k (in these expressions p is the population proportion, not the p-value):
    - $P(\hat{X} \in RR|H_{0}) \le 0.05$  
    - $P(\hat{X} \gt k | p \le 0.5) \le 0.05$  
    - $P(Y \gt \frac{k-p}{S/\sqrt{n}}  | p \le 0.5) \le P(Y \gt \frac{k-0.5}{S/\sqrt{n}} )$
- Here we instead fix the observed value 0.7 and solve for the probability.
- We now claim that this p is equal to the smallest $\alpha$ for which we reject the null hypothesis.

- To prove this, we need to show that for any $\alpha$ < p, we accept the null hypothesis. We also need to show that for any $\alpha$ ≥ p, we reject the null hypothesis.


- Suppose $\alpha$ < p. We need to show that the test statistic $\hat{X}$ = 0.7 falls in the acceptance region determined by $\alpha$. Using a z table, we could find $z_{\alpha}$ such that  
$$
\alpha = P(Y \gt z_{\alpha}) \approx P(\frac{\hat{X}- 0.5}{S/\sqrt{n}} \gt z_{\alpha}|H_{0}) = P(\hat{X} \gt z_{\alpha}\cdot \frac{S}{\sqrt{n}}+ 0.5 | H_{0})
$$

- the rejection region is determined by 
$$
\hat{X} \gt k_{\alpha} = z_{\alpha}\cdot\frac{S}{\sqrt{n}} + 0.5
$$

- Since $\alpha$ < p, the corresponding $z_{p}$ such that $p = P(Y \gt z_{p})$ satisfies $z_{p} \lt z_{\alpha}$. By the RHS of expression (1),

$$
p = P(Y \gt \frac{0.7-0.5}{S/\sqrt{n}})
$$

- which implies $z_{p} = \frac{0.7-0.5}{S/\sqrt{n}} \Rightarrow z_{p}\cdot \frac{S}{\sqrt{n}}+0.5 = 0.7$. This implies that  

$$
0.7 = z_{p}\cdot\frac{S}{\sqrt{n}} + 0.5 \lt z_{\alpha}\cdot\frac{S}{\sqrt{n}}+0.5 = k_{\alpha}
$$

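- Therefore $\hat{X} = 0.7 \lt k_{\alpha}$, so $\hat{X}$ = 0.7 is in the acceptance region determined by $\alpha$. Hence, we accept the null hypothesis for any $\alpha$ < p.
- The reverse direction is symmetric: if $\alpha$ > p, then $z_{\alpha} \lt z_{p}$, so $k_{\alpha} = z_{\alpha}\cdot\frac{S}{\sqrt{n}} + 0.5 \lt 0.7 = \hat{X}$ and the test statistic falls in the rejection region. At $\alpha$ = p exactly, $k_{\alpha}$ = 0.7 sits on the boundary of the region.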
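
The p-value computation and the claim about the smallest rejecting $\alpha$ can be checked numerically. This is a sketch using `statistics.NormalDist` from the standard library in place of a z table; the sample size `n = 50` is an assumed value, and S is taken to be $\sqrt{\hat{X}(1-\hat{X})}$, which is what the 1/n sample standard deviation of 0/1 data works out to.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Y ~ N(0, 1)


def p_value(x_bar, n, s, p0=0.5):
    """P(Y > z_p), where z_p = (x_bar - p0) / (s / sqrt(n))."""
    z_p = (x_bar - p0) / (s / sqrt(n))
    return 1.0 - STD_NORMAL.cdf(z_p)


def rejects(x_bar, n, s, alpha, p0=0.5):
    """True if x_bar > k_alpha = z_alpha * s / sqrt(n) + p0."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)  # P(Y > z_alpha) = alpha
    k_alpha = z_alpha * s / sqrt(n) + p0
    return x_bar > k_alpha


n = 50                           # assumed sample size, for illustration
x_bar = 0.7                      # observed sample proportion
s = sqrt(x_bar * (1 - x_bar))    # standard deviation of the 0/1 samples

p = p_value(x_bar, n, s)
accept_below = not rejects(x_bar, n, s, alpha=p / 2)  # alpha < p: accept H0
reject_above = rejects(x_bar, n, s, alpha=2 * p)      # alpha > p: reject H0

print(p, accept_below, reject_above)
```

For these assumed numbers the p-value is roughly 0.001. Any level below it leaves 0.7 in the acceptance region, and any level above it puts 0.7 in the rejection region, in line with the argument above.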




# Reference
- [basic-probability](https://seeing-theory.brown.edu/doc/basic-probability.pdf)
- [probability-distributions](https://seeing-theory.brown.edu/probability-distributions/index.html)