# Sampling
A **population** is the set of all possible outcomes or observations of an experiment or survey.
A **sample** is any subset of a population.
A population can be:
* finite
    * consists of a finite number of elements
    * *eg*, the cards in a deck, all the people of a country
* infinite
    * consists of an infinitely large number of elements
    * *eg*, the result of a coin flip (new flips can be generated indefinitely)
    * *eg*, the position of an object (the coordinate position can have infinite precision)
    



## Simple random sample
A **sample** of size $n$ is obtained by taking a set of $n$ observations from a given population.

We assume that each observation is a value of a random variable that follows some probability distribution.

A **simple random sample** is a sample that is chosen such that every subset of $n$ observations in the population has an equal probability of being selected.

### Sampling without replacement
Suppose that we wish to obtain a sample from the population without replacement.
*eg* we want to select 5 distinct people out of 100 people.


Given a population of size $N$, and a desired sample of size $n$, by [combinatorics](), we know that there are $C^N_n$ possible samples.

Thus, we assign a label to each combination and choose one label uniformly at random, so that every combination is equally likely to be selected.

**Example**

Suppose that we wish to sample 2 items out of $\{A, B, C\}$.
1. Consider all the combinations, that is $\{AB, AC, BC\}$
2. Label each combination with a unique number between 1 and 3.
3. Generate a random number between 1 and 3, with each value equally likely.
4. Pick the combination corresponding to the generated number to obtain our sample.
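
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sample_without_replacement` is hypothetical, and enumerating every combination is practical only for small populations.

```python
import random
from itertools import combinations

def sample_without_replacement(population, n, rng=random):
    """Pick one of the C(N, n) combinations uniformly at random."""
    # Step 1: list every combination of size n.
    all_samples = list(combinations(population, n))
    # Steps 2-3: the labels are the list indices; draw one label uniformly.
    label = rng.randrange(len(all_samples))
    # Step 4: return the combination carrying that label.
    return all_samples[label]

all_pairs = list(combinations("ABC", 2))
print(all_pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(sample_without_replacement("ABC", 2))
```

In practice, the standard library call `random.sample(list_of_items, n)` draws such a sample directly without enumerating the combinations.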

---

<span hidden>TODO: Add links</span>

### Sampling with replacement
Suppose, instead, that we wish to obtain a sample from the population with replacement.
*eg* we want to select 5 people out of 100 people, but the same person can be selected multiple times.

Given a population of size $N$, and a desired sample of size $n$, by [combinatorics](), we know that there are $N^n$ possible ordered samples.



**Example**

Suppose that we wish to sample 2 items out of $\{A, B, C\}$.
1. Consider all the ordered samples, that is $\{AA, AB, AC, BA, BB, BC, CA, CB, CC\}$
2. Label each ordered sample with a unique number between 1 and 9.
3. Generate a random number between 1 and 9, with each value equally likely.
4. Pick the ordered sample corresponding to the generated number to obtain our sample.
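
The same labeling procedure can be sketched for sampling with replacement. The function name `sample_with_replacement` is hypothetical, and the enumeration is again only feasible for small $N$ and $n$, since there are $N^n$ ordered samples.

```python
import random
from itertools import product

def sample_with_replacement(population, n, rng=random):
    """Pick one of the N**n ordered samples uniformly at random."""
    # Step 1: list every ordered sample of size n (repeats allowed).
    all_samples = list(product(population, repeat=n))
    # Steps 2-3: the labels are the list indices; draw one label uniformly.
    label = rng.randrange(len(all_samples))
    # Step 4: return the ordered sample carrying that label.
    return all_samples[label]

all_ordered = ["".join(s) for s in product("ABC", repeat=2)]
print(all_ordered)  # ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
print(sample_with_replacement("ABC", 2))
```

In practice, `random.choices(list_of_items, k=n)` draws with replacement directly.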

---


<span hidden>TODO: Add links</span>

## Sampling from an infinite population
When the population is finite, we can easily obtain samples in such a way that each subset of the population has an equal probability of being selected.
We achieved this by assigning a probability to each subset based on the size of the population.
However, this is not possible with an infinite population.

**Example**

In an experiment where we collect the number of heads obtained in 10 coin tosses, notice that the population is infinite, because we can always perform another experiment to obtain another observation.
Hence, we cannot apply our previous approach of labeling each possible sample, generating a random number, and choosing the corresponding sample.

## Sampling distribution of sample mean
Recall that our goal of selecting random samples is to obtain information about some unknown population parameters.
These parameters could be the mean or variance of the population.

We obtain a large random sample, then based on the information obtained from this sample, we draw inferences about the true values of the population parameters.
For example, if we wish to know how many people in the country like cats, we survey a small subset of this population; then, based on this subset, we can make some assertions about the proportion of the country's population that likes cats.

A **statistic** is a value computed from a sample.
Since each observation is determined by a random variable, the statistic will also vary, and thus is also a random variable.

The **sampling distribution** of a statistic is its probability distribution.

### Sample mean
Given a random sample $X_1, X_2, \dots, X_n$ of size $n$, the **sample mean** is defined as
$$
\bar X = \frac{1}{n}\sum^n_{i=1} X_i
$$

Note that the sample mean is also a random variable, thus we can obtain the mean and variance of its distribution.

**Example**

Suppose that our population has the following values $\{1, 2, 3, 4, 5\}$.

From the [random variable chapter](./random_variables.ipynb#aggregate), we can compute that the population mean and population variance are $\mu_X = 3$ and $\sigma^2 _X = 2$.

Now, suppose that we draw a sample of size 2 (with replacement).
To obtain the distribution of the sample mean, we can first enumerate all possible samples of size 2, and compute $\bar X$ for each sample.

|Samples| $\bar X$ | $Pr(\bar X = \bar x)$ |
| :---- | ---- | --- |
|(1, 1) | 1 | 1/25 |
|(1, 2), (2, 1) | 1.5 | 2/25 |
|(1, 3), (3, 1), (2,2) | 2 | 3/25 |
|(1, 4), (4, 1), (2,3), (3,2) | 2.5 | 4/25 |
|(1, 5), (5, 1), (2,4), (4,2), (3,3) | 3 | 5/25 |
|(2, 5), (5, 2), (3,4), (4,3) | 3.5 | 4/25 |
|(3, 5), (5, 3), (4,4) | 4 | 3/25 |
|(4, 5), (5, 4) | 4.5 | 2/25 |
|(5, 5) | 5 | 1/25|

Hence, with this table, we have obtained the distribution of $\bar X$.
We can treat it as just another probability distribution and obtain its mean and variance.

$$
\mu_{\bar X} = E(\bar X) = \sum \bar x f_{\bar X}(\bar x) = 
1\left(\frac{1}{25}\right) + 
1.5\left(\frac{2}{25}\right) + 
\dots +
5\left(\frac{1}{25}\right) =  3
$$

$$
\sigma^2_{\bar X} = E(\bar X^2) - E(\bar X)^2 = 10 - 3^2 = 1
$$

---

**Theorem**

Given an infinite population, or a finite population sampled with replacement, that has population mean $\mu_X$ and population variance $\sigma_X^2$; when random samples of size $n$ are drawn, the sampling distribution of the sample mean $\bar X$ has the following properties,
$$
\mu _{\bar X} = \mu_X \quad \sigma ^2_{\bar X} = \frac{\sigma_X^2}{n} 
$$
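
As a check, the theorem can be verified exactly on the population $\{1, 2, 3, 4, 5\}$ with $n = 2$ by enumerating all 25 ordered samples. The sketch below uses exact fractions; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]
n = 2

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {xbar: Fraction(c, len(samples)) for xbar, c in counts.items()}

# Mean and variance of the sampling distribution of the sample mean.
mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum(x**2 * p for x, p in dist.items()) - mean_xbar**2

# Population mean and population variance.
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print(mean_xbar, var_xbar)    # 3 1
print(pop_mean, pop_var / n)  # 3 1
```

Both lines agree: the mean of $\bar X$ equals the population mean, and its variance equals the population variance divided by $n$.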

## Law of large numbers
Suppose that we obtain a sample of size $n$ from a population with mean $\mu$ and some **finite** population variance $\sigma ^2$.
The **law of large numbers** states that for any $\epsilon > 0$
$$
Pr(|\bar X - \mu| > \epsilon) \to 0 \quad \text{as } n \to \infty
$$

**Corollary**
$$
Pr(|\bar X - \mu| < \epsilon) \to 1 \quad \text{as } n \to \infty
$$

In other words, as the sample size gets larger, it becomes more likely that the sample mean is close to the population mean.
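
A small simulation can illustrate this statement. The sketch below assumes a fair six-sided die as the population (so $\mu = 3.5$) and estimates $Pr(|\bar X - \mu| > \epsilon)$ for increasing $n$; the helper names, the seed, and the choice $\epsilon = 0.25$ are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
mu = 3.5                # population mean of a fair six-sided die

def sample_mean(n):
    """Mean of n independent rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

def exceed_prob(n, eps, trials=500):
    """Estimate Pr(|sample mean - mu| > eps) from repeated samples of size n."""
    hits = sum(abs(sample_mean(n) - mu) > eps for _ in range(trials))
    return hits / trials

eps = 0.25
estimates = {n: exceed_prob(n, eps) for n in (10, 100, 1000)}
for n, p in estimates.items():
    print(n, p)
```

The estimated probability should shrink towards 0 as $n$ grows.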

## Central limit theorem
Once again, suppose that we obtain a sample of size $n$ from a population with mean $\mu$ and some **finite** population variance $\sigma ^2$.

The **central limit theorem** states that $\bar X$ is **approximately normal** with mean $\mu$ and variance $\frac{\sigma^2}{n}$ if $n$ is **sufficiently large**.

Applying [standardization](./probability_distributions.ipynb#standardization), we obtain
$$
Z = \frac{\bar X - \mu}{\sigma / \sqrt{n}} \sim N(0,1) \text{ approximately}
$$
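
The approximation can be checked by simulation. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1), which is clearly not normal, and a sample size of $n = 50$. It standardizes each sample mean and checks that about 95% of the $Z$ values fall within $\pm 1.96$, as they would for a standard normal variable. All names and parameter choices are illustrative.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0    # Exponential(1) has mean 1 and standard deviation 1
n, trials = 50, 5000

z_values = []
for _ in range(trials):
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    z_values.append((xbar - mu) / (sigma / math.sqrt(n)))

# Fraction of standardized sample means inside [-1.96, 1.96].
inside = sum(abs(z) <= 1.96 for z in z_values) / trials
print(statistics.fmean(z_values), statistics.pstdev(z_values), inside)
```

The printed mean should be near 0, the standard deviation near 1, and the coverage near 0.95.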

<span hidden> TODO: Add code example </span>

## Sampling distribution from normal population
If $X_i \sim N(\mu, \sigma^2)$ for all $i$ (that is, all observations are drawn from the same **normal** distribution), then $\bar X \sim N(\mu, \sigma ^2 /n)$ for any sample size $n$.
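
To illustrate that this holds even for a small sample size, the sketch below assumes $\mu = 10$, $\sigma = 3$ and $n = 4$, and compares the simulated distribution of $\bar X$ with $N(\mu, \sigma^2/n)$ using `statistics.NormalDist` from the standard library. The cut-off `c = 11.0` is an arbitrary choice.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable
mu, sigma, n, trials = 10.0, 3.0, 4, 20000

# Sample means of n independent normal observations.
xbars = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

# Distribution claimed for the sample mean: N(mu, sigma^2 / n).
theory = statistics.NormalDist(mu, sigma / math.sqrt(n))
c = 11.0
empirical = sum(x <= c for x in xbars) / trials

print(empirical, theory.cdf(c))                   # Pr(Xbar <= c)
print(statistics.pvariance(xbars), sigma**2 / n)  # variance of Xbar
```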

<span hidden> TODO: Add code example </span>

If all $X_i \sim N(\mu, \sigma^2)$ approximately, then $\bar X \sim N(\mu, \sigma^2/n)$ approximately.

## Sampling distribution of difference of two sample means <span id="mean-diff"/>
Suppose that we have two populations with means $\mu_1, \mu_2$ and variances $\sigma^2_1, \sigma^2_2$ respectively.
If we draw independent samples of sizes $n_1, n_2$ from the respective populations, then the sampling distribution of the difference of sample means $\bar X_1 - \bar X_2$ is **approximately** normal with the following parameters,
$$
\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2 \quad \sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}
$$

<span hidden> TODO: Add proof</span>

If $n_1, n_2 \geq 30$, then the normal approximation of $\bar X_1 - \bar X_2$ is rather good regardless of the shapes of the two population distributions.
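
The two parameters can be checked by simulation. The sketch below assumes two non-normal populations chosen purely for illustration: a Uniform(0, 10) population (mean 5, variance $100/12$) and an exponential population with mean 2 (variance 4), with $n_1 = 40$ and $n_2 = 50$.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable
n1, n2, trials = 40, 50, 5000

# Population 1: Uniform(0, 10), mean 5, variance 100/12.
# Population 2: Exponential with mean 2, variance 4.
mu1, var1 = 5.0, 100 / 12
mu2, var2 = 2.0, 4.0

diffs = []
for _ in range(trials):
    xbar1 = statistics.fmean(rng.uniform(0, 10) for _ in range(n1))
    xbar2 = statistics.fmean(rng.expovariate(1 / mu2) for _ in range(n2))
    diffs.append(xbar1 - xbar2)

print(statistics.fmean(diffs), mu1 - mu2)                  # mean of the difference
print(statistics.pvariance(diffs), var1 / n1 + var2 / n2)  # variance of the difference
```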

<span hidden> TODO: Add code example </span>

## Chi-square distribution
The **chi-square** distribution with $n$ degrees of freedom has the following probability density function.
Note that this function is not really important for our use; it is here simply for completeness.
We are more interested in how it is connected to the [sample variance](#sample-variance).
$$
f_X(x) = \begin{cases}
\dfrac{1}{2^{n/2} \Gamma (n/2)} x^{n/2-1} e^{-x/2}, & \text{for } x > 0 \\
0, & \text{otherwise}
\end{cases}
$$

It is denoted by the symbol $\chi^2(n)$, where $n \in \mathbb{Z^+}$.

The gamma function is defined as 
$$
\Gamma(n) = \int ^ \infty _0 x^{n-1} e^{-x} dx
$$

Note that $\Gamma(n) = (n-1)!$ for $n \in \mathbb{Z^+}$.

In other words, it is the factorial function (shifted by one) extended to positive real numbers.
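
The relation to the factorial can be checked numerically with `math.gamma` from the Python standard library. The non-integer check uses the known value $\Gamma(1/2) = \sqrt{\pi}$, which is not stated above.

```python
import math

# Gamma(n) should equal (n - 1)! for positive integers n.
for n in range(1, 6):
    print(n, math.gamma(n), math.factorial(n - 1))

gamma_matches = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 15)
)
print(gamma_matches)

# The function is also defined between the integers: Gamma(1/2)^2 = pi.
print(math.gamma(0.5) ** 2, math.pi)
```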

<span hidden> TODO: Add code example </span>

### Properties
$$
E(X) = n \quad V(X) = 2n
$$

For large $n$, the $\chi^2(n)$ distribution is approximately $N(n, 2n)$.


<span id="chi-square-sum-prop"/>
If $X_1, \dots, X_k$ are independent chi-square random variables with $n_1, \dots, n_k$ degrees of freedom, then $X_1 + \dots + X_k$ also has a chi-square distribution, with $n_1 + \dots + n_k$ degrees of freedom.
Formally, 
$$
\sum X_i \sim \chi^2\left(\sum n_i\right)
$$

### Theorems <span id="chi-square-theorem"/>
* If $X \sim N(0,1)$, then $X^2 \sim \chi^2(1)$
    * By standardizing, if $X \sim N(\mu, \sigma^2)$, then $\left(\frac{X-\mu}{\sigma}\right)^2 \sim \chi^2(1)$
    * By the property above, when given a random sample $X_1, X_2, \dots , X_n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$, $\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$
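
The last statement can be checked by simulation against the properties $E(X) = n$ and $V(X) = 2n$. The sketch below assumes $\mu = 2$, $\sigma = 3$ and $n = 5$, all chosen arbitrarily, and the helper name `chi_square_draw` is hypothetical.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable
mu, sigma = 2.0, 3.0
n, trials = 5, 20000

def chi_square_draw():
    """Sum of n squared standardized normal observations."""
    return sum(((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n))

draws = [chi_square_draw() for _ in range(trials)]

print(statistics.fmean(draws), n)          # should be close to n
print(statistics.pvariance(draws), 2 * n)  # should be close to 2n
```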

<span hidden> TODO: Add code example </span>

### Chi-square table
Suppose that $X \sim \chi^2(n)$.
We denote by $\chi^2(n;\alpha)$ the **constant** that satisfies $Pr(X \geq \chi^2(n;\alpha)) = \alpha$.
That is, given some probability $\alpha$, it is the value that a $\chi^2(n)$ random variable $X$ exceeds with probability $\alpha$.

Similarly, $\chi^2(n; 1-\alpha)$ is the constant that satisfies $Pr(X \leq \chi^2(n;1-\alpha)) = \alpha$.

## Sample variance <span id="sample-variance"/>

Given a random sample $X_1, \dots, X_n$, the **sample variance** is defined as 
$$
S^2 = \frac{1}{n-1}\sum(X_i - \bar X)^2
$$

However, the distribution of the sample variance has little practical use in statistics.
Instead, we are interested in the sampling distribution of the random variable $\frac{(n-1)S^2}{\sigma^2}$ when all $X_i \sim N(\mu, \sigma^2)$.


We look at this particular random variable because, when all the observations are drawn independently from the same normal distribution,
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
Note that this does not follow directly from [this theorem](#chi-square-theorem), since that theorem uses the population mean $\mu$ while we are using the sample mean $\bar X$ here.
The actual proof can be found [here](https://statproofbook.github.io/P/norm-chi2) and is beyond the scope of this topic.
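
Although the proof is out of scope, the result can be checked by simulation: a $\chi^2(n-1)$ variable has mean $n-1$ and variance $2(n-1)$. The sketch below assumes $\mu = 0$, $\sigma = 2$ and $n = 6$, chosen arbitrarily. `statistics.variance` computes $S^2$ with the $n-1$ divisor.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable
mu, sigma = 0.0, 2.0
n, trials = 6, 20000

def scaled_sample_variance():
    """Compute (n - 1) * S^2 / sigma^2 for one normal sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, divides by n - 1
    return (n - 1) * s2 / sigma**2

values = [scaled_sample_variance() for _ in range(trials)]

print(statistics.fmean(values), n - 1)            # should be close to n - 1
print(statistics.pvariance(values), 2 * (n - 1))  # should be close to 2(n - 1)
```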

## t-distribution <span id="t-dist"/>
Let $Z \sim N(0,1)$, and $U \sim \chi^2(n)$.
If $Z$ and $U$ are independent, then the **t-distribution** with $n$ degrees of freedom is the distribution of the random variable
$$
T = \frac{Z}{\sqrt{U/n}}
$$

The probability density function of a $t$-distribution is given by
$$
f_T(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n \pi} \Gamma(n/2)} \left(1+\frac{t^2}{n}\right)^{-(n+1)/2}, \quad -\infty < t < \infty
$$

Once again, this formula is only here for completeness.

### Properties
* The graph of the $t$-distribution is symmetric about $t=0$
* As $n \to \infty$, $f_T(t) \to \frac{1}{\sqrt{2\pi}}e^{-t^2/2}$, which is simply the standard normal distribution
* If $T \sim t(n)$, then $E(T) = 0, V(T) = n/(n-2)$ for $n>2$
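
The definition and the moment formulas can be checked by simulation. The sketch below builds $T = Z / \sqrt{U/n}$ from independent standard normal draws, with $U$ formed as a sum of $n$ squared standard normals. The choice $n = 10$ and the helper name `t_draw` are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(6)  # fixed seed so the run is repeatable
n, trials = 10, 20000

def t_draw():
    """One draw of T = Z / sqrt(U / n) with Z ~ N(0, 1) and U ~ chi-square(n)."""
    z = rng.gauss(0, 1)
    u = sum(rng.gauss(0, 1) ** 2 for _ in range(n))  # independent of z
    return z / math.sqrt(u / n)

t_values = [t_draw() for _ in range(trials)]

print(statistics.fmean(t_values), 0.0)              # E(T) = 0
print(statistics.pvariance(t_values), n / (n - 2))  # V(T) = n / (n - 2)
```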

<span hidden> TODO: Add code example </span>

### Connection to sampling
If $X_1, \dots, X_n$ is a random sample drawn from a normal population with mean $\mu$ and variance $\sigma^2$, then consider the following random variables
$$
Z = \frac{\bar X - \mu}{\sigma / \sqrt n} \sim N(0, 1)
$$

$$
U = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)
$$

We can show that $\bar X$ and $S^2$ are independent, and so are $Z$ and $U$.

Hence,
$$
T = \frac{\bar X - \mu}{S/\sqrt n} = \frac{Z}{\sqrt{U / (n-1)}} \sim t(n-1)
$$
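
One practical consequence is that, for small $n$, the statistic $T$ has heavier tails than the standard normal. The sketch below assumes a normal population with $\mu = 50$, $\sigma = 8$ and $n = 5$ (arbitrary choices) and compares how often $|T| > 1.96$ with the corresponding standard normal probability of about 0.05.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
mu, sigma = 50.0, 8.0
n, trials = 5, 20000

def t_statistic():
    """Compute (Xbar - mu) / (S / sqrt(n)) for one normal sample of size n."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    return (xbar - mu) / (s / math.sqrt(n))

t_stats = [t_statistic() for _ in range(trials)]

tail_t = sum(abs(t) > 1.96 for t in t_stats) / trials
tail_normal = 2 * (1 - statistics.NormalDist().cdf(1.96))
print(tail_t, tail_normal)
```

The simulated tail probability for $T$ should be noticeably larger than the normal value, which is why the $t(n-1)$ distribution rather than $N(0,1)$ is used when $\sigma$ is replaced by $S$.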

<span hidden> TODO: Add proof</span>

## F-distribution

Let $U \sim \chi^2(n_1)$ and $V \sim \chi^2(n_2)$ be independent; then the **F-distribution** with $(n_1, n_2)$ degrees of freedom is the distribution of the random variable
$$
F = \frac{U/n_1}{V/n_2}
$$

The probability density function is defined as
$$
f_F(x) = \frac{n_1^{n_1/2}n_2^{n_2/2}\Gamma((n_1+n_2)/2)}{\Gamma(n_1/2) \Gamma(n_2/2)}\frac{x^{n_1/2 -1}}{(n_1x+n_2)^{(n_1+n_2)/2}}, \quad \text{for } x > 0
$$

Once again, this formula is only here for completeness.

### Connection to sampling
Suppose that we have two independent random samples of sizes $n_1, n_2$ respectively, obtained from two normal populations with variances $\sigma_1^2, \sigma_2^2$ respectively.

We know that 
$$
U = \frac{(n_1-1)S_1^2}{\sigma^2_1} \sim \chi^2(n_1-1)
$$
$$
V = \frac{(n_2-1)S_2^2}{\sigma^2_2} \sim \chi^2(n_2-1)
$$
and that $U$ and $V$ are independent.

Hence, we have 
$$
F = \frac{U/(n_1-1)}{V/(n_2-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma^2_2} \sim F(n_1-1, n_2-1)
$$

<span hidden> TODO: Add proof</span>

### Theorems
If $F \sim F(n,m)$, then $1/F \sim F(m, n)$
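
Both the sampling connection and the reciprocal theorem can be checked by simulation. The sketch below relies on a fact not stated above: the mean of an $F(d_1, d_2)$ distribution is $d_2/(d_2-2)$ for $d_2 > 2$. The parameters $\sigma_1 = 2$, $\sigma_2 = 5$, $n_1 = 8$, $n_2 = 12$ and the helper name `f_statistic` are arbitrary choices.

```python
import random
import statistics

rng = random.Random(8)  # fixed seed so the run is repeatable
sigma1, sigma2 = 2.0, 5.0
n1, n2, trials = 8, 12, 20000

def f_statistic():
    """Compute (S1^2 / sigma1^2) / (S2^2 / sigma2^2) from two normal samples."""
    s1_sq = statistics.variance([rng.gauss(0, sigma1) for _ in range(n1)])
    s2_sq = statistics.variance([rng.gauss(0, sigma2) for _ in range(n2)])
    return (s1_sq / sigma1**2) / (s2_sq / sigma2**2)

f_values = [f_statistic() for _ in range(trials)]
d1, d2 = n1 - 1, n2 - 1

# F ~ F(d1, d2) has mean d2 / (d2 - 2).
print(statistics.fmean(f_values), d2 / (d2 - 2))

# 1/F ~ F(d2, d1) has mean d1 / (d1 - 2).
inverse_values = [1 / f for f in f_values]
print(statistics.fmean(inverse_values), d1 / (d1 - 2))
```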