$$
\newcommand{\theorem}{\textbf{Theorem: }}
\newcommand{\proof}{\textbf{Proof: }}
$$

# Estimation of normal distribution
Suppose that some characteristics of the elements in the population can be represented by a random variable $X$ whose probability function is $f_X(x;\theta)$.
This could be the height of people in a city or salary of employees, *etc*.
We know the family that the probability function belongs to, but are not sure of some **unknown parameter** $\theta$.

Now suppose that we've obtained some values $x_1, \dots, x_n$ of a random sample $X_1, \dots, X_n$ from the population.
From these values, we wish to estimate the value of $\theta$.

## Estimation
Estimation can be performed in two ways: point estimation and interval estimation.

### Point estimation
A **point estimate** of $\theta$ is the value of a statistic computed from the sample,
$$
\hat \theta = \hat \Theta (X_1, \dots, X_n)
$$

where $\hat \Theta$ is some function of the sample, called the **point estimator**.

Note that a statistic is a **function of the random sample** and **does not depend on any unknown parameter**.

For example, $\bar X = \frac{1}{n}\sum X_i$ and $X_{max} = \max(X_1, \dots, X_n)$ are both statistics, as they involve no unknown parameters.

However, $W = \frac{1}{n}\sum(X_i - \mu)^2$ is not a statistic if $\mu$ is unknown, but is one if $\mu$ is known.

Suppose $\mu$ is the population mean.
Then $\bar X$ is a **(point) estimator** of $\mu$.
The value of $\bar X$, denoted as $\bar x$, is an estimate of $\mu$.

### Interval estimation
An **interval estimation** is defined by two statistics, $\hat \Theta _L, \hat \Theta _U$, where $\hat \Theta _L < \hat \Theta _U$, such that $(\hat \Theta _L, \hat \Theta _U)$ is an interval where the probability of it containing $\theta$ can be determined.

Suppose that $\sigma$ is known.
Then we can set
$$
\hat \Theta _L = \bar X - 2 \sigma \quad \hat \Theta _U = \bar X + 2 \sigma
$$

then $(\hat \Theta _L, \hat \Theta _U)$ is an interval estimator for $\mu$.

## Unbiased estimator
An **unbiased estimator** $\hat \Theta$ of $\theta$ satisfies
$$
E(\hat \Theta) = \theta
$$

**Example**

$\bar X$ is an unbiased estimator of $\mu$ as $E(\bar X) = \mu$.

$S^2=\frac{1}{n-1}\sum(X_i - \bar X)^2$ is an unbiased estimator of $\sigma^2$.
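
The unbiasedness of $S^2$ can be checked empirically. The following is a minimal simulation sketch, assuming a normal population with $\sigma = 2$ and samples of size $5$ (both values, the seed, and the function name are illustrative choices, not part of the notes above). It averages the $(n-1)$-divisor estimator and the $n$-divisor estimator over many samples.

```python
import random

def simulate_variance_estimators(sigma=2.0, n=5, trials=20000, seed=0):
    """Average the (n-1)-divisor and n-divisor variance estimators over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)  # S^2 as defined above
        total_biased += ss / n          # divides by n instead
    return total_unbiased / trials, total_biased / trials

avg_s2, avg_biased = simulate_variance_estimators()
print(f"mean of S^2 (n-1 divisor): {avg_s2:.3f}")
print(f"mean with n divisor:       {avg_biased:.3f}")
```

Under these assumptions the first average should land near $\sigma^2 = 4$, while the second should land near $\frac{n-1}{n}\sigma^2 = 3.2$, which is why the $n-1$ divisor is used.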

## Interval estimation
An interval estimate of $\theta$ gives us a bound on $\theta$
$$
\hat \theta _L < \theta < \hat \theta_U
$$

where $\hat \theta _L, \hat \theta _U$ depend on
* the value of $\hat \Theta$ for a particular sample
* the sampling distribution of $\hat \Theta$

Note that since $\hat \theta _L, \hat \theta _U$ both depend on the sample, these values change from sample to sample.
Hence, **it is possible that $\theta$ is not contained within the interval**.

Instead, we are interested in the random interval $(\hat \Theta _L, \hat \Theta _U)$, which contains $\theta$ with a probability of $1-\alpha$.

Formally, we look for statistics $\hat \Theta _L, \hat \Theta _U$ such that
$$
Pr(\hat \Theta _L < \theta < \hat \Theta _U) = 1 - \alpha
$$

Then, the interval $(\hat \theta _L, \hat \theta _U)$ computed from the observed sample is called a $(1-\alpha)100\%$ **confidence interval** for $\theta$.

The fraction $(1-\alpha)$ is the **confidence coefficient/degree of confidence**.
The endpoints $\hat \theta _L, \hat \theta _U$ are called the **lower/upper confidence limits** respectively.

If many samples of size $n$ are taken and an interval is computed from each, we expect $(1 - \alpha) 100 \%$ of the intervals to contain $\theta$.
In other words, we have a confidence of $(1-\alpha)100\%$ that our interval covers $\theta$.

For example, when $\alpha = 0.05$, we have 95\% confidence that our interval will cover $\theta$, and we expect only 5\% of such intervals not to contain $\theta$.
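
This repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with $\mu = 5$, known $\sigma = 0.3$ and $n = 25$ (illustrative values), and uses the known-variance interval $\bar x \pm z_{\alpha/2}\,\sigma/\sqrt n$ from the known-variance section of this notebook. The function name and seed are hypothetical.

```python
import random
from statistics import NormalDist

def estimate_coverage(mu=5.0, sigma=0.3, n=25, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1  # this sample's interval covers the true mean
    return hits / trials

coverage = estimate_coverage()
print(f"empirical coverage: {coverage:.3f}")
```

The empirical coverage should be close to $1 - \alpha = 0.95$; any single interval either contains $\mu$ or does not.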

## Confidence interval for mean
### Known variance <span id="known-variance"/>
Suppose that
1. population variance is known
2. population is normal or $n \geq 30$

#### Confidence interval <span id="confidence-interval"/>

By assumption (2), either exactly (normal population) or approximately via the [central limit theorem]() (large $n$), we have
$$
\bar X \sim N(\mu, \frac{\sigma^2}{n})
$$
and
$$
Z = \frac{\bar X - \mu}{\sigma / \sqrt n} \sim N(0, 1)
$$


Hence, by definition of normal distribution
$$
Pr(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma / \sqrt n} < z_{\alpha/2}) = 1 -\alpha
$$

where $z_{\alpha/2}$ is the value satisfying $Pr(Z > z_{\alpha/2}) = \alpha/2$.

Rearranging, we get
$$
Pr\left(\bar X - z_{\alpha/2}\left(\frac{\sigma}{\sqrt n}\right) < \mu < \bar X + z_{\alpha/2}\left(\frac{\sigma}{\sqrt n}\right)\right) = 1 - \alpha
$$

which is the confidence interval we desire.
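
As a minimal sketch of this formula, the function below computes the interval from summary statistics using only the Python standard library. The function name and the input values ($\bar x = 50$, $\sigma = 4$, $n = 64$) are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Return the (1 - alpha) confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: Pr(Z > z) = alpha/2
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

# Hypothetical summary statistics: xbar = 50, sigma = 4, n = 64.
lower, upper = z_interval(50.0, 4.0, 64)
print(f"{lower:.3f} < mu < {upper:.3f}")
```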


#### Finding sample size

Now suppose that we wish to determine how large a sample we should obtain before finding $\bar X$.

Most of the time $\bar X$ will not be exactly equal to $\mu$, so the point estimate is in error.
The size of this error is $|\bar X - \mu|$.

Rearranging the previous interval, we get
$$
Pr\left(- z_{\alpha/2}\left(\frac{\sigma}{\sqrt n}\right) < \bar X  - \mu < z_{\alpha/2}\left(\frac{\sigma}{\sqrt n}\right)\right) = 1 - \alpha \\
\Rightarrow
Pr\left(|\bar X - \mu| < z_{\alpha/2} \frac{\sigma}{\sqrt n} \right) = 1 - \alpha
$$

We wish to say that 
$$
Pr(|\bar X - \mu| \leq e) \geq 1 - \alpha
$$
where $e$ is our **margin of error**.

Comparing with the previous equation, we get
$$
e \geq z _{\alpha/2}\frac{\sigma}{\sqrt n}
$$
Therefore, for a given margin of error $e$, the sample size is given by
$$
n \geq \left(z _{\alpha/2} \frac{\sigma}{e}\right)^2
$$

That is, we need a sample of at least size $(z_{\alpha/2} \frac{\sigma}{e})^2$ in order to obtain a confidence interval with at most a margin of error $e$.
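
The bound on $n$ is easy to turn into a small helper. This is a sketch under the same assumptions as above (known $\sigma$, normal sampling distribution of $\bar X$). The function name and the inputs $\sigma = 4$, $e = 1$ are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, alpha=0.05):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)  # round up: n must be an integer

# Hypothetical inputs: sigma = 4, margin of error e = 1, 95% confidence.
n_needed = required_sample_size(sigma=4.0, margin=1.0)
print(n_needed)
```

Rounding up (rather than to the nearest integer) keeps the margin of error at or below $e$.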

**Example**
Suppose that the mean weight of a random sample of 25 rice bags from a manufacturer is 5kg.
It is known that $\sigma = 0.3$kg.

Suppose that we want to find the 95\% confidence interval for the mean of all the rice bags produced by the manufacturer.


The 95\% interval is simply
$$
\bar X - z_{0.025}\left(\frac{\sigma}{\sqrt n}\right) < \mu < \bar X + z_{0.025}\left(\frac{\sigma}{\sqrt n}\right)
\\ \Rightarrow
5 - 1.96\frac{0.3}{\sqrt {25}}<  \mu < 5 + 1.96 \frac{0.3}{\sqrt {25}}
\\ \Rightarrow
4.8824 < \mu < 5.1176
$$

Now, suppose that we wish to choose a sample size large enough to be 95\% confident that our estimate of $\mu$ is off by less than 0.1kg.
Thus, our new sample size has to be
$$
n \geq \left(z_{\alpha/2} \frac{\sigma}{e}\right)^2 = \left(\frac{1.96 \cdot 0.3}{0.1}\right)^2  = 34.5744
$$

Hence, we need a sample size of at least $35$ in order to achieve this margin of error.

---

### Unknown variance <span id="mean-unknown-variance"/>
Suppose that
1. Population variance is unknown
2. Population is normal/approximately normal
3. sample size is small

Consider
$$
T = \frac{\bar X-\mu}{S /\sqrt n}
$$
where $S^2$ is the sample variance.
We [know](./sampling.ipynb#t-dist) that $T \sim t_{n-1}$.

Hence
$$
Pr(-t_{n-1;\alpha/2} < T < t_{n-1;\alpha/2}) = 1-\alpha
\\ \Rightarrow
Pr(-t_{n-1;\alpha/2} < \frac{\bar X-\mu}{S /\sqrt n} < t_{n-1;\alpha/2}) = 1-\alpha
\\ \Rightarrow
Pr\left(\bar X-t_{n-1;\alpha/2}\left(\frac{S}{\sqrt n}\right) < \mu < \bar X + t_{n-1;\alpha/2}\left(\frac{S}{\sqrt n}\right)\right) = 1-\alpha
$$

which is the confidence interval that we desire.

Thus, if the assumptions hold, we can obtain an estimate on the mean by using the sample mean and sample variance.
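
A minimal sketch of the $t$-based interval follows. The Python standard library has no $t$ quantile function, so the critical value is passed in as an argument. Here $t_{6;0.025} \approx 2.447$ is read from a standard $t$ table. The data and the function name are hypothetical.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Confidence interval for mu with unknown variance, given t_{n-1;alpha/2}."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical weights (kg) of 7 bags; t_{6;0.025} is about 2.447 (table value).
weights = [4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 4.7]
t_lower, t_upper = t_interval(weights, t_crit=2.447)
print(f"{t_lower:.4f} < mu < {t_upper:.4f}")
```

If a numerical library is available, the table look-up can be replaced by that library's $t$ quantile function; it is kept as a parameter here to avoid assuming one.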

For large $n$ $(n \geq 30)$, the $t$-distribution is approximately $N(0,1)$.
Hence, the interval simplifies to

$$
Pr\left(\bar X - z_{\alpha/2}\left(\frac{S}{\sqrt n}\right) < \mu < \bar X + z_{\alpha/2}\left(\frac{S}{\sqrt n}\right)\right) = 1 - \alpha
$$

Notice the similarity to the [known variance case](#known-variance).
Intuitively, $S$ is a close approximation of $\sigma$ when $n$ is large.

<span hidden> TODO: Add example </span>

## Confidence intervals for difference of two means
Suppose that we have two populations with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$.
Then 
$$
\bar X_1 - \bar X_2
$$
is a point estimator of $\mu_1 - \mu_2$.

### Known variance <span id="diff-means-known-variance"/>
If 
1. $\sigma_1^2, \sigma_2^2$ are known and not equal
2. populations are normal or $n_1,n_2 \geq 30$

We [know](./sampling.ipynb#mean-diff) that 
$$
\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}\right)
$$

Standardizing, we get
$$
Pr\left(-z_{\alpha/2} <  \left((\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)\right) \frac{1}{\sqrt{ \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}} < z_{\alpha/2}\right) = 1-\alpha
$$

Hence our confidence interval is
$$
(\bar X_1 - \bar X_2) -z_{\alpha/2}\sqrt{ \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}} < \mu_1 - \mu_2 < (\bar X_1 - \bar X_2) + z_{\alpha/2} \sqrt{ \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}
$$
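
The interval above can be sketched as a short function. The function name and all input values are hypothetical, chosen so that the standard error is easy to check by hand: $\sigma_1^2/n_1 = 9/36 = 0.25$ and $\sigma_2^2/n_2 = 16/64 = 0.25$.

```python
import math
from statistics import NormalDist

def diff_means_z_interval(xbar1, xbar2, var1, n1, var2, n2, alpha=0.05):
    """Interval for mu_1 - mu_2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = math.sqrt(var1 / n1 + var2 / n2)  # sd of Xbar_1 - Xbar_2
    diff = xbar1 - xbar2
    return diff - z * std_err, diff + z * std_err

# Hypothetical inputs: xbar1 = 10, xbar2 = 8, sigma1^2 = 9, sigma2^2 = 16.
d_lo, d_hi = diff_means_z_interval(10.0, 8.0, 9.0, 36, 16.0, 64)
print(f"{d_lo:.4f} < mu1 - mu2 < {d_hi:.4f}")
```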

### Large sample with unknown variance <span id="diff-mean-large-n-unknown-variance"/>
If 
1. $\sigma_1^2, \sigma_2^2$ are unknown
2. $n_1, n_2 \geq 30$

then we can replace $\sigma_1^2, \sigma_2^2$ by their estimates $S_1^2, S_2^2$ to obtain our confidence interval.

Hence our confidence interval is
$$
(\bar X_1 - \bar X_2) -z_{\alpha/2}\sqrt{ \frac{S^2_1}{n_1} + \frac{S^2_2}{n_2}} < \mu_1 - \mu_2 < (\bar X_1 - \bar X_2) + z_{\alpha/2} \sqrt{ \frac{S^2_1}{n_1} + \frac{S^2_2}{n_2}}
$$

### Unknown but equal variance <span id="diff-means-unknown-equal-variance"/>
If 
1. $\sigma_1^2, \sigma_2^2$ are unknown **but equal**
2. populations are normal

#### Small sample size
Suppose additionally that $n_1, n_2 < 30$.

Let $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then
$$
\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2, \sigma^2 \left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right)
$$

Standardizing, we get
$$
Z = \frac{ \bar X_1 - \bar X_2  - (\mu_1 - \mu_2)}{\sigma \sqrt{ \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim N(0, 1)
$$

Since the populations are normal with the same variance, then 
$$
\frac{(n_1-1)S_1^2}{\sigma^2} \sim \chi^2_{n_1-1} \quad 
\frac{(n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_2-1}  
$$

Therefore, since the two samples are independent, by the [property of chi-square distribution](./sampling.ipynb#chi-square-sum-prop),
$$
U = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}
$$

To estimate $\sigma^2$, we can obtain the pooled sample variance, defined as
$$
S_p^2 = \frac{(n_1-1)S^2_1 + (n_2-1)S_2^2}{n_1+n_2-2}
$$

Since $Z \sim N(0,1)$ and the chi-square variable above (call it $U$) are independent, we can form a [*t*-distribution](./sampling.ipynb#t-dist).

Setting $n = n_1 + n_2 - 2$,
$$
T = \frac{Z}{\sqrt{U/n}}
= \frac{ \bar X_1 - \bar X_2  - (\mu_1 - \mu_2)}{\sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim t_{n_1 + n_2 -2}
$$

<details>
    <summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
$$
\begin{align}
T &= \frac{Z}{\sqrt{U/n}} \\
&= Z\frac{1}{\sqrt{U/(n_1+n_2-2)}} \\
&= \frac{ \bar X_1 - \bar X_2  - (\mu_1 - \mu_2)}{\sigma \sqrt{ \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}
\frac{1}{\sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \frac{1}{n_1+n_2-2}}} \\
&= \frac{ \bar X_1 - \bar X_2  - (\mu_1 - \mu_2)}{\sqrt{ \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}
\frac{1}{\sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}}} \\
&= \frac{ \bar X_1 - \bar X_2  - (\mu_1 - \mu_2)}{\sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim t_{n_1 + n_2 -2}
\end{align} 
$$
    </div>
</details>

Finally, our confidence interval is simply

$$
\bar X_1 - \bar X_2 - t_{n_1 + n_2 -2;\alpha/2} \sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + t_{n_1 + n_2 -2;\alpha/2} \sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} 
$$
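
A minimal sketch of the pooled interval is given below. As the standard library has no $t$ quantile function, the critical value is supplied by the caller. Here $t_{20;0.025} \approx 2.086$ is taken from a standard $t$ table. All summary statistics and function names are hypothetical.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled sample variance S_p^2 for two samples with a common variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, xbar2, s1_sq, n1, s2_sq, n2, t_crit):
    """Interval for mu_1 - mu_2 given t_{n1+n2-2;alpha/2} as t_crit."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

# Hypothetical summary statistics: n1 = 10, s1^2 = 4; n2 = 12, s2^2 = 9.
sp_sq = pooled_variance(4.0, 10, 9.0, 12)
p_lower, p_upper = pooled_t_interval(20.0, 17.0, 4.0, 10, 9.0, 12, t_crit=2.086)
print(f"S_p^2 = {sp_sq}")
print(f"{p_lower:.4f} < mu1 - mu2 < {p_upper:.4f}")
```

Note that $S_p^2$ is a weighted average of $S_1^2$ and $S_2^2$, with weights given by their degrees of freedom.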

#### Large sample size
Suppose additionally that $n_1, n_2 \geq 30$.

It is known that the $t$-distribution begins to resemble a standard normal distribution at large $n$.
Hence, we can replace the $t$ distribution with a $z$ distribution, thus getting


$$
\bar X_1 - \bar X_2 - z_{\alpha/2} \sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} < \mu_1 - \mu_2 < \bar X_1 - \bar X_2 + z_{\alpha/2} \sqrt{ S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} 
$$

<span hidden> TODO: Add example </span>

### Paired data <span id="diff-means-paired-data"/>
Suppose that we have paired data, that is, the two samples are dependent.
For example, we could obtain the weight of individuals before $(x_i)$ and after $(y_i)$ the completion of some exercise regime.
This forms a pair of related observations for each individual.
Hence, to determine the efficacy of the regime, we compute the difference of paired observations, $d_i = x_i - y_i$.

By modeling the reduction in weight as a normal distribution, we assume that in the sample $d_1, \dots, d_n$, each person's reduction in weight is drawn from a normal distribution with mean $\mu _D$ and some unknown variance $\sigma _D^2$.

We know that $\mu_D = \mu_1 - \mu_2$, and thus the point estimate of $\mu_D$ is given by
$$
\bar d = \frac{1}{n} \sum d_i = \frac{1}{n} \sum (x_i - y_i)
$$

And the point estimate of $\sigma_D^2$ is 
$$
s _D ^2 = \frac{1}{n-1}\sum(d_i - \bar d)^2
$$

By a construction similar to the [unknown variance case](#mean-unknown-variance), we can set
$$
T = \frac{\bar d - \mu _D} {s_D / \sqrt n} \sim t_{n-1}
$$

#### Small, normal sample

For $n < 30$ and an approximately normal population, we get the following confidence interval


$$
\bar d - t_{n-1;\alpha/2} \left(\frac{s_D}{\sqrt n}\right) < 
\mu_D < 
\bar d + t_{n-1;\alpha/2} \left(\frac{s_D}{\sqrt n}\right)
$$
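
The paired interval can be sketched as follows. The before/after weights are made-up illustrative data, the function name is hypothetical, and the critical value $t_{4;0.025} \approx 2.776$ is taken from a standard $t$ table and passed in by the caller.

```python
import math
import statistics

def paired_t_interval(before, after, t_crit):
    """Interval for mu_D from paired observations, given t_{n-1;alpha/2}."""
    diffs = [x - y for x, y in zip(before, after)]  # d_i = x_i - y_i
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # uses the n - 1 divisor
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Hypothetical weights (kg) of 5 people before and after a regime.
before = [80, 75, 90, 68, 85]
after = [78, 74, 86, 67, 81]
pd_lower, pd_upper = paired_t_interval(before, after, t_crit=2.776)
print(f"{pd_lower:.4f} < mu_D < {pd_upper:.4f}")
```

Only the differences enter the computation, so the dependence between the two measurements of each individual does not need to be modeled separately.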

#### Large sample

For $n \geq 30$, we can once again replace the $t$-distribution with a standard normal.


$$
\bar d - z_{\alpha/2} \left(\frac{s_D}{\sqrt n}\right) < 
\mu_D < 
\bar d + z_{\alpha/2} \left(\frac{s_D}{\sqrt n}\right)
$$

## Confidence interval for variance

### Normal population 
Let $X_1, \dots, X_n$ be a random sample from an (approximately) normal distribution.

Then a point estimate of $\sigma^2$ is the sample variance, defined as
$$
S^2 = \frac{1}{n-1}\sum(X_i- \bar X)^2 = \frac{1}{n-1}\left(\sum X_i^2 - n\bar X^2\right)
$$

#### Known mean
Suppose that $\mu$ is known.

Then we have 
$$
\frac{X_i - \mu}{\sigma} \sim N(0,1)\\ \Rightarrow
\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(1)\\ \Rightarrow
\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)
$$

Hence, 
$$
Pr\left( \chi^2_{n;1-\alpha/2} < \sum \left(\frac{X_i - \mu}{\sigma}\right)^2 < \chi^2_{n;\alpha/2}\right) = 1- \alpha
$$

And we obtain our interval
$$
\frac{\sum (X_i - \mu)^2}{\chi^2_{n;\alpha/2}} < \sigma^2 < \frac{\sum (X_i - \mu)^2}{\chi^2_{n;1-\alpha/2}}
$$

#### Unknown mean
Suppose that $\mu$ is unknown.
Then we can instead look at
$$
\frac{(n-1)S^2}{\sigma^2} = \sum \frac{(X_i - \bar X)^2}{\sigma^2} \sim \chi^2(n-1)
$$

Note that the above result is true for both small and large $n$.

Hence
$$
Pr\left(\chi^2_{n-1;1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} <  \chi^2_{n-1;\alpha/2}\right) = 1 - \alpha
$$

And our interval is
$$
\frac{(n-1)S^2}{\chi^2_{n-1;\alpha/2}} <   \sigma^2< \frac{(n-1)S^2}{\chi^2_{n-1;1-\alpha/2}} 
$$

Note that the degrees of freedom of the chi-square distribution are reduced from $n$ to $n-1$ when $\mu$ is unknown.
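
A minimal sketch of the unknown-mean interval is shown below. The chi-square critical values are supplied by the caller because the standard library has no chi-square quantile function. Here $\chi^2_{9;0.025} \approx 19.023$ and $\chi^2_{9;0.975} \approx 2.700$ come from a standard table. The sample variance $s^2 = 0.286$ with $n = 10$ and the function name are hypothetical.

```python
def variance_interval(s_sq, n, chi2_upper, chi2_lower):
    """Interval for sigma^2 with unknown mean.

    chi2_upper is chi^2_{n-1;alpha/2} (the larger critical value) and
    chi2_lower is chi^2_{n-1;1-alpha/2} (the smaller critical value).
    """
    scaled = (n - 1) * s_sq
    # Dividing by the larger critical value gives the lower limit.
    return scaled / chi2_upper, scaled / chi2_lower

# Hypothetical inputs: s^2 = 0.286 from n = 10 observations, 95% confidence.
v_lower, v_upper = variance_interval(0.286, 10, chi2_upper=19.023, chi2_lower=2.700)
print(f"{v_lower:.4f} < sigma^2 < {v_upper:.4f}")
```

Unlike the intervals for the mean, this interval is not symmetric about $s^2$, because the chi-square distribution is skewed.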

## Confidence interval for ratio of variance <span id = "ratio-variance"/>
Suppose that independent samples of sizes $n_1, n_2$ are drawn from two normal populations whose means are unknown.

Then
$$
U = \frac{(n_1-1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1-1) \quad 
V = \frac{(n_2-1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2-1)
$$

Then 
$$
F = \frac{U/(n_1-1)}{V/(n_2-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 -1)
$$

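Since
$$
Pr\left(F_{n_1-1, n_2-1; 1-\alpha/2} < \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} < F_{n_1-1, n_2-1; \alpha/2}\right) = 1 - \alpha
$$
and $1/F_{n_1-1, n_2-1; 1-\alpha/2} = F_{n_2-1, n_1-1; \alpha/2}$, rearranging for $\sigma_1^2/\sigma_2^2$ gives our interval
$$
\frac{S_1^2}{S_2^2} \frac{1}{F_{n_1-1, n_2-1; \alpha/2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2} F_{n_2-1, n_1-1; \alpha/2}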
$$
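
As a rough sketch, the interval for the ratio of variances can be computed from the two sample variances and two upper-tail $F$ critical values, using the reciprocal identity $1/F_{n_1-1, n_2-1; 1-\alpha/2} = F_{n_2-1, n_1-1; \alpha/2}$ so that only upper-tail table values are needed. The critical values are passed in by the caller. In this hypothetical example $n_1 = n_2 = 10$, so both equal $F_{9,9;0.025} \approx 4.03$ (a table look-up). The sample variances and the function name are made up for illustration.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_upper_12, f_upper_21):
    """Interval for sigma_1^2 / sigma_2^2.

    f_upper_12 is F_{n1-1, n2-1; alpha/2} and
    f_upper_21 is F_{n2-1, n1-1; alpha/2} (note the swapped degrees of freedom).
    """
    ratio = s1_sq / s2_sq
    # Lower limit divides by f_upper_12; upper limit multiplies by f_upper_21.
    return ratio / f_upper_12, ratio * f_upper_21

# Hypothetical inputs: s1^2 = 8, s2^2 = 2, n1 = n2 = 10, 95% confidence.
r_lower, r_upper = variance_ratio_interval(8.0, 2.0, f_upper_12=4.03, f_upper_21=4.03)
print(f"{r_lower:.4f} < sigma1^2/sigma2^2 < {r_upper:.4f}")
```

If the resulting interval contains $1$, the data are consistent with equal variances at the chosen confidence level.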