# Central Limit Theorem

## The Central Limit Theorem

The Central Limit Theorem (CLT) states that, for independent random samples drawn from a population with finite variance, the distribution of the *sample means* approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution. A common rule of thumb treats n > 30 as sufficiently large, although strongly skewed populations may need larger samples. If the population is itself normally distributed, the sample means are exactly normally distributed for any sample size.
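A small simulation makes this concrete. The sketch below assumes a hypothetical exponential population (mean 1, skewness 2), which is far from normal. It draws many samples of each size and measures how skewed the resulting sample means are. The skewness should shrink toward 0, the value for a normal distribution, as n grows.

```python
import numpy as np

# Assumed population for illustration: exponential with scale 1
# (mean 1, standard deviation 1, skewness 2), i.e. strongly right-skewed.
rng = np.random.default_rng(seed=0)


def sample_means(n, num_samples=10_000):
    """Draw num_samples independent samples of size n and return their means."""
    samples = rng.exponential(scale=1.0, size=(num_samples, n))
    return samples.mean(axis=1)


def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution such as the normal."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


for n in (1, 5, 30, 100):
    means = sample_means(n)
    # Theory for this population: the skewness of the sample mean is 2 / sqrt(n).
    print(f"n={n:>3}  mean of means={means.mean():.3f}  skewness={skewness(means):.3f}")
```

The mean of the sample means stays near the population mean of 1 for every n. The skewness falls roughly like $2/\sqrt{n}$.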

## Estimating the Population Mean

The sample mean serves as an estimate of the population mean, but the sampling distribution of $\overline{x}$ is narrower than the population itself. Its mean equals $\mu$, while its standard deviation ($\sigma_{\overline{x}}$) is smaller than the population standard deviation ($\sigma$) whenever $n > 1$.

## The Standard Error

The standard deviation of the sampling distribution ($\sigma_{\overline{x}}$) is termed the **standard error** and is linked to the population standard deviation by the formula:

$$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$

Here, $\sigma$ represents the population standard deviation, and $n$ denotes the sample size. 
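The formula can be checked numerically. This sketch assumes a hypothetical normal population with $\mu = 50$, $\sigma = 10$ and samples of size $n = 25$, so the formula predicts $\sigma_{\overline{x}} = 10/\sqrt{25} = 2$. The simulated standard deviation of the sample means should land close to that value.

```python
import numpy as np

# Assumed population for illustration: normal with mu = 50, sigma = 10.
mu, sigma, n = 50.0, 10.0, 25
rng = np.random.default_rng(seed=1)

# Draw 20,000 samples of size n and record each sample's mean.
means = rng.normal(loc=mu, scale=sigma, size=(20_000, n)).mean(axis=1)

empirical_se = means.std(ddof=1)      # spread of the simulated sampling distribution
theoretical_se = sigma / np.sqrt(n)   # sigma / sqrt(n) = 10 / 5 = 2.0

print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```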

As the sample size increases, the standard error diminishes toward 0, and the sample mean ($\overline{x}$) converges towards the population mean ($\mu$).
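The standard error is proportional to $1/\sqrt{n}$, so quadrupling the sample size halves it. In the sketch below, `standard_error` is a hypothetical helper introduced for illustration and $\sigma = 10$ is an assumed value.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 10.0  # assumed population standard deviation
for n in (1, 4, 16, 64, 256):
    print(f"n={n:>3}  SE={standard_error(sigma, n):.3f}")
# n=  1  SE=10.000
# n=  4  SE=5.000
# n= 16  SE=2.500
# n= 64  SE=1.250
# n=256  SE=0.625
```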


## Estimates and Confidence Intervals

### Point Estimate

A point estimate is a single statistic calculated from a sample used to estimate an unknown population parameter. For instance, the sample mean can serve as a point estimate for the population mean.
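Using the grades from the example at the end of this section, the point estimates of the population mean and standard deviation are the corresponding sample statistics. `ddof=1` divides by $n-1$, which gives the usual sample standard deviation $s$.

```python
import numpy as np

grades = np.array([3.1, 2.9, 3.2, 3.4, 3.7, 3.9, 3.9, 2.8, 3.4, 3.6])

mean_estimate = grades.mean()      # point estimate of mu
sd_estimate = grades.std(ddof=1)   # point estimate of sigma (divides by n - 1)

print(f"x-bar = {mean_estimate:.2f}, s = {sd_estimate:.3f}")
# x-bar = 3.39, s = 0.390
```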

### Interval Estimate

An interval estimate is a range of values believed to contain the true population parameter. Its width reflects the uncertainty of the estimate: it typically extends a margin of error on either side of a point estimate.

### Confidence Interval

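A confidence interval is an interval estimate for the population mean built so that a specified proportion of such intervals would contain the true mean over repeated sampling. That proportion is the confidence level, for example 95%. It can be calculated as follows: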

#### When the Population Standard Deviation Is Known

For a random sample of size $n$ with mean $\overline{x}$, taken from a population with standard deviation $\sigma$ and mean $\mu$, the confidence interval for the population mean is:

$$\overline{x} - \frac{z\sigma}{\sqrt{n}} \leq \mu \leq \overline{x} + \frac{z\sigma}{\sqrt{n}}$$
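The following is a minimal sketch of the known-$\sigma$ interval, assuming SciPy is available; `z_interval` is a hypothetical helper name. The second part simulates many samples from an assumed normal population ($\mu = 50$, $\sigma = 10$, $n = 20$). It counts how often the interval contains $\mu$, and that fraction should be close to the 95% confidence level.

```python
import numpy as np
from scipy import stats


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / np.sqrt(len(sample))
    mean = np.mean(sample)
    return mean - margin, mean + margin


# Coverage check against an assumed population (mu=50, sigma=10, n=20).
rng = np.random.default_rng(seed=2)
mu, sigma, n, trials = 50.0, 10.0, 20, 10_000
hits = 0
for _ in range(trials):
    low, high = z_interval(rng.normal(mu, sigma, size=n), sigma)
    hits += int(low <= mu <= high)
print(f"coverage = {hits / trials:.3f}")  # close to 0.95
```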

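#### When the Population Standard Deviation Is Unknown

When the population standard deviation is unknown, the sample standard deviation ($s$) replaces $\sigma$. Because $s$ is itself estimated from the data, the critical value should come from Student's t distribution with $n-1$ degrees of freedom, $t_{n-1}$, rather than from the standard normal distribution:

$$\overline{x} - \frac{t_{n-1}s}{\sqrt{n}} \leq \mu \leq \overline{x} + \frac{t_{n-1}s}{\sqrt{n}}$$

For large samples $t_{n-1}$ is close to $z$, so using $z$ in its place is a good approximation. For small samples, such as the example below, the t value should be used.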

Here:  
$\overline{x}$: Sample mean.  
$n$: Sample size.  
$\mu$: Population mean (the parameter we are estimating).  
$\sigma$: Population standard deviation (assumed known in the first formula).  
$s$: Sample standard deviation (used as an estimate of the population standard deviation when it's unknown).  
$z$: The critical value from the standard normal distribution for the desired confidence level (e.g., approximately 1.96 for a 95% confidence level). Multiplied by the standard error, it gives the margin of error.
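Critical values come from the inverse CDF (percent point function) of the relevant distribution. This sketch assumes SciPy. It computes two-sided $z$ values for common confidence levels and then compares them with the 95% $t$ values for $n-1$ degrees of freedom. `critical_z` is a hypothetical helper name.

```python
from scipy import stats


def critical_z(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return stats.norm.ppf(1 - (1 - confidence) / 2)


for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z = {critical_z(confidence):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576

# For small samples the t critical value (n - 1 degrees of freedom) is larger than z,
# which widens the interval; it approaches z as n grows.
for n in (5, 10, 30, 100, 1000):
    print(f"n={n:>4}: t = {stats.t.ppf(0.975, df=n - 1):.3f}")
```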

### Example

Suppose we have the grades of 10 students drawn from a population, and we want to compute a 95% confidence interval for the population mean.


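```python
import numpy as np
import scipy.stats as stats

# Grades of 10 students sampled from the population
grades = np.array([3.1, 2.9, 3.2, 3.4, 3.7, 3.9, 3.9, 2.8, 3.4, 3.6])

# 95% t-based confidence interval for the population mean:
# confidence level, degrees of freedom (n - 1), centre (sample mean), scale (standard error)
stats.t.interval(0.95, len(grades) - 1, loc=np.mean(grades), scale=stats.sem(grades))
```

```
(3.1110006165952773, 3.668999383404722)
```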

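The arguments to `stats.t.interval` are the confidence level (0.95), the degrees of freedom (n − 1 = 9), the centre of the interval (`loc`, the sample mean) and the scale (`scale`, the standard error computed by `stats.sem`, which uses the sample standard deviation with `ddof=1`). The t distribution is used because $\sigma$ is unknown and the sample is small. The resulting 95% confidence interval for the population mean is roughly 3.11 to 3.67.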
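To see what `stats.t.interval` does, the same interval can be assembled by hand as $\overline{x} \pm t_{n-1}\, s/\sqrt{n}$. This sketch is only for checking the arithmetic; it is not meant to replace the library call.

```python
import numpy as np
from scipy import stats

grades = np.array([3.1, 2.9, 3.2, 3.4, 3.7, 3.9, 3.9, 2.8, 3.4, 3.6])
n = len(grades)

mean = grades.mean()                     # sample mean, 3.39
sem = grades.std(ddof=1) / np.sqrt(n)    # standard error, same as stats.sem(grades)
t_crit = stats.t.ppf(0.975, df=n - 1)    # about 2.262 with 9 degrees of freedom
margin = t_crit * sem                    # about 0.279

low, high = mean - margin, mean + margin
print(f"({low:.4f}, {high:.4f})")  # (3.1110, 3.6690)
```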