# Central Limit Theorem

## The Central Limit Theorem

The central limit theorem states that the distribution of *sample means* drawn from a population with finite variance approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution. A common rule of thumb treats n > 30 as sufficiently large, although strongly skewed populations may need larger samples. If the population is itself normal, the sample mean is exactly normally distributed for any sample size.
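
The effect can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the helper names `sample_means` and `skewness`, and the sample counts are assumptions chosen for the demonstration, not part of the original notes. It draws repeated samples from a strongly right-skewed population and reports the skewness of the resulting sample means, which should shrink toward 0 (the value for a symmetric bell shape) as n grows.

```python
import random
import statistics

def sample_means(n, num_samples=5000, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, strongly right-skewed) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

for n in (1, 5, 30, 100):
    means = sample_means(n)
    print(f"n={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
          f"skewness={skewness(means):.3f}")
```

With n = 1 the "sample means" are just the raw skewed observations; by n = 100 the skewness should be much closer to 0, while the average of the sample means stays near the population mean of 1.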

## Estimating the Population Mean

The sample mean ($\overline{x}$) is used as an estimate of the population mean ($\mu$). Note, however, that the standard deviation of the sampling distribution of the mean ($\sigma_{\overline{x}}$) is not equal to the population standard deviation ($\sigma$); it is smaller for any sample size greater than 1.

## The Standard Error

The standard deviation of the sampling distribution ($\sigma_{\overline{x}}$) is referred to as the **standard error** and is related to the population standard deviation as follows:

$$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the population standard deviation and $n$ is the sample size.

As the sample size increases, the standard error approaches 0, so the sample mean ($\overline{x}$) tends to fall ever closer to the population mean ($\mu$).
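
The $\sigma/\sqrt{n}$ relationship can be checked numerically. The following is a minimal sketch under stated assumptions: the normal population with $\mu = 10$ and $\sigma = 2$, the helper name `empirical_standard_error`, and the number of repetitions are all hypothetical choices made for illustration. It compares the simulated spread of sample means with the formula.

```python
import math
import random
import statistics

def empirical_standard_error(n, sigma=2.0, mu=10.0, num_samples=2000, seed=1):
    """Standard deviation of the means of num_samples normal samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

sigma = 2.0  # assumed population standard deviation
for n in (4, 16, 64, 256):
    theoretical = sigma / math.sqrt(n)
    observed = empirical_standard_error(n, sigma=sigma)
    print(f"n={n:>3}  sigma/sqrt(n)={theoretical:.4f}  simulated={observed:.4f}")
```

Each fourfold increase in n should roughly halve the standard error, which is why gains in precision become more expensive as samples grow.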

## Estimates and Confidence Intervals

### Point Estimate

A point estimate is a single statistic calculated from a sample that is used to estimate an unknown population parameter. For example, the sample mean can be used as a point estimate for the population mean.

### Interval Estimate

An interval estimate is a range of values that is believed to contain the true population parameter. The width of the range conveys the uncertainty in the estimate: a wider interval reflects a less precise estimate.

### Confidence Interval

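A confidence interval is an interval estimate paired with a confidence level, such as 95%: if the sampling procedure were repeated many times, about that fraction of the resulting intervals would contain the population mean. The confidence interval for the population mean can be calculated as follows: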

#### When Population Standard Deviation is Known

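For a random sample of size $n$ with mean $\overline{x}$, taken from a population with standard deviation $\sigma$ and mean $\mu$, the confidence interval for the population mean, where $z$ is the standard normal critical value for the chosen confidence level (for example, $z \approx 1.96$ for 95%), is: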

$$\overline{x}-\frac{z\sigma}{\sqrt{n}}\leq \mu \leq \overline{x}+\frac{z\sigma}{\sqrt{n}}$$
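
The formula translates directly into code. The sketch below uses only the Python standard library; the function name `z_confidence_interval`, the measurement values, and the assumed known $\sigma = 0.25$ are hypothetical and serve only to illustrate the calculation.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(sample)
    x_bar = fmean(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements; sigma is assumed known from prior information
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3]
low, high = z_confidence_interval(sample, sigma=0.25)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The interval is symmetric around the sample mean, and raising the confidence level widens it because the critical value $z$ grows.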

#### When Population Standard Deviation is Unknown

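When the population standard deviation is unknown, the sample standard deviation ($s$) is used in place of $\sigma$. For small samples, $z$ should also be replaced by the critical value of Student's t-distribution with $n-1$ degrees of freedom, as in the example below; for large samples the two critical values are nearly equal and the interval is approximately: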

$$\overline{x}-\frac{zs}{\sqrt{n}}\leq \mu \leq \overline{x}+\frac{zs}{\sqrt{n}}$$
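
Substituting $s$ for $\sigma$ adds uncertainty that the normal critical value does not account for when $n$ is small; a common remedy is to use the Student's t critical value with $n-1$ degrees of freedom instead. The sketch below, which assumes SciPy is installed and uses sample sizes picked purely for illustration, compares the two critical values for a 95% interval.

```python
from scipy import stats

confidence = 0.95
upper_tail = 1 - (1 - confidence) / 2  # 0.975 for a two-sided 95% interval

z_crit = stats.norm.ppf(upper_tail)
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(upper_tail, df=n - 1)
    print(f"n={n:>4}  z={z_crit:.3f}  t={t_crit:.3f}  ratio t/z={t_crit / z_crit:.3f}")
```

The t critical value is always larger than $z$, so the t-based interval is wider; the gap is noticeable for small samples and becomes negligible as n grows.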

### Example

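Suppose we have the grades of 10 students sampled from a population, and we want the 95% confidence interval for the population mean. Because the sample is small and the population standard deviation is unknown, the interval is based on the t-distribution.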

    import numpy as np
    import scipy.stats as stats

    # Grades of 10 students sampled from the population
    grades = np.array([3.1, 2.9, 3.2, 3.4, 3.7, 3.9, 3.9, 2.8, 3.4, 3.6])

    # 95% t-based confidence interval for the population mean
    stats.t.interval(0.95, len(grades) - 1, loc=np.mean(grades), scale=stats.sem(grades))

Output:

    (3.1110006165952773, 3.668999383404722)

The arguments to `stats.t.interval` are the confidence level (0.95), the degrees of freedom ($n-1$), the sample mean, and the standard error computed by `stats.sem`.
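
To see what the library call does, the same interval can be rebuilt step by step. This is a sketch rather than a description of SciPy's internals: the variable names are illustrative, and it assumes the default behaviour of `stats.sem`, which uses the sample standard deviation with $n-1$ in the denominator.

```python
import numpy as np
from scipy import stats

grades = np.array([3.1, 2.9, 3.2, 3.4, 3.7, 3.9, 3.9, 2.8, 3.4, 3.6])

n = len(grades)
x_bar = grades.mean()
s = grades.std(ddof=1)                  # sample standard deviation (n - 1 denominator)
standard_error = s / np.sqrt(n)         # should match stats.sem(grades)
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value

manual = (x_bar - t_crit * standard_error, x_bar + t_crit * standard_error)
library = stats.t.interval(0.95, n - 1, loc=x_bar, scale=stats.sem(grades))

print("manual: ", manual)
print("library:", library)
```

Both tuples should agree up to floating-point rounding, which confirms that the interval is simply $\overline{x} \pm t \cdot s/\sqrt{n}$.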