# Statistics-7 🤝

**Title :** $\textit{Confidence Interval Estimation}$\
**Author :** $\textit{Manideep Bangaru}$ 👨🏻‍💻

$\small \text{on 21/02/2020}$

## Confidence Interval Estimate for the Mean 
### (When $\Large \sigma$ is known)

$\large \bar{X}-Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}+Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$

Here, $\large Z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$ is called the **sampling error**: the amount added to and subtracted from $\bar{X}$ to form the interval.

#### <font color = 'orange'> An Example to Understand </font>
A paper manufacturer has a production process that operates continuously throughout an entire production shift. The paper is expected to have a mean length of 11 inches, and the standard deviation of the length is 0.02 inch. At periodic intervals, a sample is selected to determine whether the mean paper length is still equal to 11 inches or whether something has gone wrong in the production process to change the length of the paper produced. You select a random sample of 100 sheets, and the mean paper length is 10.998 inches. Construct a 95% confidence interval estimate for the population mean paper length.

In [10]:
from scipy.stats import norm
z = norm.ppf(0.025)  # lower-tail critical value for alpha/2 = 0.025 (negative, about -1.96)
print(z)
print(10.998 + z * (0.02 / 100**0.5))  # lower limit, because z is negative
print(10.998 - z * (0.02 / 100**0.5))  # upper limit

-1.9599639845400545
10.994080072030918
11.00191992796908


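Therefore, with 95% confidence, the population mean paper length lies between 10.9941 and 11.0019 inches. Because this interval includes 11 inches, the sample gives no reason to believe that the production process has changed.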
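The same interval can be reached by working with the positive critical value and the sampling error directly. The sketch below uses the numbers from the paper example; the variable names are illustrative, and `norm.interval` is used only as a cross-check on the manual calculation.

```python
from scipy.stats import norm

x_bar = 10.998      # sample mean (inches)
sigma = 0.02        # known population standard deviation (inches)
n = 100             # sample size
confidence = 0.95

standard_error = sigma / n ** 0.5
z_crit = norm.ppf(1 - (1 - confidence) / 2)   # positive critical value Z_{alpha/2}
sampling_error = z_crit * standard_error

lower = x_bar - sampling_error
upper = x_bar + sampling_error
print(round(lower, 4), round(upper, 4))

# Cross-check: SciPy's interval helper, centred on x_bar and scaled by the standard error
check_lower, check_upper = norm.interval(confidence, loc=x_bar, scale=standard_error)
print(round(check_lower, 4), round(check_upper, 4))
```

Both approaches should print the same pair of limits, matching the values obtained above.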

This formula applies only when the population standard deviation ($\sigma$) is known.

### (When $\Large \sigma$ is NOT known)

In most business situations, you don't know $\sigma$, the population standard deviation. Therefore, we will look at a method for constructing a confidence interval estimate of $\mu$ that uses the sample statistic _S_ as an estimate of the population parameter $\sigma$.

#### t - distribution (also called Student's t distribution)
If the variable X is normally distributed, then the following statistic:

$\large t = \frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}}$ ,\
\
has a t distribution with **n-1** degrees of freedom. This expression has the same form as the Z statistic, except that _S_ is used to estimate the unknown $\sigma$.

#### Properties of t - Distribution
The t distribution is very similar in appearance to the standardized normal distribution. Both distributions are symmetrical and bell-shaped, with the mean and the median equal to zero. However, because _S_ is used to estimate the unknown $\sigma$, the values of t are more variable than those for Z. Therefore, the t distribution has more area in the tails and less in the center than does the standardized normal distribution.
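The heavier tails show up in the critical values: for the same confidence level, the t critical value is larger than the Z critical value, and the gap shrinks as the degrees of freedom grow. A small sketch, with the degrees of freedom chosen arbitrarily for illustration:

```python
from scipy.stats import norm, t

alpha = 0.05                          # 95% confidence level
z_crit = norm.ppf(1 - alpha / 2)      # upper-tail Z critical value

degrees_of_freedom = [5, 30, 100, 1000]   # illustrative values only
t_crits = [t.ppf(1 - alpha / 2, df) for df in degrees_of_freedom]

for df, t_crit in zip(degrees_of_freedom, t_crits):
    print(f"df={df:>4}  t={t_crit:.4f}  Z={z_crit:.4f}")
```

Each t value should exceed the Z value and approach it as the degrees of freedom increase.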

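Replacing $\sigma$ with _S_ and the Z critical value with the t critical value for **n-1** degrees of freedom gives the confidence interval estimate for the mean when $\sigma$ is unknown:

$\large \bar{X}-t_{\frac{\alpha}{2}} \frac{S}{\sqrt{n}} \le \mu \le \bar{X}+t_{\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$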
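As a rough illustration of the t-based interval, the sketch below wraps the calculation in a helper. The function name `t_confidence_interval` and the sample figures (25 sheets, sample mean 10.998, sample standard deviation 0.02) are hypothetical and are not taken from the example earlier in this notebook.

```python
from scipy.stats import t

def t_confidence_interval(x_bar, s, n, confidence=0.95):
    """Return (lower, upper) limits for the mean when sigma is unknown."""
    df = n - 1                                      # degrees of freedom
    t_crit = t.ppf(1 - (1 - confidence) / 2, df)    # positive critical value t_{alpha/2}
    sampling_error = t_crit * s / n ** 0.5
    return x_bar - sampling_error, x_bar + sampling_error

# Hypothetical sample summary, for illustration only
lower, upper = t_confidence_interval(x_bar=10.998, s=0.02, n=25)
print(round(lower, 4), round(upper, 4))

# Cross-check with SciPy's interval helper (same df, centre, and scale)
check_lower, check_upper = t.interval(0.95, 24, loc=10.998, scale=0.02 / 25 ** 0.5)
print(round(check_lower, 4), round(check_upper, 4))
```

The helper takes summary statistics rather than raw data, which mirrors how the formula is written.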

### Determining Sample Size
Determining the proper sample size is a complicated procedure, subject to the constraints of budget, time, and the amount of acceptable sampling error.

#### Sample size determination for the Mean
In the confidence interval estimate for the mean, the amount added to or subtracted from $\bar{X}$ is equal to half the width of the interval. This quantity represents the amount of imprecision in the estimate that results from sampling error.

The Sampling error (**_e_**) = $\large Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$

Solving the sampling error equation above for $n$ gives:

$\large n = \frac{Z_{\frac{\alpha}{2}}^2 \sigma^2}{e^2}$
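A minimal sketch of this calculation follows. The helper name `required_sample_size` and the acceptable sampling error of 0.003 inch are assumptions made for illustration; the standard deviation of 0.02 inch comes from the paper example. The result is rounded up, since rounding down would leave the sampling error slightly larger than the target.

```python
import math
from scipy.stats import norm

def required_sample_size(sigma, e, confidence=0.95):
    """Smallest n whose sampling error Z * sigma / sqrt(n) does not exceed e."""
    z_crit = norm.ppf(1 - (1 - confidence) / 2)   # positive critical value Z_{alpha/2}
    n = (z_crit ** 2 * sigma ** 2) / e ** 2
    return math.ceil(n)                           # round up to a whole observation

# sigma from the paper example; e = 0.003 inch is a hypothetical target
n_needed = required_sample_size(sigma=0.02, e=0.003)
print(n_needed)

# Sanity check: the sampling error achieved with n_needed sheets
achieved_error = norm.ppf(0.975) * 0.02 / n_needed ** 0.5
print(round(achieved_error, 5))
```

With these inputs the formula gives roughly 170.7, so the required sample size is 171 sheets.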