<a id="lib"></a>
# 1. Import Libraries

**Let us import the required libraries.**

In [1]:
import scipy.stats as stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

<a id="pt"></a>
## 2.1 Point Estimation

This method uses a single value (a sample statistic) as the estimate of the population parameter.

Let $X_{1}, X_{2}, X_{3},..., X_{n}$ be a random sample drawn from a population with mean $\mu$ and standard deviation $\sigma$. <br>
Point estimation estimates the population mean by the sample mean, $\hat{\mu} = \overline{X}$, and the population standard deviation by the sample standard deviation, $\hat{\sigma} = s$.
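
As a minimal sketch of point estimation, the snippet below computes the sample mean and the sample standard deviation for a handful of values. The numbers in `sample_values` are made up for illustration. The choice `ddof=1` is an assumption: it makes `np.std` divide by $n-1$, which is the usual convention when $s$ is used to estimate $\sigma$.

```python
import numpy as np

# Hypothetical sample (illustrative values, not taken from the text)
sample_values = [12, 15, 11, 18, 14]

x_bar = np.mean(sample_values)       # point estimate of the population mean
s = np.std(sample_values, ddof=1)    # point estimate of the population std (divides by n-1)

print('Point estimate of mu:', x_bar)
print('Point estimate of sigma:', round(s, 4))
```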

<a id="err"></a>
### 2.1.1 Sampling Error

Sampling error is the absolute difference between a sample statistic and the population parameter it estimates. Because a sample contains only part of the population, the mean, median, quantiles, and other statistics calculated on the sample generally differ from the actual population values.

One can reduce the sampling error by increasing the sample size, or by determining a sample size that is large enough for the required precision.
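
The effect of sample size on sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be the integers 10 to 99, and the helper `mean_abs_sampling_error`, the seed, and the number of trials are hypothetical choices introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
population = np.arange(10, 100)      # hypothetical population: the integers 10..99
mu = population.mean()               # population mean

def mean_abs_sampling_error(n, trials=2000):
    """Average |x_bar - mu| over many samples of size n drawn without replacement."""
    errors = [abs(rng.choice(population, size=n, replace=False).mean() - mu)
              for _ in range(trials)]
    return float(np.mean(errors))

avg_errors = {n: mean_abs_sampling_error(n) for n in (5, 25, 60)}
for n, err in avg_errors.items():
    print(f'n = {n:2d}: average sampling error = {err:.2f}')
```

The average error should shrink as `n` grows, although any single sample can still land far from or close to the population mean by chance.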

### Example:

#### 1. An ice-cream vendor recorded the number of ice-creams sold per day for 90 days. A sample containing the sales for 25 days is then drawn (without replacement) from this data. Find the sampling error for the mean.

data = [21, 93, 62, 76, 73, 20, 56, 95, 41, 36, 38, 13, 80, 88, 34, 18, 40, 11, 
        25, 29, 61, 23, 82, 10, 92, 69, 60, 87, 14, 91, 94, 49, 57, 83, 96, 55, 
        79, 52, 59, 39, 58, 17, 19, 98, 15, 54, 48, 46, 72, 45, 65, 28, 37, 30, 
        68, 75, 16, 33, 31, 99, 22, 51, 27, 67, 85, 47, 44, 77, 64, 97, 84, 42, 
        90, 70, 74, 89, 32, 26, 24, 12, 81, 53, 50, 35, 71, 63, 43, 86, 78, 66]
        
sample = [10, 22, 47, 66, 11, 57, 77, 98, 31, 63, 74, 84, 50, 96, 88, 92, 70, 54, 65, 44, 16, 72, 20, 90, 43]


In [3]:
data = [21, 93, 62, 76, 73, 20, 56, 95, 41, 36, 38, 13, 80, 88, 34, 18, 40, 11, 
        25, 29, 61, 23, 82, 10, 92, 69, 60, 87, 14, 91, 94, 49, 57, 83, 96, 55, 
        79, 52, 59, 39, 58, 17, 19, 98, 15, 54, 48, 46, 72, 45, 65, 28, 37, 30, 
        68, 75, 16, 33, 31, 99, 22, 51, 27, 67, 85, 47, 44, 77, 64, 97, 84, 42, 
        90, 70, 74, 89, 32, 26, 24, 12, 81, 53, 50, 35, 71, 63, 43, 86, 78, 66]
        
sample = [10, 22, 47, 66, 11, 57, 77, 98, 31, 63, 74, 84, 50, 96, 88, 92, 70, 54, 
          65, 44, 16, 72, 20, 90, 43]


In [4]:
mu = np.mean(data)  # population_mean
x_bar = np.mean(sample)  # sample_mean
print('Sampling error:', abs(mu-x_bar))

Sampling error: 3.1000000000000014


In [8]:
sigma = np.std(data)  # population standard deviation
n = len(sample)  # sample size
print('Standard error:', sigma/n**0.5)  # standard error of the sample mean

Standard error: 5.195831662656775


<a id="int"></a>
## 2.2 Interval Estimation for Mean

This method gives a range of values in which the population parameter is likely to lie. A confidence interval is a range constructed so that it contains the parameter with a specified level of confidence. It is given by the formula,<br> <p style='text-indent:20em'> `conf_interval = sample statistic ± margin of error`</p>

The uncertainty of an estimate is described by the `confidence level`, which is used to calculate the margin of error.

In [10]:
# Probability that a standard normal value lies within 1 standard deviation of the mean
stats.norm.cdf(1)-stats.norm.cdf(-1)

0.6826894921370859

In [11]:
# Probability that a standard normal value lies within 2 standard deviations of the mean
stats.norm.cdf(2)-stats.norm.cdf(-2)

0.9544997361036416

In [12]:
# Z value with an upper-tail area of 0.05 (the critical value for a 90% confidence level)
stats.norm.isf(0.05)

1.6448536269514729
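
The three cells above connect a confidence level to a Z value. As a hedged sketch (the helper name `z_critical` is an assumption introduced here, not part of scipy), the critical value for a confidence level $1-\alpha$ can be obtained with `stats.norm.isf(alpha/2)`, and the area between $-z$ and $+z$ under the standard normal curve recovers the confidence level:

```python
import scipy.stats as stats

def z_critical(confidence_level):
    """Z value that leaves alpha/2 in the upper tail of the standard normal."""
    alpha = 1 - confidence_level
    return stats.norm.isf(alpha / 2)

for confidence_level in (0.90, 0.95, 0.99):
    z = z_critical(confidence_level)
    central_area = stats.norm.cdf(z) - stats.norm.cdf(-z)   # area between -z and +z
    print(f'{confidence_level:.2f} -> z = {z:.4f}, central area = {central_area:.4f}')
```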

<a id="large"></a>
### 2.2.1 Interval Estimation with Z stat

Here the standard deviation of the population is known. The confidence interval for the population mean with $100(1-\alpha)$% confidence level is given as: $\overline{X} \pm Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$

Where, <br>
$\overline{X}$: Sample mean<br>
$\alpha$: Level of significance<br>
$Z_{\frac{\alpha}{2}}$: Z value with an upper-tail area of $\frac{\alpha}{2}$<br>
$\sigma$: Population Standard deviation <br>
$n$: Sample size
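
One way to read the confidence level is as a long-run coverage rate: if many samples are drawn and an interval is built from each, roughly $100(1-\alpha)$% of those intervals should contain $\mu$. The simulation below is a minimal sketch of that idea, assuming a normally distributed population. The values of `mu`, `sigma`, `n`, the seed, and the number of trials are hypothetical.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)      # fixed seed so the run is repeatable
mu, sigma, n = 50, 8, 35             # hypothetical population mean, known sigma, sample size
confidence_level = 0.90
z = stats.norm.isf((1 - confidence_level) / 2)   # Z_{alpha/2}
moe = z * sigma / n ** 0.5                       # margin of error (same for every sample)

trials = 5000
samples = rng.normal(mu, sigma, size=(trials, n))   # one simulated sample per row
x_bars = samples.mean(axis=1)                       # sample mean of each row

# An interval x_bar +/- moe contains mu exactly when |x_bar - mu| <= moe
coverage = float(np.mean(np.abs(x_bars - mu) <= moe))
print('Fraction of intervals containing mu:', coverage)
```

The printed fraction should be close to 0.90, with some variation from run to run if the seed is changed.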

### Example:

#### 1. A random sample of weights (in kg) of 35 diabetic patients is drawn from a population with a standard deviation of 8 kg. Find the 90% confidence interval for the population mean.

    Weight: [59.1, 65.0, 75.8, 79.2, 95.0, 99.8, 89.1, 65.3, 41.9, 55.2, 94.8, 84.1, 83.2, 74.0, 75.5, 76.2, 79.1, 80.1, 
             92.1, 74.2, 59.2, 64.0, 75, 78.2, 95.6, 97.8, 89.5, 64.2, 41.8, 57.2, 85, 91.4, 81.8, 74.6, 90]

In [14]:
sigma = 8
Weight= [59.1, 65.0, 75.8, 79.2, 95.0, 99.8, 89.1, 65.3, 41.9, 55.2, 94.8, 84.1, 83.2, 74.0, 75.5, 76.2, 79.1, 80.1, 
         92.1, 74.2, 59.2, 64.0, 75, 78.2, 95.6, 97.8, 89.5, 64.2, 41.8, 57.2, 85, 91.4, 81.8, 74.6, 90]
x_bar = np.mean(Weight)  # sample mean
n = len(Weight)  # sample size
z = stats.norm.isf(0.05)  # Z critical value: alpha/2 = 0.05 for 90% confidence

In [15]:
ll = x_bar - (z*(sigma/n**0.5))
ul = x_bar + (z*(sigma/n**0.5))
print(ll,ul)

74.46146621975642 78.90996235167215


In [None]:
# Inbuilt function:

# stats.norm.interval(confidence_level, loc=x_bar, scale=sigma/n**0.5)

In [16]:
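# The confidence level is passed positionally because newer SciPy releases name this argument `confidence` instead of `alpha`
stats.norm.interval(0.9, loc=x_bar, scale=sigma/n**0.5)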

(74.46146621975642, 78.90996235167215)

In [None]:
# We are 90% confident that the population mean lies between 74.46 kg and 78.91 kg.

#### Practice

2. There are 150 apples on a tree. You randomly choose 40 apples and find that their average weight is 182 grams. The population standard deviation is 30 grams. Find the 95% confidence interval for the population mean.

In [18]:
stats.norm.interval(0.95, loc=182, scale=30/40**0.5)

(172.70307451543158, 191.29692548456842)

In [19]:
# A 100% confidence level gives an unbounded interval
stats.norm.interval(1, loc=182, scale=30/40**0.5)

(-inf, inf)

#### 3. A movie production house needs to estimate the average monthly wage of the technical crew members. Previous data shows that the standard deviation of the wages is 190 dollars. The production team wants the margin of error of the estimated average wage to not exceed 54 dollars. The team has decided to use a small subset of wages for the estimation. Find a suitable number of wages to consider to get the estimate with 90% confidence.

In [20]:
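me = 54  # maximum margin of error (dollars)
sigma = 190  # population standard deviation (dollars)
z = stats.norm.isf(0.05)  # Z critical value for 90% confidence
n = ((z*sigma)/me)**2  # solve me = z*sigma/sqrt(n) for n (about 33.5)
print(int(np.ceil(n)))  # round up so the margin of error stays within 54

34


In [None]:
# A minimum of 34 samples is required to perform the analysis.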

<a id="small"></a>
### 2.2.2  Interval Estimation with t stat

Here the standard deviation of the population is unknown. The confidence interval for the population mean with $100(1-\alpha)$% confidence level is given as: $\overline{X} \pm t_{\frac{\alpha}{2}, n-1}\frac{s}{\sqrt{n}}$

Where, <br>
$\overline{X}$: Sample mean<br>
$\alpha$: Level of significance<br>
$s$: Sample standard deviation<br>
$n$: Sample size<br>
$n-1$: Degrees of freedom

The ratio $\frac{s}{\sqrt{n}}$ is the estimate of the standard error of the mean. And $t_{\frac{\alpha}{2}, n-1}\frac{s}{\sqrt{n}}$ is the margin of error for the estimate.

The value of $t_{\frac{\alpha}{2}, n-1}$ for different $\alpha$ values can be obtained using `stats.t.isf()` from the scipy library.
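
As a small illustration of how the critical value depends on the degrees of freedom, the sketch below prints $t_{\frac{\alpha}{2}, n-1}$ for a few sample sizes next to the corresponding Z value. The sample sizes and the 90% confidence level are arbitrary choices made for this example, and `t_values` is a name introduced here.

```python
import scipy.stats as stats

alpha = 0.10                                   # 90% confidence level
z = stats.norm.isf(alpha / 2)                  # Z_{alpha/2}, for comparison

t_values = {}
for n in (5, 17, 30, 100):
    t_values[n] = stats.t.isf(alpha / 2, df=n - 1)   # t_{alpha/2, n-1}
    print(f'n = {n:3d}: t = {t_values[n]:.4f}  (z = {z:.4f})')
```

The t value is larger than the Z value for small samples, which widens the interval to account for estimating $\sigma$ by $s$, and it approaches the Z value as $n$ grows.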

### Example:

#### 1. From an apple tree, you randomly choose 17 apples and find that their average weight is 78 grams with a sample standard deviation of 23 grams. Find the 90% confidence interval for the population mean.

In [21]:
n=17
x_bar = 78
s = 23
t = stats.t.isf(0.05,df=n-1)
ll = x_bar-(t*(s/n**0.5))
ul = x_bar+(t*(s/n**0.5))
print(ll,ul)

68.26090326067306 87.73909673932694


In [22]:
stats.t.interval(0.9, df=n-1, loc=x_bar, scale=s/n**0.5)

(68.26090326067306, 87.73909673932694)

### Practice:

#### 1. In a class, 15 students are randomly selected, and their average height is found to be 145 cm with a standard deviation of 10 cm. Find the 95% confidence interval for the population mean.

In [None]:
# (139.4621845843536, 150.5378154156464)


<a id="prop"></a>
## 2.3 Interval Estimation for Proportion

Consider a population in which each observation is either a success or a failure. The population proportion is denoted by `P`, which is the ratio of the number of successes to the size of the population.

The confidence interval for the population proportion with $100(1-\alpha)$% confidence level is given as: $p \pm Z_{\frac{\alpha}{2}}\sqrt{\frac{p(1 - p)}{n}}$

Where, <br>
$p$: Sample proportion<br>
$\alpha$: Level of significance<br>
$n$: Sample size

The quantity $Z_{\frac{\alpha}{2}}\sqrt{\frac{p(1 - p)}{n}}$ is the margin of error.
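
The formula can be wrapped in a small helper, sketched below under the assumption that the normal approximation is acceptable for the sample at hand. The function name `proportion_interval` and the counts (30 successes in a sample of 100) are hypothetical and only serve to show how the interval changes with the confidence level.

```python
import numpy as np
import scipy.stats as stats

def proportion_interval(successes, n, confidence_level):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n                                # sample proportion
    z = stats.norm.isf((1 - confidence_level) / 2)   # Z_{alpha/2}
    moe = z * np.sqrt(p * (1 - p) / n)               # margin of error
    return p - moe, p + moe

# Hypothetical data: 30 successes in a sample of 100
for level in (0.90, 0.95, 0.99):
    low, high = proportion_interval(30, 100, level)
    print(f'{level:.0%} interval: ({low:.4f}, {high:.4f})')
```

The interval widens as the confidence level increases, since a larger $Z_{\frac{\alpha}{2}}$ multiplies the same standard error.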

### Example:

#### 1. A financial firm has created 50 portfolios. From them, a sample of 13 portfolios was selected, out of which 8 were found to be underperforming. Construct a 99% confidence interval to estimate the population proportion.

In [23]:
n = 13
ps = 8/13
z = stats.norm.isf(0.005)
ll = ps-(z*(np.sqrt((ps*(1-ps))/n)))
ul = ps+(z*(np.sqrt((ps*(1-ps))/n)))
print(ll,ul)

0.26782280814713794 0.962946422622093


In [25]:
stats.norm.interval(0.99, loc=ps, scale=np.sqrt((ps*(1-ps))/n))

(0.26782280814713805, 0.9629464226220927)

In [None]:
# The proportion of underperforming portfolios ranges from 26.8% to 96.3% with 99% confidence.

### Practice:

#### 1. A survey is conducted on the preference for working from home. A sample of 60 people was selected, out of which 42 opted to work from home. Construct a 99% confidence interval to estimate the population proportion.

In [None]:
# (0.5476118833255879, 0.852388116674412)
