### import packages

In [4]:
import numpy as np
import pandas as pd
from scipy import stats

In [14]:
# Load the district-level data and drop rows with any missing values
education_districtwise = pd.read_csv("education_districtwise.csv")
education_districtwise = education_districtwise.dropna()
education_districtwise.head(10)

Unnamed: 0,DISTNAME,STATNAME,BLOCKS,VILLAGES,CLUSTERS,TOTPOPULAT,OVERALL_LI
0,DISTRICT32,STATE1,13,391,104,875564.0,66.92
1,DISTRICT649,STATE1,18,678,144,1015503.0,66.93
2,DISTRICT229,STATE1,8,94,65,1269751.0,71.21
3,DISTRICT259,STATE1,13,523,104,735753.0,57.98
4,DISTRICT486,STATE1,8,359,64,570060.0,65.0
5,DISTRICT323,STATE1,12,523,96,1070144.0,64.32
6,DISTRICT114,STATE1,6,110,49,147104.0,80.48
7,DISTRICT438,STATE1,7,134,54,143388.0,74.49
8,DISTRICT610,STATE1,10,388,80,409576.0,65.97
9,DISTRICT476,STATE1,11,361,86,555357.0,69.9


### sampling

In [9]:
# Draw 50 districts at random, with replacement; random_state makes the draw reproducible
sampled_data = education_districtwise.sample(n=50, replace=True, random_state=12323)
sampled_data.head(10)

Unnamed: 0,DISTNAME,STATNAME,BLOCKS,VILLAGES,CLUSTERS,TOTPOPULAT,OVERALL_LI
1,DISTRICT649,STATE1,18,678,144,1015503.0,66.93
354,DISTRICT388,STATE34,50,3042,338,10082852.0,84.95
412,DISTRICT304,STATE24,19,1774,209,1648574.0,65.5
194,DISTRICT39,STATE21,10,751,124,2205170.0,75.16
626,DISTRICT402,STATE5,15,164,111,3279860.0,95.68
177,DISTRICT503,STATE21,22,2311,295,5959798.0,74.41
545,DISTRICT603,STATE17,15,1536,141,2194262.0,81.35
464,DISTRICT33,STATE22,5,904,51,1339832.0,72.75
37,DISTRICT21,STATE26,11,982,156,2181753.0,82.4
77,DISTRICT628,STATE25,8,350,84,1480080.0,80.83


### construct 95% confidence interval

#### steps:
1.   Identify a sample statistic
2.   Choose a confidence level
3.   Find the margin of error
4.   Calculate the interval
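
The four steps can be sketched end to end as follows. This is a minimal illustration rather than the notebook's own code: the `literacy` values are synthetic, the helper name `normal_confidence_interval` is hypothetical, and a normal (z) approximation is assumed, in line with the notebook's use of `scipy.stats.norm`.

```python
import numpy as np
from scipy import stats


def normal_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) for the mean, using a normal approximation."""
    values = np.asarray(values, dtype=float)

    # Step 1: identify the sample statistic (here, the sample mean)
    sample_mean = values.mean()

    # Step 2: the confidence level is the `confidence` argument

    # Step 3: margin of error = critical z value * estimated standard error
    standard_error = values.std(ddof=1) / np.sqrt(values.size)
    z_critical = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin_of_error = z_critical * standard_error

    # Step 4: calculate the interval around the sample statistic
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Synthetic literacy rates, for illustration only (not the district data)
rng = np.random.default_rng(0)
literacy = rng.normal(loc=73, scale=10, size=50)

lower, upper = normal_confidence_interval(literacy, confidence=0.95)
print(lower, upper)
```

The result should match `stats.norm.interval` called with the same mean and standard error, since that function applies the same critical value internally.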

#### calculate sample mean

In [16]:
sample_mean = sampled_data['OVERALL_LI'].mean()
sample_mean

73.1474

#### calculate sample standard error (scale for scipy.stats.norm.interval())

reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

In [20]:
# Estimated standard error of the mean: sample standard deviation divided by sqrt(n)
estimated_standard_error = sampled_data['OVERALL_LI'].std() / np.sqrt(sampled_data.shape[0])
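
pandas `Series.std()` uses the sample standard deviation (`ddof=1`) by default, which is what the standard error formula calls for, whereas NumPy's `np.std` defaults to `ddof=0`. The sketch below checks the manual formula against `scipy.stats.sem`. It uses only the first five `OVERALL_LI` values shown in the `head()` output, purely for illustration, so the numbers will not match the 50-row sample.

```python
import numpy as np
import pandas as pd
from scipy import stats

# First five OVERALL_LI values from the head() output, for illustration only
rates = pd.Series([66.92, 66.93, 71.21, 57.98, 65.0])

# Manual formula, as in the notebook (pandas std defaults to ddof=1)
manual_se = rates.std() / np.sqrt(rates.shape[0])

# scipy's standard error of the mean also defaults to ddof=1
scipy_se = stats.sem(rates)

# NumPy's std defaults to ddof=0, which gives a slightly smaller value
numpy_default_se = np.std(rates.to_numpy()) / np.sqrt(rates.shape[0])

print(manual_se, scipy_se, numpy_default_se)
```

The first two values agree; the third is smaller because it divides by n rather than n - 1 when computing the variance.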

In [23]:
stats.norm.interval(confidence=0.95, loc=sample_mean, scale=estimated_standard_error)

(70.10810214733846, 76.18669785266155)

We have a 95% confidence interval for the mean district literacy rate that stretches from about 70.11% to 76.19%.

`95% CI: (70.11, 76.19)`

### construct 99% confidence interval

In [24]:
stats.norm.interval(confidence=0.99, loc=sample_mean, scale=estimated_standard_error)

(69.1530855468515, 77.14171445314851)

We have a 99% confidence interval for the mean district literacy rate that stretches from about 69.15% to 77.14%.

`99% CI: (69.15, 77.14)`

* With a confidence level of 95%, the interval covers about 6.1 percentage points (70.11% to 76.19%).
* With a confidence level of 99%, the interval covers about 8.0 percentage points (69.15% to 77.14%).
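
The widths can be checked directly from the interval endpoints printed by the notebook. The sketch below copies those endpoints as literals (the variable names are illustrative) and also backs out the standard error implied by each interval, assuming the normal approximation used throughout.

```python
from scipy import stats

# Interval endpoints copied from the notebook outputs
intervals = {
    0.95: (70.10810214733846, 76.18669785266155),
    0.99: (69.1530855468515, 77.14171445314851),
}

widths = {}
implied_standard_errors = {}
for level, (lower, upper) in intervals.items():
    # Total width of the interval, in percentage points
    widths[level] = upper - lower
    # Two-sided critical z value for this confidence level
    z_critical = stats.norm.ppf(1 - (1 - level) / 2)
    # Half-width is the margin of error, so SE = margin / z
    implied_standard_errors[level] = (widths[level] / 2) / z_critical
    print(f"{level:.0%}: width = {widths[level]:.2f} points, "
          f"implied SE = {implied_standard_errors[level]:.3f}")
```

Both levels imply the same standard error: raising the confidence level changes only the critical value, which is why the 99% interval is wider around the same sample mean.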