# Steps
A confidence interval estimates a population parameter as a range of plausible values rather than a single number. For the mean of a variable, we can build this interval from the normal distribution or the t-distribution. These are the steps we will follow when calculating a confidence interval:

- Step 1: Calculating the sample mean of the variable that we are interested in.
- Step 2: Calculating the standard error of the mean (SEM) – sample standard deviation divided by the square root of the sample size.
- Step 3: Choosing the confidence level (95% is the most common choice, and 99% is also used) and calculating the interval around the sample mean.
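
The three steps above can be sketched as a small helper. This is a minimal sketch rather than a definitive implementation: the function name `normal_mean_interval` and the sample values are hypothetical, and it assumes the normal approximation together with the sample standard deviation (`ddof=1`).

```python
import numpy as np
from scipy import stats


def normal_mean_interval(values, confidence=0.95):
    """Return (lower, upper) for the mean, using the normal approximation."""
    values = np.asarray(values, dtype=float).ravel()
    n = values.size

    # Step 1: sample mean
    sample_mean = values.mean()

    # Step 2: standard error of the mean (sample standard deviation, ddof=1)
    sem = values.std(ddof=1) / np.sqrt(n)

    # Step 3: critical value for the chosen confidence level, then the interval
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return sample_mean - z * sem, sample_mean + z * sem


# Hypothetical measurements, used only to show the call
lower, upper = normal_mean_interval([4.9, 5.1, 5.0, 5.3, 4.7, 5.2])
print(lower, upper)
```

The helper should agree with `stats.norm.interval(confidence, loc=sample_mean, scale=sem)`, which packages step 3 into a single call.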

In [1]:
# Example 1
# ---
# Calculate a 95% confidence interval for the mean of the given data
# ---
# data = [1, 2, 3, 4, 5]
# ---
# 
import pandas as pd
import numpy as np
# Importing scipy.stats
import scipy.stats as stats

# defining our data
data = np.array([1, 2, 3, 4, 5])

# Calculating the sample mean of final data
#
sample_mean = data.mean()
sample_mean

# Finding the sample size
#
sample_size = data.shape[0]
sample_size

# Finding the standard error of the mean of data
# ddof=1 gives the sample standard deviation (NumPy's default, ddof=0, is the population formula)
std_error = data.std(ddof=1) / np.sqrt(sample_size)
std_error

# Calculating the 95% confidence interval for the mean of data
# To calculate the confidence interval, we will use the norm object from the stats subpackage.
# The norm object has an interval() method that receives three inputs:
# our chosen confidence level 0.95, the sample mean and the standard error of the mean.
#
lower, upper = stats.norm.interval(0.95, loc=sample_mean, scale=std_error)
print(round(lower, 2), round(upper, 2))

# We can be 95% confident that the population mean is between about 1.61 and 4.39

1.61 4.39
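
With only five observations, the t-distribution mentioned at the start is usually the more cautious choice. The sketch below is an illustration under stated assumptions, not part of the original exercise: it uses `stats.t.interval` with `n - 1` degrees of freedom and the sample standard deviation (`ddof=1`), and compares the result with the normal-based interval on the same data.

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5])
n = data.size
sample_mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)

# Student's t interval with n - 1 degrees of freedom
t_lower, t_upper = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=sem)

# Normal-based interval on the same mean and standard error, for comparison
z_lower, z_upper = stats.norm.interval(0.95, loc=sample_mean, scale=sem)

print("t interval:", round(t_lower, 2), round(t_upper, 2))
print("z interval:", round(z_lower, 2), round(z_upper, 2))
```

The t interval comes out wider than the normal one, which reflects the extra uncertainty from estimating the standard deviation on a small sample.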

# Challenge

In [3]:
# Challenge 1
# ---
# Determine, with a 95% confidence interval, the average height of Kenyan men at a national level.
# ---
height = [ 186.0, 180.0, 195.0, 189.0, 191.0, 177.0, 161.0, 177.0, 192.0, 179.0, 185.0, 192.0,
 169.0, 172.0, 191.0, 184.0, 193.0, 182.0, 190.0, 185.0, 179.0, 188.0, 179.0, 188.0,
 170.0, 179.0, 195.0, 179.0, 169.0, 185.0, 170.0, 197.0, 187.0, 177.0, 173.0, 179.0,
 195.0, 179.0, 190.0, 174.0, 195.0, 206.0, 180.0, 169.0, 178.0, 201.0, 180.0, 180.0,
 171.0, 191.0]
# ---
#
# Build a 1-D array: wrapping the list in another list would give shape (1, 50)
ex_1 = np.array(height)
print(ex_1)

[186. 180. 195. 189. 191. 177. 161. 177. 192. 179. 185. 192. 169. 172.
 191. 184. 193. 182. 190. 185. 179. 188. 179. 188. 170. 179. 195. 179.
 169. 185. 170. 197. 187. 177. 173. 179. 195. 179. 190. 174. 195. 206.
 180. 169. 178. 201. 180. 180. 171. 191.]

In [4]:
# Calculating the sample mean of the data
#
mean_1 = ex_1.mean()
mean_1

183.06

In [5]:
# Finding the sample size
#
size = ex_1.shape[0]
size

50

In [6]:
# Finding the standard error of the mean of data
# ddof=1 gives the sample standard deviation
std_error = ex_1.std(ddof=1) / np.sqrt(size)
print(round(std_error, 4))

1.346

In [7]:
# Calculating the 95% confidence interval for the mean of data
# To calculate the confidence interval, we will use the norm object from the stats subpackage.
# The norm object has an interval() method that receives three inputs:
# our chosen confidence level 0.95, the sample mean and the standard error of the mean.
#
lower, upper = stats.norm.interval(0.95, loc=mean_1, scale=std_error)
print(round(lower, 2), round(upper, 2))

180.42 185.7

We can be 95% confident that the population mean height is between about 180.42 and 185.70.
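
As a cross-check on the standard error step, SciPy provides `stats.sem`, which uses the sample standard deviation (`ddof=1`) by default. The sketch below compares it with the manual formula; the `sample` values are hypothetical and are not the height data above.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, used only for illustration
sample = np.array([172.0, 181.5, 169.0, 190.2, 177.8, 185.1, 174.4, 179.9])

# Manual formula: sample standard deviation divided by the square root of n
manual_sem = sample.std(ddof=1) / np.sqrt(sample.size)

# SciPy shortcut, which defaults to ddof=1
scipy_sem = stats.sem(sample)

print(np.isclose(manual_sem, scipy_sem))

lower, upper = stats.norm.interval(0.95, loc=sample.mean(), scale=scipy_sem)
print(round(lower, 2), round(upper, 2))
```

Using `sample.size` rather than `sample.shape[0]` keeps the count correct even if the array happens to be two-dimensional.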

# Challenge 2

In [8]:

# ---
# Twelve users attempted to add a channel on their digital decoder TV to a list of favorites.
# After the task they rated the difficulty on the 7 point Single Ease Question.  
# Compute the 95% confidence interval. The responses are shown below
# ---
# 
difficulty = [2, 6, 4, 1, 7, 3, 6, 1, 7, 1, 6, 5, 1, 1]
# 
# Build a 1-D array so that shape[0] is the number of responses
ex_2 = np.array(difficulty)

In [9]:
# Calculating the sample mean of the data
#
mean_2 = ex_2.mean()
mean_2

3.642857142857143

In [10]:
# Finding the sample size
#
size2 = ex_2.shape[0]
size2

14

In [23]:
# Finding the standard error of the mean of data
# ddof=1 gives the sample standard deviation
std_error2 = ex_2.std(ddof=1) / np.sqrt(size2)
print(round(std_error2, 4))

0.6597

In [24]:
# Calculating the 95% confidence interval for the mean of data
# To calculate the confidence interval, we will use the norm object from the stats subpackage.
# The norm object has an interval() method that receives three inputs:
# our chosen confidence level 0.95, the sample mean and the standard error of the mean.
#
lower2, upper2 = stats.norm.interval(0.95, loc=mean_2, scale=std_error2)
print(round(lower2, 2), round(upper2, 2))

2.35 4.94

We can be 95% confident that the population mean difficulty rating is between about 2.35 and 4.94.

# Challenge 3

In [18]:
# Challenge 3
# ---
# Calculate the 95% confidence interval for the mean of long-lasting insecticide-treated
# mosquito net (LLIN) distributions by the Against Malaria Foundation.
# ---
# Dataset 
url = 'http://bit.ly/MalariaDataset'
# 
ex_3 = pd.read_csv(url, encoding='latin-1')
ex_3.head()

Unnamed: 0,#_llins,location,country,when,by_whom,country_code
0,3000,Mombasa/Siaya,Kenya,May-Jun 06,Red Cross,KEN
1,3000,Blan/Mch/Nkh/Nkh,Malawi,May-Jun 06,Red Cross,MWI
2,3000,Capr/Kava/Ohang,Namibia,May-Jun 06,Red Cross,NAM
3,2000,Kigali,Rwanda,May-Jun 06,Red Cross,RWA
4,2000,Soroti,Uganda,May-Jun 06,Red Cross,UGA


In [20]:
# Calculating the sample mean of final data
#
mean_3 = ex_3['#_llins'].mean()
mean_3

215412.89221556886

In [21]:
# Finding the sample size
#
size3 = ex_3['#_llins'].shape[0]
size3

167

In [25]:
# Finding the standard error of the mean of data 
#
std_error3 = ex_3['#_llins'].std() / np.sqrt(size3)
std_error3

84302.16683316056

In [26]:
# Calculating the 95% confidence interval for the mean of data
# To calculate the confidence interval, we will use the norm object from the stats subpackage.
# The norm object has an interval() method that receives three inputs:
# our chosen confidence level 0.95, the sample mean and the standard error of the mean.
# Note that the interval is centred on mean_3, the mean of this dataset.
lower3, upper3 = stats.norm.interval(0.95, loc=mean_3, scale=std_error3)
print(round(lower3, 2), round(upper3, 2))

50183.68 380642.1

We can be 95% confident that the population mean number of LLINs per distribution is between about 50,184 and 380,642.
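
When the data come from a file, it is worth checking for missing values before computing the standard error: `Series.mean()` and `Series.std()` skip `NaN` by default, whereas `shape[0]` counts every row. The sketch below illustrates the difference under that assumption. The values are invented for the example and are not taken from the malaria dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical values with one missing entry, not taken from the dataset
nets = pd.Series([3000, 2000, np.nan, 15000, 7500, 4200], name="#_llins")

n_rows = nets.shape[0]   # counts every row, including the missing one
n_valid = nets.count()   # counts only non-missing observations

# pandas std() uses ddof=1 and skips NaN, so divide by the valid count
sem = nets.std() / np.sqrt(n_valid)

lower, upper = stats.norm.interval(0.95, loc=nets.mean(), scale=sem)
print(n_rows, n_valid)
print(round(lower, 2), round(upper, 2))
```

If `n_rows` and `n_valid` differ, dividing by `n_rows` would understate the standard error and make the interval too narrow.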