#### Challenge 3: To practice - Constructing Confidence Intervals

While testing a hypothesis is a great way to gather empirical evidence for rejecting or failing to reject it, another way to gather evidence is to construct a confidence interval. A confidence interval gives us a range of plausible values for the true mean of the population. For a 95% confidence interval, the procedure used to build the interval captures the true population mean in about 95% of repeated samples; informally, we say we are 95% confident that the mean of the population lies within the interval.

To read more about confidence intervals, click [here](https://en.wikipedia.org/wiki/Confidence_interval).
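One way to see what the 95% figure means is a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size, and number of trials are made-up values, not taken from the salaries dataset. It draws many samples from a known normal population, builds a 95% t interval from each, and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

# Hypothetical population parameters, chosen only for illustration
true_mean, true_sd = 35.0, 12.0
n, n_trials = 50, 2000

rng = np.random.default_rng(0)

# One row per simulated sample of size n
samples = rng.normal(true_mean, true_sd, size=(n_trials, n))
means = samples.mean(axis=1)
sems = stats.sem(samples, axis=1)  # standard error of each sample mean

# 95% t interval for every sample at once (loc and scale broadcast)
lower, upper = stats.t.interval(0.95, n - 1, loc=means, scale=sems)

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95, which is what the confidence level describes: a property of the interval-building procedure across repeated samples.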


In the cell below, we will construct a 95% confidence interval for the mean hourly wage of all hourly workers. 

The confidence interval is computed in SciPy using the `t.interval` function. You can read more about this function [here](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.t.html).

To compute the confidence interval for the hourly wage, use 0.95 as the confidence level, the number of rows minus 1 as the degrees of freedom, the sample mean as the location parameter (`loc`), and the standard error as the scale parameter (`scale`). The standard error can be computed using [this](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.sem.html) function in SciPy.
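To make the role of each parameter concrete, here is a minimal sketch that computes the interval by hand as mean ± t* × standard error and compares it with `stats.t.interval`. The `sample` array is a hypothetical set of wages invented for this example; it is not the dataset used in this notebook.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of hourly wages, for illustration only
sample = np.array([31.5, 36.2, 34.8, 29.9, 40.1, 35.0, 33.3, 37.6])

n = len(sample)
mean = sample.mean()
# Standard error of the mean: sample standard deviation (ddof=1) over sqrt(n)
se = sample.std(ddof=1) / np.sqrt(n)

# Two-sided 95% interval leaves 2.5% in each tail, so use the 97.5th percentile
t_crit = stats.t.ppf(0.975, df=n - 1)
manual = (mean - t_crit * se, mean + t_crit * se)

# Same interval from SciPy: confidence level, degrees of freedom, loc, scale
library = stats.t.interval(0.95, n - 1, loc=mean, scale=se)

print(manual)
print(library)
```

Both tuples should agree up to floating-point error, since `t.interval` returns the matching lower and upper percentiles of a t distribution shifted by `loc` and scaled by `scale`.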

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

In [2]:
salaries = pd.read_csv('../Current_Employee_Names__Salaries__and_Position_Titles.csv')

In [3]:
#Pedro's way as baseline, because I've seen it work:
hourly_salaries = salaries[salaries['Salary or Hourly'] == 'Hourly']
hourly_salaries_mean = np.mean(hourly_salaries['Hourly Rate'])

In [4]:
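# Degrees of freedom for a one-sample t interval: n - 1
degrees_freedom = len(hourly_salaries) - 1
# Sample mean (location) and standard error of the mean (scale)
sample_mean = np.mean(hourly_salaries['Hourly Rate'])
standard_error = stats.sem(hourly_salaries['Hourly Rate'])
conf_level = 0.95
conf_interval = stats.t.interval(conf_level, df=degrees_freedom, loc=sample_mean, scale=standard_error)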
print(conf_interval)
print("The 95% interval is between", round(conf_interval[0],2), "and",round(conf_interval[1],2))

(34.29839539204361, 34.874504045197305)
The 95% interval is between 34.3 and 34.87


In [6]:
#So that works, now for my original way
hourly_wage = salaries['Hourly Rate'].dropna()
print(np.mean(hourly_wage))

34.58644971862046


In [7]:
standard_error = hourly_wage.sem()
mu = np.mean(hourly_wage)
n = hourly_wage.shape[0]
degrees_freedom = n - 1
print(degrees_freedom)
print(standard_error)
print(mu)
print(n)

8173
0.14694742867989846
34.58644971862046
8174


In [8]:
conf_interval = stats.t.interval(0.95, df=degrees_freedom, loc=mu, scale=standard_error)

In [9]:
conf_interval

(34.2983953920436, 34.87450404519731)

Now construct the 95% confidence interval for the mean annual salary of all salaried employees in the police department in the cell below.

In [12]:
police_salaries = salaries[salaries['Department'] == 'POLICE']['Annual Salary'].dropna()
police_salaries.head()                                    

1     93354.0
4     90024.0
6    111444.0
7    103932.0
9     95736.0
Name: Annual Salary, dtype: float64

In [13]:
# Even the given SciPy standard error function works in this notebook! It did not in the original.
mu = np.mean(police_salaries)
standard_error = stats.sem(police_salaries)
n = police_salaries.shape[0]
degrees_freedom = n - 1
print(degrees_freedom)
print(standard_error)
print(mu)
print(n)

13823
153.0509585263483
88834.11892361111
13824


In [14]:
pol_conf_interval = stats.t.interval(0.95, df=degrees_freedom, loc=mu, scale=standard_error)

In [15]:
pol_conf_interval

(88534.1182885883, 89134.11955863392)