In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# Confidence Interval (sigma known)

A community health association is interested in estimating the average number of<br>
maternity days women stay in the local hospital. A random sample is taken of<br>
36 women who had babies in the hospital during the past year. The following<br>
numbers of maternity days each woman was in the hospital are rounded to the<br>
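nearest day. The population standard deviation is taken as known (sigma = 1.17 days),<br>
and we construct a 98% confidence interval for the population mean.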

In [2]:
data = np.array([int(i) for i in '334325314342353243241634335232354354'])

In [3]:
data

array([3, 3, 4, 3, 2, 5, 3, 1, 4, 3, 4, 2, 3, 5, 3, 2, 4, 3, 2, 4, 1, 6,
       3, 4, 3, 3, 5, 2, 3, 2, 3, 5, 4, 3, 5, 4])

In [4]:
n = data.shape[0]

In [5]:
n

36

In [6]:
sigma = 1.17

In [7]:
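# 98% confidence -> alpha = 0.02, alpha/2 = 0.01.
# ppf(0.01) is the lower-tail critical value, so it is negative (about -2.326).
z_alpha_2 = stats.norm.ppf(0.01)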

In [8]:
point_estimate = data.mean()

In [9]:
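# z_alpha_2 is negative, so adding the term gives the lower bound
lower_bound = point_estimate + z_alpha_2*(sigma/np.sqrt(n))

In [10]:
# subtracting the negative term gives the upper bound
upper_bound = point_estimate - z_alpha_2*(sigma/np.sqrt(n))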

In [13]:
print(f'''
We are 98% confident that the average number of maternity days women stay
in the local hospital for the population is between
{round(lower_bound,3)} <= mu <= {round(upper_bound,3)}
''')


We are 98% confident that the average number of maternity days women stay
in the local hospital for the population is between
2.852 <= mu <= 3.759



This means that if we were to draw 100 independent samples and construct a 98% confidence interval<br>
from each one, we would expect approximately 98 of these intervals to contain the<br>
population mean.
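
The long-run interpretation above can be checked with a small simulation. The sketch below is illustrative only: the "true" mean of 3.3 days is a hypothetical value chosen for the demonstration, the population is assumed to be normal, and it uses only the standard library so it can be run independently of the cells above.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical population parameters, chosen only for illustration.
TRUE_MEAN = 3.3    # assumed "true" average stay in days (not from the data)
SIGMA = 1.17       # known population standard deviation, as in the example
N = 36             # sample size, as in the example
CONFIDENCE = 0.98

def coverage(num_samples=10_000, seed=0):
    """Return the fraction of simulated z-intervals that contain TRUE_MEAN."""
    rng = random.Random(seed)
    # Positive (upper-tail) critical value z_{alpha/2}, about 2.326 for 98%.
    z = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
    half_width = z * SIGMA / N ** 0.5
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
        sample_mean = fmean(sample)
        if sample_mean - half_width <= TRUE_MEAN <= sample_mean + half_width:
            hits += 1
    return hits / num_samples

print(f"Empirical coverage: {coverage():.3f}")
```

With enough simulated samples, the printed fraction should settle close to 0.98. Any single interval, including the one computed above, either contains the population mean or does not.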

# Confidence Interval (sigma unknown)

In the aerospace industry some companies allow their employees to accumulate extra working hours<br>
beyond their 40-hour week. These extra hours are sometimes referred to as green time, or comp time.<br>
<br>
Many managers work longer than the eight-hour workday preparing proposals, overseeing crucial tasks,<br>
and taking care of paperwork. Recognition of such overtime is important.<br>
<br>
Most managers are usually not paid extra for this work, but a record is kept of this time and occasionally<br>
the manager is allowed to use some of this comp time as extra leave or vacation time.<br>
<br>
Suppose a researcher wants to estimate the average amount of comp time accumulated per week for managers<br>
in the aerospace industry.<br>
<br>
He randomly samples 18 managers and measures the amount of extra time<br>
they work during a specific week and obtains the results shown (in hours).<br>

In [14]:
extra_time = np.array([6,21,17,20,7,0,8,16,29,3,8,12,11,9,21,25,15,16])

Let's assume that the comp time is normally distributed in the population.
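
Under that assumption, and because the population standard deviation is unknown, the interval is built from the t distribution with $n - 1$ degrees of freedom rather than from the normal distribution. As a sketch of the formula (the notation is introduced here for illustration and does not appear in the original text):

$$
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Here $\bar{x}$ is the sample mean, $s$ is the sample standard deviation (computed with $n - 1$ in the denominator), $n = 18$, and $t_{\alpha/2,\,n-1}$ denotes the positive upper-tail critical value with $\alpha = 0.10$ for a 90% interval.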

In [15]:
extra_time.shape[0]

18

In [16]:
df = extra_time.shape[0] - 1

In [17]:
# 90% confidence -> alpha = 0.10, alpha/2 = 0.05; the lower-tail quantile is negative
t_alpha_2 = stats.t.ppf(0.05,df)

In [18]:
t_alpha_2

-1.7396067260750676

In [20]:
point_estimate = extra_time.mean()

In [23]:
s = extra_time.std(ddof=1)

In [24]:
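# t_alpha_2 is negative, so this value is the negative of the usual margin of error;
# adding it gives the lower bound and subtracting it gives the upper bound
margin_of_error = t_alpha_2*(s/np.sqrt(extra_time.shape[0]))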

In [26]:
lower_bound = point_estimate + margin_of_error

In [27]:
upper_bound = point_estimate - margin_of_error

In [28]:
print(f'''
We are 90% confident that the average amount of 
comp time accumulated by a manager per week in this industry is between
{round(lower_bound,3)} <= mu <= {round(upper_bound,3)}
''')


We are 90% confident that the average amount of 
comp time accumulated by a manager per week in this industry is between
10.357 <= mu <= 16.754
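
As a cross-check, the same interval can be obtained in one call with `stats.t.interval`. This is a sketch rather than the method used above: it redefines the sample so that it runs on its own, and it passes the confidence level positionally because the name of that parameter has changed between SciPy releases.

```python
import numpy as np
from scipy import stats

# Same sample as in the cells above.
extra_time = np.array([6, 21, 17, 20, 7, 0, 8, 16, 29, 3, 8, 12, 11, 9, 21, 25, 15, 16])

# 90% two-sided t-interval:
#   first argument : confidence level
#   second argument: degrees of freedom, n - 1
#   loc            : sample mean (centre of the interval)
#   scale          : standard error s / sqrt(n); stats.sem uses ddof=1 by default
low, high = stats.t.interval(
    0.90,
    extra_time.size - 1,
    loc=extra_time.mean(),
    scale=stats.sem(extra_time),
)

print(round(low, 3), round(high, 3))
```

The two printed bounds should agree with the lower and upper bounds computed step by step above.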



**Possible use cases:**<br>
<br>
From these figures, aerospace managers could:
1. attempt to build a reward system for such extra work
2. evaluate the regular 40-hour week to determine how to use normal working hours more effectively and thus reduce comp time

# Estimating the Population Proportion

<div>
    <img src='./images/inferential1.png' width=800>
</div>

In [31]:
n = 212
p_hat = 34/212
q_hat = 1-p_hat

In [32]:
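# 90% confidence -> alpha/2 = 0.05; the lower-tail critical value is negative (about -1.645),
# so the margin computed from it is negative as well
z_alpha_2 = stats.norm.ppf(0.05)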

In [37]:
margin_of_error = z_alpha_2*np.sqrt((p_hat*q_hat/n))

In [38]:
lower_bound = p_hat+margin_of_error

In [39]:
upper_bound = p_hat-margin_of_error

In [47]:
print(f'''
We can estimate that the population proportion of
boot-cut jeans purchases is between {lower_bound:.2f} and {upper_bound:.2f}
with a confidence level of 90%.
''')


We can estimate that the population proportion of
boot-cut jeans purchases is between 0.12 and 0.20
with a confidence level of 90%.



As the estimated proportion is quite low, we might want to improve the design of boot-cut jeans or replace them with a more on-trend style of jeans that appeals to the younger population.
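
The proportion calculation above can be wrapped in a small reusable function. The sketch below is one possible way to do it, not the notebook's own code: the function names `proportion_ci` and `normal_approx_ok` are hypothetical, it uses only the standard library, and the threshold of 5 in the sanity check is a common rule of thumb (some texts use 10) rather than something stated in the original.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.90):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Positive critical value z_{alpha/2}, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

def normal_approx_ok(successes, n, threshold=5):
    """Rule-of-thumb check: n*p_hat and n*q_hat should both reach `threshold`."""
    return successes >= threshold and (n - successes) >= threshold

# Figures from the example above: 34 boot-cut purchases out of 212.
lower, upper = proportion_ci(34, 212)
print(f"{lower:.2f} <= p <= {upper:.2f}")
print("normal approximation reasonable:", normal_approx_ok(34, 212))
```

Using the positive critical value keeps the margin positive, so the lower bound is always the estimate minus the margin and the upper bound is the estimate plus the margin.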