Material in this notebook is adapted from [Statistics for Business and Economics 10e, Chapter 10](https://www.amazon.com/Statistics-Business-Economics-Education-Printed/dp/1305585313/ref=sr_1_5?keywords=statistics+for+business+and+economics+pagano&qid=1581359479&sr=8-5).

# Inferences About the Difference Between Two Population Means
  
  
Greystone Department Stores, Inc., operates two stores in Buffalo, New York: one is in the inner city and the other is in a suburban shopping center. 

- The regional manager noticed that products that sell well in one store do not always sell well in the other. The manager believes this situation may be attributable to differences in customer demographics at the two locations. Customers may differ in age, education, income, and so on. Suppose the manager asks us to investigate the difference between the mean ages of the customers who shop at the two stores.

- Define population 1 as all customers who shop at the inner-city store.

- Define population 2 as all customers who shop at the suburban store.


$\mu_{1}:$ Mean of population 1 (mean age of customers shopping at the inner-city store)

$\mu_{2}:$ Mean of population 2 (mean age of customers shopping at the suburban store)


Q: Is there a difference between $\mu_{1}$ and $\mu_{2}$? In other words, is $\mu_{1} - \mu_{2} = 0$?

__Remark__

Note that $\mu_{1}$ and $\mu_{2}$ are population parameters whose actual values we do not know, so we must estimate them from samples.

## Finding an Interval Estimate for $\mu_{1} - \mu_{2}$

1. Select a random sample of $n_{1}$ customers from those shopping at the inner-city store.

2. Select a random sample of $n_{2}$ customers from those shopping at the suburban store.


__Note__

Suppose the population standard deviations $\sigma_{1} = 9$ and $\sigma_{2} = 10$ are known.


In [45]:
## your work here 

## If you want to check how to use pickle

## https://www.thoughtco.com/using-pickle-to-save-objects-2813661

In [1]:
import pickle
import numpy as np

# Load the pickled sample of inner-city customer ages
with open('sample_inner.obj', 'rb') as file_sample1:
    sample1 = pickle.load(file_sample1)

# Load the pickled sample of suburban customer ages
with open('sample_suburban.obj', 'rb') as file_sample2:
    sample2 = pickle.load(file_sample2)

In [4]:
sample1.mean()


40.00990876845374

In [5]:
sample2.mean()

32.63569329600446

__Your turn__

1. Find $\bar{x}_{1}$, the sample mean age of customers shopping at the inner-city store.

2. Find $\bar{x}_{2}$, the sample mean age of customers shopping at the suburban store.

3. Find $\bar{x}_{1} - \bar{x}_{2}$. This gives us a point estimate of $\mu_{1} - \mu_{2}$.

4. Make a guess for $\mu_{1} - \mu_{2}$.


In [18]:
## your code is here

x1_bar = sample1.mean()

x2_bar = sample2.mean()
h = x1_bar - x2_bar
h

7.374215472449279

__Q__: How can we find a good __interval estimate__ for $\mu_{1} - \mu_{2}$?

__Short Answer:__ When the populations are normal or the samples are sufficiently large, the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ is __normal__ with mean $\mu_{1} - \mu_{2}$ and standard deviation:

$$ \sigma_{\bar{x}_{1} - \bar{x}_{2}} = \sigma_{\text{sampling}} = \sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}} $$

__Note__: The standard deviation of a sampling distribution is also known as the standard error.
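The formula can be checked with a small simulation. This is only a sketch: the normal populations, the true means of 40 and 35, and the sample sizes `n1 = 36` and `n2 = 49` are assumptions made for illustration and are not taken from the notebook's data.

```python
import numpy as np

# Assumed for illustration: normal populations, true means 40 and 35,
# and sample sizes n1 = 36, n2 = 49 (none of these are stated in the notebook).
rng = np.random.default_rng(0)
sigma1, sigma2 = 9, 10
n1, n2 = 36, 49
mu1, mu2 = 40, 35
n_repeats = 20_000

# Draw n_repeats independent samples from each population and take their means.
x1_bars = rng.normal(mu1, sigma1, size=(n_repeats, n1)).mean(axis=1)
x2_bars = rng.normal(mu2, sigma2, size=(n_repeats, n2)).mean(axis=1)
differences = x1_bars - x2_bars

# Standard error predicted by the formula.
theoretical_se = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
# Standard deviation observed across the simulated differences.
empirical_se = differences.std(ddof=1)

print(f"theoretical: {theoretical_se:.4f}, simulated: {empirical_se:.4f}")
```

The two numbers should agree closely, and the simulated differences should be centered near the assumed $\mu_{1} - \mu_{2} = 5$.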

__Your Turn__

1. What is the value of $\sigma_{1}$?

2. What is the value of $\sigma_{2}$?

3. What is the value of $n_{1}$?

4. What is the value of $n_{2}$?

5. Use 1-4 to find $\sigma_{\bar{x}_{1} - \bar{x}_{2}}$ and store it in a variable named "standard_error".

In [10]:
sigma1 = 9

sigma2 = 10

n1 = len(sample1)

n2 = len(sample2)

standard_error = np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
standard_error

2.071428571428571

<img src = "img/margin_of_error.png" width = 350>
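In case the image does not render: for known population standard deviations, the margin of error is presumably the standard expression below, written with the same symbols as the standard error formula.

$$ \text{Margin of error} = z_{\alpha/2}\, \sigma_{\bar{x}_{1} - \bar{x}_{2}} = z_{\alpha/2} \sqrt{\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}} $$

Here $z_{\alpha/2}$ is the standard normal value with upper-tail area $\alpha/2$, which for $\alpha = 0.05$ is about 1.96.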

In [11]:
import scipy.stats as stats
stats.norm.ppf(.975)

1.959963984540054

In [12]:
stats.norm.ppf(.025)

-1.9599639845400545

In [13]:
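# Lower-tail critical value times the standard error (the negative of the margin of error)
stats.norm.ppf(.025) * standard_error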

-4.059925396547255

__Your Turn__

1. Find $z_{\alpha/2}$ for a 95% confidence level ($1 - \alpha = 0.95$, so $\alpha = 0.05$).

2. Calculate the margin of error using the formula above and store it in a variable named "margin_of_error".


In [17]:
z_alpha = stats.norm.ppf(.975)

margin_of_error = z_alpha * standard_error

upper_bound = h + margin_of_error
lower_bound  = h - margin_of_error
(lower_bound, upper_bound)

(3.314290075902025, 11.434140868996533)

Note that once we have calculated the margin of error, we can create an interval estimate.

<img src = "img/interval_estimate.png" width = 450>

__Your turn__

1. Find an interval estimate for $\mu_{1} - \mu_{2}$ at the 95% confidence level ($\alpha = 0.05$).
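The whole calculation can also be wrapped in one reusable function. The sketch below is one possible way to do it. The function name `interval_estimate` is hypothetical, and the sample sizes 36 and 49 are an assumption, chosen only because they reproduce the standard error printed earlier in this notebook.

```python
import numpy as np
from scipy import stats

def interval_estimate(x1_bar, x2_bar, sigma1, sigma2, n1, n2, confidence=0.95):
    """Return (lower, upper) for mu1 - mu2 when sigma1 and sigma2 are known."""
    alpha = 1 - confidence
    # z value with upper-tail area alpha / 2
    z = stats.norm.ppf(1 - alpha / 2)
    standard_error = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    margin_of_error = z * standard_error
    point_estimate = x1_bar - x2_bar
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Sample means copied from the outputs above; n1 = 36 and n2 = 49 are assumed.
lower, upper = interval_estimate(40.00990876845374, 32.63569329600446,
                                 sigma1=9, sigma2=10, n1=36, n2=49)
print(lower, upper)
```

Passing a different `confidence`, such as 0.99, widens the interval because $z_{\alpha/2}$ grows.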

## Hypothesis Tests about $\mu_{1} - \mu_{2}$

### Two Tailed

1. Suppose we hypothesize that the difference in mean age between inner-city customers and suburban customers is $D_{0} = 5$ years.

Mathematically we can express this as:

\begin{equation}
    H_{0}:  \mu_{1} - \mu_{2} = D_{0} = 5\\
    H_{a}:  \mu_{1} - \mu_{2} \neq D_{0} = 5 \\ 
    \text{Significance Level: }\alpha = 0.05
\end{equation}
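A two-tailed z-test of the null hypothesis $\mu_{1} - \mu_{2} = 5$ against the two-sided alternative can be sketched as follows. The sample means are copied from the outputs earlier in the notebook. The sample sizes `n1 = 36` and `n2 = 49` are an assumption, chosen because they reproduce the standard error printed earlier.

```python
import numpy as np
from scipy import stats

x1_bar, x2_bar = 40.00990876845374, 32.63569329600446  # from the earlier outputs
sigma1, sigma2 = 9, 10                                  # known population sigmas
n1, n2 = 36, 49                                         # assumed sample sizes
D0 = 5                                                  # hypothesized difference
alpha = 0.05                                            # significance level

standard_error = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# How many standard errors the observed difference lies from D0
z_score = ((x1_bar - x2_bar) - D0) / standard_error

# Two-tailed p-value: area in both tails beyond |z_score|
p_value = 2 * stats.norm.sf(abs(z_score))

reject_h0 = p_value < alpha
print(f"z = {z_score:.4f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

`stats.norm.sf` is the survival function, `1 - cdf`, so doubling it for $|z|$ gives the combined area of both tails.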

### One-Tailed Hypothesis Testing (Directional)

a. The suburban customers are, on average, more than 5 years older than inner-city customers.

Mathematically we can express this as:

\begin{equation}
    H_{a}:  \mu_{2} - \mu_{1} > D_{0} = 5\\
    H_{0}:  \mu_{2} - \mu_{1} \leq D_{0} = 5\\
    \text{Significance Level: }\alpha = 0.05
\end{equation}


b. The inner-city customers are, on average, more than 5 years older than suburban customers.

Mathematically we can express this as:

\begin{equation}
    H_{a}:  \mu_{1} - \mu_{2} > D_{0} = 5\\
    H_{0}:  \mu_{1} - \mu_{2} \leq D_{0} = 5\\
    \text{Significance Level: } \alpha = 0.05
\end{equation}
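For the directional tests, only the upper tail counts toward the p-value. The sketch below works through cases a and b. As before, the sample means come from the earlier outputs, and the sample sizes `n1 = 36` and `n2 = 49` are an assumption that happens to reproduce the printed standard error.

```python
import numpy as np
from scipy import stats

x1_bar, x2_bar = 40.00990876845374, 32.63569329600446  # from the earlier outputs
sigma1, sigma2 = 9, 10                                  # known population sigmas
n1, n2 = 36, 49                                         # assumed sample sizes
D0 = 5
alpha = 0.05

# The standard error is the same whichever way the difference is taken.
standard_error = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Case a: H_a is mu2 - mu1 > D0
z_a = ((x2_bar - x1_bar) - D0) / standard_error
p_a = stats.norm.sf(z_a)  # upper-tail area

# Case b: H_a is mu1 - mu2 > D0
z_b = ((x1_bar - x2_bar) - D0) / standard_error
p_b = stats.norm.sf(z_b)  # upper-tail area

print(f"case a: z = {z_a:.4f}, p-value = {p_a:.4f}, reject H0: {p_a < alpha}")
print(f"case b: z = {z_b:.4f}, p-value = {p_b:.4f}, reject H0: {p_b < alpha}")
```

In case a the observed difference points in the opposite direction from the alternative, so the z-score is strongly negative and the p-value is close to 1.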

__Your Turn__

1. Find the difference $\bar{x}_{1} - \bar{x}_{2}$.

In [None]:
## your code is here


2. What is the form of the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$?

In [None]:
## Your answer here


3. What is the mean of the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$?


In [19]:
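## Your answer here
## Note: h is the point estimate of mu1 - mu2; under H0 the sampling distribution is centered at D0
h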

7.374215472449279

4. What is the standard deviation of the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$?

In [None]:
## Your answer here


5. How many standard deviations is $\bar{x}_{1} - \bar{x}_{2}$ away from $\mu_{1} - \mu_{2}$ (its hypothesized value $D_{0}$ under $H_{0}$)? Note that this number is also known as the z-score.

In [None]:
## Your answer here


6. Use "stats.norm.cdf" from "scipy.stats" to find the p-value.

In [44]:
import scipy.stats as stats

## Your answer here