**Table of contents**<a id='toc0_'></a>    
- [Import statements](#toc1_1_)    
- [Loading the datasets](#toc1_2_)    
- [Hypothesis testing workflow in python](#toc1_3_)    
- [**z-test**: Hypothesis testing for a single population parameter using z-score and p-value](#toc2_)    
  - [-> Finding the z-score](#toc2_1_)    
  - [-> Finding the p-value](#toc2_2_)    
  - [-> Significance level](#toc2_3_)    
  - [-> Confidence intervals](#toc2_4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=5
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

### <a id='toc1_1_'></a>[Import statements](#toc0_)

In [1]:
import warnings

warnings.filterwarnings("ignore")

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
from scipy.stats import norm

In [4]:
from numpy.random import default_rng

rng = default_rng(seed=328)

### <a id='toc1_2_'></a>[Loading the datasets](#toc0_)

- The *"late_shipments"* dataset contains supply chain data on the delivery of medical supplies. Each row represents one delivery of a part. The "late" column denotes whether or not the part was delivered late. A value of "Yes" means that the part was delivered late, and a value of "No" means the part was delivered on time.

In [5]:
late_shipments = pd.read_feather("./datasets/late_shipments.feather")

In [6]:
late_shipments.head()

| | id | country | managed_by | fulfill_via | vendor_inco_term | shipment_mode | late_delivery | late | product_group | sub_classification | ... | line_item_quantity | line_item_value | pack_price | unit_price | manufacturing_site | first_line_designation | weight_kilograms | freight_cost_usd | freight_cost_groups | line_item_insurance_usd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36203.0 | Nigeria | PMO - US | Direct Drop | EXW | Air | 1.0 | Yes | HRDT | HIV test | ... | 2996.0 | 266644.0 | 89.0 | 0.89 | Alere Medical Co., Ltd. | Yes | 1426.0 | 33279.83 | expensive | 373.83 |
| 1 | 30998.0 | Botswana | PMO - US | Direct Drop | EXW | Air | 0.0 | No | HRDT | HIV test | ... | 25.0 | 800.0 | 32.0 | 1.6 | Trinity Biotech, Plc | Yes | 10.0 | 559.89 | reasonable | 1.72 |
| 2 | 69871.0 | Vietnam | PMO - US | Direct Drop | EXW | Air | 0.0 | No | ARV | Adult | ... | 22925.0 | 110040.0 | 4.8 | 0.08 | Hetero Unit III Hyderabad IN | Yes | 3723.0 | 19056.13 | expensive | 181.57 |
| 3 | 17648.0 | South Africa | PMO - US | Direct Drop | DDP | Ocean | 0.0 | No | ARV | Adult | ... | 152535.0 | 361507.95 | 2.37 | 0.04 | Aurobindo Unit III, India | Yes | 7698.0 | 11372.23 | expensive | 779.41 |
| 4 | 5647.0 | Uganda | PMO - US | Direct Drop | EXW | Air | 0.0 | No | HRDT | HIV test - Ancillary | ... | 850.0 | 8.5 | 0.01 | 0.0 | Inverness Japan | Yes | 56.0 | 360.0 | reasonable | 0.01 |


In [7]:
late_shipments.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 27 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   id                        1000 non-null   float64
 1   country                   1000 non-null   object 
 2   managed_by                1000 non-null   object 
 3   fulfill_via               1000 non-null   object 
 4   vendor_inco_term          1000 non-null   object 
 5   shipment_mode             1000 non-null   object 
 6   late_delivery             1000 non-null   float64
 7   late                      1000 non-null   object 
 8   product_group             1000 non-null   object 
 9   sub_classification        1000 non-null   object 
 10  vendor                    1000 non-null   object 
 11  item_description          1000 non-null   object 
 12  molecule_test_type        1000 non-null   object 
 13  brand                     1000 non-null   object 
 14  dosage   

### <a id='toc1_3_'></a>[Hypothesis testing workflow in python](#toc0_)

<img src="./Hypothesis testing workflow Python.png">

## <a id='toc2_'></a>[**z-test**: Hypothesis testing for a single population parameter using z-score and p-value](#toc0_)

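**The z-test** can be used to test the null hypothesis that a population parameter is equal to a certain value *whenever the sampling distribution of the corresponding sample statistic is (approximately) normal*. This includes parameters such as the *population mean and the population proportion* (and, for large samples, other parameters such as the population variance), because the central limit theorem makes the normal approximation reasonable when the sample is large enough.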

### <a id='toc2_1_'></a>[-> Finding the z-score](#toc0_)

$$ z = \frac{\text{Sample statistic} - \text{Null Hypothesis value}}{\text{Standard error of the sample statistic}} $$

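When the sample statistic is a sample mean (a sample proportion also qualifies, since it is the mean of 0/1 values), the standard error is $\sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size:

$$ z = \frac{\text{Sample mean} - \text{Null Hypothesis value}}{\sigma / \sqrt{n}} $$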
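Here is a minimal sketch of the formula for a sample mean. It assumes the population standard deviation $\sigma$ is known. The helper name `z_score_mean` and the numbers are hypothetical and chosen only for illustration.

```python
import math


def z_score_mean(sample_mean: float, null_mean: float, sigma: float, n: int) -> float:
    """z-score of a sample mean, assuming the population standard deviation sigma is known."""
    std_error = sigma / math.sqrt(n)  # standard error of the sample mean
    return (sample_mean - null_mean) / std_error


# Hypothetical numbers: sample mean 52 from n = 100 observations,
# null hypothesis mean 50, known sigma = 10, so the standard error is 1
z = z_score_mean(sample_mean=52, null_mean=50, sigma=10, n=100)
print(z)  # 2.0
```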

> Let's use the late shipments dataset and the proportion of late shipments as an example to illustrate how to calculate the z-score.

*The null hypothesis is that the proportion of late shipments is six percent, i.e., $H_0: P=0.06$*

*The alternative hypothesis is that the proportion of late shipments is greater than six percent, i.e., $H_A: P>0.06$*

In [8]:
# Calculate the proportion of late shipments
late_prop_samp = late_shipments.late.value_counts(normalize=True)["Yes"]

# Print the results
print(late_prop_samp)

0.061


We often don't know the population standard deviation, $\sigma$. As a result, we cannot directly calculate the standard error, i.e., the standard deviation of the sampling distribution of the sample statistic.

In such cases, if we want to use the *z-test*, we can create a **bootstrap distribution** of the sample statistic by repeatedly resampling the data with replacement and recomputing the statistic each time. The standard deviation of the bootstrap distribution then serves as an estimate of the standard error of the sample statistic. In other words, the bootstrap distribution is used as an estimate of the sampling distribution of the statistic, not of the population distribution.

In [9]:
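# Create a bootstrap distribution of the proportion of late shipments.
# Note: frac=0.4 draws only 400 of the 1,000 rows per resample; a standard bootstrap
# resamples the full sample size (frac=1), which gives a smaller standard error.
late_shipments_boot_distn = []

for _ in range(1000):
    late_shipments_boot_distn.append(
        late_shipments.late.sample(frac=0.4, replace=True).value_counts(normalize=True)[
            "Yes"
        ]
    )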

In [10]:
# Print the result
print(late_shipments_boot_distn[:20])

[0.045, 0.0575, 0.0675, 0.0375, 0.0725, 0.055, 0.0575, 0.06, 0.065, 0.05, 0.07, 0.0675, 0.0825, 0.07, 0.0725, 0.055, 0.0875, 0.07, 0.0475, 0.0525]
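For comparison, the sketch below vectorizes the bootstrap with NumPy. It draws each resample at the full sample size $n$, which is the usual bootstrap convention. The loop above uses `frac=0.4`, which takes 400 of the 1,000 rows and makes the bootstrap standard error larger. The sketch has to run on its own, so it substitutes a stand-in array for the real column: 61 "Yes" values out of 1,000, matching the sample proportion of 0.061. The variable names are illustrative. For a proportion, the bootstrap standard error should land close to the analytic value $\sqrt{\hat{p}(1-\hat{p})/n}$.

```python
import numpy as np

rng = np.random.default_rng(seed=328)

# Stand-in for late_shipments["late"]: 61 "Yes" out of 1,000 rows (proportion 0.061)
late = np.array(["Yes"] * 61 + ["No"] * 939)
is_late = late == "Yes"  # boolean mask; its mean is the sample proportion
n = is_late.size

# 2,000 bootstrap resamples at once: each row holds n indices drawn with replacement
idx = rng.integers(0, n, size=(2000, n))
boot_props = is_late[idx].mean(axis=1)  # one sample proportion per resample

boot_se = boot_props.std(ddof=1)
p_hat = is_late.mean()
analytic_se = np.sqrt(p_hat * (1 - p_hat) / n)  # about 0.00757

print(boot_se, analytic_se)
```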


In [11]:
# Hypothesize that the proportion of late shipments is 6%
late_prop_hyp = 0.06

# Calculate the standard error
std_error = np.std(late_shipments_boot_distn)

In [12]:
# Find z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / (std_error)

# Print z_score
print(z_score)

0.08448914739961473
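For a proportion, the null hypothesis itself pins down $\sigma$. A single yes/no outcome with probability $p_0$ has standard deviation $\sqrt{p_0(1-p_0)}$. As a cross-check, the sketch below plugs that $\sigma$ into the $\sigma/\sqrt{n}$ formula instead of using the bootstrap. This is an alternative, not the approach taken in this notebook. It uses the sample proportion 0.061 and the 1,000 rows of this dataset.

```python
import math

from scipy.stats import norm

p_hat = 0.061  # sample proportion of late shipments (from above)
p_0 = 0.06     # null hypothesis value
n = 1000       # number of shipments in the sample

# Standard error under H0: sigma / sqrt(n), with sigma = sqrt(p_0 * (1 - p_0))
std_error_null = math.sqrt(p_0 * (1 - p_0)) / math.sqrt(n)
z_null = (p_hat - p_0) / std_error_null
p_value_null = norm.sf(z_null)  # right-tailed: P(Z >= z)

print(round(z_null, 3), round(p_value_null, 3))  # roughly 0.133 and 0.447
```

This z-score is larger than the bootstrap-based one, mainly because the bootstrap above resamples only 40% of the rows. The p-value is still far above common significance levels, though, so the conclusion does not change.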


### <a id='toc2_2_'></a>[-> Finding the p-value](#toc0_)

Which tail(s) of the distribution are relevant depends on the alternative hypothesis. A "greater than" alternative uses the right tail, a "less than" alternative uses the left tail, and a "not equal to" (difference) alternative uses both tails.

> Let's see how we can use the z-score to calculate the p-value for our hypothesis test.

We can calculate the p-value for a z-score using the `norm.cdf()` function from the `scipy.stats` module. Under the null hypothesis, the z-score follows (approximately) a standard normal distribution, i.e., a normal distribution with mean 0 and standard deviation 1. The `norm.cdf()` function takes a z-score and returns the area under the standard normal curve to the left of that z-score.

In [13]:
# This is a right-tailed test. Since the CDF returns the cumulative probability
# to the left of a given z-score, we use (1 - CDF) to calculate the p-value
p_value = 1 - norm.cdf(z_score)

# Print the p-value
print(p_value)

0.4663337655549401
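The same idea extends to all three kinds of alternative hypothesis. The sketch below uses `norm.cdf()` for a left tail. For a right tail it uses `norm.sf()`, SciPy's survival function, which equals `1 - norm.cdf()`. A two-sided test doubles the tail area beyond $|z|$. The helper name `p_value_from_z` is hypothetical.

```python
from scipy.stats import norm


def p_value_from_z(z: float, alternative: str) -> float:
    """p-value of a z-test; alternative is 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":    # H_A: parameter > null value (right tail)
        return norm.sf(z)
    if alternative == "less":       # H_A: parameter < null value (left tail)
        return norm.cdf(z)
    if alternative == "two-sided":  # H_A: parameter != null value (both tails)
        return 2 * norm.sf(abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two-sided"):
    print(alt, p_value_from_z(1.96, alt))
```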


### <a id='toc2_3_'></a>[-> Significance level](#toc0_)

**p-values quantify how much evidence the data provide against the null hypothesis.** A p-value is the probability of observing a result at least as extreme as the one we got, assuming the null hypothesis is true. Large p-values indicate a lack of evidence for the alternative hypothesis, so we stick with the assumed null hypothesis. Small p-values make us doubt this original assumption in favor of the alternative hypothesis. What defines the cutoff point between a small p-value and a large one?

The cutoff point is known as the significance level and is denoted by alpha, $\alpha$. The appropriate significance level depends on the dataset and the field of study. Five percent is the most common choice, but ten percent and one percent are also popular.

The significance level is the probability of rejecting the null hypothesis when it is true. It is the threshold for how much evidence we need to reject the null hypothesis.

**Making a decision based on the p-value:** The significance level gives us a decision process for which hypothesis to support. If $p \le \alpha$, we reject the null hypothesis; otherwise, we fail to reject it.

It's important that we decide what the appropriate significance level should be before we run our test. Otherwise, there is a temptation to decide on a significance level that lets us choose the hypothesis we want. 

**Type I and Type II errors:** Type I errors occur when we reject the null hypothesis when in fact it is true. Type II errors occur when we fail to reject the null hypothesis when in fact it is false.
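To make the link between $\alpha$ and Type I errors concrete, here is a small simulation sketch. Its settings are assumptions chosen for illustration: a true proportion of 0.06 (so $H_0$ is true), n = 1000 per sample, and 10,000 simulated samples. Because $H_0$ is true, every rejection is a Type I error. The rejection rate of the right-tailed z-test at $\alpha = 0.05$ should therefore come out close to 0.05. It will not be exact, because of simulation noise and the normal approximation to a count.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=328)

# Assumed simulation settings (illustrative only)
p_0, n, alpha, n_sims = 0.06, 1000, 0.05, 10_000

# Simulate n_sims samples in which H0 is true: late-shipment counts ~ Binomial(n, p_0)
p_hats = rng.binomial(n, p_0, size=n_sims) / n

# Right-tailed z-test for each sample, using the standard error implied by H0
std_error_null = np.sqrt(p_0 * (1 - p_0) / n)
z_scores = (p_hats - p_0) / std_error_null
p_values = norm.sf(z_scores)

# Fraction of true null hypotheses that get rejected
type_i_error_rate = np.mean(p_values <= alpha)
print(type_i_error_rate)  # should be close to alpha
```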

In [14]:
# Choose a significance level (note: this should've been done before calculating p-value)
alpha = 0.05

# Reject the null hypothesis if the p-value is less than or equal to alpha
p_value <= alpha

False

Since the calculated p-value (0.47) is greater than the significance level (0.05), we fail to reject the null hypothesis. This means that we do not have enough evidence to say that the proportion of late shipments is greater than six percent.

### <a id='toc2_4_'></a>[-> Confidence intervals](#toc0_)

To get a sense of the plausible values of the population parameter, it's common to compute a confidence interval with a confidence level of $1 - \alpha$ (for example, 95% when $\alpha = 0.05$).

Confidence intervals account for uncertainty in our estimate of a population parameter by providing a range of plausible values. The confidence level expresses how confident we are that the true value lies somewhere within that range.

For example, a 95% confidence interval for the mean of a population is a range of values that you can be 95% confident contains the true mean of the population.
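The "95% confident" wording has a frequentist meaning. If we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true value. The simulation sketch below checks this under assumed settings: a true proportion of 0.06, n = 1000, and 5,000 repetitions. To keep it fast, it uses a normal-approximation interval $\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1-\hat{p})/n}$ rather than the bootstrap quantile method used in this notebook.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=328)

# Assumed simulation settings (illustrative only)
p_true, n, n_reps, level = 0.06, 1000, 5_000, 0.95
z_crit = norm.ppf(1 - (1 - level) / 2)  # about 1.96

# One sample proportion and one normal-approximation interval per repetition
p_hats = rng.binomial(n, p_true, size=n_reps) / n
std_errors = np.sqrt(p_hats * (1 - p_hats) / n)
lower = p_hats - z_crit * std_errors
upper = p_hats + z_crit * std_errors

# Fraction of intervals that contain the true proportion
coverage = np.mean((lower <= p_true) & (p_true <= upper))
print(coverage)  # should be close to 0.95
```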

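If the hypothesized value of the population parameter is within the confidence interval, you should fail to reject the null hypothesis. This correspondence holds most directly for a two-sided test whose significance level $\alpha$ matches the confidence level $1 - \alpha$.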

In [15]:
# Calculate 95% confidence interval using quantile method
lower, upper = pd.Series(late_shipments_boot_distn).quantile([0.025, 0.975])

# Print the confidence interval
print((lower, upper))

(0.04, 0.085)


Since the hypothesized value of the population parameter (0.06) is within the confidence interval (0.04, 0.085) at the 95% confidence level, we fail to reject the null hypothesis. This is the same conclusion we reached when we calculated the p-value.
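The quantiles 0.025 and 0.975 passed to `.quantile()` follow from $\alpha = 0.05$: a $1 - \alpha$ interval cuts off $\alpha/2$ in each tail. The sketch below maps a few significance levels to three things: the confidence level, the matching percentile-method quantiles, and the two-sided standard normal critical value from `norm.ppf()`. That critical value is what a normal-approximation interval would use in place of bootstrap quantiles. The helper name `interval_settings` is hypothetical.

```python
from scipy.stats import norm


def interval_settings(alpha: float):
    """Confidence level, percentile-method quantiles and two-sided critical z for alpha."""
    confidence_level = 1 - alpha
    quantiles = (alpha / 2, 1 - alpha / 2)  # lower and upper tail cut-offs
    z_critical = norm.ppf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    return confidence_level, quantiles, z_critical


for alpha in (0.10, 0.05, 0.01):
    level, qs, z_crit = interval_settings(alpha)
    print(f"alpha={alpha}: level={level:.2f}, quantiles={qs}, z={z_crit:.3f}")
```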