# Table of Contents
<li><a href="#Hypothesis_tests_and_z_scores">Hypothesis_tests_and_z_scores</a></li>
<li><a href="#p-values">p-values</a></li>
<li><a href="#Statistical_significance">Statistical_significance</a></li>
<li><a href="#Performing_t-tests">Performing_t-tests</a></li>
<li><a href="#Calculating_p-values_from_t-statistics">Calculating_p-values_from_t-statistics</a></li>
<li><a href="#Paired_t-tests">Paired_t-tests</a></li>
<li><a href="#ANOVA_tests">ANOVA_tests</a></li>
<li><a href="#One-sample proportion tests">One-sample proportion tests</a></li>
<li><a href="#Two-sample_proportion_tests">Two-sample_proportion_tests</a></li>
<li><a href="#Chi-square_test_of_independence">Chi-square_test_of_independence</a></li>
<li><a href="#Chi-square_goodness_of_fit_tests">Chi-square_goodness_of_fit_tests</a></li>
<li><a href="#Assumptions_in_hypothesis_testing">Assumptions_in_hypothesis_testing</a></li>
<li><a href="#Non-parametric_tests">Non-parametric_tests</a></li>
<li><a href="#Non-parametric_ANOVA_and_unpaired_t-tests">Non-parametric_ANOVA_and_unpaired_t-tests</a></li>

<a id='Hypothesis_tests_and_z_scores'></a>
# Hypothesis_tests_and_z_scores

![image.png](attachment:923a0128-1eff-4fb7-9674-de2a29d0b478.png)

In [None]:
# Print the late_shipments dataset
print(late_shipments)

# Calculate the proportion of late shipments
late_prop_samp = (late_shipments['late'] == 'Yes').mean()

# Print the results
print(late_prop_samp)

![image.png](attachment:a5cec934-7da1-4720-a82a-9c2d054ff443.png)

In [None]:
# Hypothesize that the proportion is 6%
late_prop_hyp = 0.06

# Calculate the standard error
import numpy as np
std_error = np.std(late_shipments_boot_distn, ddof=1)

# Find z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Print z_score
print(z_score)
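
The cell above relies on `late_shipments_boot_distn`, which is not built anywhere in this notebook. A minimal sketch of how such a bootstrap distribution might be produced, using only the standard library. The data here is a hypothetical stand-in (61 late shipments out of 1,000, chosen to match the counts hard-coded in the two-sample section) and the 1,000 replicates are an arbitrary choice.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for late_shipments['late']: 61 "Yes" out of 1000 rows
late = ["Yes"] * 61 + ["No"] * 939


def prop_late(values):
    """Proportion of 'Yes' entries in a list of labels."""
    return sum(v == "Yes" for v in values) / len(values)


# Resample with replacement and record the proportion for each replicate
boot_distn = [
    prop_late(random.choices(late, k=len(late)))
    for _ in range(1000)
]

late_prop_samp = prop_late(late)
late_prop_hyp = 0.06

# The sample standard deviation (ddof=1) of the bootstrap distribution
# estimates the standard error of the sample proportion
std_error = statistics.stdev(boot_distn)
z_score = (late_prop_samp - late_prop_hyp) / std_error
print(late_prop_samp, std_error, z_score)
```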

<a id='p-values'></a>
# p-values

![image.png](attachment:f5375ad3-bc5f-4735-963e-2d1d3035d49c.png)

![image.png](attachment:91e9121b-c44f-47eb-8a18-07a4fc33e810.png)

In [None]:
# Calculate the z-score of late_prop_samp
z_score = (late_prop_samp - late_prop_hyp) / std_error

# Calculate the p-value (right-tailed test, so take the upper tail)
from scipy.stats import norm
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)
                 
# Print the p-value
print(p_value) 

Perfect p-value! For this right-tailed test, the p-value is one minus the standard normal cumulative distribution function evaluated at the z-score.
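
Which tail of the normal distribution you use depends on the alternative hypothesis. The sketch below shows the three usual cases with the standard library's `NormalDist`. The function name `p_value_from_z` and the example z-score are hypothetical, not part of the course code.

```python
from statistics import NormalDist


def p_value_from_z(z_score, alternative="greater"):
    """Convert a z-score to a p-value for the given alternative hypothesis."""
    std_normal = NormalDist(mu=0, sigma=1)
    if alternative == "greater":      # right-tailed test
        return 1 - std_normal.cdf(z_score)
    if alternative == "less":         # left-tailed test
        return std_normal.cdf(z_score)
    if alternative == "two-sided":    # both tails
        return 2 * (1 - std_normal.cdf(abs(z_score)))
    raise ValueError(f"unknown alternative: {alternative!r}")


example_z = 1.96  # hypothetical z-score
for alt in ("greater", "less", "two-sided"):
    print(alt, p_value_from_z(example_z, alt))
```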

<a id='Statistical_significance'></a>
# Statistical_significance

![image.png](attachment:caac122d-f82d-4fd4-87ff-7e83e88c6ddc.png)

In [None]:
# Calculate 95% confidence interval using quantile method
lower = np.quantile(late_shipments_boot_distn, 0.025)
upper = np.quantile(late_shipments_boot_distn, 0.975)

# Print the confidence interval
print((lower, upper))

![image.png](attachment:68d9850d-4637-4951-b8fb-ff228eb2377a.png)

Cool and confident! When the confidence level of the interval equals one minus the significance level, and the hypothesized population parameter lies within the confidence interval, you should fail to reject the null hypothesis.
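
The decision rule can be written out as a small helper. This is a sketch under stated assumptions: the bootstrap distribution is simulated from a normal distribution with hypothetical centre and spread, and `quantile` and `ci_decision` are illustrative names rather than library functions.

```python
import math
import random


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def ci_decision(boot_distn, hypothesized, alpha=0.05):
    """Return the (1 - alpha) percentile interval and the test decision."""
    lower = quantile(boot_distn, alpha / 2)
    upper = quantile(boot_distn, 1 - alpha / 2)
    inside = lower <= hypothesized <= upper
    return lower, upper, ("fail to reject" if inside else "reject")


random.seed(1)
# Hypothetical bootstrap distribution of a sample proportion
boot_distn = [random.gauss(0.061, 0.0076) for _ in range(5000)]

lower, upper, decision = ci_decision(boot_distn, hypothesized=0.06, alpha=0.05)
print((lower, upper), decision)
```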



<a id='Performing_t-tests'></a>
# Performing_t-tests

![image.png](attachment:c2f32316-0223-4f86-9ac5-4bc3a0c63871.png)

In [None]:
# Calculate the numerator of the test statistic
numerator = xbar_no - xbar_yes

# Calculate the denominator of the test statistic
denominator = np.sqrt(s_no**2 / n_no + s_yes**2 / n_yes)

# Calculate the test statistic
t_stat = numerator/denominator

# Print the test statistic
print(t_stat)

t-rrific! When testing for differences between means, the test statistic is called 't' rather than 'z', and can be calculated using six numbers from the samples: the two means, the two standard deviations, and the two sample sizes. Here, the value is about -2.39 or 2.39, depending on the order of subtraction in the numerator.
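
To make the "six numbers" concrete, the same calculation can be wrapped in a function. The summary statistics below are hypothetical and chosen so the arithmetic is easy to check by hand; they are not the values from `late_shipments`.

```python
import math


def two_sample_t_stat(xbar_a, xbar_b, s_a, s_b, n_a, n_b):
    """t statistic for a difference in means from summary statistics."""
    numerator = xbar_a - xbar_b
    denominator = math.sqrt(s_a**2 / n_a + s_b**2 / n_b)
    return numerator / denominator


# Hypothetical summary statistics: means, standard deviations, sample sizes
t_forward = two_sample_t_stat(10, 12, 2, 2, 50, 50)
t_reverse = two_sample_t_stat(12, 10, 2, 2, 50, 50)

# Swapping the groups only flips the sign of the statistic
print(t_forward, t_reverse)
```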



<a id='Calculating_p-values_from_t-statistics'></a>
# Calculating_p-values_from_t-statistics

![image.png](attachment:d91867e6-1194-443b-b505-92c0cac373eb.png)

![image.png](attachment:dc79e5e9-b14f-4813-be8f-cc2702626a4b.png)

In [None]:
# Calculate the degrees of freedom
degrees_of_freedom = n_no + n_yes - 2

# Calculate the p-value from the test stat (left-tailed, so use the CDF directly)
from scipy.stats import t
p_value = t.cdf(t_stat, df=degrees_of_freedom)

# Print the p_value
print(p_value)

![image.png](attachment:32177db6-9b0c-49b1-9e28-104b2a9d7ba3.png)

Perspicacious p-value predictions! When the standard error is estimated from the sample standard deviation and sample size, the test statistic is transformed into a p-value using the t-distribution.



<a id='Paired_t-tests'></a>
# Paired_t-tests

![image.png](attachment:d2ac44ed-a6be-4b6d-8f97-5ba47b7ff33d.png)

In [None]:
# Calculate the differences from 2012 to 2016
sample_dem_data['diff'] = sample_dem_data['dem_percent_12'] - sample_dem_data['dem_percent_16']

# Find the mean of the diff column
xbar_diff = sample_dem_data['diff'].mean()

# Find the standard deviation of the diff column
s_diff = sample_dem_data['diff'].std()

# Plot a histogram of diff with 20 bins
import matplotlib.pyplot as plt
sample_dem_data['diff'].hist(bins=20)
plt.show()

Delightful difference discovery! Notice that the majority of the histogram lies to the right of zero.

![image.png](attachment:0ba352d6-84e8-425f-b05c-f830643faeb1.png)

In [None]:
# Conduct a t-test on diff
import pingouin
test_results = pingouin.ttest(x=sample_dem_data['diff'],
                              y=0,
                              alternative='two-sided')

# Print the test results
print(test_results)

In [None]:
# Conduct a t-test on diff
test_results = pingouin.ttest(x=sample_dem_data['diff'], 
                              y=0, 
                              alternative="two-sided")

# Conduct a paired t-test on dem_percent_12 and dem_percent_16
paired_test_results = pingouin.ttest(x=sample_dem_data['dem_percent_12'],
                                     y=sample_dem_data['dem_percent_16'],
                                     paired=True,
                                     alternative='two-sided')

# Print the paired test results
print(paired_test_results)

![image.png](attachment:48239a3b-a962-4cc6-99a1-1f1b154bd3e8.png)

In [None]:
pingouin.ttest(x=sample_dem_data['dem_percent_12'], 
               y=sample_dem_data['dem_percent_16'], 
               alternative="two-sided")

Paired t-test party! Using .ttest() lets you avoid manual calculation to run your test. When you have paired data, a paired t-test is preferable to the unpaired version because it reduces the chance of a false negative error.
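
A paired t-test is a one-sample t-test on the pairwise differences against zero, which is why the two `pingouin.ttest()` calls on `diff` and on the paired columns agree. A minimal hand calculation with hypothetical before/after values:

```python
import math
import statistics

# Hypothetical paired measurements (not the election data)
before = [10, 12, 14, 16]
after = [9, 10, 13, 13]

# Reduce the paired problem to a single sample of differences
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
xbar_diff = statistics.mean(diffs)
s_diff = statistics.stdev(diffs)  # sample standard deviation (ddof=1)

# One-sample t statistic against a hypothesized mean difference of 0
t_stat = (xbar_diff - 0) / (s_diff / math.sqrt(n))
degrees_of_freedom = n - 1

print(t_stat, degrees_of_freedom)
```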



<a id='ANOVA_tests'></a>
# ANOVA_tests

![image.png](attachment:9a617baa-0af1-4829-9edb-56133d9f7c78.png)

In [None]:
# Calculate the mean pack_price for each shipment_mode
xbar_pack_by_mode = late_shipments.groupby("shipment_mode")['pack_price'].mean()

# Calculate the standard deviation of the pack_price for each shipment_mode
s_pack_by_mode = late_shipments.groupby("shipment_mode")['pack_price'].std()

# Boxplot of shipment_mode vs. pack_price
import seaborn as sns
sns.boxplot(x='pack_price', y='shipment_mode', data=late_shipments)
plt.show()

![image.png](attachment:47cc8c32-843a-46e1-b310-47a8688ec4b1.png)

In [None]:
# Run an ANOVA for pack_price across shipment_mode
anova_results = pingouin.anova(data=late_shipments,
                               dv='pack_price',
                               between='shipment_mode')

# Print anova_results
print(anova_results)

![image.png](attachment:812ad675-bf19-46b6-b8e0-ce01735f5ac8.png)

In [None]:
# Perform a pairwise t-test on pack price, grouped by shipment mode
pairwise_results = pingouin.pairwise_tests(data=late_shipments,
                                           dv='pack_price',
                                           between='shipment_mode',
                                           padjust='none')

# Print pairwise_results
print(pairwise_results)

In [None]:
# Modify the pairwise t-tests to use Bonferroni p-value adjustment
pairwise_results = pingouin.pairwise_tests(data=late_shipments, 
                                           dv="pack_price",
                                           between="shipment_mode",
                                           padjust="bonf")

# Print pairwise_results
print(pairwise_results)

![image.png](attachment:7f02fa25-1477-4459-9fe1-bb126f71c34c.png)

Pairwise perfection! After applying the Bonferroni adjustment, the p-values for the t-tests between each of the three groups are all less than 0.1.
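
The Bonferroni adjustment itself is simple: multiply each p-value by the number of comparisons and cap the result at 1. The helper below is an illustrative sketch with hypothetical p-values, not pingouin's implementation.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values from three pairwise comparisons
raw_p_values = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw_p_values)
print(adjusted)
```

Because every p-value grows, a comparison that looks significant on its own can stop being significant once the number of tests is taken into account.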



<a id='One-sample proportion tests'></a>
# One-sample proportion tests

![image.png](attachment:ba4f3927-6615-4fca-98ab-2989520756da.png)

In [None]:
# Hypothesize that the proportion of late shipments is 6%
p_0 = 0.06

# Calculate the sample proportion of late shipments
p_hat = (late_shipments['late'] == "Yes").mean()

# Calculate the sample size
n = len(late_shipments)

# Calculate the numerator and denominator of the test statistic
numerator = p_hat - p_0
denominator = np.sqrt(p_0 * (1 - p_0) / n)

# Calculate the test statistic
z_score = numerator / denominator

# Calculate the p-value from the z-score
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)

# Print the p-value
print(p_value)

Well proportioned! While bootstrapping can be used to estimate the standard error of any statistic, it is computationally intensive. For proportions, using a simple equation of the hypothesized proportion and sample size is easier to compute.
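
The same test fits in one function that needs only the counts. This sketch uses the standard library; `one_sample_prop_test` is a hypothetical helper name, and the inputs (61 late out of 1,000) are an assumption chosen to match the counts hard-coded in the two-sample section.

```python
import math
from statistics import NormalDist


def one_sample_prop_test(n_success, n, p_0):
    """Right-tailed z-test of a sample proportion against p_0."""
    p_hat = n_success / n
    # The standard error uses the hypothesized proportion, not p_hat
    std_error = math.sqrt(p_0 * (1 - p_0) / n)
    z_score = (p_hat - p_0) / std_error
    p_value = 1 - NormalDist().cdf(z_score)
    return z_score, p_value


z_score, p_value = one_sample_prop_test(n_success=61, n=1000, p_0=0.06)
print(z_score, p_value)
```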



<a id='Two-sample_proportion_tests'></a>
# Two-sample_proportion_tests

![image.png](attachment:2977e2bf-cc14-4d9b-84c9-0f0143d1ee8b.png)

![image.png](attachment:44a48675-9ee9-4b91-96db-9f686556a40b.png)

In [None]:
# Calculate the pooled estimate of the population proportion
p_hat = (p_hats["reasonable"] * ns["reasonable"] + p_hats["expensive"] * ns["expensive"]) / (ns["reasonable"] + ns["expensive"])

# Calculate p_hat times (1 - p_hat)
p_hat_times_not_p_hat = p_hat * (1 - p_hat)

# Divide this by each of the sample sizes and then sum
p_hat_times_not_p_hat_over_ns = p_hat_times_not_p_hat / ns["expensive"] + p_hat_times_not_p_hat / ns["reasonable"]

# Calculate the standard error
std_error = np.sqrt(p_hat_times_not_p_hat_over_ns)

# Calculate the z-score
z_score = (p_hats["expensive"] - p_hats["reasonable"]) / std_error

# Calculate the p-value from the z-score
p_value = 1 - norm.cdf(z_score, scale=1, loc=0)

# Print p_value
print(p_value)

![image.png](attachment:7d56dc1f-2d50-4ddb-be10-07817348e47b.png)

In [None]:
# Count the late column values for each freight_cost_group
late_by_freight_cost_group = late_shipments.groupby("freight_cost_group")['late'].value_counts()

# Create an array of the "Yes" counts for each freight_cost_group
success_counts = np.array([45, 16])

# Create an array of the total number of rows in each freight_cost_group
n = np.array([545, 455])

# Run a z-test on the two proportions
from statsmodels.stats.proportion import proportions_ztest
stat, p_value = proportions_ztest(count=success_counts,
                                  nobs=n,
                                  alternative='larger')

# Print the results
print(stat, p_value)
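
As a cross-check, the manual pooled-proportion calculation from the previous cell can be applied to the same counts. This is a sketch using only the standard library; it assumes the first entry of each array is the group expected to have the larger proportion, and its result should agree closely with `proportions_ztest`, which pools the variance by default.

```python
import math
from statistics import NormalDist

# Counts copied from the cell above
success_counts = [45, 16]
n = [545, 455]

# Sample proportion of "Yes" in each group
p_hats = [count / size for count, size in zip(success_counts, n)]

# Pooled estimate of the common proportion under the null hypothesis
p_pooled = sum(success_counts) / sum(n)

# Standard error of the difference in proportions
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n[0] + 1 / n[1]))

# Right-tailed test: is the first proportion larger than the second?
z_score = (p_hats[0] - p_hats[1]) / std_error
p_value = 1 - NormalDist().cdf(z_score)

print(z_score, p_value)
```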

<a id='Chi-square_test_of_independence'></a>
# Chi-square_test_of_independence

![image.png](attachment:90f44622-8164-4e8e-b02c-c776292c701f.png)

In [None]:
# Proportion of freight_cost_group grouped by vendor_inco_term
props = late_shipments.groupby('vendor_inco_term')['freight_cost_group'].value_counts(normalize=True)

# Convert props to wide format
wide_props = props.unstack()

# Proportional stacked bar plot of freight_cost_group vs. vendor_inco_term
wide_props.plot(kind="bar", stacked=True)
plt.show()

# Determine if freight_cost_group and vendor_inco_term are independent
expected, observed, stats = pingouin.chi2_independence(data=late_shipments, x='freight_cost_group', y='vendor_inco_term')

# Print results
print(stats[stats['test'] == 'pearson']) 

![image.png](attachment:4805f9e8-edcc-40a1-a006-a05c908d6546.png)

Independence insight! The test to compare proportions of successes in a categorical variable across groups of another categorical variable is called a chi-square test of independence.
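
Behind the scenes, the Pearson statistic compares observed counts with the counts expected if the two variables were independent. A minimal sketch on a hypothetical 2x2 table, with no continuity correction applied (so a library that applies one to 2x2 tables may report a different value):

```python
def chi2_independence_stat(observed):
    """Expected counts, Pearson chi-square statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Under independence: expected = row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof


# Hypothetical contingency table (rows: one variable, columns: the other)
observed = [[10, 20],
            [30, 40]]
expected, chi2, dof = chi2_independence_stat(observed)
print(expected, chi2, dof)
```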



<a id='Chi-square_goodness_of_fit_tests'></a>
# Chi-square_goodness_of_fit_tests

![image.png](attachment:17798ee1-558b-4baa-8af5-8c1caeb297b4.png)

In [None]:
# Find the number of rows in late_shipments
n_total = len(late_shipments)

# Create n column that is prop column * n_total
hypothesized["n"] = hypothesized["prop"] * n_total

# Plot a red bar graph of n vs. vendor_inco_term for incoterm_counts
plt.bar(incoterm_counts['vendor_inco_term'], incoterm_counts['n'], color="red", label="Observed")

# Add a blue bar plot for the hypothesized counts
plt.bar(hypothesized['vendor_inco_term'], hypothesized['n'],color='b', alpha=0.5, label="Hypothesized")
plt.legend()
plt.show()

![image.png](attachment:41db953b-ab6e-4259-8d88-99d1da71fba6.png)

In [None]:
# Perform a goodness of fit test on the incoterm counts n
from scipy.stats import chisquare
gof_test = chisquare(f_obs=incoterm_counts['n'], f_exp=hypothesized['n'])

# Print gof_test results
print(gof_test)

![image.png](attachment:8d0518db-16a2-4941-b310-e9c78029415c.png)

What a good goodness of fit! The test to compare the proportions of a categorical variable to a hypothesized distribution is called a chi-square goodness of fit test.
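
The goodness of fit statistic is the sum of squared differences between observed and hypothesized counts, each scaled by the hypothesized count. A sketch with hypothetical counts and proportions, chosen so the result is easy to verify by hand:

```python
def chi2_gof_stat(observed_counts, hypothesized_props):
    """Expected counts, chi-square statistic, and degrees of freedom."""
    n_total = sum(observed_counts)
    # Scale hypothesized proportions up to counts, as done for hypothesized["n"]
    expected_counts = [p * n_total for p in hypothesized_props]
    chi2 = sum(
        (o - e) ** 2 / e
        for o, e in zip(observed_counts, expected_counts)
    )
    dof = len(observed_counts) - 1
    return expected_counts, chi2, dof


# Hypothetical observed counts and hypothesized proportions
expected_counts, chi2, dof = chi2_gof_stat([50, 30, 20], [0.4, 0.4, 0.2])
print(expected_counts, chi2, dof)
```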



<a id='Assumptions_in_hypothesis_testing'></a>
# Assumptions_in_hypothesis_testing

![image.png](attachment:196791fd-9f80-4f03-82f9-39610b36ae52.png)

In [None]:
# Count the freight_cost_group values
counts = late_shipments['freight_cost_group'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 30).all())

In [None]:
# Count the late values
counts = late_shipments['late'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 10).all())

In [None]:
# Count the values of freight_cost_group grouped by vendor_inco_term
counts = late_shipments.groupby('vendor_inco_term')['freight_cost_group'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 5).all())

In [None]:
# Count the shipment_mode values
counts = late_shipments['shipment_mode'].value_counts()

# Print the result
print(counts)

# Inspect whether the counts are big enough
print((counts >= 30).all())

Setting a great example for an ample sample! While randomness and independence of observations can't easily be tested programmatically, you can test that your sample sizes are big enough to make a hypothesis test appropriate. Based on the last result, we should be a little cautious of the ANOVA test results given the small sample size for Air Charter.



<a id='Non-parametric_tests'></a>
# Non-parametric_tests

![image.png](attachment:d840c285-a5da-42e0-ac8c-4c61588e3014.png)

In [None]:
# Conduct a paired t-test on dem_percent_12 and dem_percent_16
paired_test_results = pingouin.ttest(x=sample_dem_data['dem_percent_12'],
                                     y=sample_dem_data['dem_percent_16'],
                                     paired=True,
                                     alternative='two-sided')

# Print paired t-test results
print(paired_test_results)

In [None]:
# Conduct a Wilcoxon test on dem_percent_12 and dem_percent_16
wilcoxon_test_results = pingouin.wilcoxon(x=sample_dem_data['dem_percent_12'],
                                          y=sample_dem_data['dem_percent_16'],
                                          alternative='two-sided')

# Print Wilcoxon test results
print(wilcoxon_test_results)

You are Wilcox-on the right path! Given the large sample size (500), the parametric t-test and the non-parametric Wilcoxon test give similar results here, both with a very small p-value.
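
The Wilcoxon signed-rank test works on ranks of the absolute differences rather than on the differences themselves. The sketch below computes the two-sided statistic W by hand for hypothetical paired values; it drops zero differences and averages the ranks of ties, which is one common convention and may not match every library's defaults.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_w(x, y):
    """Smaller of the positive and negative signed-rank sums."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical paired percentages
x = [51.5, 40.0, 62.0, 33.0, 48.0]
y = [50.0, 40.5, 60.0, 30.0, 49.0]
w_stat = wilcoxon_w(x, y)
print(w_stat)
```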



<a id='Non-parametric_ANOVA_and_unpaired_t-tests'></a>
# Non-parametric_ANOVA_and_unpaired_t-tests

![image.png](attachment:3b8a202e-bb5c-4e09-a13e-42a1fc973092.png)

In [None]:
# Select the weight_kilograms and late columns
weight_vs_late = late_shipments[['weight_kilograms', 'late']]

# Convert weight_vs_late into wide format
weight_vs_late_wide = weight_vs_late.pivot(columns='late',
                                           values='weight_kilograms')

# Run a two-sided Wilcoxon-Mann-Whitney test on weight_kilograms vs. late
wmw_test = pingouin.mwu(x=weight_vs_late_wide['Yes'],
                        y=weight_vs_late_wide['No'],
                        alternative='two-sided')

# Print the test results
print(wmw_test)

They tried to make me use parameters, but I said "No, no, no". The small p-value here leads us to suspect that shipment weight differs between late and on-time shipments. The Wilcoxon-Mann-Whitney test is useful when you cannot satisfy the assumptions for a parametric test comparing two means, like the t-test.
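
The Wilcoxon-Mann-Whitney U statistic can be read as a count of pairs: for every pairing of one value from each group, how often does the first group's value exceed the second's? A brute-force sketch on hypothetical, unpaired samples, counting ties as one half:

```python
def mann_whitney_u(x, y):
    """U statistics for x and y, counting ties as half a win."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1
            elif a == b:
                u_x += 0.5
    # The two U statistics always sum to the number of pairs
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical unpaired samples of different sizes
x = [1, 3, 5]
y = [2, 4, 6, 8]
u_x, u_y = mann_whitney_u(x, y)
print(u_x, u_y)
```

This pairwise loop is quadratic in the sample sizes, so it is only meant to show what the statistic measures; rank-based implementations are used in practice.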



![image.png](attachment:749aea2a-5e41-48cf-81b1-05287fe18847.png)

In [None]:
# Run a Kruskal-Wallis test on weight_kilograms vs. shipment_mode
kw_test = pingouin.kruskal(data=late_shipments,
                           dv='weight_kilograms',
                           between='shipment_mode')

# Print the results
print(kw_test)