In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab09.ipynb")

# CE 93: Lab Assignment 09

You must submit the lab to Gradescope by the due date. You will submit the zip file produced by running the final cell of the assignment.

## About this Lab
The objective of this assignment is to perform hypothesis testing based on observed data.

## Instructions 
**Run the first cell, Initialize Otter**, to import the autograder and submission exporter.

Throughout the assignment, replace `...` with your answers. We use `...` as a placeholder and these should be deleted and replaced with your answers.

Any part listed as a "<font color='red'>**Question**</font>" should be answered to receive credit.

**Please save your work after every question!**

To read the documentation on a Python function, you can type `help()` and add the function name between parentheses.

**Run the cell below** to import the required modules.

In [None]:
# Please run this cell, and do not modify the contents
import math
import numpy as np
import scipy
import pandas as pd
import statistics as stats
import cmath
import re
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import hashlib
import ipywidgets as widgets
from ipywidgets import FileUpload
from IPython.display import display
from PIL import Image
import os
import resources
import random                                  
from statsmodels.stats.weightstats import ztest  
from scipy.stats import * 

def get_hash(num):
    """Helper function for assessing correctness"""
    return hashlib.md5(str(num).encode()).hexdigest()

### Introduction

For this week's lab, we will work with ammonia measurements from a river.

Ammonia can be toxic to aquatic life at high levels. Typical natural sources of ammonia in water include decomposition of organic waste matter, gas exchange with the atmosphere, forest fires, animal and human waste, and nitrogen fixation. Human activities, such as the use of commercial fertilizers and other industrial applications, also release ammonia directly. Because of this toxicity, it is important to monitor ammonia levels in aquatic environments.

Source: https://www.epa.gov/wqc/aquatic-life-criteria-ammonia

### Ammonia Data

The California Environmental Protection Agency (CalEPA) is worried that the amount of ammonia in the environment is reaching unhealthy levels. Given your data analysis experience, they hired you as a consultant to analyze their data and better understand whether ammonia levels are unhealthy and what actions they should take.

Let's load the provided data set `ammonia_conc.csv`. It has one feature, which is ammonia levels in ppm in the river of interest. There are a total of **150** measurements taken at random times over 1 year.

|Feature|Units|
|:-|:-|
|Ammonia Level|ppm|

The data are in units of **ppm** (parts per million). The unhealthy level is defined as **0.04 ppm**.

Run the cell below, which uses the Pandas `read_csv()` function to read the data and saves the result as a variable named `df`.

In [None]:
# read a .csv file as a DataFrame
df = pd.read_csv('resources/ammonia_conc.csv')

# returns the first 5 rows of the data set by default
df.head()

### Create Variables from the DataFrame

We want to generate a data vector for the ammonia level, since a data vector is easier to work with than a DataFrame.
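As a sketch of what "data vector" means here, the snippet below uses a small hypothetical DataFrame (the column name `"Ammonia (ppm)"` is an assumption, not necessarily the header in `ammonia_conc.csv`). Selecting one column yields a pandas Series, which can be converted to a plain NumPy array with `.to_numpy()`; either form should work with the NumPy and SciPy functions used in this lab.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for df; the column name "Ammonia (ppm)"
# is an assumption, not necessarily the header used in ammonia_conc.csv
example_df = pd.DataFrame({"Ammonia (ppm)": [0.031, 0.045, 0.052, 0.038]})

# Selecting a column by name returns a pandas Series (a labeled 1-D vector)
col = example_df["Ammonia (ppm)"]

# Selecting the first column by position works without knowing its name
first_col = example_df.iloc[:, 0]

# .to_numpy() drops the index labels and returns a plain NumPy array
vec = col.to_numpy()
print(type(col).__name__, type(vec).__name__, vec.shape)
```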

<font color='red'>**Question 1.0.**</font> Create a data vector for the ammonia level and save it as variable `ammonia`. You can refer to previous labs to answer this question. (0.25 pts)

In [None]:
# ANSWER CELL
# create variable for ammonia levels

ammonia = ...

print(ammonia)

In [None]:
grader.check("q1.0")

<font color='red'>**Question 1.1.**</font> Compute the following summary statistics for `ammonia`. Do not just manually type the numeric answers. Use Python functions to determine the values. (1.25 pts)

* What is the mean of ammonia? Assign your answer to `mean_ammonia`.
* What is the median of ammonia? Assign your answer to `median_ammonia`.
* What is the sample standard deviation of ammonia? Assign your answer to `stdev_ammonia`.
* What is the coefficient of variation of ammonia? Compute this value as a decimal and not a percentage. Assign your answer to `cv_ammonia`.
* What is the 95$^{th}$ percentile of ammonia? Assign your answer to `per_ammonia`.
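The following sketch shows one way to compute these statistics with NumPy on a short hypothetical array (not the lab data). Two details matter: `np.std` divides by $n$ unless you pass `ddof=1`, which gives the sample standard deviation, and `np.percentile` interpolates linearly by default, so other percentile functions may give slightly different values.

```python
import numpy as np

# Hypothetical measurements in ppm (NOT the lab data)
x = np.array([0.030, 0.045, 0.052, 0.038, 0.041, 0.060, 0.035, 0.047])

mean_x = np.mean(x)
median_x = np.median(x)
# ddof=1 divides by n - 1 (sample standard deviation);
# np.std's default ddof=0 divides by n instead
stdev_x = np.std(x, ddof=1)
# coefficient of variation as a decimal, not a percentage
cv_x = stdev_x / mean_x
# 95th percentile; NumPy interpolates linearly between order statistics by default
p95_x = np.percentile(x, 95)

print(f"mean={mean_x:.4f}, median={median_x:.4f}, s={stdev_x:.4f}, cv={cv_x:.3f}, p95={p95_x:.4f}")
```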

In [None]:
# ANSWER CELL

# compute summary statistics
mean_ammonia = ...
median_ammonia = ...
stdev_ammonia = ...
cv_ammonia = ...
per_ammonia = ...

print(f'Mean: {mean_ammonia:.4f} ppm' if not isinstance(mean_ammonia, type(Ellipsis)) else None)
print(f'Median: {median_ammonia:.4f} ppm' if not isinstance(median_ammonia, type(Ellipsis)) else None)
print(f'Standard deviation: {stdev_ammonia:.3f} ppm' if not isinstance(stdev_ammonia, type(Ellipsis)) else None)
print(f'Coefficient of variation: {cv_ammonia:.3f}' if not isinstance(cv_ammonia, type(Ellipsis)) else None)
print(f'95th percentile: {per_ammonia:.3f} ppm' if not isinstance(per_ammonia, type(Ellipsis)) else None)

In [None]:
grader.check("q1.1")

<font color='red'>**Question 1.2.**</font> The unhealthy level for ammonia is defined as **0.040 ppm**. So, if the average **population** level is greater than 0.040 ppm, the water is considered unhealthy. Without performing any additional calculations, and based on the summary statistics above, can you conclude with **certainty** that the water is unhealthy?  Assign your answer to the variable `q1_2` as a string. (0.25 pts)

**A.** Yes because the sample mean is greater than 0.04 ppm \
**B.** No because the sample mean is less than 0.04 ppm \
**C.** No because of sampling variation \
**D.** Yes because of sampling variation \
**E.** Yes because both the mean and median are greater than 0.04 ppm \
**F.** No because both the mean and median are less than 0.04 ppm

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
q1_2 = ...
q1_2

In [None]:
grader.check("q1.2")

## Hypothesis Testing

As with any sample, there is always sampling variation and uncertainty. A sample is influenced by measurement error, sampling bias, and sampling variation, among other factors. Therefore, to draw conclusions about the population from uncertain, random samples, we need proper statistical methods.

For this data set, we are concerned that the population mean ammonia level is **greater** than the unhealthy level of 0.04 ppm. Therefore, our null and alternative hypotheses are as follows:

**$H_0: \mu = 0.040$ ppm**

**$H_1: \mu > 0.040$ ppm**

where $\mu$ is the population mean of ammonia levels

Since we are interested in the population mean $\mu$, we will use the sample mean $\overline{X}$ as our test statistic to perform the hypothesis test.

We have a large sample (150 measurements), so by the Central Limit Theorem the sample mean $\overline{X}$ is approximately normally distributed, regardless of the underlying distribution of the population.

The distribution of the test statistic under the null hypothesis is known as the **null** distribution. If the population standard deviation $\sigma$ is unknown, it can be approximated by the sample standard deviation $s$ (for large samples).
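The Central Limit Theorem claim above can be checked by simulation. This sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 0.04 ppm, so $\sigma = 0.04$ ppm as well), draws many samples of size 150, and compares the spread of the resulting sample means with the CLT prediction $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed population: exponential with mean 0.04 ppm (its SD is also 0.04)
scale = 0.04
n = 150          # sample size, matching the lab's sample size
reps = 10_000    # number of simulated samples

# Each row is one sample of size n; average across each row to get its sample mean
sample_means = rng.exponential(scale, size=(reps, n)).mean(axis=1)

# CLT prediction: X-bar is approximately N(mu, sigma / sqrt(n))
predicted_sd = scale / np.sqrt(n)
within = np.mean(np.abs(sample_means - scale) <= 1.96 * predicted_sd)

print(f"mean of X-bar: {sample_means.mean():.5f} (population mean {scale})")
print(f"SD of X-bar:   {sample_means.std(ddof=1):.5f} (sigma/sqrt(n) = {predicted_sd:.5f})")
print(f"share within 1.96 predicted SDs: {within:.3f} (normal theory: 0.950)")
```

Even though individual exponential values are far from normal, the sample means should cluster symmetrically around 0.04 with roughly the predicted spread.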

<font color='red'>**Question 2.0.**</font> What are the values of the parameters of the null distribution of the test statistic $\overline{X}$ for the ammonia levels? Assign your answers to the variables `mu_null` and `sigma_null`. For `sigma_null`, do not just manually type the numeric answer. Use appropriate methods to determine the value. (0.5 pts)

In [None]:
# ANSWER CELL

# get parameters of the distribution of the test statistic
mu_null = ...
sigma_null = ...

print(f'Null Distribution: N ({mu_null:.3f}, {sigma_null:.4f})' if not isinstance(mu_null, type(Ellipsis)) and not isinstance(sigma_null, type(Ellipsis)) else None)

In [None]:
grader.check("q2.0")

<font color='red'>**Question 2.1.**</font> Under the null distribution (i.e., if $H_0$ were true), what is the $z$-score of the observed test statistic (i.e., of the observed sample mean)? Assign your answer to `z_score`. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

The $z$-score of the test statistic is:

$$z=\dfrac{\overline{x}-\mu_0}{\dfrac{\sigma}{\sqrt{n}}}$$

where $\mu_0$ is the value of $\mu$ under the null hypothesis.

If $\sigma$ is unknown, it can be replaced by $s$ (for large samples).
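As a sketch of the formula above, the snippet computes the null-distribution parameters and the $z$-score for a hypothetical simulated sample of 150 values (not the lab data), using the sample standard deviation $s$ in place of $\sigma$.

```python
import numpy as np

# Hypothetical simulated sample of 150 measurements (NOT the lab data)
rng = np.random.default_rng(1)
x = rng.normal(loc=0.042, scale=0.010, size=150)
mu_0 = 0.040                     # mean under H0

n = len(x)
x_bar = np.mean(x)
s = np.std(x, ddof=1)            # sample standard deviation replaces sigma
se = s / np.sqrt(n)              # SD of the null distribution of X-bar

# z-score of the observed sample mean under H0
z = (x_bar - mu_0) / se
print(f"Null distribution of X-bar: N({mu_0}, {se:.5f}); z = {z:.3f}")
```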

In [None]:
# ANSWER CELL

z_score = ...

print(f'Z-score: {z_score:.3f}' if not isinstance(z_score, type(Ellipsis)) else None)

In [None]:
grader.check("q2.1")

### $p$-value

Now that you have obtained the $z$-score of the test statistic, you can compute the $p$-value. In lecture, we defined the $p$-value as the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming $H_0$ is true (i.e., under the null distribution). In other words, it is the area under the null distribution starting at the observed test statistic and extending in the direction(s) that support the alternative hypothesis.
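The "direction(s)" in this definition translate into which tail areas of the standard normal distribution to add up. A sketch for a hypothetical observed $z$-score of 1.8 (not the lab's value):

```python
from scipy.stats import norm

z_obs = 1.8   # hypothetical observed z-score

# H1: mu > mu_0 -> right tail, P(Z >= z)
p_greater = 1 - norm.cdf(z_obs)   # equivalently norm.sf(z_obs), more accurate far in the tail
# H1: mu < mu_0 -> left tail, P(Z <= z)
p_less = norm.cdf(z_obs)
# H1: mu != mu_0 -> both tails, 2 * P(Z >= |z|)
p_two_sided = 2 * norm.sf(abs(z_obs))

print(f"greater: {p_greater:.4f}, less: {p_less:.4f}, two-sided: {p_two_sided:.4f}")
```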

<font color='red'>**Question 2.2.**</font> Using $H_1: \mu>0.04$ and the $z$-score of the test statistic you computed above, what is the $p$-value for this hypothesis test?

*Hint:* $P(Z < z) = \Phi(z)$ = `norm.cdf(z)`

Assign your answer to `p_value`. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

In [None]:
# ANSWER CELL

p_value = ...

print(f'p-value: {p_value:.3f}' if not isinstance(p_value, type(Ellipsis)) else None)

In [None]:
grader.check("q2.2")

### Decision

The smaller the $p$-value, the stronger the evidence against $H_0$. More specifically, if the $p$-value $\leq$ the significance level $\alpha$, the result is statistically significant at the $100\alpha\%$ level and we reject $H_0$. This is because a low $p$-value means it would be very unlikely to observe a sample at least as extreme as ours if $H_0$ were true.

Otherwise, if the $p$-value $> \alpha$, the result is not statistically significant at the $100\alpha\%$ level and we fail to reject $H_0$. This is because a $p$-value that is not low enough means that observing a sample at least as extreme as ours would not be sufficiently unusual if $H_0$ were true.

The significance level $\alpha$ is something that you, as a data analyst, should specify. It reflects the threshold probability that makes you "feel comfortable" rejecting $H_0$. A commonly used value is $\alpha=0.05$, but other values are also used depending on the data being analyzed and how critical the analysis is.
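The decision rule is a single comparison, so it can be sketched as a small helper applied at several significance levels. The helper name `decide` and the $p$-value of 0.03 are hypothetical, chosen only to show that the same $p$-value can be significant at one level and not at another.

```python
def decide(p_value, alpha):
    """Return the decision for a hypothesis test at significance level alpha."""
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"


p = 0.03  # hypothetical p-value
for alpha in [0.10, 0.05, 0.02, 0.01]:
    print(f"alpha = {alpha:.2f}: {decide(p, alpha)}")
```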

<font color='red'>**Question 2.3.**</font> What is the appropriate conclusion under the following significance levels? Assign ALL that apply to the variable `q2_3`. (1 pt)

**A.** The result is statistically significant at the $\underline{10\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy.\
**B.** The result is statistically significant at the $\underline{5\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**C.** The result is statistically significant at the $\underline{2\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**D.** The result is statistically significant at the $\underline{1\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**E.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{10\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy.\
**F.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{5\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy.\
**G.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{2\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy.\
**H.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{1\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy.

Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL

q2_3 = ...
q2_3

In [None]:
grader.check("q2.3")

### `ztest()` in `Python`

We can perform all of the steps above in `Python` with a single line of code. When the test statistic is assumed to be normally distributed and a $z$-score is calculated (applicable when the sample is large), the test is called a $z$-test because it is based on the standard normal ($z$) distribution.

When testing for the population mean, we can use: [`ztest(x1, value, alternative)`](https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.ztest.html#statsmodels.stats.weightstats.ztest)

* `x1` is the sample (all the sample values, not the sample mean)
* `value` is the mean under the null hypothesis ($\mu_0$)
* `alternative` is the alternative hypothesis and takes on the following values:
    1. `'two-sided'`: $H_1: \mu \neq \mu_0$
    2. `'larger'`: $H_1: \mu > \mu_0$
    3. `'smaller'`: $H_1: \mu < \mu_0$

So, if we want to test $H_1: \mu < 1$, we can use: `ztest(ammonia, value=1, alternative='smaller')`. Note that we used `alternative='smaller'` because we are testing $H_1: \mu < 1$. If we want to test a different alternative hypothesis, we have to update the parameter `alternative` accordingly.

`ztest(x1, value, alternative)` returns a tuple with two values:
1. The $z$-score of the test statistic
2. The $p$-value

We can extract these values using tuple unpacking:

`z_score, p_value = ztest(x1, value, alternative)`
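To see that `ztest()` performs the same computation as the manual steps, this sketch runs both on a hypothetical simulated sample (not the lab data). It relies on `ztest()` using the sample standard deviation by default (its `ddof` parameter defaults to 1).

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.weightstats import ztest

# Hypothetical simulated sample of 150 values (NOT the lab data)
rng = np.random.default_rng(2)
x = rng.normal(0.042, 0.010, size=150)
mu_0 = 0.040

# One-line z-test for H1: mu > mu_0
z_sm, p_sm = ztest(x, value=mu_0, alternative='larger')

# Manual version: sample SD (ddof=1), standard error, right-tail area
z_manual = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(len(x)))
p_manual = norm.sf(z_manual)

print(f"ztest:  z = {z_sm:.4f}, p = {p_sm:.4f}")
print(f"manual: z = {z_manual:.4f}, p = {p_manual:.4f}")
```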

<font color='red'>**Question 3.0.**</font> Using `ztest()` and $H_1: \mu > 0.04$, confirm your answers to Questions 2.1 and 2.2. Specifically, what are the $z$-score and the $p$-value? Assign your answers to `z_score1` and `p_value1`, respectively. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

Note that the values from `ztest()` and the values you computed above should match exactly, provided you used the sample standard deviation ($n-1$ in the denominator) :)

In [None]:
# ANSWER CELL

z_score1, p_value1 = ...

print(f'z-score: {z_score1:.3f}, p-value: {p_value1:.3f}' if not isinstance(z_score1, type(Ellipsis)) else None)

In [None]:
grader.check("q3")

<font color='red'>**Question 3.1.**</font> Using `ztest()` and $H_1: \mu \neq 0.04$, what are the $z$-score and the $p$-value? Assign your answers to `z_score2` and `p_value2`, respectively. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

In [None]:
# ANSWER CELL

z_score2, p_value2 = ...

print(f'z-score: {z_score2:.3f}, p-value: {p_value2:.3f}' if not isinstance(z_score2, type(Ellipsis)) else None)

In [None]:
grader.check("q3.1")

### Small-Sample Test

Everything we did thus far was based on a large sample. If the sample isn't large enough, we cannot assume a normal distribution for the test statistic. The only exception is if the underlying population is normal. In this case, we can use a $t$-distribution:

$$\dfrac{\overline{X}-\mu}{\dfrac{s}{\sqrt{n}}}\sim t (df=n-1)$$
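One practical consequence of using the $t$-distribution is that its tails are heavier than the normal's, so critical values are larger for small samples and approach the $z$ value as $n$ grows. A short sketch comparing one-sided 5% critical values (for $n = 10$, about 1.833 versus 1.645 for $z$):

```python
from scipy.stats import norm, t

# 95th percentiles = one-sided critical values at the 5% level
z_crit = norm.ppf(0.95)
t_crits = {n: t.ppf(0.95, df=n - 1) for n in [5, 10, 30, 150]}

for n, t_crit in t_crits.items():
    print(f"n = {n:>3}: t critical value = {t_crit:.3f} (z critical value = {z_crit:.3f})")
```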

Run the code below to select the last 10 ammonia measurements. This will be our small sample and we will save it as a new variable `ammonia_last`.

In [None]:
# run the code below to select the last 10 ammonia measurements.

ammonia_last = ammonia[-10:]

<font color='red'>**Question 4.0.**</font> Compute the sample mean of `ammonia_last` and assign it to `mean_ammonia_last`. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.25 pts)

In [None]:
# ANSWER CELL

# compute the mean of ammonia_last
mean_ammonia_last = ...

print(f'Sample mean of last 10 measurements: {mean_ammonia_last:.3f} ppm' if not isinstance(mean_ammonia_last, type(Ellipsis)) else None)

In [None]:
grader.check("q4.0")

<font color='red'>**Question 4.1.**</font> Compare the mean of the subsample (`mean_ammonia_last`) to that of the full sample (`mean_ammonia`). What can you say based on this comparison? Assign ALL that apply to the variable `q4`. (0.5 pts)

**A.** The mean of the subsample is greater than that of the full sample \
**B.** The mean of the subsample is equal to that of the full sample \
**C.** The mean of the subsample is smaller than that of the full sample \
**D.** Based on the mean values only, we can say that the subsample will have a lower **$p$**-value for testing $H_{1}: \mu > 0.04$ than the full sample and thus stronger evidence against the null hypothesis \
**E.** Based on the mean values only, we can say that the subsample will have a higher **$p$**-value for testing $H_{1}: \mu > 0.04$ than the full sample and thus weaker evidence against the null hypothesis \
**F.** We can't tell with certainty whether the subsample will have a lower **$p$**-value for testing $H_{1}: \mu > 0.04$ than the full sample based on the mean values only because the null distribution will be different for the subsample

Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL

q4 = ...
q4

In [None]:
grader.check("q4.1")

### `ttest_1samp()` in `Python`

While you are perfectly capable of performing the hypothesis test by computing the $t$-score of the test statistic and then the $p$-value (as you did above), we will use existing `Python` functions to perform the test directly.

*Again, the reason we are using a $t$-distribution now is that our sample is small (n=10) and we are assuming that the underlying distribution is normal, but with unknown population standard deviation $\sigma$.*

Assuming the underlying population is normal, we can perform hypothesis testing on the population mean using a $t$-statistic. The test is thus referred to as a $t$-test. Similar to `ztest()`, we can use: [`ttest_1samp(a, popmean, alternative)`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html)

* `a` is the observed sample
* `popmean` is the mean under the null hypothesis, $\mu_0$ **(this was called `value` for `ztest()`)**
* `alternative` takes on the following values:
    1. `'two-sided'`: $H_1: \mu \neq \mu_0$
    2. `'greater'`: $H_1: \mu > \mu_0$ **(this was called `'larger'` for `ztest()`)**
    3. `'less'`: $H_1: \mu < \mu_0$ **(this was called `'smaller'` for `ztest()`)**
    
**Note that some of the parameters for `ttest_1samp()` use different names from `ztest()`.**

So, if we want to test $H_1: \mu < 1$, we can use: `ttest_1samp(ammonia_last, popmean=1, alternative='less')`. Note that we used `alternative='less'` because we are testing $H_1: \mu < 1$. If we want to test a different alternative hypothesis, we have to update the parameter `alternative` accordingly.

Like `ztest()`, `ttest_1samp()` returns two values (as a tuple-like result object):
1. The $t$-score of the test statistic
2. The $p$-value

We can extract these values using tuple unpacking:

`t_score, p_value = ttest_1samp(a, popmean, alternative)`
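As a check on what `ttest_1samp()` computes, this sketch compares it with the manual $t$-statistic and the right-tail area of a $t$-distribution with $n-1$ degrees of freedom. The 10-value sample is hypothetical (not `ammonia_last`), and the `alternative` argument assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Hypothetical small sample (n = 10), NOT the lab data
x = np.array([0.041, 0.046, 0.039, 0.052, 0.044, 0.048, 0.037, 0.050, 0.043, 0.047])
mu_0 = 0.040

# SciPy's one-sample t-test for H1: mu > mu_0
t_sp, p_sp = ttest_1samp(x, popmean=mu_0, alternative='greater')

# Manual version of the same test
n = len(x)
t_manual = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))
p_manual = t.sf(t_manual, df=n - 1)   # right-tail area of t with n - 1 df

print(f"ttest_1samp: t = {t_sp:.4f}, p = {p_sp:.4f}")
print(f"manual:      t = {t_manual:.4f}, p = {p_manual:.4f}")
```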

<font color='red'>**Question 5.0.**</font> Using `ttest_1samp()` and $H_1: \mu > 0.04$ with `ammonia_last`, what are the $t$-score and the $p$-value? Assign your answers to `t_score` and `p_value3`, respectively. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

In [None]:
# ANSWER CELL

t_score, p_value3 = ...

print(f't-score: {t_score:.3f}, p-value: {p_value3:.3f}' if not isinstance(t_score, type(Ellipsis)) else None)

In [None]:
grader.check("q5.0")

<font color='red'>**Question 5.1.**</font> What is the appropriate conclusion based on the $t$-test at the 0.05 significance level? Assign your answer to the variable `q5_1` as a string. (0.5 pts)

**A.** The result $\underline{\text{is}}$ statistically significant at the $\underline{5\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**B.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{5\%}$ level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**C.** The result $\underline{\text{is}}$ statistically significant at the $\underline{5\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy. \
**D.** The result $\underline{\text{is not}}$ statistically significant at the $\underline{5\%}$ level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy. 

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

**Note that the test for this question will be a hidden test. Meaning, you will NOT be able to know whether your answer is correct or not by running the `grader.check()` cell. You should know how to confidently answer this question by now.**

In [None]:
# ANSWER CELL

q5_1 = ...
q5_1

In [None]:
grader.check("q5.1")

If you did the calculations correctly, you should observe a higher $p$-value for the subsample than that of the full sample. But why?

It is interesting to note that the sample mean of `ammonia_last` is higher than that of the full sample. Thus, looking only at the sample means, one might be tempted to say that the subsample shows stronger evidence that $\mu>0.040$ and thus stronger evidence against $H_0: \mu=0.040$ (meaning we should get a lower $p$-value).

However, we obtained a higher $p$-value for the subsample, indicating the opposite: under $H_0$, a result at least as extreme as the one observed is more likely for the subsample. The reason is that the small sample carries much more uncertainty than the full sample. With a much smaller $n$, the standard error $s/\sqrt{n}$ is typically larger, and the $t$-distribution has heavier tails than the normal distribution; both effects spread out the null distribution and lead to a higher $p$-value in this case. Proper statistical tests are how we quantify this uncertainty and account for it when making decisions, which is why we perform hypothesis testing.
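A numeric sketch of this effect, using hypothetical summary numbers rather than the lab's results: the small sample has the higher mean but only $n = 10$, and it still ends up with the much larger one-sided $p$-value.

```python
import numpy as np
from scipy.stats import norm, t

s, mu_0 = 0.015, 0.040   # hypothetical common sample SD and null value (ppm)

# Small subsample: higher sample mean, but only n = 10 (t-test, df = n - 1)
x_bar_small, n_small = 0.048, 10
t_small = (x_bar_small - mu_0) / (s / np.sqrt(n_small))
p_small = t.sf(t_small, df=n_small - 1)

# Full sample: lower sample mean, but n = 150 (z-test)
x_bar_large, n_large = 0.044, 150
z_large = (x_bar_large - mu_0) / (s / np.sqrt(n_large))
p_large = norm.sf(z_large)

print(f"n = {n_small}: statistic = {t_small:.2f}, p = {p_small:.4f}")
print(f"n = {n_large}: statistic = {z_large:.2f}, p = {p_large:.5f}")
```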

### Small-Sample Test (Non-Normal Sample)

In lecture, we mentioned that when the population standard deviation is unknown and the sample size is small, the $t$-distribution is applicable **only if the population is normally distributed**. Just as estimating confidence intervals can be challenging for small samples from non-normal distributions, so can hypothesis testing.

In this case, we can perform hypothesis testing using bootstrapped confidence intervals.

### Bootstrapping 

Instead of using $p$-values to make decisions, we can calculate a confidence interval from the sample and check whether the null hypothesized value ($\mu_0 = 0.040$ ppm) falls within it. If the null hypothesized value is within the confidence interval, we fail to reject $H_0$.
Otherwise, if the null hypothesized value is not within the confidence interval, we reject $H_0$.

In Lab 08, we saw how we can use bootstrapping to calculate confidence intervals for any estimate without assuming a particular distribution for the data (bootstrapping does, however, assume that the sample is representative of the population).

Similarly, we can use bootstrapping to calculate confidence intervals and make decisions for hypothesis testing. If the significance level for hypothesis testing is $\alpha$, the associated confidence level is $100(1-\alpha)\%$.

Let's select a total of **5000** bootstrap samples from `ammonia_last` and calculate the mean of each sample. Run the code cell below. Note that here we specify `random.seed(99)`, so the bootstrap samples are the same every time we rerun the cell.

In [None]:
# set the random seed equal to 99
random.seed(99)

# specify the total number of samples to create
n_samples = 5000

# create an empty list to save the means of each sample (np.append turns it into an array)
bootstrap_means = []

# loop through a total of n_samples times
for i in range(n_samples):

    # select a random sample of the same size as the data and with replacement
    bootstrapped_sample = random.choices(list(ammonia_last), k=len(ammonia_last))

    # calculate the sample mean
    ammonia_sample_mean = np.mean(bootstrapped_sample)

    # append the mean value to save all the means
    bootstrap_means = np.append(bootstrap_means, ammonia_sample_mean)

# print a few bootstrapped means
print(f'Sample Bootstrapped Means: [{bootstrap_means[0]:.3f}, {bootstrap_means[1]:.3f}, ..., {bootstrap_means[-1]:.3f}] ppm')

Recall that our hypotheses are:

**$H_0: \mu = 0.040$ ppm**

**$H_1: \mu > 0.040$ ppm**

To make a decision at the $\alpha$ significance level using confidence intervals, we have to find an **appropriate** $100(1-\alpha)\%$ confidence interval. The type of confidence interval (two-sided, one-sided lower, or one-sided upper) depends on the direction of the alternative hypothesis. Refer to the lecture slides if you are unsure which one to use in this case.
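The correspondence between the alternative hypothesis and the percentile bootstrap interval can be sketched as a small helper. The function name `bootstrap_ci` and its `alternative` labels (borrowed from `ttest_1samp()`) are hypothetical, and the sample is made up; the bootstrap here uses NumPy's random generator, so it will not reproduce the `random.seed(99)` results above.

```python
import numpy as np


def bootstrap_ci(boot_stats, alpha=0.05, alternative="two-sided"):
    """Percentile bootstrap interval at confidence level 100(1 - alpha)%.

    'greater'   -> H1: mu > mu_0, one-sided interval (lower, inf)
    'less'      -> H1: mu < mu_0, one-sided interval (-inf, upper)
    'two-sided' -> H1: mu != mu_0, alpha/2 cut from each tail
    """
    if alternative == "greater":
        return np.percentile(boot_stats, 100 * alpha), np.inf
    if alternative == "less":
        return -np.inf, np.percentile(boot_stats, 100 * (1 - alpha))
    if alternative == "two-sided":
        low, high = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return low, high
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical small sample (NOT the lab data) and 5000 bootstrap means
rng = np.random.default_rng(3)
sample = np.array([0.041, 0.046, 0.039, 0.052, 0.044, 0.048, 0.037, 0.050, 0.043, 0.047])
boot_means = rng.choice(sample, size=(5000, len(sample)), replace=True).mean(axis=1)

mu_0 = 0.040
low, high = bootstrap_ci(boot_means, alpha=0.05, alternative="greater")
# Reject H0 when the null value falls outside the interval
reject = not (low <= mu_0 <= high)
print(f"95% one-sided interval: ({low:.4f}, {high}); reject H0: {reject}")
```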

<font color='red'>**Question 6.0.**</font> Calculate the appropriate confidence interval for testing $H_1: \mu > 0.040$ at the 0.05 significance level and plot the results along with the histogram of `bootstrap_means`. Follow these steps: (1.0 pt)

1. Calculate the appropriate confidence interval for the mean. Assign the finite bound of the one-sided confidence interval to `q6`.
2. Plot a frequency histogram of `bootstrap_means` with `bins=15` and assign it to the variable `histogram`.
3. Plot a vertical red line at the confidence interval bound, extending from 0 to 1000 (refer to Lab 08).
4. Plot the value of the null hypothesis $H_0: \mu = 0.040$ using: `plt.scatter(0.04, 0, color='magenta', s=40, clip_on=False)`
5. Set the x-axis label to `'Bootstrap Means (ppm)'` and the y-axis label to `'Frequency'`.

In [None]:
# ANSWER CELL

# Do not modify these lines for grading purposes
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# create figure and axes
fig_1, ax_1 = plt.subplots(nrows=1, ncols=1, figsize=(4,2.5))

# Edit the code below to plot a frequency histogram of bootstrap_means (only edit where you have ...)

# get the appropriate confidence interval
q6 = ...

# Plot frequency histogram. Assign the plot to the variable histogram.
histogram = ...

# plot red vertical line at the confidence interval estimate
...

# plot the value of the null hypothesis
...

# Label the axes
...

# Display the plot
plt.tight_layout()
plt.show()

# Print the interval
print(f'Confidence interval for Mean using Bootstrapping: ({q6.round(4)}, infinity) ppm' if not isinstance(q6, type(Ellipsis)) else None)

In [None]:
grader.check("q6.0")

<font color='red'>**Question 6.1.**</font> What is the appropriate conclusion based on the bootstrapped confidence interval? Assign ALL that apply to the variable `q6_1`. (0.5 pts)

**A.** The confidence interval $\underline{\text{does not include}}$ the null hypothesized value.\
**B.** The confidence interval $\underline{\text{includes}}$ the null hypothesized value. \
**C.** The result $\underline{\text{is not}}$ statistically significant at the **5%** level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**D.** The result $\underline{\text{is}}$ statistically significant at the **5%** level and we $\underline{\text{conclude}}$ that ammonia levels are unhealthy. \
**E.** The result $\underline{\text{is}}$ statistically significant at the **5%** level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy.\
**F.** The result $\underline{\text{is not}}$ statistically significant at the **5%** level and we $\underline{\text{can't conclude}}$ that ammonia levels are unhealthy. 


Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL

q6_1 = ...
q6_1

In [None]:
grader.check("q6.1")

## Difference in Means

After doing more research, you find out that a nearby field recently began using a new fertilizer, which the CalEPA scientists suspect may be the source of increased ammonia concentration in the river. The last 10 ammonia measurements (which we saved as `ammonia_last`) were taken **after** the nearby field began using the new fertilizer, and the **first 15** ammonia measurements were taken **before** it began using the new fertilizer. So, you are interested in testing whether the mean ammonia level has increased since the nearby field began using the new fertilizer.

Run the code below to select the **first 15** ammonia measurements and save them as a new variable `ammonia_first`.

In [None]:
# run the code below to select the first 15 ammonia measurements.

ammonia_first = ammonia[:15]

Let $\mu_F$ be the population mean ammonia level before the nearby field began using the new fertilizer (corresponding sample: `ammonia_first`), and let $\mu_L$ be the population mean ammonia level after the field began using it (corresponding sample: `ammonia_last`). You are interested in testing whether the new fertilizer may be the source of increased ammonia concentration in the river, and thus whether $\mu_L>\mu_F$. Therefore, our null and alternative hypotheses are as follows:

Population means of ammonia levels are the same: **$H_0: \mu_L = \mu_F$**

Population mean of ammonia levels has increased: **$H_1: \mu_L > \mu_F$**

<font color='red'>**Question 7.0.**</font> Based on the samples `ammonia_last` and `ammonia_first`, calculate a point estimate for $\mu_L-\mu_F$. Assign your answer to `mean_difference`. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

In [None]:
# ANSWER CELL

mean_difference = ...

print(f'Point estimate for difference in means: {mean_difference:.3f} ppm' if not isinstance(mean_difference, type(Ellipsis)) else None)

In [None]:
grader.check("q7.0")

### Hypothesis Testing for Difference in Means

Because both samples here are small (10 and 15 measurements), we cannot use the Central Limit Theorem to approximate the distribution of the sample means as normal.

The only exception is if the underlying populations are normal. In this case, each sample mean is itself normally distributed, and the standardized difference in sample means (using the sample standard deviations in place of the unknown population standard deviations) approximately follows a $t$-distribution. There are formulas to perform this hypothesis test on the difference in means from small samples, but we haven't covered them in this course.

We will therefore use existing functions to perform the above hypothesis test for difference in means using small samples. We will use the following function: [`ttest_ind(a, b, equal_var, alternative)`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind)

* `a` is the first observed sample
* `b` is the second observed sample
* `equal_var`: if `True` (default), the test assumes equal population variances for the two populations from which `a` and `b` were sampled. If `False`, the test does not assume equal population variances for the two populations from which `a` and `b` were sampled. The best practice is to assume the variances to be unequal unless we have reasons to believe that they are equal. So, it is recommended to set `equal_var = False`.
* `alternative` takes on the following values:
    1. `'two-sided'`: $H_1: \mu_a \neq \mu_b$
    2. `'greater'`: $H_1: \mu_a > \mu_b$
    3. `'less'`: $H_1: \mu_a < \mu_b$
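To see what `equal_var` changes, here is a minimal sketch that runs both versions on two **made-up** samples (not the lab's `ammonia_last`/`ammonia_first`), chosen with sizes 10 and 15 and deliberately different spreads. When the sample variances differ and the sample sizes are unequal, the pooled and Welch versions give different $t$-scores and $p$-values.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical samples (NOT the lab data): sizes 10 and 15, with very different spreads
sample_a = np.array([0.42, 0.55, 0.61, 0.38, 0.70, 0.49, 0.66, 0.35, 0.58, 0.52])
sample_b = np.array([0.41, 0.44, 0.43, 0.46, 0.40, 0.45, 0.42, 0.47, 0.44, 0.43,
                     0.41, 0.46, 0.45, 0.42, 0.44])

# Pooled-variance (Student's) t-test: assumes equal population variances
pooled = ttest_ind(sample_a, sample_b, equal_var=True, alternative='greater')

# Welch's t-test: does not assume equal population variances
welch = ttest_ind(sample_a, sample_b, equal_var=False, alternative='greater')

print(f'Pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.4f}')
print(f'Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}')
```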

**Note that the order of the samples `a` and `b` matters for one-sided tests. If you use `alternative='greater'`, the test is performed for $H_1: \mu_a > \mu_b$, where $\mu_a$ is the population mean corresponding to the first input, `a`, and $\mu_b$ is the population mean corresponding to the second input, `b`.**
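A minimal sketch of why the order matters, again using **made-up** samples rather than the lab data. Testing $H_1: \mu_a > \mu_b$ with `(a, b)` and `'greater'` is equivalent to using `(b, a)` and `'less'`: the sign of $t$ flips but the $p$-value is the same. Swapping the samples while keeping `'greater'` tests the opposite hypothesis, and its $p$-value is $1$ minus the correct one.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical samples (NOT the lab data)
sample_a = np.array([0.52, 0.61, 0.47, 0.58, 0.66, 0.55, 0.49, 0.63, 0.57, 0.60])
sample_b = np.array([0.45, 0.50, 0.41, 0.53, 0.48, 0.44, 0.52, 0.47, 0.43, 0.51,
                     0.46, 0.49, 0.42, 0.50, 0.48])

# H1: mu_a > mu_b, with sample_a passed first
res_ab = ttest_ind(sample_a, sample_b, equal_var=False, alternative='greater')

# Same hypothesis with the samples swapped: must use alternative='less'
res_ba = ttest_ind(sample_b, sample_a, equal_var=False, alternative='less')

# Swapping the samples flips the sign of t but leaves the p-value unchanged
print(res_ab.statistic, res_ba.statistic)
print(res_ab.pvalue, res_ba.pvalue)

# Mistake to avoid: swapped samples with 'greater' tests H1: mu_b > mu_a instead
res_wrong = ttest_ind(sample_b, sample_a, equal_var=False, alternative='greater')
print(res_wrong.pvalue)
```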

So, if we want to test $H_1: \mu_L \neq \mu_F$, we can use: `ttest_ind(ammonia_last, ammonia_first, equal_var=False, alternative='two-sided')`. Note that we used `alternative='two-sided'` because we are testing $H_1: \mu_L \neq \mu_F$. If we want to test a different alternative hypothesis, we have to update the parameter `alternative` accordingly.

The test requires that the two samples `a` and `b` be independent.

Similar to other hypothesis testing functions, `ttest_ind()` returns two values that can be unpacked like a tuple:
1. The $t$-score (the test statistic)
2. The $p$-value

We can extract these values (this is known as unpacking in Python) using:

`t_score, p_value = ttest_ind(a, b, equal_var, alternative)`
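As a sanity check on what the unpacked values mean, the sketch below unpacks the result of `ttest_ind()` on **made-up** samples (not the lab data) and recomputes Welch's $t$-score, degrees of freedom, and upper-tail $p$-value by hand. The variable names `t_example`, `p_example`, `t_manual`, and `p_manual` are hypothetical, chosen so they do not overwrite any answer variables in this notebook.

```python
import numpy as np
from scipy.stats import ttest_ind
from scipy.stats import t as t_dist  # Student's t distribution (renamed to avoid clashing with t-scores)

# Hypothetical samples (NOT the lab data)
a = np.array([0.58, 0.62, 0.49, 0.71, 0.55, 0.66, 0.60, 0.52, 0.69, 0.57])
b = np.array([0.47, 0.53, 0.44, 0.58, 0.50, 0.41, 0.55, 0.49, 0.46, 0.52,
              0.43, 0.56, 0.48, 0.51, 0.45])

# Unpack the t-score and p-value returned by SciPy (Welch's test, H1: mu_a > mu_b)
t_example, p_example = ttest_ind(a, b, equal_var=False, alternative='greater')

# Manual Welch computation: sample variances use ddof=1
var_a_n = np.var(a, ddof=1) / len(a)  # s_a^2 / n_a
var_b_n = np.var(b, ddof=1) / len(b)  # s_b^2 / n_b
t_manual = (np.mean(a) - np.mean(b)) / np.sqrt(var_a_n + var_b_n)

# Welch-Satterthwaite approximate degrees of freedom
df = (var_a_n + var_b_n) ** 2 / (var_a_n ** 2 / (len(a) - 1) + var_b_n ** 2 / (len(b) - 1))

# Upper-tail area, matching alternative='greater'
p_manual = t_dist.sf(t_manual, df)

print(f'SciPy:  t = {t_example:.4f}, p = {p_example:.4f}')
print(f'Manual: t = {t_manual:.4f}, p = {p_manual:.4f}')
```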

<font color='red'>**Question 7.1.**</font> Using `ttest_ind()` and **$H_1: \mu_L > \mu_F$**, what are the $t$-score and the $p$-value? Assign your answers to `t_score2` and `p_value4`, respectively. Assume that the samples are independent and that their populations **do not have equal variance**. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign the expression to the variable. (0.5 pts)

In [None]:
# ANSWER CELL

t_score2, p_value4 = ...

print(f't-score: {t_score2:.3f}, p-value: {p_value4:.3f}' if not isinstance(t_score2, type(Ellipsis)) else None)

In [None]:
grader.check("q7.1")

<font color='red'>**Question 7.2.**</font> What is the appropriate conclusion based on the two-sample $t$-test at the 0.05 significance level? Assign your answer to the variable `q7_2` as a string. (0.5 pts)

**A.** The result $\underline{\text{is}}$ statistically significant at the **5%** level and we $\underline{\text{conclude}}$ that ammonia levels increased after using the new fertilizer.\
**B.** The result $\underline{\text{is}}$ statistically significant at the **5%** level and we $\underline{\text{can't conclude}}$ that ammonia levels increased after using the new fertilizer. \
**C.** The result $\underline{\text{is not}}$ statistically significant at the **5%** level and we $\underline{\text{conclude}}$ that ammonia levels increased after using the new fertilizer. \
**D.** The result $\underline{\text{is not}}$ statistically significant at the **5%** level and we $\underline{\text{can't conclude}}$ that ammonia levels increased after using the new fertilizer. 

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

**Note that the test for this question is hidden. This means you will NOT be able to tell whether your answer is correct by running the `grader.check()` cell. You should know how to confidently answer this question by now.**

In [None]:
# ANSWER CELL
q7_2 = ...
q7_2

In [None]:
grader.check("q7.2")

## Extra: Bootstrapping Difference in Means

As previously mentioned, for small samples the $t$-test requires that the underlying populations be normal, which is not always the case. When this assumption is doubtful, we can use bootstrapping to construct a confidence interval and reach a conclusion for the hypothesis test. Our hypotheses are: 

Population means of ammonia levels are the same: **$H_0: \mu_L = \mu_F$**

Population means of ammonia levels increased: **$H_1: \mu_L > \mu_F$**

We can re-write them as follows:

**$H_0: \mu_L - \mu_F =0 $**

**$H_1: \mu_L - \mu_F >0$**

To perform the test without assuming that the populations are normal, we select a bootstrap sample (a random sample of the same size, drawn with replacement) from `ammonia_last` and another from `ammonia_first`. The difference between the means of these two bootstrap samples is one bootstrapped value of the difference in means, $\mu_L - \mu_F$. Repeating this 5000 times gives 5000 bootstrapped differences in means. We then compute the appropriate confidence interval for $\mu_L - \mu_F$ based on $\alpha$ and the alternative hypothesis. Finally, if the null hypothesized value $\mu_L - \mu_F = 0$ is within the confidence interval, we fail to reject $H_0$.
Otherwise, if the null hypothesized value is not within the confidence interval, we reject $H_0$.

The advantage of this approach is that we don't have to assume that the populations are normal or that they have equal variances. We do still assume that the two samples are independent and representative of their populations.

Read then run the code below, which implements the steps above.

In [None]:
# set the random seed equal to 99 (seeds Python's random module, used by random.choices below)
random.seed(99)

# create figure and axes
fig_2, ax_2 = plt.subplots(nrows=1, ncols=1, figsize=(4,2.5))

# specify the total number of bootstrap samples to create
n_samples = 5000

# create an array to store the bootstrapped differences in means
bootstrap_means_diff = np.empty(n_samples)

# loop through a total of n_samples times
for i in range(n_samples):
    
    # select a random sample of the same size as the data and with replacement from ammonia_last
    bootstrapped_sample_last = random.choices(list(ammonia_last), k=len(ammonia_last))
    
    # select a random sample of the same size as the data and with replacement from ammonia_first
    bootstrapped_sample_first = random.choices(list(ammonia_first), k=len(ammonia_first))
    
    # calculate the difference in sample means
    ammonia_sample_mean_diff = np.mean(bootstrapped_sample_last) - np.mean(bootstrapped_sample_first)
    
    # store the difference in sample means
    bootstrap_means_diff[i] = ammonia_sample_mean_diff

# get the 5th percentile of the bootstrapped differences in means
# Because H1 has a ">" sign, the appropriate confidence interval is: (low, infinity)
# So, we only need one value for the confidence interval
# For alpha = 0.05, this is a one-sided 95% lower confidence interval
# Thus, we want the 5th percentile of the bootstrapped values
low = np.percentile(bootstrap_means_diff, 5)

# divide bootstrapped values into two groups: one within the CI and one outside the CI
bootstrap_means_diff_within = bootstrap_means_diff[bootstrap_means_diff >= low] # CI is (low, infinity)
bootstrap_means_diff_outside = bootstrap_means_diff[bootstrap_means_diff < low]

# plot a histogram of the bootstrapped differences in means
plt.hist(bootstrap_means_diff_within, bins=11, color='g', ec='k') # plot those within CI in green
plt.hist(bootstrap_means_diff_outside, bins=5, color='m', ec='k') # plot those outside CI in magenta

# plot a magenta dotted vertical line at the 5th percentile, which is the cutoff for the CI
plt.vlines(low, 0, 1000, colors='m', linestyles=':', lw=2)

# plot the null hypothesis value using a blue dot
plt.scatter(0.0, 0, color='blue', s=40, clip_on=False)

# specify y limits
plt.ylim(0, 1000)

# label axes (raw string so that "\m" is not treated as an escape sequence)
plt.xlabel(r'Difference in Means, $\mu_L-\mu_F$ (ppm)')
plt.ylabel('Frequency')

# display plot
plt.show()

# print the interval
print('Bootstrapped 95% CI: (' + str(np.round(low, 3)) + ', infinity)')

# print decision
if 0 < low:
    print('Null Hypothesis is not within CI: Reject H0')
else:
    print('Null Hypothesis is within CI: Fail to reject H0')

The histogram shows all 5000 bootstrapped values of the difference in means. The dotted vertical magenta line marks the cutoff value for the confidence interval. Because this is a one-sided test whose alternative hypothesis has a $>$ sign, the appropriate confidence interval is also one-sided, of the form $(\text{cutoff}, \infty)$. The bootstrapped values are thus divided into two groups:
1. Those in green are within the confidence interval (the upper 95% of the bootstrapped differences in means)
2. Those in magenta are outside the confidence interval (the lower 5% of the bootstrapped differences in means)

The blue dot marks the null hypothesized value, $\mu_L - \mu_F = 0$. At the 0.05 significance level, the blue dot falls outside the confidence interval, so we reject the null hypothesis and conclude that the mean ammonia level increased after using the new fertilizer: $\mu_L>\mu_F$.
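Optionally, the same bootstrap can be written without a Python loop. The sketch below is an assumption-laden alternative, not the lab's method: it uses **made-up** samples in place of `ammonia_last` and `ammonia_first`, and NumPy's `np.random.default_rng` generator instead of Python's `random` module, so its numbers will not match the cell above even with the same seed.

```python
import numpy as np

# Hypothetical samples (NOT the lab data), standing in for ammonia_last and ammonia_first
sample_last = np.array([0.61, 0.55, 0.72, 0.48, 0.66, 0.59, 0.70, 0.53, 0.64, 0.57])
sample_first = np.array([0.50, 0.46, 0.58, 0.41, 0.55, 0.49, 0.52, 0.44, 0.60, 0.47,
                         0.53, 0.45, 0.51, 0.48, 0.56])

rng = np.random.default_rng(99)  # NumPy random generator with a fixed seed
n_samples = 5000

# Draw all bootstrap resamples at once: each row is one resample drawn with replacement
boot_last = rng.choice(sample_last, size=(n_samples, len(sample_last)), replace=True)
boot_first = rng.choice(sample_first, size=(n_samples, len(sample_first)), replace=True)

# One bootstrapped difference in sample means per row
boot_diff = boot_last.mean(axis=1) - boot_first.mean(axis=1)

# One-sided 95% lower confidence bound, since H1 has a ">" sign
low = np.percentile(boot_diff, 5)

print(f'Bootstrapped 95% CI: ({low:.3f}, infinity)')
if 0 < low:
    print('Null hypothesized value is not within CI: Reject H0')
else:
    print('Null hypothesized value is within CI: Fail to reject H0')
```

Drawing all resamples in a single call avoids growing an array inside a loop, which would copy the array on every iteration.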

You don't have to answer any questions for this last part :)

### You're done with this Lab!

**Important submission information:** After completing the assignment, click on the Save icon from the Tool Bar &nbsp;<i class="fa fa-save" style="font-size:16px;"></i>&nbsp;. After saving your notebook, **run the cell with** `grader.check_all()` and confirm that you pass the same tests as in the notebook. Then, **run the final cell** `grader.export()` and click the link to download the zip file. Finally, go to Gradescope and submit the zip file to the corresponding assignment. 

**Once you have submitted, stay on the Gradescope page to confirm that you pass the same tests as in the notebook.**

In [None]:
%matplotlib inline
img = mpimg.imread('resources/animal.jpg')
imgplot = plt.imshow(img)
imgplot.axes.get_xaxis().set_visible(False)
imgplot.axes.get_yaxis().set_visible(False)
print("Congratulations on finishing this lab!")
plt.show()

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Make sure you submit the .zip file to Gradescope.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)