In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab03.ipynb")

# CE 93: Lab Assignment 03

You must submit the lab to Gradescope by the due date. You will submit the zip file produced by running the final cell of the assignment.

## About this Lab
The objective of this assignment is to apply the frequency notion of probability.

## Instructions 
**Run the first cell, Initialize Otter**, to import the autograder and submission exporter.

Throughout the assignment, replace `...` with your answers. We use `...` as a placeholder; each one should be deleted and replaced with your answer.

Any part listed as a "<font color='red'>**Question**</font>" should be answered to receive credit.

**Please save your work after every question!**

To read the documentation on a Python function, you can type `help()` and add the function name between parentheses.

**Run the cell below**, to import the required modules.

In [None]:
# Please run this cell, and do not modify the contents
import math
import numpy as np
import scipy
import pandas as pd
import statistics as stats
import cmath
import re
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import hashlib
import ipywidgets as widgets
from ipywidgets import FileUpload
from IPython.display import display
from PIL import Image
import os
import resources

def get_hash(num):
    """Helper function for assessing correctness"""
    return hashlib.md5(str(num).encode()).hexdigest()

### Introduction

In this lab, we will be using the frequency notion of probability. According to this notion, the probability of an event, $E$, is estimated as the proportion of times $E$ would occur in the long run, if the experiment were to be repeated over and over again.

More specifically, let $n$ denote the number of observations of the phenomenon of interest and $n_E$ the number of observations during which event $E$ occurred. Then the probability of event $E$ occurring, $P(E)$, is formally defined as: 

$$
P(E) = \lim_{n\to\infty} \frac{n_E}{n} 
$$

In practice, the number of observations is usually finite. In this case, only an approximate estimate of the probability is obtained. Naturally, the accuracy of the probability estimation increases as the sample size $n$ increases.
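
To see this effect, here is a minimal sketch that assumes a simulated fair six-sided die instead of the rainfall data. The event, the seed, the sample sizes, and the helper name `relative_frequency` are illustrative choices and not part of the lab. The estimate $n_E / n$ should drift toward $1/6$ as $n$ grows.

```python
import numpy as np

# Illustrative assumption: E = "a fair six-sided die shows a 6", so P(E) = 1/6.
rng = np.random.default_rng(seed=0)
rolls = rng.integers(low=1, high=7, size=100_000)  # high is exclusive, so values are 1..6

def relative_frequency(outcomes, n):
    """Estimate P(E) as n_E / n using only the first n outcomes."""
    n_E = np.sum(outcomes[:n] == 6)  # number of observations in which E occurred
    return n_E / n

# Compare the estimate for increasing numbers of observations
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: n_E / n = {relative_frequency(rolls, n):.4f}")
```

With small $n$ the estimate can land far from $1/6$; with large $n$ it typically settles close to it.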

It should be noted that other notions of probability exist. We will discuss them in class and in future assignments.

In response to the drought, the San Francisco Public Utilities Commission is considering a program to incentivize rainwater harvesting for new residential and commercial buildings in the city. The program would create a $1000 rebate for high-volume rainwater cisterns. Before making the expensive investment, the Commission wants to better understand the rainfall statistics for the area. They are asking you, a Civil and Environmental Engineering consultant, to analyze historical probability data for rainfall in the city.

<img src="resources/rain.png" width='450'/>

Fig. 1. Rainfall at Fort Mason, San Francisco. Source: http://www.sfgate.com/bayarea/article/First-Bay-Area-rain-this-fall-could-uproot-trees-9966631.php

### Load the data

In this lab we will be working with a rainfall data set for San Francisco covering 1849-50 to 2021-22. The file is named `SFrainfall_2021.csv`.

Source: https://ggweather.com/sf/season.html

Let's load the provided data set `SFrainfall_2021.csv`. These are all the features:

|Feature|Units|Description|
|:-|:-|:-|
|year|yr|The year over which data was recorded|
|days|days|Number of rainy days in the year|
|rain|inch|Cumulative rainfall in the year|

* Load the data using the Pandas `pd.read_csv()` function

Run the cell below, which reads the data and saves it as a variable named `df`.

In [None]:
# read a .csv file as a DataFrame
df = pd.read_csv('resources/SFrainfall_2021.csv')

# returns the first 5 rows of the data set by default
df.head()

### Create Variables from the DataFrame

We want to generate data vectors, one for each column in the dataset (one for years, one for days, and one for rain). 

<font color='red'>**Question 1.0.**</font> Create a separate variable for each column in the DataFrame. You can refer to the previous lab for guidance on how to answer this question. (0.25 pts)
- Create a variable `year`  for the year
- Create a variable `days`  for the number of rainy days in each year
- Create a variable `rain` for the cumulative rainfall (inches) in each year


In [None]:
# ANSWER CELL
# create variables for year, days, and rain
# replace ... with your code

year = ...
days = ...
rain = ...

In [None]:
grader.check("q1.0")

### Graphical Summaries

Next, consider the following graphical summaries of the data sets.

 <img src="resources/Graphical_Summaries.png"/>

<font color='red'>**Question 2.0.**</font> For each of the four plots, match it with the correct corresponding graph type from the options below by assigning your answer to the variables given in the cell below as a string. (1 pt)

**A.** Not a good graphical summary \
**B.** Pie Chart \
**C.** Histogram \
**D.** Line Graph \
**E.** Scatter Plot \
**F.** Box Plot

Answer in the next cell. Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
plot1_type = ...
plot2_type = ...
plot3_type = ...
plot4_type = ...

print(f'Plot 1 type: {plot1_type}')
print(f'Plot 2 type: {plot2_type}')
print(f'Plot 3 type: {plot3_type}')
print(f'Plot 4 type: {plot4_type}')

In [None]:
grader.check("q2.0")

Next, consider the following objectives of different graphs:

Objective 1: Visualize the relationship between variables \
Objective 2: Visualize how a variable changes with time \
Objective 3: Visualize the distribution of a numerical variable

<font color='red'>**Question 3.0.**</font>  Match each of the above objectives with the corresponding graph type from the options below by assigning your answer to the variables given in the cell below as a string. (0.75 pts)

**A.** Line Graph \
**B.** Scatter Plot \
**C.** Bar Chart \
**D.** Histogram 

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
objective_1 = ...
objective_2 = ...
objective_3 = ...

print(f'Objective 1: {objective_1}')
print(f'Objective 2: {objective_2}')
print(f'Objective 3: {objective_3}')

In [None]:
grader.check("q3.0")

### Symbolic Expressions of Events

Next, let's define the following events: 	

- $E_1$ = the number of rainy days in SF in a given year is > 80 days
- $E_2$ = the amount of cumulative annual rainfall in SF in a given year is > 30 inches

The plots below show cumulative rainfall versus number of rainy days for every year in our data set. So, each dot represents data for a single year. The blue dots represent all of the outcomes/years. Let the orange dots represent different events that satisfy certain conditions.

<img src="resources/Symbolic_Expressions.png"/>

<font color='red'>**Question 4.0.**</font> For each of the four plots, match it with the correct corresponding symbolic expression that the orange dots represent based on the definition of $E_1$ and $E_2$ above. Assign your answer to the variables given in the cell below as a string. (1 pt)

**A.** $E_2$ \
**B.** $E_1 \cup E_2$ \
**C.** $E_1$ \
**D.** $E_1 \cap E_2$ \
**E.** $\overline{E_1}$

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
plot_1 = ...
plot_2 = ...
plot_3 = ...
plot_4 = ...

print(f'Plot 1 answer: {plot_1}')
print(f'Plot 2 answer: {plot_2}')
print(f'Plot 3 answer: {plot_3}')
print(f'Plot 4 answer: {plot_4}')

In [None]:
grader.check("q4.0")

### Plot and Interpret CDF 

In previous labs, we saw how to plot histograms using `plt.hist()`. We also saw in the lecture how to plot a cumulative diagram from a histogram plot. So, let's generate cumulative distribution function (CDF) plots for our data.

We can do this using `plt.hist()` by specifying the following parameters:
* `cumulative`
* `histtype`

By default, `cumulative=False` and `histtype='bar'`.

To show a cumulative diagram, pass these arguments inside the parentheses: \
`plt.hist(..., cumulative=True, histtype='step')`.

To show a cumulative diagram based on **proportions** rather than frequencies, also pass `density=True`: \
`plt.hist(..., cumulative=True, histtype='step', density=True)`.
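
As a quick illustration of these parameters, here is a sketch on synthetic data. The normally distributed sample, the seed, and the variable names are assumptions made for this example only and are unrelated to the rainfall data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample for illustration only (not the lab data)
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots(figsize=(5, 3))

# cumulative=True accumulates the bins, density=True rescales counts to proportions,
# and histtype='step' draws an outline instead of filled bars
heights, bin_edges, _ = ax.hist(sample, bins=20, cumulative=True,
                                histtype='step', density=True)
ax.set(title='CDF for a synthetic sample',
       xlabel='value',
       ylabel='cumulative proportion')
ax.grid()

# The last bin accumulates every observation, so its height is the full proportion
print(f'final cumulative proportion: {heights[-1]:.3f}')
```

`hist` returns the bin heights and bin edges, so the plotted curve can also be inspected numerically. In a notebook, the figure appears when the cell finishes or when `plt.show()` is called.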

<font color='red'>**Question 5.0.**</font> Modify the code below to make two cumulative proportion diagrams with `bins=20` such that (0.25 pts):
1. The first is for `days` (already provided to you)
2. The second is for `rain` (simply need to copy and edit)

In [None]:
# ANSWER CELL

# Do not modify this line for grading purposes
import matplotlib.pyplot as plt

# Edit the code below to make two cumulative proportion diagrams in 1 figure (only edit where you have ...)

# create figure and axes
fig_1, ax_1 = plt.subplots(nrows=1, ncols=2, figsize=(10,3))

# specify number of bins
N = 20

# plot first axes (plot)
# Save the plot as variable q5_0_0
q5_0_0 = ax_1[0].hist(days, bins=N, cumulative=True, histtype='step', density=True)  # using ax_1[0] because the first subplot has index 0
# set title, xlabel, and ylabel
ax_1[0].set(title='CDF for days', 
            xlabel='number of rainy days', 
            ylabel='cumulative proportion')
# add grid lines
ax_1[0].grid()

# first, create the plot. The second plot will have index [1].
# Save the plot as variable q5_0_1
q5_0_1 = ...
# second, set title, xlabel, and ylabel
...
# third, add grid lines
...

# display all figures
plt.tight_layout()
plt.show()

In [None]:
grader.check("q5.0")

Recall that we defined:
- $E_1$ = the number of rainy days in SF in a given year is > 80 days
- $E_2$ = the amount of cumulative annual rainfall in SF in a given year is > 30 inches

<font color='red'>**Question 5.1.**</font> Based on your cumulative proportion plots for `days` and `rain`, which of the following statement(s) is(are) True? Assign ALL that apply to the variable `q5_1`. (0.75 pts)

**A.** We cannot estimate $P(E_1)$ from the CDF\
**B.** $P(E_1)$ is approximately 0.15\
**C.** $P(E_1)$ is approximately 0.85\
**D.** $P(E_2)$ is approximately 0.15\
**E.** $P(E_2)$ is approximately 0.85\
**F.** $P(\overline{E_2})$ is approximately 0.15\
**G.** $P(\overline{E_2})$ is approximately 0.85

Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL
q5_1 = ...
q5_1

In [None]:
grader.check("q5.1")

### Estimating Probabilities Using Frequencies

Let's try to calculate the probabilities of events $E_1$ and $E_2$ based on the data we have and using logical expressions (`<`, `>`, `<=`, `>=`, `==`).

Let's say $E_3$ is the event that the amount of cumulative rainfall in a given year is less than 10 inches. Then, the probability of the event can be calculated as:

$$ P(E_3) = \frac{n_{E_{3}}}{n} $$

where:

1. $n_{E_{3}}$ is the number of times (i.e., frequency) that this event has occurred (i.e., the number of years with cumulative rain less than 10 inches)

2. $n$ is the total number of observations (i.e., the total number of years in the data set)

We can calculate $P(E_3)$ in Python using logical expressions as follows: 
    
1. Calculate $n_{E_{3}}$: The expression `rain < 10` returns an array of Boolean values (True or False), one per element, indicating whether the condition is satisfied for each year in our data set. To obtain the frequency, or total number of occurrences, we can use the `sum()` function, which counts True as 1 and False as 0. Therefore, `sum(rain < 10)` yields the number of years where the condition rain < 10 is satisfied. The Python code to do this is:

    `n_E3 = sum(rain < 10) # frequency that rain is less than 10 inches`
       
2. Calculate $n$ using the `len()` function (length of an object). The Python code to do this is:
  
   `n = len(rain) # total number of observations`

3. Finally, we calculate the probability by dividing the numbers above. The Python code to do this is:

    `P_E3 = n_E3 / n # probability of event E3`

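We can combine multiple events using `&` (element-wise logical AND) and `|` (element-wise logical OR). Each comparison must be wrapped in its own parentheses because `&` and `|` take precedence over `<` and `>`. For example, if we want the number of years when rain was greater than 10 **and** less than 20 inches, we can use: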

`n_E4 = sum((rain > 10) & (rain < 20)) # frequency that rain is greater than 10 AND less than 20 inches`

We can then divide by `n` to get the probability that cumulative rain is greater than 10 and less than 20 inches.
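
The steps above can be sketched on a small made-up array. The values and the variable names (`x`, `mask`, `n_event`, and so on) are hypothetical and chosen only so the counts are easy to verify by hand; they do not come from the rainfall data.

```python
import numpy as np

# Made-up values for illustration only (not from the rainfall data set)
x = np.array([8.2, 15.1, 22.4, 9.7, 31.0, 18.3, 12.9, 25.6])

mask = x < 10          # Boolean array, one True/False per element
n_event = sum(mask)    # True counts as 1 and False as 0
n_total = len(x)       # total number of observations
p_event = n_event / n_total

# Each comparison sits in its own parentheses before being combined with &
n_between = sum((x > 10) & (x < 20))
p_between = n_between / n_total

print(mask)
print(f'P(x < 10)      = {p_event:.3f}')
print(f'P(10 < x < 20) = {p_between:.3f}')
```

Two of the eight values are below 10 and three fall between 10 and 20, so the estimates are 0.250 and 0.375.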


<font color='red'>**Question 6.0.**</font> Compute the following probabilities using the frequency notion and based on the instructions above. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign it to the corresponding variable.
1. What is $P(E_1)$? Assign your answer to `P_E1`. (0.25 pts)
2. What is $P(E_2)$? Assign your answer to `P_E2`. (0.25 pts)
3. What is $P(E_1\cap E_2)$? Assign your answer to `P_E1_and_E2`. (0.25 pts)
4. What is $P(E_1\cup E_2)$? Assign your answer to `P_E1_or_E2`. (0.25 pts)

Enter your code in the cell below to compute these probabilities.

Recall that we defined:
- $E_1$ = the number of rainy days in SF in a given year is > 80 days
- $E_2$ = the amount of cumulative annual rainfall in SF in a given year is > 30 inches

In [None]:
# ANSWER CELL

P_E1 = ...
P_E2 = ...
P_E1_and_E2 = ...
P_E1_or_E2 = ...

print(f'P(E1) =\t\t  {P_E1:.3f}') if not isinstance(P_E1, type(Ellipsis)) else None
print(f'P(E2) =\t\t  {P_E2:.3f}') if not isinstance(P_E2, type(Ellipsis)) else None
print(f'P(E1 ∩ E2) =\t  {P_E1_and_E2:.3f}') if not isinstance(P_E1_and_E2, type(Ellipsis)) else None
print(f'P(E1 U E2) =\t  {P_E1_or_E2:.3f}') if not isinstance(P_E1_or_E2, type(Ellipsis)) else None

In [None]:
grader.check("q6.0")

<font color='red'>**Question 6.1.**</font> Based on the above probabilities, what can you say about $E_1$ and $E_2$? Assign ALL that apply to the variable `q6_1`. (0.75 pts)

**A.** None of the options \
**B.** $E_1$ and $E_2$ are mutually exclusive \
**C.** $E_1$ and $E_2$ are exhaustive \
**D.** $E_1$ and $E_2$ are independent

Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL
q6_1 = ...
q6_1

In [None]:
grader.check("q6.1")

### What If We Had Less Data??

This data set has observations from 173 years! It is not always possible to have this many observations. The fewer the observations, the less accurate our probability estimates will be. Remember, the definition of probability is based on the limit as $n\to\infty$. 

So, let's examine if the number of observations influences the estimated probabilities.

First, let's review the basics of indexing for a 1D array or pandas Series.
* `rain[0]` will return the rain value for the **first** year in the data set
* `rain[:10]` will return the rain values for the **first 10** years in the data set
* `rain[-10:]` will return the rain values for the **last 10** years in the data set
* `rain[10:20]` will return the rain values for the **11th through 20th** years in the data set
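
These slicing rules can be checked on a small stand-in array. The array `values` below is hypothetical and simply holds the integers 100 through 129 so that each slice is easy to read.

```python
import numpy as np

# Hypothetical stand-in for a data column: 30 values, 100 through 129
values = np.arange(100, 130)

first = values[0]        # the first element
first_10 = values[:10]   # positions 0 through 9
last_10 = values[-10:]   # the final ten elements
middle = values[10:20]   # positions 10 through 19, i.e. the 11th to 20th elements

print(first, first_10, last_10, middle, sep='\n')
```

The stop index of a slice is excluded, which is why `values[10:20]` ends at the 20th element and not the 21st.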


**Next, suppose the data was available only for the last 25 years.**

Let's recalculate the following probabilities:
- $ P(E_1) $ 
- $ P(E_2) $
- $P(E_1\cap E_2)$
- $P(E_1\cup E_2)$

I am providing you with the code below. Read it and understand it. You will have to do something similar using the first 25 years of data.

In [None]:
# recalculate the probabilities using the last 25 years of data

# get number of rainy days and amount of rainfall for last 25 years
days_l25 = days[-25:]
rain_l25 = rain[-25:]

# Get the new total number of measurements for this reduced dataset (it should be 25 since 25 years)
n_l25 = len(days_l25)

# Get the number of times event E1 has occurred in the last 25 years
nE1_l25 = sum(days_l25 > 80)

# Get Probability of E1 based on the last 25 years
P_E1_l25 = nE1_l25 / n_l25

####################################################

# Get the number of times event E2 has occurred in the last 25 years
nE2_l25 = sum(rain_l25 > 30)

# Get Probability of E2 based on the last 25 years
P_E2_l25 = nE2_l25 / n_l25

####################################################

# Get the number of times E1 AND E2 occurred based on the last 25 years
nE1_and_E2_l25 = sum((days_l25>80) & (rain_l25>30))

# Get the Probability of E1 AND E2 based on the last 25 years
P_E1_and_E2_l25 = nE1_and_E2_l25 / n_l25

####################################################

# Get the number of times E1 OR E2 occurred based on the last 25 years
nE1_or_E2_l25 = sum((days_l25>80) | (rain_l25>30))

# Get the Probability of E1 OR E2 based on the last 25 years
P_E1_or_E2_l25 = nE1_or_E2_l25 / n_l25

# print results
print(f'P(E1) =\t\t  {P_E1_l25:.3f}')
print(f'P(E2) =\t\t  {P_E2_l25:.3f}')
print(f'P(E1 ∩ E2) =\t  {P_E1_and_E2_l25:.3f}')
print(f'P(E1 U E2) =\t  {P_E1_or_E2_l25:.3f}')

**Next, suppose the data was available only for the first 25 years.**

<font color='red'>**Question 7.0.**</font> Adapt the Python routine from above to recalculate the following probabilities using the first 25 years of data. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign it to the corresponding variable.
1. What is $P(E_1)$? Assign your answer to `P_E1_f25`. (0.25 pts)
2. What is $P(E_2)$? Assign your answer to `P_E2_f25`. (0.25 pts)
3. What is $P(E_1\cap E_2)$? Assign your answer to `P_E1_and_E2_f25`. (0.25 pts)
4. What is $P(E_1\cup E_2)$? Assign your answer to `P_E1_or_E2_f25`. (0.25 pts)

Enter your code in the cell below to compute these probabilities.

Recall that we defined:
- $E_1$ = the number of rainy days in SF in a given year is > 80 days
- $E_2$ = the amount of cumulative annual rainfall in SF in a given year is > 30 inches

*Hint: First, define a new variable, say `days_f25`, and set it equal to the first 25 values in `days`. Second, define a new variable, say `rain_f25`, and set it equal to the first 25 values in `rain`. Then, apply the same code from above to calculate probabilities but now using `days_f25` and `rain_f25`. Recall that `rain[:10]` will return the rain values for the **first 10** years in the data set*.

In [None]:
# ANSWER CELL
# recalculate the probabilities using the first 25 years of data

# get number of rainy days and amount of rainfall for first 25 years
days_f25 = ...
rain_f25 = ...

# Get the new total number of measurements for this reduced dataset (it should be 25 since 25 years)
n_f25 = ...

# Get the number of times event E1 has occurred in the first 25 years
nE1_f25 = ...

# Get Probability of E1 based on the first 25 years
P_E1_f25 = ...

####################################################

# Get the number of times event E2 has occurred in the first 25 years
nE2_f25 = ...

# Get Probability of E2 based on the first 25 years
P_E2_f25 = ...

####################################################

# Get the number of times E1 AND E2 occurred based on the first 25 years
nE1_and_E2_f25 = ...

# Get the Probability of E1 AND E2 based on the first 25 years
P_E1_and_E2_f25 = ...

####################################################

# Get the number of times E1 OR E2 occurred based on the first 25 years
nE1_or_E2_f25 = ...

# Get the Probability of E1 OR E2 based on the first 25 years
P_E1_or_E2_f25 = ...

# print results
print(f'P(E1) =\t\t  {P_E1_f25:.3f}') if not isinstance(P_E1_f25, type(Ellipsis)) else None
print(f'P(E2) =\t\t  {P_E2_f25:.3f}') if not isinstance(P_E2_f25, type(Ellipsis)) else None
print(f'P(E1 ∩ E2) =\t  {P_E1_and_E2_f25:.3f}') if not isinstance(P_E1_and_E2_f25, type(Ellipsis)) else None
print(f'P(E1 U E2) =\t  {P_E1_or_E2_f25:.3f}') if not isinstance(P_E1_or_E2_f25, type(Ellipsis)) else None

In [None]:
grader.check("q7.0")

<font color='red'>**Question 7.1.**</font> Compare the probabilities of the different events when using the full dataset, last 25 years, and first 25 years. What do you observe? Assign your answer to the variable `q7_1` as a string. (0.5 pts)

**A.** The probabilities of the events change based on the years used in the analysis. \
**B.** The probabilities of the events remain exactly the same regardless of the years used in the analysis. 

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
q7_1 = ...
q7_1

In [None]:
grader.check("q7.1")

<font color='red'>**Question 7.2.**</font> What can you tell based on these results? Assign ALL that apply to the variable `q7_2`. (0.5 pts)

**A.** Probability is based on our state of knowledge and it is influenced by the extent of our knowledge at a given point in time \
**B.** For any event, its probability is always constant \
**C.** The probabilities based on the first 25 years are our best estimate because they are more historic \
**D.** The probabilities based on the complete data set are our best estimate because they have more observations

Answer in the next cell. Add each selected choice as a string and separate the choices with commas. For example, if you want to select `"A"` and `"B"`, your answer should be `"A", "B"`.\
Assign your answer to the given variable.
Remember to put quotes around each answer choice.

In [None]:
# ANSWER CELL
q7_2 = ...
q7_2

In [None]:
grader.check("q7.2")

### Conditional Probability

The Climate Prediction Center (CPC) issues maps showing the probabilities of temperature and precipitation deviation from normal. The precipitation outlook for the next three months is shown below. (https://www.cpc.ncep.noaa.gov/products/predictions/long_range/seasonal.php?lead=1)

<img src="https://www.cpc.ncep.noaa.gov/products/predictions/long_range/lead01/off01_prcp.gif" />
  
<br>

Based on this outlook, San Francisco has equal chances of getting precipitation above and below normal.

Let's define *equal chances of getting precipitation above and below normal* as getting cumulative precipitation within the interquartile range (IQR). We can calculate percentiles and quartiles directly in Python as follows: 

* For example, [`np.percentile(data, q)`](https://numpy.org/doc/stable/reference/generated/numpy.percentile.html) returns the qth percentile for the specified data
* `np.percentile(rain, q=[20, 40])` returns the 20th and 40th percentile (in the order you specify the `q` values)
* If you use `per20, per40 = np.percentile(rain, q=[20, 40])`, variable `per20` will be the 20th percentile and variable `per40` will be the 40th percentile, and you can perform operations like `per40 - per20`

Remember that the interquartile range is the third quartile (75th percentile) minus the first quartile (25th percentile).
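
Here is a short sketch of the unpacking pattern on synthetic data. The array `data`, the choice of the 10th and 90th percentiles, and the variable names are assumptions for illustration only.

```python
import numpy as np

# Synthetic data for illustration only: the integers 1 through 101
data = np.arange(1, 102)

# Request two percentiles at once; they are returned in the order given in q
p10, p90 = np.percentile(data, q=[10, 90])

# The spread between two percentiles is a simple subtraction
spread = p90 - p10

print(f'10th percentile = {p10:.1f}')
print(f'90th percentile = {p90:.1f}')
print(f'spread          = {spread:.1f}')
```

By default, `np.percentile` interpolates linearly between neighboring data points when the requested percentile falls between them.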

<font color='red'>**Question 8.0.**</font> Compute the following summary statistics of the `rain` data (using the full data set). Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign it to the corresponding variable.
1. What is the first quartile of the rain data set? Assign your answer to `Q1`. (0.25 pts)
2. What is the third quartile of the rain data set? Assign your answer to `Q3`. (0.25 pts)
3. What is the interquartile range of the rain data set? Assign your answer to `IQR`. (0.25 pts)

Enter your code in the cell below to compute these summary statistics.

In [None]:
# ANSWER CELL
# add your code below to calculate the first and third quartiles and then the IQR

Q1, Q3 = ...
IQR = ...

print(f'Q1 =  {Q1:.3f} inches') if not isinstance(Q1, type(Ellipsis)) else None
print(f'Q3 =  {Q3:.3f} inches') if not isinstance(Q3, type(Ellipsis)) else None
print(f'IQR =  {IQR:.3f} inches') if not isinstance(IQR, type(Ellipsis)) else None

In [None]:
grader.check("q8.0")

Assume that so far this year, the cumulative rainfall in SF has been 30 inches. Thus, we know that by the end of the year, rain > 30 inches (we can't get negative rain!). Thus, we know that event $E_2$ has occurred. Remember, we previously defined: 	

- $E_1$ = the number of rainy days in SF in a given year is > 80 days
- $E_2$ = amount of cumulative annual rainfall in SF in a given year is > 30 inches


<font color='red'>**Question 8.1.**</font> Knowing that the cumulative precipitation will be > 30 inches, and based on the historical data that we have from 1849-50 to 2021-22 (complete data set), what is the probability that there will be more than 80 days of rain this year? Assign your answer to `q8_1`. Do not just manually type the numeric answer. Use Python expressions that return the desired answer and assign it to the corresponding variable. (0.5 pts)

*Hint: This is a conditional probability. There are different ways to calculate this probability. Mathematically, conditional probability is:*

$$P(A|B)=\dfrac{P(A \cap B)}{P(B)}$$
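
To illustrate the formula on something unrelated to rainfall, here is a sketch that assumes the six faces of a fair die, each observed once. The events $A$ and $B$ and all variable names are hypothetical. It shows two equivalent routes to the same conditional probability.

```python
import numpy as np

# Illustrative assumption: the six faces of a fair die, each observed once
faces = np.array([1, 2, 3, 4, 5, 6])

A = faces % 2 == 0   # event A: the face is even
B = faces > 3        # event B: the face is greater than 3
n = len(faces)

# Route 1: apply the definition P(A|B) = P(A ∩ B) / P(B)
P_A_and_B = sum(A & B) / n
P_B = sum(B) / n
P_A_given_B = P_A_and_B / P_B

# Route 2: keep only the observations where B occurred, then count A among them
P_A_given_B_alt = sum(A[B]) / sum(B)

print(f'P(A|B) from the definition  = {P_A_given_B:.3f}')
print(f'P(A|B) from the subset of B = {P_A_given_B_alt:.3f}')
```

Both routes agree because the $n$ in the numerator and denominator of the definition cancels, leaving the count of $A \cap B$ divided by the count of $B$.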

In [None]:
# ANSWER CELL
# feel free to add any additional code
# Make sure the final answer is assigned to variable q8_1

q8_1 = ...

print(f'P(days > 80 | rain > 30) =  {q8_1:.3f}') if not isinstance(q8_1, type(Ellipsis)) else None

In [None]:
grader.check("q8.1")

<font color='red'>**Question 8.2.**</font> What can you tell about the events $E_1$ and $E_2$? Assign your answer to the variable `q8_2` as a string. (0.5 pts)

**A.** Dependent \
**B.** Independent \
**C.** Exclusive \
**D.** Exhaustive

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
q8_2 = ...
q8_2

In [None]:
grader.check("q8.2")

### Decision Making

Recent research shows that the drought and a changing climate have altered historical precipitation trends in recent years. In addition, UCB researchers have shown that the prolonged droughts affecting California will decrease the total number of rainy days in the future.

<font color='red'>**Question 9.0.**</font> Based on this and your statistical analysis, what would be your recommendation to the San Francisco Public Utilities Commission? Assign your answer to the variable `q9` as a string. (0.5 pts)

**A.** Rainfall harvesting is recommended because rain and number of rainy days are independent \
**B.** Rainfall harvesting is recommended because we are expecting more rainfall in San Francisco in the future \
**C.** Rainfall harvesting might not be ideal because the decrease in the number of total rainy days will decrease the precipitation in San Francisco

Your answer should be a string, e.g., `"A"`, `"B"`, etc.\
Remember to put quotes around your answer choice.

In [None]:
# ANSWER CELL
q9 = ...
q9

In [None]:
grader.check("q9.0")

### You're done with this Lab!

**Important submission information:** After completing the assignment, click on the Save icon from the Tool Bar &nbsp;<i class="fa fa-save" style="font-size:16px;"></i>&nbsp;. After saving your notebook, **run the cell with** `grader.check_all()` and confirm that you pass the same tests as in the notebook. Then, **run the final cell** `grader.export()` and click the link to download the zip file. Finally, go to Gradescope and submit the zip file to the corresponding assignment. 

**Once you have submitted, stay on the Gradescope page to confirm that you pass the same tests as in the notebook.**

In [None]:
%matplotlib inline
img = mpimg.imread('resources/animal.jpg')
imgplot = plt.imshow(img)
imgplot.axes.get_xaxis().set_visible(False)
imgplot.axes.get_yaxis().set_visible(False)
print("Congratulations on finishing this lab!")
plt.show()

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Make sure you submit the .zip file to Gradescope.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)