## The Climate near Berkeley

In [1]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the NumPy and datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

# These lines load the tests.
from client.api.assignment import load_assignment 
tests = load_assignment('berkeley_climate.ok')

The US National Oceanic and Atmospheric Administration (NOAA) operates thousands of climate observation stations (mostly in the US) that collect information about local climate.  Among other things, each station records the highest and lowest observed temperature each day.  These data, called "Quality Controlled Local Climatological Data," are publicly available [here](http://www.ncdc.noaa.gov/orders/qclcd/) and described [here](https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd).

We've provided you with an excerpt of that dataset.  All the readings are from 2015 and from California stations.

**Question 1.** Load the data from `temperatures.csv` into a table called `temperatures`.  Check out the columns in the table.  Each row represents the data from one station on one day.  The column "Date" is in MMDD format, meaning that the last two digits denote the day of the month, and the first 1 or 2 digits denote the month.
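
The MMDD encoding can be unpacked with integer arithmetic.  The sketch below assumes the dates are stored as integers (for example, 105 for January 5); the array `example_dates` is made up for illustration and is not read from `temperatures.csv`.

```python
import numpy as np

# Hypothetical MMDD values, for illustration only.
example_dates = np.array([101, 105, 1231])

# Integer division by 100 drops the last two digits, leaving the month.
months = example_dates // 100
# The remainder after dividing by 100 is the last two digits: the day.
days = example_dates % 100

print(months.tolist())  # [1, 1, 12]
print(days.tolist())    # [1, 5, 31]
```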

In [2]:
temperatures = ...
temperatures

In [3]:
_ = tests.grade('q1')

**Question 2.** Each station is named for the city in which it resides.  Is there a station in Berkeley?  Write code to help you answer the question in the next cell, and then write your answer in the cell after that, along with **an English explanation** of what your code does.

*Hint:* Use the Table method `.where`.

In [5]:
# Use this cell to work on this problem.

*Write your answer here, replacing this text.*

Let's find the station closest to the UC Berkeley campus.  The campus is located roughly at latitude 37.871746 and longitude -122.259030.  We'll break this down into a few steps.

**Question 3.** Create a table called `with_degree_differences` that's a copy of `temperatures`, but with 2 extra columns:

1. "Latitude difference": The difference between the latitude of the row's station and the latitude of UC Berkeley.
2. "Longitude difference": The difference between the longitude of the row's station and the longitude of UC Berkeley.
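
Subtracting a single number from an array subtracts it from every element, which is one way to compute such differences.  The following is a minimal sketch on made-up coordinates; the arrays `station_latitudes` and `station_longitudes` are hypothetical stand-ins for columns of the table, and the sign convention (station minus Berkeley) is an assumption.

```python
import numpy as np

BERKELEY_LATITUDE = 37.871746
BERKELEY_LONGITUDE = -122.259030

# Hypothetical station coordinates, for illustration only.
station_latitudes = np.array([38.871746, 37.871746])
station_longitudes = np.array([-122.259030, -120.259030])

# Each subtraction is applied to every element of the array.
latitude_differences = station_latitudes - BERKELEY_LATITUDE
longitude_differences = station_longitudes - BERKELEY_LONGITUDE

print(np.round(latitude_differences, 6).tolist())
print(np.round(longitude_differences, 6).tolist())
```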

In [6]:
# We've provided the lat/long of UC Berkeley so you don't have to retype them:
BERKELEY_LATITUDE = 37.871746
BERKELEY_LONGITUDE = -122.259030

with_degree_differences = ...
    ...
    ...
with_degree_differences

In [7]:
_ = tests.grade('q3')

**Question 4.**  Degrees of latitude and longitude don't correspond directly to distances, because the Earth is roughly a sphere: a degree of longitude spans fewer miles the farther you are from the equator.  Near Berkeley, one degree of latitude is [around 69 miles](https://www2.usgs.gov/faq/categories/9794/3022), and one degree of longitude is around 54.6 miles.  Compute a table called `with_mile_differences` that's a copy of `with_degree_differences` with 2 extra columns:

1. "North-South difference (miles)": The difference between UC Berkeley and the row's station along the North-South axis.  This is the difference in latitude times 69.
2. "East-West difference (miles)": The difference between UC Berkeley and the row's station along the East-West axis.  This is the difference in longitude times 54.6.
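
If you're curious where a figure like 54.6 could come from, here is a rough sketch.  It assumes a perfectly spherical Earth with an equatorial circumference of about 24,901 miles (a value not given in this lab), and scales the length of one degree at the equator by the cosine of Berkeley's latitude.  This is one plausible derivation, not necessarily how the number above was obtained.

```python
import math

BERKELEY_LATITUDE = 37.871746
# Assumed value: Earth's equatorial circumference, in miles.
EQUATORIAL_CIRCUMFERENCE_MILES = 24901

# One degree of longitude at the equator is 1/360 of the circumference.
miles_per_degree_at_equator = EQUATORIAL_CIRCUMFERENCE_MILES / 360

# Circles of latitude shrink by a factor of cos(latitude) away from the equator.
miles_per_degree_longitude = miles_per_degree_at_equator * math.cos(
    math.radians(BERKELEY_LATITUDE))

print(round(miles_per_degree_longitude, 1))  # 54.6
```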

In [8]:
MILES_PER_DEGREE_LATITUDE = 69
MILES_PER_DEGREE_LONGITUDE = 54.6
with_mile_differences = ...
    "North-South difference (miles)", with_degree_differences.column("Latitude difference")*MILES_PER_DEGREE_LATITUDE,
    "East-West difference (miles)", with_degree_differences.column("Longitude difference")*MILES_PER_DEGREE_LONGITUDE)
with_mile_differences

In [9]:
_ = tests.grade('q4')

**Question 5.** Compute the distance from UC Berkeley to each row's station.  By the Pythagorean theorem, the distance is:
$$\sqrt{(\text{North-South difference (miles)})^2 + (\text{East-West difference (miles)})^2}$$

Create a table called `with_distances` that's a copy of `with_mile_differences`, but with an extra column called "Distance to UC Berkeley" containing these distances.

*Hint:* Use elementwise arithmetic operations to square each difference, add them, and square-root them.
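
As a small illustration of elementwise arithmetic, the sketch below applies the formula to two made-up arrays of differences.  The array names and values are hypothetical; `np.sqrt` is used here, and raising to the power `0.5` is an equivalent way to take the square root.

```python
import numpy as np

# Hypothetical differences in miles, for illustration only.
north_south = np.array([3.0, -5.0])
east_west = np.array([4.0, 12.0])

# Squaring, adding, and taking the square root all act on each element,
# so the result has one distance per pair of differences.
distances = np.sqrt(north_south**2 + east_west**2)

print(distances.tolist())  # [5.0, 13.0]
```

Note that squaring removes the sign, so it doesn't matter which way each difference was subtracted.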

In [10]:
north_south = with_mile_differences.column("North-South difference (miles)")
east_west = with_mile_differences.column("East-West difference (miles)")
with_distances = with_mile_differences.with_column(
    "Distance to UC Berkeley", (north_south**2 + east_west**2)**0.5)
with_distances

In [11]:
_ = tests.grade('q5')

**Question 6.** Sort the table by distance to find the station that's closest to Berkeley.  Find its name and assign it to `closest_station_name`.
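
The idea of "sort, then take the first entry" can be sketched with plain NumPy arrays.  The station names and distances below are invented for illustration and are not from the dataset; with a table, you would sort the rows and read the name from the first row instead.

```python
import numpy as np

# Hypothetical station names and distances in miles, for illustration only.
names = np.array(["Station A", "Station B", "Station C"])
distances = np.array([42.0, 7.5, 19.0])

# argsort returns the indices that put the distances in increasing order.
order = np.argsort(distances)

# Reordering the names the same way puts the closest station first.
closest_name = names[order][0]

print(closest_name)  # Station B
```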

In [12]:
closest_station_name = ...
closest_station_name

In [13]:
_ = tests.grade('q6')

**Question 7.** Make a table called `closest_station_readings`.  It should be a table like the original `temperatures` table, except it should contain only the rows from the station you found in the previous question.  Sort it in increasing order by date.

In [14]:
closest_station_readings = temperatures.where("Station name", are.equal_to(closest_station_name)).sort("Date")

# This prints out your whole table (with unnecessary columns removed).
closest_station_readings.select(2, 1, 0).show()
# This code makes a plot of the highs and lows over time in your table,
# which is easier to read than the raw numbers.  You don't need to modify
# this.
closest_station_readings.scatter(2, make_array(0, 1))

In [15]:
_ = tests.grade('q7')

**Question 8.** From the graph, can you figure out the hottest and coldest months in 2015, in terms of average minimum temperature?  (If it looks like there's a tie, name all the months that might qualify.  If you can't answer the question from these data, explain why.)

*Write your answer here, replacing this text.*

In [16]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [tests.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

In [17]:
# Run this cell to submit your work *after* you have passed all of the test cells.
# It's ok to run this cell multiple times. Only your final submission will be scored.

!TZ=America/Los_Angeles ipython nbconvert --output=".berkeley_climate_$(date +%m%d_%H%M)_submission.html" berkeley_climate.ipynb