# Assignment 3: A year of weather in Boulder

(YOUR NAME HERE)

(TODAY'S DATE HERE)

*In solving each of the problems below, please include text comments and description for your future self, so that when you look back you'll have notes on how you solved these problems!*

## Part 1: Reading in data and adding columns

*The data file `boulder_weather2022.csv` contains daily weather data for each day of the year 2022. The data include:*

- Year (2022)
- Month (1-12)
- Day (1-31)
- minimum temperature on that day (Tmin, in degrees Fahrenheit)
- maximum temperature on that day (Tmax, in degrees Fahrenheit)
- precipitation depth (Precip, in inches)
- depth of snow that fell that day (Snow, in inches)
- current depth of snow on the ground (Snowcover, in inches)

*The "no data" code is -999.*

*Write a function that reads the contents of this data file into a pandas DataFrame and **adds two columns**: one for the "Day of Year", which starts at January 1st = 1 and ends with December 31st = 365, and one that has the daily precipitation converted from inches to mm (mm = inches x 25.4). Your function should take a filename as an input parameter, and return the DataFrame. How you read the data file is up to you, but we highly recommend the pandas `read_csv()` function!*

*Test your function by displaying the DataFrame to make sure it looks correct.* 
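
*As a rough illustration of the two column operations (not a finished solution), the sketch below reads a tiny made-up CSV from a string instead of the real file. The header names `Year`, `Month`, `Day`, and `Precip`, and the new column names `DayOfYear` and `Precip_mm`, are assumptions, so check them against the actual file. It builds dates with `pd.to_datetime()` and then uses the `.dt.dayofyear` attribute, which is only one of several ways to get the day of year.*

```python
import io
import pandas as pd

# Made-up stand-in for the real file; the header names here are assumptions.
csv_text = """Year,Month,Day,Precip
2022,1,1,0.5
2022,3,1,0.0
2022,12,31,-999
"""

demo = pd.read_csv(io.StringIO(csv_text))  # read_csv() also accepts a filename

# Build real dates from the three integer columns, then ask for the day of year.
dates = pd.to_datetime(
    {"year": demo["Year"], "month": demo["Month"], "day": demo["Day"]}
)
demo["DayOfYear"] = dates.dt.dayofyear

# Column arithmetic applies to every row at once (inches -> mm).
demo["Precip_mm"] = demo["Precip"] * 25.4

print(demo)
```

*Notice that the unit conversion also multiplies the no-data code, so a -999 in the inches column is no longer -999 in the mm column. Keep that in mind when you test for missing values later.*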

## Part 2: Plotting temperature

*Write and run a function that plots an array of daily low and high temperature values against day-of-the-year. Your function should:*

- Include a brief docstring
- Have 3 required parameters: an array of day numbers, an array of daily low temperatures, and an array of daily high temperatures (here "array" can be a pandas `Series` or a numpy `ndarray`)
- Have one optional parameter: a string that will be displayed as a title; the value should default to `None`, and if it is `None`, a generic title like "Temperature data" should be displayed
- Use a different color or line style for low versus high temperatures
- Label the axes
- Have a title
- Include a legend that defines the two curves
- Add gridlines to your plot using `plt.grid(True)`
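
*The optional-parameter requirement above can be sketched as follows. This is a minimal example with one made-up curve; the function name `plot_curve` and the fallback text are hypothetical, and your own function will need more parameters and more plot elements.*

```python
import matplotlib.pyplot as plt

def plot_curve(x, y, title=None):
    """Plot y against x with an optional title (hypothetical helper)."""
    if title is None:
        title = "Untitled plot"  # fallback used when the caller gives no title
    plt.figure()                 # start a fresh figure for each call
    plt.plot(x, y)
    plt.title(title)
    plt.grid(True)
    return plt.gca()             # return the axes so the caller can inspect them

ax_default = plot_curve([0, 1, 2], [3, 1, 2])
ax_custom = plot_curve([0, 1, 2], [3, 1, 2], title="My custom title")
```

*Testing with `is None` (rather than `== None`) is the usual Python idiom for this kind of default.*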

*Since we didn't cover legends in any detail in class, here are examples that show two alternative ways to use legends. Method 1 puts the text in a list that gets passed to `legend()`:*
```
import matplotlib.pyplot as plt

plt.plot([0, 1], [2, 1], "b")
plt.plot([0, 1], [1.5, 2], "r")
plt.xlabel("(x-axis label here)")
plt.ylabel("(y-axis label here)")
plt.title("This is my title")
plt.legend(["Blue curve", "Red curve"])
```

*Method 2 uses the optional `label` keyword argument of the `plot()` function:*
```
import matplotlib.pyplot as plt

plt.plot([0, 1], [2, 1], "b", label="Blue curve")
plt.plot([0, 1], [1.5, 2], "r", label="Red curve")
plt.xlabel("(x-axis label here)")
plt.ylabel("(y-axis label here)")
plt.title("This is my title")
plt.legend()
```



## Part 3: Analyzing precipitation

*It's often said that Boulder has "300 sunny days a year". Is that really true? Let's test it for the year 2022.*

*Write a function that takes your DataFrame as an input argument. Your function should loop over each day in the record, adding up the following:*

- the total precipitation depth over the whole year
- the number of sunny days (no precipitation)
- the number of snowy days (days when the data item "Snow" is greater than zero)
- the number of rainy days (there is precipitation but snow is zero)
- the number of unknown days (either "Precip" or "Snow" has the no-data code -999)

*Your function should either report these directly, or `return` them so that the code that calls it can report them (or both).*

*(Note: you can find the number of rows in a pandas DataFrame using the `len()` function)*
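
*Here is a small sketch of the loop-and-count pattern, applied to a different question (counting sub-freezing days) on made-up data, so it is not the answer to this part. The DataFrame `demo` and the counter names are hypothetical. The main point is the order of the tests: checking for the no-data code first keeps -999 from being counted as a very cold day or added to a running total.*

```python
import pandas as pd

# Made-up data; -999 is the no-data code, as in the real file.
demo = pd.DataFrame({"Tmin": [25.0, 40.0, -999.0, 31.0, 55.0]})

frost_count = 0
missing_count = 0
total_valid_tmin = 0.0

for i in range(len(demo)):        # len() gives the number of rows
    tmin = demo["Tmin"].iloc[i]   # value in row i
    if tmin == -999:
        missing_count += 1        # handle no-data first
    else:
        total_valid_tmin += tmin  # only valid values go into the sum
        if tmin < 32.0:
            frost_count += 1

print(frost_count, missing_count, total_valid_tmin)
```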


## Part 4: Filtering data

*There are a number of days in the dataset for which the precipitation depth is listed as -999. This means there is no data available for these days. Before plotting precipitation data, it's important to filter out these invalid data points.*

*Write a function that takes your DataFrame as an input, and returns two arrays (either numpy arrays or pandas Series): the precipitation depth (in mm) on days when the precipitation value is not -999, and the corresponding day-of-year numbers for those days. Verify that the number of days with valid data is consistent with what you got in Part 3, when you counted how many days lack valid data.*

### Some hints

*A convenient way to extract valid data is to use boolean indexing. You can find an example showing how boolean indexing works in the class notebook from September 27, "09_introduction_to_numpy" (page down toward the end of the notebook).*

*One approach is to apply boolean indexing directly to a column in a pandas DataFrame. For example, if your Boulder weather DataFrame is called `bwx`, the following example operation returns a boolean Series that is `True` for each day when the minimum temperature was below 32 F and `False` otherwise:*

```
bwx["Tmin"] < 32.0
```

*You can use boolean indexing to extract and store just those values for which your condition applies. For example, the following line gets a pandas Series containing **only** the minimum temperature on sub-freezing days and assigns it to a new variable called `frost_day_tmin`:*

```
frost_day_tmin = bwx["Tmin"][bwx["Tmin"] < 32.0]
```

*Similarly, here's a line that gets the corresponding values in a column called "DayOfYear":*

```
frost_days = bwx["DayOfYear"][bwx["Tmin"] < 32.0]
```
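
*To see the pieces working together, here is a self-contained version of the same idea on a tiny made-up DataFrame (called `bwx_demo` here so it does not overwrite your real data). Storing the boolean mask in its own variable, `is_frost`, is optional but makes it easy to reuse and to count.*

```python
import pandas as pd

# Four made-up days of data.
bwx_demo = pd.DataFrame(
    {"DayOfYear": [1, 2, 3, 4], "Tmin": [20.0, 35.0, 31.5, 40.0]}
)

is_frost = bwx_demo["Tmin"] < 32.0            # boolean Series, one value per row
frost_day_tmin = bwx_demo["Tmin"][is_frost]   # Tmin on sub-freezing days only
frost_days = bwx_demo["DayOfYear"][is_frost]  # matching day numbers

# True counts as 1, so summing the mask counts the matching rows.
print(is_frost.sum(), len(frost_day_tmin), len(frost_days))
```

*Because both selections use the same mask, the two results always have the same length and line up row for row.*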

*If you find this method confusing and would rather work with numpy arrays, you can use the pandas `to_numpy()` method to convert an individual column into a numpy array. For example, the following line creates a new numpy array called `tmin` that contains the values in your DataFrame's "Tmin" column:*

```
tmin = bwx["Tmin"].to_numpy()
```

*Alternatively, if boolean indexing seems too confusing, you could use a `for` loop and an `if` statement to extract values that meet your criteria. The choice of approach is up to you, but however you do it, be thinking about the pros and cons of different approaches - that's a key part of the learning process for coding!*
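
*For comparison, here is a sketch of the numpy and loop approaches side by side, again using made-up frost-day data rather than precipitation. The variable names are hypothetical. Both give the same day numbers; the boolean-indexing version is shorter, while the loop spells out each step.*

```python
import numpy as np

tmin = np.array([20.0, 35.0, 31.5, 40.0])  # made-up values
days = np.array([1, 2, 3, 4])

# Approach A: boolean indexing on numpy arrays
frost_days_a = days[tmin < 32.0]

# Approach B: explicit loop with an if statement
frost_days_b = []
for i in range(len(tmin)):
    if tmin[i] < 32.0:
        frost_days_b.append(days[i])

print(frost_days_a, frost_days_b)
```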

## Part 5: Plotting precipitation

*Write a function that uses the matplotlib.pyplot function `bar()` to make a bar graph showing the depth of precipitation for each day of the year. Your function should take an array (or list or one column of a DataFrame) of days as one argument, an array (or list etc.) of precipitation depth as another, and an optional third argument for the plot title. The plot should have labeled axes and a title.*

### Hints

*The `bar()` function has two required arguments: the x-axis values (which in your case represent the days of the year), and the height for the bars (which here is the precipitation depth in mm).*
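
*A minimal call looks something like this, using five made-up values (the variable names are just placeholders):*

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]                 # made-up x values
depth_mm = [0.0, 5.1, 12.7, 0.0, 2.5]  # made-up bar heights

bars = plt.bar(days, depth_mm)         # one bar per (x, height) pair
plt.xlabel("(x-axis label here)")
plt.ylabel("(y-axis label here)")
```

*The two inputs must have the same length, which is one reason to filter the days and the precipitation values together in Part 4.*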

## Part 6 (for graduate students, or as a bonus point)

*It would be nice if we could stack the plots on top of one another and compare them directly. Matplotlib provides a way to combine multiple plots into a single figure using the `subplots()` function. There is a nice short tutorial on `subplots()` here: [https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html](https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html). The `subplots()` function returns two objects: a `Figure` object and either a single `Axes` object (if there's just one plot) or an array of `Axes` objects.*

*Here's a tiny example that illustrates how to make two vertically stacked plots:*

```
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2]) # some made-up data for x values...
y = np.array([0, 2, 1]) # ...and y values
fig, axs = plt.subplots(2) # create figure (fig) with 2 vertically stacked axes (axs)
axs[0].plot(x, y) # axs is an array of axis objects; here we plot in the first one
axs[1].plot(x, -y) # and here we plot in the second one
axs[1].set_xlabel("My lower x-axis") # here's an example of how to label axes
```

*Use `subplots()` to create plots of temperature and precipitation stacked on top of one another.*
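
*One optional refinement, sketched here with made-up data: passing `sharex=True` to `subplots()` makes the stacked panels use the same x-axis limits, which helps when comparing them by eye. The example also shows that different panels can hold different plot types.*

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 11)        # made-up x values
y_line = np.sin(x)          # made-up data for a line plot
y_bars = np.abs(np.cos(x))  # made-up data for a bar plot

fig, axs = plt.subplots(2, sharex=True)  # both panels share one x-axis
axs[0].plot(x, y_line)
axs[0].set_ylabel("(upper y label)")
axs[1].bar(x, y_bars)
axs[1].set_ylabel("(lower y label)")
axs[1].set_xlabel("(shared x label)")
```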


## Part 7 (optional: bonus point for all)

*"Day of the year" is easy to plot, but for humans who tend to think in terms of months, it is not ideal. Modify your plotting functions for temperature and precipitation to include months on the x-axis.*

### Hints


*There are various ways to handle calendar dates. One is to use the matplotlib `DateFormatter` class. Here's a simple example:*

```
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

fig, ax = plt.subplots()
ax.scatter(np.arange(365), np.arange(365) % 30.4)  # made-up data for illustration
date_form = DateFormatter("%m/%d")
ax.xaxis.set_major_formatter(date_form)
```

*There's a nice tutorial lesson on this and other tricks here: [https://www.earthdatascience.org/courses/use-data-open-source-python/use-time-series-data-in-python/date-time-types-in-pandas-python/customize-dates-matplotlib-plots-python/](https://www.earthdatascience.org/courses/use-data-open-source-python/use-time-series-data-in-python/date-time-types-in-pandas-python/customize-dates-matplotlib-plots-python/)*
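
*Another option, sketched here under the assumption that you already have day-of-year numbers, is to convert them to real dates with pandas and let matplotlib place one tick per month. The data are made up, and the variable names are hypothetical. `MonthLocator` and `DateFormatter` both come from `matplotlib.dates`.*

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator

day_of_year = np.arange(1, 366)                 # 1..365
values = np.sin(2 * np.pi * day_of_year / 365)  # made-up data

# Day 1 is January 1st, so add (day - 1) days to that date.
dates = pd.Timestamp("2022-01-01") + pd.to_timedelta(day_of_year - 1, unit="D")

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.xaxis.set_major_locator(MonthLocator())         # one tick at the start of each month
ax.xaxis.set_major_formatter(DateFormatter("%b"))  # abbreviated month names
```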