<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education - Bootcamp Course Fall 2020

## Important - Assignment Guidelines

1. Before you turn in your assignment, make sure to run the entire notebook with a fresh kernel. To do this, **restart the kernel and run all cells** (in the menubar, select Kernel$\rightarrow$Restart & Run All).
2. In the cells below you will replace the `raise NotImplementedError()` code with your code that addresses the activity challenge. If you don't replace that code, your notebook will not run properly.

```
# YOUR CODE HERE
raise NotImplementedError()
```

3. Any open ended questions will have a "YOUR ANSWER HERE" within a markdown cell. Replace that text with your answer also formatted using Markdown.
4. **IMPORTANT: DO NOT RENAME THIS NOTEBOOK!** If the file name changes, the autograder will not recognize or grade your submission.
5. When you plot, comment out `plt.show()`. The code below effectively runs `plt.show()` for you and also captures your plot for autograding. DO NOT DELETE any code that says `DO NOT REMOVE LINE BELOW`. That code is for autograding!

```
### DO NOT REMOVE LINE BELOW ###
student_plot1_ax = nb.convert_axes(plt)
```



## Follow PEP 8 Syntax Guidelines

* Run the `autopep8` tool on all cells prior to submitting (HINT: hit shift + the tool to run it on all cells at once!)
* Use clear and expressive names for variables.
* Organize your code to support readability.
* Keep each line of code within the PEP 8 limit of 79 characters.
* Use comments and white space where they are needed, but sparingly.


### Add Your Name Below 
**Your Name:**

<img style="float: left;" src="colored-bar.png"/>

---

# Week 10 Homework Template - Time Series Data

To complete assignment 10, be sure to review the following chapters in the Earth Lab Intermediate Earth Data Science online textbook on the earthdatascience.org website:

* <a href="https://www.earthdatascience.org/courses/use-data-open-source-python/use-time-series-data-in-python/introduction-to-time-series-in-pandas-python/" target="_blank"> Time series data in Pandas </a>
* <a href="https://www.earthdatascience.org/courses/use-data-open-source-python/data-stories/colorado-floods-2013/" target="_blank">The overview of the 2013 Floods in Colorado, USA</a>

## Assignment Data

For this assignment, you will write **Python** code to download and work with time 
series data associated with a large flood event that occurred in Colorado, USA in 
2013. You will explore the relationship between precipitation and stream discharge 
for Boulder Creek in Boulder, CO as they increased and decreased during the flood 
event. You will also compare these values to those from the years before the 
flood. 

There are two datasets that you will need to complete this assignment:

* `colorado-flood/precipitation/805333-precip-daily-1948-2013.csv`:
    * Hourly total precipitation in inches collected between 1948 and 2013
    * Because the data are hourly, there can be multiple records for each day
    * "no data" value is 999.99
    * [Original datasource from National Oceanic and Atmospheric Administration (NOAA)](https://www.ncdc.noaa.gov/cdo-web/search)

* `colorado-flood/discharge/06730200-discharge-daily-1986-2013.csv`:
    * Daily mean stream discharge in cubic feet per second (CFS) between 1986 and 2013 
    * The dataset does not have a "no data" value
    * [Original datasource from U.S. Geological Survey (USGS)](http://waterdata.usgs.gov/nwis/dv?cb_00060=on&format=html&site_no=06730200&referred_module=sw&period=&begin_date=1986-10-01&end_date=2013-12-31)
    
## Data Download Instructions 
The data can be downloaded using earthpy as follows:

`et.data.get_data("colorado-flood")`

In [None]:
# Core imports needed for grading - Do not modify this cell!
import matplotcheck.notebook as nb
from matplotcheck.base import PlotTester
import matplotcheck.autograde as ag

## Import Python Packages

In the cell below, add code **after the `# YOUR CODE HERE` line**, replacing `raise NotImplementedError()` with code that imports the packages/modules needed to:
* create plots
* set your working directory
* download data using earthpy functions
* work with `pandas` DataFrames

You will also need `DateFormatter`, a class from `matplotlib.dates`, to help format the dates when plotting your data. Make sure to include the line below, which imports it.

```
from matplotlib.dates import DateFormatter
```

Be sure to list the package imports following the appropriate PEP 8 order and 
spacing requirements. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test package imports - DO NOT MODIFY THIS CELL!
import_answer_points = 0

try:
    pd.NA
    print("\u2705 Score! Pandas has been imported as pd!")
    import_answer_points += 1
except NameError:
    print("\u274C Pandas has not been imported as pd, please make sure to import it properly.")

try:
    plt.show()
    print("\u2705 Nice! matplotlib.pyplot has been imported as plt!")
    import_answer_points += 1
except NameError:
    print("\u274C matplotlib.pyplot has not been imported as plt, please make sure to import it properly.")

try:
    os.getcwd()
    print("\u2705 Great work! The os module has been imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops, make sure that the os package is imported.")

try:
    data = et.io
    print("\u2705 Score! The earthpy package has been imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops, make sure that the earthpy package is imported using the alias et.")

try:
    DateFormatter
    print("\u2705 Nice! The DateFormatter class from matplotlib has been imported correctly!")
    import_answer_points += 1
except NameError:
    print("\u274C Oops, make sure that DateFormatter from matplotlib is imported.")

print("\n \u27A1 You received {} out of 5 points.".format(import_answer_points))

import_answer_points

## Set Working Directory and Download Data

In the cell below, complete the following tasks:

1. First, use EarthPy to download the `colorado-flood` data: `et.data.get_data("colorado-flood")`. When you download the data, the `earth-analytics/data` directory gets created on your computer for you.
2. **Use a conditional statement** to:
    * Set the working directory to the **`earth-analytics/data` directory in your home directory** if the path exists.
    * Print a helpful message if the path does not exist. 
* **Use reusable variable(s) to reduce repetition in your code.**
* Use the `os` package to ensure that the paths you create will run successfully on any operating system.


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Tests that the working directory is set to earth-analytics/data

path = os.path.normpath(os.getcwd())
student_wd_parts = path.split(os.sep)

wd_points = 0

if student_wd_parts[-2:] == ['earth-analytics', 'data']:
    print("\u2705 Great - it looks like your working directory is set correctly to ~/earth-analytics/data")
    wd_points += 5
else:
    print("\u274C Oops, the autograder will not run unless your working directory is set to earth-analytics/data")

print("\n \u27A1 You received {} out of 5 points for setting your working directory.".format(
    wd_points))
wd_points

### Set Data Paths

Create paths with the `os` package for the two datasets below: 
* `805333-precip-daily-1948-2013.csv` in the `precipitation` subdirectory
* `06730200-discharge-daily-1986-2013.csv` in the `discharge` subdirectory
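A sketch of one way to build these paths, assuming your working directory is already `earth-analytics/data` so the paths can stay relative. The variable names `flood_dir`, `precip_path`, and `discharge_path` are illustrative.

```python
import os

# Both files live in subdirectories of the colorado-flood folder
flood_dir = "colorado-flood"

precip_path = os.path.join(flood_dir, "precipitation",
                           "805333-precip-daily-1948-2013.csv")
discharge_path = os.path.join(flood_dir, "discharge",
                              "06730200-discharge-daily-1986-2013.csv")

# Should print True for both once the data are downloaded and the
# working directory is set
print(os.path.exists(precip_path), os.path.exists(discharge_path))
```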

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Open Precipitation Data Using Pandas

Using the `read_csv()` function in `pandas`, read in your precipitation data. Don't forget to use the `parse_dates` argument to parse the `DATE` column, and to set `na_values` to the "no data" value given in the dataset description above. Set the `DATE` column as the index of the DataFrame.

Call the final `DataFrame` object at the end of the cell. 
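The `read_csv()` arguments can be tried out on a tiny, made-up sample read from an in-memory string with `io.StringIO`, so it runs without the download. The precipitation column name `HPCP` and the date format used here are assumptions; check them against the header of the real file.

```python
import io

import pandas as pd

# Made-up rows that mimic the layout of the precipitation file
sample_csv = io.StringIO(
    "DATE,HPCP\n"
    "2013-08-01 01:00:00,0.1\n"
    "2013-08-01 05:00:00,999.99\n"
    "2013-08-02 03:00:00,0.2\n"
)

precip_df = pd.read_csv(sample_csv,
                        parse_dates=["DATE"],  # convert DATE to datetimes
                        index_col="DATE",      # use DATE as the index
                        na_values=999.99)      # treat 999.99 as missing

precip_df
```

With the real data, you would pass the path to the CSV file instead of `sample_csv`.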

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Does your dataframe exist?
# Important - do not add a cell between this cell and the cell above it!

student_precip_answer = _

if isinstance(student_precip_answer, pd.DataFrame):
    print("\u2705 Great, you created a pandas dataframe above")
else:
    print("\u274C Oops - the cell above should have a DataFrame output.")

In [None]:
# DO NOT MODIFY THIS CELL


### Resample Precipitation Data

The precipitation data contain more data than you need for your analysis:

1. Their time span extends beyond your analysis period, the year of the Boulder flood (2013). 
2. They contain hourly data, but you want daily summaries for your analysis. 

To account for this, in the cell below: 

1. Subset the data to include only your time period of interest: August 1, 2013 to October 31, 2013. 
2. Resample the data to represent the daily sum of precipitation. 

Hint: You can subset and resample in a single line of code if you wish. 

Call the final `DataFrame` object at the end of the cell below. 
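The subset-then-resample pattern can be sketched on synthetic hourly data. The column name `HPCP` and the constant 0.25 inches per hour are made up so that the result is easy to check.

```python
import pandas as pd

# Made-up hourly record spanning a few days beyond the study period
hours = pd.date_range("2013-07-30", "2013-11-02 23:00",
                      freq=pd.offsets.Hour())
hourly_precip = pd.DataFrame({"HPCP": 0.25}, index=hours)

# Keep Aug 1 - Oct 31, 2013 (date-string slicing with .loc includes the
# whole end day), then add up the hourly values within each day
daily_precip_aug_oct = (hourly_precip
                        .loc["2013-08-01":"2013-10-31"]
                        .resample("D")
                        .sum())

daily_precip_aug_oct
```

Because `.sum()` skips missing values, a day whose hourly values are all `NaN` sums to 0 rather than `NaN`.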

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


### Calculate the Maximum Daily Precipitation for Each Month in 2013

In the cell below, use the same hourly precipitation dataset to 
calculate the **max daily value** in each month in the year 2013.

HINT: This means that you will need to calculate a daily sum first and 
then resample again to get the monthly maximum daily value (the biggest 
day of rainfall in each month).

Call the final `DataFrame` object at the end of the cell. 
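A sketch of the two-step resample on made-up hourly data for 2013 in which rain falls only on two September days, so the expected monthly maxima are easy to verify. The `HPCP` column name is an assumption.

```python
import pandas as pd

# Made-up hourly record for 2013: no rain except on two September days
hours = pd.date_range("2013-01-01", "2013-12-31 23:00",
                      freq=pd.offsets.Hour())
hourly_precip = pd.DataFrame({"HPCP": 0.0}, index=hours)
hourly_precip.loc[pd.Timestamp("2013-09-12 06:00"), "HPCP"] = 2.0
hourly_precip.loc[pd.Timestamp("2013-09-12 18:00"), "HPCP"] = 1.5
hourly_precip.loc[pd.Timestamp("2013-09-20 06:00"), "HPCP"] = 1.0

# Step 1: total rain for each day
# Step 2: the largest of those daily totals in each month
monthly_max_daily = (hourly_precip.loc["2013"]
                     .resample("D").sum()
                     .resample("MS").max())

monthly_max_daily
```

September's value is 3.5 (the September 12 total), not 2.0 (the largest single hour), which is why the daily sum has to come first. The `"MS"` alias labels each month by its first day; the month-end alias `"M"` also works, although recent pandas versions deprecate it in favor of `"ME"`.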

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


### Summarize Precipitation Data by Month

Calculate the monthly sum of the hourly precipitation data. The resulting DataFrame should include every month in the original dataset. 

Call the final `DataFrame` object at the end of the cell. 
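Summing every month of a multi-year record takes a single resample. This sketch uses a made-up two-year hourly record with a constant 0.25 inches per hour; the `HPCP` column name is an assumption.

```python
import pandas as pd

# Made-up hourly record covering two full years (2012 is a leap year)
hours = pd.date_range("2012-01-01", "2013-12-31 23:00",
                      freq=pd.offsets.Hour())
hourly_precip = pd.DataFrame({"HPCP": 0.25}, index=hours)

# One row per month across the whole record, labeled by the month's first day
monthly_total_precip = hourly_precip.resample("MS").sum()

monthly_total_precip
```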

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


## Challenge 1d: Plot Precipitation Data in a Figure

Create a figure with 3 line subplots using the data that you calculated above as follows:

* Subplot 1: daily precipitation values from Aug 1, 2013 to Oct 1, 2013
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should be month-day (e.g. "Aug-01")
* Subplot 2: monthly maximum values of precipitation in 2013
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should be month-day (e.g. "Aug-01")
* Subplot 3: monthly totals of precipitation for all years in the dataset
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should show the year (e.g. "2013")
    
You can use the `DateFormatter` class imported above to ensure your x axis labels are formatted correctly. 
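A sketch of the three-panel layout using constant, made-up series in place of the results you calculated above (the `*_sample` names are placeholders). It plots with `ax.plot()` so the x values stay in matplotlib's date units, which is what `DateFormatter` formats, and it leaves out `plt.show()` as guideline 5 at the top of this notebook asks.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter

# Made-up constant series standing in for your three summaries
daily_sample = pd.Series(
    0.1, index=pd.date_range("2013-08-01", "2013-10-01"))
monthly_max_sample = pd.Series(
    1.0, index=pd.date_range("2013-01-01", "2013-12-01", freq="MS"))
monthly_total_sample = pd.Series(
    2.0, index=pd.date_range("1948-01-01", "2013-12-01", freq="MS"))

# Three vertically stacked subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 14))

ax1.plot(daily_sample.index, daily_sample.values)
ax1.set(title="Daily Precipitation, Aug 1 - Oct 1, 2013",
        xlabel="Date", ylabel="Precipitation (inches)")
ax1.xaxis.set_major_formatter(DateFormatter("%b-%d"))  # e.g. Aug-01

ax2.plot(monthly_max_sample.index, monthly_max_sample.values)
ax2.set(title="Maximum Daily Precipitation per Month, 2013",
        xlabel="Date", ylabel="Precipitation (inches)")
ax2.xaxis.set_major_formatter(DateFormatter("%b-%d"))

ax3.plot(monthly_total_sample.index, monthly_total_sample.values)
ax3.set(title="Total Monthly Precipitation, 1948-2013",
        xlabel="Year", ylabel="Precipitation (inches)")
ax3.xaxis.set_major_formatter(DateFormatter("%Y"))  # e.g. 2013

fig.tight_layout()
```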

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### DO NOT REMOVE LINE BELOW ###
precip_plot = nb.convert_axes(plt, which_axes="all")

In [None]:
# DO NOT MODIFY THIS CELL


In [None]:
# DO NOT MODIFY THIS CELL
daily_precip = PlotTester(precip_plot[0])
monthly_max_precip = PlotTester(precip_plot[1])
sum_precip = PlotTester(precip_plot[2])
line_plot_1, line_plot_2, line_plot_3 = False, False, False

try:
    daily_precip.assert_plot_type('line')
    print("\u2705 First plot is a line plot!")
    line_plot_1 = True
except AssertionError:
    print("\u274C The first plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")

try:
    monthly_max_precip.assert_plot_type('line')
    print("\u2705 Second plot is a line plot!")
    line_plot_2 = True
except AssertionError:
    print("\u274C The second plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")

try:
    sum_precip.assert_plot_type('line')
    print("\u2705 Third plot is a line plot!")
    line_plot_3 = True
except AssertionError:
    print("\u274C The third plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")



## Challenge 2: Open Stream Discharge Data Using Pandas

Using the same functions as above, read in your discharge data. Don't forget to use the `parse_dates` argument to parse the `datetime` column. This dataset does not have a "no data" value (see the dataset description above), so you do not need to set `na_values`. Set the `datetime` column as the index of the DataFrame.

Call the final `DataFrame` object at the end of the cell. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


### Challenge 2a: Subset Stream Discharge Data To Your Study Time Period

In the cell below, subset the stream discharge data to the same study period 
that you used for the precipitation data: August 1, 2013 to October 31, 2013. 

Call the final `DataFrame` object at the end of the cell. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


### Challenge 2b: Calculate the Monthly Maximum and Sum for Stream Discharge Data in 2013

Use your original stream discharge dataset to calculate the maximum and the sum of the daily discharge values for each month in the year 2013.

HINT: You can calculate multiple summary values at once using the `.agg()` method.
The approach looks like this:

```python
your_df.loc['year-here'].resample('frequency-here').agg({'column-you-want-to-summarize': ['max', 'sum']})
```

In the code above, you use `.agg()` to summarize a specific column and then specify which 
summary statistics you want. In this example, you are calculating the max and sum values.

Call the final `DataFrame` object at the end of the cell. 
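A sketch of `.agg()` on a made-up daily discharge record. The column name `disValue` matches the one mentioned in Challenge 2c; the values (10 CFS every day, plus one 500 CFS flood day) are invented for illustration.

```python
import pandas as pd

# Made-up daily mean discharge (CFS) for 2012 and 2013
days = pd.date_range("2012-01-01", "2013-12-31")
discharge = pd.DataFrame({"disValue": 10.0}, index=days)
discharge.loc[pd.Timestamp("2013-09-13"), "disValue"] = 500.0

# Select 2013, group the days by month, then compute two statistics at once
monthly_stats_2013 = (discharge.loc["2013"]
                      .resample("MS")
                      .agg({"disValue": ["max", "sum"]}))

monthly_stats_2013
```

The result has two-level column headers, `("disValue", "max")` and `("disValue", "sum")`, which is the multi-index that Challenge 2c cleans up.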

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL


## Challenge 2c: Clean Up Multi-index Dataframes

Above, you created an output DataFrame with what is called a multi-index.
A multi-index means that the columns have two or more levels of headers. It can 
be helpful to clean that up for plotting. 

To combine the two headers into a single header in your dataframe, you can 
use the following approach (in the example below, `df` is your dataframe 
name):

`df.columns = df.columns.map('-'.join)`

This code joins the two header names and adds a `-` between the first 
header name and the second. The result should be columns called 
`disValue-max` and `disValue-sum`.

Give this a try in the cell below.

IMPORTANT: if you run the code more than once it will continue to modify your
header columns! You may need to restart and run all cells to fix 
any issues with your columns if you have run the cell more than once. 
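A small, self-contained sketch of the join on a made-up DataFrame with the same two-level headers that `.agg()` produces. The `isinstance()` check is an optional guard we add here; it is not part of the original approach.

```python
import pandas as pd

# Made-up summary table with two-level column headers, like the output
# of .agg({"disValue": ["max", "sum"]})
columns = pd.MultiIndex.from_tuples([("disValue", "max"),
                                     ("disValue", "sum")])
monthly_stats = pd.DataFrame([[500.0, 790.0]], columns=columns)

# Only flatten while the headers are still two-level, so re-running
# this cell leaves already-flattened names alone
if isinstance(monthly_stats.columns, pd.MultiIndex):
    monthly_stats.columns = monthly_stats.columns.map("-".join)

monthly_stats
```

Without the guard, a second run would call `"-".join()` on plain strings, which puts a `-` between every character (for example, `"-".join("max")` returns `"m-a-x"`). That is the behavior the IMPORTANT note above warns about.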

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Challenge 2d: Calculate Monthly Total Stream Discharge

In the cell below, calculate the total stream discharge for each month over 
the entire time period in the data.

Be sure to call your dataframe at the end of the cell.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

## Challenge 2e: Figure - Plot Stream Discharge Data

Create a vertically stacked figure that contains the following stream discharge subplots (you should already have all of the data needed to create these figures):

* Subplot 1: daily discharge values from Aug 1, 2013 to Oct 1, 2013
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should be month-day (e.g. "Aug-01")
* Subplot 2: Monthly maximum stream discharge for 2013
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should be month-day (e.g. "Aug-01")
* Subplot 3: monthly totals of discharge for all years in the dataset
    * Plot needs a title, x axis label, and y axis label. The y label should have units of measure in it. 
    * X axis tick labels should show the year (e.g. "2013")
       
You can use the `DateFormatter` class imported above to ensure your x axis labels are formatted correctly. 

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### DO NOT REMOVE LINE BELOW ###
disc_plot = nb.convert_axes(plt, which_axes="all")

In [None]:
# DO NOT MODIFY THIS CELL


In [None]:
# DO NOT MODIFY THIS CELL
daily_disc = PlotTester(disc_plot[0])
monthly_max_disc = PlotTester(disc_plot[1])
sum_disc = PlotTester(disc_plot[2])
line_plot_1, line_plot_2, line_plot_3 = False, False, False

try:
    daily_disc.assert_plot_type('line')
    print("\u2705 First plot is a line plot!")
    line_plot_1 = True
except AssertionError:
    print("\u274C The first plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")

try:
    monthly_max_disc.assert_plot_type('line')
    print("\u2705 Second plot is a line plot!")
    line_plot_2 = True
except AssertionError:
    print("\u274C The second plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")

try:
    sum_disc.assert_plot_type('line')
    print("\u2705 Third plot is a line plot!")
    line_plot_3 = True
except AssertionError:
    print("\u274C The third plot is not a line plot. Make sure to make it a line plot; tests will not run successfully if this does not pass.")



## PEP 8, Spelling and Does the Notebook Run?
In this cell, we will give you points for the following:

1. PEP 8 is followed throughout the notebook (4 points)
2. Spelling and grammar are considered in your written responses above (4 points)
3. The notebook runs from top to bottom without any editing, i.e., it is reproducible (4 points)