# Exploratory Data Analysis Exercise with Pandas and HoloViews

In this exercise, you will use the same data as in the Matplotlib exercise, but explore it interactively with the HoloViews plotting library. The data are located at:

    files -> Data -> NWIS_Streaflow -> <STATE>

After cleaning the data and aligning the time series with Pandas (you can copy the code from the Matplotlib exercise), you will develop interactive HoloViews visualizations. The core of the assignment emphasizes the HoloViews philosophy and leverages its Bokeh and Matplotlib backends. The goal is interactive exploratory data analysis that links, overlays, and explores discharge trends across Idaho, Utah, and Wyoming.

The [USGS NWIS Mapper](https://apps.usgs.gov/nwismapper/) provides an interactive map for locating sites and their metadata.

## Task 1: Select, download, and load the data into your notebook session (you can copy this from your Matplotlib exercise, then add a few more sites)

Use the [USGS NWIS Mapper](https://apps.usgs.gov/nwismapper/) to locate three sites: one below a reservoir, one in a headwater catchment, and one near a river's terminus at the Great Salt Lake. In addition to these locations, ensure you have at least **2 sites in Idaho, 2 sites in Wyoming, and 2 sites in Utah.** Make a **data** directory in the getting_started folder and create one folder per state inside it (e.g., UT, WY, ID). Drag and drop your data files into these folders.
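The folder layout described above (one subfolder per state inside data) can be read in a single pass. The sketch below is one way to do it. It assumes three things: each CSV has the Datetime, USGS_flow, and USGS_ID columns seen in the .head() output of the Green River file, the state folders are named ID, UT, and WY, and you are reading from a hypothetical helper named load_state_folders. The site IDs in the demo are placeholders. USGS_ID is read as text so that a leading zero (as in 09234500), if stored in the file, is not dropped.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed folder abbreviations mapped to full state names
STATE_NAMES = {'ID': 'Idaho', 'UT': 'Utah', 'WY': 'Wyoming'}


def load_state_folders(data_dir):
    """Read every CSV in data_dir/<STATE>/ and add a 'state' column (hypothetical helper)."""
    frames = []
    for abbrev, name in STATE_NAMES.items():
        folder = Path(data_dir, abbrev)
        if not folder.is_dir():
            continue  # skip states that have not been downloaded yet
        for csv_path in sorted(folder.glob('*.csv')):
            # Keep USGS_ID as text and parse the dates while reading
            df = pd.read_csv(csv_path, dtype={'USGS_ID': str}, parse_dates=['Datetime'])
            df['state'] = name
            frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Demo on a throwaway folder tree with made-up values and placeholder site IDs;
# in the notebook you would call load_state_folders('../data') instead.
with tempfile.TemporaryDirectory() as tmp:
    for abbrev, site in [('UT', '00000001'), ('WY', '00000002')]:
        Path(tmp, abbrev).mkdir()
        pd.DataFrame({
            'Datetime': ['2000-01-01', '2000-01-02'],
            'USGS_flow': [10.0, 12.5],
            'USGS_ID': [site, site],
        }).to_csv(Path(tmp, abbrev, f'{site}.csv'), index=False)
    raw = load_state_folders(tmp)

print(raw)
```

Tagging rows with the folder name means the state column comes for free when the frames are later combined.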

In the code block below, load the data into a Pandas DataFrame and inspect it as in the previous Pandas exercises (.head(), .describe()). Write down what you notice. Remove any outliers, NaN values, and -999 values.



```python
# Imports
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts

# Load both backends; the first one listed (Bokeh) becomes the active backend
hv.extension('bokeh', 'matplotlib')

# Read the discharge record for each site
greenriver = pd.read_csv('../data/09234500_1980_2020.csv')
upper_bear = pd.read_csv('../data/10011500_1980_2020.csv')

print(greenriver.head())
```

```
     Datetime  USGS_flow    variable  USGS_ID measurement_unit     qualifiers  \
0  1986-10-01  2139.7144  streamflow  9234500            ft3/s  ['A', '[91]']   
1  1986-10-02  2295.4167  streamflow  9234500            ft3/s  ['A', '[91]']   
2  1986-10-03  2061.4792  streamflow  9234500            ft3/s  ['A', '[91]']   
3  1986-10-04  2219.3750  streamflow  9234500            ft3/s  ['A', '[91]']   
4  1986-10-05  1778.7500  streamflow  9234500            ft3/s  ['A', '[91]']   

   series  
0       0  
1       0  
2       0  
3       0  
4       0  
```
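The cleaning step in the Task 1 instructions, dropping NaN values and the -999 placeholder, could look like the sketch below. It uses a small made-up frame with the same USGS_flow column as the NWIS files. The helper name clean_flow is hypothetical, and treating negative discharge as invalid is an added assumption. The sketch deliberately does not drop large flows: in streamflow records, high values are often real floods or snowmelt peaks, so what counts as an outlier is left to your judgment.

```python
import numpy as np
import pandas as pd


def clean_flow(df):
    """Return a copy of df with -999, NaN, and negative USGS_flow rows removed (hypothetical helper)."""
    out = df.copy()
    # Treat the -999 placeholder as missing so it is dropped along with the NaNs
    out['USGS_flow'] = out['USGS_flow'].replace(-999, np.nan)
    out = out.dropna(subset=['USGS_flow'])
    # Assumption: negative discharge is a data error, not a real measurement
    out = out[out['USGS_flow'] >= 0]
    return out.reset_index(drop=True)


# Made-up rows mimicking the NWIS columns
demo = pd.DataFrame({
    'Datetime': pd.to_datetime(['1986-10-01', '1986-10-02', '1986-10-03',
                                '1986-10-04', '1986-10-05']),
    'USGS_flow': [2139.7, -999.0, np.nan, -5.0, 1778.75],
})
print(demo.describe())   # the -999 value shows up as the minimum before cleaning
clean = clean_flow(demo)
print(clean.describe())
```

Comparing the two .describe() tables is a quick way to see what the cleaning removed.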


## Task 2: Create a tabular dataset

Create a single DataFrame named All_Streams that combines all of the streamflow monitoring data. Your dataset should look like the diseases dataset in [2-Customization.ipynb](./2-Customization.ipynb). Hint: the following columns should be present: USGS_flow, variable, USGS_ID, year, month, day, state (Idaho, Utah, Wyoming), and streamflow_class (e.g., headwater, below reservoir, GSL Terminus).

Check that everything worked by running the .unique() method on the USGS_ID column and making sure all sites are present.


```python
# Label each site with its streamflow class
# (greenriver is below a reservoir; upper_bear is a headwater site)
greenriver['class'] = 'below res'
upper_bear['class'] = 'headwater'

# Stack both sites into one DataFrame with a fresh 0..n-1 index
all_streams = pd.concat([greenriver, upper_bear], ignore_index=True)

# List the unique USGS_IDs to confirm that every site is present
print(all_streams['USGS_ID'].unique())
```

```
[ 9234500 10011500]
```
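To get the full set of hinted columns, each site's frame can be labelled before concatenating. This is a minimal sketch that assumes you pass the state and streamflow class for each site yourself. The helper tag_site is hypothetical, and the site IDs, states, classes, and flows in the demo are placeholders. The year, month, and day columns come from the parsed Datetime column.

```python
import pandas as pd


def tag_site(df, state, streamflow_class):
    """Return a copy of one site's frame with date parts and site labels added (hypothetical helper)."""
    out = df.copy()
    out['Datetime'] = pd.to_datetime(out['Datetime'])
    out['year'] = out['Datetime'].dt.year
    out['month'] = out['Datetime'].dt.month
    out['day'] = out['Datetime'].dt.day
    out['state'] = state
    out['streamflow_class'] = streamflow_class
    return out


# Two made-up single-site frames (placeholder IDs and values)
site_a = pd.DataFrame({'Datetime': ['1990-01-01', '1990-01-02'], 'USGS_flow': [5.0, 6.0],
                       'variable': 'streamflow', 'USGS_ID': 'SITE_A'})
site_b = pd.DataFrame({'Datetime': ['1990-01-01', '1990-01-02'], 'USGS_flow': [50.0, 55.0],
                       'variable': 'streamflow', 'USGS_ID': 'SITE_B'})

All_Streams = pd.concat([
    tag_site(site_a, 'Utah', 'below reservoir'),
    tag_site(site_b, 'Idaho', 'headwater'),
], ignore_index=True)

# Same check as in the task: every site should appear exactly once here
print(All_Streams['USGS_ID'].unique())
```

Keeping the labelling in one function means every added site gets identical columns, which pd.concat then lines up by name.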


## Task 3: Make a HoloViews Curve plot for each state

Use hv.Curve to plot the streamflow for each state, with kdims set to Datetime and vdims set to USGS_flow. Combine the plots into a Layout and display it.

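```python
# Parse date strings so the x-axis is a continuous time axis, not categories
all_streams['Datetime'] = pd.to_datetime(all_streams['Datetime'])
# Group by USGS_ID and overlay one Curve per site so each site gets its own color
flow_curve = hv.Dataset(all_streams, ['Datetime', 'USGS_ID'], 'USGS_flow').to(
    hv.Curve, 'Datetime', 'USGS_flow', groupby='USGS_ID').overlay()
flow_curve = flow_curve.opts(opts.Curve(xlabel='Date', ylabel='Flow (cfs)'))
flow_curve
```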

## Task 4: Add functionality

Modify the existing plot: change the line color to red, add a hover tool, and stack the plots as rows.