# The "Broken" Weather Station

Background: A mountain weather station was offline for months and recently sent a burst of messy data. Before we can use the data or relate it to streamflow, it must be repaired. The Pandas toolkit provides the tools to do this.

## Task 1: The Raw Feed 
Load the .csv of rainfall values into a Pandas DataFrame named Rainfall_mm. The data can be found here:

```data/snotel_rainfall_data.csv```

Check your work: Use .head() and .describe(). Does the data look right?
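
One quick way to judge whether the data "look right" is to check the column types. The sketch below uses a made-up, three-row CSV (not the station file) to show that a single text entry keeps a whole column from being read as numbers, which is why `.describe()` may report `unique`, `top`, and `freq` instead of a mean.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the station file
csv_text = "Date,Precip_in\n2024-01-01,0.0\n2024-01-02,T\n2024-01-03,0.3\n"
demo = pd.read_csv(io.StringIO(csv_text))

# Precip_in is read as text, not float64, because of the 'T' entry
print(demo.dtypes)

# include="all" reports both text-style and numeric-style summaries
print(demo.describe(include="all"))
```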


In [1]:
import pandas as pd
rainfall = pd.read_csv("data/snotel_rainfall_data.csv")

#print(rainfall.head())
rainfall.describe()

| | Date | Precip_in |
|---|---|---|
| count | 61 | 58.0 |
| unique | 60 | 32.0 |
| top | 2024-01-16 | 0.0 |
| freq | 2 | 27.0 |


## Task 2: Using Pandas Tools to make the data readable

**errors='coerce'** turns anything that `pd.to_numeric` can't read as a number (like 'T' or 'error') into NaN.


```df['colname'] = pd.to_numeric(df['colname'], errors='coerce')```
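
As a minimal sketch of this behavior, the toy Series below (made-up values, not the station data) mixes numeric strings with the kinds of text entries mentioned above:

```python
import pandas as pd

# Made-up readings: numeric strings plus text entries such as 'T' and 'error'
raw = pd.Series(["0.0", "0.25", "T", "error", "1.1"])

# Entries that cannot be parsed become NaN instead of raising an error
clean = pd.to_numeric(raw, errors="coerce")

print(clean)
print(clean.isna().sum())  # how many entries could not be parsed
```

Counting the NaN values before and after the conversion is a simple way to see how many readings were lost to text entries.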

Apply this to your dataset, then run .describe() again.

In [19]:
rainfall['Precip_in'] = pd.to_numeric(rainfall['Precip_in'], errors='coerce')
rainfall.describe()

| | Precip_in |
|---|---|
| count | 56.0 |
| mean | -8.467089 |
| std | 150.395286 |
| min | -999.0 |
| 25% | 0.0 |
| 50% | 0.050563 |
| 75% | 0.68289 |
| max | 500.0 |


## Task 3: Load the discharge data

Load the .csv of streamflow values into a Pandas DataFrame named streamflow_cfs. The data can be found here:

```data/streamflow_data.csv```

Check your work: Use .head() and .describe(). Does the data look right? How can you extract more useful statistics? Show this.
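
One possible way to get more useful statistics is to count the problem values and then summarize only the valid readings. The sketch below uses made-up flows, and it assumes that -999 is a placeholder for missing data rather than a real measurement.

```python
import pandas as pd

# Made-up flows (cfs) with one missing value and one -999 placeholder
flow = pd.Series([10.1, 10.6, None, -999.0, 13.9, 2510.0], name="Streamflow_cfs")

n_missing = flow.isna().sum()      # gaps already stored as NaN
n_sentinel = (flow == -999).sum()  # placeholders that are still counted as data

# Keep valid readings only; the median is less sensitive to extremes than the mean
valid = flow[flow != -999].dropna()
print(n_missing, n_sentinel, valid.median())

# Custom percentiles give a fuller picture of the distribution
print(valid.describe(percentiles=[0.05, 0.5, 0.95]))
```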

In [34]:
streamflow_cfs = pd.read_csv("data/streamflow_data.csv")
#streamflow_cfs.head()
#streamflow_cfs.describe()

streamflow_cfs['Streamflow_cfs'] = pd.to_numeric(streamflow_cfs['Streamflow_cfs'], errors='coerce')
streamflow_cfs.describe()

| | Streamflow_cfs |
|---|---|
| count | 54.0 |
| mean | 40.058333 |
| std | 369.096212 |
| min | -999.0 |
| 25% | 10.1 |
| 50% | 10.6 |
| 75% | 13.9 |
| max | 2510.0 |


## Task 4: Clean and repair the data

The sensor cut out during the storm and we now have NaN **AND** -999 values. This prevents us from plotting or otherwise using the time series. Are there any other values we should remove?

Explore different methods and select one that fits the data.

Explore .dropna() (delete the gap), .fillna() with the mean, and .interpolate() for a smoother hydrograph for both datasets.

Note, it may be useful to create a new Pandas DataFrame to compare differences. 
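
To see how the three options differ, here is a minimal sketch on a made-up five-value series. It assumes -999 should be treated as missing, so it is converted to NaN before any gap filling.

```python
import pandas as pd

# Made-up flows (cfs): one NaN gap and one -999 placeholder
flow = pd.Series([10.0, None, 14.0, -999.0, 18.0])

# Treat the placeholder as missing so it cannot distort the mean
flow = flow.mask(flow == -999)

dropped = flow.dropna()                     # gaps removed, series gets shorter
filled = flow.fillna(flow.mean())           # gaps replaced by the mean of valid values
interp = flow.interpolate(method="linear")  # gaps bridged between neighbors

# Side-by-side view; 'dropna' shows NaN where rows were removed
comparison = pd.DataFrame(
    {"dropna": dropped, "fillna_mean": filled, "interpolate": interp}
)
print(comparison)
```

Filling with the mean flattens every gap to the same value, while linear interpolation follows the trend between the neighboring readings.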

In [26]:
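# Treat the -999 placeholder as missing before repairing any gaps
stream_clean = streamflow_cfs.copy()
stream_clean['Streamflow_cfs'] = stream_clean['Streamflow_cfs'].replace(-999, float('nan'))

# Option 1: drop the rows with missing flow (.copy() avoids aliasing the original)
stream_drop = stream_clean.dropna(subset=['Streamflow_cfs']).copy()

# Option 2: fill the gaps with the mean of the valid values
stream_fill = stream_clean.copy()
stream_fill['Streamflow_cfs'] = stream_fill['Streamflow_cfs'].fillna(stream_fill['Streamflow_cfs'].mean())

# Option 3: bridge the gaps by linear interpolation
stream_int = stream_clean.copy()
stream_int['Streamflow_cfs'] = stream_int['Streamflow_cfs'].interpolate(method='linear')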


## Task 5: Join Pandas DataFrames

We often want to relate one dataset to another, and having all the data in one centralized DataFrame supports this comparison. Create two new Pandas DataFrames:
* rainfall_methods, and fill it with the three rainfall DataFrames we cleaned
* streamflow_methods, and fill it with the three streamflow DataFrames we cleaned
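
One way to build such a table is to index each cleaned result by date and place them side by side, one column per method. The sketch below uses made-up stand-ins for the three streamflow results; the column labels (`dropna`, `fillna_mean`, `interpolate`) are illustrative choices, not required names.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="D")

# Made-up stand-ins for the three cleaned streamflow results
stream_drop = pd.DataFrame({"Date": dates[[0, 2, 3]], "Streamflow_cfs": [10.0, 14.0, 16.0]})
stream_fill = pd.DataFrame({"Date": dates, "Streamflow_cfs": [10.0, 13.3, 14.0, 16.0]})
stream_int = pd.DataFrame({"Date": dates, "Streamflow_cfs": [10.0, 12.0, 14.0, 16.0]})

# Align on Date; days missing from the dropna result appear as NaN
streamflow_methods = pd.concat(
    {
        "dropna": stream_drop.set_index("Date")["Streamflow_cfs"],
        "fillna_mean": stream_fill.set_index("Date")["Streamflow_cfs"],
        "interpolate": stream_int.set_index("Date")["Streamflow_cfs"],
    },
    axis=1,
).sort_index()

print(streamflow_methods)
```

The same pattern would apply to rainfall_methods with the three rainfall results.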


In [None]:
#rainfall_methods
#streamflow_methods


| | Streamflow_cfs |
|---|---|
| count | 60.0 |
| mean | 37.09 |
| std | 349.94079 |
| min | -999.0 |
| 25% | 10.1 |
| 50% | 10.5 |
| 75% | 13.4125 |
| max | 2510.0 |


## Task 6: Plot the rainfall and streamflow data to visualize trends and relationships.

Use Pandas' simple plotting functionality to plot the two DataFrames separately. Which gap-filling method do you prefer?
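
A minimal plotting sketch, assuming matplotlib is installed (pandas uses it for `.plot()`) and that the methods table has one column per gap-filling method, indexed by date. The values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook

import pandas as pd

dates = pd.date_range("2024-01-01", periods=5, freq="D")

# Made-up stand-in for streamflow_methods: one column per gap-filling method
streamflow_methods = pd.DataFrame(
    {
        "fillna_mean": [10.0, 14.0, 14.0, 14.0, 18.0],
        "interpolate": [10.0, 12.0, 14.0, 16.0, 18.0],
    },
    index=dates,
)

# DataFrame.plot draws one line per column and returns the matplotlib Axes
ax = streamflow_methods.plot(title="Streamflow by gap-filling method")
ax.set_ylabel("Streamflow (cfs)")
```

Plotting the methods on shared axes makes the differences at the gaps easy to spot.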

## Task 7: Combining DataFrames

From the plots above, choose the most representative gap-filled DataFrame for rainfall and for streamflow, and combine them into a DataFrame named rain_flow_df.
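
A minimal sketch of the combination step, assuming both chosen DataFrames carry a `Date` column. The names `rain_best` and `flow_best` and all values are made up for illustration.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="D")

# Made-up stand-ins for the chosen rainfall and streamflow results
rain_best = pd.DataFrame({"Date": dates, "Precip_in": [0.0, 0.4, 1.2]})
flow_best = pd.DataFrame({"Date": dates, "Streamflow_cfs": [10.0, 12.0, 30.0]})

# Match rows on Date; how="inner" keeps only days present in both tables
rain_flow_df = pd.merge(rain_best, flow_best, on="Date", how="inner")

print(rain_flow_df)
```

The `.describe()` output in Task 1 shows one date appearing twice, so checking for duplicate dates before merging may be worthwhile, since duplicates multiply rows in a merge.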

## Task 8: Calculate the monthly statistics of the rainfall and discharge

Calculate the monthly total, mean daily (for each month), and the maximum (for each month) rainfall and flow. 
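
One way to compute these statistics is to resample on a datetime index. The sketch below uses four made-up days that span a month boundary; it assumes the `Date` column has already been converted to datetimes (for example with `pd.to_datetime`).

```python
import pandas as pd

# Four made-up days spanning the end of January and the start of February
dates = pd.date_range("2024-01-30", periods=4, freq="D")
rain_flow_df = pd.DataFrame(
    {
        "Date": dates,
        "Precip_in": [0.0, 0.5, 1.0, 0.2],
        "Streamflow_cfs": [10.0, 12.0, 30.0, 20.0],
    }
)

# Resampling needs a datetime index; "MS" groups rows by calendar month
monthly = (
    rain_flow_df.set_index("Date")
    .resample("MS")
    .agg(["sum", "mean", "max"])
)

print(monthly)
```

The result has two-level column labels, so a single statistic can be selected with a tuple such as `monthly[("Precip_in", "sum")]`.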

## Task 9: Data Corrections

We want to put the data into a streamflow model, but it requires precipitation to be in mm and streamflow to be in CMS (cubic meters per second). Create a new dataframe called rain_flow_SI_df that converts the previous dataframe to SI units.
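
The conversions are simple multiplications: 1 inch is exactly 25.4 mm, and 1 cubic foot is 0.3048³ cubic meters. A minimal sketch with made-up values follows; the output column names `Precip_mm` and `Streamflow_cms` are illustrative choices.

```python
import pandas as pd

IN_TO_MM = 25.4           # exact: 1 inch = 25.4 mm
CFS_TO_CMS = 0.3048 ** 3  # 1 ft = 0.3048 m, so 1 ft^3/s is about 0.0283168 m^3/s

# Made-up stand-in for rain_flow_df
rain_flow_df = pd.DataFrame(
    {
        "Date": pd.date_range("2024-01-01", periods=3, freq="D"),
        "Precip_in": [0.0, 1.0, 0.5],
        "Streamflow_cfs": [10.0, 100.0, 35.0],
    }
)

# Build a new DataFrame so the original units stay available for comparison
rain_flow_SI_df = pd.DataFrame(
    {
        "Date": rain_flow_df["Date"],
        "Precip_mm": rain_flow_df["Precip_in"] * IN_TO_MM,
        "Streamflow_cms": rain_flow_df["Streamflow_cfs"] * CFS_TO_CMS,
    }
)

print(rain_flow_SI_df)
```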

## Task 10: Event diagnostics

A key element of hydroinformatics is to identify key events and learn from them. Here, we have two tasks.
* Create a new column and programmatically label each day as 'Dry', 'Light Rain', or 'Heavy Rain' based on the precipitation column.
* From our rain_flow_SI_df, create a new Pandas DataFrame called storm_df that programmatically selects the streamflow and precipitation data for the 5 days before and after the peak flow event.
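
Both steps can be sketched as follows, using made-up data. The class thresholds (0 mm and 10 mm) and the column names `Precip_mm`, `Streamflow_cms`, and `Rain_class` are assumptions for illustration; choose limits that suit the station.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=15, freq="D")
precip = [0, 0, 2, 0, 0, 5, 30, 12, 0, 0, 0, 1, 0, 0, 0]
flow = [0.3, 0.3, 0.4, 0.3, 0.3, 0.5, 2.0, 6.0, 3.0, 1.5, 0.9, 0.6, 0.5, 0.4, 0.4]
rain_flow_SI_df = pd.DataFrame({"Date": dates, "Precip_mm": precip, "Streamflow_cms": flow})

# Hypothetical thresholds: 0 mm is Dry, up to 10 mm is Light Rain, above that is Heavy Rain
bins = [-float("inf"), 0, 10, float("inf")]
labels = ["Dry", "Light Rain", "Heavy Rain"]
rain_flow_SI_df["Rain_class"] = pd.cut(rain_flow_SI_df["Precip_mm"], bins=bins, labels=labels)

# Locate the peak-flow day, then keep the 5 days on either side of it
peak_date = rain_flow_SI_df.loc[rain_flow_SI_df["Streamflow_cms"].idxmax(), "Date"]
window = pd.Timedelta(days=5)
in_window = rain_flow_SI_df["Date"].between(peak_date - window, peak_date + window)
storm_df = rain_flow_SI_df.loc[in_window].reset_index(drop=True)

print(storm_df)
```

Any leftover negative placeholder values would fall into the 'Dry' class here, so the labels are only meaningful on cleaned data.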

## Task 11: Quick Data Visualization

Use the Pandas plot function to conduct a quick visualization of precipitation and streamflow. Do they seem correlated? Are there any glaring errors?
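
A number can back up the visual impression. The sketch below computes the Pearson correlation between precipitation and flow on made-up data, both for the same day and with rainfall shifted by one day, since streamflow often responds after the rain falls. The one-day lag is an assumption to test, not a property of this station.

```python
import pandas as pd

# Made-up series in which flow rises the day after rain
df = pd.DataFrame(
    {
        "Precip_mm": [0, 10, 0, 0, 20, 0, 0, 0],
        "Streamflow_cms": [1, 1, 5, 2, 1, 9, 3, 1],
    }
)

# Same-day correlation
same_day = df["Precip_mm"].corr(df["Streamflow_cms"])

# Yesterday's rain against today's flow; the NaN created by shift() is ignored
lag1 = df["Precip_mm"].shift(1).corr(df["Streamflow_cms"])

print(same_day, lag1)
```

Trying a few lags and comparing the coefficients is a quick check on how long the catchment takes to respond.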