![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# River and Lake Water Levels


### Recommended Grade levels: 6-12
<br>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

Are water levels in Canada reaching *record* highs?

### Goal
Our goal is to find out whether water levels in Canada have reached record highs, using data from provinces with monitored lakes and rivers.

We will use line graphs to visually represent the data in an informative way. 

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
## import libraries
import pandas as pd
import plotly.express as px

### Data:

Shuswap Lake is a large, popular lake in the Okanagan Region of the province of British Columbia. Its water level fluctuates over the year because of rainfall and snowmelt runoff from the mountains.

[![Shuswap Lake](https://img.youtube.com/vi/1fJlFh4eJ08/0.jpg)](https://www.youtube.com/watch?v=1fJlFh4eJ08)

### Import the data

In [None]:
# import data
#URL https://dd.weather.gc.ca/hydrometric/
station = "08LE070"
shuswap_data= pd.read_csv(f'https://dd.weather.gc.ca/hydrometric/csv/BC/daily/BC_{station}_daily_hydrometric.csv')
shuswap_data

### Comment on the data


In [None]:
# Display the column names
print(*shuswap_data.columns, sep='\n')

In [None]:
#Display dataframe by date and water level
shuswap_data[["Date", "Water Level / Niveau d'eau (m)"]]

In [None]:
from datetime import datetime
from datetime import date
shuswap_data['date_ordinal'] = pd.to_datetime(shuswap_data['Date']).apply(lambda date: date.toordinal())
shuswap_data['date_ordinal']
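
The cell above turns each date into an ordinal, a whole-number day count, so that it can be used as a numeric input for regression. The sketch below shows the round trip on a few made-up dates; the `sample` frame and its values are assumptions for illustration, not the station data.

```python
from datetime import date

import pandas as pd

# Hypothetical sample dates standing in for the "Date" column
sample = pd.DataFrame({"Date": ["2021-06-01", "2021-06-02", "2021-07-01"]})

# toordinal() counts days, with 0001-01-01 as day 1
sample["date_ordinal"] = pd.to_datetime(sample["Date"]).apply(lambda d: d.toordinal())

# fromordinal() reverses the conversion and gives back a calendar date
sample["round_trip"] = sample["date_ordinal"].apply(lambda n: date.fromordinal(int(n)))

print(sample)
```

Consecutive days differ by exactly 1, which is what makes the ordinal usable as an x-axis for a fitted curve.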

In [None]:
average_of_dates = shuswap_data.groupby(['date_ordinal'], as_index=False)["Water Level / Niveau d'eau (m)"].mean()
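
Grouping by `date_ordinal` collapses all readings taken on the same day into one daily mean. A minimal sketch of the same pattern, assuming made-up readings and a shortened column name `level_m`:

```python
import pandas as pd

# Hypothetical readings: two on the first day, three on the second
readings = pd.DataFrame({
    "date_ordinal": [738000, 738000, 738001, 738001, 738001],
    "level_m": [347.0, 347.2, 347.5, 347.7, 347.9],
})

# One row per day, holding the mean of that day's readings
daily_mean = readings.groupby("date_ordinal", as_index=False)["level_m"].mean()
print(daily_mean)
```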

In [None]:
import numpy as np
import matplotlib.pyplot as plt
x_train = average_of_dates['date_ordinal'].to_numpy()
y_train = average_of_dates["Water Level / Niveau d'eau (m)"].to_numpy()
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
x_train = x_train.reshape(-1, 1)
poly_reg = PolynomialFeatures(degree = 3)
X_poly = poly_reg.fit_transform(x_train)
# Fit the polynomial features against the water levels (the target), not the dates
lin_reg_2=LinearRegression().fit(X_poly, y_train)

lin_reg_2.coef_

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
x_train = x_train.reshape(-1, 1)
y_train = y_train.reshape(-1 ,1)

poly_reg = PolynomialFeatures(degree = 3)
X_poly = poly_reg.fit_transform(x_train)
lin_reg_2=LinearRegression().fit(X_poly, y_train)

print(lin_reg_2.coef_[0][1])
print(lin_reg_2.intercept_)
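
Raw date ordinals are large numbers (roughly 738,000 for recent years), so their cubes are enormous and a cubic fit on them can be poorly conditioned. One common workaround, sketched here on synthetic data, is to measure time in days since the first observation before building the polynomial features. The values and the names `days`, `levels`, and `true_coefs` are assumptions for illustration; this is not the fit used in the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for 60 consecutive daily ordinals (assumed values)
ordinals = np.arange(738000, 738060)
days = (ordinals - ordinals.min()).astype(float)  # 0, 1, 2, ... days since the first reading

# A known cubic, so the recovered coefficients can be checked
true_coefs = [0.05, -0.001, 0.00001]
levels = 347 + true_coefs[0] * days + true_coefs[1] * days**2 + true_coefs[2] * days**3

features = PolynomialFeatures(degree=3).fit_transform(days.reshape(-1, 1))
model = LinearRegression().fit(features, levels)

# coef_[0] belongs to the constant column; intercept_ carries the constant term
print(model.intercept_, model.coef_[1:])
```

Predictions for future dates then need the same shift: subtract the first ordinal from the new date before evaluating the polynomial.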


In [None]:
import seaborn as sns
from matplotlib import pyplot as plt

water_pred = sns.regplot(
    data=average_of_dates,
    x='date_ordinal',
    y="Water Level / Niveau d'eau (m)",
    order=3
)

new_labels = [date.fromordinal(int(item)) for item in water_pred.get_xticks()]
water_pred.set_xticklabels(new_labels)

plt.xlabel('date', fontsize=16)
plt.title('Polynomial Regression (Order 3) for Water Levels', fontsize=20)
plt.tick_params(axis='x', which='major', labelsize=8)

In [None]:
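from scipy import stats  # import the submodule explicitly; plain `import scipy` may not load it
# Straight-line fit through the points of the first line that regplot drew
slope, intercept, r, p, sterr = stats.linregress(x=water_pred.get_lines()[0].get_xdata(),
                                                 y=water_pred.get_lines()[0].get_ydata())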
print(slope, intercept)

In [None]:
def regression_line(x, slope1, slope2, slope3, intercept):
    return(x**3 * slope3 + x**2 * slope2 + x * slope1 + intercept)

regression_model = pd.DataFrame()
regression_model['predicted_dates'] = average_of_dates['date_ordinal'] + 30
regression_model['predicted_water_levels'] = np.vectorize(regression_line)(regression_model['predicted_dates'], lin_reg_2.coef_[0][1], lin_reg_2.coef_[0][2], lin_reg_2.coef_[0][3], lin_reg_2.intercept_[0])
display(regression_model)

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=1, cols=2, shared_yaxes=True)

fig.add_trace(
    go.Scatter(x=average_of_dates['date_ordinal'].apply(datetime.fromordinal), y=average_of_dates["Water Level / Niveau d'eau (m)"]),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=regression_model['predicted_dates'].apply(datetime.fromordinal), y=regression_model['predicted_water_levels']),
    row=1, col=2
)

fig.update_layout(title_text="Side-by-Side Water Levels and Predicted Water Levels (Polynomial Model)")
fig.show()

In [None]:
x_train = x_train.flatten()
y_train = y_train.flatten()
fit = np.polyfit(np.log(x_train), y_train, 1)
print(fit)
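
`np.polyfit` with degree 1 on `np.log(x)` fits the model `y = intercept + slope * log(x)` and returns the slope first, then the intercept. The sketch below checks that ordering on synthetic data generated from a known slope and intercept; the numbers are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data generated from y = 2 + 3 * log(x)
x = np.arange(1, 50)
y = 2 + 3 * np.log(x)

# Degree-1 fit against log(x): coefficients come back highest power first
fit = np.polyfit(np.log(x), y, 1)
slope, intercept = fit

print(slope, intercept)
```

Note that over a single month of date ordinals near 738,000, `log(x)` changes by only about 0.00004, so a logarithmic model on raw ordinals behaves almost exactly like a straight line.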

In [None]:
def logarithmic_reg(x, slope, intercept):
    return(intercept + slope * np.log(x))

log_model = pd.DataFrame()
log_model['predicted_dates'] = average_of_dates['date_ordinal'] + 30
log_model['predicted_water_levels'] = np.vectorize(logarithmic_reg)(log_model['predicted_dates'], fit[0], fit[1])
display(log_model)

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=1, cols=2, shared_yaxes=True)

fig.add_trace(
    go.Scatter(x=average_of_dates['date_ordinal'].apply(datetime.fromordinal), y=average_of_dates["Water Level / Niveau d'eau (m)"]),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=log_model['predicted_dates'].apply(datetime.fromordinal), y=log_model['predicted_water_levels']),
    row=1, col=2
)

fig.update_layout(title_text="Side-by-Side Water Levels and Predicted Water Levels (Logarithmic Model)")
fig.show()

# Visualization for Shuswap Lake Levels

Real-time data for Shuswap Lake

In [None]:
px.line(shuswap_data, x="Date", y="Water Level / Niveau d'eau (m)",title="Shuswap Lake Level")

#insert map of stations here
#https://wateroffice.ec.gc.ca/map/index_e.html

In [None]:
import pandas as pd
import folium
from folium.plugins import MarkerCluster
df = pd.read_csv('https://wateroffice.ec.gc.ca/map/download_e.html?type=real_time&filters=%7B%22station_id%22%3A%22%22%2C%22station_name%22%3A%22%22%2C%22province%22%3A%22all%22%2C%22region%22%3A%22CAN%22%2C%22basin%22%3A%22all%22%2C%22parameter%22%3A%22all%22%2C%22operation_schedule%22%3A%22all%22%2C%22operating_agency%22%3A%22all%22%7D')
latitude = df['Latitude'].mean()
longitude = df['Longitude'].mean()
station_map = folium.Map(location=[latitude,longitude], zoom_start=3)
marker_cluster = MarkerCluster()
for _, row in df.iterrows():
    # Each marker's popup shows the station name and ID as plain text
    popup_text = f"{row['Station Name']} ({row['Station ID']})"
    marker_cluster.add_child(folium.Marker(location=[row['Latitude'], row['Longitude']], popup=popup_text))
station_map.add_child(marker_cluster)
station_map

In [None]:
## import data
station = "05BJ004"
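# Station IDs starting with 05 are outside British Columbia; 05BJ004 is an Alberta station, so the files live under AB
otherwater_data= pd.read_csv(f'https://dd.weather.gc.ca/hydrometric/csv/AB/daily/AB_{station}_daily_hydrometric.csv')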
otherwater_data

In [None]:
px.line(otherwater_data, x="Date", y="Water Level / Niveau d'eau (m)",title="Other Water Level")

Comparison of water levels in the Shuswap Lake and other bodies of water
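
Water levels from different stations are measured against different reference points, so plotting raw levels together can hide the changes that matter. One possible approach, sketched here with made-up readings, is to compare each station's change since its first reading. The station labels, column names, and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical readings for two stations with very different raw levels
combined = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2021-06-01", "2021-06-02", "2021-06-03"] * 2),
    "level_m": [347.0, 347.3, 347.5, 1.20, 1.10, 1.45],
})

# Change since each station's first reading, so both series share a scale
first_reading = combined.groupby("station")["level_m"].transform("first")
combined["change_m"] = combined["level_m"] - first_reading

print(combined)
```

A frame shaped like this could then be passed to `px.line` with `x="Date"`, `y="change_m"`, and `color="station"` to draw one line per station.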

# Communicate
Below we will discuss the results of the data exploration.
(How does our key evidence help answer our question?)
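
To connect the evidence back to the question, it helps to state what "record high" means in code: a reading is a record when it exceeds every earlier reading. The sketch below flags such days using a running maximum; the water levels are made up, and a real check would need a long historical series rather than a few recent days.

```python
import pandas as pd

# Hypothetical daily water levels in metres
levels = pd.Series([347.1, 347.4, 347.3, 347.9, 347.8, 348.2])

# Highest level seen on any earlier day (the first day has no earlier days)
previous_max = levels.cummax().shift(1)

# A day sets a record when it is higher than everything before it
is_record = levels > previous_max

print(pd.DataFrame({"level_m": levels, "is_record": is_record}))
```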

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)