<a href="https://colab.research.google.com/github/byrnesy/Hands-On-Data-Analysis-with-Pandas/blob/master/geoglows_package_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GEOGloWS ECMWF Streamflow
The GEOGloWS ECMWF Streamflow model is a global hydrologic model driven by the HTESSEL land surface model. It produces a new 15-day streamflow forecast each day at midnight (UTC+0). It also has a 40+ year (and growing) historical simulation of streamflow.

The GEOGloWS ECMWF Streamflow Data Service (REST API) and this Python package client were developed at Brigham Young University in the Civil and Environmental Engineering Department by Riley Hales, Rohit Khatar, Chris Edwards, Kyler Ashby, Gio Romero, and others. This project is an expansion and enhancement of the original work by Dr. Jim Nelson and Dr. Michael Souffront with funding from GEOGloWS, ECMWF, NASA, The World Bank, Microsoft Azure, BYU, and others.

You can interact with the streamflow model using the geoglows Python package. This notebook walks you through some of the available functions. For more information, please refer to https://geoglows.readthedocs.io.

In [None]:
# Start by installing the package and importing it to your code. Run this cell to do that.
!pip install geoglows -q

In [None]:
import geoglows
# display and HTML are used later to render the HTML tables
from IPython.display import display, HTML
from google.colab import files

# Access the Streamflow Data Service

Data in the GEOGloWS Model is organized by the numeric ID assigned to each stream segment in the model. This is referred to as the "reach_id" in the following code. If you need help finding this number, use the graphical tool on the home page of https://geoglows.ecmwf.int or view the tutorial on identifying reach IDs.

In [None]:
# Pick the ID of a river. This ID is on the Srae Huy, Cambodia
reach_id = 5075575

## Forecasted streamflow products
There are 3 forecasted streamflow products:

- **Forecast Ensembles** returns a CSV containing 52 15-day time series of discharge, one per ensemble member.
- **Forecast Stats** returns a CSV with a 15-day time series for the maximum, minimum, 25th and 75th percentiles, and average of the ensemble members.
- **Forecast Records** is a growing record of the previous forecast predictions. Each day the flow predictions for the first 24 hours are appended to the forecast record of each stream.
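
To make the relationship between the ensembles and the stats products concrete, here is a minimal sketch of how per-time-step summary statistics could be derived from a set of ensemble members. It uses only the Python standard library and a few made-up numbers. The `summarize_ensembles` helper and the `ensemble_members` data are hypothetical and are not part of the geoglows package, and the data service may compute its percentiles with a different interpolation rule.

```python
import statistics


def summarize_ensembles(ensemble_members):
    """Summarize equal-length lists of flows, one summary per time step."""
    summary = []
    # zip(*members) groups the values of every member at the same time step
    for flows in zip(*ensemble_members):
        # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
        q1, _median, q3 = statistics.quantiles(flows, n=4, method="inclusive")
        summary.append({
            "max": max(flows),
            "p75": q3,
            "mean": statistics.fmean(flows),
            "p25": q1,
            "min": min(flows),
        })
    return summary


# Synthetic example: 5 members and 3 time steps (a real forecast has 52 members)
ensemble_members = [
    [10.0, 12.0, 15.0],
    [11.0, 13.0, 14.0],
    [9.0, 11.0, 16.0],
    [12.0, 14.0, 18.0],
    [8.0, 10.0, 12.0],
]
forecast_summary = summarize_ensembles(ensemble_members)
for step, row in enumerate(forecast_summary):
    print(step, row)
```

Each row of `forecast_summary` corresponds to one time step, which mirrors the idea of one row per forecast time in the stats product.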


In [None]:
# 3 forecasted streamflow products:
ensembles = geoglows.streamflow.forecast_ensembles(reach_id)
stats = geoglows.streamflow.forecast_stats(reach_id)
records = geoglows.streamflow.forecast_records(reach_id)

## Historical simulation products
There are 4 historical streamflow products:
- **Historic Simulation** provides a time series of daily average streamflow since 1979. This is based on the ERA-5 historical data product. It is updated at the beginning of each year when the ERA-5 data for the previous year is available.
- **Return Periods** estimates the 2-, 5-, 10-, 20-, 50-, and 100-year return period flows for the river based on its historic simulation and the Gumbel Type 2 distribution of flood events.
- **Daily Averages** returns a time series with 366 entries, one for each day of the year including leap day. This is the average simulated flow on each day of the year and is a view of seasonal trends on that river.
- **Monthly Averages** returns a time series with 12 entries, one for each month of the year. This is the average of all simulated daily values for that month. This is another view of seasonal trends.
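
The two averaging products described above can be illustrated with a short, standard-library sketch. The `daily_averages` and `monthly_averages` functions below are hypothetical helpers written for this illustration (they are not the geoglows functions of the same name), and the input is a synthetic two-year series rather than real model output.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_averages(daily_flows):
    """Average all daily values that fall in each calendar month (1 to 12)."""
    by_month = defaultdict(list)
    for day, flow in daily_flows:
        by_month[day.month].append(flow)
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


def daily_averages(daily_flows):
    """Average values by (month, day) across years; leap day gets its own entry."""
    by_day = defaultdict(list)
    for day, flow in daily_flows:
        by_day[(day.month, day.day)].append(flow)
    return {key: sum(v) / len(v) for key, v in sorted(by_day.items())}


# Synthetic series: 2019 (365 days) plus 2020 (366 days, a leap year).
# The "flow" is simply the month number so the averages are easy to check.
start = date(2019, 1, 1)
daily_flows = []
for offset in range(731):
    day = start + timedelta(days=offset)
    daily_flows.append((day, float(day.month)))

mon_avg_sketch = monthly_averages(daily_flows)
day_avg_sketch = daily_averages(daily_flows)
print(len(mon_avg_sketch), len(day_avg_sketch))
```

Because the synthetic record includes a leap year, the daily result has an entry for February 29, which is why the Daily Averages product has 366 entries rather than 365.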

In [None]:
hist = geoglows.streamflow.historic_simulation(reach_id)

In [None]:
rperiods = geoglows.streamflow.return_periods(reach_id)

In [None]:
day_avg = geoglows.streamflow.daily_averages(reach_id)

In [None]:
mon_avg = geoglows.streamflow.monthly_averages(reach_id)

# Visualizing data
You just gathered forecasted and historical simulation streamflow data for the river you chose by reach ID. While this data is useful for many applications, a good place to start is with visualizations.

The geoglows package has tools for graphing all the data you can get from the GEOGloWS ECMWF Streamflow model. Each of the following cells will turn the data you requested into a Plotly figure and then show it.

In [None]:
# The title dictionary allows you to add additional information to the title block of the resulting graphs
title = {'Reach ID': reach_id}

In [None]:
# Statistical summary of the forecasted flows
forecast_figure = geoglows.plots.forecast_stats(stats, rperiods, titles=title)
forecast_figure.show()

In [None]:
# View the previously saved forecasts
records_figure = geoglows.plots.forecast_records(records, rperiods, titles=title)
records_figure.show()

In [None]:
# View each of the forecasts individually
ensembles_figure = geoglows.plots.forecast_ensembles(ensembles, rperiods, titles=title)
ensembles_figure.show()

In [None]:
# Historically simulated flow (ERA-5)
historic_figure = geoglows.plots.historic_simulation(hist, rperiods, titles=title)
historic_figure.show()

In [None]:
# Processing the historical data into a daily average flow
day_figure = geoglows.plots.daily_averages(day_avg, titles=title)
day_figure.show()

In [None]:
# Processing the historical data into a monthly average flow
mon_figure = geoglows.plots.monthly_averages(mon_avg, titles=title)
mon_figure.show()

In [None]:
# Flow Duration Curve (derived from the ERA-5 data)
flow_duration_figure = geoglows.plots.flow_duration_curve(hist, titles=title)
flow_duration_figure.show()
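
A flow duration curve ranks the simulated flows from highest to lowest and pairs each one with the percentage of time that flow is equaled or exceeded. The sketch below shows one common way to build such a curve, using the Weibull plotting position `rank / (n + 1)`. This is an assumption made for illustration: the `flow_duration_curve` helper here is hypothetical, the input values are synthetic, and `geoglows.plots.flow_duration_curve` may use a different formula internally.

```python
def flow_duration_curve(flows):
    """Return (exceedance_percent, flow) pairs, ordered from highest flow to lowest."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position: rank / (n + 1), expressed as a percentage
    return [
        (100.0 * rank / (n + 1), flow)
        for rank, flow in enumerate(ranked, start=1)
    ]


# Synthetic flows, not real model output
sample_flows = [5.0, 20.0, 10.0, 40.0]
curve = flow_duration_curve(sample_flows)
for exceedance, flow in curve:
    print(f"{exceedance:5.1f}% of the time, flow >= {flow}")
```

Reading the curve from left to right, high flows appear at low exceedance percentages (rare events) and low flows appear at high exceedance percentages (typical or baseflow conditions).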

In [None]:
# View the probabilities table
prob_table = geoglows.plots.probabilities_table(stats, ensembles, rperiods)
display(HTML(prob_table))

In [None]:
# View the return periods table
rperiods_table = geoglows.plots.return_periods_table(rperiods)
display(HTML(rperiods_table))

## Saving data

The results from the GEOGloWS Model are stored as `pandas` DataFrames. You can use the `.to_csv()` method of a DataFrame to save your results to a CSV file, or use other similar methods. From the Google Colaboratory environment, the saved file can be downloaded to your personal computer using the `files.download` command.


In [None]:
# Saving DataFrame to CSV file and downloading from the Google notebook instance
stats.to_csv('stats.csv')
files.download('stats.csv')

hist.to_csv('hist.csv')
files.download('hist.csv')

# What's next?
Learn more about this tool at http://geoglows.readthedocs.io and more about GEOGloWS at https://www.geoglows.org.