
# A Notebook to TreatGeoSelf with gridded climate time-series data sets

## (Case study:  the Sauk-Suiattle river watershed, the Elwha river watershed, the Upper Rio Salado watershed)

<img src= "http://www.sauk-suiattle.com/images/Elliott.jpg"
style="float:left;width:150px;padding:20px">   
This data is compiled to digitally observe the watersheds, powered by HydroShare. <br />
<br />
Use this Jupyter Notebook to: <br /> 
Generate a list of available gridded data points in your area of interest, <br /> 
Download Livneh daily 1/16 degree gridded climate data, <br /> 
Download WRF daily 1/16 degree gridded climate data, <br /> 
Summarize the elevation range and data availability within the watershed areas, <br /> 
Visualize the elevation gradient within the watershed areas. <br /> 

<br /> <br /> <br /> <img src="https://www.washington.edu/brand/files/2014/09/W-Logo_Purple_Hex.png" style="float:right;width:120px;padding:20px">  
#### A Watershed Dynamics Model by the Watershed Dynamics Research Group in the Civil and Environmental Engineering Department at the University of Washington 

## 1.  HydroShare Setup and Preparation

To run this notebook, we must import several libraries. These are listed in order: 1) Python standard libraries, 2) the hs_utils library, which provides functions for interacting with HydroShare, including resource querying, downloading, and creation, and 3) the observatory_gridded_hydromet library that is downloaded with this notebook. 

If the Python library basemap-data-hires is not installed, uncomment and run the following line in a terminal.

In [1]:
#conda install -c conda-forge basemap-data-hires --yes

In [1]:
# data processing
import os
import pandas as pd, numpy as np, dask, json
import geopandas as gpd
from utilities import hydroshare
import ogh

# plotting libraries (plt is used by the plotting cells later in this notebook)
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

# # spatial plotting
# import fiona
# import shapely.ops
# from shapely.geometry import MultiPolygon, shape, point, box, Polygon
# from descartes import PolygonPatch
# from matplotlib.collections import PatchCollection
# from mpl_toolkits.basemap import Basemap


In [2]:
# initialize ogh_meta
meta_file = dict(ogh.ogh_meta())
sorted(meta_file.keys())

['dailymet_bclivneh2013',
 'dailymet_livneh2013',
 'dailymet_livneh2015',
 'dailyvic_livneh2013',
 'dailyvic_livneh2015',
 'dailywrf_bcsalathe2014',
 'dailywrf_salathe2014']

In [3]:
sorted(meta_file['dailymet_livneh2013'].keys())

['decision_steps',
 'delimiter',
 'domain',
 'end_date',
 'file_format',
 'filename_structure',
 'reference',
 'spatial_resolution',
 'start_date',
 'subdomain',
 'temporal_resolution',
 'variable_info',
 'variable_list',
 'web_protocol']

Establish a secure connection with HydroShare by instantiating the hydroshare class that is defined within hs_utils. In addition to connecting with HydroShare, this command also sets and prints environment variables for several parameters that will be useful for saving work back to HydroShare. 

In [7]:
hs=hydroshare.hydroshare()
homedir = hs.getContentPath(os.environ["HS_RES_ID"])

print('Data will be loaded from and save to:'+homedir)

Adding the following system variables:
   HS_USR_NAME = jphuong
   HS_RES_ID = 7c3416535ab24d4f93b0b94741bb9572
   HS_RES_TYPE = compositeresource
   JUPYTER_HUB_IP = jupyter.cuahsi.org

These can be accessed using the following command: 
   os.environ[key]

   (e.g.)
   os.environ["HS_USR_NAME"]  => jphuong
Successfully established a connection with HydroShare
Data will be loaded from and save to:/home/jovyan/work/Observatory-1/ogh


If you are curious about where the data is being downloaded, click on the Jupyter Notebook dashboard icon to return to the File System view.  The homedir directory location printed above is where you can find the data and contents you will download to a HydroShare JupyterHub server.  At the end of this work session, you can migrate this data to the HydroShare iRODS server as a Generic Resource. 

## 2. Get list of gridded climate points for the watershed

This example uses a shapefile with the watershed boundary of the Sauk-Suiattle Basin, which is stored in HydroShare at the following url: https://www.hydroshare.org/resource/c532e0578e974201a0bc40a37ef2d284/. 

The data for our processing routines can be retrieved with the getResourceFromHydroShare function by passing in the global identifier from the URL above.  In the next cell, we download this resource from HydroShare and identify which points within it are available for downloading gridded hydrometeorology data. Availability is based on the point shapefile at https://www.hydroshare.org/resource/ef2d82bf960144b4bfb1bae6242bcc7f/, which covers the extent of North America and includes the average elevation for each 1/16 degree grid cell.

The file must include columns with station numbers, latitude, longitude, and elevation. The headers of these columns must be FID, LAT, LONG_, and ELEV or RASTERVALU, respectively. The station numbers are used in the remainder of the code to uniquely reference data from each climate station, as well as to identify the minimum, maximum, and average elevation of all of the climate stations.

The web service is currently set to a URL for the smallest geographic overlapping extent, e.g. WRF for the Columbia River Basin. To use a limit based on data from an FTP service, treatgeoself() would need to be edited in the observatory_gridded_hydrometeorology utility. 
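
The column requirements described above can be checked before any processing starts. The following is a minimal sketch, not part of the ogh library: the helper name `check_mapping_columns` and the example values are assumptions made for illustration, while the column names come from the text.

```python
import pandas as pd

# Column names are taken from the text; the helper itself is hypothetical.
REQUIRED = ['FID', 'LAT', 'LONG_']
ELEV_OPTIONS = ['ELEV', 'RASTERVALU']

def check_mapping_columns(df):
    """Return a list of problems found in the column headers (empty if none)."""
    problems = ['missing column: ' + c for c in REQUIRED if c not in df.columns]
    if not any(c in df.columns for c in ELEV_OPTIONS):
        problems.append('missing elevation column: ELEV or RASTERVALU')
    return problems

# Two illustrative grid cells
example = pd.DataFrame({'FID': [0, 1],
                        'LAT': [48.53125, 48.46875],
                        'LONG_': [-121.59375, -121.46875],
                        'ELEV': [1113.0, 646.0]})
print(check_mapping_columns(example))
```

An empty list means the headers satisfy the stated requirements; any other result names the header that needs to be fixed.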

In [6]:
"""
1/16-degree Gridded cell centroids
"""
# List of available data
hs.getResourceFromHydroShare('ef2d82bf960144b4bfb1bae6242bcc7f')
NAmer = hs.content['NAmer_dem_list.shp']

"""
Sauk
"""
# Watershed extent
hs.getResourceFromHydroShare('c532e0578e974201a0bc40a37ef2d284')
sauk = hs.content['wbdhuc12_17110006_WGS84.shp']

"""
Elwha
"""
# Watershed extent
hs.getResourceFromHydroShare('4aff8b10bc424250b3d7bac2188391e8')
elwha = hs.content["elwha_ws_bnd_wgs84.shp"]

"""
Rio Salado
"""
# Watershed extent
hs.getResourceFromHydroShare('5c041d95ceb64dce8eb85d2a7db88ed7')
riosalado = hs.content['UpperRioSalado_delineatedBoundary.shp']


This resource already exists in your userspace.
Would you like to overwrite this data [Y/n]? y
Download Finished                               
Successfully downloaded resource ef2d82bf960144b4bfb1bae6242bcc7f


This resource already exists in your userspace.
Would you like to overwrite this data [Y/n]? y
Download Finished                               
Successfully downloaded resource c532e0578e974201a0bc40a37ef2d284


This resource already exists in your userspace.
Would you like to overwrite this data [Y/n]? y
Download Finished                               
Successfully downloaded resource 4aff8b10bc424250b3d7bac2188391e8


This resource already exists in your userspace.
Would you like to overwrite this data [Y/n]? y
Download Finished                               
Successfully downloaded resource 5c041d95ceb64dce8eb85d2a7db88ed7


### Summarize the file availability from each watershed mapping file

In [8]:
mappingfile1 = os.path.join(homedir,'Sauk_mappingfile.csv')
mappingfile2 = os.path.join(homedir,'Elwha_mappingfile.csv')
mappingfile3 = os.path.join(homedir,'RioSalado_mappingfile.csv')

In [9]:
t1 = ogh.mappingfileSummary(listofmappingfiles = [mappingfile1, mappingfile2, mappingfile3], 
                            listofwatershednames = ['Sauk-Suiattle river','Elwha river','Upper Rio Salado'],
                            meta_file=meta_file)

t1

| Watershed | Sauk-Suiattle river | Elwha river | Upper Rio Salado |
|---|---|---|---|
| Median elevation in meters [range] (No. gridded cells) | 1182[164-2216] (n=98) | 1120[36-1642] (n=55) | 2308[1962-2669] (n=31) |
| dailymet_bclivneh2013 | 1182[164-2216] (n=98) | 1120[36-1642] (n=55) | 0 |
| dailymet_livneh2013 | 1182[164-2216] (n=98) | 1146[174-1642] (n=52) | 2308[1962-2669] (n=31) |
| dailymet_livneh2015 | 1182[164-2216] (n=98) | 1120[36-1642] (n=55) | 2308[1962-2669] (n=31) |
| dailyvic_livneh2013 | 1182[164-2216] (n=98) | 1146[174-1642] (n=52) | 2308[1962-2669] (n=31) |
| dailyvic_livneh2015 | 1182[164-2216] (n=98) | 1120[36-1642] (n=55) | 2308[1962-2669] (n=31) |
| dailywrf_bcsalathe2014 | 1182[164-2216] (n=98) | 1142[97-1642] (n=53) | 0 |
| dailywrf_salathe2014 | 1182[164-2216] (n=98) | 1142[97-1642] (n=53) | 0 |
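
Each entry in the summary follows the pattern `median[min-max] (n=count)`, with `0` where a dataset has no cells in the watershed. As a rough sketch of how such a string could be assembled from a column of elevations (the helper name `elevation_summary` and the truncation to integers are assumptions, not necessarily how `ogh.mappingfileSummary` does it):

```python
import pandas as pd

def elevation_summary(elev):
    """Format elevations as 'median[min-max] (n=count)'; return '0' when empty."""
    elev = pd.Series(elev, dtype=float).dropna()
    if elev.empty:
        return '0'
    return '{0}[{1}-{2}] (n={3})'.format(int(elev.median()), int(elev.min()),
                                         int(elev.max()), len(elev))

# Elevations of five illustrative grid cells, in meters
print(elevation_summary([1113.0, 646.0, 321.0, 164.0, 369.0]))
```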


## 3. Define the subset time-period shared between two gridded data products of interest

In [13]:
# Livneh et al., 2013
dr1 = meta_file['dailymet_livneh2013']

# Salathe et al., 2014
dr2 = meta_file['dailywrf_salathe2014']

# define overlapping time window
dr = ogh.overlappingDates(date_set1=tuple([dr1['start_date'], dr1['end_date']]), 
                          date_set2=tuple([dr2['start_date'], dr2['end_date']]))
dr

('1950-01-01', '2010-12-31')
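
The shared window is the later of the two start dates paired with the earlier of the two end dates. The sketch below illustrates that idea with a hypothetical helper, `overlap_window`; it is not the `ogh.overlappingDates` implementation, and the date ranges are illustrative rather than read from `meta_file`.

```python
import pandas as pd

def overlap_window(range1, range2):
    """Return the (start, end) shared by two (start, end) date ranges, or None."""
    start = max(pd.Timestamp(range1[0]), pd.Timestamp(range2[0]))
    end = min(pd.Timestamp(range1[1]), pd.Timestamp(range2[1]))
    if start > end:
        return None  # the two ranges do not overlap
    return (start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d'))

# Illustrative ranges only; the real start and end dates come from meta_file
print(overlap_window(('1915-01-01', '2011-12-31'), ('1950-01-01', '2010-12-31')))
```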

## 4. Read in time-series data files for Sauk-Suiattle, and compute Spatial-temporal statistics

### Do this for Livneh et al., 2013 daily meteorology data and Salathe et al., 2014 WRF-NNRP data

In [16]:
help(ogh.gridclim_dict)

Help on function gridclim_dict in module ogh:

gridclim_dict(mappingfile, dataset, gridclimname=None, metadata=None, min_elev=None, max_elev=None, file_start_date=None, file_end_date=None, file_time_step=None, file_colnames=None, file_delimiter=None, subset_start_date=None, subset_end_date=None, df_dict=None, colvar='all')
    # pipelined operation for assimilating data, processing it, and standardizing the plotting
    
    mappingfile: (dir) the path directory to the mappingfile
    dataset: (str) the name of the dataset within mappingfile to use
    gridclimname: (str) the suffix for the dataset to be named; if None is provided, default to the dataset name
    metadata: (str) the dictionary that contains the metadata explanations; default is None
    min_elev: (float) the minimum elevation criteria; default is None
    max_elev: (float) the maximum elevation criteria; default is None
    file_start_date: (date) the start date of the files that will be read-in; default is None
    fi

In [19]:
#initiate new dictionary with original data

# Livneh et al. 2013 daily meteorology
ltm_0to3000 = ogh.gridclim_dict(metadata=meta_file,
                                mappingfile=mappingfile1,
                                dataset='dailymet_livneh2013',
                                file_start_date=dr1['start_date'], 
                                file_end_date=dr1['end_date'], 
                                subset_start_date=dr[0],
                                subset_end_date=dr[1])

# Salathe et al. 2014 daily WRF-NNRP
ltm_0to3000 = ogh.gridclim_dict(metadata=meta_file,
                                mappingfile=mappingfile1,
                                dataset='dailywrf_salathe2014',
                                file_start_date=dr2['start_date'], 
                                file_end_date=dr2['end_date'], 
                                subset_start_date=dr[0],
                                subset_end_date=dr[1],
                                df_dict=ltm_0to3000)

   FID       LAT      LONG_    ELEV  \
0    0  48.53125 -121.59375  1113.0   
1    1  48.46875 -121.46875   646.0   
2    2  48.46875 -121.53125   321.0   
3    3  48.46875 -121.59375   164.0   
4    4  48.46875 -121.65625   369.0   

                                 dailymet_livneh2013  \
0  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
1  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
2  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
3  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
4  /home/jovyan/work/Observatory-1/ogh/livneh2013...   

                               dailymet_bclivneh2013  \
0  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
1  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
2  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
3  /home/jovyan/work/Observatory-1/ogh/livneh2013...   
4  /home/jovyan/work/Observatory-1/ogh/livneh2013...   

                                 dailymet_livneh2015  \
0  /home/jovyan/work/Observatory-1/

In [20]:
# report the data objects within the dictionary
sorted(ltm_0to3000.keys())

['PRECIP_dailymet_livneh2013',
 'PRECIP_dailywrf_salathe2014',
 'TMAX_dailymet_livneh2013',
 'TMAX_dailywrf_salathe2014',
 'TMIN_dailymet_livneh2013',
 'TMIN_dailywrf_salathe2014',
 'WINDSPD_dailymet_livneh2013',
 'WINDSPD_dailywrf_salathe2014',
 'anom_year_PRECIP_dailymet_livneh2013',
 'anom_year_PRECIP_dailywrf_salathe2014',
 'anom_year_TMAX_dailymet_livneh2013',
 'anom_year_TMAX_dailywrf_salathe2014',
 'anom_year_TMIN_dailymet_livneh2013',
 'anom_year_TMIN_dailywrf_salathe2014',
 'anom_year_WINDSPD_dailymet_livneh2013',
 'anom_year_WINDSPD_dailywrf_salathe2014',
 'meanallyear_PRECIP_dailymet_livneh2013',
 'meanallyear_PRECIP_dailywrf_salathe2014',
 'meanallyear_TMAX_dailymet_livneh2013',
 'meanallyear_TMAX_dailywrf_salathe2014',
 'meanallyear_TMIN_dailymet_livneh2013',
 'meanallyear_TMIN_dailywrf_salathe2014',
 'meanallyear_WINDSPD_dailymet_livneh2013',
 'meanallyear_WINDSPD_dailywrf_salathe2014',
 'meanmonth_PRECIP_dailymet_livneh2013',
 'meanmonth_PRECIP_dailywrf_salathe2014',
 'm
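
The key prefixes listed above (`meanmonth_`, `meanallyear_`, `anom_year_`) suggest monthly means, an all-year mean, and yearly anomalies. The sketch below shows one plausible way to compute statistics of that kind with pandas on synthetic data. It is an assumption about what the names mean, not a reproduction of `ogh.gridclim_dict`, and the cell names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily data for two hypothetical grid cells (one column per cell)
dates = pd.date_range('2000-01-01', '2001-12-31', freq='D')
daily = pd.DataFrame({'cell_a': np.arange(len(dates), dtype=float),
                      'cell_b': np.ones(len(dates))}, index=dates)

# Mean for each calendar month, averaged over all years and all cells
mean_by_month = daily.groupby(daily.index.month).mean().mean(axis=1)

# Yearly mean per cell, then the anomaly relative to the all-year mean per cell
yearly = daily.groupby(daily.index.year).mean()
anom_year = yearly - daily.mean()

print(mean_by_month.shape, anom_year.shape)
```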

## 5. Compare gridded model to point observations

### Read in  SNOTEL data - assess available data 
To plot observed SNOTEL point precipitation or temperature alongside the gridded climate data, 
give the name of the SNOTEL file and the station name to be used in figure legends. 
File format: Daily SNOTEL Data Report - Historic - By individual SNOTEL site, standard sensors (https://www.wcc.nrcs.usda.gov/snow/snotel-data.html)

In [21]:
# Sauk
SNOTEL_file = os.path.join(homedir,'ThunderBasinSNOTEL.txt')
SNOTEL_station_name='Thunder Creek'
SNOTEL_file_use_colsnames = ['Date','Air Temperature Maximum (degF)', 'Air Temperature Minimum (degF)','Air Temperature Average (degF)','Precipitation Increment (in)']
SNOTEL_station_elev=int(4320/3.281) # meters

SNOTEL_obs_daily = ogh.read_daily_snotel(file_name=SNOTEL_file, 
                                         usecols=SNOTEL_file_use_colsnames,
                                         delimiter=',', 
                                         header=58)

# generate the start and stop date
SNOTEL_obs_start_date=SNOTEL_obs_daily.index[0]
SNOTEL_obs_end_date=SNOTEL_obs_daily.index[-1]

# peek
SNOTEL_obs_daily.head(5)

FileNotFoundError: File b'/home/jovyan/work/Observatory-1/ogh/ThunderBasinSNOTEL.txt' does not exist
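
The error above means the SNOTEL file was not present in the working directory when the cell ran. A small existence check gives a clearer message. The sketch below also shows a unit conversion for the degF and inch columns named in the cell. The helper `snotel_to_metric` is hypothetical, and whether `ogh.read_daily_snotel` already converts units internally is not shown in this notebook.

```python
import os
import pandas as pd

def snotel_to_metric(df):
    """Convert degF columns to degC and inch columns to mm, based on the column-name suffix."""
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        if col.endswith('(degF)'):
            out[col.replace('(degF)', '(degC)')] = (df[col] - 32.0) * 5.0 / 9.0
        elif col.endswith('(in)'):
            out[col.replace('(in)', '(mm)')] = df[col] * 25.4
        else:
            out[col] = df[col]
    return out

# Check for the file before reading it (relative path used for illustration)
snotel_path = 'ThunderBasinSNOTEL.txt'
if not os.path.exists(snotel_path):
    print('SNOTEL file not found; upload it to the working directory first.')

# Two synthetic observations
sample = pd.DataFrame({'Air Temperature Maximum (degF)': [32.0, 212.0],
                       'Precipitation Increment (in)': [0.0, 1.0]})
metric = snotel_to_metric(sample)
print(metric)
```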

### Read in  COOP station data - assess available data
https://www.ncdc.noaa.gov/

In [None]:
COOP_file=os.path.join(homedir, 'USC00455678.csv') # Sauk
COOP_station_name='Mt Vernon'
COOP_file_use_colsnames = ['DATE','PRCP','TMAX', 'TMIN','TOBS']
COOP_station_elev=int(4.3) # meters

COOP_obs_daily = ogh.read_daily_coop(file_name=COOP_file,
                                     usecols=COOP_file_use_colsnames,
                                     delimiter=',',
                                     header=0)

# generate the start and stop date
COOP_obs_start_date=COOP_obs_daily.index[0]
COOP_obs_end_date=COOP_obs_daily.index[-1]

# peek
COOP_obs_daily.head(5)

In [None]:
# read in the mappingfile
mappingfile = mappingfile1

mapdf = pd.read_csv(mappingfile)

# select station by first FID
firstStation = ogh.findStationCode(mappingfile=mappingfile, colvar='FID', colvalue=0)

# select station by elevation
maxElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].max())
medElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].median())
minElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].min())


# print(firstStation, mapdf.iloc[0].ELEV)
# print(maxElevStation, mapdf.loc[:,'ELEV'].max())
# print(medElevStation, mapdf.loc[:,'ELEV'].median())
# print(minElevStation, mapdf.loc[:,'ELEV'].min())

# compare monthly TMAX from Livneh et al. 2013 and Salathe et al. 2014
comp = ['month_TMAX_dailymet_livneh2013',
        'month_TMAX_dailywrf_salathe2014']

obj = dict()
for eachkey in ltm_0to3000.keys():
    if eachkey in comp:
        obj[eachkey] = ltm_0to3000[eachkey]

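# note: pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this line needs an older pandas
panel_obj = pd.Panel.from_dict(obj)
panel_obj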
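
Because `pd.Panel` is no longer available in current pandas releases, a dictionary of DataFrames can instead be combined into one DataFrame with a two-level column index. The sketch below uses synthetic stand-ins for the dictionary entries; the dataset and cell names are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two entries of the data dictionary
idx = pd.date_range('2000-01-31', periods=3, freq='D')
obj = {'livneh': pd.DataFrame(np.ones((3, 2)), index=idx, columns=['cell_a', 'cell_b']),
       'salathe': pd.DataFrame(np.zeros((3, 2)), index=idx, columns=['cell_a', 'cell_b'])}

# The dictionary keys become the outer level of a column MultiIndex
wide = pd.concat(obj, axis=1)

# Cross-section for one cell across both datasets
one_cell = wide.xs('cell_a', axis=1, level=1)
print(one_cell.columns.tolist())
```

Selecting on the inner column level plays the role that `panel_obj.xs(..., axis=2)` plays in the Panel-based cells.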

In [None]:
comp = ['meanmonth_TMAX_dailymet_livneh2013',
        'meanmonth_TMAX_dailywrf_salathe2014']

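# keep only the comparison entries that exist in the dictionary
obj = {eachkey: ltm_0to3000[eachkey] for eachkey in comp if eachkey in ltm_0to3000}

# build the DataFrame once, after all entries have been collected
df_obj = pd.DataFrame.from_dict(obj)
df_obj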

In [None]:
# parse one dictionary key into its parts, here the first comparison key
each = comp[0]
t_res, var, dataset, pub = each.rsplit('_',3)

print(t_res, var, dataset, pub)
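
Splitting from the right keeps multi-word summary prefixes such as `anom_year` intact, because only the last three underscores are used as separators. A small worked example, using key names from the dictionary listing earlier (the wrapper function `parse_key` is added here for illustration):

```python
def parse_key(key):
    """Split a dictionary key into (summary, variable, dataset, publication)."""
    t_res, var, dataset, pub = key.rsplit('_', 3)
    return t_res, var, dataset, pub

print(parse_key('meanmonth_TMAX_dailymet_livneh2013'))
print(parse_key('anom_year_PRECIP_dailywrf_salathe2014'))
```

Keys without a summary prefix, such as `TMAX_dailymet_livneh2013`, contain only two underscores, so the four-way unpacking raises a `ValueError` for them.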

In [None]:
ylab_var = meta_file['_'.join([dataset, pub])]['variable_info'][var]['desc']
ylab_unit = meta_file['_'.join([dataset, pub])]['variable_info'][var]['units']

print('{0} {1} ({2})'.format(t_res, ylab_var, ylab_unit))

In [None]:
%%time
comp = [['meanmonth_TMAX_dailymet_livneh2013','meanmonth_TMAX_dailywrf_salathe2014'],
        ['meanmonth_PRECIP_dailymet_livneh2013','meanmonth_PRECIP_dailywrf_salathe2014']]
wy_numbers=[10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
month_strings=[ 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']


fig = plt.figure(figsize=(20,5), dpi=500)

ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=1)
ax2 = plt.subplot2grid((2, 2), (1, 0), colspan=1)


# monthly (label each line so that the legend has entries to show)
for eachsumm in df_obj.columns:
    ax1.plot(df_obj[eachsumm], label=eachsumm)
    

ax1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2, fontsize=10)
plt.show()
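
The `wy_numbers` and `month_strings` lists in the cell above describe a water-year ordering (October through September). One way to apply that ordering to monthly means before plotting is sketched below. It assumes the monthly values are indexed by calendar month number 1 to 12, which is not confirmed for the ogh outputs.

```python
import pandas as pd

wy_numbers = [10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
month_strings = ['Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar',
                 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']

# Synthetic monthly means indexed by calendar month (assumed layout)
monthly = pd.Series(range(1, 13), index=range(1, 13), dtype=float)

# Reorder to start in October, then relabel with month abbreviations
wy_series = monthly.reindex(wy_numbers)
wy_series.index = month_strings
print(wy_series.head(3))
```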

In [None]:
# pandas Index objects have no .apply method; .map applies a function to each label
df_obj[each].index.map(lambda x: x+2)

In [None]:
# gridded cell at the minimum elevation
fig, ax = plt.subplots()

lws=[3, 10, 3, 3]
styles=['b--','go-','y--','ro-']

for col, style, lw in zip(comp, styles, lws):
    panel_obj.xs(key=(minElevStation[0][0], minElevStation[0][1], minElevStation[0][2]), 
                 axis=2)[col].plot(style=style, lw=lw, ax=ax, legend=True)

# add the legend after the lines have been drawn
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2)
plt.show()

# gridded cell at the maximum elevation
fig, ax = plt.subplots()
lws=[3, 10, 3, 3]
styles=['b--','go-','y--','ro-']

for col, style, lw in zip(comp, styles, lws):
    panel_obj.xs(key=(maxElevStation[0][0], maxElevStation[0][1], maxElevStation[0][2]), 
                 axis=2)[col].plot(style=style, lw=lw, ax=ax, legend=True)

ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2)
plt.show()

## Set up VIC dictionary (as an example) to compare to available data

In [None]:
# assumes the dailyvic entries use the same keys listed earlier for 'dailymet_livneh2013'
# (start_date, end_date, temporal_resolution); that listing has no 'date_range' key
vic_dr1 = meta_file['dailyvic_livneh2013']
vic_dr2 = meta_file['dailyvic_livneh2015']
vic_dr = ogh.overlappingDates(date_set1=tuple([vic_dr1['start_date'], vic_dr1['end_date']]),
                              date_set2=tuple([vic_dr2['start_date'], vic_dr2['end_date']]))

vic_ltm_3bands = ogh.gridclim_dict(mappingfile=mappingfile,
                                   metadata=meta_file,
                                   dataset='dailyvic_livneh2013',
                                   file_start_date=vic_dr1['start_date'], 
                                   file_end_date=vic_dr1['end_date'],
                                   file_time_step=vic_dr1['temporal_resolution'],
                                   subset_start_date=vic_dr[0],
                                   subset_end_date=vic_dr[1])

In [None]:
sorted(vic_ltm_3bands.keys())

## 10. Save the results back into HydroShare
<a name="creation"></a>

Using the `hs_utils` library, the results of the geoprocessing steps above can be saved back into HydroShare.  First, define all of the required metadata for resource creation, i.e. *title*, *abstract*, *keywords*, and *content files*.  In addition, we must define the type of resource that will be created, in this case *genericresource*.  

***Note:*** Make sure you save the notebook at this point, so that all notebook changes will be saved into the new HydroShare resource.

In [39]:
#execute this cell to list the content of the directory
!ls -lt

total 16956
-rwxrwx--- 1 root   users 3191412 Mar 26 23:35 Observatory_Sauk_TreatGeoSelf_usecase1.ipynb
-rwxrwx--- 1 root   users 6241597 Mar 26 23:35 Observatory_Sauk_TreatGeoSelf_usecase2.ipynb
-rwxrwx--- 1 root   users  395862 Mar 26 23:21 gcGradient_r.png
-rwxrwx--- 1 root   users  842081 Mar 26 23:21 gcGradient_e.png
-rwxrwx--- 1 root   users  774657 Mar 26 23:21 gcGradient_s.png
-rwxrwx--- 1 root   users   16802 Mar 26 22:57 RioSalado_mappingfile.csv
-rwxrwx--- 1 root   users   43565 Mar 26 22:56 Elwha_mappingfile.csv
-rwxrwx--- 1 root   users   79503 Mar 26 22:55 Sauk_mappingfile.csv
-rwxrwx--- 1 root   users  299095 Mar 26 22:49 statemap_annotated.png
-rw-r--r-- 1 jovyan users  208936 Mar 26 22:40 allwatersheds.shp
-rw-r--r-- 1 jovyan users     108 Mar 26 22:40 allwatersheds.shx
-rw-r--r-- 1 jovyan users      78 Mar 26 22:40 allwatersheds.dbf
-rw-r--r-- 1 jovyan users     143 Mar 26 22:40 allwatersheds.prj
-rwxrwx--- 1 root   users      10 Mar 26 22:40 allwatershe

## Archive the downloaded data files for collaborative use

Create list of files to save to HydroShare. Verify location and names.

In [40]:
# define the archive and file names first, so that the tar commands in the next cell can use them
ThisNotebook='Observatory_Sauk_TreatGeoSelf_usecase1.ipynb' #check name for consistency
climate2013_tar = 'livneh2013.tar.gz'
climate2015_tar = 'livneh2015.tar.gz'
wrf_tar = 'salathe2014.tar.gz'
mappingfile = 'Sauk_mappingfile.csv'

files=[ThisNotebook, mappingfile, climate2013_tar, climate2015_tar, wrf_tar]

In [41]:
# compress each downloaded data folder into its archive
!tar -zcf {climate2013_tar} livneh2013
!tar -zcf {climate2015_tar} livneh2015
!tar -zcf {wrf_tar} salathe2014
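
Where shell commands are not available, the same archiving step can be done with Python's standard `tarfile` module. The sketch below is an alternative under that assumption, demonstrated on a throwaway folder rather than the real `livneh2013` directory; the helper name `archive_folder` is hypothetical.

```python
import os
import tarfile
import tempfile

def archive_folder(folder, tar_path):
    """Write folder into a gzip-compressed tar archive and return the archive path."""
    with tarfile.open(tar_path, 'w:gz') as tar:
        tar.add(folder, arcname=os.path.basename(folder))
    return tar_path

# Build a small demonstration folder in a temporary directory
workdir = tempfile.mkdtemp()
demo_folder = os.path.join(workdir, 'demo_data')
os.makedirs(demo_folder)
with open(os.path.join(demo_folder, 'cell_0.txt'), 'w') as f:
    f.write('0.0\n')

# Archive it and list the archive contents
demo_tar = archive_folder(demo_folder, os.path.join(workdir, 'demo_data.tar.gz'))
with tarfile.open(demo_tar) as tar:
    names = sorted(tar.getnames())
print(names)
```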

In [None]:
# for each file downloaded onto the server folder, move to a new HydroShare Generic Resource
title = 'Results from testing out the TreatGeoSelf utility'
abstract = 'This is the output from the TreatGeoSelf utility integration notebook.'
keywords = ['Sauk', 'Elwha','Rio Salado','climate','hydromet','watershed'] 
rtype = 'genericresource'

# create the new resource
resource_id = hs.createHydroShareResource(abstract, 
                                          title,
                                          keywords=keywords, 
                                          resource_type=rtype, 
                                          content_files=files, 
                                          public=False)
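
Before creating the resource, it can help to confirm that every entry in the content file list actually exists, since a missing archive or a misspelled notebook name would otherwise only surface during the upload. A minimal sketch, with a hypothetical helper and placeholder file names:

```python
import os

def missing_files(paths):
    """Return the subset of paths that are not existing files."""
    return [p for p in paths if not os.path.isfile(p)]

# Placeholder names for illustration; in the notebook this would be the `files` list
candidates = ['definitely_not_here_1.csv', 'definitely_not_here_2.tar.gz']
print(missing_files(candidates))
```

An empty result indicates that all listed files are present in the working directory.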