# Retrieve NetCDF and model gridded climate time-series for a watershed

### Case study:  the Sauk-Suiattle Watershed
<img src="http://www.sauk-suiattle.com/images/Elliott.jpg" 
style="float:right;width:150px;padding:20px">

### Use this Jupyter Notebook to:
    1. HydroShare setup and preparation
    2. Re-establish the paths to the mapping file
    3. Compute daily, monthly, and annual temperature and precipitation statistics
    4. Visualize precipitation results relative to the forcing data
    5. Visualize the time-series trends
    6. Save results back into HydroShare

<br/><br/><br/>
<img src="https://www.washington.edu/brand/files/2014/09/W-Logo_Purple_Hex.png"
style="float:right;width:150px;padding:20px">

<br/><br/>
#### This data is compiled to digitally observe the watersheds, powered by HydroShare. <br/>Provided by the Watershed Dynamics Group, Dept. of Civil and Environmental Engineering, University of Washington

## 1. HydroShare Setup and Preparation

To run this notebook, we must import several libraries. These are listed in the following order: 1) Python standard libraries, 2) the hs_utils library, which provides functions for interacting with HydroShare, including resource querying, downloading, and creation, and 3) the observatory_gridded_hydromet library that is downloaded with this notebook.

In [1]:
# silencing warning
import warnings
warnings.filterwarnings("ignore")

# data processing
import os
import pandas as pd, numpy as np, dask, json
import seaborn as sns

# data migration library
import ogh
import ogh_xarray_landlab as oxl
from utilities import hydroshare
from ecohydrology_model_functions import run_ecohydrology_model, plot_results
InputFile = 'ecohyd_inputs.yaml'

# plotting and shape libraries
%matplotlib inline


In [3]:
# initialize ogh_meta
meta_file = dict(ogh.ogh_meta())
sorted(meta_file.keys())

['dailymet_bclivneh2013',
 'dailymet_livneh2013',
 'dailymet_livneh2015',
 'dailyvic_livneh2013',
 'dailyvic_livneh2015',
 'dailywrf_bcsalathe2014',
 'dailywrf_salathe2014']

In [4]:
sorted(meta_file['dailymet_livneh2013'].keys())

['decision_steps',
 'delimiter',
 'domain',
 'end_date',
 'file_format',
 'filename_structure',
 'reference',
 'spatial_resolution',
 'start_date',
 'subdomain',
 'temporal_resolution',
 'variable_info',
 'variable_list',
 'web_protocol']

Establish a secure connection with HydroShare by instantiating the hydroshare class that is defined within hs_utils. In addition to connecting with HydroShare, this command also sets and prints environment variables for several parameters that will be useful for saving work back to HydroShare. 

In [5]:
notebookdir = os.getcwd()

hs=hydroshare.hydroshare()
homedir = hs.getContentPath(os.environ["HS_RES_ID"])
os.chdir(homedir)

Adding the following system variables:
   HS_USR_NAME = jphuong
   HS_RES_ID = 70b977e22af544f8a7e5a803935c329c
   HS_RES_TYPE = genericresource
   JUPYTER_HUB_IP = jupyter.cuahsi.org

These can be accessed using the following command: 
   os.environ[key]

   (e.g.)
   os.environ["HS_USR_NAME"]  => jphuong

The hs_utils library requires a secure connection to your HydroShare account.
Enter the HydroShare password for user 'jphuong': ········
Successfully established a connection with HydroShare


If you are curious about where the data is being downloaded, click on the Jupyter Notebook dashboard icon to return to the File System view. The homedir directory set above is where you can find the data and contents you download to a HydroShare JupyterHub server. At the end of this work session, you can migrate this data to the HydroShare iRODS server as a Generic Resource.
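The directory can also be inspected from within the notebook. The sketch below is a minimal example using only the standard library; `list_tree` is a hypothetical helper (not part of hs_utils or ogh), and a temporary directory stands in for `homedir` so the cell runs anywhere.

```python
import os
import tempfile


def list_tree(root):
    """Return the relative paths of all files under root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


# Stand-in for homedir (assumption); in the notebook, call list_tree(homedir).
with tempfile.TemporaryDirectory() as demo_home:
    os.makedirs(os.path.join(demo_home, "data", "contents"))
    for rel in ("readme.txt", os.path.join("data", "contents", "example.shp")):
        with open(os.path.join(demo_home, rel), "w") as fh:
            fh.write("")
    demo_listing = list_tree(demo_home)

print(demo_listing)
```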

## 2. Get list of gridded climate points for the watershed

For visualization purposes, we will also remap the study site shapefile, which is stored in HydroShare at the following url: https://www.hydroshare.org/resource/c532e0578e974201a0bc40a37ef2d284/. Since the shapefile was previously migrated, we can select 'N' for no overwriting.

In the usecase1 notebook, the treatgeoself function identified the gridded cell centroid coordinates that overlap with our study site. These coordinates were documented within the mapping file, which will be remapped here. In the usecase2 notebook, the downloaded files were cataloged within the mapping file, so we will use the mappingfileSummary function to characterize the files available for Sauk-Suiattle for each gridded data product.
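To make the idea of "gridded cell centroids that overlap the study site" concrete, here is a simplified sketch. It assumes a regular 1/16-degree grid whose centroids sit at half-cell offsets (for example 48.53125) and it selects centroids inside a buffered bounding box rather than the watershed polygon, so it is only an approximation of what treatgeoself does. The function name `centroids_in_box` and the example extent are hypothetical.

```python
import math


def centroids_in_box(minx, miny, maxx, maxy, resolution=0.0625, buffer_distance=0.06):
    """List (lat, lon) centroids of a regular grid inside a buffered bounding box."""
    half = resolution / 2.0

    def axis_values(lower, upper):
        # Centroids are located at index * resolution + half a cell.
        first = math.ceil((lower - buffer_distance - half) / resolution)
        last = math.floor((upper + buffer_distance - half) / resolution)
        return [i * resolution + half for i in range(first, last + 1)]

    lats = axis_values(miny, maxy)
    lons = axis_values(minx, maxx)
    return [(lat, lon) for lat in lats for lon in lons]


# Hypothetical extent, chosen only to illustrate the selection.
demo_centroids = centroids_in_box(minx=-121.6, miny=48.45, maxx=-121.5, maxy=48.5)
print(len(demo_centroids), demo_centroids[:3])
```

A polygon-based selection would keep fewer cells than the bounding box whenever the watershed outline is irregular.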

In [6]:
"""
1/16-degree Gridded cell centroids
"""
# List of available data
hs.getResourceFromHydroShare('ef2d82bf960144b4bfb1bae6242bcc7f')
NAmer = hs.content['NAmer_dem_list.shp']


"""
Sauk
"""
# Watershed extent
hs.getResourceFromHydroShare('c532e0578e974201a0bc40a37ef2d284')
sauk = hs.content['wbdhub12_17110006_WGS84_Basin.shp']

# reproject the shapefile into WGS84
ogh.reprojShapefile(sourcepath=sauk)

This resource already exists in your userspace.
ef2d82bf960144b4bfb1bae6242bcc7f/
|-- ef2d82bf960144b4bfb1bae6242bcc7f/
|   |-- bagit.txt
|   |-- manifest-md5.txt
|   |-- readme.txt
|   |-- tagmanifest-md5.txt
|   |-- data/
|   |   |-- resourcemap.xml
|   |   |-- resourcemetadata.xml
|   |   |-- contents/
|   |   |   |-- NAmer_dem_list.cpg
|   |   |   |-- NAmer_dem_list.dbf
|   |   |   |-- NAmer_dem_list.prj
|   |   |   |-- NAmer_dem_list.sbn
|   |   |   |-- NAmer_dem_list.sbx
|   |   |   |-- NAmer_dem_list.shp
|   |   |   |-- NAmer_dem_list.shx

Do you want to overwrite these data [Y/n]? n


This resource already exists in your userspace.
c532e0578e974201a0bc40a37ef2d284/
|-- c532e0578e974201a0bc40a37ef2d284/
|   |-- bagit.txt
|   |-- manifest-md5.txt
|   |-- readme.txt
|   |-- tagmanifest-md5.txt
|   |-- data/
|   |   |-- resourcemap.xml
|   |   |-- resourcemetadata.xml
|   |   |-- contents/
|   |   |   |-- wbdhub12_17110006_WGS84_Basin.cpg
|   |   |   |-- wbdhub12_17110006_WGS84_Basin.shp
|   |   |   |-- wbdhub12_17110006_WGS84_Basin.shx
|   |   |   |-- wbdhub12_17110006_WGS84_Basin.dbf
|   |   |   |-- wbdhub12_17110006_WGS84_Basin.prj

Do you want to overwrite these data [Y/n]? n


### Summarize the file availability from each watershed mapping file

In [7]:
%%time

# map the mappingfiles from usecase1
mappingfile1=ogh.treatgeoself(shapefile=sauk, NAmer=NAmer, buffer_distance=0.06,
                              mappingfile=os.path.join(homedir,'Sauk_mappingfile.csv'))

(99, 4)
   FID       LAT      LONG_    ELEV
0    0  48.53125 -121.59375  1113.0
1    1  48.46875 -121.46875   646.0
2    2  48.46875 -121.53125   321.0
3    3  48.46875 -121.59375   164.0
4    4  48.46875 -121.65625   369.0
CPU times: user 22.3 s, sys: 493 ms, total: 22.7 s
Wall time: 22.8 s


## 3.  Compare Hydrometeorology 

This section performs computations and generates plots of the Livneh 2013 and Salathe 2014 mean temperature and mean total monthly precipitation in order to compare them with each other. The generated plots are automatically downloaded and saved as .png files within the "homedir" directory.

Let's compare the Livneh 2013 and Salathe 2014 using the period of overlapping history.
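The period of overlapping history is simply the later of the two start dates through the earlier of the two end dates. The sketch below illustrates that rule with the standard library; the two date ranges are illustrative placeholders, not values taken from the data products. In the notebook, the real values could be read from the `start_date` and `end_date` entries of `meta_file`.

```python
from datetime import date


def overlapping_period(range_a, range_b):
    """Return (start, end) shared by two (start, end) date ranges, or None."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return (start, end) if start <= end else None


# Placeholder periods (assumptions for illustration only).
product_a = (date(1915, 1, 1), date(2011, 12, 31))
product_b = (date(1950, 1, 1), date(2010, 12, 31))

shared = overlapping_period(product_a, product_b)
print(shared)
```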

In [8]:
help(ogh.getDailyWRF_salathe2014)

Help on function getDailyWRF_salathe2014 in module ogh.ogh:

getDailyWRF_salathe2014(homedir, mappingfile, subdir='salathe2014/WWA_1950_2010/raw', catalog_label='dailywrf_salathe2014')
    Get the Salathe el al., 2014 raw Daily WRF files of interest using the reference mapping file
    
    homedir: (dir) the home directory to be used for establishing subdirectories
    mappingfile: (dir) the file path to the mappingfile, which contains LAT, LONG_, and ELEV coordinates of interest
    subdir: (dir) the subdirectory to be established under homedir
    catalog_label: (str) the preferred name for the series of catalogged filepaths



In [9]:
help(oxl.get_x_dailywrf_Salathe2014)

Help on function get_x_dailywrf_Salathe2014 in module ogh_xarray_landlab:

get_x_dailywrf_Salathe2014(homedir, spatialbounds, subdir='salathe2014/Daily_WRF_1970_1999/noBC', nworkers=4, start_date='1970-01-01', end_date='1989-12-31', rename_timelatlong_names={'LAT': 'LAT', 'LON': 'LON'}, file_prefix='sp_', replace_file=True)
    get Daily WRF data from Salathe et al. (2014) using xarray on netcdf files



## NetCDF retrieval and clipping to a spatial extent

The function get_x_dailywrf_Salathe2014 retrieves and clips NetCDF files archived within the UW Rocinante NNRP repository. This archive contains daily data from January 1970 through December 1999 (30 years). Each NetCDF file contains the meteorologic and VIC hydrologic outputs for one calendar month, so the full archive is expected to hold 360 files (12 months for each of 30 years).
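The expected file count can be checked by enumerating the calendar months in a date range. This is a small standard-library sketch; `months_between` is a hypothetical helper and makes no assumption about how the archive names its files.

```python
from datetime import date


def months_between(start, end):
    """List (year, month) pairs from start's month through end's month, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


full_archive = months_between(date(1970, 1, 1), date(1999, 12, 31))
first_decade = months_between(date(1970, 1, 1), date(1979, 12, 31))
print(len(full_archive), len(first_decade))
```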

In the code chunk below, 40 parallel workers are initialized to distribute the file retrieval and spatial clipping tasks. Each worker uses wget to retrieve a requested file, clips the NetCDF file to the gridded cell centroids within the provided bounding box, and returns the location of the spatially clipped output file. Because the call requests 1970-01-01 through 1979-12-31, 120 monthly files are expected for this run.
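The worker pattern can be sketched without any downloads. The example below uses `concurrent.futures` from the standard library as a stand-in for the notebook's own parallel backend, and plain dictionaries as a stand-in for NetCDF grids; `clip_to_bounds` and `process_all` are hypothetical names, and the bounding box values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def clip_to_bounds(cells, bounds):
    """Keep only the cells whose centroid lies inside the bounding box."""
    return [
        cell for cell in cells
        if bounds["minx"] <= cell["lon"] <= bounds["maxx"]
        and bounds["miny"] <= cell["lat"] <= bounds["maxy"]
    ]


def process_all(monthly_grids, bounds, nworkers=4):
    """Clip every monthly grid in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(partial(clip_to_bounds, bounds=bounds), monthly_grids))


demo_bounds = {"minx": -121.66, "maxx": -121.44, "miny": 48.39, "maxy": 48.56}
demo_grids = [
    [{"lat": 48.53125, "lon": -121.59375}, {"lat": 47.0, "lon": -120.0}],
    [{"lat": 48.46875, "lon": -121.46875}],
]
clipped = process_all(demo_grids, demo_bounds, nworkers=2)
print(clipped)
```

Because retrieval is dominated by network waits, more workers than CPU cores can still help, which may be why the notebook uses 40.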

In [11]:
maptable, nstations = ogh.mappingfileToDF(mappingfile1)
spatialbounds = {'minx':maptable.LONG_.min(), 'maxx':maptable.LONG_.max(),
                 'miny':maptable.LAT.min(), 'maxy':maptable.LAT.max()}

outputfiles = oxl.get_x_dailywrf_Salathe2014(homedir=homedir,
                                             subdir='salathe2014/Daily_WRF_1970_1979/noBC_netcdf',
                                             spatialbounds=spatialbounds,
                                             nworkers=40,
                                             start_date='1970-01-01', end_date='1979-12-31')

Number of gridded data files:99
Minimum elevation: 164.0m
Mean elevation: 1151.040404040404m
Maximum elevation: 2216.0m
[########################################] | 100% Completed |  7min 20.4s


In [12]:
%%time
outfilelist = oxl.netcdf_to_ascii(homedir=homedir, 
                                  subdir='salathe2014/Daily_WRF_1970_1979/noBC_ascii', 
                                  mappingfile=mappingfile1,
                                  source_directory=os.path.join(homedir, 'salathe2014/Daily_WRF_1970_1979/noBC_netcdf'),
                                  meta_file=meta_file,
                                  catalog_label='sp_WRF_NNRP_noBC_1970_1979')

[########################################] | 100% Completed |  0.6s

In [18]:
filedir = os.path.join(homedir, 'salathe2014/Daily_WRF_1970_1979/noBC_netcdf')
files = os.listdir(filedir)

In [19]:
import xarray

test = xarray.open_mfdataset(os.path.join(filedir, files[0]))

In [72]:
# infer the time step from the first two timestamps
tmp = pd.Series(test.TIME)
tdiff = (tmp - tmp.shift(1))[1:2]

# pd.timedelta_range() needs two of start, end, and periods, so read the step directly
tdiff.iloc[0]

In [67]:
tdiff.dtype

dtype('<m8[ns]')

In [13]:
meta_file['sp_WRF_NNRP_noBC_1970_1979']

{'projection': 'Geographic',
 'contributors': 'Eric P. Salathe, A.F. Hamlet, C.F. Mass, S-Y. Lee, G.S. Mauger, M. Stumbaugh, R. Steed, B. Dotson',
 'keywords': 'climate change, climate impacts, dynamical downscaling, hydrologic change, extremes, Western U.S., Western United States, PNW, Pacific Northwest',
 'contributor_role': 'Principal Investigator',
 'contributor': 'Eric P. Salathe Jr.',
 'contact_name': 'Eric P. Salathe Jr.',
 'contributor_email': 'salathe@uw.edu',
 'lat_min': '38.09375',
 'lon_resolution': '0.0625',
 'lat_max': '52.40625',
 'title': 'Dynamically Downscaled Hydroclimate Projections: WRF model',
 'lat_resolution': '0.0625',
 'lon_min': '-124.65625',
 'creator_email': 'bdotson@uw.edu',
 'surfsgnconvention': 'Traditional',
 'contact_email': 'salathe@uw.edu',
 'institution': 'University of Washington, Climate Impacts Group',
 'creator_name': 'Bri Dotson',
 'rights': 'freely available',
 'acknowledgement': ' ',
 'summary': 'Dynamically downscaled NCEP-NCAR Reanalysis (N

In [None]:
t1 = ogh.mappingfileSummary(listofmappingfiles = [mappingfile1], 
                            listofwatershednames = ['Sauk-Suiattle river'],
                            meta_file=meta_file)

t1

### Create a dictionary of climate variables for the long-term mean (ltm).
#### INPUT: gridded meteorology ASCII files located via the Sauk-Suiattle mapping file. The inputs to gridclim_dict() include the folder location and name of the hydrometeorology data, the file start and end, the analysis start and end, and the elevation band to be included in the analysis (max and min elevation). <br/>OUTPUT: dictionary of dataframes where rows are temporal summaries and columns are spatial summaries
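As an illustration of the elevation band input, the sketch below filters grid cells by a minimum and maximum elevation. The five rows are the ones printed from the mapping file earlier in this notebook; `select_elevation_band` is a hypothetical helper and does not reproduce the internals of gridclim_dict().

```python
def select_elevation_band(cells, min_elev, max_elev):
    """Return the cells whose ELEV lies within [min_elev, max_elev]."""
    return [cell for cell in cells if min_elev <= cell["ELEV"] <= max_elev]


# First five rows of the Sauk mapping file, as printed above.
mapping_rows = [
    {"FID": 0, "LAT": 48.53125, "LONG_": -121.59375, "ELEV": 1113.0},
    {"FID": 1, "LAT": 48.46875, "LONG_": -121.46875, "ELEV": 646.0},
    {"FID": 2, "LAT": 48.46875, "LONG_": -121.53125, "ELEV": 321.0},
    {"FID": 3, "LAT": 48.46875, "LONG_": -121.59375, "ELEV": 164.0},
    {"FID": 4, "LAT": 48.46875, "LONG_": -121.65625, "ELEV": 369.0},
]

mid_band = select_elevation_band(mapping_rows, min_elev=300, max_elev=700)
print([cell["FID"] for cell in mid_band])
```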

In [None]:
%%time

ltm = ogh.gridclim_dict(mappingfile=mappingfile1,
                        metadata=meta_file,
                        dataset='sp_WRF_NNRP_noBC_1970_1979')

sorted(ltm.keys())

### Compute the total monthly and yearly precipitation, as well as the mean values across time and across stations
#### INPUT: daily precipitation for each station from the long-term mean dictionary (ltm) <br/>OUTPUT: Append the computed dataframes and values into the ltm dictionary
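The aggregation can be pictured for a single station as two steps: sum daily values into monthly totals, then average those totals by calendar month across years. The sketch below uses plain Python and synthetic values (1 mm/day in 1970, 2 mm/day in 1971) as a stand-in; the notebook itself operates on pandas DataFrames through ogh.aggregate_space_time_sum, and the helper names here are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_totals(daily):
    """Sum daily values into {(year, month): total}."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[(day.year, day.month)] += value
    return dict(totals)


def mean_by_calendar_month(totals):
    """Average the monthly totals across years, giving {month: mean total}."""
    grouped = defaultdict(list)
    for (_year, month), total in totals.items():
        grouped[month].append(total)
    return {month: sum(vals) / len(vals) for month, vals in grouped.items()}


# Synthetic daily precipitation for one station (assumption for illustration).
daily_prec = {}
day = date(1970, 1, 1)
while day <= date(1971, 12, 31):
    daily_prec[day] = 1.0 if day.year == 1970 else 2.0
    day += timedelta(days=1)

month_sums = monthly_totals(daily_prec)
mean_month_sums = mean_by_calendar_month(month_sums)
print(month_sums[(1970, 1)], month_sums[(1971, 1)], mean_month_sums[1])
```

Repeating the same steps per station and then averaging across stations gives the spatial means.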

In [None]:
# extract metadata
dr = meta_file['sp_WRF_NNRP_noBC_1970_1979']

# compute monthly and yearly sums and their means
ltm = ogh.aggregate_space_time_sum(df_dict=ltm,
                                   suffix='PREC_sp_WRF_NNRP_noBC_1970_1979',
                                   start_date=dr['start_date'],
                                   end_date=dr['end_date'])

In [None]:
# print the name of the analytical dataframes and values within ltm
sorted(ltm.keys())

In [None]:
# initialize list of outputs
files=[]

# create the destination path for the dictionary of dataframes
ltm_sauk=os.path.join(homedir, 'ltm_1970_1979_sauk.json')
ogh.saveDictOfDf(dictionaryObject=ltm, outfilepath=ltm_sauk)
files.append(ltm_sauk)

# append the mapping file for Sauk-Suiattle gridded cell centroids
files.append(mappingfile1)

### Visualize the "average monthly total precipitation"

#### INPUT: dataframe with each month as a row and each station as a column. <br/>OUTPUT: A png file that represents the distribution across stations (in water-year order)
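Water-year order starts in October and ends in September. A minimal sketch of that reordering, assuming monthly values keyed by calendar month number (the helper name and the sample values are hypothetical):

```python
def water_year_order(monthly_values, first_month=10):
    """Return (month, value) pairs ordered October through September."""
    order = list(range(first_month, 13)) + list(range(1, first_month))
    return [(month, monthly_values[month]) for month in order]


# Placeholder values: each month simply maps to ten times its number.
calendar_values = {month: month * 10 for month in range(1, 13)}
ordered = water_year_order(calendar_values)
print([month for month, _value in ordered])
```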

In [None]:
# lowest elevation location(s)
lowE_ref = ogh.findCentroidCode(mappingfile=mappingfile1, colvar='ELEV', colvalue=164)

# one highest elevation location
highE_ref = ogh.findCentroidCode(mappingfile=mappingfile1, colvar='ELEV', colvalue=2216)

# combine references together
reference_lines = highE_ref + lowE_ref
reference_lines


In [None]:
# """
# #Higher resolution children gridded cells 
# #get data from Lower resolution parent grid cells to the children
# """
# import landlab as L2

# watershed_dem_sc = os.path.join(homedir, 'DEM_10m.asc')
# (rmg_sc, z_sc) = L2.io.read_esri_ascii(watershed_dem_sc, name='topographic__elevation')
# rmg_sc.set_watershed_boundary_condition(z_sc)

In [None]:
# test0=pd.read_table(watershed_dem_sc, nrows=5, sep='\s+', header=None).set_index(0)[1].to_dict()
# print(test0)

# test1 = pd.read_table(watershed_dem_sc, 
#                       skiprows=6, 
#                       nrows=test0['nrows'],
#                       sep='\s+',
#                       header=None)
# print(test1.tail())

In [None]:
# test1.unstack().as_matrix().shape

In [None]:
# test1.as_matrix().shape

In [None]:
minx2, miny2, maxx2, maxy2 = oxl.calculateUTMbounds(mappingfile=mappingfile1,
                                                    mappingfile_crs={'init':'epsg:4326'},
                                                    spatial_resolution=0.06250)

In [None]:
minx2, miny2, maxx2, maxy2

In [None]:
# generate a raster
raster, t1, t2 = oxl.rasterDimensions(minx=minx2, miny=miny2, maxx=maxx2, maxy=maxy2, dx=100, dy=100)
raster.shape

In [None]:
nodeXmap, raster = oxl.mappingfileToRaster(mappingfile=mappingfile1,
                                           spatial_resolution=0.06250,
                                           approx_distance_m_x=6000)

In [None]:
vector = ogh.rasterVector(vardf=ltm['meanbymonthsum_PREC_sp_WRF_NNRP_noBC_1970_1979'],
                          vardf_dateindex=3,
                          crossmap=nodeXmap,
                          nodata=-9999)
np.array(vector)

In [None]:
%%time
(VegType_low, yrs_low, debug_low) = run_ecohydrology_model(raster, 
                                                           input_data=vector,
                                                           input_file=InputFile,
                                                           synthetic_storms=False,
                                                           number_of_storms=50000,
                                                           pet_method='PriestleyTaylor')

In [None]:
vector = ogh.rasterVector(vardf=ltm['meanbyyearsum_PREC_sp_WRF_NNRP_noBC_1970_1979'],
                          vardf_dateindex=0,
                          crossmap=nodeXmap,
                          nodata=-9999)
vector

## 5. Save the results back into HydroShare
<a name="creation"></a>

Using the `hs_utils` library, the results of the geoprocessing steps above can be saved back into HydroShare. First, define all of the required metadata for resource creation: *title*, *abstract*, *keywords*, and *content files*. In addition, we must define the type of resource that will be created, in this case *genericresource*.

***Note:*** Make sure you save the notebook at this point, so that all notebook changes will be saved into the new HydroShare resource.
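Before creating the resource, it can help to confirm that the metadata is complete and that every content file exists. The sketch below is one possible pre-flight check using the standard library; `check_resource_inputs` is a hypothetical helper and is not part of `hs_utils`, and the demo paths are created in a temporary directory.

```python
import os
import tempfile


def check_resource_inputs(title, abstract, keywords, content_files):
    """Return a list of problems found in the resource inputs (empty if none)."""
    problems = []
    if not title.strip():
        problems.append("title is empty")
    if not abstract.strip():
        problems.append("abstract is empty")
    if not keywords:
        problems.append("no keywords given")
    for path in content_files:
        if not os.path.isfile(path):
            problems.append("missing file: " + path)
    return problems


# Demo with one existing file and one missing file (both hypothetical).
with tempfile.TemporaryDirectory() as demo_dir:
    existing = os.path.join(demo_dir, "ltm_demo.json")
    with open(existing, "w") as fh:
        fh.write("{}")
    missing = os.path.join(demo_dir, "absent.csv")
    demo_problems = check_resource_inputs(
        "Demo title", "Demo abstract", ["demo"], [existing, missing]
    )

print(demo_problems)
```

In the notebook, the same check could be run on `title`, `abstract`, `keywords`, and `files` before the resource is created.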

### Total files and image to migrate

In [None]:
len(files)

In [None]:
# for each file downloaded onto the server folder, move to a new HydroShare Generic Resource
title = 'Computed spatial-temporal summaries of two gridded data product data sets for Sauk-Suiattle'
abstract = 'This resource contains the computed summaries for the Meteorology data from Livneh et al. 2013 and the WRF data from Salathe et al. 2014.'
keywords = ['Sauk-Suiattle', 'Livneh 2013', 'Salathe 2014','climate','hydromet','watershed', 'visualizations and summaries'] 
rtype = 'genericresource'

# create the new resource
resource_id = hs.createHydroShareResource(abstract, 
                                          title,
                                          keywords=keywords, 
                                          resource_type=rtype, 
                                          content_files=files, 
                                          public=False)

In [None]:
df=df_bc45
models=[model for model in df.columns if model not in ['Date','Year','Month','Day']]
time=time1



In [None]:
import pandas as pd

t=pd.DataFrame({'test':['terrific']})

In [None]:
t1=pd.DataFrame({'test':['terrific']})
t2=pd.DataFrame({'test':['terrific']})
t3=pd.DataFrame({'test':['terrific']})
t4=pd.DataFrame({'test':['terrific']})

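# look up each DataFrame by its variable name, since names (not objects) are the keys of locals()
for name in ('t1', 't2', 't3', 't4'):
    if name in locals():
        print(name, list(locals()[name].to_dict().keys()))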

In [None]:
# check which of the variable names are defined in the global namespace
for each in ['t1', 't2', 't3', 't4']:
    if each in globals().keys():
        print(each)

In [None]:
y = eval('t')  # eval() expects a string expression, so pass the variable name
str(y)