GitHub renders Jupyter notebooks, but the Bokeh plots won't display there.<br>
View this notebook with nbviewer instead:<br>
https://nbviewer.jupyter.org/github/DouglasPatton/Hydro/blob/master/main.ipynb

# Welcome to my hydrology modeling tool, Hydro
<br>
## This model is a continuation of my work:
<br>
Patton, Douglas A, Rebecca Moore, Alan P Covich, and John C Bergstrom. 2013. “Ex-Post Reliability
Assessment of Benefit Transfer Valuation Estimates of Wetland Ecosystem Service Supported by
Okefenokee National Wildlife Refuge.” SSRN Working Paper. https://ssrn.com/abstract=2294080

In [1]:
import numpy as np
import USGShydro #view via https://nbviewer.jupyter.org/github/DouglasPatton/Hydro/blob/master/helpers.ipynb

to do:
### tools for handling data
USGShydro.py<br>
<https://nbviewer.jupyter.org/github/DouglasPatton/Hydro/blob/master/helpers.ipynb><br>
- ~~download rainfall~~ 
- ~~download runoff~~
- ~~research time conversions~~
- ~~match data values across time values~~
- ~~check for gaps in time~~
- ~~convert data to numpy array~~
- ~~create simple plot to view downloaded gage and precip data~~
- verify parameter download
- handle errors
- add the capability to merge time series from different XML files
    - check data from overlapping time periods
    - accommodate series with unequal frequencies
        - distinguish between missing values and lower-frequency series
- Use bokeh to interactively plot each site, its basin, and NLCD data.
    - overlay rainfall runoff?
    - multi site tool to connect rainfall runoff plot to each basin?
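
The gap and spacing checks listed above can be sketched with NumPy. This is only an illustration, not the code in USGShydro.py: the function name `check_time_steps`, the tolerance, and the 15-minute example series are assumptions made for the demo.

```python
import numpy as np

def check_time_steps(times, rtol=1e-6):
    """Return (evenly_spaced, gap_indices) for a 1-D array of increasing times."""
    times = np.asarray(times, dtype=float)
    steps = np.diff(times)
    base = np.median(steps)  # typical step, robust to a few gaps
    # a gap is any step noticeably longer than the typical step
    gap_idx = np.flatnonzero(steps > base * (1 + rtol))
    evenly_spaced = gap_idx.size == 0 and np.allclose(steps, base, rtol=rtol)
    return evenly_spaced, gap_idx

# hypothetical 15-minute series (time in days), with and without a missing reading
t = np.arange(10) * 15 / (24 * 60)
t_gap = np.delete(t, 4)
even, gaps = check_time_steps(t)
even_gap, gaps_gap = check_time_steps(t_gap)
```

Here `gaps_gap` holds the index of the last reading before the gap, which is where a missing-value marker or a merge from another file would go.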

-----------------------
### tools for modeling runoff
rainfallrunoff.py

- ~~create time series dataset of lags, etc.~~
- ~~drop observations with missing values~~
- create and run
    - ~~simple distributed lag model~~
    - Locally weighted distributed lag model
      - point estimate
      - weighted average
      - use boosting to estimate local models for observations with high error
- plot predicted runoff vs. actual
- predict days above flood stage with models trained over different time periods

-----------------------
### general time series tools
tstools.py
- ~~create tool to create lagged variables~~
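
A lagged-variable tool of this kind can be sketched in a few lines. The function name `make_lags` and its arguments are hypothetical and may not match what tstools.py actually provides.

```python
import numpy as np

def make_lags(x, maxlag, startlag=0):
    """Column j holds x lagged by (startlag + j) steps.

    Rows that lack a full lag history (the first maxlag rows) are dropped.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if maxlag >= n:
        raise ValueError("maxlag must be smaller than the series length")
    cols = [x[maxlag - k : n - k] for k in range(startlag, maxlag + 1)]
    return np.column_stack(cols)

x = np.arange(6.0)             # 0, 1, 2, 3, 4, 5
lags = make_lags(x, maxlag=2)  # columns: x[t], x[t-1], x[t-2]
```

With `maxlag=2`, the first usable row corresponds to `t = 2`, so `lags[0]` is `[2, 1, 0]`.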

-----------------------
### Multi-site tools
- Missing data
    - combine with nearby site data and use a latent-variable, matrix-factorization-based approach to fill in missing values
- create tool to query sites for availability of needed series
- compare sites and see if the model relates to hydrologic features, basin topography, landcover, etc.
------------------------


#### set up the request

In [2]:
site='02314500' #Fargo, Ga below the Okefenokee NWR
#start='2011-01-01T00:00-0400' #superseded by the earlier start date on the next line
start='2010-12-01T00:00-0400'
end='2011-03-15T00:00-0400'
paramlist=['00045', '00065'] #must be entered as strings
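
The start and end strings are ISO 8601 timestamps with a UTC offset of -0400. As a rough illustration of the time conversion involved (how USGShydro handles these strings internally is not shown here), the standard library can parse them and give the length of the requested window in days:

```python
from datetime import datetime

# date, time, and a UTC offset such as -0400
FMT = "%Y-%m-%dT%H:%M%z"

start_dt = datetime.strptime("2010-12-01T00:00-0400", FMT)
end_dt = datetime.strptime("2011-03-15T00:00-0400", FMT)

# length of the requested window in days
span_days = (end_dt - start_dt).total_seconds() / 86400
```

Because both timestamps carry the same offset, the difference is a whole number of days.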


#### set up a single, global (for this site and data) distributed lag model with lags 0 to 90 of precipitation, a single lag of runoff, and a constant term

In [3]:
modelfeatures={'RRmodeltype':'distributed_lag', 'maxlag':90, 'startlag':0, 'incl_AR1':'yes','incl_constant':'yes','local':'no', 'local_count':0}#dictionary of model features
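
For readers unfamiliar with distributed lag models, the specification above regresses gage height at time t on precipitation at t-startlag through t-maxlag, optionally on gage height at t-1 (the AR1 term), and on a constant. The sketch below fits that form by ordinary least squares on synthetic data. It is not the code in rainfallrunoff.py: the function name `fit_distributed_lag` is hypothetical, and boolean arguments stand in for the 'yes'/'no' strings of the dictionary.

```python
import numpy as np

def fit_distributed_lag(precip, gage, maxlag, startlag=0,
                        incl_AR1=True, incl_constant=True):
    """OLS fit of gage[t] on precip[t-startlag..t-maxlag],
    optionally gage[t-1] and a constant."""
    precip = np.asarray(precip, dtype=float)
    gage = np.asarray(gage, dtype=float)
    n = gage.shape[0]
    first = max(maxlag, 1)  # first row with a full lag history
    cols = [precip[first - k : n - k] for k in range(startlag, maxlag + 1)]
    if incl_AR1:
        cols.append(gage[first - 1 : n - 1])  # one lag of the dependent variable
    if incl_constant:
        cols.append(np.ones(n - first))
    X = np.column_stack(cols)
    y = gage[first:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef, y

# synthetic check: stage responds to rain now and one step earlier, plus a constant
rng = np.random.default_rng(0)
rain = rng.random(200)
stage = np.empty(200)
stage[0] = 1.0
stage[1:] = 1.0 + 0.5 * rain[1:] + 0.25 * rain[:-1]
coef, fitted, actual = fit_distributed_lag(rain, stage, maxlag=1, incl_AR1=False)
```

Since the synthetic series has no noise, `coef` recovers the lag-0 weight, the lag-1 weight, and the constant used to build it.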

#### Create the object: download and save the data if not already saved, clean the data, convert it to a NumPy array, and run the model with the selected features.

In [4]:
try1=USGShydro.Hydrositedatamodel(site,start,end,paramlist,modelfeatures) 

all series have matching times from start to end
all time steps are evenly spaced
The request has returned 9981 observations for 2 series


### The plot below is created with the Python package Bokeh. You can manipulate the plot using the tools on the right side.

In [5]:
try1.simpleplot() #plot a time series of rainfall and gage height (above minimum for series)

#### print a NumPy array of roughly m observations spaced evenly from start to end time (time,precip,gageht)

In [6]:
m=10
try1.data_array[0:try1.data_array.shape[0]:int(try1.data_array.shape[0]/m),:]

array([[0.00000000e+00, 2.00000000e-02, 5.30000000e-01],
       [1.03958333e+01, 0.00000000e+00, 4.40000000e-01],
       [2.07916667e+01, 0.00000000e+00, 6.20000000e-01],
       [3.11875000e+01, 0.00000000e+00, 7.10000000e-01],
       [4.15833333e+01, 0.00000000e+00, 9.50000000e-01],
       [5.19791667e+01, 0.00000000e+00, 9.80000000e-01],
       [6.23750000e+01, 0.00000000e+00, 1.10000000e+00],
       [7.27708333e+01, 0.00000000e+00, 1.67000000e+00],
       [8.31666667e+01, 0.00000000e+00, 1.33000000e+00],
       [9.35625000e+01, 0.00000000e+00, 1.12000000e+00],
       [1.03958333e+02, 0.00000000e+00, 1.22000000e+00]])
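
The output has 11 rows rather than 10 because the slice step is `int(9981/10) = 998`, and stepping by 998 through 9981 rows reaches index 9980 on the eleventh row. If exactly m rows are wanted, one option is to pick the indices with `np.linspace`. This is a sketch: `sample_rows` is a hypothetical helper, and `demo` stands in for `try1.data_array`.

```python
import numpy as np

def sample_rows(a, m):
    """Return exactly m rows of a, evenly spaced from the first row to the last."""
    idx = np.linspace(0, a.shape[0] - 1, num=m).round().astype(int)
    return a[idx]

n = 9981  # number of observations reported by the request above
demo = np.arange(n * 3).reshape(n, 3)  # stand-in array with the same shape

strided = demo[0:n:int(n / 10), :]  # the slicing used in the cell above
exact = sample_rows(demo, 10)       # always 10 rows, first and last included
```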

### run the first model as specified by the model features dictionary above

In [7]:
try1.runTSmodel1()

#### plot the predicted runoff values. The AR1 term makes for a very close fit. 

In [8]:
try1.predictplot()

#### try again with new model features

In [9]:
try1.runTSmodel1({'RRmodeltype':'distributed_lag', 'maxlag':200, 'startlag':1, 'incl_AR1':'no','incl_constant':'yes','local':'no', 'local_count':0})

In [10]:
try1.predictplot()

#### Create and check pandas dataset for each observation to prepare for geopandas

In [11]:
try1.geoplot() #create pandas dataset for each series to prepare for gis plotting
try1.df.head()

2


NameError: name 'l' is not defined

In [None]:
len(try1.latlon)