### WaMDaM_Use_Case2.1: What differences are there across datasets in flow data values at a site? 

This notebook demonstrates basic WaMDaM use case analysis using scientific Python libraries such as [pandas](https://pandas.pydata.org/) and [plotly](https://plot.ly/). It reads WaMDaM SQLite data from a published HydroShare Generic Resource, runs SQL scripts, and then uses Python plotly to visualize the results.

**What differences are there across datasets in flow data values at the site below Stewart Dam, Idaho?**

This use case identifies five time series and seasonal flow data for the site below Stewart Dam, Idaho.

For more info: http://docs.wamdam.org/UseCases/use_case_2/#use-case-2.1



### Retrieve a resource using its ID

WaMDaM database test file (SQLite) on HydroShare
https://www.hydroshare.org/resource/1601e9f029984a87affcd94af6b4bad0/

The data for our processing routines can be retrieved using the `getResourceFromHydroShare` function by passing it the global identifier from the url above.

In [None]:
import os
from utilities import hydroshare
#from hs_restclient import HydroShare

import sqlite3
import numpy as np
import pandas as pd

import plotly
# plotly.plotly moved to the separate chart_studio package in plotly 4;
# the offline mode used in this notebook does not need it
import plotly.graph_objs as go

from random import randint


In [None]:
!pip install plotly  # python 2.7
!pip3 install plotly # python3
!conda install -c plotly plotly -y



Next we need to establish a secure connection with HydroShare. This is done by instantiating the `hydroshare` class imported above from `utilities`. In addition to connecting with HydroShare, this command also sets environment variables for several parameters that may be useful to you:

- Your username
- The ID of the resource which launched the notebook
- The type of resource that launched this notebook
- The URL of the notebook server

In [None]:
# establish a secure connection to HydroShare
hs = hydroshare.hydroshare()


In [None]:
### Retrieve a resource using its ID

# The data for our processing routines can be retrieved using the `getResourceFromHydroShare` function by passing it the global identifier from the url above
# get some resource content. The resource content is returned as a dictionary
# Abdallah, A. (2018). Bear River Datasets, HydroShare, http://www.hydroshare.org/resource/bec9b20118804d119c4bfc232caea559
content = hs.getResourceFromHydroShare('bec9b20118804d119c4bfc232caea559')



In [None]:
conn = sqlite3.connect(hs.content["BearRiverDatasets_Jan2018.sqlite"])
print('done')  # print() with one argument works on both Python 2 and 3

In [None]:
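# urllib.urlopen exists only in Python 2; fall back to urllib.request on Python 3
try:
    from urllib import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# download the SQL query text from GitHub
txt = urlopen("https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/4Queries_SQL/UseCase2/UseCase2.1/2_Identify_aggregate_TimeSeriesValues.sql").read().decode('utf-8')

# To run the query against the SQLite file instead of reading the published csv:
#df_TimeSeries = pd.read_sql_query(txt, conn)
#df_TimeSeries.to_csv('query_result.csv')
#df_TimeSeries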
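
The commented-out lines above hint at how the downloaded query text would be run against the SQLite connection. As a minimal, self-contained sketch of that pattern, the block below builds an in-memory database with a hypothetical `DailyFlow` table (not the WaMDaM schema), aggregates daily flow in cfs to monthly acre-feet in SQL, and loads the result with `pandas.read_sql_query`. The table name, column names, and values are assumptions for illustration only; the actual query is the `.sql` file fetched above and may work differently.

```python
import sqlite3
import pandas as pd

# 1 cfs sustained for one day = 86400 ft^3; 1 acre-foot = 43560 ft^3
CFS_DAY_TO_ACRE_FEET = 86400.0 / 43560.0

# Hypothetical table and values, for illustration only (not the WaMDaM schema)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE DailyFlow (DateTimeStamp TEXT, FlowCfs REAL)")
demo_conn.executemany(
    "INSERT INTO DailyFlow VALUES (?, ?)",
    [("1994-01-01", 10.0), ("1994-01-02", 20.0), ("1994-02-01", 5.0)],
)

# Sum the daily mean flows per month and convert the total to acre-feet
demo_query = """
SELECT strftime('%Y-%m', DateTimeStamp) AS YearMonth,
       SUM(FlowCfs) * ? AS CumulativeMonthly
FROM DailyFlow
GROUP BY YearMonth
ORDER BY YearMonth
"""

df_demo = pd.read_sql_query(demo_query, demo_conn, params=(CFS_DAY_TO_ACRE_FEET,))
demo_conn.close()
print(df_demo)
```

Passing the conversion factor through `params` keeps the SQL text free of hard-coded constants; the same call shape applies when the query text comes from a file.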

In [None]:
import plotly.offline as offline
import plotly.graph_objs as go

offline.init_notebook_mode()

In [None]:
# Use Case 2.2Identify_aggregate_TimeSeriesValues.csv
# plot aggregated to monthly and converted to acre-feet time series data of multiple sources

# Adel Abdallah
# November 16, 2017

import plotly
# plotly.plotly is not imported: it moved to chart_studio in plotly 4
# and the offline plotting below does not use it
import plotly.graph_objs as go

from random import randint
import pandas as pd

## read the input data from GitHub csv file which is a direct query output for this  query:
# 2.2Identify_aggregate_TimeSeriesValues.csv

df_TimeSeries = pd.read_csv("https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/5Results_CSV/2.2Identify_aggregate_TimeSeriesValues.csv")

#df = pd.read_csv(results)

# identify the data for four time series only based on the DatasetAcronym column header 
column_name = "DatasetAcronym"
subsets = df_TimeSeries.groupby(column_name)
data = []

# for each subset (curve), set up its legend and line info manually so they can be edited
subsets_settings = {
    'UDWRFlowData': {
        'dash': 'solid',
        'legend_index': 0,
        'legend_name': 'Utah Division of Water Res.',
        'width': 3,
        'color': 'rgb(153, 15, 15)'
        },
    'CUHASI': {
        'dash': 'dash',
        'legend_index': 1,
        'legend_name': 'USGS',
        'width': 4,
        'color': 'rgb(15, 107, 153)'
        },
    'IdahoWRA': {
        'dash': 'solid',
        'legend_index': 2,
        'legend_name': 'Idaho Department of Water Res.',
        'width': 3,
        'color': 'rgb(38, 15, 153)'
        },
    'BearRiverCommission': {  # this one is the name of the subset as it appears in the csv file
        'dash': 'dot',        # this is a property of the line (curve)
        'legend_index': 3,    # to order the legend
        'legend_name': 'Bear River Commission',  # this is the manual curve name
        'width': 4,           # numeric, since plotly validates line widths as numbers
        'color': 'rgb(107, 153, 15)'
        }
    }

# This dict is used to map legend_name to original subset name
# (.items() works on both Python 2 and 3; .iteritems() is Python 2 only)
subsets_names = {y['legend_name']: x for x, y in subsets_settings.items()}

# prepare the scatter plot for each curve
for subset in subsets.groups.keys():
    #print subset
    dt = subsets.get_group(name=subset)
    s = go.Scatter(
                    x=dt.CalenderYear.map(lambda z: str(z)[:-3]),
                    y=dt['CumulativeMonthly'],
                    name = subsets_settings[subset]['legend_name'],
                    line = dict(
                        color =subsets_settings[subset]['color'],
                        width =subsets_settings[subset]['width'], 
                        dash=subsets_settings[subset]['dash']
                               ),
                        opacity = 1                                
                  )
    data.append(s)
    
# Legend is ordered based on data, so we are sorting the data based 
# on the desired legend order indicated by the index value entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

# set up the figure layout parameters
layout = dict(
    #title = "UseCase3.2",
    yaxis = dict(
        title = "Cumulative monthly flow <br> (acre-feet/month)",
        tickformat = ',',
        zeroline = True,
        showline = True,
        ticks = 'outside',
        ticklen = 15,
        #zerolinewidth=4,
        zerolinecolor = '#000000',   # six hex digits; '#00000' is not a valid color
        dtick = 30000,
    ),
    xaxis = dict(
        #title = "Time <br> (month/year)",
        #autotick=False,
        tick0 = '1900-01-01',
        dtick = 'M180',
        ticks = 'inside',
        tickwidth = 0.5,
        #zerolinewidth=4,
        ticklen = 27,
        zerolinecolor = '#000000',
        tickcolor = '#000',
        tickformat = "%Y",
        range = ['1920', '2020'],
    ),
    legend = dict(
        x = 0.2, y = 0.9,
        bordercolor = '#000000',
        borderwidth = 2,
    ),
    autosize = False,
    width = 1200,
    height = 800,
    margin = dict(l=300, b=150),   # plain dict; go.Margin is deprecated
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    font = dict(size=35),
)
# create the figure object            
fig = dict(data=data, layout=layout)

# plot the figure 
offline.iplot(fig,filename = 'jupyter/2.2Identify_aggregate_TimeSeriesValues' )       


## The figure can also be produced on a local machine (e.g., in PyCharm) as shown below.
## It would also work here offline, but it opens in a separate window.

#plotly.offline.plot(fig, filename = "2.2Identify_aggregate_TimeSeriesValues.html") 
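
The cell above splits the table into one curve per `DatasetAcronym` and then sorts the curves by `legend_index` so the legend appears in a chosen order. A minimal sketch of just that grouping and ordering logic follows. It uses plain dictionaries in place of `go.Scatter` traces so it runs without plotly, and the rows and the reduced `demo_settings` dictionary are hypothetical values made up for illustration.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used in the cell above
df_demo = pd.DataFrame({
    "DatasetAcronym": ["IdahoWRA", "UDWRFlowData", "IdahoWRA", "UDWRFlowData"],
    "CalenderYear": ["1994-01-01", "1994-01-01", "1994-02-01", "1994-02-01"],
    "CumulativeMonthly": [1200.0, 1150.0, 900.0, 950.0],
})

# Reduced settings dictionary: legend order and display name per subset
demo_settings = {
    "UDWRFlowData": {"legend_index": 0, "legend_name": "Utah Division of Water Res."},
    "IdahoWRA": {"legend_index": 1, "legend_name": "Idaho Department of Water Res."},
}

traces = []
for acronym, group in df_demo.groupby("DatasetAcronym"):
    traces.append({
        "name": demo_settings[acronym]["legend_name"],
        "legend_index": demo_settings[acronym]["legend_index"],
        # drop the last three characters: 'YYYY-MM-DD' -> 'YYYY-MM'
        "x": [str(d)[:-3] for d in group["CalenderYear"]],
        "y": group["CumulativeMonthly"].tolist(),
    })

# groupby yields groups in sorted key order; re-sort by the desired legend order
traces.sort(key=lambda t: t["legend_index"])
print([t["name"] for t in traces])
```

Storing `legend_index` on each trace dictionary avoids the reverse lookup from legend name to subset name that the notebook needs, because a `go.Scatter` object only carries its display name.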





### Zoom in to the 1994-2000 period

In [None]:
# Use Case 2.2bIdentify_aggregate_TimeSeriesValues.py
# plot aggregated to monthly and converted to acre-feet time series data of multiple sources

# Adel Abdallah
# November 16, 2017

import plotly
# plotly.plotly is not imported: it moved to chart_studio in plotly 4
# and the offline plotting below does not use it
import plotly.graph_objs as go

from random import randint
import pandas as pd

## read the input data from GitHub csv file which is a direct query output for this  query:
# 3.2Identify_aggregate_TimeSeriesValues.sql


# identify the data for four time series only based on the DatasetAcronym column header 
column_name = "DatasetAcronym"
subsets = df_TimeSeries.groupby(column_name)
data = []

# for each subset (curve), set up its legend, marker, and line info manually so they can be edited

subsets_settings = {
    'UDWRFlowData': {
        'symbol': "star",
        'legend_index': 0,
        'legend_name': 'Utah Division of Water Res.',
        'width': 2,
        'size': 7,
        'color': 'rgb(153, 15, 15)',
        'mode': 'lines+markers'
        },
    'CUHASI': {
        'symbol': "square",
        'legend_index': 1,
        'size': 10,
        'legend_name': 'CUAHSI',
        'width': 3,
        'color': 'rgb(15, 107, 153)',
        'show_legend': False,
        },
    'IdahoWRA': {
        'symbol': "triangle-down",
        'legend_index': 2,
        'size': 6,
        'legend_name': 'Idaho Department of Water Res.',
        'width': 3,
        'color': 'rgb(38, 15, 153)'
        },
    'BearRiverCommission': {  # this one is the name of the subset as it appears in the csv file
        'symbol': "106",      # this is the marker symbol of the curve
        'size': 6,
        'legend_index': 3,    # to order the legend
        'legend_name': "Bear River Commission",  # this is the manual curve name
        'width': 4,           # numeric, since plotly validates widths and sizes as numbers
        'color': 'rgb(107, 153, 15)'
        }
    }

# This dict is used to map legend_name to original subset name
# (.items() works on both Python 2 and 3; .iteritems() is Python 2 only)
subsets_names = {y['legend_name']: x for x, y in subsets_settings.items()}

# prepare the scatter plot for each curve
for subset in subsets.groups.keys():
    print(subset)
    dt = subsets.get_group(name=subset)
    s = go.Scatter(
        x=dt.CalenderYear.map(lambda z: str(z)[:-3]),
        y=dt['CumulativeMonthly'],
        name = subsets_settings[subset]['legend_name'],       
        opacity = 1,
        
        # Get mode from the settings dictionary; if no mode is
        # defined in the dictionary, the default is markers.
        mode = subsets_settings[subset].get('mode', 'markers'),
        
        # Get the legend flag from the settings dictionary; if it is not
        # defined in the dictionary, the default is to show the item in the legend.
        showlegend = subsets_settings[subset].get('show_legend', True),
        
        marker = dict(
            size =subsets_settings[subset]['size'],
            color = '#FFFFFF',      # white
            symbol =subsets_settings[subset]['symbol'],
            line = dict(
                color =subsets_settings[subset]['color'],
                width =subsets_settings[subset]['width'], 
                ),
            ),
            
        line = dict(
            color =subsets_settings[subset]['color'],
            width =subsets_settings[subset]['width'], 
            ),
        )
    
    data.append(s)
    
# Legend is ordered based on data, so we are sorting the data based 
# on desired legend order indicated by the index value entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

# set up the figure layout parameters
layout = dict(
    #title = "UseCase3.2",
    yaxis = dict(
        title = "Cumulative monthly flow <br> (acre-feet/month)",
        tickformat = ',',
        zeroline = True,
        showline = True,
        ticks = 'outside',
        ticklen = 15,
        #zerolinewidth=4,
        zerolinecolor = '#000000',   # six hex digits; '#00000' is not a valid color
        range = [0, 6000],
        dtick = 1000,
    ),
    xaxis = dict(
        #title = "Time <br> (month/year)",
        #autotick=False,
        tick0 = '1994-01-01',
        showline = True,
        dtick = 'M12',
        ticks = 'outside',
        tickwidth = 0.5,
        #zerolinewidth=4,
        ticklen = 27,
        #zerolinecolor='#000000',
        tickcolor = '#000',
        tickformat = "%Y",
        range = ['1994', '2000'],
    ),
    legend = dict(
        x = 0.3, y = 1,
        bordercolor = '#000000',
        borderwidth = 2,
    ),
    autosize = False,
    width = 1200,
    height = 800,
    margin = dict(l=300, b=150),   # plain dict; go.Margin is deprecated
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    font = dict(size=35),
)

             
# create the figure object            
fig = dict(data=data, layout=layout)

# plot the figure 
#py.iplot(fig, filename = "2.2bIdentify_aggregate_TimeSeriesValues")       

# render the figure inline in the notebook using plotly's offline mode
offline.iplot(fig, filename='jupyter/2.2bIdentify_aggregate_TimeSeriesValues')


## Seasonal flow data 

In [None]:
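# urllib.urlopen exists only in Python 2; fall back to urllib.request on Python 3
try:
    from urllib import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# download the SQL query text from GitHub
txt = urlopen("https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/4Queries_SQL/UseCase2/UseCase2.1/3_Identify_SeasonalValues.sql").read().decode('utf-8')

# To run the query against the SQLite file instead of reading the published csv:
#df_Seasonal = pd.read_sql_query(txt, conn)
#df_Seasonal.to_csv('query_result.csv')
#df_Seasonal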

In [None]:
# Use Case 2.3Identify_SeasonalValues

# plot Seasonal data for multiple scenarios

# Adel Abdallah
# November 16, 2017


import plotly
# plotly.plotly is not imported: it moved to chart_studio in plotly 4
# and the offline plotting below does not use it
import plotly.graph_objs as go
from random import randint
import pandas as pd

## read the input data from GitHub csv file which is a direct query output
# 3.3Identify_SeasonalValues.csv 

df_Seasonal = pd.read_csv("https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/5Results_CSV/2.3Identify_SeasonalValues.csv")

#get the many curves by looking under "ScenarioName" column header. 
#Then plot Season name vs season value
column_name = "ScenarioName"
subsets = df_Seasonal.groupby(column_name)

data = []


# for each subset (curve), set up its legend and line info manually so they can be edited
subsets_settings = {
    'Bear Wet Year Model': {
        'dash': 'solid',
        'mode': 'lines+markers',
        'width': 4,
        'legend_index': 0,
        'legend_name': 'Wet Year Model',
        'color': 'rgb(41, 10, 216)'
        },
    'Bear Normal Year Model': {  # this one is the name of the subset as it appears in the csv file
        'dash': 'solid',         # this is a property of the line (curve)
        'width': 4,              # numeric, since plotly validates line widths as numbers
        'mode': 'lines+markers',
        'legend_index': 1,       # to order the legend
        'legend_name': 'Normal Year Model',  # this is the manual curve name
        'color': 'rgb(38, 77, 255)'
        },
    'Bear Dry Year Model': {
        'dash': 'solid',
        'mode': 'lines+markers',
        'width': 4,
        'legend_index': 2,
        'legend_name': 'Dry Year Model',
        'color': 'rgb(63, 160, 255)'
        },
    }


# This dict is used to map legend_name to original subset name
# (.items() works on both Python 2 and 3; .iteritems() is Python 2 only)
subsets_names = {y['legend_name']: x for x, y in subsets_settings.items()}


for subset in subsets.groups.keys():
    print(subset)
    dt = subsets.get_group(name=subset)
    s = go.Scatter(
                    # use the season names of this scenario only, so that
                    # x and y come from the same rows
                    x=dt['SeasonName'],
                    y=dt['SeasonNumericValue'],
                    name = subsets_settings[subset]['legend_name'],
                    mode = subsets_settings[subset]['mode'],
                    line = dict(
                        color =subsets_settings[subset]['color'],
                        width =subsets_settings[subset]['width'],
                        dash=subsets_settings[subset]['dash']
                                ),
                    marker=dict(size=10),            
                    opacity = 0.8
                   )
    data.append(s)
    
    
# Legend is ordered based on data, so we are sorting the data based 
# on the desired legend order indicated by the index value entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

    

layout = dict(
    #title = "Use Case 3.3",
    yaxis = dict(
        title = "Cumulative flow <br> (acre-feet/month)",
        tickformat = ',',
        showline = True,
        dtick = 5000,
        ticks = 'outside',
        ticklen = 10,
    ),
    xaxis = dict(
        #title = "Month",
        ticks = 'inside',
        ticklen = 25,
    ),
    legend = dict(
        x = 0.6, y = 0.5,
        bordercolor = '#000000',   # six hex digits; '#00000' is not a valid color
        borderwidth = 2,
    ),
    width = 1200,
    height = 800,
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    margin = dict(l=260, b=100),   # plain dict; go.Margin is deprecated
    font = dict(size=35),
)
# create a figure object
fig = dict(data=data, layout=layout)
#py.iplot(fig, filename = "2.3Identify_SeasonalValues") 

# render the figure inline in the notebook using plotly's offline mode
offline.iplot(fig, filename='jupyter/3Identify_SeasonalValues')
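
Besides plotting, the seasonal values can be compared numerically by pivoting the long table so that each scenario becomes a column. The sketch below is one way to do that with `DataFrame.pivot`. It reuses the column names from the cell above (`ScenarioName`, `SeasonName`, `SeasonNumericValue`), but the rows are hypothetical values made up for illustration, and the `WetMinusDry` column is an assumed name, not part of the published results.

```python
import pandas as pd

# Hypothetical long-format rows with the same column names as df_Seasonal
df_demo = pd.DataFrame({
    "ScenarioName": ["Bear Wet Year Model", "Bear Wet Year Model",
                     "Bear Dry Year Model", "Bear Dry Year Model"],
    "SeasonName": ["May", "Jun", "May", "Jun"],
    "SeasonNumericValue": [20000.0, 17000.0, 900.0, 700.0],
})

# One row per season, one column per scenario
wide = df_demo.pivot(index="SeasonName", columns="ScenarioName",
                     values="SeasonNumericValue")

# pivot sorts the index alphabetically; restore calendar order explicitly
wide = wide.reindex(["May", "Jun"])

# Difference between the wet and dry scenarios for each season
wide["WetMinusDry"] = wide["Bear Wet Year Model"] - wide["Bear Dry Year Model"]
print(wide)
```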



## CDF plot

In [None]:
# Use Case 2.4_plotcdf 

# plot Cumulative flow for June for the UDWR dataset. 
# Then get the percentage of time it exceeds dry and wet years 

# Adel Abdallah
# Dec 2, 2017


import plotly
# plotly.plotly is not imported: it moved to chart_studio in plotly 4
# and the offline plotting below does not use it
import plotly.graph_objs as go
import numpy as np
import scipy
import pandas as pd

## read the input data from GitHub csv file which is a direct query output for this  query:
# 3.2Identify_aggregate_TimeSeriesValues.sql

# Convert CalenderYear column data type to datetime
df_TimeSeries['CalenderYear'] = pd.to_datetime(df_TimeSeries['CalenderYear'], errors='coerce')

# Slice rows based on DatasetAcronym column
subsets = df_TimeSeries.groupby('DatasetAcronym')

# Select rows where DatasetAcronym is UDWRFlowData
dt = subsets.get_group(name='UDWRFlowData')

# From the selected rows, select rows where month is June
specific_month = dt.CalenderYear.dt.month == 6

# CumulativeMonthly data of the desired DatasetAcronym name and month
cumulative_monthly = dt[specific_month].CumulativeMonthly.values.tolist()

# Sort cumulative_monthly in ascending order
cumulative_monthly.sort()

# Save the filtered data to csv, CumulativeMonthly and CalenderYear columns
filtered_data = dt[specific_month][['CumulativeMonthly', 'CalenderYear']]
filtered_data.to_csv('Filtered Data.csv', index=False)


# Create the y-axis values (empirical cumulative probabilities). There is one
# value per sorted observation on the x-axis, equally spaced and ending at 1:
# the i-th smallest value gets probability i/n, where n = len(cumulative_monthly).
# np.arange stops before its upper bound, so the bound is n + 1 to make the
# last index equal to n, which gives a final probability of n/n = 1.
probability = np.arange(1.0, len(cumulative_monthly)+1) / len(cumulative_monthly)  # 1.0 to make it float

data = []
# plot the sorted values against the fraction of observations
# that are less than or equal to each value

cdf = go.Scatter(
    x = cumulative_monthly,
    y = probability,
    showlegend = True,
    name = 'UDWR from 1923 to 2014',
    marker = dict(
        color = 'rgb(0, 0, 0)'
        )
    )

cdfdata=pd.DataFrame(data=dict(probability=probability,cumulative_monthly=cumulative_monthly))

data.append(cdf)


# Save the filtered data to csv, CumulativeMonthly and probability columns
filtered_data = cdfdata
filtered_data.to_csv('CDF_data.csv', index=False)


# cdfdata

lowerthanDry=cdfdata.loc[cdfdata['cumulative_monthly'] <= 666, 'probability']
# print lowerthanDry

UpperthanNormal=cdfdata.loc[cdfdata['cumulative_monthly'] >= 2506, 'probability']
# print UpperthanNormal

UpperthanWet=cdfdata.loc[cdfdata['cumulative_monthly'] >= 17181, 'probability']
# print UpperthanWet



# For each BRSDM model scenario, draw a vertical line from the x-axis up to the
# CDF and a horizontal line from the y-axis across to the same point.
def scenario_lines(volume, prob, name, dash, color):
    line = dict(shape='vh', width=4, dash=dash, color=color)
    vertical = go.Scatter(
        x=[volume, volume],
        y=[0, prob],
        mode='lines',
        name=name,
        hoverinfo='name',    # 'dry' and 'wet' are not valid hoverinfo flags
        showlegend=True,
        line=line
        )
    horizontal = go.Scatter(
        x=[0, volume],
        y=[prob, prob],
        mode='lines',
        name=name,
        hoverinfo='name',
        showlegend=False,    # one legend entry per scenario
        line=line
        )
    return [vertical, horizontal]

# dry year
data.extend(scenario_lines(666, 0.48, 'Dry year scenario <br> (BRSDM model)', 'dot', '#3FA0FF'))

# normal year
data.extend(scenario_lines(2506, 0.844, 'Normal year scenario <br> (BRSDM model)', 'dashdot', '#264DFF'))

# wet year
data.extend(scenario_lines(17181, 0.93, 'Wet year scenario <br> (BRSDM model)', 'dash', '#290AD8'))





layout = go.Layout(
    xaxis = dict(
        title = "Cumulative flow for June <br> (acre-feet/month)",
        zeroline = True,
        #showline=True,
        tickformat = ',',
        dtick = 10000,
        ticks = 'inside',
        ticklen = 25,
        range = [0, 40000],
    ),
    yaxis = dict(
        title = 'Cumulative probability',
        dtick = 0.1,
        ticks = 'outside',
        ticklen = 25,
        #range = [0, 1],
        showline = True,
    ),
    font = dict(size=35, family='arial'),
    width = 1100,
    height = 800,
    margin = dict(l=230, b=150),   # plain dict; go.Margin is deprecated
    legend = dict(
        x = 0.5, y = 0.5,
        bordercolor = '#000000',   # six hex digits; '#00000' is not a valid color
        borderwidth = 2,
        font = dict(family='arial', size=35),
    ),
)

fig = dict(data=data, layout=layout)

offline.iplot(fig,filename = 'jupyter/2.4_plotcdf' )       
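
The CDF cell ranks the sorted June volumes and assigns the i-th smallest value the probability i/n. As a small worked sketch of that calculation, the block below builds the same kind of empirical CDF with NumPy and looks up the share of observations at or below a threshold. The volumes and the threshold are hypothetical numbers chosen for illustration, and `non_exceedance` is a helper name introduced here, not part of the notebook.

```python
import numpy as np

# Hypothetical June volumes in acre-feet (illustration only)
june_volumes = np.array([500.0, 3000.0, 800.0, 20000.0, 1500.0])

sorted_volumes = np.sort(june_volumes)
n = len(sorted_volumes)

# Empirical CDF: the i-th smallest value gets probability i/n
demo_probability = np.arange(1, n + 1) / float(n)


def non_exceedance(threshold):
    """Fraction of observations less than or equal to threshold."""
    count = np.searchsorted(sorted_volumes, threshold, side="right")
    return count / float(n)


dry_threshold = 1000.0  # hypothetical scenario value
print(non_exceedance(dry_threshold))        # share of Junes at or below the threshold
print(1.0 - non_exceedance(dry_threshold))  # share of Junes above the threshold
```

Reading the probability at a scenario value this way gives the same kind of number as the hard-coded line heights in the plot (for example 0.48 for the dry year scenario), without having to read it off the chart.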


<a id='section4'></a>
### 4. Creating a new HydroShare resource

The best way to save your data is to put it back into HydroShare, which is done using the `createHydroShareResource` function. The first step is to identify the files you want to save to HydroShare. The cell below defines the required metadata and the list of files to upload.

In [None]:
# define HydroShare required metadata
title = 'WaMDaM_Use_Case2.1'
abstract = 'This is a test for running a WaMDaM use case using Jupyter in HydroShare'
keywords = ['Time series', 'Bear River']

# set the resource type that will be created.
rtype = 'genericresource'

# create a list of files that will be added to the HydroShare resource.
    
files = [hs.content['BearRiverDatasets_Jan2018.sqlite'],'WaMDaM_Use_Case2.1.ipynb']  # this notebook

        

In [None]:
# create a hydroshare resource containing these data
resource_id = hs.createHydroShareResource(abstract, 
                                          title, 
                                          keywords=keywords, 
                                          resource_type=rtype, 
                                          content_files=files, 
                                          public=False)

### 5. Additional info and citation

For additional information on WaMDaM, please refer to:

http://docs.wamdam.org/