## WaMDaM_Use_Case 3.1_Seasonal: What seasonal flow values to use at a site (e.g., below Stewart Dam)?

#### By Adel M. Abdallah, Utah State University, August 2018


This notebook demonstrates basic WaMDaM use case analysis using scientific Python libraries such as [pandas](https://pandas.pydata.org/) and [plotly](https://plot.ly/). It reads WaMDaM SQLite data from a published HydroShare Generic Resource, runs SQL scripts, and then uses Python plotly to visualize the results.

This use case identifies five time series and seasonal flow data for the site below Stewart Dam, Idaho.


Execute the following cells by pressing `Shift-Enter`, or by pressing the play button 
<img style='display:inline;padding-bottom:15px' src='play-button.png'>
on the toolbar above.



### Steps to reproduce this use case's results and plots

1. [Import python libraries](#Import)

2. [Connect to the WaMDaM populated SQLite file](#Connect)

3. [Query WaMDaM database for flow seasonal data](#QueryFlowSeasonal)

4. [Plot the seasonal figure](#Seasonal_13a)

5. [Query WaMDaM database for time series to create the cumulative distribution function (CDF) plot](#QueryTimeSeries)

6. [Plot the CDF figure](#PlotCDF)

7. [Close the SQLite connection](#Close)



<a name="Import"></a>
# 1. Import python libraries 


In [None]:
# 1. Import python libraries
# Set the notebook mode to embed the figures within the cell

import plotly
import plotly.offline as offline
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)         # initiate notebook for offline plot

import os
import sqlite3
import urllib
import pandas as pd
import numpy as np
from IPython.display import display

print('The needed Python libraries have been imported')

<a name="Connect"></a>
# 2. Connect to the WaMDaM populated SQLite file 
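`sqlite3.connect` does not raise an error when the file name is wrong; it opens a new, empty database instead, and the mistake only shows up later as a "no such table" error. The following is a minimal sketch of a guarded connection, written for Python 3. The helper names `connect_existing` and `list_tables` are hypothetical and are not part of WaMDaM.

```python
import os
import sqlite3


def connect_existing(db_path):
    """Open an existing SQLite file; raise instead of opening an empty database."""
    if not os.path.isfile(db_path):
        raise IOError('SQLite file not found: ' + db_path)
    return sqlite3.connect(db_path)


def list_tables(conn):
    """Return the names of all tables in the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `list_tables(connect_existing('BearRiverDatasets_June_2018_Final.sqlite'))` should return a non-empty list when the populated file sits in the notebook's working directory.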

In [None]:
# 2. Connect to the WaMDaM populated SQLite file 

# Then we can run queries against it within this notebook :)  

# the SQLite file is published here 
#https://github.com/WamdamProject/WaMDaM_UseCases/blob/master/UseCases_files/3SQLite_database/BearRiverDatasets_June_2018.sqlite

conn = sqlite3.connect('BearRiverDatasets_June_2018_Final.sqlite')

print('Connected to the WaMDaM SQLite file called: BearRiverDatasets_June_2018_Final')

<a name="QueryFlowSeasonal"></a>
# 3. Query WaMDaM database for flow seasonal data
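This step downloads the text of a SQL script and runs it against the open connection. The pattern can be sketched with the standard library alone, as below. This is an illustration for Python 3, not the notebook's own code: `fetch_sql` and `run_query` are hypothetical helpers, and the sketch assumes the script holds a single `SELECT` statement encoded as UTF-8.

```python
import sqlite3
from urllib.request import urlopen  # Python 3; Python 2 uses urllib.urlopen


def fetch_sql(url):
    """Download a SQL script and return it as text."""
    # Strip whitespace so triple-quoted URLs with stray newlines still work.
    response = urlopen(url.strip())
    try:
        return response.read().decode('utf-8')
    finally:
        response.close()


def run_query(conn, sql_text):
    """Run one SELECT statement and return (column_names, rows)."""
    cursor = conn.execute(sql_text)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

With pandas available, `pd.read_sql_query(sql_text, conn)` returns the same rows as a data frame, which is what the cell in this step does.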

In [None]:
# Use Case 3.1: 3_Identify_SeasonalValues.sql
# Query the seasonal flow values of the model scenarios

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib import urlopen           # Python 2

Query_UseCase3_1_seasonal_URL = "https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/4Queries_SQL/UseCase3/UseCase3.1/3_Identify_SeasonalValues.sql"

# Read the query text inside the URL
Query_UseCase3_1_Seasonal_text = urlopen(Query_UseCase3_1_seasonal_URL).read().decode('utf-8')

# Return the query result in a pandas data frame
result_df_UseCase3_1_Seasonal = pd.read_sql_query(Query_UseCase3_1_Seasonal_text, conn)

# Comment out the line below to hide the query result
display(result_df_UseCase3_1_Seasonal)

# Save the data frame as a csv file into the Jupyter notebook working space
# (the UseCases_Results_csv folder must already exist)
result_df_UseCase3_1_Seasonal.to_csv(os.path.join('UseCases_Results_csv', 'UseCase3_1_Seasonal.csv'), index=False)

print("Queries are done")

<a name="Seasonal_13a"></a>
# 4. Plot the seasonal figure 



#### Reproduce this plot [Figure 13-A] in the WaMDaM paper 


<img src="https://github.com/WamdamProject/WaMDaM_UseCases/raw/master/UseCases_files/8Figures_jpg/UseCase3.1_seasonal_a.png" width="800">
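The seasonal figure draws one curve per scenario, so each curve needs its own season labels (x) and its own values (y). The grouping can be sketched in plain Python as follows. The helper `series_by_scenario` is hypothetical, and the numbers are illustrative only; they are not WaMDaM results.

```python
from collections import OrderedDict


def series_by_scenario(rows):
    """Group (scenario, season, value) rows into one x/y series per scenario."""
    series = OrderedDict()
    for scenario, season, value in rows:
        curve = series.setdefault(scenario, {'x': [], 'y': []})
        curve['x'].append(season)
        curve['y'].append(value)
    return series


# Illustrative values only; they are not WaMDaM results.
rows = [
    ('Bear Wet Year Model', 'Jan', 300.0),
    ('Bear Wet Year Model', 'Feb', 320.0),
    ('Bear Dry Year Model', 'Jan', 100.0),
    ('Bear Dry Year Model', 'Feb', 90.0),
]
curves = series_by_scenario(rows)
```

Each entry of `curves` maps directly onto one `go.Scatter(x=curve['x'], y=curve['y'], name=...)` trace, which keeps every scenario's x and y lists the same length.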


In [None]:
# Use Case 3.1: 3_Identify_SeasonalValues

# plot seasonal data for multiple scenarios

import plotly.offline as offline
import plotly.graph_objs as go
import pandas as pd

# Use the data frame returned by the seasonal query in step 3
df_Seasonal = result_df_UseCase3_1_Seasonal
#get the many curves by looking under "ScenarioName" column header. 
#Then plot Season name vs season value
column_name = "ScenarioName"
subsets = df_Seasonal.groupby(column_name)

data = []


#for each subset (curve), set up its legend and line info manually so they can be edited
subsets_settings = {
    'Bear Wet Year Model': {
        'dash': 'solid',
         'mode':'lines+markers',
        'width':4,
        'legend_index': 0,
        'legend_name': 'Wet Year Model',
         'color':'rgb(41, 10, 216)'
        },

    'Bear Normal Year Model': { # this is the name of the subset as it appears in the query result
        'dash': 'solid',     # this is a property of the line (curve)
        'width':4,
        'mode':'lines+markers',
        'legend_index': 1,   # to order the legend
        'legend_name': 'Normal Year Model',  # this is the manual curve name 
         'color':'rgb(38, 77, 255)'

        },
    'Bear Dry Year Model': {
        'dash': 'solid',
        'mode':'lines+markers',
         'width':4,
        'legend_index': 2,
        'legend_name': 'Dry Year Model',
         'color':'rgb(63, 160, 255)'
        },


        }


# This dict is used to map legend_name to original subset name
subsets_names = {y['legend_name']: x for x, y in subsets_settings.items()}


for subset in subsets.groups.keys():
    print(subset)
    dt = subsets.get_group(subset)
    s = go.Scatter(
        x=dt['SeasonName'],              # seasons of this scenario only
        y=dt['SeasonNumericValue'],
        name=subsets_settings[subset]['legend_name'],
        mode=subsets_settings[subset]['mode'],
        line=dict(
            color=subsets_settings[subset]['color'],
            width=subsets_settings[subset]['width'],
            dash=subsets_settings[subset]['dash']
        ),
        marker=dict(size=10),
        opacity=0.8
    )
    data.append(s)
    
    
# Legend is ordered based on data, so we sort the data based
# on the desired legend order indicated by the legend_index values entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

    

layout = dict(
    #title = "Use Case 3.3",
    yaxis = dict(
        title = "Cumulative flow <br> (acre-feet/month)",
        tickformat= ',',
        showline=True,
        dtick='5000',
        ticks='outside',
        ticklen=10

                ),
    
    xaxis = dict(
        #title = "Month",
        ticks='inside',

        ticklen=25
                    ),
    legend=dict(
        x=0.6,y=0.5,
          bordercolor='#00000f',
            borderwidth=2
               ),
    width=1200,
    height=800,
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    margin=dict(l=260, b=100),
    font=dict(size=35)
             )
# create a figure object
fig = dict(data=data, layout=layout)
#py.iplot(fig, filename = "2.3Identify_SeasonalValues") 


## The figure can also be generated on a local machine (e.g., in PyCharm) as shown below.
## It would also work here offline, but in a separate window.
offline.iplot(fig, filename='UseCase3.1_seasonal_a')  # ,image='png'


###########################################################################################################
# Have you encountered the message below? If not, ignore this note.
# ----------------------------------------------
# Javascript error adding output!
# ReferenceError: Plotly is not defined
# See your browser Javascript console for more details.
# ----------------------------------------------

# If so, do the following:

# Kernel -> Restart -> Clear all outputs and restart
# Save
# Close browser
# Open browser and run again
print("the plot is generated")

<a name="QueryTimeSeries"></a>
# 5. Query WaMDaM database for time series to create the cumulative distribution function (CDF) plot
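The query in this step returns time series that are already aggregated to monthly values in acre-feet; the aggregation happens inside the SQL script. The arithmetic behind such a conversion can be sketched as below. This is an illustration only: it assumes the source series are daily mean flows in cubic feet per second (cfs) with `YYYY-MM-DD` date strings, which this notebook does not confirm, and `monthly_acre_feet` is a hypothetical helper.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400.0
CUBIC_FEET_PER_ACRE_FOOT = 43560.0


def monthly_acre_feet(daily_cfs):
    """Sum daily mean flows (cfs) into acre-feet per (year, month)."""
    totals = defaultdict(float)
    for date_text, cfs in daily_cfs:
        year, month = date_text.split('-')[:2]   # expects 'YYYY-MM-DD'
        # One day at 1 cfs is 86400 cubic feet, about 1.98 acre-feet.
        volume = cfs * SECONDS_PER_DAY / CUBIC_FEET_PER_ACRE_FOOT
        totals[(int(year), int(month))] += volume
    return dict(totals)
```

The result maps each `(year, month)` pair to a cumulative monthly volume, the same kind of quantity the `CumulativeMonthly` column holds.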


In [None]:
# Use Case 3.1: 2_Identify_aggregate_TimeSeriesValues.sql
# Query time series data of multiple sources, aggregated to monthly values and converted to acre-feet

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib import urlopen           # Python 2

Query_UseCase3_1_URL = "https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/4Queries_SQL/UseCase3/UseCase3.1/2_Identify_aggregate_TimeSeriesValues.sql"

# Read the query text inside the URL
Query_UseCase3_1_text = urlopen(Query_UseCase3_1_URL).read().decode('utf-8')

# Return the query result in a pandas data frame
result_df_UseCase3_1 = pd.read_sql_query(Query_UseCase3_1_text, conn)
df_TimeSeries = result_df_UseCase3_1

# Uncomment the line below to see the query result
# display(result_df_UseCase3_1)

# Save the data frame as a csv file into the Jupyter notebook working space
# (the UseCases_Results_csv folder must already exist)
result_df_UseCase3_1.to_csv(os.path.join('UseCases_Results_csv', 'UseCase3_1.csv'), index=False)

print("Queries are done")

<a name="PlotCDF"></a>
# 6. Plot the CDF figure 


#### Reproduce this plot [Figure 13-B] in the WaMDaM paper 


<img src="https://github.com/WamdamProject/WaMDaM_UseCases/raw/master/UseCases_files/8Figures_jpg/UseCase3.1_seasonal_b.png" width="800">
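The CDF figure plots the sorted June flows against their cumulative probabilities, where the i-th smallest of n values gets probability i/n. The calculation can be sketched without pandas or numpy as follows. The helpers `empirical_cdf` and `non_exceedance` are hypothetical, and the flows are illustrative only; they are not the UDWR record.

```python
def empirical_cdf(values):
    """Return the sorted values and their cumulative probabilities i/n."""
    ordered = sorted(values)
    n = len(ordered)
    probabilities = [(i + 1) / float(n) for i in range(n)]
    return ordered, probabilities


def non_exceedance(values, threshold):
    """Fraction of observations less than or equal to the threshold."""
    return sum(1 for v in values if v <= threshold) / float(len(values))


# Illustrative June flows in acre-feet/month; not the UDWR record.
flows = [3000.0, 500.0, 20000.0, 1200.0]
ordered, probabilities = empirical_cdf(flows)   # probabilities: [0.25, 0.5, 0.75, 1.0]
share_below = non_exceedance(flows, 666)        # 0.25
```

Reading the curve at a scenario's June value in this way gives the fraction of historical Junes with flow at or below that value.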


In [None]:
# generate the CDF table

# Use the data frame returned by the time series query in step 5
# (2_Identify_aggregate_TimeSeriesValues.sql)

# Convert CalenderYear column data type to datetime
df_TimeSeries['CalenderYear'] = pd.to_datetime(df_TimeSeries['CalenderYear'], errors='coerce')

# Group rows based on the ResourceTypeAcronym column
subsets = df_TimeSeries.groupby('ResourceTypeAcronym')

# Select rows where ResourceTypeAcronym is UDWRFlowData
dt = subsets.get_group(name='UDWRFlowData')

# From the selected rows, select rows where month is June
specific_month = dt.CalenderYear.dt.month == 6

# CumulativeMonthly data of the desired ResourceTypeAcronym and month
cumulative_monthly = dt[specific_month].CumulativeMonthly.values.tolist()

# Sort cumulative_monthly in ascending order
cumulative_monthly.sort()

# Save the filtered data to csv, CumulativeMonthly and CalenderYear columns
filtered_data = dt[specific_month][['CumulativeMonthly', 'CalenderYear']]
filtered_data.to_csv('Filtered Data.csv', index=False)


# Create the y-axis list of cumulative probabilities. It has the same length
# as the x-axis list (cumulative_monthly) and equally spaced values that end
# at 1: the i-th smallest flow gets probability i/n for i = 1..n, where n is
# len(cumulative_monthly). np.arange excludes its stop value, so the stop is
# n + 1 to make the last element n, which divided by n gives 1.
probability = np.arange(1.0, len(cumulative_monthly) + 1) / len(cumulative_monthly)  # 1.0 makes the array float

data = []
# Plot the sorted flow values against the fraction of values
# that are less than or equal to each one

cdf = go.Scatter(
    x=cumulative_monthly,
    y=probability,
    showlegend=True,
    name='UDWR from 1923 to 2014',
    marker=dict(
        color='rgb(0, 0, 0)'
    )
)

cdfdata=pd.DataFrame(data=dict(probability=probability,cumulative_monthly=cumulative_monthly))

data.append(cdf)


# Save the filtered data to csv, CumulativeMonthly and probability columns
filtered_data = cdfdata
filtered_data.to_csv('CDF_data.csv', index=False)
display (filtered_data)

# cdfdata

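# Cumulative probabilities of June flows at or below the dry year scenario value
lowerthanDry = cdfdata.loc[cdfdata['cumulative_monthly'] <= 666, 'probability']
print('lowerthanDry=')
print(lowerthanDry)

# Cumulative probabilities of June flows at or above the normal year scenario value
UpperthanNormal = cdfdata.loc[cdfdata['cumulative_monthly'] >= 2506, 'probability']
print('UpperthanNormal=')
print(UpperthanNormal)

# Cumulative probabilities of June flows at or above the wet year scenario value
UpperthanWet = cdfdata.loc[cdfdata['cumulative_monthly'] >= 17181, 'probability']
print('UpperthanWet=')
print(UpperthanWet)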


In [None]:
# Use Case 3.1: plot the CDF

# Plot the cumulative flow for June for the UDWR dataset.
# Then get the percentage of time it exceeds the dry and wet year scenarios

# Adel Abdallah




# vertical line dry year 
dry = go.Scatter(
    x=[666, 666 ],
    y=[0, 0.48],
    mode='lines',
        name='Dry year scenario <br> (BRSDM model)',
#     hoverinfo='dry',
    showlegend=True,
    line=dict(
        shape='vh',
        width=4,
        dash = 'dot',
        color = '#3FA0FF'
            )
                    )
data.append(dry)



# horizontal line dry year 
dryHo = go.Scatter(
    x=[0, 666 ],
    y=[0.48, 0.48],
    mode='lines',
        name='Dry year scenario <br> (BRSDM model)',
#     hoverinfo='dry',
    showlegend=False,
    line=dict(
        shape='vh',
        width=4,
        dash = 'dot',
        color = '#3FA0FF'
            )
                    )
data.append(dryHo)
# ------------------------------------------------------------


# vertical line normal year 
normal = go.Scatter(
    x=[2506, 2506],
    y=[0, 0.844],
    mode='lines',
        name='Normal year scenario <br> (BRSDM model)',
#     hoverinfo='wet',
    showlegend=True,
    line=dict(
        shape='vh',
        dash = 'dashdot',
        width=4,
        color = '#264DFF'
            )
                    )
data.append(normal)


# horizontal line normal year 
normalHo = go.Scatter(
    x=[0, 2506],
    y=[0.844, 0.844],
    mode='lines',
        name='Normal year scenario <br> (BRSDM model)',
#     hoverinfo='wet',
    showlegend=False,
    line=dict(
        shape='vh',
        dash = 'dashdot',
        width=4,
        color = '#264DFF'
            )
                    )
data.append(normalHo)

# ------------------------------------------------------------


# vertical line wet year 
wet = go.Scatter(
    x=[17181, 17181],
    y=[0, 0.93],
    mode='lines',
        name='Wet year scenario <br> (BRSDM model)',
#     hoverinfo='wet',
    showlegend=True,
    line=dict(
        shape='vh',
        dash = 'dash',
        width=4,
        color = '#290AD8'
            )
                    )
data.append(wet)


# horizontal line wet year 
wetHo = go.Scatter(
    x=[0, 17181],
    y=[0.93, 0.93],
    mode='lines',
        name='Wet year scenario <br> (BRSDM model)',
#     hoverinfo='wet',
    showlegend=False,
    line=dict(
        shape='vh',
        dash = 'dash',
        width=4,
        color = '#290AD8'
            )
                    )
data.append(wetHo)




layout = go.Layout(
    xaxis=dict(
        title="Cumulative flow for June <br> (acre-feet/month)",
        zeroline=True,
        # showline=True,
        tickformat=',',
        dtick=10000,
        ticks='inside',
        ticklen=25,
        range=[0, 40000],
    ),
    yaxis=dict(
        title='Cumulative probability',
        dtick=0.1,
        ticks='outside',
        ticklen=25,
        # range=[0, 1],
        showline=True,
    ),
    font=dict(size=35, family='arial'),
    width=1100,
    height=800,
    margin=dict(l=230, b=150),
    legend=dict(
        x=0.5, y=0.5,
        bordercolor='#00000f',
        borderwidth=2,
        font=dict(family='arial', size=35)
    ),
)

fig = dict(data=data, layout=layout)

offline.iplot(fig,filename = 'UseCase3.1_seasonal_b')#,image='png' )       
print("the plot is generated")

<a name="Close"></a>
# 7. Close the SQLite connection

In [None]:
conn.close()

print('Connection to SQLite engine is disconnected')

# The End :) Congratulations