## WaMDaM_Use_Case 3.1: What flow values to use at a site (e.g., below Stewart Dam)? 

This notebook demonstrates basic WaMDaM use case analysis using scientific Python libraries such as [pandas](https://pandas.pydata.org/) and [plotly](https://plot.ly/). It reads WaMDaM SQLite data from a published HydroShare Generic Resource, runs a SQL script, and then uses Python plotly to visualize the results.

This use case identifies five time series and seasonal flow data for the site below Stewart Dam, Idaho.

For more info: http://docs.wamdam.org/UseCases/use_case_3/#use-case-3.1


### Steps to reproduce this use case's results and plots 

1. [Import python libraries](#Import)
2. [Connect to the WaMDaM populated SQLite file](#Connect)
3. [Query WaMDaM database for flow time series](#QueryFlowTimeSeries)
4. [Plot the compiled time series for Stewart Dam (Figure 11-A)](#PlotFlow12A)
5. [Plot the last 15 years to show discrepancy in time series for Stewart Dam (Figure 12-B)](#PlotFlow12B)
6. [Pick a flow source and update the WaMDaM db to reflect "Verified"](#PickaSource)
7. [Connect to the WEAP API](#ConnectWEAP)
8. [Prepare the time series to be ready for WEAP](#PrepareWEAP)
9. [Load the time series data into WEAP](#Load)
10. [Close the SQLite and WEAP API connections](#Close)



# 1. Import python libraries 
<a name="Import"></a>
### Install any libraries you don't have. For instructions, see this link:
https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
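
Before running the import cell, it can help to check which of the required packages are missing. The snippet below is a minimal sketch that assumes Python 3 (the cells in this notebook were written for Python 2.7); `missing_packages` is a hypothetical helper name introduced here for illustration, not part of WaMDaM.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages imported by this notebook (top-level names only)
required = ["plotly", "pandas", "numpy"]
missing = missing_packages(required)

if missing:
    print("Install these first: " + ", ".join(missing))
else:
    print("All required packages are available")
```

Any package reported as missing can then be installed as described at the link above.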

In [1]:
# 1. Import python libraries 
### set the notebook mode to embed the figures within the cell

import plotly
import plotly.offline as offline
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.graph_objs import *

init_notebook_mode(connected=True)         # initiate notebook for offline plot

import os
import csv
from collections import OrderedDict
import sqlite3
import pandas as pd
import numpy as np
from IPython.display import display, Image, SVG, Math, YouTubeVideo
import urllib

print 'imported'

imported


# 2. Connect to the WaMDaM populated SQLite file 
<a name="Connect"></a>

In [2]:
# 2. Connect to the WaMDaM populated SQLite file 


# Then we can run queries against it within this notebook :)  

# the SQLite file is published here 
#https://github.com/WamdamProject/WaMDaM_UseCases/blob/master/UseCases_files/3SQLite_database/BearRiverDatasets_June_2018.sqlite

conn = sqlite3.connect('BearRiverDatasets_June_2018_Final.sqlite')

print 'connected'

connected


# 3. Query WaMDaM database for flow time series 
<a name="QueryFlowTimeSeries"></a>

In [3]:
# Use Case 3.1: identify and aggregate time series values
# The query returns time series data from multiple sources, aggregated to monthly and converted to acre-feet



# The SQL query text is published on GitHub at the URL below
Query_UseCase3_1_URL="""
https://raw.githubusercontent.com/WamdamProject/WaMDaM_UseCases/master/UseCases_files/4Queries_SQL/UseCase3/UseCase3.1/2_Identify_aggregate_TimeSeriesValues.sql

"""

# Read the query text inside the URL
Query_UseCase3_1_text = urllib.urlopen(Query_UseCase3_1_URL.strip()).read()


# return query result in a pandas data frame
result_df_UseCase3_1= pd.read_sql_query(Query_UseCase3_1_text, conn)

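# uncomment the line below to see the query result
# display (result_df_UseCase3_1)


# Save the dataframe as a csv file in the Jupyter notebook working space
# (create the output folder first if it does not exist)
if not os.path.exists('UseCases_Results_csv'):
    os.makedirs('UseCases_Results_csv')
result_df_UseCase3_1.to_csv(os.path.join('UseCases_Results_csv', 'UseCase3_1.csv'), index = False)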
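
The actual aggregation logic lives in the linked `.sql` file. As a rough illustration of the general idea only, the sketch below aggregates daily flows to monthly volumes in acre-feet with SQLite. It assumes daily values in cfs and uses a hypothetical, simplified `DailyFlow` table with made-up rows; the real WaMDaM schema and query are more involved.

```python
import sqlite3

# Hypothetical, simplified table with made-up rows (not the WaMDaM schema)
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE DailyFlow (Source TEXT, FlowDate TEXT, FlowCfs REAL)"
)
rows = [
    ("SourceA", "2000-01-01", 100.0),
    ("SourceA", "2000-01-02", 100.0),
    ("SourceA", "2000-02-01", 50.0),
    ("SourceB", "2000-01-01", 110.0),
]
conn_demo.executemany("INSERT INTO DailyFlow VALUES (?, ?, ?)", rows)

# 1 cfs sustained for one day is 86,400 cubic feet; 1 acre-foot is 43,560 cubic feet
CFS_DAY_TO_ACRE_FEET = 86400.0 / 43560.0

query = """
SELECT Source,
       strftime('%Y-%m', FlowDate) AS YearMonth,
       SUM(FlowCfs) * ? AS CumulativeMonthly
FROM DailyFlow
GROUP BY Source, YearMonth
ORDER BY Source, YearMonth
"""
monthly = conn_demo.execute(query, (CFS_DAY_TO_ACRE_FEET,)).fetchall()

for source, year_month, acre_feet in monthly:
    print("{} {} {:.1f}".format(source, year_month, acre_feet))

conn_demo.close()
```

Grouping by source and by year-month gives one row per source per month, which is the shape the plotting cells below expect.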



# 4. Plot the compiled time series for Stewart Dam (Figure 11-A)

<a name="PlotFlow12A"></a>

<img src="https://github.com/WamdamProject/WaMDaM_UseCases/raw/master/UseCases_files/8Figures_jpg/UseCase3.1a_TimeSeries.png" width="800">


In [1]:
# 4. Plot the compiled time series for Stewart Dam (Figure 11-A)


df_TimeSeries=result_df_UseCase3_1
# group the data into four time series based on the ResourceTypeAcronym column
column_name = "ResourceTypeAcronym"
subsets = df_TimeSeries.groupby(column_name)
data = []

# for each subset (curve), set up its legend and line info manually so they can be edited
subsets_settings = {
    'UDWRFlowData': {
        'dash': 'solid',
        'legend_index': 0,
        'legend_name': 'Utah Division of Water Res.',
        'width':'3',
        'color':'rgb(153, 15, 15)'
        },
    'CUAHSI': {
        'dash': 'dash',
        'legend_index': 1,
        'legend_name': 'USGS',
        'width':'4',
        'color':'rgb(15, 107, 153)'
        },
    'IdahoWRA': {
        'dash': 'solid',
        'legend_index': 2,
        'legend_name': 'Idaho Department of Water Res.',
        'width':'3',
        'color':'rgb(38, 15, 153)'
        },    
    'BearRiverCommission': { # this one is the name of subset as it appears in the csv file
        'dash': 'dot',     # this is a property of the line (curve)
        'legend_index': 3,   # to order the legend
        'legend_name': 'Bear River Commission',  # this is the manual curve name 
        'width':'4',
        'color':'rgb(107, 153, 15)'
        }
    }
    
# This dict is used to map legend_name to original subset name
subsets_names = {y['legend_name']: x for x,y in subsets_settings.iteritems()}

# prepare the scatter plot for each curve
for subset in subsets.groups.keys():
    #print subset
    dt = subsets.get_group(name=subset)
    s = go.Scatter(
                    x=dt.CalenderYear.map(lambda z: str(z)[:-3]),
                    y=dt['CumulativeMonthly'],
                    name = subsets_settings[subset]['legend_name'],
                    line = dict(
                        color =subsets_settings[subset]['color'],
                        width =subsets_settings[subset]['width'], 
                        dash=subsets_settings[subset]['dash']
                               ),
                        opacity = 1                                
                  )
    data.append(s)
    
# Legend is ordered based on data, so we are sorting the data based 
# on desired legend order indicated by the index value entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

# set up the figure layout parameters
layout = dict(
     #title = "UseCase3.2",
     yaxis = dict(
         title = "Cumulative monthly flow <br> (acre-feet/month)",
         tickformat= ',',
         zeroline=True,
         showline=True,
         ticks='outside',
         ticklen=15,
         #zerolinewidth=4,
         zerolinecolor='#000000',

         dtick=30000,
                 ),
    xaxis = dict(
         #title = "Time <br> (month/year)",
         #autotick=False,
        tick0='1900-01-01',
        dtick='M180',
        ticks='inside',
        tickwidth=0.5,
        #zerolinewidth=4,
        ticklen=27,
        zerolinecolor='#000000',
        tickcolor='#000',
        tickformat= "%Y",
       range = ['1920', '2020']

                ),
    legend=dict(
        x=0.2,y=0.9,
        bordercolor='#000000',
            borderwidth=2


                ),
    autosize=False,
    width=1200,
    height=800,
    margin=go.Margin(l=300, b=150),
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    
    
    font=dict( size=35)
             )
# create the figure object            
fig = dict(data=data, layout=layout)

# plot the figure 
offline.iplot(fig,filename = 'UseCase3.1a_TimeSeries',image='png' )       


## The figure can also be generated from a local machine (e.g., in PyCharm) as shown below
## It would also work here offline, but in a separate window  

#plotly.offline.plot(fig, filename = "2.2Identify_aggregate_TimeSeriesValues.html") 

###########################################################################################################
# Have you encountered the messages below? If not, don't worry about it
# ----------------------------------------------
# Javascript error adding output!
# ReferenceError: Plotly is not defined
# See your browser Javascript console for more details.
# ----------------------------------------------

# Do the following:

# Kernel -> Restart -> Clear all outputs and restart
# Save
# Close browser
# Open browser and run again




# 5. Plot the last 15 years to show discrepancy in time series for Stewart Dam (Figure 12-B)

<a name="PlotFlow12B"></a>


<img src="https://github.com/WamdamProject/WaMDaM_UseCases/raw/master/UseCases_files/8Figures_jpg/UseCase3.1b_TimeSeries.png" width="800">


In [5]:
# 5. Plot the last 15 years to show discrepancy in time series for Stewart Dam (Figure 12-B)

# Use Case 2.2bIdentify_aggregate_TimeSeriesValues.py
# plot aggregated to monthly and converted to acre-feet time series data of multiple sources

# Adel Abdallah
# November 16, 2017

import plotly
import plotly.graph_objs as go
import pandas as pd

## the input data is the df_TimeSeries dataframe from step 4,
## which holds the output of the query run in step 3


# group the data into four time series based on the ResourceTypeAcronym column
column_name = "ResourceTypeAcronym"
subsets = df_TimeSeries.groupby(column_name)
data = []

# for each subset (curve), set up its legend and line info manually so they can be edited

subsets_settings = {
    'UDWRFlowData': {
        'symbol': "star",
        'legend_index': 0,
        'legend_name': 'Utah Division of Water Res.',
        'width':'2',
        'size' :'7',
        'color':'rgb(153, 15, 15)',
        'mode': 'lines+markers'
        },
    'CUAHSI': {
        'symbol': "square",
        'legend_index': 1,
         'size' :'10',
        'legend_name': 'CUAHSI',
        'width':'3',
        'color':'rgb(15, 107, 153)',
        'show_legend': False,
        },
    'IdahoWRA': {
        'symbol': "triangle-down",
        'legend_index': 2,
         'size' :'6',
        'legend_name': 'Idaho Department of Water Res.',
        'width':'3',
        'color':'rgb(38, 15, 153)'
        },    
    'BearRiverCommission': { # this one is the name of subset as it appears in the csv file
        'symbol': "106",     # this is a property of the marker
        'size' :'6',

        'legend_index': 3,   # to order the legend
        'legend_name': "Bear River Commission",  # this is the manual curve name 
         'width':'4',
        'color':'rgb(107, 153, 15)'
        }
    }
    
# This dict is used to map legend_name to original subset name
subsets_names = {y['legend_name']: x for x,y in subsets_settings.iteritems()}

# prepare the scatter plot for each curve
for subset in subsets.groups.keys():
    print subset
    dt = subsets.get_group(name=subset)
    s = go.Scatter(
        x=dt.CalenderYear.map(lambda z: str(z)[:-3]),
        y=dt['CumulativeMonthly'],
        name = subsets_settings[subset]['legend_name'],       
        opacity = 1,
        
        # Get mode from settings dictionary; if no mode is
        # defined in the dictionary, the default is markers.
        mode = subsets_settings[subset].get('mode', 'markers'),
        
        # Get legend visibility from settings dictionary; if it is not
        # defined in the dictionary, the default is to show the item in the legend.
        showlegend = subsets_settings[subset].get('show_legend', True),
        
        marker = dict(
            size =subsets_settings[subset]['size'],
            color = '#FFFFFF',      # white
            symbol =subsets_settings[subset]['symbol'],
            line = dict(
                color =subsets_settings[subset]['color'],
                width =subsets_settings[subset]['width'], 
                ),
            ),
            
        line = dict(
            color =subsets_settings[subset]['color'],
            width =subsets_settings[subset]['width'], 
            ),
        )
    
    data.append(s)
    
# Legend is ordered based on data, so we are sorting the data based 
# on desired legend order indicated by the index value entered above
data.sort(key=lambda x: subsets_settings[subsets_names[x['name']]]['legend_index'])

# set up the figure layout parameters
layout = dict(
     #title = "UseCase3.2",
     yaxis = dict(
         title = "Cumulative monthly flow <br> (acre-feet/month)",
         tickformat= ',',
         zeroline=True,
         showline=True,
         ticks='outside',
         ticklen=15,
         #zerolinewidth=4,
         zerolinecolor='#000000',
         range = ['0', '6000'],
         dtick=1000,
                 ),
    xaxis = dict(
         #title = "Time <br> (month/year)",
         #autotick=False,
        tick0='1994-01-01',
        showline=True,
        dtick='M12',
        ticks='outside',
        tickwidth=0.5,
        #zerolinewidth=4,
        ticklen=27,
        #zerolinecolor='#00000',
        tickcolor='#000',
        tickformat= "%Y",
        range = ['1994', '2000']
                ),
    legend=dict(
        x=0.3,y=1,
        bordercolor='#000000',
            borderwidth=2


                ),
    autosize=False,
    width=1200,
    height=800,
    margin=go.Margin(l=300, b=150),
    #paper_bgcolor='rgb(233,233,233)',
    #plot_bgcolor='rgb(233,233,233)',
    
    
    font=dict( size=35)
             )
             
# create the figure object            
fig = dict(data=data, layout=layout)

# plot the figure 
# To use the online plotly service instead (requires `import plotly.plotly as py`):
#py.iplot(fig, filename = "2.2bIdentify_aggregate_TimeSeriesValues")       


## The figure can also be generated from a local machine (e.g., in PyCharm) with plotly.offline.plot,
## which opens it in a separate window; iplot embeds it in the notebook  
offline.iplot(fig,filename = 'UseCase3.1b_TimeSeries',image='png' )       


BearRiverCommission
IdahoWRA
UDWRFlowData
CUAHSI
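
The plot shows the discrepancy visually. To put a number on it, one option is to compute, for each month, the spread between the largest and smallest value reported by the different sources. The sketch below is a minimal illustration of that idea; the records are made-up values, not data from the WaMDaM database, and the `spread` name is introduced here for illustration.

```python
from collections import defaultdict

# Made-up (source, 'YYYY-MM', acre-feet) records, for illustration only
records = [
    ("UDWRFlowData", "1995-06", 5200.0),
    ("IdahoWRA", "1995-06", 5150.0),
    ("BearRiverCommission", "1995-06", 4900.0),
    ("UDWRFlowData", "1995-07", 3100.0),
    ("IdahoWRA", "1995-07", 3100.0),
]

# Collect the value reported by each source for every month
by_month = defaultdict(dict)
for source, month, value in records:
    by_month[month][source] = value

# Spread = largest minus smallest reported value, for months with two or more sources
spread = {
    month: max(values.values()) - min(values.values())
    for month, values in by_month.items()
    if len(values) > 1
}

for month in sorted(spread):
    print("{} {}".format(month, spread[month]))
```

Months where the spread is zero are the ones where the sources agree; large spreads point to months worth checking before choosing a source.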


# 6. Pick a flow source and update the db to reflect "Verified"
<a name="PickaSource"></a>

This "Update" SQL query lets users update the Mappings table to mark a DataValue as "verified". 
A Verified value of True indicates that the user has verified, curated, checked, or selected this 
data value as ready to be used in models. An automated script can then use verified records to 
serve data to models. This is particularly useful when the same set of controlled object type, attribute, and instance names 
returns multiple data values from different sources, with similar or different values due to many factors.
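
As a minimal sketch of the verify-then-serve pattern described above, the example below marks one source as verified and then selects only verified rows. It assumes a hypothetical, flattened `Mappings` table with a `SourceName` column; the real WaMDaM table requires the joins shown in the cell that follows.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

# Hypothetical, flattened stand-in for the WaMDaM Mappings table and its joins
conn_demo.execute(
    "CREATE TABLE Mappings ("
    "MappingID INTEGER PRIMARY KEY, SourceName TEXT, Verified TEXT)"
)
conn_demo.executemany(
    "INSERT INTO Mappings (SourceName, Verified) VALUES (?, ?)",
    [("UDWRFlowData", None), ("CUAHSI", None), ("IdahoWRA", None)],
)

# Mark the chosen source as verified; the ? placeholder avoids building SQL by hand
cur = conn_demo.execute(
    "UPDATE Mappings SET Verified = 'True' WHERE SourceName = ?",
    ("UDWRFlowData",),
)
conn_demo.commit()  # without a commit, the change is not saved
updated = cur.rowcount

# An automated script can then pick up only the verified rows
verified = conn_demo.execute(
    "SELECT SourceName FROM Mappings WHERE Verified = 'True'"
).fetchall()

print("rows updated: {}".format(updated))
print(verified)
conn_demo.close()
```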

In [None]:
# 6. Pick a flow source and update the db to reflect "Verified"

# scenario_name_data = subsets.get_group(name='Base case')
# print scenario_name_data
# Get a cursor object

SQL_update = """
UPDATE Mappings 

SET Verified= 'True'
WHERE  MappingID in

(SELECT Mappings.MappingID FROM Mappings

-- Join the Mappings to get their Attributes
LEFT JOIN "Attributes"
ON Attributes.AttributeID= Mappings.AttributeID

-- Join the Attributes to get their ObjectTypes
LEFT JOIN  "ObjectTypes"
ON "ObjectTypes"."ObjectTypeID"="Attributes"."ObjectTypeID"

-- Join the Mappings to get their Instances   
LEFT JOIN "Instances" 
ON "Instances"."InstanceID"="Mappings"."InstanceID"

-- Join the Mappings to get their ScenarioMappings   
LEFT JOIN "ScenarioMappings"
ON "ScenarioMappings"."MappingID"="Mappings"."MappingID"

-- Join the ScenarioMappings to get their Scenarios   
LEFT JOIN "Scenarios"
ON "Scenarios"."ScenarioID"="ScenarioMappings"."ScenarioID"

-- Join the Scenarios to get their MasterNetworks   
LEFT JOIN "MasterNetworks" 
ON "MasterNetworks"."MasterNetworkID"="Scenarios"."MasterNetworkID"

where 
ObjectTypes.ObjectType='Site'  

AND "Instances"."InstanceName"='10046500.MONBEAR RIVER BL STEWART DAM NR MONTPELIER IDAHO'  

AND AttributeName='Delivered volume per month'

AND ScenarioName='Existing data'

AND MasterNetworkName='UDWRFlowData')
"""

cur = conn.cursor()

res = cur.execute(SQL_update)

# commit the change so that it is saved to the SQLite file
conn.commit()

print 'updated'

# 7. Connect to the WEAP API
<a name="ConnectWEAP"></a>

First, make sure you have a copy of the "Water Evaluation And Planning" system (WEAP) installed on your local machine (Windows). 
You will need an active licence to use the API.
For more info, see http://www.weap21.org/index.asp?action=40

## WEAP API info 
http://www.weap21.org/WebHelp/API.htm

## Install dependency and register WEAP
### A. Install pywin32 extensions which provide access to many of the Windows APIs from Python.
**Choose one option**
1. Install using an executable based on your Python version. I used Python 2.7
https://github.com/mhammond/pywin32/releases

2. Install from source code (for advanced users) 
https://github.com/mhammond/pywin32

### B. Register WEAP with Windows 
Open the Windows "Command Prompt" as Administrator, go to the WEAP install directory (e.g. `cd C:\Program Files (x86)\WEAP`), and run the following command: 

`WEAP /regserver`

In [None]:
# this library is needed to connect to the WEAP API
import win32com.client

# this command will open the WEAP software (if closed) and get the last active model
# you could change the active area to another one inside WEAP or by passing it to the command here
#WEAP.ActiveArea = "BearRiverFeb2017_V10.9"

WEAP=win32com.client.Dispatch("WEAP.WEAPApplication")

if not WEAP.Registered:
    print "Because WEAP is not registered, you cannot use the API"

# get the active WEAP Area (model) to serve data into it 
ActiveArea=WEAP.ActiveArea.Name 
print  'ActiveArea= '+ActiveArea

# get the active WEAP scenario to serve data into it 

ActiveScenario= WEAP.ActiveScenario.Name
print 'ActiveScenario= '+ActiveScenario

WEAP_Area_dir=WEAP.AreasDirectory
print WEAP_Area_dir

# 8. Prepare the time series to be ready for WEAP
<a name="PrepareWEAP"></a>


In [None]:
# 8. Prepare the time series to be ready for WEAP


# Select the UDWR subset 

dt = subsets.get_group(name='UDWRFlowData')

# uncomment the line below if you want to see the table
# display (dt)


Metadata_TimeSeries = []

# dataframe output of WaMDaM query 
df_TimeSeries=dt

# x = df_TimeSeries['AttributeName'][1]


AttributeName='Streamflow Data'

InstanceName='USGS 10046500'
# print x
#y = df_TimeSeries['InstanceName'][1]


z = AttributeName.replace(" ", "_")

w = InstanceName.replace(" ", "_")

WEAP_PATH=WEAP_Area_dir+ActiveArea+"\\"
print WEAP_PATH

outputFolder="TimeSeries_csv_files\\"

output_dir = WEAP_PATH+"TimeSeries_csv_files\\"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
        
# WEAP_Area_dir is where the WEAP Area folder exists on your machine 
csv_file_name = outputFolder+z + '_' + w + '.csv'

csv_file_location=WEAP_PATH+outputFolder +z + '_' + w + '.csv'

print csv_file_location
# print csv_file_name
# total_csv_file_name.append(csv_file_name)

# there are more complex issues regarding what to do with missing values

timeSeriesValue = "ReadFromFile("+csv_file_name + ")"
print timeSeriesValue

# total_timeSeriesValue.append(timeSeriesValue)

# combine the output parameters here to pass them to the metadata writing file
Metadata_TimeSeries1 = OrderedDict()

Metadata_TimeSeries1['Value'] = timeSeriesValue

Metadata_TimeSeries1['csv_fileName'] = csv_file_name

# for TimeSeries_Full_Branch in TimeSeries_Full_Branch_total:

#Metadata_TimeSeries1['FullBranch'] = TimeSeries_Full_Branch

Metadata_TimeSeries.append(Metadata_TimeSeries1)

x_data = df_TimeSeries['CalenderYear']
# print x_data

# save the three columns into a csv file with a name csv_file_name

#################################################################################
# Save the file in the WEAP area folder

field_names = ['Column1', 'Column2', 'Column3']
f1 = open(csv_file_location, "wb")

# writer = csv.writer(f1, delimiter=',', quoting=csv.QUOTE_ALL)
# writer.writerow(field_names)

# for ii in x:

x = []

# save all of them into a folder called: TimeSeries_csv_files

for i, val in x_data.iteritems():

    year, month, day = val.split('-')


    yx = df_TimeSeries['CumulativeMonthly'][i]
    # print year
    # print month

    # print year, month, date
    # year,month,date=Column1.str.split('-')

    # field_names = ['Column1', 'Column2', 'column3']

    Column1 = year
    Column2 = month
    
    #############################################################
    # Convert acre-feet per month to cfs as required by WEAP
    # (1 acre-foot = 43,560 cubic feet; this assumes a 30-day month)

    Column3 = yx*43560/(60*60*24*30)
    # no headers in the csv file 
    f1.write("{},{},{}\n".format(Column1, Column2, Column3))


f1.close()
# return csv_file_name

# You can verify these conversions by comparing the monthly values with the original USGS data at
# https://nwis.waterdata.usgs.gov/ut/nwis/monthly?search_site_no=10046500&format=sites_selection_links
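
The conversion above divides a monthly volume by the number of seconds in a fixed 30-day month. When comparing against the USGS monthly values, the actual month length may matter. The sketch below shows one way to account for it, using the standard conversion 1 acre-foot = 43,560 cubic feet; the function name `acre_feet_per_month_to_cfs` is a hypothetical helper introduced here, not part of the notebook.

```python
import calendar

CUBIC_FEET_PER_ACRE_FOOT = 43560.0


def acre_feet_per_month_to_cfs(volume_acre_feet, year, month):
    """Convert a monthly volume in acre-feet to a mean flow in cubic feet per second."""
    days_in_month = calendar.monthrange(year, month)[1]
    seconds_in_month = days_in_month * 24 * 60 * 60
    return volume_acre_feet * CUBIC_FEET_PER_ACRE_FOOT / seconds_in_month


# Compare the calendar-aware result with the fixed 30-day approximation
calendar_aware = acre_feet_per_month_to_cfs(1000.0, 1999, 2)   # February, 28 days
fixed_30_days = 1000.0 * 43560.0 / (60 * 60 * 24 * 30)

print("calendar-aware: {:.3f} cfs".format(calendar_aware))
print("30-day month:   {:.3f} cfs".format(fixed_30_days))
```

For 30-day months the two approaches give the same result; for February the fixed 30-day divisor yields a lower mean flow than the calendar-aware one.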

# 9. Load the time series  data into WEAP
<a name="Load"></a>

In [2]:
# 9. Load the time series  data into WEAP


InstanceName='USGS 10046500'
AttributeName='Streamflow Data'

# Get the Instance Name and Attribute names and pass them to 
# the function below to load their values into WEAP


# timeSeriesValue is the PATH for the csv file for time series

for Branch in WEAP.Branches:
    if Branch.Name == InstanceName:
        GetInstanceFullBranch = Branch.FullName
        WEAP.Branch(GetInstanceFullBranch).Variable(AttributeName).Expression = timeSeriesValue

print 'The time series data have been successfully loaded into WEAP'


# 10. Close the SQLite and WEAP API connections
<a name="Close"></a>

In [None]:
conn.close()

print 'connection disconnected'

# Uncomment the line below to save the active WEAP area
# WEAP.SaveArea

# Or save it as a new copy
NewWEAPCopyName=ActiveArea+"Test"
print NewWEAPCopyName

# Call API function to save WEAP
# WEAP.SaveAreaAS(NewWEAPCopyName)

# this command will close WEAP
# WEAP.Quit
WEAP='nil'

# The End :)