<img src="https://i.imgur.com/hZgPddE.jpg" alt="IBMDBG"> <br>

# Part 2: Introduction

In this Jupyter Notebook you'll learn step-by-step how to use the Watson Machine Learning API that was automatically generated when the previously created WS Modeler flow was deployed. You will also learn how to download files from IBM Cloud Object Storage and generate interactive visualizations using `Bokeh`. 

This Notebook is part of the series on stock market forecasting.

<hr>

# Table of Contents

#### 1. Downloading Files from IBM COS
* 1.1: IBM COS Credentials Setup
* 1.2: Using the ibm_boto3 Package

#### 2. Reading CSV Files as Pandas Dataframes
* 2.1: Reading the `AAPL_Test` data
* 2.2: Reading the `AAPL_Train` data

#### 3. Using the Watson Machine Learning API
* 3.1: Preparing Input Data
* 3.2: Setting up WML Credentials
* 3.3: Making an API Call to WML
* 3.4: Parsing the WML Results

#### 4. Validating and Visualizing the Results
* 4.1: Plotting the Modeler Flow Forecasts
* 4.2: Validating Modeler Flow Forecasts with Observed Data
* 4.3: Interacting with Complete Historic and Forecasted Data

<hr>

# 1: Downloading Files from IBM COS

The objective of this section is to retrieve files previously stored in IBM Cloud Object Storage:

* The AAPL historical stock data, stored in the `AAPL.csv` file.
* The Modeler Flow Time Series Model results, exported as the `AAPL_1-Year_Future_Data.csv` file.

We will use the `ibm_boto3` library to communicate with IBM Cloud Object Storage.

In [54]:
from ibm_botocore.client import Config
import ibm_boto3

print('Packages imported.')

Packages imported.


### 1.1: IBM COS Credentials Setup

Configure your IBM Cloud Object Storage credentials in the cell below.

You can find these credentials on the page of your service instance in the IBM Cloud web portal.

In [55]:
# Paste your IBM COS credentials here
cos_credentials = {
    'IAM_SERVICE_ID': '',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': '',
    'IBM_AUTH_ENDPOINT': '',
    'BUCKET': '',
}
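
Before calling IBM COS, it can help to confirm that every field of the credentials dict has been filled in. The sketch below shows one possible check; the helper name `missing_credentials` and the placeholder values are assumptions made for illustration and are not part of `ibm_boto3`.

```python
# Hypothetical helper (not part of ibm_boto3): list the credential fields left empty
REQUIRED_COS_FIELDS = ['IAM_SERVICE_ID', 'IBM_API_KEY_ID', 'ENDPOINT',
                       'IBM_AUTH_ENDPOINT', 'BUCKET']

def missing_credentials(credentials, required=REQUIRED_COS_FIELDS):
    """ Return the names of required fields that are absent or blank """
    return [name for name in required
            if not str(credentials.get(name) or '').strip()]

# Placeholder values for illustration only (assumptions, not real credentials)
example_credentials = {
    'IAM_SERVICE_ID': 'example-service-id',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': 'https://example.invalid',
    'IBM_AUTH_ENDPOINT': 'https://example.invalid/token',
}
print(missing_credentials(example_credentials))
```

An empty list means all fields are present, so the download function can be called safely.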

### 1.2: Using the ibm_boto3 Package

Next, we define a function to authenticate with IBM COS and download a defined file.

In [129]:
def download_file_from_cos(credentials, save_file_locally_as, target_file_name):
    """ Download a file from IBM COS """
    
    # Configure IBM COS API credentials
    cos = ibm_boto3.client(service_name='s3',
                           ibm_api_key_id=credentials['IBM_API_KEY_ID'],
                           ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
                           ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
                           config=Config(signature_version='oauth'),
                           endpoint_url=credentials['ENDPOINT'])
    
    # Try to download the file and report the outcome
    try:
        cos.download_file(Bucket=credentials['BUCKET'], Key=target_file_name, Filename=save_file_locally_as)
    except Exception as e:
        print("Could not download '{}': {}".format(target_file_name, e))
    else:
        print("'{}' file downloaded.".format(target_file_name))

We use the previously defined function to download the `AAPL_Train.csv` and `AAPL_Test.csv` files.

In [130]:
download_file_from_cos(cos_credentials, 'AAPL_Train.csv', 'AAPL_Train.csv')
download_file_from_cos(cos_credentials, 'AAPL_Test.csv', 'AAPL_Test.csv')

'AAPL_Train.csv' file downloaded.
'AAPL_Test.csv' file downloaded.


<hr>

# 2: Reading CSV Files as Pandas Dataframes

To generate an interactive graph using the `bokeh` library, we first need to load the data into a pandas DataFrame.

In [131]:
import pandas as pd
import numpy as np
import os
import dateutil

print('Packages imported.')

Packages imported.


### 2.1: Reading the `AAPL_Test` data

In [132]:
# Load the CSV file into a pandas DataFrame and parse the Date column as dates (time of day dropped)
df_test = pd.read_csv('AAPL_Test.csv')
df_test['Date'] = pd.to_datetime(df_test['Date']).dt.normalize()
print(df_test.info())
df_test = df_test[['Date','Open','High','Low','Close']]
df_test.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8588 entries, 0 to 8587
Data columns (total 5 columns):
Date     8588 non-null datetime64[ns]
Open     8588 non-null float64
High     8588 non-null float64
Low      8588 non-null float64
Close    8588 non-null float64
dtypes: datetime64[ns](1), float64(4)
memory usage: 335.5 KB
None


| | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| 244 | 2017-04-04 | 143.25 | 144.89 | 143.17 | 144.77 |
| 245 | 2017-04-03 | 143.71 | 144.12 | 143.05 | 143.7 |
| 246 | 2017-03-31 | 143.72 | 144.27 | 143.01 | 143.66 |
| 247 | 2017-03-30 | 144.19 | 144.5 | 143.5 | 143.93 |
| 248 | 2017-03-29 | 143.68 | 144.49 | 143.19 | 144.12 |


### 2.2: Reading the `AAPL_Train` data

In [133]:
# Load the CSV file into a pandas DataFrame and parse the Date column as dates (time of day dropped)
df_train = pd.read_csv('AAPL_Train.csv')
df_train['Date'] = pd.to_datetime(df_train['Date']).dt.normalize()
print(df_train.info())
df_train = df_train[['Date','Open','High','Low','Close']]
df_train.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8588 entries, 0 to 8587
Data columns (total 5 columns):
Date     8588 non-null datetime64[ns]
Open     8588 non-null float64
High     8588 non-null float64
Low      8588 non-null float64
Close    8588 non-null float64
dtypes: datetime64[ns](1), float64(4)
memory usage: 335.5 KB
None


| | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| 0 | 2017-03-28 | 140.91 | 144.04 | 140.62 | 143.8 |
| 1 | 2017-03-27 | 139.39 | 141.22 | 138.62 | 140.88 |
| 2 | 2017-03-24 | 141.5 | 141.74 | 140.35 | 140.64 |
| 3 | 2017-03-23 | 141.26 | 141.5844 | 140.61 | 140.92 |
| 4 | 2017-03-22 | 139.845 | 141.6 | 139.76 | 141.42 |


The `AAPL_Train.csv` and `AAPL_Test.csv` files contain historical data for Apple Inc. stock.

The resulting pandas DataFrames are only used in Section 4 of this Notebook, for validation and visualization purposes.

<hr>

# 3: Using the Watson Machine Learning API

Previously, we trained a time series forecaster in Watson Modeler Flow and later deployed this forecaster in a Watson Machine Learning service instance.

Now, in this section, we will use the WML API to send new input data to our time series forecaster.

### 3.1: Preparing Input Data

As the cell below shows, the input data must be a `dict`, and its values must follow the same format as the input CSV file used as the data source in the Modeler flow.

<img src="https://i.imgur.com/WND3Pqg.png" alt="EX1"> <br>

For demonstration purposes, the `payload_scoring` dict contains only a single data point.

Remember that the `WIKI/TABLE` Quandl database only contains data up to 27 March 2018.

The `COLUMN2`, `COLUMN3`, `COLUMN4`, `COLUMN5`, and `COLUMN6` fields correspond to the `DATE`, `OPEN`, `HIGH`, `LOW`, and `CLOSE` labels, in that order.

In [134]:
# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload_scoring = {"fields": ["COLUMN2", "COLUMN3", "COLUMN4", "COLUMN5", "COLUMN6"],
                   "values": [['2017-03-28', 140.91, 144.04, 140.62, 143.80]]}
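
When more than one row needs to be scored, the same `fields`/`values` layout can be assembled programmatically. The following is a minimal sketch, assuming each row is given as a date string followed by four numbers in the same order as the CSV columns; the helper name `build_payload` is hypothetical and not part of any IBM library.

```python
# Hypothetical helper: build a scoring payload from rows of five values each
FIELDS = ["COLUMN2", "COLUMN3", "COLUMN4", "COLUMN5", "COLUMN6"]

def build_payload(rows, fields=FIELDS):
    """ Return a dict in the {'fields': [...], 'values': [[...], ...]} layout """
    values = []
    for row in rows:
        # Reject rows that do not provide one value per field
        if len(row) != len(fields):
            raise ValueError('Expected {} values per row, got {}'.format(len(fields), len(row)))
        values.append(list(row))
    return {"fields": list(fields), "values": values}

# Two rows taken from the AAPL_Train sample shown in Section 2.2
example_payload = build_payload([
    ('2017-03-28', 140.91, 144.04, 140.62, 143.80),
    ('2017-03-27', 139.39, 141.22, 138.62, 140.88),
])
print(example_payload["values"][0])
```

Checking the row length up front surfaces a malformed row locally instead of in the response from the scoring endpoint.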

### 3.2: Setting up WML Credentials

Retrieve your `wml_service_credentials_username`, `wml_service_credentials_password`, and `wml_service_credentials_url` from the service credentials associated with your IBM Cloud Watson Machine Learning Service instance. 

This can be done by accessing the service instance in the IBM Cloud web portal.

In [135]:
wml_credentials = {
      "password": "",
      "url": "",
      "username": ""
}

    
deployment_endpoint = ""

### 3.3: Making an API Call to WML

The next code cell uses `urllib3` and `requests` to communicate with the WML API. The `payload_scoring` dict is used as input.

In [136]:
import urllib3, requests, json


# Request an access token from WML using basic authentication
headers = urllib3.util.make_headers(basic_auth='{username}:{password}'.format(
    username=wml_credentials['username'], password=wml_credentials['password']))
url = '{}/v3/identity/token'.format(wml_credentials['url'])
response = requests.get(url, headers=headers)
response.raise_for_status()  # Stop here if the authentication request failed
mltoken = json.loads(response.text).get('token')

# Send the payload to the deployed model, authenticating with the token
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + mltoken}

response_scoring = requests.post(deployment_endpoint, json=payload_scoring, headers=header)
response_scoring.raise_for_status()  # Stop here if the scoring request failed
print("Scoring finished")
print("Response type: {}".format(type(json.loads(response_scoring.text))))

Scoring finished
Response type: <class 'dict'>
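
For readers curious about what the `basic_auth` argument produces: HTTP basic authentication sends the `username:password` pair, Base64-encoded, in an `Authorization` header. The standard-library sketch below builds the same kind of header value by hand. The function name `basic_auth_value` and the sample credentials are illustrative assumptions.

```python
import base64

def basic_auth_value(username, password):
    """ Return the value of an HTTP Basic Authorization header """
    pair = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(pair).decode('ascii')

# Placeholder credentials, for illustration only
example_value = basic_auth_value('user', 'pass')
print(example_value)
```

Base64 is an encoding, not encryption, so this header should only be sent over HTTPS.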


In the next cell, we can inspect the WML response, which is also of type `dict`.

In [137]:
data = json.loads(response_scoring.text)
print(data)

{'fields': ['Date', '$FutureFlag', 'Open', '$TS-Open', '$TSLCI-Open', '$TSUCI-Open', '$TSResidual-Open', 'Close', '$TS-Close', '$TSLCI-Close', '$TSUCI-Close', '$TSResidual-Close'], 'values': [['2017-03-28 00:00 AM UTC', 0, 140.91, 140.82539373095153, 136.6309523505463, 145.0284770664795, 0.0010145740497017927, 143.8, 144.00336946752495, 139.7742411831146, 148.24108948359182, -0.0010107994431093037], ['2017-03-29 00:00 AM UTC', 1, None, 141.10919783105598, 136.90630343212996, 145.32075160115616, None, None, 144.00511873258978, 139.77593907530957, 148.24289022589366, None], ['2017-03-30 00:00 AM UTC', 1, None, 141.18627102666295, 135.24333527106683, 147.1464980439679, None, None, 144.04994742491166, 137.9246617297609, 150.1932359562979, None], ['2017-03-31 00:00 AM UTC', 1, None, 141.2159640111935, 133.82205404123994, 148.63660535197235, None, None, 144.10729576229068, 136.4159416345517, 151.82699131245747, None], ['2017-04-01 00:00 AM UTC', 1, None, 141.24549032422127, 132.6433695760884

Before using `Bokeh` to interact with the data, we need to parse it into a pandas DataFrame.

### 3.4: Parsing the WML Results

In [138]:
from datetime import datetime


# The response dict holds the column labels under 'fields' and the rows under 'values'
columns = data['fields']
values = data['values']

def parse_dates(rows):
    """ Convert the leading date string of each row into a datetime, in place """
    for row in rows:
        # Timestamps look like '2017-03-28 00:00 AM UTC'; keep only the date part
        date_part = row[0].split(" ")[0]
        row[0] = datetime.strptime(date_part, '%Y-%m-%d')

parse_dates(values)

The code cell above transformed the `dict` response into two lists: the labels (columns) and rows (values).
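
To make the relationship between labels and rows concrete, the sketch below pairs each label with the value at the same position in a row. It uses a shortened sample (three fields, two rows) copied from the response printed above. Treating `$FutureFlag` as `1` for forecast rows and `0` for observed rows is an assumption inferred from that output.

```python
# Sample in the same layout as the WML response, shortened for illustration
sample = {
    'fields': ['Date', '$FutureFlag', '$TS-Close'],
    'values': [['2017-03-28 00:00 AM UTC', 0, 144.00336946752495],
               ['2017-03-29 00:00 AM UTC', 1, 144.00511873258978]],
}

# Pair each label with the value in the same position of each row
records = [dict(zip(sample['fields'], row)) for row in sample['values']]

# Assumption: '$FutureFlag' is 1 for forecast rows and 0 for observed rows
forecasts = [r for r in records if r['$FutureFlag'] == 1]
print(forecasts[0]['Date'], forecasts[0]['$TS-Close'])
```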

In [139]:
print(values)

[[datetime.datetime(2017, 3, 28, 0, 0), 0, 140.91, 140.82539373095153, 136.6309523505463, 145.0284770664795, 0.0010145740497017927, 143.8, 144.00336946752495, 139.7742411831146, 148.24108948359182, -0.0010107994431093037], [datetime.datetime(2017, 3, 29, 0, 0), 1, None, 141.10919783105598, 136.90630343212996, 145.32075160115616, None, None, 144.00511873258978, 139.77593907530957, 148.24289022589366, None], [datetime.datetime(2017, 3, 30, 0, 0), 1, None, 141.18627102666295, 135.24333527106683, 147.1464980439679, None, None, 144.04994742491166, 137.9246617297609, 150.1932359562979, None], [datetime.datetime(2017, 3, 31, 0, 0), 1, None, 141.2159640111935, 133.82205404123994, 148.63660535197235, None, None, 144.10729576229068, 136.4159416345517, 151.82699131245747, None], [datetime.datetime(2017, 4, 1, 0, 0), 1, None, 141.24549032422127, 132.64336957608845, 149.88374194101928, None, None, 144.17784022642883, 135.19729322758897, 153.19695762467887, None], [datetime.datetime(2017, 4, 2, 0, 0

Next, we create a new pandas DataFrame with the forecast data for Apple Inc. stock, retrieved from WML through the API.

In [140]:
ndf = pd.DataFrame.from_records(values, columns=columns)
print(ndf.info())
ndf.tail()


| | Date | `$FutureFlag` | Open | `$TS-Open` | `$TSLCI-Open` | `$TSUCI-Open` | `$TSResidual-Open` | Close | `$TS-Close` | `$TSLCI-Close` | `$TSUCI-Close` | `$TSResidual-Close` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 361 | 2018-03-24 | 1 | | 169.049729 | 75.589449 | 263.189197 | | | 171.171835 | 78.162182 | 264.985106 | |
| 362 | 2018-03-25 | 1 | | 169.134762 | 75.524058 | 263.417076 | | | 171.254142 | 78.09613 | 265.209224 | |
| 363 | 2018-03-26 | 1 | | 169.219838 | 75.458813 | 263.644836 | | | 171.336489 | 78.030225 | 265.433221 | |
| 364 | 2018-03-27 | 1 | | 169.304956 | 75.393715 | 263.872478 | | | 171.418875 | 77.964466 | 265.657099 | |
| 365 | 2018-03-28 | 1 | | 169.390117 | 75.328763 | 264.100004 | | | 171.501301 | 77.898854 | 265.880858 | |
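
The `$TSLCI-*` and `$TSUCI-*` columns appear to hold the lower and upper bounds of each forecast; this is an assumption based on how they are plotted as bounds in Section 4. Under that assumption, the distance between the two bounds shows how the uncertainty grows with the forecast horizon. The sketch below compares the first and last forecast days for the Close value, using numbers copied from the outputs above.

```python
# (date, lower bound, upper bound) for the Close forecast, copied from the
# outputs above: the first forecast day and the last one
close_bounds = [
    ('2017-03-29', 139.77593907530957, 148.24289022589366),
    ('2018-03-28', 77.898854, 265.880858),
]

# Width of the band between the upper and the lower bound for each day
widths = [upper - lower for _, lower, upper in close_bounds]

for (date, _, _), width in zip(close_bounds, widths):
    print('{}: band width = {:.2f} USD'.format(date, width))
```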


<hr>

# 4: Validating and Visualizing the Results

In [141]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.embed import components
from bokeh.io import output_notebook

print('Packages imported.')

Packages imported.


In [142]:
# Load bokeh
output_notebook()

### 4.1: Plotting the Modeler Flow Forecasts

In [143]:
# Figure
p = figure(width=1200, height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend_label='Modeled Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend_label='Modeled Open Value')
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend_label='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend_label='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend_label='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend_label='Modeled Open Value Bounds')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [144]:
show(p)

### 4.2: Validating Modeler Flow Forecasts with Observed Data

In [153]:
# Figure
p = figure(width=1200, height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
recent_test = df_test[df_test['Date'] > datetime(2015,1,1)]  # Test sample rows after 2015-01-01
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend_label='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend_label='Forecasted Open Value')
p.line(recent_test.Date, recent_test.Close, line_width=0.5, line_color="#ff6699", legend_label='Historic Close Data (Test Sample)')
p.line(recent_test.Date, recent_test.Open, line_width=0.5, line_color="#0099ff", legend_label='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [154]:
show(p)

In [193]:
ndf_filtered = ndf.drop(['Close', 'Open', '$TSResidual-Open', '$TSResidual-Close'], axis=1)

result = pd.concat([ndf_filtered, df_test], axis=1).dropna()
result.tail()

| | Date | `$FutureFlag` | `$TS-Open` | `$TSLCI-Open` | `$TSUCI-Open` | `$TS-Close` | `$TSLCI-Close` | `$TSUCI-Close` | Date.1 | Open | High | Low | Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 244 | 2017-11-27 | 1 | 159.390355 | 84.484369 | 235.479376 | 161.809966 | 87.133165 | 237.702993 | 2017-04-04 | 143.25 | 144.89 | 143.17 | 144.77 |
| 245 | 2017-11-28 | 1 | 159.470529 | 84.395456 | 235.727459 | 161.887771 | 87.043626 | 237.947508 | 2017-04-03 | 143.71 | 144.12 | 143.05 | 143.7 |
| 246 | 2017-11-29 | 1 | 159.550744 | 84.306819 | 235.975295 | 161.965614 | 86.954362 | 238.191776 | 2017-03-31 | 143.72 | 144.27 | 143.01 | 143.66 |
| 247 | 2017-11-30 | 1 | 159.630998 | 84.218455 | 236.222885 | 162.043494 | 86.865371 | 238.435797 | 2017-03-30 | 144.19 | 144.5 | 143.5 | 143.93 |
| 248 | 2017-12-01 | 1 | 159.711293 | 84.130363 | 236.470232 | 162.121412 | 86.776651 | 238.679573 | 2017-03-29 | 143.68 | 144.49 | 143.19 | 144.12 |
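
Note that `pd.concat(..., axis=1)` lines rows up by index position, not by date, which is why the `Date` and `Date.1` columns differ in the table above. If the intent is to compare each forecast with the observation for the same day, one option is to join the two frames on `Date`. The following is a minimal sketch on toy frames: the observed values are copied from the `AAPL_Test` sample, while the forecast values are made up for illustration.

```python
import pandas as pd

# Toy frames: forecasts in ascending date order (made-up values),
# observations in descending date order (as in the AAPL_Test sample)
forecast = pd.DataFrame({
    'Date': pd.to_datetime(['2017-03-29', '2017-03-30', '2017-03-31']),
    '$TS-Close': [144.0, 144.1, 144.2],
})
observed = pd.DataFrame({
    'Date': pd.to_datetime(['2017-03-31', '2017-03-30', '2017-03-29']),
    'Close': [143.66, 143.93, 144.12],
})

# Inner join on Date keeps only the days present in both frames
aligned = pd.merge(forecast, observed, on='Date', how='inner').sort_values('Date')
print(aligned)
```

With this alignment, each row holds the forecast and the observed value for the same calendar day, regardless of the order in which the source files list their rows.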


In the next cell, the mean absolute error and the mean absolute percentage error are calculated for the Open and Close values:

In [215]:
open_abs_errors = []
close_abs_errors = []
open_pct_errors = []
close_pct_errors = []

for index, row in result.iterrows():
    open_abs_errors.append(abs(row['Open']-row['$TS-Open']))
    close_abs_errors.append(abs(row['Close']-row['$TS-Close']))
    open_pct_errors.append((abs(row['Open']-row['$TS-Open']))/row['Open'])
    close_pct_errors.append((abs(row['Close']-row['$TS-Close']))/row['Close'])
    
mean_open_error = sum(open_abs_errors) / len(open_abs_errors)
mean_close_error = sum(close_abs_errors) / len(close_abs_errors)
mean_open_pct_error = sum(open_pct_errors) / len(open_pct_errors)
mean_close_pct_error = sum(close_pct_errors) / len(close_pct_errors)

print('Mean Errors in 1-Year Future Prediction:')
print('Analyzed Stock: AAPL (Apple Inc.)')
print('----------------------------------------')
print('Mean Open Value Error (USD): {} $'.format(round(mean_open_error, 3)))
print('Mean Close Value Error (USD): {} $'.format(round(mean_close_error, 3)))
print('Mean Open Value Error: {}%'.format(round(mean_open_pct_error*100, 3)))
print('Mean Close Value Error: {}%'.format(round(mean_close_pct_error*100, 3)))

Mean Errors in 1-Year Future Prediction:
Analyzed Stock: AAPL (Apple Inc.)
----------------------------------------
Mean Open Value Error (USD): 15.973 $
Mean Close Value Error (USD): 15.046 $
Mean Open Value Error: 9.653%
Mean Close Value Error: 9.166%
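
The two metrics computed above can also be wrapped in a small reusable function. The sketch below is one way to do it in plain Python; the function name `mean_errors` and the sample numbers are illustrative assumptions, not values from the notebook's data.

```python
def mean_errors(observed, forecast):
    """ Return (mean absolute error, mean absolute percentage error as a fraction) """
    pairs = list(zip(observed, forecast))
    if not pairs:
        raise ValueError('No data points to compare')
    abs_errors = [abs(o - f) for o, f in pairs]
    # Each percentage error is relative to the observed value
    pct_errors = [abs(o - f) / o for o, f in pairs]
    return sum(abs_errors) / len(pairs), sum(pct_errors) / len(pairs)

# Made-up values for illustration
mae, mape = mean_errors([100.0, 200.0], [110.0, 180.0])
print('MAE: {} USD, MAPE: {}%'.format(round(mae, 3), round(mape * 100, 3)))
```

The percentage error is undefined when an observed value is zero, which is not a concern for stock prices but matters if the function is reused elsewhere.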


### 4.3: Interacting with Complete Historic and Forecasted Data

In [156]:
# Figure
p = figure(width=1200, height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend_label='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend_label='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend_label='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend_label='Modeled Open Value Bounds')

p.line(df_train.Date, df_train['Open'], line_width=0.5, line_color="#0099ff", legend_label='Historic Open Data (Train Sample)')
p.line(df_train.Date, df_train['Close'], line_width=0.5, line_color="#ff6699", legend_label='Historic Close Data (Train Sample)')

recent_test = df_test[df_test['Date'] > datetime(2015,1,1)]  # Test sample rows after 2015-01-01
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend_label='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend_label='Forecasted Open Value')
p.line(recent_test.Date, recent_test.Close, line_width=0.5, line_color="#ff6699", legend_label='Historic Close Data (Test Sample)')
p.line(recent_test.Date, recent_test.Open, line_width=0.5, line_color="#0099ff", legend_label='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [158]:
show(p)

<hr>

This notebook and its source code are made available under the terms of the <a href = "https://github.com/vanderleipf/ibmdegla-ws-projects/blob/master/LICENSE">MIT License</a>.

<hr>

### Thank you for completing this journey!

Notebook created by: <a href = "https://www.linkedin.com/in/vanderleimpf87719/">Vanderlei Pereira</a>