<hr>

# PREDICTING THE STOCK MARKET WITH WATSON

## Part 2: Introduction

In this Jupyter Notebook you'll learn step-by-step how to use the Watson Machine Learning API that was automatically generated when the previously created WS Modeler flow was deployed. You will also learn how to download files from IBM Cloud Object Storage and generate interactive visualizations using `Bokeh`. 

This Notebook is part of the series on stock market forecasting.

## Table of Contents

#### 1. Using the Watson Machine Learning Python Client
* 1.1: Setting up WML Credentials and Client
* 1.2: Preparing Input Data
* 1.3: Making an API Call to WML
* 1.4: Parsing the WML Results

#### 2. Validating and Visualizing the Results
* 2.1: Plotting the Modeler Flow Forecasts
* 2.2: Validating Modeler Flow Forecasts with Observed Data
* 2.3: Interacting with Complete Historic and Forecasted Data

<hr>

# 1: Using the Watson Machine Learning Python Client

Previously, we trained a time series forecaster in Watson Modeler Flow and later deployed this forecaster in a Watson Machine Learning service instance.

Now, in this section, we will use the WML API to send new input data to our time series forecaster.

### 1.1: Setting up WML Credentials and Client

Go to the IBM Cloud portal and access your Watson Machine Learning instance.

Copy your credentials into the variable in the next cell, as shown.

In [68]:
wml_credentials = {
    "apikey": "",
    "iam_apikey_description": "",
    "iam_apikey_name": "",
    "iam_role_crn": "",
    "iam_serviceid_crn": "",
    "instance_id": "",
    "url": ""
}

Now, paste the `scoring_endpoint` link you copied in the first part of this code pattern into the variable below.

In [69]:
scoring_endpoint = ""

In [70]:
# Installing the Watson Machine Learning Python Client
!pip install watson-machine-learning-client



In [71]:
# Create a WML client
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_client = WatsonMachineLearningAPIClient(wml_credentials)

### 1.2: Preparing Input Data

The `payload` dict contains only a few data points for demonstration.

This payload must be a `dict` with the same structure as the CSV file prepared with Data Refinery in Watson Studio: a list of column names under `fields` and a list of rows under `values`.

Remember that the `WIKI/TABLE` Quandl database only contains data up to 27 March 2018.

In [72]:
# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload = {
    "fields": ["Date", "Open", "High", "Low", "Close"],
    "values": [['2017-03-28', 140.91, 144.04, 140.62, 143.80]]
}
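For more than a handful of rows, the payload can be built from CSV text instead of typed by hand. The following is a minimal sketch, assuming the CSV has the same five columns as the payload above; `csv_text`, `build_payload` and `example_payload` are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical CSV text standing in for the file prepared with Data Refinery
csv_text = """Date,Open,High,Low,Close
2017-03-28,140.91,144.04,140.62,143.80
"""

def build_payload(text):
    """Turn CSV text into a {"fields": [...], "values": [[...], ...]} dict."""
    reader = csv.reader(io.StringIO(text))
    fields = next(reader)  # the header row becomes the list of field names
    # Keep the date as a string and convert the price columns to floats
    values = [[row[0]] + [float(x) for x in row[1:]] for row in reader if row]
    return {"fields": fields, "values": values}

example_payload = build_payload(csv_text)
print(example_payload)
```

The resulting dict has the same shape as the hand-written `payload`, so it could be passed to the scoring call in the same way.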

### 1.3: Making an API Call to WML

The next code cell executes an HTTP request to WML with the payload as input.

In [73]:
data = wml_client.deployments.score(scoring_endpoint, payload)

Before using `Bokeh` to interact with the data, we need to parse it into a Pandas DataFrame.

### 1.4: Parsing the WML Results

In [74]:
from datetime import datetime


def parse(dic):
    # Recursively flatten a nested dict into [key, ..., value] lists
    for k, v in dic.items():
        if isinstance(v, dict):
            for p in parse(v):
                yield [k] + p
        else:
            yield [k, v]

lst = list(parse(data))
columns = lst[0][1]
values = lst[1][1]

def parse_dates(values):
    # Convert the leading date string of each row to a datetime, in place
    for k in values:
        string_lst = k[0].split(" ")
        k[0] = datetime.strptime(string_lst[0], '%Y-%m-%d')

parse_dates(values)

The code cell above transforms the `dict` response into two lists, the column labels (`columns`) and the rows (`values`), and converts the date string at the start of each row into a `datetime` object.
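To see the transformation in isolation, here is a minimal sketch on a mock response. It assumes the response is a flat dict with `fields` and `values` keys and that dates arrive as strings such as `"2017-03-28 00:00:00"`; `mock_response` and `to_rows` are hypothetical names, and the numbers are made up.

```python
from datetime import datetime

# Hypothetical response with the same top-level keys as the scoring output
mock_response = {
    "fields": ["Date", "$FutureFlag", "$TS-Open"],
    "values": [
        ["2017-03-28 00:00:00", 0, 140.83],
        ["2017-03-29 00:00:00", 1, 141.11],
    ],
}

def to_rows(response):
    """Return (columns, rows), with each row's date parsed into a datetime."""
    columns = response["fields"]
    rows = []
    for raw in response["values"]:
        date_part = raw[0].split(" ")[0]  # drop the time component, if any
        rows.append([datetime.strptime(date_part, "%Y-%m-%d")] + list(raw[1:]))
    return columns, rows

mock_columns, mock_rows = to_rows(mock_response)
print(mock_columns)
print(mock_rows[0])
```

Unlike the in-place conversion in the cell above, this version builds new rows and leaves the response untouched, which makes it safe to run more than once.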

In [75]:
print(values)

[[datetime.datetime(2017, 3, 28, 0, 0), 0, 140.91, 140.82539373095153, 136.6309523505463, 145.0284770664795, 0.0010145740497017927, 143.8, 144.00336946752495, 139.7742411831146, 148.24108948359182, -0.0010107994431093037], [datetime.datetime(2017, 3, 29, 0, 0), 1, None, 141.10919783105598, 136.90630343212996, 145.32075160115616, None, None, 144.00511873258978, 139.77593907530957, 148.24289022589366, None], [datetime.datetime(2017, 3, 30, 0, 0), 1, None, 141.18627102666295, 135.24333527106683, 147.1464980439679, None, None, 144.04994742491166, 137.9246617297609, 150.1932359562979, None], [datetime.datetime(2017, 3, 31, 0, 0), 1, None, 141.2159640111935, 133.82205404123994, 148.63660535197235, None, None, 144.10729576229068, 136.4159416345517, 151.82699131245747, None], [datetime.datetime(2017, 4, 1, 0, 0), 1, None, 141.24549032422127, 132.64336957608845, 149.88374194101928, None, None, 144.17784022642883, 135.19729322758897, 153.19695762467887, None], [datetime.datetime(2017, 4, 2, 0, 0

Next, we create a new Pandas DataFrame with the forecasted data for Apple Inc. stock, retrieved from WML through the API.

In [76]:
import pandas as pd

ndf = pd.DataFrame.from_records(values, columns=columns)
print(ndf.info())
ndf.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 12 columns):
Date                 366 non-null datetime64[ns]
$FutureFlag          366 non-null int64
Open                 1 non-null float64
$TS-Open             366 non-null float64
$TSLCI-Open          366 non-null float64
$TSUCI-Open          366 non-null float64
$TSResidual-Open     1 non-null float64
Close                1 non-null float64
$TS-Close            366 non-null float64
$TSLCI-Close         366 non-null float64
$TSUCI-Close         366 non-null float64
$TSResidual-Close    1 non-null float64
dtypes: datetime64[ns](1), float64(10), int64(1)
memory usage: 34.4 KB
None


Unnamed: 0,Date,$FutureFlag,Open,$TS-Open,$TSLCI-Open,$TSUCI-Open,$TSResidual-Open,Close,$TS-Close,$TSLCI-Close,$TSUCI-Close,$TSResidual-Close
361,2018-03-24,1,,169.049729,75.589449,263.189197,,,171.171835,78.162182,264.985106,
362,2018-03-25,1,,169.134762,75.524058,263.417076,,,171.254142,78.09613,265.209224,
363,2018-03-26,1,,169.219838,75.458813,263.644836,,,171.336489,78.030225,265.433221,
364,2018-03-27,1,,169.304956,75.393715,263.872478,,,171.418875,77.964466,265.657099,
365,2018-03-28,1,,169.390117,75.328763,264.100004,,,171.501301,77.898854,265.880858,


<hr>

# 2: Validating and Visualizing the Results

In [77]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.embed import components
from bokeh.io import output_notebook

print('Packages imported.')

Packages imported.


In [78]:
# Load bokeh
output_notebook()

### 2.1: Plotting the Modeler Flow Forecasts

In [79]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Modeled Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Modeled Open Value')
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [80]:
show(p)

### 2.2: Validating Modeler Flow Forecasts with Observed Data

Click the `0100` button in the top right corner of Watson Studio, select the `AAPL.csv_shaped.csv` file, and then choose Insert to code -> Insert pandas DataFrame.

In [81]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_a4bf3cc686cc4ca6a966bdc9f46ae2b8 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='FgBUyGAWCMntsRuRqPvme8kHGRDJEnnx5W_yfnEohzFC',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_a4bf3cc686cc4ca6a966bdc9f46ae2b8.get_object(Bucket='watsonstockmarketpredictor-donotdelete-pr-gfd7hukq2auktz',Key='data_asset/AAPL.csv_shaped_531cec44.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.tail()

Unnamed: 0,Date,Open,High,Low,Close
9395,1981-01-05,33.87,33.87,33.75,33.75
9396,1980-12-30,35.25,35.25,35.12,35.12
9397,1980-12-24,32.5,32.63,32.5,32.5
9398,1980-12-19,28.25,28.38,28.25,28.25
9399,1980-12-16,25.37,25.37,25.25,25.25


In [82]:
# Now we split the full historical data into a train and a test dataset:
split_date = pd.Timestamp(2015, 1, 2)
# Parse the Date strings once so both subsets can be compared, sorted and plotted as datetimes
df_dated = df_data_1.assign(Date=pd.to_datetime(df_data_1['Date'], format='%Y-%m-%d'))
df_train = df_dated[df_dated['Date'] < split_date].sort_values(['Date'])
df_test = df_dated[df_dated['Date'] > split_date].sort_values(['Date'])
df_test.tail()

Unnamed: 0,Date,Open,High,Low,Close
6268,2018-03-21,175.04,175.09,171.26,171.27
3134,2018-03-22,170.0,172.68,168.6,168.845
0,2018-03-23,168.39,169.92,164.94,164.94
6267,2018-03-26,168.07,173.1,166.44,172.77
3133,2018-03-27,173.68,175.15,166.92,168.34
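A date-based split with strict comparisons on both sides leaves the split date itself out of both subsets. The sketch below illustrates this with plain Python on a few made-up rows; `rows`, `train_rows` and `test_rows` are hypothetical names and the prices are not real quotes.

```python
from datetime import date

split = date(2015, 1, 2)

# Hypothetical (date, close) rows, newest first like the CSV above
rows = [
    (date(2015, 1, 5), 106.0),
    (date(2015, 1, 2), 109.0),
    (date(2014, 12, 31), 110.0),
]

# Strict comparisons: a row dated exactly on the split date lands in neither subset
train_rows = sorted(r for r in rows if r[0] < split)
test_rows = sorted(r for r in rows if r[0] > split)

print(train_rows)
print(test_rows)
```

If the split date should belong to one of the subsets, one of the comparisons can be made inclusive (`<=` or `>=`).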


In [83]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Forecasted Open Value')
p.line(df_test.Date, df_test.Close, line_width=0.5, line_color="#ff6699", legend='Historic Close Data (Test Sample)')
p.line(df_test.Date, df_test.Open, line_width=0.5, line_color="#0099ff", legend='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [84]:
show(p)

In [85]:
ndf_filtered = ndf.drop(['Close', 'Open', '$TSResidual-Open', '$TSResidual-Close'], axis=1)

result = pd.concat([ndf_filtered, df_test], axis=1).dropna()
result = result.loc[:,~result.columns.duplicated()]
result = result.sort_values(['Date'])
result.tail()

Unnamed: 0,Date,$FutureFlag,$TS-Open,$TSLCI-Open,$TSUCI-Open,$TS-Close,$TSLCI-Close,$TSUCI-Close,Open,High,Low,Close
265,2017-12-18,1.0,161.082505,82.672737,240.639505,163.451762,85.308245,242.787952,112.3,113.75,111.53,112.98
266,2017-12-19,1.0,161.16353,82.589253,240.882752,163.530357,85.224121,243.027604,107.84,108.9667,106.5,108.72
267,2017-12-20,1.0,161.244596,82.50601,241.125785,163.60899,85.140239,243.267042,109.04,110.49,108.5,109.8
268,2017-12-21,1.0,161.325703,82.423007,241.368607,163.68766,85.056595,243.506267,112.67,113.25,110.21,112.01
269,2017-12-22,1.0,161.40685,82.340243,241.611219,163.766369,84.97319,243.74528,106.54,107.43,104.63,106.26
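Note that `pd.concat(..., axis=1)` aligns rows by index label rather than by the `Date` column, so a forecast row and an observed row are paired whenever they share a label, whatever their dates. In the table above, forecasts for December 2017 sit next to observed prices near 110 USD, which suggests that the paired rows come from different dates. An alternative is to join on the date itself, which in pandas would be `ndf_filtered.merge(df_test, on='Date')`. The idea can be sketched in plain Python as follows; all names and values here are hypothetical.

```python
from datetime import date

# Hypothetical forecast rows: (date, forecasted close), one per calendar day
forecast_rows = [
    (date(2017, 3, 29), 144.0),
    (date(2017, 3, 30), 144.05),
    (date(2017, 4, 1), 144.18),
]
# Hypothetical observed rows, in a different order and without weekend dates
observed_rows = [
    (date(2017, 3, 30), 143.93),
    (date(2017, 3, 29), 144.12),
]

observed_by_date = dict(observed_rows)

# Inner join on the date key: keep only the dates present in both datasets
joined = [
    (d, predicted, observed_by_date[d])
    for d, predicted in forecast_rows
    if d in observed_by_date
]
for row in joined:
    print(row)
```

Joining on the date also drops forecast days without trading data, such as weekends, without relying on `dropna()`.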


In the next cell, the mean absolute errors are calculated, both in USD and as a percentage of the observed value:

In [86]:
open_abs_errors = []
close_abs_errors = []
open_pct_errors = []
close_pct_errors = []

for index, row in result.iterrows():
    open_abs_errors.append(abs(row['Open']-row['$TS-Open']))
    close_abs_errors.append(abs(row['Close']-row['$TS-Close']))
    open_pct_errors.append((abs(row['Open']-row['$TS-Open']))/row['Open'])
    close_pct_errors.append((abs(row['Close']-row['$TS-Close']))/row['Close'])
    
mean_open_error = sum(open_abs_errors) / len(open_abs_errors)
mean_close_error = sum(close_abs_errors) / len(close_abs_errors)
mean_open_pct_error = sum(open_pct_errors) / len(open_pct_errors)
mean_close_pct_error = sum(close_pct_errors) / len(close_pct_errors)

print('Mean Errors in 1-Year Future Prediction:')
print('Analyzed Stock: AAPL (Apple Inc.)')
print('----------------------------------------')
print('Mean Open Value Error (USD): {} $'.format(round(mean_open_error, 3)))
print('Mean Close Value Error (USD): {} $'.format(round(mean_close_error, 3)))
print('Mean Open Value Error: {}%'.format(round(mean_open_pct_error*100, 3)))
print('Mean Close Value Error: {}%'.format(round(mean_close_pct_error*100, 3)))

Mean Errors in 1-Year Future Prediction:
Analyzed Stock: AAPL (Apple Inc.)
----------------------------------------
Mean Open Value Error (USD): 32.791 $
Mean Close Value Error (USD): 34.041 $
Mean Open Value Error: 28.284%
Mean Close Value Error: 29.539%
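The two metrics computed above are the mean absolute error and the mean absolute percentage error. As a minimal sketch, they can be written as small reusable functions; the function names and the sample values below are hypothetical and only serve to check the arithmetic.

```python
def mean_absolute_error(observed, forecast):
    """Average of |observed - forecast|, in the units of the data."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)

def mean_absolute_percentage_error(observed, forecast):
    """Average of |observed - forecast| / |observed|, expressed in percent."""
    ratios = [abs(o - f) / abs(o) for o, f in zip(observed, forecast)]
    return 100 * sum(ratios) / len(ratios)

# Made-up values chosen so the results are easy to verify by hand
sample_observed = [100.0, 200.0]
sample_forecast = [125.0, 150.0]

mae = mean_absolute_error(sample_observed, sample_forecast)
mape = mean_absolute_percentage_error(sample_observed, sample_forecast)
print(mae)   # (25 + 50) / 2 = 37.5
print(mape)  # (25% + 25%) / 2 = 25.0
```

Both functions assume equally long, non-empty inputs and non-zero observed values.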


### 2.3: Interacting with Complete Historic and Forecasted Data

In [87]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')

p.line(df_train.Date, df_train['Open'], line_width=0.5, line_color="#0099ff", legend='Historic Open Data (Train Sample)')
p.line(df_train.Date, df_train['Close'], line_width=0.5, line_color="#ff6699", legend='Historic Close Data (Train Sample)')

p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Forecasted Open Value')
p.line(df_test['Date'], df_test['Close'], line_width=0.5, line_color="#ff6699", legend='Historic Close Data (Test Sample)')
p.line(df_test['Date'], df_test['Open'], line_width=0.5, line_color="#0099ff", legend='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [88]:
show(p)

<hr>

This notebook and its source code are made available under the terms of the <a href = "https://github.com/IBM/watson-stock-market-predictor/blob/master/LICENSE">Apache License 2.0</a>.

<hr>

### Thank you for completing this journey!