<hr>

# PREDICTING THE STOCK MARKET WITH WATSON

## Part 2: Introduction

In this Jupyter Notebook, you'll learn step by step how to use the Watson Machine Learning (WML) API that was generated automatically when the Watson Studio Modeler flow created earlier was deployed. You will also learn how to download files from IBM Cloud Object Storage and generate interactive visualizations with `Bokeh`.

This Notebook is part of the series on stock market forecasting.

## Table of Contents

#### 1. Using the Watson Machine Learning Python Client
* 1.1: Setting up WML Credentials and Client
* 1.2: Preparing Input Data
* 1.3: Making an API Call to WML
* 1.4: Parsing the WML Results

#### 2. Validating and Visualizing the Results
* 2.1: Plotting the Modeler Flow Forecasts
* 2.2: Validating Modeler Flow Forecasts with Observed Data
* 2.3: Interacting with Complete Historic and Forecasted Data

<hr>

# 1: Using the Watson Machine Learning Python Client

Previously, we trained a time series forecaster in Watson Modeler Flow and later deployed this forecaster in a Watson Machine Learning service instance.

Now, in this section, we will use the WML API to send new input data to our time series forecaster.

### 1.1: Setting up WML Credentials and Client

Go to the IBM Cloud portal and open your Watson Machine Learning instance.

Copy your credentials into the `wml_credentials` variable below, as shown.

In [None]:
from ibm_watson_machine_learning import APIClient
print('Packages imported.')

In [None]:
wml_credentials = {
    "apikey": "",
    "url": ""
}

In [None]:
client = APIClient(wml_credentials)

Now, paste the `scoring_endpoint` link that you copied in the first part of this code pattern into the variable below.

In [None]:
scoring_endpoint = ""

Now, paste the Space ID, found under Settings of the Deployment Space for this Watson Machine Learning service, into the variable below.

In [None]:
space_uid = ""

### 1.2: Preparing Input Data

The `payload` dict contains only a few data points, for demonstration.

This payload must be a `dict` with the same structure as the CSV file prepared with Data Refinery in Watson Studio.

Remember that the `WIKI/TABLE` Quandl database only contains data up to March 27, 2018.

In [None]:
# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload = {
    "fields": ["Date", "Open", "High", "Low", "Close"],
    "values": [['2017-03-28', 140.91, 144.04, 140.62, 143.80]]
}
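When the rows to score already live in a pandas DataFrame, the same `fields`/`values` layout can be derived from it instead of being typed by hand. The following is a minimal sketch; the helper name `to_payload` and the variable `sample_df` are hypothetical and not part of the original notebook. The single row reuses the values from the payload above.

```python
import pandas as pd

# One row, copied from the hand-written payload above.
sample_df = pd.DataFrame({
    "Date": ["2017-03-28"],
    "Open": [140.91],
    "High": [144.04],
    "Low": [140.62],
    "Close": [143.80],
})


def to_payload(df):
    """Convert a DataFrame into the {"fields": [...], "values": [...]} layout."""
    return {
        "fields": df.columns.tolist(),   # column names, in order
        "values": df.values.tolist(),    # one inner list per row
    }


sample_payload = to_payload(sample_df)
print(sample_payload)
```

The column order of the DataFrame determines the order of `fields`, so it should match the order the deployed flow expects.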

### 1.3: Making an API Call to WML

The next code cell sends an HTTP request to WML with the payload as input.

In [None]:
# Use the `client` object created in section 1.1
data = client.deployments.score(scoring_endpoint, payload)
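Depending on the installed version of `ibm_watson_machine_learning`, `deployments.score` may expect a deployment ID and a request whose `input_data` key holds a list of `fields`/`values` dicts, rather than a scoring URL and a bare payload. The sketch below only builds such a request and does not contact the service. The helper `build_scoring_request` and the `deployment_id` variable in the comments are assumptions for illustration, and the response layout returned by such a call may differ from what the parsing code in section 1.4 expects.

```python
def build_scoring_request(payload):
    """Wrap a fields/values payload in an "input_data" envelope."""
    return {"input_data": [payload]}


# Same row as the payload defined in section 1.2.
example_payload = {
    "fields": ["Date", "Open", "High", "Low", "Close"],
    "values": [["2017-03-28", 140.91, 144.04, 140.62, 143.80]],
}

scoring_request = build_scoring_request(example_payload)
print(scoring_request)

# With valid credentials, the call could then look like this (not executed here;
# `deployment_id` is a hypothetical variable holding the ID of your deployment):
# client.set.default_space(space_uid)
# data = client.deployments.score(deployment_id, scoring_request)
```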

Before using `Bokeh` to interact with the data, we need to parse it into a pandas DataFrame.

### 1.4: Parsing the WML Results

In [None]:
from datetime import datetime


def parse(dic):
    for k, v in dic.items():
        if isinstance(v, dict):
            for p in parse(v):
                yield [k] + p
        else:
            yield [k, v]

lst = list(parse(data))
columns = lst[0][1]
values = lst[1][1]

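def parse_dates(rows):
    # Convert the leading date string of each row to a datetime, in place,
    # dropping any time component that follows the space.
    for row in rows:
        date_part = row[0].split(" ")[0]
        row[0] = datetime.strptime(date_part, '%Y-%m-%d')

parse_dates(values)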

The code cell above transformed the `dict` response into two lists: the labels (columns) and rows (values).

In [None]:
print(values)

Next, we create a new pandas DataFrame with the forecast data for Apple Inc. stock, retrieved from WML through the API.

In [None]:
import pandas as pd

ndf = pd.DataFrame.from_records(values, columns=columns)
print(ndf.info())
ndf.tail()
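The parsing and DataFrame steps above can also be sketched with pandas alone. This sketch assumes a flat response with top-level `fields` and `values` keys and date strings that carry a time component after a space, which is what the parsing cell implies. The `mock_response` dict and its numbers are hypothetical placeholders, not real WML output.

```python
import pandas as pd

# Hypothetical response shaped like the one the parsing cell assumes.
mock_response = {
    "fields": ["Date", "Close", "$TS-Close"],
    "values": [
        ["2017-03-28 00:00:00", 143.80, 143.50],
        ["2017-03-29 00:00:00", 144.12, 143.90],
    ],
}

# Build the DataFrame directly from the rows and column labels.
mock_df = pd.DataFrame(mock_response["values"], columns=mock_response["fields"])

# Keep only the date part of each string, then convert the column to datetimes.
date_part = mock_df["Date"].str.split(" ").str[0]
mock_df["Date"] = pd.to_datetime(date_part, format="%Y-%m-%d")

print(mock_df.dtypes)
```

Converting the whole column at once avoids mutating the nested lists in place, at the cost of assuming every row uses the same date format.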

<hr>

# 2: Validating and Visualizing the Results

In [None]:
!pip install --user bokeh==1.0.4 --upgrade

In [None]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.embed import components
from bokeh.io import output_notebook

print('Packages imported.')

In [None]:
# Load bokeh
output_notebook()

### 2.1: Plotting the Modeler Flow Forecasts

In [None]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Modeled Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Modeled Open Value')
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [None]:
show(p)

### 2.2: Validating Modeler Flow Forecasts with Observed Data

Click the `0100` button in the top right corner of Watson Studio, select the `AAPL.csv_shaped.csv` file, and then choose Insert to code -> Insert pandas DataFrame.

In [None]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_a4bf3cc686cc4ca6a966bdc9f46ae2b8 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='FgBUyGAWCMntsRuRqPvme8kHGRDJEnnx5W_yfnEohzFC',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_a4bf3cc686cc4ca6a966bdc9f46ae2b8.get_object(Bucket='watsonstockmarketpredictor-donotdelete-pr-gfd7hukq2auktz',Key='data_asset/AAPL.csv_shaped_531cec44.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.tail()

In [None]:
# Keep only the historical data after the split date as the test dataset:
split_date = pd.Timestamp("2015-01-02")
#df_train = df_data_1[(pd.to_datetime(df_data_1["Date"]) < split_date)]
# .copy() avoids a SettingWithCopyWarning when the Date column is reassigned below
df_test = df_data_1[(pd.to_datetime(df_data_1["Date"]) > split_date)].copy()
df_test['Date'] = pd.to_datetime(df_test['Date'], format='%Y-%m-%d')
df_test = df_test.sort_values(['Date'])
df_test.tail()

In [None]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Forecasted Open Value')
p.line(df_test.Date, df_test.Close, line_width=0.5, line_color="#ff6699", legend='Historic Close Data (Test Sample)')
p.line(df_test.Date, df_test.Open, line_width=0.5, line_color="#0099ff", legend='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [None]:
show(p)

In [None]:
ndf_filtered = ndf.drop(['Close', 'Open', '$TSResidual-Open', '$TSResidual-Close'], axis=1)

result = pd.concat([ndf_filtered, df_test], axis=1).dropna()
result = result.loc[:,~result.columns.duplicated()]
result = result.sort_values(['Date'])
result.tail()
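Note that `pd.concat(..., axis=1)` aligns rows by index label, not by `Date`, so the cell above relies on both DataFrames sharing index labels for the same trading days. A possible alternative, sketched here on small made-up frames (`forecast`, `observed`, and all values are hypothetical), is to join explicitly on the `Date` column:

```python
import pandas as pd

# Toy forecast data with a default 0..2 index.
forecast = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-28", "2017-03-29", "2017-03-30"]),
    "$TS-Close": [143.0, 144.0, 145.0],
})

# Toy observed data with a different index, as happens after filtering a larger frame.
observed = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2017-03-29", "2017-03-30", "2017-03-31"]),
        "Close": [144.5, 143.5, 146.0],
    },
    index=[10, 11, 12],
)

# Index-based alignment: no index labels overlap, so every row contains NaN.
by_index = pd.concat([forecast, observed], axis=1).dropna()

# Key-based alignment: keep only the dates present in both frames.
merged = forecast.merge(observed, on="Date", how="inner").sort_values("Date")

print(len(by_index), len(merged))
```

With the inner join, a date missing from either frame is dropped rather than paired with an unrelated row.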

In the next cell, the mean absolute error and the mean absolute percentage error are calculated for the open and close values:

In [None]:
open_abs_errors = []
close_abs_errors = []
open_pct_errors = []
close_pct_errors = []

for index, row in result.iterrows():
    open_abs_errors.append(abs(row['Open']-row['$TS-Open']))
    close_abs_errors.append(abs(row['Close']-row['$TS-Close']))
    open_pct_errors.append((abs(row['Open']-row['$TS-Open']))/row['Open'])
    close_pct_errors.append((abs(row['Close']-row['$TS-Close']))/row['Close'])
    
mean_open_error = sum(open_abs_errors) / len(open_abs_errors)
mean_close_error = sum(close_abs_errors) / len(close_abs_errors)
mean_open_pct_error = sum(open_pct_errors) / len(open_pct_errors)
mean_close_pct_error = sum(close_pct_errors) / len(close_pct_errors)

print('Mean Errors in 1-Year Future Prediction:')
print('Analyzed Stock: AAPL (Apple Inc.)')
print('----------------------------------------')
print('Mean Open Value Error (USD): {} $'.format(round(mean_open_error, 3)))
print('Mean Close Value Error (USD): {} $'.format(round(mean_close_error, 3)))
print('Mean Open Value Error: {}%'.format(round(mean_open_pct_error*100, 3)))
print('Mean Close Value Error: {}%'.format(round(mean_close_pct_error*100, 3)))
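The same two metrics can be computed without an explicit loop by using pandas column arithmetic. The sketch below runs on a tiny made-up frame (`toy` and its values are hypothetical) so that the arithmetic is easy to follow by hand:

```python
import pandas as pd

# Two made-up rows: observed close values and their forecasts.
toy = pd.DataFrame({
    "Close": [100.0, 200.0],
    "$TS-Close": [110.0, 190.0],
})

# Absolute error per row: |observed - forecast|
abs_err = (toy["Close"] - toy["$TS-Close"]).abs()

# Mean absolute error, in USD: (10 + 10) / 2
mae = abs_err.mean()

# Mean absolute percentage error, as a fraction: (10/100 + 10/200) / 2
mape = (abs_err / toy["Close"]).mean()

print(round(mae, 3), round(mape * 100, 3))
```

Applied to the notebook's data, `toy` would be replaced by `result`, and the same expressions repeated for the `Open` and `$TS-Open` columns.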

### 2.3: Interacting with Complete Historic and Forecasted Data

In [None]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Historic and Predicted Stock Value Data', x_axis_type="datetime")

# Plot Lines
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#ff6699", legend='Modeled Close Value Bounds')
p.line(ndf.Date, ndf['$TSLCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')
p.line(ndf.Date, ndf['$TSUCI-Open'], line_width=0.5, line_color="#0099ff", legend='Modeled Open Value Bounds')

p.line(ndf.Date, ndf['$TS-Close'], line_width=3, line_color="#ff6699", legend='Forecasted Close Value')
p.line(ndf.Date, ndf['$TS-Open'], line_width=3, line_color="#0099ff", legend='Forecasted Open Value')
p.line(df_test['Date'], df_test['Close'], line_width=0.5, line_color="#ff6699", legend='Historic Close Data (Test Sample)')
p.line(df_test['Date'], df_test['Open'], line_width=0.5, line_color="#0099ff", legend='Historic Open Data (Test Sample)')

# Axis and Labels
p.legend.orientation = "vertical"
p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [None]:
show(p)

<hr>

This notebook and its source code are made available under the terms of the <a href="https://github.com/IBM/watson-stock-market-predictor/blob/master/LICENSE">Apache License 2.0</a>.

<hr>

### Thank you for completing this journey!