<img src="https://i.imgur.com/hZgPddE.jpg" alt="IBMDBG"> <br>

# Introduction: Modeler Flow & Watson Machine Learning

In this Jupyter Notebook you'll learn how to use the Watson Machine Learning API that was automatically generated when the Watson Studio Modeler flow was deployed. You will also learn how to download files from IBM Cloud Object Storage and generate interactive visualizations using `Bokeh`. This Notebook is part of the series on stock market forecasting.

<hr>

# Table of Contents

#### 1. Downloading Files from IBM COS

#### 2. Reading CSV Files as Pandas Dataframes

#### 3. Using the Watson Machine Learning API
* 3.1. Preparing Input Data
* 3.2. Setting up WML Credentials
* 3.3. Making an API Call to WML
* 3.4. Parsing the WML Results

#### 4. Visualizing the Results

<hr>

# 1: Downloading Files from IBM COS

We will use the `ibm_boto3` library to communicate with IBM Cloud Object Storage.

In [21]:
from ibm_botocore.client import Config
import ibm_boto3

print('Packages imported.')

Packages imported.


Next, we define a function that authenticates with IBM COS and downloads a given file.

In [22]:
def download_file_from_cos(credentials, save_file_locally_as, target_file_name):
    """ Download a file from IBM COS """
    
    # Configure IBM COS API credentials
    cos = ibm_boto3.client(
        service_name='s3',
        ibm_api_key_id=credentials['IBM_API_KEY_ID'],
        ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
        ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
        config=Config(signature_version='oauth'),
        endpoint_url=credentials['ENDPOINT'])
    
    # Try to download the file; report the error instead of raising so the notebook keeps running
    try:
        cos.download_file(Bucket=credentials['BUCKET'], Key=target_file_name, Filename=save_file_locally_as)
    except Exception as e:
        print("Could not download '{}': {}: {}".format(target_file_name, type(e).__name__, e))
    else:
        print("'{}' file downloaded.".format(target_file_name))

Configure your IBM Cloud Object Storage credentials in the cell below.

You can find these credentials on the page of your Cloud Object Storage service instance in the IBM Cloud web console.

In [23]:
# Paste your IBM COS credentials here
cos_credentials = {
    'IAM_SERVICE_ID': '',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': '',
    'IBM_AUTH_ENDPOINT': '',
    'BUCKET': '',
}
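
Because every value in `cos_credentials` starts out as an empty string, a forgotten field would only surface later as an authentication or download error. A minimal sketch of a pre-flight check follows. The helper `missing_credentials` and the constant `REQUIRED_COS_KEYS` are hypothetical names introduced here for illustration; they are not part of `ibm_boto3`.

```python
# Hypothetical helper (not part of ibm_boto3): list credential fields that are still blank.
REQUIRED_COS_KEYS = ('IAM_SERVICE_ID', 'IBM_API_KEY_ID', 'ENDPOINT', 'IBM_AUTH_ENDPOINT', 'BUCKET')


def missing_credentials(credentials, required=REQUIRED_COS_KEYS):
    """Return the required keys that are absent or blank, in declaration order."""
    missing = []
    for key in required:
        value = credentials.get(key)
        # A usable value is a non-empty string once surrounding whitespace is removed
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


# Placeholder values for illustration only (assumptions, not real credentials)
example_credentials = {
    'IAM_SERVICE_ID': 'example-service-id',
    'IBM_API_KEY_ID': '',
    'ENDPOINT': 'https://example.invalid',
}
print(missing_credentials(example_credentials))
```

Running such a check before calling `download_file_from_cos` would point at the empty field directly, rather than leaving it to be inferred from the exception message.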

We use the previously defined function to download the `AAPL.csv` file.

In [24]:
download_file_from_cos(cos_credentials, 'AAPL.csv', 'AAPL.csv')

'AAPL.csv' file downloaded.


<hr>

# 2: Reading CSV Files as Pandas Dataframes

To generate an interactive graph using the `bokeh` library, we first need to load the data into a pandas dataframe.

In [25]:
import pandas as pd
import numpy as np
import os

print('Packages imported.')

Packages imported.


In [26]:
# Load the CSV file into a pandas dataframe and convert the Date column to datetime64
# (pd.datetime and the date_parser argument are deprecated or removed in recent pandas versions)
df = pd.read_csv('AAPL.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df.info())
df = df[['Date','Open','High','Low','Close']]
df.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9400 entries, 0 to 9399
Data columns (total 6 columns):
Unnamed: 0    9400 non-null int64
Date          9400 non-null datetime64[ns]
Open          9400 non-null float64
High          9400 non-null float64
Low           9400 non-null float64
Close         9400 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 440.7 KB
None


Unnamed: 0,Date,Open,High,Low,Close
9395,1980-12-18,26.63,26.75,26.63,26.63
9396,1980-12-17,25.87,26.0,25.87,25.87
9397,1980-12-16,25.37,25.37,25.25,25.25
9398,1980-12-15,27.38,27.38,27.25,27.25
9399,1980-12-12,28.75,28.87,28.75,28.75


The `AAPL.csv` file contains historical price data for Apple Inc. stock.

The data read as a Pandas dataframe will only be used in Section 4 of this Notebook, for visualization purposes.

<hr>

# 3: Using the Watson Machine Learning API

Previously, we trained a time series forecaster in Watson Modeler Flow and later deployed this forecaster in a Watson Machine Learning service instance.

Now, in this section, we will use the WML API to send new input data to our time series forecaster.

### 3.1: Preparing Input Data

As the cell below shows, the input data must be a `dict`, and its values must follow the same format as the CSV file used as the data source in the Modeler flow.

<img src="https://i.imgur.com/MbbgCni.png" alt="EX1"> <br>

The `payload_scoring` dict contains only a few data points, for demonstration purposes.

Remember that the `WIKI/TABLE` Quandl database only contains data up to 27 March 2018.

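The `COLUMN2`, `COLUMN3`, `COLUMN4`, `COLUMN5`, and `COLUMN6` fields correspond, in order, to the `Date`, `Open`, `High`, `Low`, and `Close` columns of the source CSV file. This matches the WML response shown in Section 3.3, where the returned `Close` value equals the `COLUMN6` value of each input row.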

In [27]:
# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload_scoring = {"fields": ["COLUMN2", "COLUMN3", "COLUMN4", "COLUMN5", "COLUMN6"],
                   "values": [['2018-03-27', 155.68, 155.15, 155.92, 155.34],
                              ['2018-03-28', 154.68, 154.15, 154.92, 154.34],
                              ['2018-03-29', 153.68, 153.15, 153.92, 153.34],
                              ['2018-03-30', 152.68, 152.15, 152.92, 152.34],
                              ['2018-03-31', 151.68, 151.15, 151.92, 151.34],
                              ['2018-04-01', 150.68, 150.15, 150.92, 150.34],
                              ['2018-04-02', 149.68, 151.15, 151.92, 151.34],
                              ['2018-04-03', 148.68, 152.15, 152.92, 152.34],
                              ['2018-04-04', 147.68, 153.15, 153.92, 153.34],
                              ['2018-04-05', 146.68, 154.15, 154.92, 154.34],
                              ['2018-04-06', 145.68, 155.15, 155.92, 155.34],
                              ['2018-04-07', 144.68, 156.15, 156.92, 156.34],
                              ['2018-04-08', 143.68, 157.15, 156.92, 157.34],
                              ['2018-04-09', 144.68, 156.15, 156.92, 158.34],
                              ['2018-04-10', 145.68, 155.15, 155.92, 159.34],
                              ['2018-04-11', 146.68, 154.15, 154.92, 154.34],
                              ['2018-04-12', 147.68, 153.15, 153.92, 153.34],
                              ['2018-04-13', 148.68, 152.15, 152.92, 152.34],
                              ['2018-04-14', 149.68, 151.15, 151.92, 151.34]]}
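
The payload pairs one `fields` list with many `values` rows, so each row needs exactly one entry per field. The sketch below builds a payload of that shape and rejects malformed rows before anything is sent. `build_payload` is a hypothetical helper, and the checks reflect what is convenient to catch locally; they are an assumption, not a documented WML requirement.

```python
from datetime import datetime


def build_payload(fields, rows, date_format='%Y-%m-%d'):
    """Assemble a {'fields': ..., 'values': ...} dict, validating each row first."""
    values = []
    for index, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError('row {} has {} values, expected {}'.format(index, len(row), len(fields)))
        # The first column holds the date string; strptime raises ValueError if it is malformed
        datetime.strptime(row[0], date_format)
        values.append(list(row))
    return {'fields': list(fields), 'values': values}


fields = ['COLUMN2', 'COLUMN3', 'COLUMN4', 'COLUMN5', 'COLUMN6']
payload = build_payload(fields, [['2018-03-27', 155.68, 155.15, 155.92, 155.34],
                                 ['2018-03-28', 154.68, 154.15, 154.92, 154.34]])
print(len(payload['values']), 'rows ready for scoring')
```

A row with a missing price or a date such as `27/03/2018` would raise a `ValueError` here instead of producing a harder-to-read error from the scoring endpoint.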

### 3.2: Setting up WML Credentials

Retrieve your `wml_service_credentials_username`, `wml_service_credentials_password`, and `wml_service_credentials_url` from the service credentials associated with your IBM Cloud Watson Machine Learning Service instance. 

This can be done by accessing the service instance in the IBM Cloud web console.

In [28]:
wml_credentials = {
    "password": "",
    "url": "",
    "username": ""
}

deployment_endpoint = ""
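
If you would rather not paste secrets into the notebook, one option is to read them from environment variables. The following is only a sketch: the function `load_wml_settings` and the variable names `WML_USERNAME`, `WML_PASSWORD`, `WML_URL`, and `WML_DEPLOYMENT_ENDPOINT` are hypothetical and would need to be set in your own environment.

```python
import os


def load_wml_settings(environ=None):
    """Read WML settings from a mapping of environment variables (names are hypothetical)."""
    if environ is None:
        environ = os.environ
    credentials = {
        'username': environ.get('WML_USERNAME', ''),
        'password': environ.get('WML_PASSWORD', ''),
        'url': environ.get('WML_URL', ''),
    }
    endpoint = environ.get('WML_DEPLOYMENT_ENDPOINT', '')
    return credentials, endpoint


# Demonstration with a plain dict standing in for os.environ (placeholder values only)
demo_env = {
    'WML_USERNAME': 'example-user',
    'WML_PASSWORD': 'example-password',
    'WML_URL': 'https://example.invalid',
    'WML_DEPLOYMENT_ENDPOINT': 'https://example.invalid/deployment',
}
demo_credentials, demo_endpoint = load_wml_settings(demo_env)
print(sorted(demo_credentials))
```

Calling `load_wml_settings()` without arguments reads from the real process environment, and the result has the same keys as the `wml_credentials` dict defined above.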

### 3.3: Making an API Call to WML

The next code cell uses `urllib3` and `requests` to communicate with the WML API. The `payload_scoring` dict is used as input.

In [42]:
import urllib3, requests, json


# Step 1: exchange the service credentials (HTTP Basic auth) for a token
headers = urllib3.util.make_headers(basic_auth='{username}:{password}'.format(username=wml_credentials['username'], password=wml_credentials['password']))
url = '{}/v3/identity/token'.format(wml_credentials['url'])
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early if authentication failed
mltoken = json.loads(response.text).get('token')

# Step 2: send the payload to the deployment endpoint, authenticating with the Bearer token
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + mltoken}

response_scoring = requests.post(deployment_endpoint, json=payload_scoring, headers=header)
print("Scoring finished")
print("Response type: {}".format(type(json.loads(response_scoring.text))))

Scoring finished
Response type: <class 'dict'>
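
The call above involves two different `Authorization` headers: a Basic header built from the username and password to obtain the token, and a Bearer header carrying that token for the scoring request. The standard-library sketch below shows roughly what each header contains. The helper names are hypothetical, and the exact header casing and encoding used by `urllib3` may differ.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header: base64 of 'username:password'."""
    raw = '{}:{}'.format(username, password).encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(raw).decode('ascii')}


def bearer_header(token):
    """Build the JSON request headers that carry a previously obtained token."""
    return {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}


# Placeholder values for illustration only
token_request_headers = basic_auth_header('example-user', 'example-password')
scoring_headers = bearer_header('example-token')
print(token_request_headers['Authorization'].split(' ')[0])
print(scoring_headers['Authorization'].split(' ')[0])
```

Base64 is an encoding, not encryption, so the Basic header should only ever be sent over HTTPS.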


In the next cell we print the WML response, which is also a `dict`.

In [30]:
data = json.loads(response_scoring.text)
print(data)

{'fields': ['Date', '$FutureFlag', 'Close', '$TS-Close', '$TSLCI-Close', '$TSUCI-Close', '$TSResidual-Close'], 'values': [['2018-03-27 00:00 AM UTC', 0, 155.34, 168.24935011849652, 163.26793474457358, 173.24097347698742, -0.07942181615950931], ['2018-03-28 00:00 AM UTC', 0, 154.34, 154.3392533442192, 149.76967890683088, 158.9181918160912, 0.0004138645171452111], ['2018-03-29 00:00 AM UTC', 0, 153.34, 153.65240406359464, 149.10316540499926, 158.2109650843174, -0.0016262299138271137], ['2018-03-30 00:00 AM UTC', 0, 152.34, 153.3901746847682, 148.8486999400201, 157.94095588172516, -0.006460944870352002], ['2018-03-31 00:00 AM UTC', 0, 151.34, 152.28598997248523, 147.77720719770636, 156.8040122066406, -0.005822277956358617], ['2018-04-01 00:00 AM UTC', 0, 150.34, 151.2838038170971, 146.80469310654, 155.77209318263058, -0.005849145729441344], ['2018-04-02 00:00 AM UTC', 0, 151.34, 150.28974021813116, 145.84006108452255, 154.74853769512583, 0.007372955631145285], ['2018-04-03 00:00 AM UTC', 

Before using `Bokeh` to interact with the data, we need to parse it into a pandas dataframe.

### 3.4: Parsing the WML Results

In [31]:
from datetime import datetime


# The response dict has two top-level keys: the column labels and the rows
columns = data['fields']
values = data['values']

def parse_dates(rows):
    """Convert the first item of each row, e.g. '2018-03-27 00:00 AM UTC', to a datetime (in place)."""
    for row in rows:
        date_part = row[0].split(" ")[0]
        row[0] = datetime.strptime(date_part, '%Y-%m-%d')

parse_dates(values)

The code cell above transformed the `dict` response into two lists: the labels (columns) and rows (values).
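
An alternative to keeping labels and rows in two parallel lists is to zip them into one record per row. The sketch below does this without modifying the response in place. `to_records` is a hypothetical helper, and `sample_response` is a trimmed copy of the response printed earlier, reduced to four fields and one row.

```python
from datetime import datetime


def to_records(response):
    """Turn a {'fields': [...], 'values': [[...], ...]} response into a list of dicts."""
    fields = response['fields']
    records = []
    for row in response['values']:
        record = dict(zip(fields, row))
        # Timestamps look like '2018-03-28 00:00 AM UTC'; keep only the date part
        date_part = record['Date'].split(' ')[0]
        record['Date'] = datetime.strptime(date_part, '%Y-%m-%d')
        records.append(record)
    return records


sample_response = {
    'fields': ['Date', '$FutureFlag', 'Close', '$TS-Close'],
    'values': [['2018-03-28 00:00 AM UTC', 0, 154.34, 154.3392533442192]],
}
records = to_records(sample_response)
print(records[0]['Date'].date(), records[0]['$TS-Close'])
```

Because each record is keyed by column label, the result does not depend on the order in which the response lists its fields.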

In [32]:
print(values)

[[datetime.datetime(2018, 3, 27, 0, 0), 0, 155.34, 168.24935011849652, 163.26793474457358, 173.24097347698742, -0.07942181615950931], [datetime.datetime(2018, 3, 28, 0, 0), 0, 154.34, 154.3392533442192, 149.76967890683088, 158.9181918160912, 0.0004138645171452111], [datetime.datetime(2018, 3, 29, 0, 0), 0, 153.34, 153.65240406359464, 149.10316540499926, 158.2109650843174, -0.0016262299138271137], [datetime.datetime(2018, 3, 30, 0, 0), 0, 152.34, 153.3901746847682, 148.8486999400201, 157.94095588172516, -0.006460944870352002], [datetime.datetime(2018, 3, 31, 0, 0), 0, 151.34, 152.28598997248523, 147.77720719770636, 156.8040122066406, -0.005822277956358617], [datetime.datetime(2018, 4, 1, 0, 0), 0, 150.34, 151.2838038170971, 146.80469310654, 155.77209318263058, -0.005849145729441344], [datetime.datetime(2018, 4, 2, 0, 0), 0, 151.34, 150.28974021813116, 145.84006108452255, 154.74853769512583, 0.007372955631145285], [datetime.datetime(2018, 4, 3, 0, 0), 0, 152.34, 151.41713483961024, 146.9

Next, we create a new pandas dataframe with the forecast data for Apple Inc. stock, retrieved from WML through the API.

In [33]:
ndf = pd.DataFrame.from_records(values, columns=columns)
print(df.info())  # note: this inspects the historical df; call ndf.info() to inspect the forecast dataframe
ndf.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9400 entries, 0 to 9399
Data columns (total 5 columns):
Date     9400 non-null datetime64[ns]
Open     9400 non-null float64
High     9400 non-null float64
Low      9400 non-null float64
Close    9400 non-null float64
dtypes: datetime64[ns](1), float64(4)
memory usage: 367.3 KB
None


Unnamed: 0,Date,$FutureFlag,Close,$TS-Close,$TSLCI-Close,$TSUCI-Close,$TSResidual-Close
379,2019-04-10,1,,187.297047,82.139857,293.07941,
380,2019-04-11,1,,187.394839,82.067515,293.337759,
381,2019-04-12,1,,187.492683,81.995336,293.595979,
382,2019-04-13,1,,187.590577,81.92332,293.85407,
383,2019-04-14,1,,187.688523,81.851466,294.112033,
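
In the outputs above, rows with `$FutureFlag` equal to 0 carry an observed `Close`, while rows flagged 1 have an empty `Close` and only modeled values. Assuming the flag is what separates history from forecast, the rows could be split as in the sketch below. `split_by_future_flag` is a hypothetical helper, and the two sample records are abbreviated from the outputs shown in this notebook.

```python
def split_by_future_flag(records, flag_key='$FutureFlag'):
    """Separate rows with an observed value (flag 0) from forecast-only rows (flag 1)."""
    observed = [r for r in records if r[flag_key] == 0]
    forecast = [r for r in records if r[flag_key] == 1]
    return observed, forecast


sample_records = [
    {'Date': '2018-03-28', '$FutureFlag': 0, 'Close': 154.34, '$TS-Close': 154.3392533442192},
    {'Date': '2019-04-14', '$FutureFlag': 1, 'Close': None, '$TS-Close': 187.688523},
]
observed, forecast = split_by_future_flag(sample_records)
print(len(observed), 'observed row(s),', len(forecast), 'forecast row(s)')
```

On the `ndf` dataframe, the equivalent selection would be a boolean mask such as `ndf[ndf['$FutureFlag'] == 1]`.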


<hr>

# 4: Visualizing the Results

In [34]:
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.embed import components
from bokeh.io import output_notebook

print('Packages imported.')

Packages imported.


In [35]:
# Load bokeh
output_notebook()

In [40]:
# Figure
p = figure(plot_width=1200, plot_height=550, title='Stock Price History', x_axis_type="datetime")

# Plot Lines (legend_label requires Bokeh 1.4 or later; older versions use legend= instead)
p.line(df.Date, df.Close, line_width=1, line_color="#0099ff", legend_label='Observed Close Value')
p.line(ndf.Date, ndf['$TS-Close'], line_width=2, line_color="#ff6699", legend_label='Modeled Close Value')
p.line(ndf.Date, ndf['$TSUCI-Close'], line_width=0.5, line_color="#000000", legend_label='Upper Estimate')
p.line(ndf.Date, ndf['$TSLCI-Close'], line_width=0.5, line_color="#000000", legend_label='Lower Estimate')

# Axis and Labels
p.legend.orientation = "vertical"
p.xaxis.axis_label = "Date"
p.xaxis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label_text_font_size = '16pt'
p.xaxis.major_label_text_font_size = '14pt'
p.yaxis.axis_label = "Value ($ USD)"
p.yaxis.axis_label_text_font_style = 'bold'
p.yaxis.axis_label_text_font_size = '16pt'
p.yaxis.major_label_text_font_size = '12pt'

In [41]:
show(p)
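
The plot draws `$TSLCI-Close` and `$TSUCI-Close` as lower and upper estimates around the modeled value. As a numeric complement to the chart, the sketch below lists observed closes that fall outside that band. `outside_interval` is a hypothetical helper, reading the two columns as interval bounds follows the legend labels used above, and the two sample rows are copied from the WML response printed in Section 3.3.

```python
def outside_interval(rows, value_key='Close', lower_key='$TSLCI-Close', upper_key='$TSUCI-Close'):
    """Return rows whose observed value lies outside the [lower, upper] band."""
    flagged = []
    for row in rows:
        value = row[value_key]
        if value is None:
            continue  # forecast-only rows have no observed value to compare
        if not (row[lower_key] <= value <= row[upper_key]):
            flagged.append(row)
    return flagged


sample_rows = [
    {'Date': '2018-03-27', 'Close': 155.34,
     '$TSLCI-Close': 163.26793474457358, '$TSUCI-Close': 173.24097347698742},
    {'Date': '2018-03-28', 'Close': 154.34,
     '$TSLCI-Close': 149.76967890683088, '$TSUCI-Close': 158.9181918160912},
]
for row in outside_interval(sample_rows):
    print(row['Date'], 'observed close is outside the estimated band')
```

In this sample only the first row is flagged, which is consistent with its much larger residual in the response.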

<hr>

This notebook and its source code are made available under the terms of the <a href = "https://github.com/vanderleipf/ibmdegla-ws-projects/blob/master/LICENSE">MIT License</a>.

<hr>

### Thank you for completing this journey!

Notebook created by: <a href = "https://www.linkedin.com/in/vanderleimpf87719/">Vanderlei Pereira</a>