# Build a product recommendation engine

![](https://raw.githubusercontent.com/IBM/product-recommendation-with-watson-ml/master/doc/source/images/shopping.png)

This notebook contains steps and code to create a recommendation engine based on shopping history and deploy that model to Watson Machine Learning. This notebook runs on Python 3.x with Apache Spark 2.3.

## Learning Goals

The learning goals of this notebook are:

* Load a CSV file into the Object Storage service linked to your Watson Studio project
* Use the *k*-means algorithm, which is useful for cluster analysis in data mining, to segment customers into clusters for the purpose of making an in-store purchase recommendation
* Deploy the model to the IBM Watson Machine Learning service in IBM Cloud

## Table of contents

1. [Setup](#setup)<br>
2. [Load and explore data](#load)<br>
3. [Create a *k*-means model with Spark](#kmeans)<br>
   3.1. [Prepare data](#prepare_data)<br>
   3.2. [Create clusters and define the model](#build_model)<br>
4. [Persist the model](#persist)<br>
5. [Deploy the model to the cloud](#deploy)<br>
   5.1. [Create an online deployment for the model](#create_deploy)<br>
   5.2. Retrieve the scoring endpoint for this model<br>
   5.3. [Test the deployed model](#test_deploy)<br>
6. [Create product recommendations](#create_recomm)<br>
   6.1. [Test product recommendations model](#test_recomm)<br>
7. [Summary and next steps](#summary)<br>

<a id="setup"></a>
## 1. Setup


Before you use the sample code in this notebook, you must perform the following setup tasks:

* Create a Watson Machine Learning service instance (a free plan is offered) and associate it with your project
* Create a Cloud Object Storage service instance (a free plan is offered) and associate it with your project


We'll be using a couple of libraries for this exercise:

1. [Watson Machine Learning Client](http://wml-api-pyclient.mybluemix.net/): Client library to work with the Watson Machine Learning service on IBM Cloud. Library available on [pypi](https://pypi.org/project/watson-machine-learning-client/). Service available on [IBM Cloud](https://cloud.ibm.com/catalog/services/machine-learning).
1. [ibmos2spark](https://github.com/ibm-watson-data-lab/ibmos2spark): Facilitates Data I/O between Spark and IBM Object Storage services

In [1]:
!pip install --upgrade ibmos2spark
!pip install --upgrade watson-machine-learning-client

Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20200218172511-0000
KERNEL_ID = 0576217d-9da4-4667-9dad-ad0f1093407c
Collecting ibmos2spark
  Downloading https://files.pythonhosted.org/packages/c6/81/1edb24382edef1ca636e87972b2da286b8271a586c728a21f916d3cd76cd/ibmos2spark-1.0.1-py2.py3-none-any.whl
Installing collected packages: ibmos2spark
Successfully installed ibmos2spark-1.0.1
Collecting watson-machine-learning-client
  Downloading https://files.pythonhosted.org/packages/12/67/66db412f00d19bfdc5725078bff373787513bfb14320f2804b9db3abb53a/watson_machine_learning_client-1.0.378-py3-none-any.whl (536kB)
Collecting pandas (from watson-machine-learning-client)
  Downloading https://files.pythonhosted.org/packages/08/ec/b5dd8cfb078380fb5ae9325771146bccd4e8cad2d3e4c72c7433010684eb/pandas-1.0.1-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)

<a id="load"></a>
## 2. Load and explore data

In this section you will load and access the data file that contains the customer shopping data using [Cloud Object Storage in the notebook](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/load-and-access-data.html):

1. Place the cursor in the next cell (`# Generated Code Here`)
1. Click the **Find and Add Data** icon to open the Files and Connections side bar
1. Click **browse**, then navigate to and select the `customers_orders1_opt.csv` file
1. Click **Insert to code**
1. Select **SparkSession DataFrame**

Code to download and import the CSV data into a Spark DataFrame is generated and added into the notebook cell.

```
import ibmos2spark
# @hidden_cell
credentials = {
    'endpoint': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'service_id': '***',
    'iam_service_endpoint': 'https://iam.ng.bluemix.net/oidc/token',
    'api_key': '***'
}

configuration_name = 'os_7135ade4b1d24e67b69b610d4a20966c_configs'
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos')

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df_data_1 = spark.read\
  .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
  .option('header', 'true')\
  .load(cos.url('customers_orders1_opt.csv', '***'))
df_data_1.take(5)
```

Run the generated code.


In [2]:
# Generated Code Here
import ibmos2spark
# @hidden_cell
credentials = {
    'endpoint': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'service_id': '***',
    'iam_service_endpoint': 'https://iam.ng.bluemix.net/oidc/token',
    'api_key': '***'
}

configuration_name = 'os_7135ade4b1d24e67b69b610d4a20966c_configs'
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name, 'bluemix_cos')

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df_data_2 = spark.read\
  .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')\
  .option('header', 'true')\
  .load(cos.url('customers_orders1_opt.csv', '***'))
df_data_2.take(5)


[Row(CUSTNAME='Allen Perl          ', GenderCode='Mr.', ADDRESS1='4707    Hillcrest Lane', CITY='Abeto', STATE='PG', COUNTRY_CODE='IT', POSTAL_CODE='6040', POSTAL_CODE_PLUS4='0', ADDRESS2=None, EMAIL_ADDRESS='Allen.M.Perl@spambob.com', PHONE_NUMBER='0370 4762239', CREDITCARD_TYPE='Master Card', LOCALITY=None, SALESMAN_ID='RP385 ', NATIONALITY='U.S.', NATIONAL_ID='22867928', CREDITCARD_NUMBER='5179762243750832', DRIVER_LICENSE=None, CUST_ID='10003', ORDER_ID='1106', ORDER_DATE='2016-06-23 00:00:00.000', ORDER_TIME='2016-06-23 15:29:06.250', FREIGHT_CHARGES='29.790000', ORDER_SALESMAN='NC298 ', ORDER_POSTED_DATE='2016-07-15 00:00:00.000', ORDER_SHIP_DATE='27/07/2016', AGE='27', ORDER_VALUE='134.24', T_TYPE='Complete', PURCHASE_TOUCHPOINT='Phone', PURCHASE_STATUS='Frequent', ORDER_TYPE='MediumValue', GENERATION='Gen_Y', Baby Food='0', Diapers='0', Formula='1', Lotion='1', Baby wash='0', Wipes='0', Fresh Fruits='0', Fresh Vegetables='0', Beer='0', Wine='0', Club Soda='0', Sports Drink='0',

<br>

Set the `df` variable to the DataFrame created by the generated code (for example, `df_data_1`):


In [4]:
df = df_data_2

<a id="kmeans"></a>
## 3. Create a *k*-means model with Spark

In this section of the notebook you use the *k*-means implementation to associate every customer with a cluster based on their shopping history.

First, import the Apache Spark Machine Learning packages ([MLlib](http://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html)) that you need in the subsequent steps:


In [5]:
from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans
from pyspark.ml.clustering import KMeansModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors

<a id="prepare_data"></a>
### 3.1 Prepare data

Create a new data set with just the data that you need. Filter the columns that you want, in this case the customer ID column and the product-related columns. Remove the columns that you don't need for aggregating the data and training the model. Convert the column types from `StringType` to `IntegerType`:

In [6]:
from pyspark.sql.types import IntegerType

# Here are the product cols. In a real world scenario we would query a product table, or similar.
product_cols = ['Baby Food', 'Diapers', 'Formula', 'Lotion', 'Baby wash', 'Wipes', 'Fresh Fruits', 'Fresh Vegetables', 'Beer', 'Wine', 'Club Soda', 'Sports Drink', 'Chips', 'Popcorn', 'Oatmeal', 'Medicines', 'Canned Foods', 'Cigarettes', 'Cheese', 'Cleaning Products', 'Condiments', 'Frozen Foods', 'Kitchen Items', 'Meat', 'Office Supplies', 'Personal Care', 'Pet Supplies', 'Sea Food', 'Spices']
# Here we get the customer ID and the products they purchased
df_filtered = df.select(['CUST_ID'] + product_cols)

# Cast each product column from string to integer so it can be summed later
for c in product_cols:
    df_filtered = df_filtered.withColumn(c, df_filtered[c].cast(IntegerType()))

<br>

View the filtered information:

In [7]:
df_filtered.show()

+-------+---------+-------+-------+------+---------+-----+------------+----------------+----+----+---------+------------+-----+-------+-------+---------+------------+----------+------+-----------------+----------+------------+-------------+----+---------------+-------------+------------+--------+------+
|CUST_ID|Baby Food|Diapers|Formula|Lotion|Baby wash|Wipes|Fresh Fruits|Fresh Vegetables|Beer|Wine|Club Soda|Sports Drink|Chips|Popcorn|Oatmeal|Medicines|Canned Foods|Cigarettes|Cheese|Cleaning Products|Condiments|Frozen Foods|Kitchen Items|Meat|Office Supplies|Personal Care|Pet Supplies|Sea Food|Spices|
+-------+---------+-------+-------+------+---------+-----+------------+----------------+----+----+---------+------------+-----+-------+-------+---------+------------+----------+------+-----------------+----------+------------+-------------+----+---------------+-------------+------------+--------+------+
|  10003|        0|      0|      1|     1|        0|    0|           0|              

Now, aggregate the individual transactions for each customer to get a single score per product, per customer.

In [8]:
df_customer_products = df_filtered.groupby('CUST_ID').sum()  # Use customer IDs to group transactions by customer and sum them up
df_customer_products = df_customer_products.drop('sum(CUST_ID)')

df_customer_products.show()

+-------+--------------+------------+------------+-----------+--------------+----------+-----------------+---------------------+---------+---------+--------------+-----------------+----------+------------+------------+--------------+-----------------+---------------+-----------+----------------------+---------------+-----------------+------------------+---------+--------------------+------------------+-----------------+-------------+-----------+
|CUST_ID|sum(Baby Food)|sum(Diapers)|sum(Formula)|sum(Lotion)|sum(Baby wash)|sum(Wipes)|sum(Fresh Fruits)|sum(Fresh Vegetables)|sum(Beer)|sum(Wine)|sum(Club Soda)|sum(Sports Drink)|sum(Chips)|sum(Popcorn)|sum(Oatmeal)|sum(Medicines)|sum(Canned Foods)|sum(Cigarettes)|sum(Cheese)|sum(Cleaning Products)|sum(Condiments)|sum(Frozen Foods)|sum(Kitchen Items)|sum(Meat)|sum(Office Supplies)|sum(Personal Care)|sum(Pet Supplies)|sum(Sea Food)|sum(Spices)|
+-------+--------------+------------+------------+-----------+--------------+----------+------------
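
As a rough illustration of what `groupby('CUST_ID').sum()` computes, the following plain-Python sketch adds up the per-order product flags for each customer. The sample orders and the helper name `sum_by_customer` are invented for this example and are not part of the data set or of Spark.

```python
from collections import defaultdict

def sum_by_customer(rows, product_cols):
    """Sum each product column over all orders that share a CUST_ID."""
    totals = defaultdict(lambda: [0] * len(product_cols))
    for row in rows:
        for i, col in enumerate(product_cols):
            totals[row["CUST_ID"]][i] += row[col]
    # Turn the running totals into one {product: count} dict per customer
    return {cust: dict(zip(product_cols, counts)) for cust, counts in totals.items()}

# Hypothetical orders: one row per order, 1 means the product was in that order
orders = [
    {"CUST_ID": "10003", "Formula": 1, "Lotion": 1},
    {"CUST_ID": "10003", "Formula": 1, "Lotion": 0},
    {"CUST_ID": "10004", "Formula": 0, "Lotion": 1},
]
per_customer = sum_by_customer(orders, ["Formula", "Lotion"])
print(per_customer["10003"])  # {'Formula': 2, 'Lotion': 1}
```

The result has one entry per customer, which mirrors the one-row-per-`CUST_ID` shape of `df_customer_products`.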

<a id="build_model"></a>
### 3.2 Create clusters and define the model 

Create 100 clusters with a *k*-means model based on the number of times a specific customer purchased a product.

| No Clustering | Clustering |
|------|------|
|  ![](https://raw.githubusercontent.com/IBM/product-recommendation-with-watson-ml/master/doc/source/images/kmeans-1.jpg)  | ![](https://raw.githubusercontent.com/IBM/product-recommendation-with-watson-ml/master/doc/source/images/kmeans-2.jpg) |
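
To give a feel for what *k*-means does, here is a minimal plain-Python sketch of the assign-and-update loop. It is not Spark's implementation: it picks random starting centroids, whereas Spark's `KMeans` defaults to the k-means|| initialization. The function names and the sample purchase counts are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, max_iter=50, seed=1):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids

# Hypothetical (beer, diapers) purchase counts for six customers
customers = [(5, 0), (4, 1), (6, 0), (0, 5), (1, 4), (0, 6)]
centroids = kmeans(customers, k=2)
labels = [nearest(c, centroids) for c in customers]
print(labels)
```

On this toy data the first three customers end up in one cluster and the last three in the other. The cluster numbers themselves are arbitrary; only the grouping carries meaning.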

First, create a feature vector by combining the aggregated per-product count columns (the `sum(...)` columns):

In [9]:
assembler = VectorAssembler(inputCols=["sum({})".format(x) for x in product_cols],outputCol="features") # Assemble vectors using product fields

Next, create the *k*-means clusters and the pipeline to define the model:

In [10]:
kmeans = KMeans(maxIter=50, predictionCol="cluster").setK(100).setSeed(1)  # Initialize model
pipeline = Pipeline(stages=[assembler, kmeans])
model = pipeline.fit(df_customer_products)

Finally, calculate the cluster for each customer by running the original dataset against the *k*-means model: 

In [11]:
df_customer_products_cluster = model.transform(df_customer_products)
df_customer_products_cluster.show()

+-------+--------------+------------+------------+-----------+--------------+----------+-----------------+---------------------+---------+---------+--------------+-----------------+----------+------------+------------+--------------+-----------------+---------------+-----------+----------------------+---------------+-----------------+------------------+---------+--------------------+------------------+-----------------+-------------+-----------+--------------------+-------+
|CUST_ID|sum(Baby Food)|sum(Diapers)|sum(Formula)|sum(Lotion)|sum(Baby wash)|sum(Wipes)|sum(Fresh Fruits)|sum(Fresh Vegetables)|sum(Beer)|sum(Wine)|sum(Club Soda)|sum(Sports Drink)|sum(Chips)|sum(Popcorn)|sum(Oatmeal)|sum(Medicines)|sum(Canned Foods)|sum(Cigarettes)|sum(Cheese)|sum(Cleaning Products)|sum(Condiments)|sum(Frozen Foods)|sum(Kitchen Items)|sum(Meat)|sum(Office Supplies)|sum(Personal Care)|sum(Pet Supplies)|sum(Sea Food)|sum(Spices)|            features|cluster|
+-------+--------------+------------+-----

<a id="persist"></a>
## 4. Persist the model 

In this section you will learn how to store your pipeline and model in the Watson Machine Learning repository by using the Python client library.

### 4.1 Configure IBM Watson Machine Learning credentials

To access your machine learning repository programmatically, you need to copy in your credentials, which you can see in your **IBM Watson Machine Learning** service details in IBM Cloud.

> **IMPORTANT**: Update `apikey` and `instance_id` below. You can find the credentials on the _Service Credentials_ tab of the Watson Machine Learning service instance that you created in IBM Cloud.

In [14]:
# @hidden_cell
wml_credentials = {
  "apikey": "***",
  "iam_apikey_description": "Auto-generated for key ***",
  "iam_apikey_name": "Service credentials-1",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/***",
  "instance_id": "***",
  "url": "https://us-south.ml.cloud.ibm.com"
}

print(wml_credentials)

{'apikey': '***', 'iam_apikey_description': 'Auto-generated for key ***', 'iam_apikey_name': 'Service credentials-1', 'iam_role_crn': 'crn:v1:bluemix:public:iam::::serviceRole:Writer', 'iam_serviceid_crn': 'crn:v1:bluemix:public:iam-identity::a/***', 'instance_id': '***', 'url': 'https://us-south.ml.cloud.ibm.com'}
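
If you would rather not paste secrets into a notebook cell, one option is to read them from environment variables and print only a redacted copy. The sketch below assumes the client needs just the `apikey`, `instance_id`, and `url` keys (the ones the note above asks you to update). The variable names `WML_APIKEY`, `WML_INSTANCE_ID`, and `WML_URL` and both helper functions are hypothetical.

```python
import os

def load_wml_credentials(env=os.environ):
    """Build the credentials dict from environment variables.

    WML_APIKEY, WML_INSTANCE_ID and WML_URL are hypothetical names chosen
    for this sketch; use whatever naming fits your environment.
    """
    missing = [name for name in ("WML_APIKEY", "WML_INSTANCE_ID") if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: {}".format(", ".join(missing)))
    return {
        "apikey": env["WML_APIKEY"],
        "instance_id": env["WML_INSTANCE_ID"],
        # Falls back to the region URL used in this notebook
        "url": env.get("WML_URL", "https://us-south.ml.cloud.ibm.com"),
    }

def redact(credentials, secret_keys=("apikey", "instance_id")):
    """Return a copy of the credentials that is safe to print."""
    return {k: ("***" if k in secret_keys else v) for k, v in credentials.items()}

# Demonstration with a plain dict standing in for the real environment
example = load_wml_credentials({"WML_APIKEY": "abc", "WML_INSTANCE_ID": "123"})
print(redact(example))
```

Printing the redacted copy instead of the raw dictionary keeps the API key out of saved notebook output.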


Connect to the Watson Machine Learning service using the provided credentials.

In [13]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient(wml_credentials)
print(client.version)

1.0.378


### 4.2 Save the model 

#### Save the model to the Watson Machine Learning repository

You use the Watson Machine Learning client's [Repository class](http://wml-api-pyclient.mybluemix.net/#repository) to store and manage models in the Watson Machine Learning service. 

> **NOTE**: You can also use Watson Studio to manage models. In this notebook we are using the client library instead.

In [15]:
train_data = df_customer_products.withColumnRenamed('CUST_ID', 'label')

> **TIP**: Update the cell below with your name and the name you want to give your model.

In [16]:
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "IBM", 
               client.repository.ModelMetaNames.NAME: "Shopping Recommendation Engine"}
published_model = client.repository.store_model(model=model, pipeline=pipeline, meta_props=model_props, training_data=train_data)

> **NOTE**: You can delete a model from the repository by calling `client.repository.delete`.

#### Display list of existing models in the Watson Machine Learning repository 

In [17]:
client.repository.list_models()

------------------------------------  ------------------------------  ------------------------  -----------------
GUID                                  NAME                            CREATED                   FRAMEWORK
585f73ac-7263-45bb-a59e-35c6f9e38bf4  Shopping Recommendation Engine  2020-02-18T17:33:27.009Z  mllib-2.3
a3345023-4887-4e07-b1bf-0dc1e31e6343  Handwritten Digits Recognition  2017-09-15T17:53:15.409Z  scikit-learn-0.17
b2006e75-2ff2-494a-b915-a56bc63b1db8  Sentiment Prediction            2017-09-15T17:53:09.561Z  mllib-2.0
------------------------------------  ------------------------------  ------------------------  -----------------


#### Display information about the saved model

In [18]:
import json
saved_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(saved_model_uid)
print(json.dumps(model_details, indent=2))

{
  "metadata": {
    "guid": "585f73ac-7263-45bb-a59e-35c6f9e38bf4",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/745bd577-af9e-4896-a105-0aa49d0befe8/published_models/585f73ac-7263-45bb-a59e-35c6f9e38bf4",
    "created_at": "2020-02-18T17:33:27.009Z",
    "modified_at": "2020-02-18T17:33:27.075Z"
  },
  "entity": {
    "runtime_environment": "spark-2.3",
    "learning_configuration_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/745bd577-af9e-4896-a105-0aa49d0befe8/published_models/585f73ac-7263-45bb-a59e-35c6f9e38bf4/learning_configuration",
    "author": {
      "name": "IBM"
    },
    "name": "Shopping Recommendation Engine",
    "label_col": "label",
    "learning_iterations_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/745bd577-af9e-4896-a105-0aa49d0befe8/published_models/585f73ac-7263-45bb-a59e-35c6f9e38bf4/learning_iterations",
    "training_data_schema": {
      "fields": [
        {
          "metadata": {
            "modeling_role":

<a id="deploy"></a>
## 5. Deploy the model to IBM Cloud

You use the Watson Machine Learning client's [Deployments class](http://wml-api-pyclient.mybluemix.net/#deployments) to deploy and score models.

<a id="create_deploy"></a>
### 5.1 Create an online deployment for the model


In [19]:
created_deployment = client.deployments.create(saved_model_uid, 'Shopping Recommendation Engine Deployment')



#######################################################################################

Synchronous deployment creation for uid: '585f73ac-7263-45bb-a59e-35c6f9e38bf4' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='9c34e64f-a9c7-45b6-b934-3351e6702b90'
------------------------------------------------------------------------------------------------




### 5.2 Retrieve the scoring endpoint for this model

In [20]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)

https://us-south.ml.cloud.ibm.com/v3/wml_instances/745bd577-af9e-4896-a105-0aa49d0befe8/deployments/9c34e64f-a9c7-45b6-b934-3351e6702b90/online


<a id="test_deploy"></a>
### 5.3 Test the deployed model

To verify that the model was successfully deployed to the cloud, specify a customer ID (for example, customer 12027), predict that customer's cluster with the Watson Machine Learning deployment, and check that it matches the cluster previously associated with this customer ID.

In [21]:
customer = df_customer_products_cluster.filter('CUST_ID = 12027').collect()
print("Previously calculated cluster = {}".format(customer[0].cluster))

Previously calculated cluster = 31


To determine the customer's cluster using Watson Machine Learning, you need to load the customer's purchase history. This function uses the local data frame to select every product field and the number of times that customer 12027 purchased a product.

In [22]:
def get_product_counts_for_customer(cust_id):
    # Look up the aggregated purchase row for this customer
    cust = df_customer_products.filter('CUST_ID = {}'.format(cust_id)).take(1)
    fields = []
    values = []
    for row in cust:
        # Collect the field name and purchase count for every product column
        for product_col in product_cols:
            field = 'sum({})'.format(product_col)
            value = row[field]
            fields.append(field)
            values.append(value)
    return (fields, values)

This function takes the customer's purchase history and calls the scoring endpoint:

In [23]:
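def get_cluster_from_watson_ml(fields, values):
    scoring_payload = {'fields': fields, 'values': [values]}
    predictions = client.deployments.score(scoring_endpoint, scoring_payload)
    # The scored row is expected to hold the input columns, then 'features',
    # then 'cluster', so the cluster sits at index len(product_cols) + 1
    return predictions['values'][0][len(product_cols)+1]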
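
The function above reads the cluster by position. A less brittle variant is to look the column up by name. The sketch below assumes the scoring response carries a `fields` list that lines up with each row in `values`, mirroring the request payload. The notebook itself only shows the `values` key, so treat the response shape, the helper name `extract_prediction`, and the mock response as assumptions.

```python
def extract_prediction(response, field="cluster"):
    """Return the value of `field` from the first scored row."""
    position = response["fields"].index(field)  # raises ValueError if the field is absent
    return response["values"][0][position]

# Hypothetical response for a model with only two product columns
mock_response = {
    "fields": ["sum(Beer)", "sum(Wine)", "features", "cluster"],
    "values": [[1, 3, [1.0, 3.0], 31]],
}
print(extract_prediction(mock_response))  # 31
```

With two product columns the positional rule `len(product_cols) + 1` gives index 3, which is the same element the name lookup finds in this mock.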

Finally, call the functions defined above to get the product history, call the scoring endpoint, and get the cluster associated with customer 12027:

In [24]:
product_counts = get_product_counts_for_customer(12027)
fields = product_counts[0]
values = product_counts[1]
print("Cluster calculated by Watson ML = {}".format(get_cluster_from_watson_ml(fields, values)))

Cluster calculated by Watson ML = 31


<a id="create_recomm"></a>
## 6. Create product recommendations

Now you can create some product recommendations.

First, run this cell to create a function that queries the database and finds the most popular items for a cluster. In this case, the **df_customer_products_cluster** dataframe is the database.

In [25]:
# This function gets the most popular products in a cluster by grouping by the cluster column
def get_popular_products_in_cluster(cluster):
    df_cluster_products = df_customer_products_cluster.filter('cluster = {}'.format(cluster))
    df_cluster_products_agg = df_cluster_products.groupby('cluster').sum()
    row = df_cluster_products_agg.rdd.collect()[0]
    items = []
    for product_col in product_cols:
        field = 'sum(sum({}))'.format(product_col)
        items.append((product_col, row[field]))
    sortedItems = sorted(items, key=lambda x: x[1], reverse=True) # Sort by score
    popular = [x for x in sortedItems if x[1] > 0]
    return popular
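
The same ranking logic can be sketched without Spark: total each product over the customers in one cluster, sort by the total, and drop products nobody bought. The helper name `popular_products` and the sample rows are made up for illustration.

```python
def popular_products(cluster_rows, product_cols):
    """Rank products by total purchases within one cluster, highest first."""
    totals = {col: sum(row[col] for row in cluster_rows) for col in product_cols}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Keep only products that were purchased at least once
    return [(product, count) for product, count in ranked if count > 0]

# Hypothetical purchase counts for two customers in the same cluster
rows = [
    {"Beer": 0, "Cheese": 3, "Wine": 1},
    {"Beer": 0, "Cheese": 2, "Wine": 2},
]
print(popular_products(rows, ["Beer", "Cheese", "Wine"]))  # [('Cheese', 5), ('Wine', 3)]
```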

Now, run this cell to create a function that will calculate the recommendations based on a given cluster. This function finds the most popular products in the cluster, filters out products already purchased by the customer or currently in the customer's shopping cart, and finally produces a list of recommended products.

In [26]:
# This function takes a cluster and the quantity of every product already purchased or in the user's cart
from pyspark.sql.functions import desc
def get_recommendations_by_cluster(cluster, purchased_quantities):
    # Existing customer products
    print('PRODUCTS ALREADY PURCHASED/IN CART:')
    customer_products = []
    for i in range(0, len(product_cols)):
        if purchased_quantities[i] > 0:
            customer_products.append((product_cols[i], purchased_quantities[i]))
    df_customer_products = sc.parallelize(customer_products).toDF(["PRODUCT","COUNT"])
    df_customer_products.show()
    # Get popular products in the cluster
    print('POPULAR PRODUCTS IN CLUSTER:')
    cluster_products = get_popular_products_in_cluster(cluster)
    df_cluster_products = sc.parallelize(cluster_products).toDF(["PRODUCT","COUNT"])
    df_cluster_products.show()
    # Filter out products the user has already purchased
    print('RECOMMENDED PRODUCTS:')
    df_recommended_products = df_cluster_products.alias('cl').join(df_customer_products.alias('cu'), df_cluster_products['PRODUCT'] == df_customer_products['PRODUCT'], 'leftouter')
    df_recommended_products = df_recommended_products.filter('cu.PRODUCT IS NULL').select('cl.PRODUCT','cl.COUNT').sort(desc('cl.COUNT'))
    df_recommended_products.show(10)
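
The left outer join followed by the `IS NULL` filter is an anti-join: it keeps the cluster's popular products that the customer does not already have. A plain-Python sketch of that filtering step follows. The helper name `recommend` is hypothetical, and the counts are a small subset of the sample output shown later in this notebook.

```python
def recommend(popular, owned, limit=10):
    """Return popular products the customer does not already have, best first."""
    owned_names = {product for product, quantity in owned.items() if quantity > 0}
    candidates = [(p, c) for p, c in popular if p not in owned_names]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:limit]

# Subset of one cluster's popular products and a sample shopping cart
popular = [("Diapers", 81), ("Beer", 70), ("Condiments", 22), ("Baby wash", 16), ("Oatmeal", 13)]
cart = {"Diapers": 1, "Baby wash": 2, "Oatmeal": 1}
print(recommend(popular, cart))  # [('Beer', 70), ('Condiments', 22)]
```

Products with equal counts may appear in a different order than in the Spark output, because neither version defines an order for ties.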

Next, run this cell to create a function that produces a list of recommended items based on the products and quantities in a user's cart. This function uses Watson Machine Learning to calculate the cluster based on the shopping cart contents and then calls the **get_recommendations_by_cluster** function.

In [27]:
# This function would be used to find recommendations based on the products and quantities in a user's cart
def get_recommendations_for_shopping_cart(products, quantities):
    fields = []
    values = []
    for product_col in product_cols:
        field = 'sum({})'.format(product_col)
        if product_col in products:
            value = quantities[products.index(product_col)]
        else:
            value = 0
        fields.append(field)
        values.append(value)
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)
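
Note that the function above silently ignores cart items that are not in `product_cols`. If you prefer to fail loudly, a dictionary-based variant along the following lines could be used. The helper name `cart_to_values` and the shortened `catalog` list are assumptions made for this sketch.

```python
def cart_to_values(cart, product_cols):
    """Turn a {product: quantity} cart into the (fields, values) pair used for scoring."""
    unknown = sorted(set(cart) - set(product_cols))
    if unknown:
        raise ValueError("Unknown products: {}".format(", ".join(unknown)))
    fields = ["sum({})".format(col) for col in product_cols]
    values = [cart.get(col, 0) for col in product_cols]  # 0 for products not in the cart
    return fields, values

catalog = ["Diapers", "Baby wash", "Beer", "Oatmeal"]  # shortened product list
fields, values = cart_to_values({"Diapers": 1, "Baby wash": 2, "Oatmeal": 1}, catalog)
print(fields)  # ['sum(Diapers)', 'sum(Baby wash)', 'sum(Beer)', 'sum(Oatmeal)']
print(values)  # [1, 2, 0, 1]
```

The `values` list always follows the order of the product list, which is what the scoring payload relies on.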

Run this cell to create a function that produces a list of recommended items based on the purchase history of a customer. This function uses Watson Machine Learning to calculate the cluster based on the customer's purchase history and then calls the **get_recommendations_by_cluster** function.

In [28]:
# This function is used to find recommendations based on the purchase history of a customer
def get_recommendations_for_customer_purchase_history(customer_id):
    product_counts = get_product_counts_for_customer(customer_id)
    fields = product_counts[0]
    values = product_counts[1]
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)

Now you can take customer 12027 and produce a recommendation based on that customer's purchase history:

In [29]:
get_recommendations_for_customer_purchase_history(12027)

PRODUCTS ALREADY PURCHASED/IN CART:
+-------------+-----+
|      PRODUCT|COUNT|
+-------------+-----+
|      Diapers|    1|
|    Baby wash|    1|
|         Beer|    1|
|         Wine|    3|
|    Medicines|    3|
|       Cheese|    3|
| Frozen Foods|    1|
|Kitchen Items|    1|
|     Sea Food|    1|
|       Spices|    2|
+-------------+-----+

POPULAR PRODUCTS IN CLUSTER:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|           Cheese|   72|
|        Medicines|   71|
|             Wine|   62|
|           Spices|   50|
|    Personal Care|   39|
|         Sea Food|   31|
|       Condiments|   27|
|     Frozen Foods|   26|
|Cleaning Products|   24|
|       Cigarettes|   21|
|     Canned Foods|   16|
|          Diapers|   15|
|     Fresh Fruits|   14|
|          Formula|   13|
|        Baby wash|   11|
| Fresh Vegetables|   10|
|            Wipes|    9|
|             Beer|    9|
|     Pet Supplies|    9|
|     Sports Drink|    8|
+-----------------+-----+

Now, take a sample shopping cart and produce a recommendation based on the items in the cart:

In [30]:
get_recommendations_for_shopping_cart(['Diapers','Baby wash','Oatmeal'],[1,2,1])

PRODUCTS ALREADY PURCHASED/IN CART:
+---------+-----+
|  PRODUCT|COUNT|
+---------+-----+
|  Diapers|    1|
|Baby wash|    2|
|  Oatmeal|    1|
+---------+-----+

POPULAR PRODUCTS IN CLUSTER:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|          Diapers|   81|
|             Beer|   70|
|       Condiments|   22|
|        Club Soda|   18|
|     Sports Drink|   18|
|          Popcorn|   18|
|Cleaning Products|   18|
|        Baby wash|   16|
|  Office Supplies|   14|
|           Lotion|   13|
|            Wipes|   13|
|          Oatmeal|   13|
|     Pet Supplies|   12|
|    Kitchen Items|   11|
|          Formula|   10|
|        Baby Food|    8|
|     Fresh Fruits|    7|
|     Canned Foods|    7|
| Fresh Vegetables|    5|
+-----------------+-----+

RECOMMENDED PRODUCTS:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|             Beer|   70|
|       Condiments|   22|
|        Club Soda|   18|
|Cleaning Products|   18|
|    

<a id="summary"></a>
## <font color=green>Congratulations</font>, you've successfully created a recommendation engine and deployed it to the Watson Machine Learning service

You can now switch to the Watson Machine Learning console to deploy the model and then test it in an application, or continue within the notebook to deploy the model using the APIs.