# CPG Industry - Personalization Workshop

Welcome to the CPG Industry Personalization Workshop. In this module we're going to be adding three core personalization features powered by [Amazon Personalize](https://aws.amazon.com/personalize/): related product recommendations on the product detail page, personalized recommendations, and personalized ranking of items. This will allow us to give our users targeted recommendations based on their activity.
This workshop reuses a lot of code and behavior from the Retail Demo Store. If you want to explore retail-related use cases, take a look at: https://github.com/aws-samples/retail-demo-store

Recommended Time: 2 Hours

## Setup

To run this notebook, you need to have run the previous notebook, 01_Data_Layer, where you created a dataset and imported interaction data into Amazon Personalize. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook.

In [1]:
%store -r
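
`%store -r` restores whatever was stored earlier without reporting what is missing, so it can help to check that the names this notebook relies on were actually loaded. The following is a minimal sketch, assuming the previous notebook stored `dataset_group_arn` and `products_dataset_df` (both are used later in this notebook). The helper name `missing_variables` and the placeholder ARN are hypothetical.

```python
def missing_variables(required, namespace):
    """Return the names in `required` that are absent from `namespace`."""
    return [name for name in required if name not in namespace]


# Names this notebook reads later on; extend the list to match your setup.
required_names = ["dataset_group_arn", "products_dataset_df"]

# Stand-in namespace for illustration only; in the notebook, pass globals().
example_namespace = {
    "dataset_group_arn": "arn:example:dataset-group/placeholder",
}

missing = missing_variables(required_names, example_namespace)
if missing:
    print("Missing stored variables, re-run 01_Data_Layer:", ", ".join(missing))
```

In the notebook itself the call would be `missing_variables(required_names, globals())`.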

### Import Dependencies and Setup Boto3 Python Clients

Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services. We also need to retrieve the `Uid` value from a tag on the SageMaker notebook instance.

In [2]:
# Import Dependencies

import boto3
import json
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import time
import requests
import csv
import sys
import botocore
import uuid

from packaging import version
from random import randint
from botocore.exceptions import ClientError

%matplotlib inline

# Setup Clients

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
personalize_events = boto3.client('personalize-events')
s3 = boto3.client('s3')

# Read this notebook instance's ARN from the SageMaker metadata file
with open('/opt/ml/metadata/resource-metadata.json') as f:
    data = json.load(f)

# Look up the 'Uid' tag on the notebook instance
sagemaker = boto3.client('sagemaker')
sagemaker_response = sagemaker.list_tags(ResourceArn=data["ResourceArn"])
for tag in sagemaker_response["Tags"]:
    if tag['Key'] == 'Uid':
        Uid = tag['Value']
        break
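
The tag lookup above leaves `Uid` undefined if the notebook instance has no `Uid` tag. The following is a minimal sketch of the same lookup written as a function with an explicit default. `find_tag_value` is a hypothetical helper, and the sample tags are made up for illustration.

```python
def find_tag_value(tags, key, default=None):
    """Return the Value of the first tag whose Key matches, else `default`."""
    for tag in tags:
        if tag.get("Key") == key:
            return tag.get("Value")
    return default


# Same shape as the "Tags" list in a list_tags() response; values are illustrative.
sample_tags = [
    {"Key": "Project", "Value": "cpg-workshop"},
    {"Key": "Uid", "Value": "abc123"},
]

uid = find_tag_value(sample_tags, "Uid")
fallback = find_tag_value(sample_tags, "Missing", default="unknown")
print(uid, fallback)
```

Returning a default makes the missing-tag case visible at the call site instead of surfacing later as a `NameError`.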

### Implement a Helper Function for Displaying Product Information in a DataFrame

Throughout this workshop we will need to look up product information several times. This function lets us do that without repeating the same code.

In [3]:
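def search_items_in_dataframe(item_list):
    # Collect the matching product rows for each item, preserving list order.
    # DataFrame.append was removed in pandas 2.0, so gather frames and concat once.
    frames = [
        products_dataset_df.loc[products_dataset_df['ITEM_ID'] == int(item['itemId'])]
        for item in item_list
    ]
    pd.set_option('display.max_rows', 10)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)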

## Create Solutions

With our three datasets imported into our dataset group, we can now turn to training models. As a reminder, we will be training three models in this workshop to support three different personalization use-cases. One model will be used to make related product recommendations on the product detail view/page, another model will be used to make personalized product recommendations to users on the homepage, and the last model will be used to rerank product lists on the category and featured products page. In Amazon Personalize, training a model involves creating a Solution and Solution Version. So when we are finished we will have three solutions and a solution version for each solution. 

When creating a solution, you provide your dataset group and the recipe for training. Let's declare the recipes that we will need for our solutions.

### List Recipes

First, let's list all available recipes.

In [4]:
list_recipes_response = personalize.list_recipes()
list_recipes_response

{'recipes': [{'name': 'aws-hrnn',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-coldstart',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-coldstart',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-metadata',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-metadata',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-personalized-ranking',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-personalized-ranking',
   'status'

As you can see above, there are several recipes to choose from. Let's declare the recipes for each Solution.
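
Rather than reading ARNs off the printed list, the response can be indexed by recipe name. The following is a minimal sketch that uses a trimmed copy of the response shape shown above. `recipe_arns_by_name` is a hypothetical helper, and the sample keeps only the keys it needs.

```python
def recipe_arns_by_name(list_recipes_response):
    """Map each recipe name to its ARN, keeping only ACTIVE recipes."""
    return {
        recipe["name"]: recipe["recipeArn"]
        for recipe in list_recipes_response.get("recipes", [])
        if recipe.get("status") == "ACTIVE"
    }


# Trimmed sample with the same keys as the list_recipes() output above.
sample_response = {
    "recipes": [
        {
            "name": "aws-hrnn",
            "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn",
            "status": "ACTIVE",
        },
        {
            "name": "aws-hrnn-metadata",
            "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn-metadata",
            "status": "ACTIVE",
        },
    ]
}

arns = recipe_arns_by_name(sample_response)
print(arns["aws-hrnn-metadata"])
```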

#### Declare Personalize Recipe for Related Products

On the product detail page we want to display related products so we'll create a campaign using the [SIMS](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html) recipe.

> The Item-to-item similarities (SIMS) recipe is based on the concept of collaborative filtering. A SIMS model leverages user-item interaction data to recommend items similar to a given item. In the absence of sufficient user behavior data for an item, this recipe recommends popular items.

In [5]:
related_recipe_arn = "arn:aws:personalize:::recipe/aws-sims"

#### Declare Personalize Recipe for Product Recommendations

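Since we are providing metadata for users and items, we want a recipe that can take advantage of it. The cell below declares the User-Personalization recipe (`aws-user-personalization`), which AWS recommends over the legacy HRNN recipes. The [HRNN-Metadata](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-hrnn-metadata.html) documentation describes the metadata-aware approach as follows: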

> The HRNN-Metadata recipe predicts the items that a user will interact with. It is similar to the HRNN recipe, with additional features derived from contextual, user, and item metadata (from Interactions, Users, and Items datasets, respectively). HRNN-Metadata provides accuracy benefits over non-metadata models when high quality metadata is available. Using this recipe might require longer training times.

In [6]:
recommend_recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"

#### Declare Personalize Recipe for Personalized Ranking

In use-cases where we have a curated list of products, we can use the [Personalized-Ranking](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-search.html) recipe to reorder the products for the current user.

> The Personalized-Ranking recipe generates personalized rankings. A personalized ranking is a list of recommended items that are re-ranked for a specific user.

In [7]:
ranking_recipe_arn = "arn:aws:personalize:::recipe/aws-personalized-ranking"

### Create Solutions and Solution Versions

With our recipes defined, we can now create our solutions and solution versions.
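
The cells in this section repeat the same two calls for each use case. The following is a minimal sketch of that pattern as a single helper, assuming a client that exposes `create_solution` and `create_solution_version` like the `personalize` client above. `create_solution_and_version` and the `FakePersonalize` stand-in are hypothetical and exist only so the example runs without AWS access; the `arn:example:...` values are placeholders.

```python
def create_solution_and_version(client, name, dataset_group_arn, recipe_arn):
    """Create a solution, then start training its first solution version."""
    solution_arn = client.create_solution(
        name=name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=recipe_arn,
    )["solutionArn"]
    version_arn = client.create_solution_version(
        solutionArn=solution_arn
    )["solutionVersionArn"]
    return solution_arn, version_arn


class FakePersonalize:
    """Offline stand-in that mimics only the two response keys used above."""

    def create_solution(self, name, datasetGroupArn, recipeArn):
        return {"solutionArn": f"arn:example:solution/{name}"}

    def create_solution_version(self, solutionArn):
        return {"solutionVersionArn": f"{solutionArn}/v1"}


solution_arn, version_arn = create_solution_and_version(
    FakePersonalize(),
    name="cpg-related-products",
    dataset_group_arn="arn:example:dataset-group/cpg",
    recipe_arn="arn:aws:personalize:::recipe/aws-sims",
)
print(solution_arn, version_arn)
```

Passing the client as an argument keeps the helper testable offline; in the notebook the first argument would be the `personalize` client.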

#### Create Related Products Solution

In [8]:
create_solution_response = personalize.create_solution(
    name = "cpg-related-products",
    datasetGroupArn = dataset_group_arn,
    recipeArn = related_recipe_arn
)

related_solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-related-products",
  "ResponseMetadata": {
    "RequestId": "bf4c0044-5cae-424a-93b2-7b0b9dc57fe1",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:19:31 GMT",
      "x-amzn-requestid": "bf4c0044-5cae-424a-93b2-7b0b9dc57fe1",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Related Products Solution Version

In [9]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = related_solution_arn
)

related_solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-related-products/e4228ca5",
  "ResponseMetadata": {
    "RequestId": "4231ff4d-69f3-4a3a-8732-83cb12701e9f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:19:36 GMT",
      "x-amzn-requestid": "4231ff4d-69f3-4a3a-8732-83cb12701e9f",
      "content-length": "106",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Product Recommendation Solution

In [10]:
create_solution_response = personalize.create_solution(
    name = "cpg-product-personalization",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recommend_recipe_arn
)

recommend_solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-product-personalization",
  "ResponseMetadata": {
    "RequestId": "b2bc5871-88c9-425f-9bff-ca9ede9b9804",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:19:43 GMT",
      "x-amzn-requestid": "b2bc5871-88c9-425f-9bff-ca9ede9b9804",
      "content-length": "97",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Product Recommendation Solution Version

In [11]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = recommend_solution_arn
)

recommend_solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-product-personalization/3e4a7b67",
  "ResponseMetadata": {
    "RequestId": "6f1a5ce5-651f-4d39-91ab-f25ab454ebca",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:19:58 GMT",
      "x-amzn-requestid": "6f1a5ce5-651f-4d39-91ab-f25ab454ebca",
      "content-length": "113",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Personalized Ranking Solution

In [12]:
create_solution_response = personalize.create_solution(
    name = "cpg-personalized-ranking",
    datasetGroupArn = dataset_group_arn,
    recipeArn = ranking_recipe_arn
)

ranking_solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-personalized-ranking",
  "ResponseMetadata": {
    "RequestId": "ad5add37-a2dd-4e45-aa12-05992e6b7e3a",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:20:01 GMT",
      "x-amzn-requestid": "ad5add37-a2dd-4e45-aa12-05992e6b7e3a",
      "content-length": "94",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Personalized Ranking Solution Version

In [13]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = ranking_solution_arn
)

ranking_solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-personalized-ranking/6cb4732f",
  "ResponseMetadata": {
    "RequestId": "93811eae-9bd2-454b-8f27-5a858613b172",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 20:20:04 GMT",
      "x-amzn-requestid": "93811eae-9bd2-454b-8f27-5a858613b172",
      "content-length": "110",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Wait for Solution Versions to Complete

It can take 40-60 minutes for all solution versions to be created. During this process a model is being trained and tested with the data contained within your datasets. The duration of training jobs can increase based on the size of the dataset, training parameters and using AutoML vs. manually selecting a recipe. We submitted requests for all three solutions and versions at once so they are trained in parallel and then below we will wait for all three to finish.
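
The wait-for-everything pattern described here can be sketched as a small reusable function. This is a minimal sketch, not the notebook's own implementation: `wait_for_all` is a hypothetical helper, the status lookup is injected as a callable, and the demo uses scripted statuses with a no-op `sleep` so it finishes immediately.

```python
import time


def wait_for_all(arns, get_status, timeout_s=3 * 60 * 60, poll_s=60, sleep=time.sleep):
    """Poll until every ARN is ACTIVE or CREATE FAILED, or until the timeout.

    Returns a dict of ARN -> last observed status.
    """
    terminal = {"ACTIVE", "CREATE FAILED"}
    statuses = {arn: None for arn in arns}
    deadline = time.time() + timeout_s
    while True:
        for arn in arns:
            # Only re-query resources that have not reached a terminal state.
            if statuses[arn] not in terminal:
                statuses[arn] = get_status(arn)
        if all(status in terminal for status in statuses.values()):
            return statuses
        if time.time() >= deadline:
            return statuses
        sleep(poll_s)


# Scripted statuses stand in for real describe calls in this demo.
scripted = {
    "arn:example:a": iter(["CREATE IN_PROGRESS", "ACTIVE"]),
    "arn:example:b": iter(["CREATE PENDING", "CREATE IN_PROGRESS", "CREATE FAILED"]),
}
final = wait_for_all(
    list(scripted),
    get_status=lambda arn: next(scripted[arn]),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final)
```

In the notebook, `get_status` could wrap `personalize.describe_solution_version(solutionVersionArn=arn)["solutionVersion"]["status"]`.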

While you are waiting for this process to complete you can learn more about solutions here: https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html

#### Wait for All Solution Versions to Have ACTIVE Status

In [14]:
%%time

soln_ver_arns = [ related_solution_version_arn, recommend_solution_version_arn, ranking_solution_version_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for soln_ver_arn in reversed(soln_ver_arns):
        soln_ver_response = personalize.describe_solution_version(
            solutionVersionArn = soln_ver_arn
        )
        status = soln_ver_response["solutionVersion"]["status"]

        if status == "ACTIVE":
            print(f'Solution version {soln_ver_arn} successfully completed')
            soln_ver_arns.remove(soln_ver_arn)
        elif status == "CREATE FAILED":
            print(f'Solution version {soln_ver_arn} failed')
            # failureReason is nested under the "solutionVersion" key
            failure_reason = soln_ver_response["solutionVersion"].get('failureReason')
            if failure_reason:
                print('   Reason: ' + failure_reason)
            soln_ver_arns.remove(soln_ver_arn)

    if len(soln_ver_arns) > 0:
        print('At least one solution version is still in progress')
        time.sleep(60)
    else:
        print("All solution versions have completed")
        break

At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version is still in progress
At least one solution version i

### Evaluate Offline Metrics for Solution Versions

Amazon Personalize provides [offline metrics](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html#working-with-training-metrics-metrics) that allow you to evaluate the performance of a solution version before you deploy the model in your application. Metrics can also be used to view the effects of modifying a solution's hyperparameters, or to compare solutions that use the same training data but were created with different recipes.
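
To make the metric names more concrete, here is a minimal sketch of textbook definitions of precision at K, reciprocal rank, and NDCG at K with binary relevance. These are illustrative definitions only; the exact computation inside Amazon Personalize may differ in its details, and the function names and sample items are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


def reciprocal_rank(recommended, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if none."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


recommended = ["A", "B", "C", "D", "E"]  # ranked recommendations for one user
relevant = {"B", "E"}                    # items the user actually interacted with

print(precision_at_k(recommended, relevant, 5))
print(reciprocal_rank(recommended, relevant, 5))
print(ndcg_at_k(recommended, relevant, 5))
```

Mean reciprocal rank is the average of `reciprocal_rank` over many users. All three metrics range from 0 to 1, and higher is better.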

Let's retrieve the metrics for the solution versions we just created.

#### Related Products Metrics

In [16]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = related_solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-related-products/e4228ca5",
  "metrics": {
    "coverage": 1.0,
    "mean_reciprocal_rank_at_25": 0.5728,
    "normalized_discounted_cumulative_gain_at_10": 0.4869,
    "normalized_discounted_cumulative_gain_at_25": 0.602,
    "normalized_discounted_cumulative_gain_at_5": 0.3931,
    "precision_at_10": 0.2702,
    "precision_at_25": 0.1711,
    "precision_at_5": 0.3321
  },
  "ResponseMetadata": {
    "RequestId": "1d8c9078-13d0-427b-bf7c-7aeebd01c73f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:21 GMT",
      "x-amzn-requestid": "1d8c9078-13d0-427b-bf7c-7aeebd01c73f",
      "content-length": "400",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Product Recommendations Metrics

In [17]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = recommend_solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-product-personalization/3e4a7b67",
  "metrics": {
    "coverage": 1.0,
    "mean_reciprocal_rank_at_25": 0.73,
    "normalized_discounted_cumulative_gain_at_10": 0.6346,
    "normalized_discounted_cumulative_gain_at_25": 0.7053,
    "normalized_discounted_cumulative_gain_at_5": 0.5452,
    "precision_at_10": 0.2993,
    "precision_at_25": 0.1575,
    "precision_at_5": 0.4153
  },
  "ResponseMetadata": {
    "RequestId": "49925184-7142-4ccf-8680-af4e02db273b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:24 GMT",
      "x-amzn-requestid": "49925184-7142-4ccf-8680-af4e02db273b",
      "content-length": "406",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Personalized Ranking Metrics

In [18]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = ranking_solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:444208467160:solution/cpg-personalized-ranking/6cb4732f",
  "metrics": {
    "coverage": 0.9825,
    "mean_reciprocal_rank_at_25": 0.7147,
    "normalized_discounted_cumulative_gain_at_10": 0.5892,
    "normalized_discounted_cumulative_gain_at_25": 0.6895,
    "normalized_discounted_cumulative_gain_at_5": 0.507,
    "precision_at_10": 0.3047,
    "precision_at_25": 0.1775,
    "precision_at_5": 0.4176
  },
  "ResponseMetadata": {
    "RequestId": "fa917e91-c6f9-40a9-9915-2aecd9c5f655",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:27 GMT",
      "x-amzn-requestid": "fa917e91-c6f9-40a9-9915-2aecd9c5f655",
      "content-length": "407",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}
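
With all three results in hand, a side-by-side view can be easier to read than three JSON documents. The following is a minimal sketch using a subset of the values printed above. Because the three models serve different use cases, treat the comparison as a sanity check rather than a ranking of the models.

```python
# Values copied from the three get_solution_metrics outputs above.
metrics_by_solution = {
    "cpg-related-products": {
        "coverage": 1.0,
        "mean_reciprocal_rank_at_25": 0.5728,
        "precision_at_5": 0.3321,
    },
    "cpg-product-personalization": {
        "coverage": 1.0,
        "mean_reciprocal_rank_at_25": 0.73,
        "precision_at_5": 0.4153,
    },
    "cpg-personalized-ranking": {
        "coverage": 0.9825,
        "mean_reciprocal_rank_at_25": 0.7147,
        "precision_at_5": 0.4176,
    },
}

metric_names = ["coverage", "mean_reciprocal_rank_at_25", "precision_at_5"]

# Print a fixed-width table: one row per solution, one column per metric.
print(f"{'solution':<30}" + "".join(f"{name:>30}" for name in metric_names))
for solution, metrics in metrics_by_solution.items():
    print(f"{solution:<30}" + "".join(f"{metrics[name]:>30.4f}" for name in metric_names))

best_mrr = max(
    metrics_by_solution,
    key=lambda s: metrics_by_solution[s]["mean_reciprocal_rank_at_25"],
)
print("Highest MRR@25:", best_mrr)
```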


## Create Campaigns

Once we're satisfied with our solution versions, we need to create Campaigns for each solution version. When creating a campaign you specify the minimum transactions per second (`minProvisionedTPS`) that you expect to make against the service for this campaign. Personalize will automatically scale the inference endpoint up and down for the campaign to match demand but will never scale below `minProvisionedTPS`.

Let's create campaigns for our three solution versions with each set at `minProvisionedTPS` of 1.
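
The three campaign cells in this section differ only in name and solution version ARN. The following is a minimal sketch of creating them in a loop, assuming a client that exposes `create_campaign` like the `personalize` client above. `create_campaigns` and the `FakePersonalize` stand-in are hypothetical, and the `arn:example:...` values are placeholders so the example runs without AWS access.

```python
def create_campaigns(client, solution_versions, min_provisioned_tps=1):
    """Create one campaign per name -> solution version ARN entry.

    Returns a dict of campaign name -> campaign ARN.
    """
    campaign_arns = {}
    for name, solution_version_arn in solution_versions.items():
        response = client.create_campaign(
            name=name,
            solutionVersionArn=solution_version_arn,
            minProvisionedTPS=min_provisioned_tps,
        )
        campaign_arns[name] = response["campaignArn"]
    return campaign_arns


class FakePersonalize:
    """Offline stand-in that mimics only the response key used above."""

    def create_campaign(self, name, solutionVersionArn, minProvisionedTPS):
        return {"campaignArn": f"arn:example:campaign/{name}"}


campaigns = create_campaigns(
    FakePersonalize(),
    {
        "cpg-related-products": "arn:example:solution/cpg-related-products/v1",
        "cpg-personalized-ranking": "arn:example:solution/cpg-personalized-ranking/v1",
    },
)
print(campaigns)
```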

#### Create Related Products Campaign

In [19]:
create_campaign_response = personalize.create_campaign(
    name = "cpg-related-products",
    solutionVersionArn = related_solution_version_arn,
    minProvisionedTPS = 1
)

related_campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:444208467160:campaign/cpg-related-products",
  "ResponseMetadata": {
    "RequestId": "d5fb391f-fe6e-4567-aae4-2f15c0609d25",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:31 GMT",
      "x-amzn-requestid": "d5fb391f-fe6e-4567-aae4-2f15c0609d25",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Product Recommendation Campaign

In [20]:
create_campaign_response = personalize.create_campaign(
    name = "cpg-product-personalization",
    solutionVersionArn = recommend_solution_version_arn,
    minProvisionedTPS = 1
)

recommend_campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:444208467160:campaign/cpg-product-personalization",
  "ResponseMetadata": {
    "RequestId": "08c54655-c22f-4fe3-8a1e-758d5a80cb2b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:33 GMT",
      "x-amzn-requestid": "08c54655-c22f-4fe3-8a1e-758d5a80cb2b",
      "content-length": "97",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Personalized Ranking Campaign

In [21]:
create_campaign_response = personalize.create_campaign(
    name = "cpg-personalized-ranking",
    solutionVersionArn = ranking_solution_version_arn,
    minProvisionedTPS = 1
)

ranking_campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:444208467160:campaign/cpg-personalized-ranking",
  "ResponseMetadata": {
    "RequestId": "0ca5b454-0aac-4273-ac26-0f4ca59c878d",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Fri, 05 Feb 2021 21:16:35 GMT",
      "x-amzn-requestid": "0ca5b454-0aac-4273-ac26-0f4ca59c878d",
      "content-length": "94",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for All Campaigns to Have ACTIVE Status

It can take 20-30 minutes for the campaigns to be fully created. 

While you are waiting for this to complete you can learn more about campaigns here: https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html

In [22]:
%%time

campaign_arns = [ related_campaign_arn, recommend_campaign_arn, ranking_campaign_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for campaign_arn in reversed(campaign_arns):
        campaign_response = personalize.describe_campaign(
            campaignArn = campaign_arn
        )
        status = campaign_response["campaign"]["status"]

        if status == "ACTIVE":
            print(f'Campaign {campaign_arn} successfully completed')
            campaign_arns.remove(campaign_arn)
        elif status == "CREATE FAILED":
            print(f'Campaign {campaign_arn} failed')
            # failureReason is nested under the "campaign" key
            failure_reason = campaign_response["campaign"].get('failureReason')
            if failure_reason:
                print('   Reason: ' + failure_reason)
            campaign_arns.remove(campaign_arn)

    if len(campaign_arns) > 0:
        print('At least one campaign is still in progress')
        time.sleep(60)
    else:
        print("All campaigns have completed")
        break

At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
At least one campaign is still in progress
Campaign arn:aws:personalize:us-east-1:444208467160:campaign/cpg-personalized-ranking successfully completed
Campaign arn:aws:personalize:us-east-1:444208467160:campaign/cpg-product-personalization successfully completed
Campaign arn:aws:personalize:us-east-1:444208467160:campaign/cpg-related-products successfully completed
All campaigns have completed
CPU times: user 81.6 ms, sys: 5.83 ms, total: 87.4 ms
Wall time: 9min 1s


### Congratulations, you finished the training layer notebook

Now, let's store all the values needed to continue in the next notebook.

In [23]:
%store dataset_group_arn
%store items_dataset_arn
%store users_dataset_arn
%store interactions_dataset_arn
%store role_arn
%store users_dataset_import_job_arn
%store interactions_dataset_import_job_arn
%store items_dataset_import_job_arn
%store related_campaign_arn
%store recommend_campaign_arn
%store ranking_campaign_arn

Stored 'dataset_group_arn' (str)
Stored 'items_dataset_arn' (str)
Stored 'users_dataset_arn' (str)
Stored 'interactions_dataset_arn' (str)
Stored 'role_arn' (str)
Stored 'users_dataset_import_job_arn' (str)
Stored 'interactions_dataset_import_job_arn' (str)
Stored 'items_dataset_import_job_arn' (str)
Stored 'related_campaign_arn' (str)
Stored 'recommend_campaign_arn' (str)
Stored 'ranking_campaign_arn' (str)
