# Building an E-Commerce Recommender


This notebook walks you through the steps to build a Domain dataset group and recommenders that return product recommendations based on data generated for a fictitious retail store. The goal is to recommend products that are relevant to a particular user.

### Importing Libraries

In [10]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time
import datetime

The lines below create the boto3 clients that the rest of this notebook uses to communicate with Amazon Personalize.



In [16]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

### Specify an S3 Bucket and Data Output Location

Amazon Personalize needs an S3 bucket to act as the source of your data. The code below creates the bucket named in `bucket_name` if it does not already exist.
The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources. Simply define the region as a string below.

In [2]:
region = "ap-southeast-1"  #Specify the region where your bucket will be domiciled

s3 = boto3.client('s3')
account_id = boto3.client('sts').get_caller_identity().get('Account')
bucket_name = 'mkawtharani-datasets-bucket' #Specify your bucket name
print('bucket_name:', bucket_name)

try: 
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
            )
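except s3.exceptions.BucketAlreadyOwnedByYou:
    # Only treat "you already own this bucket" as success; other errors are raised
    print("Bucket already exists. Using bucket", bucket_name)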

bucket_name: mkawtharani-datasets-bucket
Bucket already exists. Using bucket mkawtharani-datasets-bucket
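
The region branch in the cell above can also be expressed as a small helper that builds the arguments for `create_bucket`. This is only a sketch: `bucket_kwargs` is a hypothetical name, not part of boto3, and the bucket name in the example is a placeholder.

```python
def bucket_kwargs(bucket_name, region):
    """Build keyword arguments for s3.create_bucket (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # Mirror the notebook's branch: only regions other than us-east-1
    # get an explicit LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

print(bucket_kwargs("my-example-bucket", "ap-southeast-1"))
```

In the notebook, the result would be passed on as `s3.create_bucket(**bucket_kwargs(bucket_name, region))`.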


### Download and Explore the Dataset

First we need to download the training data.<br>
In this tutorial we'll use the purchase history dataset from a retail store.<br>
The dataset contains the user ID, the item ID, the type of interaction between the customer and the item, and the time the interaction took place (timestamp).

In [3]:
!aws s3 cp s3://retail-demo-store-us-east-1/csvs/items.csv .
!aws s3 cp s3://retail-demo-store-us-east-1/csvs/interactions.csv .

download: s3://retail-demo-store-us-east-1/csvs/items.csv to ./items.csv
download: s3://retail-demo-store-us-east-1/csvs/interactions.csv to ./interactions.csv


## Data Exploration 

In [4]:
df = pd.read_csv('./interactions.csv')
df

Unnamed: 0,ITEM_ID,USER_ID,EVENT_TYPE,TIMESTAMP,DISCOUNT
0,b93b7b15-9bb3-407c-b80b-517e7c45e090,3156,ProductViewed,1591803788,No
1,b93b7b15-9bb3-407c-b80b-517e7c45e090,3156,ProductViewed,1591803788,No
2,8ebed2f4-c0c0-4dc7-9875-1836502f2eb3,332,ProductViewed,1591803812,Yes
3,8ebed2f4-c0c0-4dc7-9875-1836502f2eb3,332,ProductViewed,1591803812,Yes
4,6ae04681-0217-46c7-a34c-a3e74c96a1fe,3981,ProductViewed,1591803830,Yes
...,...,...,...,...,...
675000,ccaec8f5-b33d-4676-a1b9-bc4b96951ee4,2090,ProductAdded,1598204672,Yes
675001,ccaec8f5-b33d-4676-a1b9-bc4b96951ee4,2090,CartViewed,1598204675,Yes
675002,ccaec8f5-b33d-4676-a1b9-bc4b96951ee4,2090,CheckoutStarted,1598204677,Yes
675003,ccaec8f5-b33d-4676-a1b9-bc4b96951ee4,2090,OrderCompleted,1598204680,Yes


In [5]:
df.EVENT_TYPE.value_counts()

ProductViewed      581901
ProductAdded        46552
CartViewed          29095
CheckoutStarted     11638
OrderCompleted       5819
Name: EVENT_TYPE, dtype: int64

The ECOMMERCE recommenders require specific EVENT_TYPE values in order to understand the context of an interaction, so we are going to modify the EVENT_TYPE column of our interactions data.

In [6]:
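def convert_event_type(event_type_in_some_format):
    # Map the retail store's event names to the values used by the ECOMMERCE domain
    if event_type_in_some_format == "ProductViewed":
        return "View"
    if event_type_in_some_format == "OrderCompleted":
        return "Purchase"
    # Leave all other event types unchanged
    return event_type_in_some_format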

df['EVENT_TYPE'] = df['EVENT_TYPE'].apply(convert_event_type)
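
The same conversion can be written as a lookup table, which makes it easier to add further mappings later. This is a sketch rather than the notebook's method: `EVENT_TYPE_MAP` and `to_personalize_event_type` are illustrative names, and only the two mappings used above are included.

```python
# Mapping from the retail store's event names to the names used in this notebook.
EVENT_TYPE_MAP = {
    "ProductViewed": "View",
    "OrderCompleted": "Purchase",
}

def to_personalize_event_type(event_type):
    """Return the mapped event type, or the input unchanged if no mapping exists."""
    return EVENT_TYPE_MAP.get(event_type, event_type)

for name in ["ProductViewed", "CartViewed", "OrderCompleted"]:
    print(name, "->", to_personalize_event_type(name))
```

It could be applied to the dataframe with `df['EVENT_TYPE'].apply(to_personalize_event_type)`.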

In [8]:
df.EVENT_TYPE.value_counts()

View               581901
ProductAdded        46552
CartViewed          29095
CheckoutStarted     11638
Purchase             5819
Name: EVENT_TYPE, dtype: int64

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 675005 entries, 0 to 675004
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   ITEM_ID     675005 non-null  object
 1   USER_ID     675005 non-null  int64 
 2   EVENT_TYPE  675005 non-null  object
 3   TIMESTAMP   675005 non-null  int64 
 4   DISCOUNT    675005 non-null  object
dtypes: int64(2), object(3)
memory usage: 25.7+ MB


To be compatible with an Amazon Personalize interactions schema, this dataset requires column headings that match the Amazon Personalize default column names (read about column names [here](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html)).

## Data Processing

### Drop Columns
Some columns in this dataset, such as DISCOUNT, do not add value to our model and need to be dropped.

In [11]:
df = df.drop(columns=['DISCOUNT'])
df.sample(10)

Unnamed: 0,ITEM_ID,USER_ID,EVENT_TYPE,TIMESTAMP
537102,ece250d1-7d4d-403b-8941-b90441f42705,4301,View,1596896989
377010,e1c0c67b-4e01-47bd-bad9-db36ab87c4f7,522,View,1595378894
172134,7aba3399-b868-4dad-a2fd-05c170aa4232,1653,View,1593436116
634497,a8c62835-0fb8-45a5-9ebd-f324566047dd,3077,View,1597820584
468672,bb80331b-7da8-4edc-a797-e1d462c14e82,1545,View,1596248095
408107,c7a27dcc-dbed-4953-b4d4-4d1c3e0a5f40,285,View,1595673769
171554,ffbf120a-0b8e-41dd-bbe9-5b2a87b0c8c5,1924,View,1593430597
221885,f995ec8d-237c-4513-8bfa-9aee210f097c,415,View,1593907865
459439,8a94535e-4638-43ed-ab9a-2ac90849a98b,2666,View,1596160548
538686,c86e8896-b9e0-4121-a3e4-25dfb435e2f1,4931,View,1596912034


In the cell below, we write our cleaned data to a file named "cleaned_training_data.csv".

In [12]:
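# index=False keeps the dataframe's row index out of the CSV, so the header only contains the data columns
df.to_csv("cleaned_training_data.csv", index=False)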

### Upload to S3
Now that our training data is ready for Amazon Personalize, the next step is to upload it to the S3 bucket created earlier.


In [13]:
interactions_file_path = 'cleaned_training_data.csv'
boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_file_path).upload_file(interactions_file_path)
interactions_s3DataPath = "s3://"+bucket_name+"/"+interactions_file_path

### Configure an S3 bucket and an IAM role


So far, we have downloaded, manipulated, and saved the data on the Amazon EBS volume attached to the instance running this Jupyter notebook, and uploaded the result to S3. Amazon Personalize still needs permission to read that bucket, as well as an IAM role for accessing it. Let's set all of that up.

#### Set the S3 bucket policy
Amazon Personalize needs to be able to read the contents of your S3 bucket, so we will add a bucket policy that allows that.<br>
Note: Make sure the role you are using to run the code in this notebook has the necessary permissions to modify the S3 bucket policy.

In [14]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': '8D0XXHV4VKGA9KTE',
  'HostId': 'zJu3cGiajqWADIK/Ef1e/ltqI1v3PzOVhK8RF53XFJJRWlpDrh5x6KulOmyR8YlTs4oAR/Cy3NM=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'zJu3cGiajqWADIK/Ef1e/ltqI1v3PzOVhK8RF53XFJJRWlpDrh5x6KulOmyR8YlTs4oAR/Cy3NM=',
   'x-amz-request-id': '8D0XXHV4VKGA9KTE',
   'date': 'Mon, 07 Feb 2022 09:15:14 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

### Create and Wait for Dataset Group

The largest grouping in Personalize is a Dataset Group. It isolates your data, event trackers, solutions, and campaigns,<br>
grouping together resources that share a common collection of data. Feel free to alter the name below.



### Create Dataset Group

In [18]:
response = personalize.create_dataset_group(
    name='personalize-ecomemerce-ds-group',
    domain='ECOMMERCE'
)

dataset_group_arn = response['datasetGroupArn']
print(json.dumps(response, indent=2))

{
  "datasetGroupArn": "arn:aws:personalize:ap-southeast-1:563781936732:dataset-group/personalize-ecomemerce-ds-group",
  "domain": "ECOMMERCE",
  "ResponseMetadata": {
    "RequestId": "56ac0cd7-fd88-41e2-ad0b-e947097f2bb0",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 07 Feb 2022 09:20:05 GMT",
      "x-amzn-requestid": "56ac0cd7-fd88-41e2-ad0b-e947097f2bb0",
      "content-length": "136",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Group to Have ACTIVE Status
Before we can use the Dataset Group in any of the steps below, it must be active. Execute the cell below and wait for it to show ACTIVE.



In [19]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: ACTIVE
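
The polling loop above recurs later in this notebook for the dataset import job and the recommenders. A minimal sketch of a reusable version follows; `wait_for_status` is a hypothetical helper, and it assumes the caller supplies a zero-argument function that returns the current status string.

```python
import time

def wait_for_status(get_status, label, poll_seconds=60, timeout_seconds=3 * 60 * 60):
    """Poll get_status() until it returns ACTIVE or CREATE FAILED, or the timeout passes.

    get_status is any zero-argument callable returning the status string.
    Returns the last status seen (None if the timeout was already exceeded).
    """
    deadline = time.time() + timeout_seconds
    status = None
    while time.time() < deadline:
        status = get_status()
        print("{}: {}".format(label, status))
        if status in ("ACTIVE", "CREATE FAILED"):
            break
        time.sleep(poll_seconds)
    return status

# Intended usage in this notebook (not executed here):
# status = wait_for_status(
#     lambda: personalize.describe_dataset_group(
#         datasetGroupArn=dataset_group_arn
#     )["datasetGroup"]["status"],
#     "DatasetGroup",
# )
```

Returning the final status lets the caller decide whether to continue or stop when the resource ends in `CREATE FAILED`.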


### Create Interactions Schema

A core component of how Personalize understands the data comes from the Schema that is defined below.<br>
This configuration tells the service how to digest the data provided via the CSV file. Note the columns and types align to what was in the file we have created above.



In [20]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
            
        }
    ],
    "version": "1.0"
}

create_schema_response = personalize.create_schema(
    name = "personalize-ecommerce-interatn_group",
    domain = "ECOMMERCE",
    schema = json.dumps(interactions_schema)
)

interaction_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:ap-southeast-1:563781936732:schema/personalize-ecommerce-interatn_group",
  "ResponseMetadata": {
    "RequestId": "acbe6c7b-e150-43d9-b06d-9c8c9f220a8c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 07 Feb 2022 09:22:53 GMT",
      "x-amzn-requestid": "acbe6c7b-e150-43d9-b06d-9c8c9f220a8c",
      "content-length": "107",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}
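
Because the schema has to line up with the columns of the uploaded CSV, it can help to compare the two before starting an import. The sketch below is one way to do that with the standard library; `missing_schema_fields`, `example_schema`, and `example_csv` are illustrative names, and the sample row is copied from the interactions data shown earlier.

```python
import csv
import io

def missing_schema_fields(schema, header):
    """Return schema field names that are absent from the CSV header."""
    present = set(header)
    return [field["name"] for field in schema["fields"] if field["name"] not in present]

# Illustrative schema and CSV content; in the notebook these would be
# interactions_schema and the contents of cleaned_training_data.csv.
example_schema = {"fields": [{"name": "USER_ID", "type": "string"},
                             {"name": "ITEM_ID", "type": "string"},
                             {"name": "TIMESTAMP", "type": "long"},
                             {"name": "EVENT_TYPE", "type": "string"}]}
example_csv = ("ITEM_ID,USER_ID,EVENT_TYPE,TIMESTAMP\n"
               "b93b7b15-9bb3-407c-b80b-517e7c45e090,3156,View,1591803788\n")

# Read only the first row of the CSV, which holds the column names.
header = next(csv.reader(io.StringIO(example_csv)))
print(missing_schema_fields(example_schema, header))
```

An empty list means every field named in the schema has a matching column in the file.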


## Create Datasets

After the group, the next thing to create is the actual datasets.



### Create Interactions Dataset


In [21]:
dataset_type = "INTERACTIONS"

create_dataset_response = personalize.create_dataset(
    name = "personalize-ecommerce-demo-interactions",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interaction_schema_arn
)

interactions_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:ap-southeast-1:563781936732:dataset/personalize-ecomemerce-ds-group/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "36c43b6f-a629-410c-9007-df958e14ba39",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 07 Feb 2022 09:23:50 GMT",
      "x-amzn-requestid": "36c43b6f-a629-410c-9007-df958e14ba39",
      "content-length": "117",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Create Personalize Role

Amazon Personalize also needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. The lines below create such a role.<br>
Note: Make sure the role you are using to run the code in this notebook has the necessary permissions to create a role.

In [22]:
iam = boto3.client("iam")

role_name = "PersonalizeRoleEcommerceRecommender"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

# Now add S3 support
iam.attach_role_policy(
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::563781936732:role/PersonalizeRoleEcommerceRecommender


### Import the data

Earlier we created the DatasetGroup and Dataset to house the information. Now we will execute an import job that loads the data from S3 into Amazon Personalize for use in building our recommenders.

#### Create Interactions Dataset Import Job

In [23]:
create_interactions_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "personalize-ecommerce-demo-interactions",
    datasetArn = interactions_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket_name, interactions_file_path)
    },
    roleArn = role_arn
)

dataset_interactions_import_job_arn = create_interactions_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_interactions_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:ap-southeast-1:563781936732:dataset-import-job/personalize-ecommerce-demo-interactions",
  "ResponseMetadata": {
    "RequestId": "a16a7ab2-82a8-4183-a00a-b887ae2110c6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 07 Feb 2022 09:35:58 GMT",
      "x-amzn-requestid": "a16a7ab2-82a8-4183-a00a-b887ae2110c6",
      "content-length": "132",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Dataset Import Job to Have ACTIVE Status
It can take a while before the import job completes. Please wait until the status is ACTIVE.

In [24]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_interactions_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


## Choose Recommender Use Cases

Each domain has different use cases. When we create a recommender, we create it for a specific use case, and each use case has different requirements for getting recommendations.

In [25]:
available_recipes = personalize.list_recipes(domain='ECOMMERCE') # List the recipes (use cases) available for the domain
print(available_recipes['recipes'])

[{'name': 'aws-ecomm-customers-who-viewed-x-also-viewed', 'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-customers-who-viewed-x-also-viewed', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2021, 11, 23, 15, 10, 33, 309000, tzinfo=tzlocal()), 'domain': 'ECOMMERCE'}, {'name': 'aws-ecomm-frequently-bought-together', 'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-frequently-bought-together', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2021, 11, 23, 15, 10, 33, 309000, tzinfo=tzlocal()), 'domain': 'ECOMMERCE'}, {'name': 'aws-ecomm-popular-items-by-purchases', 'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-popular-items-by-purchases', 'status': 'ACTIVE', 'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2021, 11, 23, 15, 10, 33,
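
The raw response is hard to scan, so a short sketch for listing just the recipe names may be useful. `recipe_names` is a hypothetical helper, and `sample_response` is a trimmed stand-in for `available_recipes` that keeps only two entries and two fields from the output above.

```python
def recipe_names(list_recipes_response):
    """Return the recipe names from a list_recipes response, sorted alphabetically."""
    return sorted(recipe["name"] for recipe in list_recipes_response["recipes"])

# Trimmed, illustrative response containing only the fields used here.
sample_response = {
    "recipes": [
        {"name": "aws-ecomm-frequently-bought-together",
         "recipeArn": "arn:aws:personalize:::recipe/aws-ecomm-frequently-bought-together"},
        {"name": "aws-ecomm-customers-who-viewed-x-also-viewed",
         "recipeArn": "arn:aws:personalize:::recipe/aws-ecomm-customers-who-viewed-x-also-viewed"},
    ]
}

for name in recipe_names(sample_response):
    print(name)
```

In the notebook, `recipe_names(available_recipes)` would print one use case per line.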

We are going to create a recommender of the type "Frequently Bought Together". This type of recommender offers recommendations for frequently bought together items based on historical user purchases.

In [26]:
create_recommender_response = personalize.create_recommender(
  name = 'frequently_bought_together',
  recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-frequently-bought-together',
  datasetGroupArn = dataset_group_arn
)
frequently_bought_together_arn = create_recommender_response["recommenderArn"]
print (json.dumps(create_recommender_response))

{"recommenderArn": "arn:aws:personalize:ap-southeast-1:563781936732:recommender/frequently_bought_together_demo", "ResponseMetadata": {"RequestId": "b589524e-cdd3-4a6b-847d-22c3fe6cbc55", "HTTPStatusCode": 200, "HTTPHeaders": {"content-type": "application/x-amz-json-1.1", "date": "Mon, 07 Feb 2022 09:41:34 GMT", "x-amzn-requestid": "b589524e-cdd3-4a6b-847d-22c3fe6cbc55", "content-length": "112", "connection": "keep-alive"}, "RetryAttempts": 0}}


We are going to create a second recommender of the type "Recommended For You". This type of recommender offers personalized recommendations for items based on a user that you specify. With this use case, Amazon Personalize automatically filters items the user purchased based on the userId that you specify and 'Purchase' events.

In [27]:
create_recommender_response = personalize.create_recommender(
  name = 'recommended_for_you',
  recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you',
  datasetGroupArn = dataset_group_arn
)
recommended_for_you_arn = create_recommender_response["recommenderArn"]
print (json.dumps(create_recommender_response))

{"recommenderArn": "arn:aws:personalize:ap-southeast-1:563781936732:recommender/recommended_for_you_demo", "ResponseMetadata": {"RequestId": "edba0783-8b8a-470c-9012-4f5da206eb85", "HTTPStatusCode": 200, "HTTPHeaders": {"content-type": "application/x-amz-json-1.1", "date": "Mon, 07 Feb 2022 09:41:37 GMT", "x-amzn-requestid": "edba0783-8b8a-470c-9012-4f5da206eb85", "content-length": "105", "connection": "keep-alive"}, "RetryAttempts": 0}}


We wait until the recommenders have finished creating and have status ACTIVE, checking the status of each recommender periodically.



In [28]:
max_time = time.time() + 10*60*60 # 10 hours

while time.time() < max_time:

    version_response = personalize.describe_recommender(
        recommenderArn = frequently_bought_together_arn
    )
    status = version_response["recommender"]["status"]

    if status == "ACTIVE":
        print("Build succeeded for {}".format(frequently_bought_together_arn))
        
    elif status == "CREATE FAILED":
        print("Build failed for {}".format(frequently_bought_together_arn))

    if status in ("ACTIVE", "CREATE FAILED"):  # stop polling on success or failure
        break
    else:
        print("The solution build is still in progress")
        
    time.sleep(60)
    
while time.time() < max_time:

    version_response = personalize.describe_recommender(
        recommenderArn = recommended_for_you_arn
    )
    status = version_response["recommender"]["status"]

    if status == "ACTIVE":
        print("Build succeeded for {}".format(recommended_for_you_arn))
        
    elif status == "CREATE FAILED":
        print("Build failed for {}".format(recommended_for_you_arn))

    if status in ("ACTIVE", "CREATE FAILED"):  # stop polling on success or failure
        break
    else:
        print("The solution build is still in progress")
        
    time.sleep(120)

The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress
The solution build is still in progress


## Getting recommendations with a recommender


Now that the recommenders have been trained, let's have a look at the recommendations we can get for our users!

In [29]:
# Read the items data in order to have a dataframe that has both the item IDs
# and the corresponding descriptions, to make our recommendations easier to read.
items_df = pd.read_csv('./items.csv')
items_df.sample(10)

Unnamed: 0,ITEM_ID,CATEGORY,STYLE,DESCRIPTION
957,f9031c60-0201-4e0a-a146-8fbecf6b9e9a,furniture,chairs,Cool dark khaki armchair for any room
1226,e1580149-87c5-4ff4-bb03-482465b67b16,groceries,bakery,Cookie made fresh daily in our kitchens
271,9b60b9e7-d6ec-4401-a773-7d037f858418,apparel,jacket,Warm all seasons casual jacket for men
1457,d8bf902d-3dce-453b-975f-cf81d6e4b1a6,homedecor,cushion,This light steel blue square cushion is a must...
1491,6d9e6bd1-3e50-4311-bd58-7913dcdbce43,homedecor,decorative,This wiker basket is a must-have for your home...
737,61697f16-9198-4d1a-89a5-e43386c2b759,floral,plant,Drought-resistant indoor plant delivered fresh...
868,53c51962-3546-4c9c-b88e-fe287c94782f,footwear,formal,Pair of black formal shoes for men
1892,f7b90f15-74e2-4dd0-a003-93e7965e2266,instruments,strings,This acoustic bass will delight the most deman...
1555,304cd58e-1e7e-4151-9716-1bb222cb724e,homedecor,lighting,This floor lamp is a must-have for the lightin...
1570,7b56f6cc-5f6d-41a1-b98c-a128aaff24a7,homedecor,lighting,This hanging lamp is a must-have for the light...


In [30]:
def get_item_by_id(item_id, item_df):
    """
    Takes an item_id from a recommendation in string format, looks it up
    in the given dataframe, and returns the item description.
    
    A broad try/except clause is used in case anything goes wrong.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    try:
        # Use the dataframe passed in as item_df rather than the global items_df
        return item_df.loc[item_df["ITEM_ID"] == str(item_id)]['DESCRIPTION'].values[0]
    except Exception:
        print(item_id)
        return "Error obtaining item description"

Let us get some recommendations:

In [31]:
# Use a random valid ID for a quick sanity check; change the line of code below to a valid ID in your dataset
get_item_by_id("c72257d4-430b-4eb7-9de3-28396e593381", items_df)

'Your dog will love this accessory'

In [32]:
# Select a random item
test_item_id = "8fbe091c-f73c-4727-8fe7-d27eabd17bea"

# Get the items frequently bought together with this item
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn = frequently_bought_together_arn,
    itemId = test_item_id,
    numResults = 10
)

# Build a new dataframe for the recommendations
item_list = get_recommendations_response['itemList']
recommendation_list = []
for item in item_list:
    item = get_item_by_id(item['itemId'], items_df)
    recommendation_list.append(item)

user_recommendations_df = pd.DataFrame(recommendation_list, columns = [get_item_by_id(test_item_id, items_df)])

pd.options.display.max_rows =10
display(user_recommendations_df)

Unnamed: 0,This video camera is perfect for capturing those special moments
0,Slice of delicious pepperoni pizza
1,The best nachos north of Mexico
2,"Lentils with potatos, carrots and spices; a jo..."
3,Juicy prawns with spicy sauce and rice
4,16oz fountain soda always hits the spot
5,"Exotic and healthy, grape-vine leaves stuffed ..."
6,The taste of summer and energy
7,Full of omega oils and taste
8,A delightful blend of Californian and Oriental...
9,Made with ginseng and joy
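
The loop that turns `itemList` into descriptions can also be done with a plain dictionary lookup, which avoids filtering the dataframe once per item. This is a sketch under assumptions: `describe_items` is a hypothetical helper, and the lookup table here holds a single entry taken from the sanity check above. In the notebook it could be built with `dict(zip(items_df["ITEM_ID"], items_df["DESCRIPTION"]))`.

```python
def describe_items(item_list, descriptions, default="(description not found)"):
    """Map each {'itemId': ...} entry to its description, preserving order."""
    return [descriptions.get(item["itemId"], default) for item in item_list]

# Illustrative lookup table; the one real entry comes from the sanity check cell.
descriptions = {
    "c72257d4-430b-4eb7-9de3-28396e593381": "Your dog will love this accessory",
}

# Shaped like the 'itemList' field of a get_recommendations response.
sample_item_list = [
    {"itemId": "c72257d4-430b-4eb7-9de3-28396e593381"},
    {"itemId": "not-a-real-item-id"},  # placeholder ID to show the fallback
]

print(describe_items(sample_item_list, descriptions))
```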


Get recommendations from the "Recommended For You" recommender:

In [33]:
# First pick a user
test_user_id = "777" 

# Get recommendations for the user
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn = recommended_for_you_arn,
    userId = test_user_id,
    numResults = 20
)

# Build a new dataframe for the recommendations
item_list = get_recommendations_response['itemList']
recommendation_list = []
for item in item_list:
    item = get_item_by_id(item['itemId'], items_df)
    recommendation_list.append(item)


user_recommendations_df = pd.DataFrame(recommendation_list, columns = [test_user_id])

pd.options.display.max_rows =20
display(user_recommendations_df)

Unnamed: 0,777
0,This rattan basket is a must-have for your hom...
1,This ceramic vase is a must-have for your home...
2,This decorative candle is a must-have for the ...
3,This decorative candle is a must-have for the ...
4,This accurate pair of headphones is unrivaled ...
5,This desktop computer is a boon for productivity
6,Camera for amateurs and professonals
7,This rattan basket is a must-have for your hom...
8,This fishing reel is sure to bring success
9,These high definition speakers are incomparabl...


In [34]:
item_list

[{'itemId': '2a68a810-b819-4d5c-9c13-c43be2eba3c4'},
 {'itemId': '91b4d3d6-9880-40f9-a8f3-e732c91dbe3c'},
 {'itemId': '9f50d5db-5054-43e8-97b0-80d5036a6bf7'},
 {'itemId': '833214f2-cdbc-4fbd-a25f-6f33357b503a'},
 {'itemId': '23aa70ab-959f-4835-a114-c30ff5e4f974'},
 {'itemId': '5156955f-dda2-4e19-831e-752c92bd8f85'},
 {'itemId': '0cb3ab29-b939-4732-b8ac-72ec61a4f950'},
 {'itemId': '771a0ced-1a1c-45d4-b94c-3d7b2188e48b'},
 {'itemId': 'cc42e0f4-abaf-445b-b843-54884c4f6845'},
 {'itemId': '095c73c4-fa7d-4910-ac92-e7289058d9c6'},
 {'itemId': '7160b264-e3ed-4ac3-9dd7-2c537b00e5ed'},
 {'itemId': '4b86c44c-547e-4e54-bd16-b96d91875e4a'},
 {'itemId': '48c9f12f-a9c3-4c71-a537-1c478a0e16e0'},
 {'itemId': 'e780c3e7-9c9c-4b54-87ad-8bde1b837dd8'},
 {'itemId': 'b803aff7-6e4b-4beb-8d1d-dc7fe609274d'},
 {'itemId': '322c0e7a-4ab8-485d-b3c4-234a5962562d'},
 {'itemId': 'ea84753c-3c7c-4ab4-a60b-b6fa7f191c25'},
 {'itemId': '3b145528-d5fc-4c2a-b2a5-e119128caa5f'},
 {'itemId': '52f04147-c46e-452c-8e26-21c089cea

## Review

Using the code above, we have successfully trained a deep learning model to generate item recommendations based on prior user behavior. We have created two recommenders for two foundational use cases. Going forward, we can adapt this code to create other recommenders.

## Cleanup

After building the recommenders, you may want to delete them along with the dataset and dataset group. The following cells delete the Amazon Personalize recommenders, dataset, and dataset group created in this lab.



In [None]:
# delete recommenders 
personalize.delete_recommender(recommenderArn=frequently_bought_together_arn)
personalize.delete_recommender(recommenderArn=recommended_for_you_arn)
time.sleep(180)

In [54]:
# delete the interaction dataset
personalize.delete_dataset(datasetArn=interactions_dataset_arn)
time.sleep(60)

In [55]:
# delete dataset group 
personalize.delete_dataset_group(datasetGroupArn=dataset_group_arn)
time.sleep(60)
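
This notebook also created an interactions schema and an IAM role, which the cells above leave in place. The sketch below shows one way they might be removed; `delete_role_with_policies` is a hypothetical helper, and it assumes the only policies attached to the role are the two managed policies attached earlier.

```python
def delete_role_with_policies(iam_client, role_name, policy_arns):
    """Detach the given managed policies from the role, then delete the role."""
    # IAM does not delete a role while managed policies are still attached.
    for policy_arn in policy_arns:
        iam_client.detach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    iam_client.delete_role(RoleName=role_name)

# Intended usage in this notebook (not executed here):
# personalize.delete_schema(schemaArn=interaction_schema_arn)
# delete_role_with_policies(
#     iam,
#     role_name,
#     [policy_arn, "arn:aws:iam::aws:policy/AmazonS3FullAccess"],
# )
```

The S3 bucket and the uploaded CSV are not touched by this sketch and would need to be removed separately if they are no longer needed.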