# Configuring Amazon Personalize resources

**Image: Data Science 3.0, Kernel: Python 3, Instance: ml.t3.medium (2 vCPU + 4 GiB)**

## How to use the Notebook

The code is broken up into cells like the ones below. Click the triangular Run button at the top of this page to execute the current cell and move on to the next one, or press `Shift` + `Enter` while in the cell.

While a cell is running, the indicator beside it shows an `*`. When the cell has finished executing all of its code, the `*` is replaced by a number that shows the order in which the cell ran.

Simply follow the instructions below and execute the cells to get started.

## Introduction to Amazon Personalize

[Amazon Personalize](https://aws.amazon.com/personalize/) makes it easy for customers to develop applications with a wide array of personalization use cases, including real-time product recommendations and customized direct marketing. Amazon Personalize brings the same machine learning technology used by Amazon.com to everyone for use in their applications, with no machine learning experience required. Amazon Personalize customers pay for what they use, with no minimum fees or upfront commitment.

Amazon Personalize uses advanced algorithms and machine learning techniques to analyze data from various sources, including user interactions, item metadata, and user profiles, to generate personalized recommendations for each user.

The core of Amazon Personalize is built around three main datasets:

1. **Items Dataset**: This dataset contains information about the items or products that you want to recommend. It can include attributes such as product descriptions, categories, prices, and any other relevant metadata.

2. **Interactions Dataset**: This dataset records the interactions between users and items. It typically includes information such as the user ID, item ID, timestamp, and any additional contextual data like ratings or purchase quantities.

3. **Users Dataset**: This optional dataset contains demographic information about users, such as age, gender, location, or any other relevant user attributes that can be used to enhance the personalization capabilities.

Amazon Personalize ingests these datasets and applies machine learning algorithms to identify patterns and correlations within the data. It then builds personalized recommendation models tailored to each user's preferences and behavior. These models can be used to generate real-time recommendations for various use cases, such as product recommendations on e-commerce websites, content recommendations on streaming platforms, or personalized marketing campaigns.

By leveraging the power of machine learning and the flexibility of Amazon Personalize, businesses can deliver highly relevant and personalized experiences to their customers, increasing engagement, loyalty, and ultimately driving business growth.

You can start using Amazon Personalize with a simple three-step process, which takes only a few clicks in the AWS Management Console or a set of API calls.

First, point Amazon Personalize to your user data, catalog data, and activity stream of views, clicks, purchases, and so on, in Amazon S3, or upload it with an API call.

Second, with a single click in the console or an API call, train a private recommendation model on your data.

Third, create a recommender and retrieve personalized recommendations for any user with the GetRecommendations API.

If you are not familiar with Amazon Personalize, you can learn more about the service by looking at the [GitHub sample notebooks](https://github.com/aws-samples/amazon-personalize-samples) and the [product documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html).

## Imports
Python ships with a broad standard library. This notebook also uses libraries installed in the environment, such as [boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python) and [Pandas](https://pandas.pydata.org/)/[NumPy](https://numpy.org/), which are core data science tools. We import them below.

In [1]:
# Get the latest version of botocore to ensure we have the latest features in the SDK
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore

Collecting pip
  Downloading pip-24.1.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-24.1.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 43.0 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3.2
    Uninstalling pip-23.3.2:
      Successfully uninstalled pip-23.3.2
Successfully installed pip-24.1.1
Collecting botocore
  Downloading botocore-1.34.139-py3-none-any.whl.metadata (5.7 kB)
Downloading botocore-1.34.139-py3-none-any.whl (12.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.4/12.4 MB 30.3 MB/s eta 0:00:00
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.34.101
    Uninstalling botocore-1.34.101:
      Successfully uninstalled botocore-1.34.101
Successfully installed botocore-1.34

In [2]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time
import datetime

In [3]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

## Creating Amazon Personalize Resources and Importing data

### Get the account id and region

In [4]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
print("account id:", account_id)

with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:
    data = json.load(notebook_info)
    resource_arn = data['ResourceArn']
    region = resource_arn.split(':')[3]
print("region:", region)

account id: 058264209953
region: us-east-1
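The cell above only works on a SageMaker notebook instance, where `/opt/ml/metadata/resource-metadata.json` exists. Below is a minimal sketch of a fallback. It assumes that outside SageMaker the Region is available from the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables. The helper names `region_from_arn` and `resolve_region` are hypothetical and are not part of any AWS SDK.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Metadata file read by the cell above on SageMaker notebook instances.
SAGEMAKER_METADATA = "/opt/ml/metadata/resource-metadata.json"


def region_from_arn(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account-id:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    return parts[3]


def resolve_region(metadata_path: str = SAGEMAKER_METADATA,
                   default: Optional[str] = None) -> str:
    """Prefer the SageMaker metadata file, then AWS_REGION / AWS_DEFAULT_REGION, then `default`."""
    path = Path(metadata_path)
    if path.is_file():
        with path.open() as f:
            return region_from_arn(json.load(f)["ResourceArn"])
    # Hypothetical fallback chain for notebooks running outside SageMaker.
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = os.environ.get(var)
        if value:
            return value
    if default is None:
        raise RuntimeError("Could not determine the AWS Region; pass default=... explicitly.")
    return default
```

On a SageMaker notebook instance, `resolve_region()` returns the same value as the cell above. Elsewhere, `resolve_region(default="us-east-1")` avoids a hard failure. `boto3.Session().region_name` is another source you could add to the chain.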


### IAM role
Amazon Personalize needs to assume an IAM role in your account to get the permissions it needs for certain tasks, such as reading training data from Amazon S3.

We will be using the IAM role and S3 bucket that were created when you deployed the CloudFormation stack using [personalizeCFRecommenderAgent.yaml](./personalizeCFRecommenderAgent.yaml).

The role's assume role (trust) policy document has the following format:

```python
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

The S3 Access Policy document needs to have the following format:

```python
# bucket_name is read from SSM Parameter Store later in this notebook
s3_access_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "myStatement",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ],
            "Action": "s3:*"
        }
    ]
}
```
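The snippet above uses `bucket_name`, which is only defined once we read it from SSM Parameter Store. Below is a minimal sketch that wraps the same document in a function of the bucket name. The helper name `s3_access_policy` and the bucket name `example-bucket` are hypothetical. Passing a narrower action list, such as `("s3:GetObject", "s3:ListBucket")`, is an assumed least-privilege option, not something this notebook's stack does.

```python
import json
from typing import Sequence


def s3_access_policy(bucket_name: str, actions: Sequence[str] = ("s3:*",)) -> dict:
    """Return the S3 access policy document shown above, scoped to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "myStatement",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in the bucket
                ],
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
print(json.dumps(s3_access_policy("example-bucket", ("s3:GetObject", "s3:ListBucket")), indent=2))
```

The bucket ARN and the `/*` object ARN are both needed because `s3:ListBucket` applies to the bucket, while object-level actions apply to the keys inside it.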
Let's get the ARN of the role created by the CloudFormation stack. The stack stores it in AWS Systems Manager (SSM) Parameter Store, so we read it from there.

In [5]:
# Create an AWS Systems Manager (SSM) client to read the parameters stored by the CloudFormation stack
ssm = boto3.client('ssm')

In [6]:
role_arn_info = ssm.get_parameter(Name='/cloudformation/personalize-iam-role-arn', WithDecryption=False)
role_arn = role_arn_info['Parameter']['Value']

In [7]:
# Get the role name: the last "/"-separated segment of the role ARN (also correct for roles with a path)
role_name = role_arn.split('/')[-1]
role_name

'recommenderAgent-PersonalizeIamRole-cP45UOEDURMe'

### S3 bucket

Later in this notebook, we download and prepare the training data, save it to the Amazon EBS volume attached to the instance running this Jupyter notebook, and upload it to Amazon S3.

By default, Amazon Personalize does not have permission to access the data we upload to the S3 bucket in our account. To grant Amazon Personalize access to read our CSV files, you need to set a bucket policy and create an IAM role that Amazon Personalize will assume.

Earlier, we used the metadata stored on the instance underlying this Amazon SageMaker notebook to determine the Region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, define `region` as a string instead. The Amazon S3 bucket needs to be in the same Region as the Amazon Personalize resources we create.

We will be using the S3 bucket created by the same CloudFormation stack that created the IAM role.

This bucket is created with the policy:

```python
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:*Object",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}
```

This bucket policy allows the Amazon Personalize service principal to list your S3 bucket and to read and write the objects in it.



In [8]:
# get the name of the bucket created by the Cloud Formation
personalizes3bucket = ssm.get_parameter(Name='/cloudformation/personalize-s3-bucket', WithDecryption=False)
bucket_name = personalizes3bucket['Parameter']['Value']

print('bucket_name:', bucket_name)

bucket_name: personalizesamples-sgarcesv-piqebvlkj


Let's have a look at the S3 bucket policy.

In [9]:
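from botocore.exceptions import ClientError

s3 = boto3.client('s3')

try:
    bucket_current_policy = s3.get_bucket_policy(Bucket=bucket_name)['Policy']
    print("Bucket current policy")
    print(json.dumps(json.loads(bucket_current_policy), indent=4))
except ClientError as e:
    # A bucket without a policy returns NoSuchBucketPolicy; re-raise anything else
    if e.response['Error']['Code'] == 'NoSuchBucketPolicy':
        print("No bucket policy is attached to", bucket_name)
    else:
        raise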

Bucket current policy
{
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::personalizesamples-sgarcesv-piqebvlkj",
                "arn:aws:s3:::personalizesamples-sgarcesv-piqebvlkj/*"
            ]
        }
    ]
}
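Reading the JSON by eye works for one bucket. Below is a simplified sketch that checks programmatically whether an `Allow` statement grants actions to the `personalize.amazonaws.com` service principal. It assumes that importing data needs `s3:GetObject` and `s3:ListBucket`. It deliberately ignores Deny statements, `NotAction`, conditions, and resource matching, so treat it as a quick sanity check rather than a policy evaluator. `personalize_grants` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

PERSONALIZE_SERVICE = "personalize.amazonaws.com"


def _as_list(value):
    """IAM accepts either a single value or a list for Statement, Action, and Service."""
    return value if isinstance(value, list) else [value]


def personalize_grants(policy: dict, required=("s3:GetObject", "s3:ListBucket")) -> dict:
    """Report which `required` actions an Allow statement grants to the Personalize service.

    Simplified: Deny statements, NotAction, Conditions, and Resources are not evaluated.
    """
    granted = {action: False for action in required}
    for statement in _as_list(policy.get("Statement", [])):
        principal = statement.get("Principal", {})
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        if PERSONALIZE_SERVICE not in _as_list(principal.get("Service", [])):
            continue
        for pattern in _as_list(statement.get("Action", [])):
            for action in required:
                # IAM action names are case-insensitive and may contain * wildcards,
                # so "s3:*Object" matches "s3:GetObject".
                if fnmatchcase(action.lower(), pattern.lower()):
                    granted[action] = True
    return granted
```

In the notebook, call it with `personalize_grants(json.loads(bucket_current_policy))`. For the policy printed above, both actions report `True`.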


## Download, Prepare, and Upload Training Data

We generated the synthetic data based on the code in the [Retail Demo Store project](https://github.com/aws-samples/retail-demo-store). Follow the link to learn more about the data and potential uses.

First, we need to download the training data. In this tutorial, we use the interaction history of a retail store. The dataset contains the user ID, the item ID, the type of interaction between the customer and the item, the time the interaction took place (timestamp), and a discount flag.

### Download and Explore the Interactions Dataset

In [10]:
!wget https://code.retaildemostore.retail.aws.dev/csvs/interactions.csv -O ./interactions.csv --no-check-certificate

--2024-07-03 23:45:53--  https://code.retaildemostore.retail.aws.dev/csvs/interactions.csv
Resolving code.retaildemostore.retail.aws.dev (code.retaildemostore.retail.aws.dev)... 18.165.83.129, 18.165.83.120, 18.165.83.69, ...
Connecting to code.retaildemostore.retail.aws.dev (code.retaildemostore.retail.aws.dev)|18.165.83.129|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42447843 (40M) [text/csv]
Saving to: ‘./interactions.csv’


2024-07-03 23:45:55 (31.0 MB/s) - ‘./interactions.csv’ saved [42447843/42447843]



The dataset has been successfully downloaded as interactions.csv

Let's learn more about the dataset by viewing its characteristics.

In [11]:
df = pd.read_csv('./interactions.csv')
df

|  | ITEM_ID | USER_ID | EVENT_TYPE | TIMESTAMP | DISCOUNT |
|---|---|---|---|---|---|
| 0 | b93b7b15-9bb3-407c-b80b-517e7c45e090 | 3156 | View | 1709970535 | No |
| 1 | b93b7b15-9bb3-407c-b80b-517e7c45e090 | 3156 | View | 1709970540 | No |
| 2 | 3946f4c8-1b5b-4161-b794-70b33affb671 | 2122 | View | 1709970558 | No |
| 3 | 3946f4c8-1b5b-4161-b794-70b33affb671 | 2122 | View | 1709970568 | No |
| 4 | e9daa7cd-8230-4544-9f07-86fa84d7c3c1 | 2485 | View | 1709970578 | No |
| ... | ... | ... | ... | ... | ... |
| 674999 | 8770aa6d-44e7-4219-9ab4-71b3fd828f36 | 4406 | View | 1716371426 | Yes |
| 675000 | 8770aa6d-44e7-4219-9ab4-71b3fd828f36 | 4406 | AddToCart | 1716371430 | Yes |
| 675001 | 8770aa6d-44e7-4219-9ab4-71b3fd828f36 | 4406 | ViewCart | 1716371435 | Yes |
| 675002 | 8770aa6d-44e7-4219-9ab4-71b3fd828f36 | 4406 | StartCheckout | 1716371438 | Yes |


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 675004 entries, 0 to 675003
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   ITEM_ID     675004 non-null  object
 1   USER_ID     675004 non-null  int64 
 2   EVENT_TYPE  675004 non-null  object
 3   TIMESTAMP   675004 non-null  int64 
 4   DISCOUNT    675004 non-null  object
dtypes: int64(2), object(3)
memory usage: 25.7+ MB


From the cells above, we've learned that our data has 5 columns and 675,004 rows, and that the headers are ITEM_ID, USER_ID, EVENT_TYPE, TIMESTAMP, and DISCOUNT.

To be compatible with an Amazon Personalize interactions schema, the dataset's column headings must match the Amazon Personalize default column names (read about column names in the [schema documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html)).

The ECOMMERCE recommenders require you to provide specific EVENT_TYPE values in order to understand the context of an interaction. Let's look at what event types are currently in our dataset:

In [13]:
df.EVENT_TYPE.value_counts()

EVENT_TYPE
View             581900
AddToCart         46552
ViewCart          29095
StartCheckout     11638
Purchase           5819
Name: count, dtype: int64

Both `View` and `Purchase` events are present, so we can proceed.
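If you rerun this notebook with different data, a small check can replace the visual inspection of `value_counts()`. Below is a minimal sketch. It assumes `View` and `Purchase` are the event types the ECOMMERCE recommenders rely on (for example, *Most viewed* uses View events and *Best sellers* uses Purchase events). The helper `missing_event_types` and its `min_count` threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Assumption: event types the ECOMMERCE recommenders rely on.
REQUIRED_EVENT_TYPES = ("View", "Purchase")


def missing_event_types(event_types: Iterable[str],
                        required: Sequence[str] = REQUIRED_EVENT_TYPES,
                        min_count: int = 1) -> List[str]:
    """Return the required event types that occur fewer than `min_count` times."""
    counts = Counter(event_types)
    return [event for event in required if counts[event] < min_count]
```

A pandas Series is iterable, so `missing_event_types(df.EVENT_TYPE)` works directly. For the counts shown above, it returns an empty list.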

### Prepare the Interactions Data


#### Drop Columns

Some columns in this dataset would not add value to our model, so we drop them. Here, that is the `DISCOUNT` column, which is also not part of the interactions schema we define later.

In [14]:
df = df.drop(columns=['DISCOUNT'])
df.sample(10)

|  | ITEM_ID | USER_ID | EVENT_TYPE | TIMESTAMP |
|---|---|---|---|---|
| 314097 | aad9fe48-92f9-4197-af5a-505a80316c8c | 4543 | View | 1712949061 |
| 216150 | 84ab276d-9713-409f-861e-35502d6dc64a | 2246 | View | 1712020256 |
| 248890 | b87da3f8-9a3e-417d-abd7-16329c5be1ba | 4465 | View | 1712330716 |
| 82468 | ec02b332-05a5-40dd-ae9d-2b0672baaa6e | 232 | View | 1710752595 |
| 224804 | 2aba4640-946f-48e2-bd43-c3ac3a70fb7c | 3552 | AddToCart | 1712102321 |
| 58251 | e61935e9-bfd3-4ea7-81a0-3212f542b18a | 4419 | View | 1710522937 |
| 163590 | 122f738a-92c6-479c-9bd7-c45b17bf0417 | 4691 | View | 1711521829 |
| 342969 | 3ab996bb-9c82-4e05-b14a-81a68352c418 | 2802 | View | 1713222843 |
| 435374 | 6a90d0b3-930c-46c3-b093-f55c60eb27a2 | 2520 | View | 1714099094 |
| 64270 | 4df77d59-732e-4194-b9aa-7ad3878345e7 | 4732 | View | 1710579996 |


Set up the names of the interactions files to use later.

In [15]:
interactions_file_path = './cleaned_interactions_training_data.csv'
interactions_file_name = 'cleaned_interactions_training_data.csv'

In the cell below, we write our cleaned data to a file named `cleaned_interactions_training_data.csv`.

In [16]:
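# index=False keeps the pandas row index out of the CSV, so the header matches the schema fields
df.to_csv(interactions_file_path, index=False)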

### Download and Explore the Items Dataset

In [17]:
!wget https://code.retaildemostore.retail.aws.dev/csvs/items.csv -O ./items.csv --no-check-certificate

--2024-07-03 23:45:59--  https://code.retaildemostore.retail.aws.dev/csvs/items.csv
Resolving code.retaildemostore.retail.aws.dev (code.retaildemostore.retail.aws.dev)... 18.165.83.109, 18.165.83.69, 18.165.83.120, ...
Connecting to code.retaildemostore.retail.aws.dev (code.retaildemostore.retail.aws.dev)|18.165.83.109|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 789018 (771K) [text/csv]
Saving to: ‘./items.csv’


2024-07-03 23:45:59 (2.47 MB/s) - ‘./items.csv’ saved [789018/789018]



The dataset has been successfully downloaded as items.csv

Let's learn more about the dataset by viewing its characteristics.

In [18]:
items_df = pd.read_csv('./items.csv')
items_df

|  | ITEM_ID | PRICE | CATEGORY_L1 | CATEGORY_L2 | PRODUCT_NAME | PRODUCT_DESCRIPTION | GENDER | PROMOTED |
|---|---|---|---|---|---|---|---|---|
| 0 | 6579c22f-be2b-444c-a52b-0116dd82df6c | 90.99 | accessories | backpack | Spacious Tan Backpack for Her Travels | This versatile tan travel backpack is thoughtf... | F |  |
| 1 | 2e852905-c6f4-47db-802c-654013571922 | 123.99 | accessories | backpack | Blush Backpack for Everyday Chic | This chic pale pink backpack adds a feminine t... | F |  |
| 2 | 4ec7ff5c-f70f-4984-b6c4-c7ef37cc0c09 | 87.99 | accessories | backpack | Sleek Gainsboro Pack for Fashionable Women | Sleek and spacious, this stylish gainsboro bac... | F |  |
| 3 | 7977f680-2cf7-457d-8f4d-afa0aa168cb9 | 125.99 | accessories | backpack | Stylish Gray Backpack for Women | Chic and functional gray backpack designed for... | F |  |
| 4 | b5649d7c-4651-458d-a07f-912f253784ce | 141.99 | accessories | backpack | Stylish Orange Backpack | Style and function meet in this durable canvas... | F |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2460 | 5afced84-ed2d-4520-a06d-dcfeab382e52 | 2.50 | cold dispensed | fountain-non-carbonated | Refreshing Ginseng Iced Tea | Refresh and recharge with our crisp, lightly s... | Any |  |
| 2461 | 0987bfa1-0a23-4b90-8882-8a6e9bd91e24 | 5.50 | food service | seafood | Spicy Prawn Curry | Succulent prawns in a creamy, aromatic curry s... | Any |  |
| 2462 | 575c0ac0-5494-4c64-a886-a9c0cf8b779a | 3.50 | food service | other cuisine | Lentil Potato Carrot Dish | Hearty lentil dish with protein-rich legumes, ... | Any | True |
| 2463 | 7000f6e7-41f7-4957-878a-ccc42a39ca59 | 1.20 | hot dispensed | hot chocolate | Hot Chocolate - Creamy Chocolatey Warmth | Indulge in creamy, chocolatey warmth with our ... | Any |  |


In [19]:
items_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2465 entries, 0 to 2464
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ITEM_ID              2465 non-null   object 
 1   PRICE                2465 non-null   float64
 2   CATEGORY_L1          2465 non-null   object 
 3   CATEGORY_L2          2465 non-null   object 
 4   PRODUCT_NAME         2465 non-null   object 
 5   PRODUCT_DESCRIPTION  2465 non-null   object 
 6   GENDER               2465 non-null   object 
 7   PROMOTED             609 non-null    object 
dtypes: float64(1), object(7)
memory usage: 154.2+ KB


Let's explore the kinds of items included in the dataset.

In [20]:
items_df.CATEGORY_L1.unique()

array(['accessories', 'apparel', 'beauty', 'books', 'electronics',
       'floral', 'footwear', 'furniture', 'groceries', 'homedecor',
       'housewares', 'instruments', 'jewelry', 'outdoors', 'seasonal',
       'tools', 'food service', 'cold dispensed', 'salty snacks',
       'hot dispensed'], dtype=object)

In [21]:
items_df.CATEGORY_L2.unique()

array(['backpack', 'bag', 'belt', 'glasses', 'handbag', 'watch', 'jacket',
       'scarf', 'shirt', 'socks', 'bathing', 'grooming', 'cooking',
       'travel', 'cable', 'camera', 'computer', 'headphones', 'keyboard',
       'speaker', 'television', 'arrangement', 'bouquet', 'centerpiece',
       'plant', 'wreath', 'boot', 'formal', 'sandals', 'sneaker',
       'chairs', 'dressers', 'sofas', 'tables', 'bakery', 'dairy',
       'fruits', 'meat', 'seafood', 'vegetables', 'clock', 'cushion',
       'decorative', 'lighting', 'bowls', 'consumable', 'kitchen', 'keys',
       'microphone', 'percussion', 'strings', 'wind', 'bracelet',
       'earrings', 'necklace', 'camping', 'fishing', 'kayaking', 'pet',
       'christmas', 'easter', 'halloween', 'valentine', 'axe', 'drill',
       'hammer', 'knife', 'plier', 'saw', 'screwdriver', 'set', 'wrench',
       'bedroom', 'salon', 'pizza', 'nachos', 'fountain-carbonated',
       'nuts/seeds', 'sandwiches/wraps', 'soup and salad',
       'fountain-non

Set up the names of the items files.

In [22]:
items_file_path = './cleaned_item_training_data.csv'
items_file_name = 'cleaned_item_training_data.csv'

Write the items data to a CSV file.

In [23]:
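# index=False keeps the pandas row index out of the CSV, so the header matches the schema fields
items_df.to_csv(items_file_path, index=False)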

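Let's validate that your environment can communicate with Amazon Personalize. The call below lists your dataset groups. An empty list with an HTTP status code of 200 means the call succeeded and no dataset groups exist yet.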

In [24]:
personalize.list_dataset_groups()

{'datasetGroups': [],
 'ResponseMetadata': {'RequestId': '818d0172-1487-4fa6-a551-fd51c9a2a686',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Wed, 03 Jul 2024 23:46:00 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '20',
   'connection': 'keep-alive',
   'x-amzn-requestid': '818d0172-1487-4fa6-a551-fd51c9a2a686',
   'strict-transport-security': 'max-age=47304000; includeSubDomains',
   'x-frame-options': 'DENY',
   'cache-control': 'no-cache',
   'x-content-type-options': 'nosniff'},
  'RetryAttempts': 0}}

### Upload Interactions data to S3
Now that our training data is ready for Amazon Personalize, the next step is to upload it to the S3 bucket retrieved earlier.

In [25]:
boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_file_name).upload_file(interactions_file_path)
interactions_s3DataPath = "s3://"+bucket_name+"/"+interactions_file_name
print(interactions_s3DataPath)

s3://personalizesamples-sgarcesv-piqebvlkj/cleaned_interactions_training_data.csv


### Upload Items data to S3
Next, upload the items data to the same S3 bucket.

In [26]:
boto3.Session().resource('s3').Bucket(bucket_name).Object(items_file_name).upload_file(items_file_path)
items_s3DataPath = "s3://"+bucket_name+"/"+items_file_name
print(items_s3DataPath)

s3://personalizesamples-sgarcesv-piqebvlkj/cleaned_item_training_data.csv


## Create and Wait for Dataset Group
The largest grouping in Amazon Personalize is a dataset group. It isolates your data, event trackers, solutions, recommenders, and campaigns, grouping together the resources that share a common collection of data. Feel free to change the name below.

When you create a Domain dataset group, you choose your domain. The domain you specify determines the default schemas for datasets and the use cases that are available for recommenders. 

You can find more information about creating a Domain dataset group in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/create-domain-dataset-group.html).

### Create Dataset Group

In [27]:
dataset_group_name='personalize_ecommerce_dsg'

try:
    response = personalize.create_dataset_group(
        name=dataset_group_name,
        domain='ECOMMERCE'
    )
    
    dataset_group_arn = response['datasetGroupArn']
    print(json.dumps(response, indent=2))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    dataset_group_arn = 'arn:aws:personalize:' + region + ':' + account_id + ':dataset-group/' + dataset_group_name 
    print('\nThe Dataset Group with dataset_group_arn = {} already exists'.format(dataset_group_arn))
    print ('\nWe will be using the existing Dataset Group dataset_group_arn = {}'.format(dataset_group_arn))

{
  "datasetGroupArn": "arn:aws:personalize:us-east-1:058264209953:dataset-group/personalize_ecommerce_dsg",
  "domain": "ECOMMERCE",
  "ResponseMetadata": {
    "RequestId": "05413ccc-fa5d-413f-8c7b-52206f90d90b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:46:01 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "125",
      "connection": "keep-alive",
      "x-amzn-requestid": "05413ccc-fa5d-413f-8c7b-52206f90d90b",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
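The `except` branch above rebuilds the ARN by string concatenation, and later cells repeat the pattern for schemas and datasets. Below is a small sketch that centralizes the format seen in the ARNs returned above. `personalize_arn` and `parse_personalize_arn` are hypothetical helpers, and the default `aws` partition is an assumption that does not cover China or GovCloud Regions.

```python
def personalize_arn(region: str, account_id: str, resource_type: str, name: str,
                    partition: str = "aws") -> str:
    """Build an Amazon Personalize ARN, e.g. for 'dataset-group', 'schema', or 'dataset'."""
    return f"arn:{partition}:personalize:{region}:{account_id}:{resource_type}/{name}"


def parse_personalize_arn(arn: str) -> dict:
    """Split a Personalize ARN back into its region, account, resource type, and name."""
    prefix, _, name = arn.partition("/")  # name may itself contain "/" (e.g. datasets)
    parts = prefix.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "personalize":
        raise ValueError(f"Not a Personalize ARN: {arn!r}")
    return {"region": parts[3], "account_id": parts[4],
            "resource_type": parts[5], "name": name}
```

For example, `personalize_arn(region, account_id, "dataset-group", dataset_group_name)` produces the same string as the `except` branch. Datasets use the dataset group name plus `/INTERACTIONS` or `/ITEMS` as the name.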


### Wait for Dataset Group to Have ACTIVE Status

Before we can use the dataset group in the steps below, it must be ACTIVE. Run the cell below and wait for it to report `ACTIVE`.

In [28]:
%%time

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE
CPU times: user 25.4 ms, sys: 1.09 ms, total: 26.5 ms
Wall time: 1min
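The same polling pattern is repeated for each resource this notebook creates. Below is a minimal sketch of a reusable version. It takes any zero-argument function that returns a status string and stops at `ACTIVE` or `CREATE FAILED`, the terminal states checked in the cell above. The helper name `wait_for_status` and its parameters are hypothetical. The injectable `sleep` exists only so the function can be tested without waiting.

```python
import time
from typing import Callable

# Terminal states for resource creation, as checked in the cell above.
TERMINAL_STATES = ("ACTIVE", "CREATE FAILED")


def wait_for_status(get_status: Callable[[], str],
                    poll_seconds: float = 60,
                    timeout_seconds: float = 3 * 60 * 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status() until it returns a terminal state; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print("Status:", status)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

With the clients defined earlier, a call looks like `wait_for_status(lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"])`. The same works for datasets via `describe_dataset(...)["dataset"]["status"]`.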


## Create Interactions Schema
A core component of how Amazon Personalize understands your data comes from the Schema that is defined below. This configuration tells the service how to digest the data provided via your CSV file. Note the columns and types align to what was in the file you created above.

In [29]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        }
        
    ],
    "version": "1.0"
}

interactions_schema_name='personalize-ecommerce-interactions_sch'

try:
    create_schema_response = personalize.create_schema(
        name = interactions_schema_name,
        domain = 'ECOMMERCE',
        schema = json.dumps(interactions_schema)
    )

    interactions_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))

except personalize.exceptions.ResourceAlreadyExistsException:
    interactions_schema_arn = 'arn:aws:personalize:' + region + ':' + account_id + ':schema/' + interactions_schema_name 
    print('The schema {} already exists.'.format(interactions_schema_arn))
    print ('\nWe will be using the existing Interactions Schema with interactions_schema_arn = {}'.format(interactions_schema_arn))
 

{
  "schemaArn": "arn:aws:personalize:us-east-1:058264209953:schema/personalize-ecommerce-interactions_sch",
  "ResponseMetadata": {
    "RequestId": "f3f3527f-5e83-4575-bac0-be4dad1b701f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:01 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "104",
      "connection": "keep-alive",
      "x-amzn-requestid": "f3f3527f-5e83-4575-bac0-be4dad1b701f",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


## Create Items Schema
The items schema below tells Amazon Personalize how to parse the items CSV file. The category and gender fields are marked `"categorical": True`, and `PRODUCT_DESCRIPTION` is marked `"textual": True` so that it is treated as unstructured text.

In [30]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "PRICE",
            "type": "float"
        },
        {
            "name": "CATEGORY_L1",
            "type": ["string"],
            "categorical": True
        },
        {
            "name": "CATEGORY_L2",
            "type": ["string"],
            "categorical": True
            
        },
        {
            "name": "PRODUCT_NAME",
            "type": "string"
        },
        {
            "name": "PRODUCT_DESCRIPTION",
            "type": ["string"],
            "textual": True
        },
        {
            "name": "GENDER",
            "type": ["string"],
            "categorical": True
            
        }
    ],
    "version": "1.0"
}

items_schema_name = "personalize-ecommerce-items_sch"

try:
    create_schema_response = personalize.create_schema(
        name = items_schema_name,
        domain = "ECOMMERCE",
        schema = json.dumps(items_schema)
    )

    items_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))
    
except personalize.exceptions.ResourceAlreadyExistsException:
    items_schema_arn = 'arn:aws:personalize:' + region + ':' + account_id + ':schema/' + items_schema_name 
    print('The schema {} already exists.'.format(items_schema_arn))
    print ('\nWe will be using the existing Items Schema with items_schema_arn = {}'.format(items_schema_arn))
 

{
  "schemaArn": "arn:aws:personalize:us-east-1:058264209953:schema/personalize-ecommerce-items_sch",
  "ResponseMetadata": {
    "RequestId": "575bf80f-e5fb-4e19-863d-7badc05dc13e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:01 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "97",
      "connection": "keep-alive",
      "x-amzn-requestid": "575bf80f-e5fb-4e19-863d-7badc05dc13e",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
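Before importing, it can help to confirm that each CSV header matches the field names declared in its schema. In this notebook's data, `items_df.info()` lists a `PROMOTED` column that `items_schema` does not declare. A CSV written with the pandas index would also start with an empty header cell. Below is a minimal sketch that surfaces both kinds of mismatch. `compare_columns` and `csv_header` are hypothetical helpers. Resolving a mismatch, for example by dropping the column or adding it to the schema, is a choice for you to make before creating the import job.

```python
import csv
from typing import Dict, List, Sequence


def compare_columns(columns: Sequence[str], schema: dict) -> Dict[str, List[str]]:
    """Compare data column names with the field names declared in a Personalize schema."""
    fields = [field["name"] for field in schema["fields"]]
    return {
        "missing_from_data": [name for name in fields if name not in columns],
        "not_in_schema": [name for name in columns if name not in fields],
    }


def csv_header(path: str) -> List[str]:
    """Read only the header row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))
```

In the notebook, call `compare_columns(csv_header(items_file_path), items_schema)` and `compare_columns(csv_header(interactions_file_path), interactions_schema)`.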


## Create Datasets
After the dataset group and schemas, the next resources to create are the datasets that will hold the data you import into Amazon Personalize.

### Create Interactions Dataset

In [31]:
dataset_type = "INTERACTIONS"
interactions_dataset_name = "personalize_ecommerce_interactions_ds"

try:
    create_dataset_response = personalize.create_dataset(
        name = interactions_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = interactions_schema_arn
    )

    interactions_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))

except personalize.exceptions.ResourceAlreadyExistsException:
    interactions_dataset_arn = 'arn:aws:personalize:' + region + ':' + account_id + ':dataset/' + dataset_group_name + '/INTERACTIONS'
    print('The Interactions Dataset {} already exists.'.format(interactions_dataset_arn))
    print ('\nWe will be using the existing Interactions Dataset with interactions_dataset_arn = {}'.format(interactions_dataset_arn))


{
  "datasetArn": "arn:aws:personalize:us-east-1:058264209953:dataset/personalize_ecommerce_dsg/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "2422d97e-6dca-4600-b73f-bc0a2eeb2aaa",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:01 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "106",
      "connection": "keep-alive",
      "x-amzn-requestid": "2422d97e-6dca-4600-b73f-bc0a2eeb2aaa",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


### Create Items Dataset

In [32]:
dataset_type = "ITEMS"
items_dataset_name = "personalize_ecommerce_items_ds"

try:
    create_dataset_response = personalize.create_dataset(
        name = items_dataset_name,
        datasetType = dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = items_schema_arn
    )

    items_dataset_arn = create_dataset_response['datasetArn']
    print(json.dumps(create_dataset_response, indent=2))

except personalize.exceptions.ResourceAlreadyExistsException:    
    items_dataset_arn = 'arn:aws:personalize:' + region + ':' + account_id + ':dataset/' + dataset_group_name + '/ITEMS'
    print('The Items Dataset {} already exists.'.format(items_dataset_arn))
    print ('\nWe will be using the existing Items Dataset with items_dataset_arn = {}'.format(items_dataset_arn))   

{
  "datasetArn": "arn:aws:personalize:us-east-1:058264209953:dataset/personalize_ecommerce_dsg/ITEMS",
  "ResponseMetadata": {
    "RequestId": "4c0ca56a-694a-48b3-bea2-acbf7801af9b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:01 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "99",
      "connection": "keep-alive",
      "x-amzn-requestid": "4c0ca56a-694a-48b3-bea2-acbf7801af9b",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [33]:
%%time

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = interactions_dataset_arn
    )
    status_interaction_dataset =  describe_dataset_response["dataset"]['status']
    print("Interactions Dataset: {}".format(status_interaction_dataset))
    
    if status_interaction_dataset == "ACTIVE":
        print("Build succeeded for {}".format(interactions_dataset_arn))
        
    elif status_interaction_dataset == "CREATE FAILED":
        print("Build failed for {}".format(interactions_dataset_arn))
        break
        
    if not status_interaction_dataset == "ACTIVE":
        print("The interaction dataset creation is still in progress")
    else:
        print("The interaction dataset  is ACTIVE")
        

    describe_dataset_response = personalize.describe_dataset(
        datasetArn = items_dataset_arn
    )
    status_item_dataset =  describe_dataset_response["dataset"]['status']
    print("Items Dataset: {}".format(status_item_dataset))
    
    if status_item_dataset == "ACTIVE":
        print("Build succeeded for {}".format(items_dataset_arn))
        
    elif status_item_dataset == "CREATE FAILED":
        print("Build failed for {}".format(items_dataset_arn))
        break
        
    if not status_item_dataset == "ACTIVE":
        print("The item dataset creation is still in progress")
    else:
        print("The item dataset  is ACTIVE")
        
    if status_interaction_dataset == "ACTIVE" and status_item_dataset == "ACTIVE":
        break
    time.sleep(30)

Interactions Dataset: CREATE PENDING
The interaction dataset creation is still in progress
Items Dataset: CREATE PENDING
The item dataset creation is still in progress
Interactions Dataset: ACTIVE
Build succeeded for arn:aws:personalize:us-east-1:058264209953:dataset/personalize_ecommerce_dsg/INTERACTIONS
The interaction dataset  is ACTIVE
Items Dataset: ACTIVE
Build succeeded for arn:aws:personalize:us-east-1:058264209953:dataset/personalize_ecommerce_dsg/ITEMS
The item dataset  is ACTIVE
CPU times: user 12.4 ms, sys: 8.17 ms, total: 20.6 ms
Wall time: 30.2 s
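The notebook repeats the same poll-until-ACTIVE loop for the datasets, import jobs, recommender, solution version, campaign, event tracker, and filter. If you want to factor that pattern out, a hypothetical `wait_for_status` helper is sketched below. It takes a zero-argument describe function and a function that extracts the status from its response. It is not part of boto3. The `sleep` parameter exists only so the loop can be exercised without calling AWS.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which polling stops. "CREATE FAILED" is the failure status the
# notebook's own loops check for; widen this set if you also poll update operations.
TERMINAL_STATUSES = {"ACTIVE", "CREATE FAILED"}


def wait_for_status(
    describe: Callable[[], Dict[str, Any]],
    status_of: Callable[[Dict[str, Any]], str],
    label: str = "resource",
    timeout_s: float = 3 * 60 * 60,
    poll_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe() until status_of(response) is terminal, then return that status.

    Raises TimeoutError if the next poll would start after the deadline.
    """
    deadline = time.time() + timeout_s
    while True:
        status = status_of(describe())
        print(f"{label}: {status}")
        if status in TERMINAL_STATUSES:
            return status
        if time.time() + poll_s > deadline:
            raise TimeoutError(f"{label} still {status} after {timeout_s} s")
        sleep(poll_s)
```

For example, `wait_for_status(lambda: personalize.describe_dataset(datasetArn=items_dataset_arn), lambda r: r["dataset"]["status"], "Items Dataset")` would cover the Items half of the loop above. Campaign updates report progress under `latestCampaignUpdate`, so that case needs a `status_of` function like the logic in the campaign wait cell.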


## Import the data
Earlier you created the dataset group and datasets to hold your data. Now you will run import jobs that load the data from S3 into Amazon Personalize, where it will be used to build your models.

### Create Interactions Dataset Import Job

In [34]:
# Check if the import job already exists
interactions_import_job_name = 'personalize_ecommerce_interactions_import'

# List the import jobs
interactions_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=interactions_dataset_arn,
    maxResults=100
)['datasetImportJobs']

#check if there is an existing job with the prefix
job_exists = False  
job_arn = None

for job in interactions_dataset_import_jobs:
    if (interactions_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']
    
if (job_exists):
    interactions_dataset_import_job_arn = job_arn
    print('The Interactions Import Job {} already exists.'.format(interactions_dataset_import_job_arn))
    print ('\nWe will be using the existing Interactions Import Job with interactions_dataset_import_job_arn = {}'.format(interactions_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it:   
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = interactions_import_job_name,
        datasetArn = interactions_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, interactions_file_name)
        },
        roleArn = role_arn
    )
    interactions_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    
    print ('\nImporting the Interactions Data with workshop_interactions_dataset_import_job_arn = {}'.format(interactions_dataset_import_job_arn))

{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:058264209953:dataset-import-job/personalize_ecommerce_interactions_import",
  "ResponseMetadata": {
    "RequestId": "a818b699-b8c5-456f-a166-bf6f3d084c6a",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:32 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "129",
      "connection": "keep-alive",
      "x-amzn-requestid": "a818b699-b8c5-456f-a166-bf6f3d084c6a",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}

Importing the Interactions Data with workshop_interactions_dataset_import_job_arn = arn:aws:personalize:us-east-1:058264209953:dataset-import-job/personalize_ecommerce_interactions_import
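The existence check in the cell above uses a substring test (`interactions_import_job_name in job['jobName']`) and keeps the last match, so a job named, say, `personalize_ecommerce_interactions_import_v2` would also be picked up. A stricter alternative, sketched here as a hypothetical `find_import_job_arn` helper, matches the job name exactly against the `datasetImportJobs` list returned by `list_dataset_import_jobs`.

```python
from typing import Iterable, Mapping, Optional


def find_import_job_arn(jobs: Iterable[Mapping], job_name: str) -> Optional[str]:
    """Return the ARN of the import job whose jobName equals job_name, or None.

    `jobs` is the 'datasetImportJobs' list from list_dataset_import_jobs.
    An exact comparison avoids matching jobs whose names merely contain job_name.
    """
    for job in jobs:
        if job.get("jobName") == job_name:
            return job["datasetImportJobArn"]
    return None
```

Usage would look like `find_import_job_arn(interactions_dataset_import_jobs, interactions_import_job_name)`, creating a new job only when it returns `None`.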


### Create Items Dataset Import Job

In [35]:
# Checking if the import job already exists
items_import_job_name = 'personalize_ecommerce_items_import'

# List the import jobs
items_dataset_import_jobs = personalize.list_dataset_import_jobs(
    datasetArn=items_dataset_arn,
    maxResults=100
)['datasetImportJobs']

job_exists = False
job_arn = None

#check if there is an existing job with the prefix
for job in items_dataset_import_jobs:
    if (items_import_job_name in job['jobName']):
        job_exists = True
        job_arn = job['datasetImportJobArn']
    
if (job_exists):
    items_dataset_import_job_arn =  job_arn
    print('The Items Import Job {} already exists.'.format(items_dataset_import_job_arn))
    print ('\nWe will be using the existing Items Import Job with items_dataset_import_job_arn = {}'.format(items_dataset_import_job_arn))
        
else:
    # If there is no import job with the prefix, create it:    
    create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = items_import_job_name,
        datasetArn = items_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket_name, items_file_name)
        },
        roleArn = role_arn
    )

    items_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(create_dataset_import_job_response, indent=2))
    print ('\nImporting the Items Data with items_dataset_import_job_arn = {}'.format(items_dataset_import_job_arn))
    



{
  "datasetImportJobArn": "arn:aws:personalize:us-east-1:058264209953:dataset-import-job/personalize_ecommerce_items_import",
  "ResponseMetadata": {
    "RequestId": "eb45769d-590a-46e6-a15f-e3d4f80b5322",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Wed, 03 Jul 2024 23:47:32 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "122",
      "connection": "keep-alive",
      "x-amzn-requestid": "eb45769d-590a-46e6-a15f-e3d4f80b5322",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}

Importing the Items Data with items_dataset_import_job_arn = arn:aws:personalize:us-east-1:058264209953:dataset-import-job/personalize_ecommerce_items_import


### Wait for Dataset Import Jobs to Have ACTIVE Status
The import jobs can take a while to complete. Wait until the cell below reports that both of them are ACTIVE.

In [39]:
max_time = time.time() + 3*60*60 # 3 hours

while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = items_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("ItemsDatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = interactions_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("InteractionsDatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

ItemsDatasetImportJob: ACTIVE
InteractionsDatasetImportJob: ACTIVE


## Choose a recommender use case

Each domain has different use cases. When you create a recommender you create it for a specific use case, and each use case has different requirements for getting recommendations.


In [40]:
available_recipes = personalize.list_recipes(domain='ECOMMERCE') # List the recipes (use cases) available for the ECOMMERCE domain.
display(available_recipes['recipes'])

[{'name': 'aws-ecomm-customers-who-viewed-x-also-viewed',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-customers-who-viewed-x-also-viewed',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.datetime(2024, 6, 19, 16, 47, 19, 191000, tzinfo=tzlocal()),
  'domain': 'ECOMMERCE'},
 {'name': 'aws-ecomm-frequently-bought-together',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-frequently-bought-together',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.datetime(2024, 6, 19, 16, 47, 19, 191000, tzinfo=tzlocal()),
  'domain': 'ECOMMERCE'},
 {'name': 'aws-ecomm-popular-items-by-purchases',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-ecomm-popular-items-by-purchases',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.dateti

We are going to create a recommender of the type "[Recommended For you](https://docs.aws.amazon.com/personalize/latest/dg/ECOMMERCE-use-cases.html#recommended-for-you-use-case)". This type of recommender offers personalized item recommendations for a user that you specify. With this use case, Amazon Personalize automatically filters out items that the user has already purchased, based on the userId that you specify and `Purchase` events.

[More use cases per domain](https://docs.aws.amazon.com/personalize/latest/dg/domain-use-cases.html)

In [41]:
try:
    create_recommender_response = personalize.create_recommender(
      name = 'recommended_for_you_demo',
      recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you',
      datasetGroupArn = dataset_group_arn
    )
    recommended_for_you_arn = create_recommender_response['recommenderArn']
    print(json.dumps(create_recommender_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this recommender, seemingly')
    paginator = personalize.get_paginator('list_recommenders')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for recommender in paginate_result['recommenders']:
            if recommender['name'] == 'recommended_for_you_demo':
                recommended_for_you_arn = recommender['recommenderArn']
                break
                
print(f'Recommended For you recommender ARN = {recommended_for_you_arn}')

{
  "recommenderArn": "arn:aws:personalize:us-east-1:058264209953:recommender/recommended_for_you_demo",
  "ResponseMetadata": {
    "RequestId": "6d2de771-7f61-425c-bd8d-c567084e4f38",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 04 Jul 2024 00:11:05 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "100",
      "connection": "keep-alive",
      "x-amzn-requestid": "6d2de771-7f61-425c-bd8d-c567084e4f38",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
Recommended For you recommender ARN = arn:aws:personalize:us-east-1:058264209953:recommender/recommended_for_you_demo
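In the `except` branch above, `break` only leaves the inner loop, so pagination continues after a match, and if no recommender matches, `recommended_for_you_arn` is never assigned. The same lookup is repeated for the solution, campaign, and event tracker. A minimal sketch of a hypothetical `find_arn_by_name` helper that returns from both loops and makes the "not found" case explicit:

```python
from typing import Iterable, Mapping, Optional


def find_arn_by_name(pages: Iterable[Mapping], list_key: str, arn_key: str, name: str) -> Optional[str]:
    """Scan paginated list_* responses and return the ARN of the resource named `name`.

    Returning from inside both loops stops pagination at the first match;
    returning None makes the not-found case explicit instead of leaving the
    ARN variable undefined.
    """
    for page in pages:
        for resource in page.get(list_key, []):
            if resource.get("name") == name:
                return resource[arn_key]
    return None
```

For the recommender, the call would be `find_arn_by_name(personalize.get_paginator('list_recommenders').paginate(datasetGroupArn=dataset_group_arn), 'recommenders', 'recommenderArn', 'recommended_for_you_demo')`, using the same paginator and keys as the cell above.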


## Create Custom Solutions

### List Recipes

First, let's list all available recipes that aren't associated with a domain.

In [42]:
response = personalize.list_recipes()
custom_recipes = []
for recipe in response['recipes']:
    if not recipe.get('domain'):
        custom_recipes.append(recipe)
        
print(json.dumps(custom_recipes, indent=2, default=str))

[
  {
    "name": "aws-item-affinity",
    "recipeArn": "arn:aws:personalize:::recipe/aws-item-affinity",
    "status": "ACTIVE",
    "creationDateTime": "2021-07-15 00:00:00+00:00",
    "lastUpdatedDateTime": "2024-06-19 16:47:19.191000+00:00"
  },
  {
    "name": "aws-item-attribute-affinity",
    "recipeArn": "arn:aws:personalize:::recipe/aws-item-attribute-affinity",
    "status": "ACTIVE",
    "creationDateTime": "2021-08-25 00:00:00+00:00",
    "lastUpdatedDateTime": "2024-06-19 16:47:19.191000+00:00"
  },
  {
    "name": "aws-next-best-action",
    "recipeArn": "arn:aws:personalize:::recipe/aws-next-best-action",
    "status": "ACTIVE",
    "creationDateTime": "2023-08-11 00:00:00+00:00",
    "lastUpdatedDateTime": "2024-06-19 16:47:19.191000+00:00"
  },
  {
    "name": "aws-personalized-ranking",
    "recipeArn": "arn:aws:personalize:::recipe/aws-personalized-ranking",
    "status": "ACTIVE",
    "creationDateTime": "2019-06-10 00:00:00+00:00",
    "lastUpdatedDateTime": "2024-

As you can see above, there are several recipes to choose from. Let's declare the recipe for the Similar Items custom solution.
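The `list_recipes()` call above reads a single page of results. If the recipe list ever spans more than one page, a variant that walks every page is sketched below. It assumes the boto3 `list_recipes` paginator, and the helper name `custom_recipe_names` is hypothetical. It collects the names of recipes with no `domain`, the same criterion the cell uses.

```python
from typing import Iterable, List, Mapping


def custom_recipe_names(pages: Iterable[Mapping]) -> List[str]:
    """Collect, sorted, the names of recipes without a 'domain' across all pages."""
    return sorted(
        recipe["name"]
        for page in pages
        for recipe in page.get("recipes", [])
        if not recipe.get("domain")
    )
```

Usage: `custom_recipe_names(personalize.get_paginator('list_recipes').paginate())`.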

### Declare Personalize Recipe for Similar Items

In use cases where we have an item or product and want to display similar items, based both on the co-interactions of all users and on thematic similarities in item metadata, we can use the Similar-Items recipe to provide related-item recommendations.

The [Similar-Items](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-similar-items.html) recipe (aws-similar-items) generates recommendations for items that are similar to an item you specify. Use Similar-Items to help customers discover new items in your catalog based on their previous behavior and item metadata. Recommending similar items can increase user engagement, click-through rate, and conversion rate for your application.

Similar-Items calculates similarity based on interactions data and any item metadata you provide. It takes into account the co-occurrence of the item in user histories in your Interactions dataset, and any item metadata similarities. For example, with Similar-Items, Amazon Personalize could recommend items that customers frequently bought together and that share a similar style (categorical metadata), or movies that different users also watched and that have a similar description (unstructured text metadata).

Note that Amazon Personalize also has the SIMS recipe for the related items use case. However, SIMS only trains on co-interaction data (i.e. the interactions dataset) and does not consider item metadata. Since we may have some items with fewer (or no) interactions, the Similar-Items recipe is a better match for our use case.
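To check how much the "fewer (or no) interactions" argument applies to your own catalog, you can measure the share of items that never appear in the interactions data. This is a sketch that assumes the interactions CSV has been loaded into a DataFrame (called `interactions_df` here, a hypothetical name) with an `ITEM_ID` column, as Personalize interaction schemas require.

```python
import pandas as pd


def cold_item_share(items: pd.DataFrame, interactions: pd.DataFrame) -> float:
    """Fraction of catalog ITEM_IDs that never occur in the interactions data."""
    catalog = set(items["ITEM_ID"].astype(str))
    if not catalog:
        return 0.0
    seen = set(interactions["ITEM_ID"].astype(str))
    return len(catalog - seen) / len(catalog)
```

For instance, `cold_item_share(items_df, interactions_df)` returning a large fraction would favor Similar-Items over SIMS, since SIMS has no interaction signal for those items.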

In [43]:
similar_items_recipe_arn = "arn:aws:personalize:::recipe/aws-similar-items"

### Create Custom Solution and Solution Version

With our recipe defined, we can now create our solution and solution version.

### Create Similar Items Solution

In [44]:
similar_items_solution_version_arn = None

try:
    create_solution_response = personalize.create_solution(
        name = "related-items-demo",
        datasetGroupArn = dataset_group_arn,
        recipeArn = similar_items_recipe_arn
    )

    similar_items_solution_arn = create_solution_response['solutionArn']
    print(json.dumps(create_solution_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this solution, seemingly')
    paginator = personalize.get_paginator('list_solutions')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for solution in paginate_result['solutions']:
            if solution['name'] == 'related-items-demo':
                similar_items_solution_arn = solution['solutionArn']
                print(f'Similar Items solution ARN = {similar_items_solution_arn}')
                
                response = personalize.list_solution_versions(
                    solutionArn = similar_items_solution_arn,
                    maxResults = 100
                )
                if len(response['solutionVersions']) > 0:
                    similar_items_solution_version_arn = response['solutionVersions'][-1]['solutionVersionArn']
                    print(f'Will use most recent solution version for this solution: {similar_items_solution_version_arn}')
                    
                break

{
  "solutionArn": "arn:aws:personalize:us-east-1:058264209953:solution/related-items-demo",
  "ResponseMetadata": {
    "RequestId": "ead4ac9b-2898-4bd4-859b-af6a07d12bb6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 04 Jul 2024 00:11:07 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "88",
      "connection": "keep-alive",
      "x-amzn-requestid": "ead4ac9b-2898-4bd4-859b-af6a07d12bb6",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
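When the solution already exists, the cell above takes `response['solutionVersions'][-1]` as the most recent version, which relies on the order in which `list_solution_versions` returns its results. A sketch that picks the most recent version explicitly by `creationDateTime`, optionally ignoring versions that are not ACTIVE, follows. The helper name is hypothetical.

```python
from datetime import datetime
from typing import Iterable, Mapping, Optional


def latest_solution_version_arn(versions: Iterable[Mapping], require_active: bool = False) -> Optional[str]:
    """Pick the most recently created solution version by creationDateTime.

    `versions` is the 'solutionVersions' list from list_solution_versions.
    With require_active=True, versions whose status is not ACTIVE are skipped.
    """
    candidates = [v for v in versions if not require_active or v.get("status") == "ACTIVE"]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["creationDateTime"])["solutionVersionArn"]


if __name__ == "__main__":
    demo = [
        {"solutionVersionArn": "arn:old", "status": "ACTIVE", "creationDateTime": datetime(2024, 7, 1)},
        {"solutionVersionArn": "arn:new", "status": "CREATE IN_PROGRESS", "creationDateTime": datetime(2024, 7, 4)},
    ]
    print(latest_solution_version_arn(demo))                       # arn:new
    print(latest_solution_version_arn(demo, require_active=True))  # arn:old
```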


### Create Similar Items Solution Version

Next we can create a solution version for the solution. This is where the model is trained for this custom solution.

In [45]:
if not similar_items_solution_version_arn:
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = similar_items_solution_arn
    )

    similar_items_solution_version_arn = create_solution_version_response['solutionVersionArn']
    print(json.dumps(create_solution_version_response, indent=2))
else:
    print(f'Solution version {similar_items_solution_version_arn} already exists; not creating')

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:058264209953:solution/related-items-demo/d12f3c16",
  "ResponseMetadata": {
    "RequestId": "a5e4a15d-13d0-4a57-a6de-6cefff7d0bab",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 04 Jul 2024 00:11:07 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "104",
      "connection": "keep-alive",
      "x-amzn-requestid": "a5e4a15d-13d0-4a57-a6de-6cefff7d0bab",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


### Wait for the recommender to become active

In [None]:
%%time

max_time = time.time() + 10*60*60 # 10 hours
    
while time.time() < max_time:

    version_response = personalize.describe_recommender(
        recommenderArn = recommended_for_you_arn
    )
    status = version_response["recommender"]["status"]
    print(status)

    if status == "ACTIVE":
        print("Build succeeded for {}".format(recommended_for_you_arn))
        
    elif status == "CREATE FAILED":
        print("Build failed for {}".format(recommended_for_you_arn))
        break

    if status == "ACTIVE" or status == "CREATE FAILED":
        break
    else:
        print('The "Recommended for you" Recommender build is still in progress')
        
    time.sleep(60)

CREATE IN_PROGRESS
The "Recommended for you" Recommender build is still in progress
CREATE IN_PROGRESS
The "Recommended for you" Recommender build is still in progress


### Wait for the custom solution version to become active

The following cell waits for the solution version for the Similar Items use case to become active. It is likely already active (or close to it), since it was being created in parallel with the recommender. Nevertheless, we'll make sure it is active before proceeding.

In [None]:
%%time

soln_ver_arns = [ 
    similar_items_solution_version_arn
]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for soln_ver_arn in reversed(soln_ver_arns):
        soln_ver_response = personalize.describe_solution_version(
            solutionVersionArn = soln_ver_arn
        )
        status = soln_ver_response["solutionVersion"]["status"]

        if status == "ACTIVE":
            print(f'Solution version {soln_ver_arn} successfully completed')
            soln_ver_arns.remove(soln_ver_arn)
        elif status == "CREATE FAILED":
            print(f'Solution version {soln_ver_arn} failed')
            if soln_ver_response["solutionVersion"].get('failureReason'):
                print('   Reason: ' + soln_ver_response["solutionVersion"]['failureReason'])
            soln_ver_arns.remove(soln_ver_arn)

    if len(soln_ver_arns) > 0:
        print('At least one solution version is still in progress')
        time.sleep(60)
    else:
        print("All solution versions have completed")
        break

### Create campaigns for the Similar Items solution

Once we're satisfied with our solution version, we need to create a campaign for the custom solution version created with the Similar-Items recipe. This is required so that we have a real-time API endpoint that the agent can call. When creating a campaign, you can specify the minimum transactions per second (minProvisionedTPS) that you expect to make against the service for this campaign. Amazon Personalize automatically scales the inference endpoint's resources up and down to match demand, but never below minProvisionedTPS.

Let's create the campaign for the similar items version set at minProvisionedTPS of 1 (which is also the default if not specified).
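The scaling rule described above can be illustrated with a simplified model in which provisioned capacity in each period equals the larger of observed demand and minProvisionedTPS. This is only an illustration of the floor behavior stated in the text; it is not Amazon Personalize's actual autoscaling algorithm or a billing formula, and the function name is hypothetical.

```python
from typing import List, Sequence


def provisioned_capacity(demand_tps: Sequence[float], min_provisioned_tps: int = 1) -> List[float]:
    """Simplified model: capacity tracks demand but never drops below the floor."""
    if min_provisioned_tps < 1:
        # CreateCampaign does not accept a minProvisionedTPS below 1.
        raise ValueError("minProvisionedTPS must be at least 1")
    return [max(d, min_provisioned_tps) for d in demand_tps]


if __name__ == "__main__":
    hourly_demand = [0, 0.4, 3, 12, 0.2]
    print(provisioned_capacity(hourly_demand))     # [1, 1, 3, 12, 1]
    print(provisioned_capacity(hourly_demand, 5))  # [5, 5, 5, 12, 5]
```

Under this model, raising minProvisionedTPS only changes the quiet periods, which is why a low floor such as 1 suits a demo with bursty, light traffic.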


#### Create Similar Items campaign

In [None]:
try:
    create_campaign_response = personalize.create_campaign(
        name = "ecommerce-related-items",
        solutionVersionArn = similar_items_solution_version_arn,
        minProvisionedTPS = 1
    )

    similar_items_campaign_arn = create_campaign_response['campaignArn']
    print(json.dumps(create_campaign_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this campaign, seemingly. Will update campaign instead.')
    paginator = personalize.get_paginator('list_campaigns')
    for paginate_result in paginator.paginate(solutionArn = similar_items_solution_arn):
        for campaign in paginate_result['campaigns']:
            if campaign['name'] == 'ecommerce-related-items':
                similar_items_campaign_arn = campaign['campaignArn']
                print(f'Found existing campaign for solution: {similar_items_campaign_arn}')
                
                response = personalize.describe_campaign(campaignArn = similar_items_campaign_arn)
                if response['campaign']['solutionVersionArn'] == similar_items_solution_version_arn:
                    print('Campaign is already using the latest solution version')
                else:
                    print('Updating campaign with the latest solution version')
                    response = personalize.update_campaign(
                        campaignArn = similar_items_campaign_arn,
                        solutionVersionArn = similar_items_solution_version_arn,
                        minProvisionedTPS = 1
                    )
                    print(json.dumps(response, indent=2))
                break

#### Wait for campaigns to Have ACTIVE Status

It can take 15-20 minutes for a campaign to be fully created.

While you are waiting for this to complete you can learn more about campaigns here: https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html

In [None]:
%%time

campaign_arns = [ similar_items_campaign_arn ]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for campaign_arn in reversed(campaign_arns):
        campaign_response = personalize.describe_campaign(
            campaignArn = campaign_arn
        )
        status = campaign_response["campaign"]["status"]
        if status == 'ACTIVE' and campaign_response.get('latestCampaignUpdate'):
            status = campaign_response['latestCampaignUpdate']['status']

        if status == "ACTIVE":
            print(f'Campaign {campaign_arn} successfully completed')
            campaign_arns.remove(campaign_arn)
        elif status == "CREATE FAILED":
            print(f'Campaign {campaign_arn} failed')
            if campaign_response["campaign"].get('failureReason'):
                print('   Reason: ' + campaign_response["campaign"]['failureReason'])
            campaign_arns.remove(campaign_arn)

    if len(campaign_arns) > 0:
        print('At least one campaign is still in progress')
        time.sleep(60)
    else:
        print("All campaigns have completed")
        break

## Getting recommendations with a recommender
Now that the recommender has been trained, let's have a look at the recommendations we can get for our users!

In [None]:
# reading the original data in order to have a dataframe that has both item_ids 
# and product_name to make our recommendations easier to read.
items_df = pd.read_csv('./items.csv')
items_df.sample(10)

In [None]:
def get_item_by_id(item_id, item_df):
    """
    Look up the product description for an item_id returned in a recommendation.

    Personalize returns item IDs as strings, so the ITEM_ID column of item_df
    is cast to str before comparing. If the item is not found, the item_id is
    printed and a placeholder message is returned.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    matches = item_df.loc[item_df["ITEM_ID"].astype(str) == str(item_id), "PRODUCT_DESCRIPTION"]
    if matches.empty:
        print(item_id)
        return "Error obtaining item description"
    return matches.values[0]

In [None]:
def get_category_by_id(item_id, item_df):
    """
    Look up the top-level category (CATEGORY_L1) for an item_id returned in
    a recommendation.

    Personalize returns item IDs as strings, so the ITEM_ID column of item_df
    is cast to str before comparing. If the item is not found, the item_id is
    printed and a placeholder message is returned.
    """
    matches = item_df.loc[item_df["ITEM_ID"].astype(str) == str(item_id), "CATEGORY_L1"]
    if matches.empty:
        print(item_id)
        return "Error obtaining item category"
    return matches.values[0]
    

Let's get some recommendations from the "Recommended for you" recommender:

In [None]:
# First pick a user
test_user_id = "1234" 

# Get recommendations for the user
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn = recommended_for_you_arn,
    userId = test_user_id,
    numResults = 3
)

# Build a new dataframe for the recommendations
item_list = get_recommendations_response['itemList']
recommendation_id_list = []
recommendation_description_list = []
recommendation_category_list = []

for item in item_list:
    description = get_item_by_id(item['itemId'], items_df)
    recommendation_description_list.append(description)
    recommendation_id_list.append(item['itemId'])
    recommendation_category_list.append(get_category_by_id(item['itemId'], items_df))

user_recommendations_df = pd.DataFrame(recommendation_id_list, columns = ["ID"])
user_recommendations_df["description"] = recommendation_description_list
user_recommendations_df["category level 1"] = recommendation_category_list

pd.options.display.max_rows =20
display(user_recommendations_df)
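The cell above looks up each recommended item one at a time. An alternative is a single pandas left join of the returned `itemList` onto the catalog, which keeps the recommendation rank order and carries along any extra fields such as `score` when the response includes them. This is a sketch, with a hypothetical function name, that assumes the `ITEM_ID`, `PRODUCT_DESCRIPTION`, and `CATEGORY_L1` columns used by the lookup functions above.

```python
from typing import List, Mapping

import pandas as pd


def recommendations_to_frame(item_list: List[Mapping], items: pd.DataFrame) -> pd.DataFrame:
    """Join a get_recommendations itemList onto the catalog, keeping the ranking order."""
    columns = ["ITEM_ID", "PRODUCT_DESCRIPTION", "CATEGORY_L1"]
    if not item_list:
        return pd.DataFrame(columns=columns)
    recs = pd.DataFrame(item_list)  # itemId, plus fields like score when returned
    recs["ITEM_ID"] = recs["itemId"].astype(str)
    catalog = items[columns].copy()
    catalog["ITEM_ID"] = catalog["ITEM_ID"].astype(str)
    # how="left" preserves the left frame's row order, i.e. the recommendation rank.
    return recs.drop(columns="itemId").merge(catalog, on="ITEM_ID", how="left")
```

Usage: `display(recommendations_to_frame(get_recommendations_response['itemList'], items_df))`. Items missing from the catalog show up with `NaN` descriptions instead of a placeholder string.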

## Getting similar items

The Similar-Items recipe is designed to balance co-interactions across all users and thematic similarity between items to make relevant related items recommendations. Since the input for related items recommendations is an item ID, let's select a product from the catalog to use as our source item.

In [None]:
product = items_df.sample(1)
product

Now let's get some related-item recommendations from the Similar Items campaign for the above product. Notice that we're using the same GetRecommendations API as with the recommender above, but this time we're specifying a campaignArn rather than a recommenderArn.

In [None]:
product_id = product.iloc[0]['ITEM_ID']

get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = similar_items_campaign_arn,
    itemId = str(product_id),
    numResults = 3
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))

We will analyze the returned similar items later, while testing the agent.
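Since the same GetRecommendations call serves both recommenders and campaigns, and each request targets one or the other, a small wrapper can make that choice explicit. This is a sketch; `get_item_list` is a hypothetical name, and `runtime` is assumed to be the boto3 `personalize-runtime` client used above (`personalize_runtime`).

```python
from typing import Any, Dict, List, Optional


def get_item_list(runtime: Any, *, recommender_arn: Optional[str] = None,
                  campaign_arn: Optional[str] = None, **params: Any) -> List[Dict[str, Any]]:
    """Call get_recommendations against exactly one of a recommender or a campaign.

    Extra keyword arguments (userId, itemId, numResults, filterArn, ...) are
    passed through to runtime.get_recommendations unchanged.
    """
    if (recommender_arn is None) == (campaign_arn is None):
        raise ValueError("pass exactly one of recommender_arn or campaign_arn")
    if recommender_arn is not None:
        params["recommenderArn"] = recommender_arn
    else:
        params["campaignArn"] = campaign_arn
    return runtime.get_recommendations(**params)["itemList"]
```

For example, `get_item_list(personalize_runtime, campaign_arn=similar_items_campaign_arn, itemId=str(product_id), numResults=3)` mirrors the cell above.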

## Event Tracking - Keeping up with evolving user intent

Up to this point, we have trained and deployed an Amazon Personalize recommender and a campaign based on historical data that we generated in this workshop. This allows us to make related-product and user recommendations based on the already observed behavior of our users. However, user intent often changes in real time, so the products a user is interested in now may differ from what they were interested in a week ago, a day ago, or even a few minutes ago. Making recommendations that keep up with evolving user intent is one of the more difficult challenges in personalization. Fortunately, Amazon Personalize has a mechanism for this exact issue.

Amazon Personalize supports sending real-time user event data (for example, clickstream data) into the service. Amazon Personalize uses this event data to adjust recommendations. It also stores these events and automatically includes them when recommenders and solutions in the same dataset group are retrained.

In Notebook 2, we will use the event tracker to send purchase events that enhance the user's experience.
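As a preview of how such an event might be sent, here is a sketch of a hypothetical `send_purchase_event` helper. It assumes a boto3 `personalize-events` client and the tracking ID returned by the event tracker created below (`event_tracking_id`). The event type `Purchase` matches the `Purchase` events the "Recommended for you" use case relies on. Notebook 2 may structure this differently.

```python
import datetime as dt
from typing import Any, Dict, Optional


def send_purchase_event(events_client: Any, tracking_id: str, user_id: str, session_id: str,
                        item_id: str, sent_at: Optional[dt.datetime] = None) -> Dict[str, Any]:
    """Record one 'Purchase' interaction through an event tracker and return the event sent."""
    event = {
        "eventType": "Purchase",
        "itemId": str(item_id),
        # Default to the current UTC time when no timestamp is supplied.
        "sentAt": sent_at or dt.datetime.now(dt.timezone.utc),
    }
    events_client.put_events(
        trackingId=tracking_id,
        userId=str(user_id),
        sessionId=session_id,
        eventList=[event],
    )
    return event
```

Usage, once the tracker is ACTIVE: `send_purchase_event(boto3.client('personalize-events'), event_tracking_id, test_user_id, 'session-1', product_id)`, where `'session-1'` is a placeholder session ID.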

#### Create Personalize Event Tracker

Let's start by creating an event tracker for our dataset group.

In [None]:
try:
    event_tracker_response = personalize.create_event_tracker(
        datasetGroupArn=dataset_group_arn,
        name='ecommerce-event-tracker'
    )

    event_tracker_arn = event_tracker_response['eventTrackerArn']
    event_tracking_id = event_tracker_response['trackingId']
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created an event tracker for this dataset group, seemingly')
    paginator = personalize.get_paginator('list_event_trackers')
    for paginate_result in paginator.paginate(datasetGroupArn = dataset_group_arn):
        for event_tracker in paginate_result['eventTrackers']:
            if event_tracker['name'] == 'ecommerce-event-tracker':
                event_tracker_arn = event_tracker['eventTrackerArn']
                
                response = personalize.describe_event_tracker(eventTrackerArn = event_tracker_arn)
                event_tracking_id = response['eventTracker']['trackingId']
                break

print('Event Tracker ARN: ' + event_tracker_arn)
print('Event Tracking ID: ' + event_tracking_id)

#### Wait for Event Tracker Status to Become ACTIVE

The event tracker should take a minute or so to become active.

In [None]:
status = None
max_time = time.time() + 60*60 # 1 hour
while time.time() < max_time:
    describe_event_tracker_response = personalize.describe_event_tracker(
        eventTrackerArn = event_tracker_arn
    )
    status = describe_event_tracker_response["eventTracker"]["status"]
    print("EventTracker: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)

## Using filters with a recommender
Now, let's create a filter so that the agent can restrict a user's recommendations to items from a given category.

### Create a filter to use

First, we need to create a filter that defines which items can be recommended. We will use a dynamic [filter](https://docs.aws.amazon.com/personalize/latest/dg/filter-expressions.html) that takes a value at inference time.

This filter will include only items that have the specified value for *Items.CATEGORY_L1*.

To get the same results for the "electronics" category, we could instead use a static filter of the form:

```
'INCLUDE ItemID WHERE Items.CATEGORY_L1 IN ("electronics")'
```

However, using the dynamic filter gives the agent more flexibility to decide what products to recommend based on the category the user provides.
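With a dynamic filter, each value passed for a placeholder such as `$CATEGORIES` is wrapped in escaped double quotes, and multiple values are separated by commas. The small helper below is a hypothetical convenience, not part of the Amazon Personalize API, that builds the `filterValues` dictionary in that format; the category names in the test are placeholders.

```python
def build_filter_values(parameter_name, values):
    """Format values for a dynamic Personalize filter placeholder.

    Hypothetical helper: wraps each value in double quotes and joins them
    with commas, e.g. ["a", "b"] -> '"a","b"'.
    """
    if not values:
        raise ValueError("at least one filter value is required")
    quoted = ",".join('"{}"'.format(value) for value in values)
    return {parameter_name: quoted}
```

For example, `build_filter_values("CATEGORIES", ["electronics"])` returns `{"CATEGORIES": '"electronics"'}`, the same value passed as `filterValues` when requesting recommendations later in this notebook.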

In [None]:
create_filter_response = personalize.create_filter(
    name = 'category_filter',
    datasetGroupArn = dataset_group_arn,
    filterExpression = 'INCLUDE ItemID WHERE ITEMS.CATEGORY_L1 IN ($CATEGORIES)'
) 
category_filter_arn = create_filter_response["filterArn"]
print("Filter ARN: " + category_filter_arn)

Wait for the filter we created to reach the status "ACTIVE".

In [None]:
%%time

max_time = time.time() + 10*60*60  # 10 hours

while time.time() < max_time:
    filter_response = personalize.describe_filter(
        filterArn=category_filter_arn
    )
    status = filter_response["filter"]["status"]

    if status == "ACTIVE":
        print("Build succeeded for {}".format(category_filter_arn))
        break
    elif status == "CREATE FAILED":
        print("Build failed for {}".format(category_filter_arn))
        break

    print('The filter build is still in progress')
    time.sleep(30)
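The event tracker and filter cells both poll a `describe_*` call until the resource reaches `ACTIVE` or `CREATE FAILED`. That pattern can be factored into one helper, sketched below under the assumption that the caller supplies a zero-argument function returning the current status string; `wait_for_status` is a hypothetical name, not an Amazon Personalize API.

```python
import time


def wait_for_status(get_status, success="ACTIVE", failure="CREATE FAILED",
                    timeout_seconds=3600, poll_seconds=15, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout expires.

    Returns the last status seen, or None if the timeout expired before the
    first poll. The sleep argument can be replaced, for example in tests.
    """
    deadline = time.time() + timeout_seconds
    status = None
    while time.time() < deadline:
        status = get_status()
        print("Status: {}".format(status))
        if status in (success, failure):
            break
        sleep(poll_seconds)
    return status


# Example usage in this notebook (requires the personalize client):
# status = wait_for_status(
#     lambda: personalize.describe_filter(filterArn=category_filter_arn)["filter"]["status"],
#     poll_seconds=30,
# )
```

Passing the status lookup as a function keeps the helper independent of the resource type, so the same loop works for event trackers, filters, or other resources that report a `status` field.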

### Get recommendations for our test user using the filter 

Now that the filter has been created, we can get recommendations for a user using the filter. Let's check our existing test user.

In [None]:
print(test_user_id)

In [None]:
# Request 20 "Recommended For You" items, restricted to the "electronics" category
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn=recommended_for_you_arn,
    userId=test_user_id,
    numResults=20,
    filterArn=category_filter_arn,
    # Dynamic filter values are wrapped in escaped double quotes
    filterValues={"CATEGORIES": "\"electronics\""}
)

# Build a dataframe with the ID, description, and level-1 category of each recommended item
item_list = get_recommendations_response['itemList']
recommendation_id_list = []
recommendation_description_list = []
recommendation_category_list = []

for item in item_list:
    item_id = item['itemId']
    recommendation_id_list.append(item_id)
    recommendation_description_list.append(get_item_by_id(item_id, items_df))
    recommendation_category_list.append(get_category_by_id(item_id, items_df))

user_recommendations_df = pd.DataFrame(recommendation_id_list, columns=["ID"])
user_recommendations_df["description"] = recommendation_description_list
user_recommendations_df["category level 1"] = recommendation_category_list

pd.options.display.max_rows = 20
display(user_recommendations_df)

As you can see from the results, all returned items have the category level 1 "electronics".

## Store variables to use in Notebook 2

In [None]:
# Store dataset group ARN
%store dataset_group_arn

# Store role variables
%store role_arn
%store role_name

# Store recommender ARN
%store recommended_for_you_arn

# Store solution version ARN
%store similar_items_solution_version_arn

# Store campaign ARN
%store similar_items_campaign_arn

# Store event tracker ARN and tracking ID
%store event_tracker_arn
%store event_tracking_id

# Store filter ARN
%store category_filter_arn

# Store dataset schema ARNs
%store interactions_schema_arn
%store items_schema_arn

# Store region
%store region

## Review
In this notebook, you trained two Amazon Personalize models that generate personalized item recommendations and similar-item recommendations from past user behavior. You also created an event tracker for sending real-time user event (clickstream) data to the service, and a filter that restricts recommendations to products from a specific category.

In the next notebook, you will build the recommender agent and use the resources created here to extend its capabilities.

Continue to [02_Recommender-Agent_Build-Agent-ConverseAPI.ipynb](./02_Recommender-Agent_Build-Agent-ConverseAPI.ipynb)