In this notebook, we'll create a dataset group within Amazon Personalize and import the relevant data sources, including user interactions and product metadata. We'll then choose a recipe that matches our recommendation goal, in this case user personalization. After creating a solution and training the model with the selected recipe and datasets, we'll deploy it as a campaign and use it to produce personalized product recommendations for individual users.

In [1]:
import time
from time import sleep
import json
from datetime import datetime
import boto3
import pandas as pd
from io import StringIO
%store -r


***Creating a Dataset Group for Personalize***

Configure the AWS SDK to interact with Amazon Personalize, then create a new dataset group named "personalize-product-recommendations" and capture its ARN. The next cell sets a maximum runtime of 3 hours and polls the dataset group every minute until its status is "ACTIVE" or "CREATE FAILED".

In [2]:
# Create the dataset group

# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

create_dataset_group_response = personalize.create_dataset_group(
    name = "personalize-product-recommendations"
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))



{
  "datasetGroupArn": "arn:aws:personalize:us-west-2:151368154323:dataset-group/personalize-product-recommendations",
  "ResponseMetadata": {
    "RequestId": "231e19a7-05e7-439d-b6cc-54f0de536fcd",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:02 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "114",
      "connection": "keep-alive",
      "x-amzn-requestid": "231e19a7-05e7-439d-b6cc-54f0de536fcd",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [3]:
%%time
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

DatasetGroup: ACTIVE
CPU times: user 3.89 ms, sys: 0 ns, total: 3.89 ms
Wall time: 30 ms
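
The same polling pattern recurs later for import jobs, solution versions, and campaigns. As a minimal sketch, it could be factored into one helper. The name `wait_for_status` and its parameters are hypothetical and not part of boto3; the demo uses a fake status sequence so that it runs without AWS access.

```python
import time

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"ACTIVE", "CREATE FAILED"}

def wait_for_status(get_status, timeout_s=3 * 60 * 60, interval_s=60,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status.

    get_status is a zero-argument callable returning the current status string.
    Returns the terminal status, or raises TimeoutError once timeout_s has passed.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        print(f"Status: {status}")
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)

# Demo with a fake status sequence and a no-op sleep (no AWS calls are made).
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda s: None)
```

With the client from this notebook, the callable would be something like `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)["datasetGroup"]["status"]`. Unlike the loop above, this sketch raises on timeout instead of falling through silently.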


***Creating Interactions Schema and Dataset***

Define an interactions schema for Amazon Personalize, specifying the fields TIMESTAMP, USER_ID, ITEM_ID, and PRODUCT_NAME. Create the schema in Personalize and retrieve its ARN. Next, create an interactions dataset that uses this schema and associate it with the previously created dataset group.
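
Personalize expects the column headers of the imported CSV to match the schema's field names, so it can help to compare the two before uploading. The following is a rough sketch of such a check. The helper name `missing_columns` and the two-line sample CSV are hypothetical and not taken from the real dataset.

```python
import csv
import io

def missing_columns(schema, csv_text):
    """Return the schema field names that do not appear in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = {name.strip() for name in header}
    return [field["name"] for field in schema["fields"] if field["name"] not in present]

# Field list mirroring the interactions schema in this section.
sample_schema = {
    "fields": [
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "PRODUCT_NAME", "type": "string"},
    ]
}

# Made-up sample that deliberately omits PRODUCT_NAME.
sample_csv = "USER_ID,ITEM_ID,TIMESTAMP\nu1,i1,1721945820\n"

print(missing_columns(sample_schema, sample_csv))  # prints ['PRODUCT_NAME']
```

An empty list means every schema field has a matching column. The check ignores column order.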

In [5]:
#create interactions schema


interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "PRODUCT_NAME",
            "type": "string"
        }
    ],
    "version": "1.0"
}



create_schema_response = personalize.create_schema(
    name = "personalize-product-recommendations-interactions",
    schema = json.dumps(interactions_schema)
)

interaction_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "personalize-product-recommendations-ints-dataset",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interaction_schema_arn
)

interactions_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-west-2:151368154323:schema/personalize-product-recommendations-interactions1",
  "ResponseMetadata": {
    "RequestId": "558ef225-432b-47df-a280-ced69a26a2e3",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:20 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "115",
      "connection": "keep-alive",
      "x-amzn-requestid": "558ef225-432b-47df-a280-ced69a26a2e3",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
{
  "datasetArn": "arn:aws:personalize:us-west-2:151368154323:dataset/personalize-product-recommendations/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "e6aadb2b-a856-4c30-9217-df1b13f242d6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:20 GMT",
    

***Creating Users Schema and Dataset***

Define a schema for user data in Amazon Personalize, specifying the fields TIMESTAMP, ITEM_ID, PRODUCT_NAME, RATING, AGE, USER_ID, USERNAME, REVIEW_ID, and REVIEW_TITLE. Next, create a users dataset that uses this schema and associate it with the existing dataset group.

In [6]:
# create user schema
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "PRODUCT_NAME",
            "type": "string"
        },
        {
            "name": "RATING",
            "type": "string"
        },
        {
            "name": "AGE",
            "type": "int"
        },
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "USERNAME",
            "type": "string"
        },
        {
            "name": "REVIEW_ID",
            "type": "string"
        },
        {
            "name": "REVIEW_TITLE",
            "type": "string"
        }
    ],
    "version": "1.0"
}


create_users_schema_response = personalize.create_schema(
    name='personalize-user-recommendations-users',
    schema=json.dumps(users_schema)
)

users_schema_arn = create_users_schema_response['schemaArn']
print(json.dumps(create_users_schema_response, indent=2))



dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    name = "personalize-product-recommendations-users-dataset",
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = users_schema_arn
)


users_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:us-west-2:151368154323:schema/personalize-user-recommendations-users1",
  "ResponseMetadata": {
    "RequestId": "7fa929f5-a7c3-44e4-ba1c-8b527ab46932",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:28 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "105",
      "connection": "keep-alive",
      "x-amzn-requestid": "7fa929f5-a7c3-44e4-ba1c-8b527ab46932",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
{
  "datasetArn": "arn:aws:personalize:us-west-2:151368154323:dataset/personalize-product-recommendations/USERS",
  "ResponseMetadata": {
    "RequestId": "56621d12-7555-4af2-8700-11b6a7cd43a5",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:29 GMT",
      "content-type":

***Creating Items Schema and Dataset***

Define a schema for item data in Amazon Personalize, specifying the fields CREATION_TIMESTAMP, ITEM_ID, PRODUCT_NAME, CATEGORY, RATING_COUNT, and DESCRIPTION. CATEGORY is marked as categorical and DESCRIPTION as textual. Next, create an items dataset that uses this schema and associate it with the existing dataset group.
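
Because the schema is written as a Python dict, flags such as `categorical` use Python's `True`, and `json.dumps` serializes them as JSON `true` before the schema is sent to Personalize. A small sketch illustrating this follows. The helper `flagged_fields` is hypothetical, and the schema is a shortened version of the one in this section.

```python
import json

def flagged_fields(schema_json, flag):
    """Names of the fields whose `flag` attribute (e.g. "categorical") is true."""
    schema = json.loads(schema_json)
    return [field["name"] for field in schema["fields"] if field.get(flag) is True]

# Shortened items schema, for illustration only.
example_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "CATEGORY", "type": ["null", "string"], "categorical": True},
        {"name": "DESCRIPTION", "type": ["null", "string"], "textual": True},
    ],
    "version": "1.0",
}

payload = json.dumps(example_schema)          # Python True becomes JSON true
print('"categorical": true' in payload)       # prints True
print(flagged_fields(payload, "categorical")) # prints ['CATEGORY']
```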

In [7]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "CREATION_TIMESTAMP",
            "type": "long"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "PRODUCT_NAME",
            "type": ["null", "string"]
        },
        {
            "name": "CATEGORY",
            "type": ["null", "string"],
            "categorical": True
        },
        {
            "name": "RATING_COUNT",
            "type": ["null", "string"]
        },
        {
            "name": "DESCRIPTION",
            "type": ["null", "string"],
            "textual": True
        }
    ],
    "version": "1.0"
}



create_schema_response = personalize.create_schema(
    name="personalize-product-recommendations-items",
    schema=json.dumps(items_schema)
)

items_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

dataset_type = "ITEMS"
create_dataset_response = personalize.create_dataset(
    name="personalize-product-recommendations-items-dataset",
    datasetType=dataset_type,
    datasetGroupArn=dataset_group_arn,
    schemaArn=items_schema_arn
)

items_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))



{
  "schemaArn": "arn:aws:personalize:us-west-2:151368154323:schema/personalize-product-recommendations-items1",
  "ResponseMetadata": {
    "RequestId": "0c90b612-97de-4af5-bd8c-f04154154295",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "108",
      "connection": "keep-alive",
      "x-amzn-requestid": "0c90b612-97de-4af5-bd8c-f04154154295",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
{
  "datasetArn": "arn:aws:personalize:us-west-2:151368154323:dataset/personalize-product-recommendations/ITEMS",
  "ResponseMetadata": {
    "RequestId": "06348dc4-33e9-4b5f-a23c-d96b45908e90",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:17:36 GMT",
      "content-typ

***Configure S3 bucket and IAM role for Personalize***

Give Amazon Personalize access to the data stored in the initial S3 bucket by attaching a bucket policy. Next, create an IAM role whose trust policy allows the Personalize service to assume it.

In [8]:
s3 = boto3.client('s3')

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:*Object",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}


s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': 'CNYGCXCH59TSB5PF',
  'HostId': '7d3YZPTP09rycCwA+KOW5PG7NGm80xyM7X1wRb+e39jHEYwSZ/tnV+1usQK57KjaPkpEtcpBfpY=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': '7d3YZPTP09rycCwA+KOW5PG7NGm80xyM7X1wRb+e39jHEYwSZ/tnV+1usQK57KjaPkpEtcpBfpY=',
   'x-amz-request-id': 'CNYGCXCH59TSB5PF',
   'date': 'Thu, 25 Jul 2024 22:17:46 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

In [11]:
iam = boto3.client("iam")

role_name = "PersonalizeRolePOC"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

# Now add S3 support
iam.attach_role_policy(
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::151368154323:role/PersonalizeRolePOC152
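
The bucket policy above grants `s3:*Object`, and the role receives `AmazonS3FullAccess`. If the bucket is only used as an import source, a narrower bucket policy may be enough. The sketch below assumes exactly that, a read-only import scenario. The function name and the bucket name are placeholders, and exports or batch jobs that write to the bucket would need additional permissions.

```python
import json

def import_only_bucket_policy(bucket):
    """Bucket policy letting Personalize list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                # Read-only: no put or delete permissions are granted.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }

# "example-personalize-bucket" is a placeholder, not the bucket used in this notebook.
example_policy = import_only_bucket_policy("example-personalize-bucket")
print(json.dumps(example_policy, indent=2))
```

The result can be passed to `s3.put_bucket_policy` in the same way as the policy defined earlier in this section.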


***Importing Datasets into Personalize***

Perform dataset import jobs for the interactions dataset, the user dataset, and the item dataset. Each import job specifies the dataset ARN, the location of the data in an S3 bucket, and the IAM role ARN that grants Personalize access to the data.

In [13]:
# interaction dataset import


create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "personalize-product-import",
    datasetArn = interactions_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket_name, interactions_filename)
    },
    roleArn = role_arn
)

dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:us-west-2:151368154323:dataset-import-job/personalize-product-import",
  "ResponseMetadata": {
    "RequestId": "add2093e-946c-451a-8ca2-41fa70b9839e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:22:05 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "114",
      "connection": "keep-alive",
      "x-amzn-requestid": "add2093e-946c-451a-8ca2-41fa70b9839e",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [14]:
#user dataset import

user_filename = 'user_data.csv'

create_user_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName="personalize-user-import",
    datasetArn=users_dataset_arn,
    dataSource={
        "dataLocation": f"s3://{bucket_name}/{user_filename}"
    },
    roleArn=role_arn
)

user_dataset_import_job_arn = create_user_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_user_dataset_import_job_response, indent=2))




{
  "datasetImportJobArn": "arn:aws:personalize:us-west-2:151368154323:dataset-import-job/personalize-user-import",
  "ResponseMetadata": {
    "RequestId": "3226acc8-e68d-4c86-8a97-856044403a9a",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:22:12 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "111",
      "connection": "keep-alive",
      "x-amzn-requestid": "3226acc8-e68d-4c86-8a97-856044403a9a",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [15]:
# item dataset import


item_filename = 'product_data.csv'

create_items_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName="personalize-items-import",
    datasetArn=items_dataset_arn,
    dataSource={
        "dataLocation": f"s3://{bucket_name}/{item_filename}"
    },
    roleArn=role_arn
)

items_dataset_import_job_arn = create_items_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_items_dataset_import_job_response, indent=2))



{
  "datasetImportJobArn": "arn:aws:personalize:us-west-2:151368154323:dataset-import-job/personalize-items-import",
  "ResponseMetadata": {
    "RequestId": "36a3b4a2-57f2-4b9a-ac05-871b420bd827",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:22:16 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "112",
      "connection": "keep-alive",
      "x-amzn-requestid": "36a3b4a2-57f2-4b9a-ac05-871b420bd827",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
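
The three import cells differ only in the job name, dataset ARN, and file name. One way to reduce the repetition is to build the request arguments in a small function. This is a sketch only: `import_job_kwargs` is a hypothetical helper, and the ARNs and bucket below are placeholders.

```python
def import_job_kwargs(job_name, dataset_arn, bucket, key, role_arn):
    """Build keyword arguments for personalize.create_dataset_import_job."""
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        # Strip a leading slash so the URI never contains "//" after the bucket.
        "dataSource": {"dataLocation": f"s3://{bucket}/{key.lstrip('/')}"},
        "roleArn": role_arn,
    }

# Placeholder values for illustration; substitute the variables from this notebook.
example_kwargs = import_job_kwargs(
    job_name="personalize-items-import",
    dataset_arn="arn:aws:personalize:REGION:ACCOUNT:dataset/EXAMPLE/ITEMS",
    bucket="example-personalize-bucket",
    key="product_data.csv",
    role_arn="arn:aws:iam::ACCOUNT:role/EXAMPLE",
)
print(example_kwargs["dataSource"]["dataLocation"])
```

Each job could then be started with `personalize.create_dataset_import_job(**example_kwargs)`.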


***Monitoring Dataset Import in Amazon Personalize***

We monitor the progress of the interaction, user, and item dataset import jobs in Amazon Personalize, with a maximum wait time of 6 hours per job. Each cell loops, checking the status of its import job every minute with the describe_dataset_import_job API call. The loop ends when the status is "ACTIVE", meaning the job completed successfully, or "CREATE FAILED", in which case the failure reason is printed.


In [16]:
# Poll the interactions dataset import job

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE":
        print("Dataset import job completed successfully.")
        break
    elif status == "CREATE FAILED":
        failure_reason = describe_dataset_import_job_response["datasetImportJob"]['failureReason']
        print(f"Dataset import job failed: {failure_reason}")
        break
        
    time.sleep(60)

DatasetImportJob: CREATE PENDING
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE
Dataset import job completed successfully.


In [17]:
# Poll the users dataset import job

max_time = time.time() + 6 * 60 * 60  # 6 hours max wait time
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn=user_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print(f"DatasetImportJob: {status}")
    
    if status == "ACTIVE":
        print("Dataset import job completed successfully.")
        break
    elif status == "CREATE FAILED":
        failure_reason = describe_dataset_import_job_response["datasetImportJob"]['failureReason']
        print(f"Dataset import job failed: {failure_reason}")
        break
    
    time.sleep(60)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE
Dataset import job completed successfully.


In [18]:
# Poll the items dataset import job


max_time = time.time() + 6 * 60 * 60  # 6 hours max wait time
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn=items_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print(f"DatasetImportJob: {status}")
    
    if status == "ACTIVE":
        print("Dataset import job completed successfully.")
        break
    elif status == "CREATE FAILED":
        failure_reason = describe_dataset_import_job_response["datasetImportJob"]['failureReason']
        print(f"Dataset import job failed: {failure_reason}")
        break
    
    time.sleep(60)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE
Dataset import job completed successfully.
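
The cells above wait for each import job in turn. Since the jobs run independently, they can also be tracked in a single loop. The following is a sketch under that assumption. `wait_for_jobs` is a hypothetical helper, and the demo drives it with fake statuses rather than real API calls.

```python
import time

def wait_for_jobs(job_arns, get_status, timeout_s=6 * 60 * 60, interval_s=60,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll several jobs together and return {arn: final status}.

    get_status takes a job ARN and returns its current status string.
    """
    deadline = clock() + timeout_s
    pending = set(job_arns)
    final = {}
    while pending:
        for arn in sorted(pending):
            status = get_status(arn)
            print(f"{arn}: {status}")
            if status in ("ACTIVE", "CREATE FAILED"):
                final[arn] = status
        pending.difference_update(final)  # drop the jobs that have finished
        if pending and clock() >= deadline:
            raise TimeoutError(f"jobs still running: {sorted(pending)}")
        if pending:
            sleep(interval_s)
    return final

# Demo with fake per-job status sequences (no AWS calls are made).
fake = {
    "job-a": iter(["CREATE IN_PROGRESS", "ACTIVE"]),
    "job-b": iter(["CREATE FAILED"]),
}
results = wait_for_jobs(["job-a", "job-b"], lambda arn: next(fake[arn]),
                        sleep=lambda s: None)
```

Against the real service, `get_status` would wrap `personalize.describe_dataset_import_job(datasetImportJobArn=arn)["datasetImportJob"]["status"]`.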


***Creating Solution and Solution Version***

We will use the "aws-user-personalization" recipe, which is designed for user personalization tasks. It generates personalized recommendations for each user based on their interactions with items, and it can also draw on the items and users datasets when they are present in the dataset group. After choosing the recipe, we create a solution associated with our dataset group and then train a solution version. Training may take up to an **_hour_**.

In [19]:
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"

In [21]:
create_solution_response = personalize.create_solution(
    name = "personalize-soln-user-personalization",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))


{
  "solutionArn": "arn:aws:personalize:us-west-2:151368154323:solution/personalize-soln-user-personalization1",
  "ResponseMetadata": {
    "RequestId": "0a0cdd8d-7c19-42ca-86ab-90990badd7d0",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:29:06 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "108",
      "connection": "keep-alive",
      "x-amzn-requestid": "0a0cdd8d-7c19-42ca-86ab-90990badd7d0",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [22]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = solution_arn
)

solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-west-2:151368154323:solution/personalize-soln-user-personalization1/af0afc87",
  "ResponseMetadata": {
    "RequestId": "7078a3a3-f5ca-4680-befa-c61870ebdd6d",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 22:29:12 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "124",
      "connection": "keep-alive",
      "x-amzn-requestid": "7078a3a3-f5ca-4680-befa-c61870ebdd6d",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


In [None]:
%%time
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print("SolutionVersion: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

SolutionVersion: CREATE PENDING
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGRESS
SolutionVersion: CREATE IN_PROGR
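
Once the solution version is ACTIVE, its offline metrics can be retrieved with `personalize.get_solution_metrics(solutionVersionArn=solution_version_arn)`, which returns a `metrics` dictionary. Below is a minimal sketch for printing that dictionary in a stable order. The helper `format_metrics` is hypothetical, and the response values are invented for illustration, not results from this notebook.

```python
def format_metrics(response):
    """Return 'name: value' strings, sorted by metric name."""
    metrics = response.get("metrics", {})
    return [f"{name}: {value:.4f}" for name, value in sorted(metrics.items())]

# Fake response shaped like get_solution_metrics output; the numbers are made up.
fake_response = {
    "solutionVersionArn": "arn:aws:personalize:REGION:ACCOUNT:solution/EXAMPLE/VERSION",
    "metrics": {"precision_at_5": 0.1, "coverage": 0.5},
}

for line in format_metrics(fake_response):
    print(line)
# prints:
# coverage: 0.5000
# precision_at_5: 0.1000
```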

***Creating and Monitoring Personalize Campaign***

We create a campaign to serve real-time personalized recommendations to users. The campaign ARN returned by the create_campaign API is used when calling the Personalize Runtime API to get recommendations for a specific user. We set the minimum provisioned transactions per second (TPS), which is the baseline throughput reserved for the campaign. We also configure the item exploration settings, which balance exploration of new items against exploitation of known user preferences. The campaign may take up to **_15 minutes_** to become active.

In [24]:
create_campaign_response = personalize.create_campaign(
    name = "personalize-recs",
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 1,
    campaignConfig = {
        "itemExplorationConfig": {
            "explorationWeight": "0.3"
        }
    }
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

{
  "campaignArn": "arn:aws:personalize:us-west-2:151368154323:campaign/personalize-recs1",
  "ResponseMetadata": {
    "RequestId": "07a26aac-2b0a-4287-9886-93516321e4a2",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Thu, 25 Jul 2024 23:12:28 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "87",
      "connection": "keep-alive",
      "x-amzn-requestid": "07a26aac-2b0a-4287-9886-93516321e4a2",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
Campaign: CREATE PENDING
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREAT
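
The `explorationWeight` setting takes a value between 0 and 1, passed as a string, where higher values lead to more exploration. A small sketch that validates the value before building the `campaignConfig` argument might look like this. The function name `item_exploration_config` is hypothetical.

```python
def item_exploration_config(exploration_weight=0.3):
    """Build a campaignConfig dict with a validated exploration weight."""
    if not 0.0 <= exploration_weight <= 1.0:
        raise ValueError("exploration_weight must be between 0 and 1")
    # Personalize expects the values in itemExplorationConfig to be strings.
    return {"itemExplorationConfig": {"explorationWeight": str(exploration_weight)}}

print(item_exploration_config(0.3))
# prints {'itemExplorationConfig': {'explorationWeight': '0.3'}}
```

The returned dictionary can be passed as `campaignConfig` to `personalize.create_campaign`.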

***Generating and Saving Personalize Recommendations***

We first use the interactions data to build a mapping of item IDs to product names. Next, we call the get_recommendations API from the Personalize Runtime to retrieve personalized recommendations for a randomly sampled user, based on the campaign created earlier. The recommended item IDs are then mapped to their product names using that mapping. Lastly, we iterate through all unique user IDs in the interactions data, retrieve recommendations for each user, keep the top five, and store them in a list of dictionaries that is saved to a CSV file.


In [25]:
# Load the interactions data to print out user recs

items_map = dict(zip(interactions_df['ITEM_ID'].astype(str), interactions_df['PRODUCT_NAME']))
user_id, item_id = interactions_df[['USER_ID', 'ITEM_ID']].sample().values[0]

print(f"User Id: {user_id.split(',')[0]}")

get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn=campaign_arn,
    userId=str(user_id),
)

# Extract recommended item IDs from the response
recommendation_list = [item['itemId'] for item in get_recommendations_response['itemList']]
recommendation_names = [items_map.get(item_id, 'Unknown') for item_id in recommendation_list]


recommendations_df = pd.DataFrame(recommendation_names, columns=['Recommended_Item'])
recommendations_df.head()

User Id: Mahenddhra


   Recommended_Item
0  Zebronics CU3100V Fast charging Type C cable w...
1  akiara - Makes life easy Mini Sewing Machine f...
2  boAt Bassheads 100 in Ear Wired Earphones with...
3  Syska SDI-07 1000 W Stellar with Golden Americ...
4  Sounce Fast Phone Charging Cable & Data Sync U...


In [26]:
user_ids = interactions_df['USER_ID'].unique()

all_recommendations = []

for user_id in user_ids:
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id)
    )
    
    recommendation_list = [item['itemId'] for item in get_recommendations_response['itemList']]
    
    recommendation_names = [items_map.get(item_id, 'Unknown') for item_id in recommendation_list]
    all_recommendations.append({'User ID': user_id, 'Recommended_Items': recommendation_names[:5]})


    
# Convert to DataFrame
recommendations_df = pd.DataFrame(all_recommendations)

# Save to CSV
csv_filename = 'user_recommendations.csv'
recommendations_df.to_csv(csv_filename, index=False)
print(f"Recommendations DataFrame saved to {csv_filename}")

print(recommendations_df['Recommended_Items'])

Recommendations DataFrame saved to user_recommendations.csv
0       [Sounce Fast Phone Charging Cable & Data Sync ...
1       [Sounce Fast Phone Charging Cable & Data Sync ...
2       [Sounce Fast Phone Charging Cable & Data Sync ...
3       [Sounce Fast Phone Charging Cable & Data Sync ...
4       [Sounce Fast Phone Charging Cable & Data Sync ...
                              ...                        
1174    [boAt Bassheads 100 in Ear Wired Earphones wit...
1175    [HP M270 Backlit USB Wired Gaming Mouse with 6...
1176    [HP M270 Backlit USB Wired Gaming Mouse with 6...
1177    [boAt Bassheads 100 in Ear Wired Earphones wit...
1178    [Samsung Galaxy M13 5G (Aqua Green, 6GB, 128GB...
Name: Recommended_Items, Length: 1179, dtype: object
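
The ID-to-name mapping and the top-five cut can be expressed as one small function, which makes the step easy to test without calling the service. This is a sketch only. `top_item_names` is a hypothetical helper, and the response below is a fake shaped like get_recommendations output.

```python
def top_item_names(response, items_map, k=5, default="Unknown"):
    """Map the first k recommended item IDs to names, using `default` for unknown IDs."""
    item_ids = [item["itemId"] for item in response.get("itemList", [])]
    return [items_map.get(item_id, default) for item_id in item_ids[:k]]

# Fake data for illustration; real responses come from personalize_runtime.
fake_response = {"itemList": [{"itemId": "A1"}, {"itemId": "B2"}, {"itemId": "C3"}]}
fake_items_map = {"A1": "USB cable", "C3": "Wired earphones"}

print(top_item_names(fake_response, fake_items_map, k=2))
# prints ['USB cable', 'Unknown']
```

When only a few items are needed, the `numResults` parameter of `get_recommendations` can request that many directly, which avoids slicing the list afterwards.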


***Storing Personalized Recommendations in Amazon DynamoDB***

We create a DynamoDB table named user_recommendations if it doesn't already exist, and then insert the personalized recommendations generated in the previous step. Each item uses the user ID as the hash (partition) key and stores that user's recommended items as a list, allowing efficient retrieval of recommendations for a specific user.

In [27]:
dynamodb = boto3.client('dynamodb')
dynamodb_resource = boto3.resource('dynamodb')
table_name = 'user_recommendations'

existing_tables = dynamodb.list_tables()['TableNames']
if table_name not in existing_tables:
    dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'User ID',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'User ID',
                'AttributeType': 'S'
            }
        ],
        BillingMode='PAY_PER_REQUEST'
    )

    print(f"Table {table_name} creation initiated. Waiting for table to become active...")
    waiter = dynamodb.get_waiter('table_exists')
    waiter.wait(TableName=table_name)
    print(f"Table {table_name} created successfully.")
else:
    print(f"Table {table_name} already exists.")

# Insert recommendations into DynamoDB
table = dynamodb_resource.Table(table_name)
with table.batch_writer() as batch:
    for index, row in recommendations_df.iterrows():
        user_id = str(row['User ID'])
        recommended_items = row['Recommended_Items']
        
        try:
            batch.put_item(
                Item={
                    'User ID': user_id,
                    'Recommended_Items': recommended_items  # directly store the list
                }
            )
        except Exception as e:
            print(f"Error inserting recommendation for user {user_id}: {e}")

print("All recommendations inserted into DynamoDB table.")

Table user_recommendations creation initiated. Waiting for table to become active...
Table user_recommendations created successfully.
All recommendations inserted into DynamoDB table.
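
Reading a user's recommendations back is a single key lookup on the hash key. The sketch below assumes a table object that offers `get_item`, such as the boto3 `Table` resource used above. The helper `recommendations_for` is hypothetical, and `FakeTable` is a stand-in so the example runs without AWS access.

```python
def recommendations_for(table, user_id):
    """Return the stored recommendation list for user_id, or [] if there is none."""
    response = table.get_item(Key={"User ID": str(user_id)})
    item = response.get("Item")  # "Item" is absent when the key does not exist
    return list(item["Recommended_Items"]) if item else []

class FakeTable:
    """In-memory stand-in that mimics the shape of Table.get_item responses."""
    def __init__(self, rows):
        self._rows = rows

    def get_item(self, Key):
        row = self._rows.get(Key["User ID"])
        return {"Item": row} if row else {}

fake_table = FakeTable({
    "u1": {"User ID": "u1", "Recommended_Items": ["USB cable", "Wired earphones"]},
})
print(recommendations_for(fake_table, "u1"))  # prints ['USB cable', 'Wired earphones']
print(recommendations_for(fake_table, "u2"))  # prints []
```

With the real table, the call would be `recommendations_for(dynamodb_resource.Table(table_name), user_id)`.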


In [28]:
%store bucket_name
%store recommendations_df
%store dataset_group_arn
%store interaction_schema_arn
%store users_schema_arn
%store items_schema_arn
%store users_dataset_arn
%store items_dataset_arn
%store interactions_dataset_arn
%store role_name
%store role_arn
%store interactions_df
%store user_filename
%store solution_version_arn
%store solution_arn
%store campaign_arn

Stored 'bucket_name' (str)
Stored 'recommendations_df' (DataFrame)
Stored 'dataset_group_arn' (str)
Stored 'interaction_schema_arn' (str)
Stored 'users_schema_arn' (str)
Stored 'items_schema_arn' (str)
Stored 'users_dataset_arn' (str)
Stored 'items_dataset_arn' (str)
Stored 'interactions_dataset_arn' (str)
Stored 'role_name' (str)
Stored 'role_arn' (str)
Stored 'interactions_df' (DataFrame)
Stored 'user_filename' (str)
Stored 'solution_version_arn' (str)
Stored 'solution_arn' (str)
Stored 'campaign_arn' (str)
