# Amazon Personalize - From Data Preparation to Campaign Deployment

This notebook uses <b>`conda_python3`</b> as the default kernel.
<br>
Run the cells sequentially from start to finish to deploy a Personalize Campaign.

## 0. Setting Environment

<i>(Optional)</i> Upgrade the boto3 SDK if needed.


### boto3 Upgrade (Optional)

In [None]:
# !pip install boto3 --upgrade

## 1. Data Preparation

We use the dataset from the <b>Retail Demo Store</b> (linked below).
- The dataset ships as a tar archive, which the next cell unpacks.

* Retail Demo Store
    * https://github.com/aws-samples/retail-demo-store

In [1]:
import tarfile

# Open the archive in a context manager so the file handle is closed after extraction
with tarfile.open("../data/RetailDemoDataSet.tar") as tf:
    tf.extractall("../data")

In [2]:
import pandas as pd

items = pd.read_csv('../data/items.csv')
users = pd.read_csv('../data/users.csv')
its = pd.read_csv('../data/interactions.csv')

## 2. Data Preprocessing

In [3]:
import boto3
import json
import numpy as np
import pandas as pd
import time
from datetime import datetime

import matplotlib.pyplot as plt

### Edit columns of <b>ITEMS</b> dataset

In [4]:
items.columns

Index(['id', 'url', 'sk', 'name', 'category', 'style', 'description',
       'aliases', 'price', 'image', 'gender_affinity', 'current_stock',
       'featured'],
      dtype='object')

In [5]:
def item_data_selection(df, cols):
    ldf = df[cols]
    ldf = ldf.rename(columns={'id':'ITEM_ID',
                              'name' : 'NAME',
                              'category' :'CATEGORY_L1',
                              'style' : 'STYLE',
                              'description' : 'PRODUCT_DESCRIPTION',
                              'price' : 'PRICE',
                             })
    return ldf


item_cols = ['id', 'name', 'category', 'style', 'description','price']
items_df = item_data_selection(items, item_cols)    

items_df.head(3)

Unnamed: 0,ITEM_ID,NAME,CATEGORY_L1,STYLE,PRODUCT_DESCRIPTION,PRICE
0,e1669081-8ffc-4dec-97a6-e9176d7f6651,Sans Pareil Scarf,apparel,scarf,Sans pareil scarf for women,124.99
1,cfafd627-7d6b-43a5-be05-4c7937be417d,Chef Knife,housewares,kitchen,A must-have for your kitchen,57.99
2,6e6ad102-7510-4a02-b8ce-5a0cd6f431d1,Gainsboro Jacket,apparel,jacket,This gainsboro jacket for women is perfect for...,133.99


### Edit columns of <b>USERS</b> dataset

In [6]:
users.columns

Index(['id', 'username', 'email', 'first_name', 'last_name', 'addresses',
       'age', 'gender', 'persona', 'discount_persona', 'selectable_user'],
      dtype='object')

In [7]:
def user_data_selection(df, cols):
    ldf = df[cols]
    ldf = ldf.rename(columns={'id':'USER_ID',
                              'username' : 'USER_NAME',
                              'age' :'AGE',
                              'gender' : 'GENDER',                              
                             })
    return ldf

user_cols = ['id', 'username', 'age', 'gender']

users_df = user_data_selection(users, user_cols)    
users_df.head(3)

Unnamed: 0,USER_ID,USER_NAME,AGE,GENDER
0,1,user1,31,M
1,2,user2,58,F
2,3,user3,43,M


### Modify data type of <b>USERS</b> dataset

In [8]:
users_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5250 entries, 0 to 5249
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   USER_ID    5250 non-null   int64 
 1   USER_NAME  5250 non-null   object
 2   AGE        5250 non-null   int64 
 3   GENDER     5250 non-null   object
dtypes: int64(2), object(2)
memory usage: 164.2+ KB


In [9]:
def change_data_type(df, col, target_type):
    ldf = df.copy()
    ldf[col] = ldf[col].astype(target_type)
    
    return ldf

users_df = change_data_type(users_df, col='USER_ID', target_type='object')
users_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5250 entries, 0 to 5249
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   USER_ID    5250 non-null   object
 1   USER_NAME  5250 non-null   object
 2   AGE        5250 non-null   int64 
 3   GENDER     5250 non-null   object
dtypes: int64(1), object(3)
memory usage: 164.2+ KB


### Edit columns of <b>INTERACTIONS</b> dataset

In [10]:
its.columns

Index(['ITEM_ID', 'USER_ID', 'EVENT_TYPE', 'TIMESTAMP', 'DISCOUNT'], dtype='object')

In [11]:
def interactions_data_selection(df, cols):
    # The interactions columns already use the Personalize names, so no rename is needed
    ldf = df[cols].copy()
    return ldf

interactions_cols = ['ITEM_ID', 'USER_ID', 'EVENT_TYPE', 'TIMESTAMP']

full_interactions_df = interactions_data_selection(its, interactions_cols)    
full_interactions_df.head(3)

Unnamed: 0,ITEM_ID,USER_ID,EVENT_TYPE,TIMESTAMP
0,26bb732f-9159-432f-91ef-bad14fedd298,3156,ProductViewed,1591803788
1,26bb732f-9159-432f-91ef-bad14fedd298,3156,ProductViewed,1591803788
2,dc073623-4b95-47d9-93cb-0171c20baa04,332,ProductViewed,1591803812


### Edit EVENT_TYPE column of <b>INTERACTIONS</b> dataset 

Keep only the <b>ProductViewed</b> and <b>OrderCompleted</b> values of EVENT_TYPE and rename them to `View` and `Purchase` respectively.

In [12]:
full_interactions_df.EVENT_TYPE.value_counts()

EVENT_TYPE
ProductViewed      581900
ProductAdded        46552
CartViewed          29095
CheckoutStarted     11638
OrderCompleted       5819
Name: count, dtype: int64

In [13]:
def filter_interactions_data(df, kinds_event_type):
    # Copy the filtered rows so the assignment below does not raise SettingWithCopyWarning
    ldf = df[df['EVENT_TYPE'].isin(kinds_event_type)].copy()
    ldf['EVENT_TYPE'] = ldf['EVENT_TYPE'].replace({'ProductViewed': 'View',
                                                   'OrderCompleted': 'Purchase'})

    return ldf

select_event_types = ['ProductViewed','OrderCompleted']
interactions_df = filter_interactions_data(full_interactions_df, select_event_types)
interactions_df


Unnamed: 0,ITEM_ID,USER_ID,EVENT_TYPE,TIMESTAMP
0,26bb732f-9159-432f-91ef-bad14fedd298,3156,View,1591803788
1,26bb732f-9159-432f-91ef-bad14fedd298,3156,View,1591803788
2,dc073623-4b95-47d9-93cb-0171c20baa04,332,View,1591803812
3,dc073623-4b95-47d9-93cb-0171c20baa04,332,View,1591803812
4,31efcfea-47d6-43f3-97f7-2704a5397e22,3981,View,1591803830
...,...,...,...,...
674996,94a0ad41-8b19-4ecb-b0d7-33704e2d4421,4046,View,1598204625
674997,f9c470b0-152b-4776-893a-67ffc4064675,2627,View,1598204657
674998,1def0093-96b2-4cc4-a022-071941f75b92,3538,View,1598204664
674999,9bc87696-e9bd-4241-86b0-234e054a607b,5165,View,1598204678


### Modify data type of <b>INTERACTIONS</b> dataset

In [14]:
interactions_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 587719 entries, 0 to 675003
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   ITEM_ID     587719 non-null  object
 1   USER_ID     587719 non-null  int64 
 2   EVENT_TYPE  587719 non-null  object
 3   TIMESTAMP   587719 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 22.4+ MB


In [15]:
interactions_df = change_data_type(interactions_df, col='USER_ID', target_type='object')
interactions_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 587719 entries, 0 to 675003
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   ITEM_ID     587719 non-null  object
 1   USER_ID     587719 non-null  object
 2   EVENT_TYPE  587719 non-null  object
 3   TIMESTAMP   587719 non-null  int64 
dtypes: int64(1), object(3)
memory usage: 22.4+ MB


## 3. Upload the dataset to S3

In [16]:
import sagemaker

bucket='item-meta-joe-0621' # replace with the name of your S3 bucket
bucket

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


'item-meta-joe-0621'

In [17]:
import os
os.makedirs('dataset', exist_ok=True)

In [18]:
items_filename = "dataset/training_item.csv"
users_filename = "dataset/training_user.csv"
its_filename = "dataset/training_interaction.csv"

items_df.to_csv(items_filename,index=False)
users_df.to_csv(users_filename,index=False)
interactions_df.to_csv(its_filename,index=False)

In [19]:
# Upload the training files to S3 (upload_file returns None, so nothing is captured)
s3_resource = boto3.Session().resource('s3')
for filename in (its_filename, users_filename, items_filename):
    s3_resource.Bucket(bucket).Object(filename).upload_file(filename)

s3_its_filename = "s3://{}/{}".format(bucket, its_filename)
s3_users_filename = "s3://{}/{}".format(bucket, users_filename)
s3_items_filename = "s3://{}/{}".format(bucket, items_filename)

print("s3_train_interaction_filename: \n", s3_its_filename)
print("s3_train_users_filename: \n", s3_users_filename)
print("s3_train_items_filename: \n", s3_items_filename)


s3_train_interaction_filename: 
 s3://item-meta-joe-0621/dataset/training_interaction.csv
s3_train_users_filename: 
 s3://item-meta-joe-0621/dataset/training_user.csv
s3_train_items_filename: 
 s3://item-meta-joe-0621/dataset/training_item.csv


In [20]:
! aws s3 ls {s3_its_filename} --recursive
! aws s3 ls {s3_users_filename} --recursive
! aws s3 ls {s3_items_filename} --recursive

2024-06-21 05:55:00   33987029 dataset/training_interaction.csv
2024-06-21 05:55:01      97565 dataset/training_user.csv
2024-06-21 05:55:01     300071 dataset/training_item.csv


## 4. Personalize : Create Dataset Group

In [21]:
import boto3
import json
import time
from datetime import datetime

# Configure the SDK to Personalize:
personalize = boto3.client('personalize')

### Granting Personalize access to S3 (bucket policy and IAM Role)

In [22]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:*",
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

{'ResponseMetadata': {'RequestId': 'DSZ4Y3EPF3QR2SFJ',
  'HostId': '5daZDhXRBVvedU0RDlF0nlDD1ft2+aCC0ScRKgo5QWschh/qwNhy6ukfyTYCiLfRYVNkeS7dusdHrvc0F2p//A==',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': '5daZDhXRBVvedU0RDlF0nlDD1ft2+aCC0ScRKgo5QWschh/qwNhy6ukfyTYCiLfRYVNkeS7dusdHrvc0F2p//A==',
   'x-amz-request-id': 'DSZ4Y3EPF3QR2SFJ',
   'date': 'Fri, 21 Jun 2024 05:55:04 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}
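
The policy above grants `s3:*` to the Personalize service principal, which is broader than a dataset import needs. As a hedged alternative, the sketch below builds a narrower policy that allows only `s3:GetObject` and `s3:ListBucket`, the read actions that Amazon Personalize needs to import data from a bucket. The function name `build_personalize_read_policy` and the bucket name are hypothetical, and whether read-only access is enough for your workflow (jobs that write output to the bucket would also need `s3:PutObject`) is an assumption to verify.

```python
import json


def build_personalize_read_policy(bucket_name):
    """Return a bucket policy dict that lets Personalize read from one bucket.

    Hypothetical helper: it narrows the "s3:*" grant to read-only actions.
    """
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),
                    "arn:aws:s3:::{}/*".format(bucket_name),
                ],
            }
        ],
    }


# "my-example-bucket" is a placeholder, not a real bucket.
policy_json = json.dumps(build_personalize_read_policy("my-example-bucket"), indent=2)
print(policy_json)
```

The resulting string can be passed as the `Policy` argument of `s3.put_bucket_policy` in place of `json.dumps(policy)`.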

In [23]:
suffix = str(np.random.uniform())[4:9]
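
The cell above slices characters out of a random float to make resource names unique. A possible alternative, shown only as a sketch, draws the suffix from `uuid4`, which always yields the requested number of hexadecimal characters regardless of how the float happens to be formatted. `make_suffix` is a hypothetical name.

```python
import uuid


def make_suffix(length=5):
    """Return a random lowercase hexadecimal suffix of the requested length."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32")
    return uuid.uuid4().hex[:length]


example_suffix = make_suffix()
print("PersonalizeRoleDemo" + example_suffix)
```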

In [24]:
iam = boto3.client("iam")

# Create assume_role_policy to create a role that Personalize will use
role_name = "PersonalizeRoleDemo" + suffix
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

# Create a role to be used by Personalize
create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# Add AmazonPersonalizeFullAccess permission to the role created above
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

# Add AmazonS3FullAccess permission to the role created above
iam.attach_role_policy(
    RoleName=role_name,    
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)
time.sleep(15) # wait for 15 seconds to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)

arn:aws:iam::250476343008:role/PersonalizeRoleDemo11201


### Create Dataset Group

In [25]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "RetailDemo-dataset-group" + suffix
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
dataset_group_arn

'arn:aws:personalize:ap-northeast-2:250476343008:dataset-group/RetailDemo-dataset-group11201'

#### Waiting for Dataset Group to become <b>Active</b>
A Dataset Group usually becomes active within 30 seconds.

In [26]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)

DatasetGroup: CREATE PENDING
DatasetGroup: ACTIVE
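
The same poll-until-terminal-status loop recurs for import jobs, the solution version, and the campaign. A minimal sketch of a reusable helper follows, assuming the two terminal statuses used in this notebook (`ACTIVE` and `CREATE FAILED`). The name `wait_for_status` and its parameters are hypothetical, and the demonstration feeds it canned statuses rather than calling Personalize.

```python
import time


def wait_for_status(get_status, label, timeout_s=3 * 60 * 60, interval_s=15,
                    sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns a terminal status or the timeout passes.

    get_status: zero-argument callable returning the current status string.
    Returns the terminal status; raises TimeoutError if none is reached in time.
    """
    terminal = ("ACTIVE", "CREATE FAILED")
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        print("{}: {}".format(label, status))
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError("{} did not reach a terminal status".format(label))
        sleep(interval_s)


# Offline demonstration with canned statuses instead of a real Personalize call.
_canned = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(_canned), "DatasetGroup",
                               sleep=lambda seconds: None)
```

Against the real service, `get_status` could be a lambda that calls `personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)` and returns `["datasetGroup"]["status"]`.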


### Create Schema

#### for INTERACTIONS

In [27]:
interaction_schema_name="RetailDemo-interaction-schema" + suffix

schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        { 
            "name": "EVENT_TYPE",
            "type": "string"
        },        
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}


create_schema_response = personalize.create_schema( 
    name = interaction_schema_name,
    schema = json.dumps(schema)
)

interaction_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:ap-northeast-2:250476343008:schema/RetailDemo-interaction-schema11201",
  "ResponseMetadata": {
    "RequestId": "a7db9778-631a-4f13-a008-306e65d671ff",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:35 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "105",
      "connection": "keep-alive",
      "x-amzn-requestid": "a7db9778-631a-4f13-a008-306e65d671ff",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
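
The schema declares `TIMESTAMP` as `long`, and Amazon Personalize expects these values to be Unix epoch time in seconds. As a quick sanity check, the sketch below converts a value to a UTC datetime and flags values that look like milliseconds. The helper names and the `10**11` threshold are illustrative assumptions, not part of the notebook.

```python
from datetime import datetime, timezone


def epoch_seconds_to_utc(ts):
    """Convert a Unix epoch timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def looks_like_milliseconds(ts, threshold=10**11):
    """Heuristic: values at or above the threshold are probably milliseconds.

    10**11 seconds lies thousands of years in the future, so genuine
    second-resolution timestamps stay below it. The threshold is an assumption.
    """
    return ts >= threshold


# 1591803788 is the first TIMESTAMP value shown in the interactions sample.
first_event = epoch_seconds_to_utc(1591803788)
print(first_event.isoformat())
print(looks_like_milliseconds(1591803788), looks_like_milliseconds(1591803788000))
```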


#### for ITEMS

In [28]:
item_schema_name="RetailDemo-item-schema" + suffix

schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
    {
        "name": "ITEM_ID",
        "type": "string"
    },
    {
        "name": "NAME",
        "type": "string"
    },
    {
      "name": "CATEGORY_L1",
      "type": [
        "string"
      ],
      "categorical": True
    },
    {
      "name": "STYLE",
      "type": [
        "string"
      ],
      "categorical": True
    },
    {
        "name": "PRODUCT_DESCRIPTION",
        "type": "string"
    },
    {
      "name": "PRICE",
      "type": "float"
    },    
    ],
    "version": "1.0"
}

create_metadata_schema_response = personalize.create_schema(      
    name = item_schema_name,
    schema = json.dumps(schema)
)

item_schema_arn = create_metadata_schema_response['schemaArn']
print(json.dumps(create_metadata_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:ap-northeast-2:250476343008:schema/RetailDemo-item-schema11201",
  "ResponseMetadata": {
    "RequestId": "f4eb12e3-acc2-4398-81ed-9beb3b5f1793",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:35 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "98",
      "connection": "keep-alive",
      "x-amzn-requestid": "f4eb12e3-acc2-4398-81ed-9beb3b5f1793",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### for USERS

In [29]:
user_schema_name="RetailDemo-user-schema" + suffix

schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
    {
        "name": "USER_ID",
        "type": "string"
    },
    {
      "name": "USER_NAME",
      "type": "string"
    },        
    {
      "name": "GENDER",
      "type": [
        "string"
      ],
      "categorical": True
    }        
    ],
    "version": "1.0"
}

create_metadata_schema_response = personalize.create_schema(      
    name = user_schema_name,
    schema = json.dumps(schema)
)

user_schema_arn = create_metadata_schema_response['schemaArn']
print(json.dumps(create_metadata_schema_response, indent=2))

{
  "schemaArn": "arn:aws:personalize:ap-northeast-2:250476343008:schema/RetailDemo-user-schema11201",
  "ResponseMetadata": {
    "RequestId": "215dfd5d-c6c7-4113-a561-cd901268cb12",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "98",
      "connection": "keep-alive",
      "x-amzn-requestid": "215dfd5d-c6c7-4113-a561-cd901268cb12",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}
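
Each schema above has to line up with the header row of the CSV file that is later imported into the matching dataset. The sketch below compares the two and reports schema fields missing from the CSV as well as CSV columns the schema does not declare. `compare_schema_to_header` is a hypothetical helper, and the CSV text is typed in by hand to mirror `users_df`; how Personalize treats undeclared columns is not covered here.

```python
import csv
import io


def compare_schema_to_header(schema, csv_text):
    """Return (missing, extra) column names for a schema dict and CSV text.

    missing: schema fields absent from the CSV header.
    extra:   CSV header columns the schema does not declare.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    schema_fields = [field["name"] for field in schema["fields"]]
    missing = [name for name in schema_fields if name not in header]
    extra = [name for name in header if name not in schema_fields]
    return missing, extra


# Trimmed copy of the USERS schema (only the parts the check needs).
users_schema = {
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "USER_NAME", "type": "string"},
        {"name": "GENDER", "type": ["string"], "categorical": True},
    ]
}
# Header and first row as they appear in users_df.
users_csv = "USER_ID,USER_NAME,AGE,GENDER\n1,user1,31,M\n"

missing, extra = compare_schema_to_header(users_schema, users_csv)
print("missing from CSV:", missing)
print("not in schema:", extra)
```

For the USERS data this reports `AGE` as a column that is written to the CSV but not declared in the schema.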


## 5. Personalize : Create Dataset

#### for INTERACTIONS

In [30]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "RetailDemo-interaction-dataset" + suffix,
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = interaction_schema_arn
)

interaction_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset/RetailDemo-dataset-group11201/INTERACTIONS",
  "ResponseMetadata": {
    "RequestId": "a4972cb1-fcf7-44d3-b3d7-71a651e0e01b",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "115",
      "connection": "keep-alive",
      "x-amzn-requestid": "a4972cb1-fcf7-44d3-b3d7-71a651e0e01b",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### for ITEMS

In [31]:
dataset_type = "ITEMS"
create_item_dataset_response = personalize.create_dataset(
    name = "RetailDemo-item-dataset" + suffix,
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = item_schema_arn,
  
)

item_dataset_arn = create_item_dataset_response['datasetArn']
print(json.dumps(create_item_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset/RetailDemo-dataset-group11201/ITEMS",
  "ResponseMetadata": {
    "RequestId": "8b5d1a6d-e190-4400-8def-87723cfcce24",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "108",
      "connection": "keep-alive",
      "x-amzn-requestid": "8b5d1a6d-e190-4400-8def-87723cfcce24",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### for USERS

In [32]:
dataset_type = "USERS"
create_user_dataset_response = personalize.create_dataset(
    name = "RetailDemo-user-dataset" + suffix,
    datasetType = dataset_type,
    datasetGroupArn = dataset_group_arn,
    schemaArn = user_schema_arn,
  
)

user_dataset_arn = create_user_dataset_response['datasetArn']
print(json.dumps(create_user_dataset_response, indent=2))

{
  "datasetArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset/RetailDemo-dataset-group11201/USERS",
  "ResponseMetadata": {
    "RequestId": "ac861183-e4c8-45b6-b4c9-2294e3c56690",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:55:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "108",
      "connection": "keep-alive",
      "x-amzn-requestid": "ac861183-e4c8-45b6-b4c9-2294e3c56690",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### Wait up to 1 minute for Dataset creation to complete

In [33]:
time.sleep(60)

## 6. Personalize : Import Dataset 

#### INTERACTIONS Dataset - Create Import Job

In [34]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "RetailDeom-interaction-dataset-import" + suffix,
    datasetArn = interaction_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, its_filename)
    },
    roleArn = role_arn
)

interation_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset-import-job/RetailDeom-interaction-dataset-import11201",
  "ResponseMetadata": {
    "RequestId": "61ffe2b0-941d-4ba9-a919-1d794d0f0442",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:56:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "135",
      "connection": "keep-alive",
      "x-amzn-requestid": "61ffe2b0-941d-4ba9-a919-1d794d0f0442",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### ITEMS Dataset - Create Import Job

In [35]:
create_item_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "RetailDemo-item-dataset-import" + suffix,
    datasetArn = item_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, items_filename)
    },
    roleArn = role_arn
)

item_dataset_import_job_arn = create_item_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_item_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset-import-job/RetailDemo-item-dataset-import11201",
  "ResponseMetadata": {
    "RequestId": "56d6e703-32d4-4ee4-8bd7-0cd489b5168f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:56:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "128",
      "connection": "keep-alive",
      "x-amzn-requestid": "56d6e703-32d4-4ee4-8bd7-0cd489b5168f",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### USERS Dataset - Create Import Job

In [36]:
create_user_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "RetailDemo-user-dataset-import" + suffix,
    datasetArn = user_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, users_filename)
    },
    roleArn = role_arn
)

user_dataset_import_job_arn = create_user_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_user_dataset_import_job_response, indent=2))

{
  "datasetImportJobArn": "arn:aws:personalize:ap-northeast-2:250476343008:dataset-import-job/RetailDemo-user-dataset-import11201",
  "ResponseMetadata": {
    "RequestId": "2b6cc130-9851-42a0-9229-a4712180cf03",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Fri, 21 Jun 2024 05:56:36 GMT",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "128",
      "connection": "keep-alive",
      "x-amzn-requestid": "2b6cc130-9851-42a0-9229-a4712180cf03",
      "strict-transport-security": "max-age=47304000; includeSubDomains",
      "x-frame-options": "DENY",
      "cache-control": "no-cache",
      "x-content-type-options": "nosniff"
    },
    "RetryAttempts": 0
  }
}


#### All Dataset Import jobs must complete before proceeding to the next step.
#### The cells below therefore wait until all three import jobs become ACTIVE.

#### import job status of INTERACTIONS

In [37]:
%%time

status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = interation_dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)

DatasetImportJob: CREATE PENDING
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE
CPU times: user 81.3 ms, sys: 12.9 ms, total: 94.1 ms
Wall time: 3min 45s


#### import job status of ITEMS

In [38]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = item_dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)

DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: CREATE IN_PROGRESS
DatasetImportJob: ACTIVE


#### import job status of USERS

In [39]:
status = None
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = user_dataset_import_job_arn
    )
    
    dataset_import_job = describe_dataset_import_job_response["datasetImportJob"]
    if "latestDatasetImportJobRun" not in dataset_import_job:
        status = dataset_import_job["status"]
        print("DatasetImportJob: {}".format(status))
    else:
        status = dataset_import_job["latestDatasetImportJobRun"]["status"]
        print("LatestDatasetImportJobRun: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(15)

DatasetImportJob: ACTIVE


## 7. Personalize : Create Solution

### Create Solution with <b>"AWS-USER-PERSONALIZATION"</b> recipe

In [40]:
# Define the solution details
solution_name = "RetailDemo-user-personalization"
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"
perform_hpo = False # set to True if you want to perform hyperparameter optimization

# Create the solution
create_solution_response = personalize.create_solution(
    name=solution_name,
    recipeArn=recipe_arn,
    performHPO=perform_hpo,
    datasetGroupArn = dataset_group_arn,
    solutionConfig = {
        "algorithmHyperParameters": {
            "bptt": "32",
            "hidden_dimension": "149",
            "recency_mask": "true"
        },
        "featureTransformationParameters": {
            "max_user_history_length_percentile": "0.99",
            "min_user_history_length_percentile": "0.00"
        }
    }
)

# Get the solution ARN
solution_arn = create_solution_response['solutionArn']
print(f'Solution ARN: {solution_arn}')

Solution ARN: arn:aws:personalize:ap-northeast-2:250476343008:solution/RetailDemo-user-personalization
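
In the request above every hyperparameter value is a string, which matches the string-to-string maps that `solutionConfig` uses for `algorithmHyperParameters` and `featureTransformationParameters`. If you keep these settings as native Python values, a small converter such as the hypothetical `to_string_map` below can produce that form. It is only a sketch; how floats should be formatted in particular is an assumption.

```python
def to_string_map(params):
    """Convert a dict of Python values to the string-valued dict used in solutionConfig.

    Booleans become "true"/"false"; every other value goes through str().
    """
    converted = {}
    for key, value in params.items():
        if isinstance(value, bool):
            converted[key] = "true" if value else "false"
        else:
            converted[key] = str(value)
    return converted


algorithm_hyper_parameters = to_string_map(
    {"bptt": 32, "hidden_dimension": 149, "recency_mask": True}
)
print(algorithm_hyper_parameters)
```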


### Create Solution Version

In [41]:
# Create the solution version
create_solution_version_response = personalize.create_solution_version(
    solutionArn=solution_arn
)

# Get the solution version ARN
solution_version_arn = create_solution_version_response['solutionVersionArn']
print(f'Solution version ARN: {solution_version_arn}')

Solution version ARN: arn:aws:personalize:ap-northeast-2:250476343008:solution/RetailDemo-user-personalization/5fd17b58


#### Wait until Solution Version is in ACTIVE state
It takes about 20-30 minutes.


In [42]:
%%time

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:

    # Check the status of the solution version
    describe_solution_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )  
    status_solution = describe_solution_response['solutionVersion']["status"]
    print("status_user-personalization : {}".format(status_solution))
    
        
    if (status_solution == "ACTIVE" or status_solution == "CREATE FAILED") :
        break
    print("-------------------------------------->")
    time.sleep(30)

print("Generating solution version is completed")

status_user-personalization : CREATE PENDING
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalization : CREATE IN_PROGRESS
-------------------------------------->
status_user-personalizati

## 8. Personalize : Create Campaign

In [43]:
create_campaign_response = personalize.create_campaign(
    name = 'RetailDemo-campaign' + suffix,
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS=1
)

campaign_arn = create_campaign_response['campaignArn']


#### Wait for Campaign creation to complete
It takes roughly 7 to 10 minutes; the run recorded below took 10 minutes.

In [44]:
%%time

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:

    # Check the status of the campaign
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )  
    status_campaign = describe_campaign_response['campaign']["status"]
    print("status_creating_campaign : {}".format(status_campaign))
    
        
    if (status_campaign == "ACTIVE" or status_campaign == "CREATE FAILED") :
        break
    print("-------------------------------------->")
    time.sleep(60)

print("Creating Campaign is completed")

status_creating_campaign : CREATE PENDING
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : CREATE IN_PROGRESS
-------------------------------------->
status_creating_campaign : ACTIVE
Creating Campaign is completed
CPU times: user 182 ms, sys: 16.4 ms, total: 199 ms
Wall time: 10min


#### Save variables
Save the variables needed for clean-up.

In [45]:
%store dataset_group_arn
%store interaction_schema_arn
%store item_schema_arn
%store user_schema_arn
%store interaction_dataset_arn
%store item_dataset_arn
%store user_dataset_arn
%store solution_arn
%store campaign_arn


Stored 'dataset_group_arn' (str)
Stored 'interaction_schema_arn' (str)
Stored 'item_schema_arn' (str)
Stored 'user_schema_arn' (str)
Stored 'interaction_dataset_arn' (str)
Stored 'item_dataset_arn' (str)
Stored 'user_dataset_arn' (str)
Stored 'solution_arn' (str)
Stored 'campaign_arn' (str)
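
`%store` keeps the variables in IPython's own storage, so a clean-up notebook on the same instance can restore them with `%store -r`. If the clean-up runs outside IPython, one alternative (an assumption, not part of this notebook) is to write the ARNs to a JSON file. The file name `personalize_arns.json`, both helper names, and the placeholder values are hypothetical.

```python
import json
import os
import tempfile


def save_arns(arns, path):
    """Write a dict of resource ARNs to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(arns, handle, indent=2)


def load_arns(path):
    """Read the dict of resource ARNs back from the JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with placeholder values written to a temporary directory.
demo_arns = {
    "dataset_group_arn": "arn:placeholder:dataset-group",
    "campaign_arn": "arn:placeholder:campaign",
}
demo_path = os.path.join(tempfile.mkdtemp(), "personalize_arns.json")
save_arns(demo_arns, demo_path)
restored_arns = load_arns(demo_path)
print(restored_arns)
```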


# You can make an inference request with the Personalize Campaign ARN below.
The Lambda function uses the Campaign ARN printed below to call the campaign.

In [46]:
print("Personalize Campaign ARN : ", campaign_arn)

Personalize Campaign ARN :  arn:aws:personalize:ap-northeast-2:250476343008:campaign/RetailDemo-campaign11201
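
To illustrate the inference request, the sketch below wraps the `get_recommendations` call of the `personalize-runtime` client and returns only the item IDs. The wrapper name `recommend_item_ids`, the user ID, and the stand-in client used for the offline demonstration are assumptions; in a Lambda function you would pass `boto3.client("personalize-runtime")` instead.

```python
def recommend_item_ids(runtime_client, campaign_arn, user_id, num_results=5):
    """Return recommended item IDs for one user from a Personalize campaign.

    runtime_client: an object exposing get_recommendations, for example
    boto3.client("personalize-runtime").
    """
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]


class FakeRuntimeClient:
    """Stand-in for the real client so the example runs without AWS access."""

    def get_recommendations(self, campaignArn, userId, numResults):
        items = [{"itemId": "item-{}".format(i)} for i in range(numResults)]
        return {"itemList": items}


demo_ids = recommend_item_ids(FakeRuntimeClient(), "arn:placeholder:campaign", 3156, 3)
print(demo_ids)

# With real credentials (not executed here):
# import boto3
# runtime = boto3.client("personalize-runtime")
# print(recommend_item_ids(runtime, campaign_arn, user_id="3156"))
```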
