# Amazon Personalize

## How to Use the Notebook

Code is broken up into cells like the one below. There's a triangular `Run` button at the top of this page that you can click to execute a cell and move on to the next one, or you can press `Shift` + `Enter` while in the cell to do the same.

While a cell is executing, the marker beside it shows an `*`. Once all the code in the cell has finished executing, the `*` is replaced by a number that indicates the order in which the cell completed.


Simply follow the instructions below and execute the cells to get started with Amazon Personalize.

## 0. Imports 

Python ships with a broad collection of libraries. We need to import those as well as the installed ones that help us, such as [boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python) and [Pandas](https://pandas.pydata.org/)/[NumPy](https://numpy.org/), which are core data science tools.

In [None]:
# Imports
import boto3
import sagemaker
import json
import numpy as np
import pandas as pd
import time
import random
# !conda install -y -c conda-forge unzip

Next, validate that your environment can communicate successfully with Amazon Personalize. The lines below do just that.

## 1. Define Boto3, Policy and Role

In [None]:
date = '20221023'

In [None]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
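
Creating a client does not contact the service, so a cheap read-only call is one way to confirm connectivity. The sketch below is an illustration rather than part of the original notebook: `can_reach_personalize` is a hypothetical helper name, and it assumes that a successful `list_recipes` call is a good enough signal that credentials, region, and permissions are in place.

```python
def can_reach_personalize(client):
    """Return True if a cheap read-only Personalize call succeeds."""
    try:
        # list_recipes is read-only; maxResults=1 keeps the response small
        client.list_recipes(maxResults=1)
        return True
    except Exception as error:  # e.g. missing credentials, wrong region, denied access
        print(f"Could not reach Amazon Personalize: {error}")
        return False

# Usage in this notebook (needs AWS credentials):
# can_reach_personalize(personalize)
```

The client is passed in as a parameter so the helper can be exercised with a stub object when no AWS account is available.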

In [None]:
bucket = sagemaker.Session().default_bucket()

### 1-1. Attach Policy to S3 Bucket

Amazon Personalize needs to be able to read the contents of the S3 bucket that you created earlier. The lines below grant that access.

In [None]:
s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket),
                "arn:aws:s3:::{}/*".format(bucket)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

### 1-2. Create Personalize Role

Amazon Personalize also needs the ability to assume roles in AWS so that it has the permissions to execute certain tasks. The lines below grant that.

In [None]:
iam = boto3.client("iam")

role_name = f"PersonalizeRoleDemo-{date}"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

create_role_response = iam.create_role(
    RoleName = role_name,
    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)

# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize" 
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = policy_arn
)

# Now add S3 support
iam.attach_role_policy(
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

role_arn = create_role_response["Role"]["Arn"]
print(role_arn)
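
The comment in the cell above suggests attaching a narrower policy when the broad managed policies are not wanted. A minimal sketch of such a policy document follows. The function name `build_bucket_read_policy`, the bucket name `my-example-bucket`, and the policy name `PersonalizeBucketReadOnly` are all illustrative assumptions, and the sketch assumes that read access (`s3:GetObject`, `s3:ListBucket`) is all the import jobs in this notebook need.

```python
import json

def build_bucket_read_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }

policy_document = json.dumps(build_bucket_read_policy("my-example-bucket"))

# Hypothetical usage in this notebook (needs AWS credentials):
# iam.put_role_policy(
#     RoleName=role_name,
#     PolicyName="PersonalizeBucketReadOnly",  # illustrative name
#     PolicyDocument=policy_document,
# )
```

An inline policy like this could stand in for the `AmazonS3FullAccess` attachment if the role only ever reads from a single bucket.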

## 2. Create Sample Dataset

### 2-1. Interactions Dataset

In [None]:
from datetime import datetime, timedelta
import time

TIMESTAMP = time.mktime((datetime.now()-timedelta(days=100)).timetuple())
TIMESTAMP = int(TIMESTAMP)
TIMESTAMP
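
The `TIMESTAMP` column holds Unix epoch time in whole seconds. As an alternative sketch (not the notebook's own method), the same value can be computed from a timezone-aware `datetime`, which avoids going through a local-time tuple. The helper name `epoch_seconds_days_ago` and its `now` parameter are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def epoch_seconds_days_ago(days, now=None):
    """Return the Unix timestamp (whole seconds) for `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp())

start = epoch_seconds_days_ago(100)
print(start)
```

Passing a fixed `now` makes the result reproducible, which can help when the sample data needs to be regenerated identically.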

In [None]:
user_id = [i+1 for i in range(100)]
item_id = [i+1000 for i in range(10000)]
item_type = ['AAA','BBB','CCC']
event_val = [i+1 for i in range(10)]
event_type = ['purchased','checked']
print(f"USER_ID : {user_id[:10]} \nITEM_ID : {item_id[:10]} \nITEM_TYPE : {item_type} \nEVENT_VALUE : {event_val} \nEVENT_TYPE : {event_type}")

In [None]:
item_mapping={}
for item in item_id:
    item_mapping[item] = random.choice(item_type)

In [None]:
interaction_data = []
for i in range(1000):
    event_type_tmp = random.choice(event_type)
    if event_type_tmp == 'checked':
        event_val_tmp = random.choice(event_val)
    else:
        event_val_tmp = None
    select_item_id = random.choice(item_id)
    select_item_type = item_mapping[select_item_id]
    interaction_data.append([random.choice(user_id),select_item_id, 
                 TIMESTAMP+random.choice(range(10000,4320108)), select_item_type,
                 event_val_tmp, event_type_tmp])

In [None]:
interaction_pd_data=pd.DataFrame(interaction_data, columns=['USER_ID', 'ITEM_ID', 'TIMESTAMP', 'ITEM_TYPE', 'EVENT_VALUE', 'EVENT_TYPE'])
interaction_pd_data.head()

In [None]:
sample_interaction_filename = f"interaction_sample_{date}.csv"
interaction_pd_data.to_csv(sample_interaction_filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(sample_interaction_filename).upload_file(sample_interaction_filename)

### 2-2. Users dataset

In [None]:
user_meta_1 = []

for i in range(10):
    user_meta_1.append(f'USER_GROUP_{i}')

user_meta_2 = [i for i in range(100,105)]

In [None]:
user_data = []
for i in user_id:
    user_data.append([i,random.choice(user_meta_1),random.choice(user_meta_2)])

In [None]:
user_pd_data=pd.DataFrame(user_data, columns=['USER_ID', 'USER_META1', 'USER_META2'])
user_pd_data.head()

In [None]:
sample_user_filename = f"users_sample_{date}.csv"
user_pd_data.to_csv(sample_user_filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(sample_user_filename).upload_file(sample_user_filename)

### 2-3. Items dataset

In [None]:
# Adjacent string literals inside parentheses are joined into one string,
# so the sample text can span several lines without a syntax error.
text = (
    "food is any substance consumed to provide nutritional support for an organism. Food is usually of plant, animal, or fungal origin, and contains essential nutrients, such as carbohydrates, fats, proteins, vitamins, or minerals. The substance is ingested by an organism and assimilated by the organism's cells to provide energy, maintain life, or stimulate growth. Different species of animals have different feeding behaviours that satisfy the needs of their unique metabolisms, often evolved to fill a specific ecological niche within specific geographical contexts. Omnivorous humans are highly adaptable and have adapted to obtain food in many different ecosystems. Historically, humans secured food through two main methods: hunting and gathering and agriculture. As agricultural technologies increased, humans settled into agriculture lifestyles with diets shaped by the agriculture opportunities in their geography. Geographic and cultural differences has led to creation of numerous cuisines and culinary arts, including a wide array of ingredients, herbs, spices, techniques, and dishes. As cultures have mixed through forces like international trade and globalization, ingredients have become more widely available beyond their geographic and cultural origins, creating a cosmopolitan exchange of different food traditions and practices. Today, the majority of the food energy required by the ever-increasing population of the world is supplied by the industrial food industry, which produces food with intensive agriculture and distributes it through complex food processing and food distribution systems. This system of conventional agriculture relies heavily on fossil fuels, which means that the food and agricultural system is one of the major contributors to climate change, accountable for as much as 37% of total greenhouse gas emissions. "
    "Addressing the carbon intensity of the food system and food waste are important mitigation measures in the global response to climate change. The food system has significant impacts on a wide range of other social and political issues including: sustainability, biological diversity, economics, population growth, water supply, and access to food. The right to food is a human right derived from the International Covenant on Economic, Social and Cultural Rights, recognizing the right to an adequate standard of living, including adequate food, as well as the fundamental right to be free from hunger. Because of these fundamental rights, food security is often a priority international policy activity; for example Sustainable Development Goal 2 Zero hunger is meant to eliminate hunger by 2030. Food safety and food security are monitored by international agencies like the International Association for Food Protection, World Resources Institute, World Food Programme, Food and Agriculture Organization, and International Food Information Council, and are often subject to national regulation by institutions, like the Food and Drug Administration in the United States. Food is any substance consumed to provide nutritional support and energy to an organism. It can be raw, processed or formulated and is consumed orally by animals for growth, health or pleasure. Food is mainly composed of water, lipids, proteins and carbohydrates. Minerals and organic substances vitamins can also be found in food. Plants, algae and some microorganisms use photosynthesis to make their own food molecules. Water is found in many foods and has been defined as a food by itself. Water and fiber have low energy densities, or calories, while fat is the most energy dense component. Some inorganic (non-food) elements are also essential for plant and animal functioning. Human food can be classified in various ways, either by related content or by how the food is processed. "
    "The number and composition of food groups can vary. Most systems include four basic groups that describe their origin and relative nutritional function: Vegetables and Fruit, Cereals and Bread, Dairy, and Meat. Studies that look into diet quality often group food into whole grains, cereals, refined grains/cereals, vegetables, fruits, nuts, legumes, eggs, dairy products, fish, red meat, processed meat, and sugar-sweetened beverages. The Food and Agriculture Organization and World Health Organization use a system with nineteen food classifications: cereals, roots, pulses and nuts, milk, eggs, fish and shellfish, meat, insects, vegetables, fruits, fats and oils, sweets and sugars, spices and condiments, beverages, foods for nutritional uses, food additives, composite dishes and savoury snacks.Plants as a food source are often divided into seeds, fruits, vegetables, legumes, grains and nuts. Where plants fall within these categories can vary with botanically described fruits such as the tomato, squash, pepper and eggplant or seeds like peas commonly considered vegetables. Food is a fruit if the part eaten is derived from the reproductive tissue, so seeds, nuts and grains are technically fruit. From a culinary perspective fruits are generally considered the remains of botanically described fruits after grains, nuts, seeds and fruits used as vegetables are removed. Grains can be defined as seeds that humans eat or harvest, with cereal grains oats, wheat, rice, corn, barley, rye, sorghum and millet belonging to the Poaceae grass family and pulses coming from the Fabaceae legume family. Whole grains are foods that contain all the elements of the original seed bran, germ, and endosperm. Nuts are dry fruits distinguishable by their woody shell."
)

In [None]:
item_meta_1 = []

for i in range(5):
    item_meta_1.append(f'ITEM_GROUP_{i+1}')

item_meta_2 = []

for i in range(5):
    item_meta_2.append(f'ITEM_GROUP_10000{i+1}')

candidate_desc = text.split(' ')

In [None]:
item_data = []
for i in item_id:
    description = []
    desc_text = ''
    for word in random.sample(candidate_desc, random.choice(range(15,50))):
        desc_text += word + ' '
#     description.append(desc_text)
    select_item_meta_1 = random.choice(item_meta_1)
    select_item_meta_2 = random.choice(item_meta_2)
    
    item_data.append([i,select_item_meta_1,select_item_meta_2,
                      TIMESTAMP+random.choice(range(10000,4320108)), 
                      desc_text])

In [None]:
item_pd_data=pd.DataFrame(item_data, columns=['ITEM_ID','ITEM_META1','ITEM_META2','CREATION_TIMESTAMP','DESCRIPTION'])
item_pd_data.head()

In [None]:
sample_item_filename = f"items_sample_{date}.csv"
item_pd_data.to_csv(sample_item_filename, index=False)
boto3.Session().resource('s3').Bucket(bucket).Object(sample_item_filename).upload_file(sample_item_filename)

## 3. Create Schema

A core component of how Personalize understands your data comes from the Schema that is defined below. This configuration tells the service how to digest the data provided via your CSV file. Note the columns and types align to what was in the file you created above.
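
Because a schema and its CSV file have to agree on column names, a quick check before creating the schema can catch mismatches early. The sketch below is a hypothetical helper, not part of the Personalize API: `compare_schema_to_columns` only compares names and does not validate types.

```python
def compare_schema_to_columns(schema, columns):
    """Return (fields missing from the CSV, CSV columns missing from the schema)."""
    field_names = [field["name"] for field in schema["fields"]]
    missing_in_csv = [name for name in field_names if name not in columns]
    missing_in_schema = [name for name in columns if name not in field_names]
    return missing_in_csv, missing_in_schema

example_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "USER_META1", "type": "string", "categorical": True},
        {"name": "USER_META2", "type": "int"},
    ],
    "version": "1.0",
}
example_columns = ["USER_ID", "USER_META1"]

print(compare_schema_to_columns(example_schema, example_columns))
# → (['USER_META2'], [])
```

In the notebook, the second argument could be `list(user_pd_data.columns)` or the column list of any of the other sample DataFrames.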

### 3-1. Interactions datasets schema

In [None]:
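interactions_schema_name = f'interactions-samples-{date}'
# Build the schema ARN from the current account and region instead of hard-coding them
account_id = boto3.client('sts').get_caller_identity()['Account']
region_name = boto3.Session().region_name
try:
    personalize.delete_schema(schemaArn=f'arn:aws:personalize:{region_name}:{account_id}:schema/{interactions_schema_name}')
except Exception as e:
    print(f"The schema was not deleted (it may not exist): {e}")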

In [None]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "ITEM_TYPE",
            "type": "string",
            "categorical": True
        },
        {
            "name": "EVENT_VALUE",
            "type": [
             "float",
             "null"
          ]
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        }
    ],
    "version": "1.0"
}

interactions_create_schema_response = personalize.create_schema(
    name = interactions_schema_name,
    schema = json.dumps(interactions_schema)
)

interactions_schema_arn = interactions_create_schema_response['schemaArn']
print(json.dumps(interactions_create_schema_response, indent=2))

### 3-2. User datasets schema

In [None]:
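users_schema_name = f'user-samples-{date}'
# Build the schema ARN from the current account and region instead of hard-coding them
account_id = boto3.client('sts').get_caller_identity()['Account']
region_name = boto3.Session().region_name
try:
    personalize.delete_schema(schemaArn=f'arn:aws:personalize:{region_name}:{account_id}:schema/{users_schema_name}')
except Exception as e:
    print(f"The schema was not deleted (it may not exist): {e}")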


In [None]:
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "USER_META1",
            "type": "string",
            "categorical": True
        },
        {
            "name": "USER_META2",
            "type": "int"
        }
    ],
    "version": "1.0"
}

users_create_schema_response = personalize.create_schema(
    name = users_schema_name,
    schema = json.dumps(users_schema)
)

users_schema_arn = users_create_schema_response['schemaArn']
print(json.dumps(users_create_schema_response, indent=2))

### 3-3. Items datasets schema

In [None]:
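items_schema_name = f'items-samples-{date}'
# Build the schema ARN from the current account and region instead of hard-coding them
account_id = boto3.client('sts').get_caller_identity()['Account']
region_name = boto3.Session().region_name
try:
    personalize.delete_schema(schemaArn=f'arn:aws:personalize:{region_name}:{account_id}:schema/{items_schema_name}')
except Exception as e:
    print(f"The schema was not deleted (it may not exist): {e}")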


In [None]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "ITEM_META1",
            "type": [
                "null",
                "string"
              ],
              "categorical": True
        },
        {
            "name": "ITEM_META2",
            "type": [
                "null",
                "string"
              ],
              "categorical": True
        },
        {
          "name": "CREATION_TIMESTAMP",
          "type": "long"
        },
        {
          "name": "DESCRIPTION",
          "type": [
            "null",
            "string"
          ],
          "textual": True
        },
    ],
    "version": "1.0"
}

items_create_schema_response = personalize.create_schema(
    name = items_schema_name,
    schema = json.dumps(items_schema)
)

items_schema_arn = items_create_schema_response['schemaArn']
print(json.dumps(items_create_schema_response, indent=2))

## 4. Create Dataset Group

The largest grouping in Personalize is a Dataset Group. It isolates your data, event trackers, solutions, and campaigns, grouping together things that share a common collection of data. Feel free to alter the name below if you'd like.

### 4-1. Dataset group details

In [None]:
create_dataset_group_response = personalize.create_dataset_group(
    name = f"dataset-samples-{date}"
)

dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

##### Wait for Dataset Group to Have ACTIVE Status

Before we can use the Dataset Group in any of the items below, it must be active. Execute the cell below and wait for it to show ACTIVE.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(10)
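
The same polling pattern recurs later for import jobs, solution versions, and campaigns. One way to factor it out is sketched below. `wait_for_status` is a hypothetical helper, and the `get_status` callable and injectable `sleep` parameter are assumptions made so the loop can be tried without calling AWS.

```python
import time

def wait_for_status(get_status, label, timeout_s=3 * 60 * 60, poll_s=10, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE or CREATE FAILED, or time runs out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status()
        print(f"{label}: {status}")
        if status in ("ACTIVE", "CREATE FAILED"):
            return status
        sleep(poll_s)
    raise TimeoutError(f"{label} did not finish within {timeout_s} seconds")

# Usage in this notebook (needs AWS credentials):
# wait_for_status(
#     lambda: personalize.describe_dataset_group(
#         datasetGroupArn=dataset_group_arn
#     )["datasetGroup"]["status"],
#     label="DatasetGroup",
# )
```

Unlike the inline loop, this version raises `TimeoutError` when the deadline passes, so a stalled resource is not mistaken for a finished one.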

### 4-2. Create Datasets

After the group, the next thing to create is the actual datasets. Execute the cells below to create them.

#### 4-2-1. Interactions Datasets

In [None]:
interactions_dataset_type = "INTERACTIONS"
try:
    interactions_create_dataset_response = personalize.create_dataset(
        name = f"interactions-samples-dataset-{date}",
        datasetType = interactions_dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = interactions_schema_arn
    )

    interactions_dataset_arn = interactions_create_dataset_response['datasetArn']
    print(json.dumps(interactions_create_dataset_response, indent=2))
except Exception as e:
    print(e)

#### 4-2-2. Users Datasets

In [None]:
users_dataset_type = "USERS"
try:
    users_create_dataset_response = personalize.create_dataset(
        name = f"users-samples-dataset-{date}",
        datasetType = users_dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = users_schema_arn
    )

    users_dataset_arn = users_create_dataset_response['datasetArn']
    print(json.dumps(users_create_dataset_response, indent=2))
except Exception as e:
    print(e)

#### 4-2-3. Items Datasets

In [None]:
items_dataset_type = "ITEMS"
try:
    items_create_dataset_response = personalize.create_dataset(
        name = f"items-samples-dataset-{date}",
        datasetType = items_dataset_type,
        datasetGroupArn = dataset_group_arn,
        schemaArn = items_schema_arn
    )

    items_dataset_arn = items_create_dataset_response['datasetArn']
    print(json.dumps(items_create_dataset_response, indent=2))
except Exception as e:
    print(e)

### 4-3. Dataset Import Jobs

#### 4-3-1. Create dataset import job for Interactions

In [None]:
time.sleep(10)
try:
    interactions_create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = f"import-sample-interactions-{date}",
        datasetArn = interactions_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket, sample_interaction_filename)
        },
        roleArn = role_arn
    )

    interactions_dataset_import_job_arn = interactions_create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(interactions_create_dataset_import_job_response, indent=2))
except Exception as e:
    print(e)

#### 4-3-2. Create dataset import job for Users

In [None]:
time.sleep(10)
try:
    users_create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = f"import-sample-users-{date}",
        datasetArn = users_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket, sample_user_filename)
        },
        roleArn = role_arn
    )

    users_dataset_import_job_arn = users_create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(users_dataset_import_job_arn, indent=2))
except Exception as e:
    print(e)

#### 4-3-3. Create dataset import job for Items

In [None]:
time.sleep(10)
try:
    items_create_dataset_import_job_response = personalize.create_dataset_import_job(
        jobName = f"import-sample-items-{date}",
        datasetArn = items_dataset_arn,
        dataSource = {
            "dataLocation": "s3://{}/{}".format(bucket, sample_item_filename)
        },
        roleArn = role_arn
    )

    items_dataset_import_job_arn = items_create_dataset_import_job_response['datasetImportJobArn']
    print(json.dumps(items_dataset_import_job_arn, indent=2))
except Exception as e:
    print(e)

##### Wait for Dataset Import Job to Have ACTIVE Status

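It can take a while for the import jobs to complete. Please wait until all three show ACTIVE below.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
# Track all three import jobs, not just the Items job
pending_jobs = {
    "interactions": interactions_dataset_import_job_arn,
    "users": users_dataset_import_job_arn,
    "items": items_dataset_import_job_arn,
}
while pending_jobs and time.time() < max_time:
    for job_name, job_arn in list(pending_jobs.items()):
        describe_dataset_import_job_response = personalize.describe_dataset_import_job(
            datasetImportJobArn = job_arn
        )
        status = describe_dataset_import_job_response["datasetImportJob"]['status']
        print("DatasetImportJob ({}): {}".format(job_name, status))

        # Stop polling a job once it has succeeded or failed
        if status == "ACTIVE" or status == "CREATE FAILED":
            del pending_jobs[job_name]

    if pending_jobs:
        time.sleep(10)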

## 5. Use custom resources

In Amazon Personalize, a trained model is called a Solution. Each Solution can have many versions, each of which relates to a given volume of data when the model was trained.

To begin, we will list all the recipes that are supported. A recipe is an algorithm that has not yet been trained on your data. After listing them, you'll select one and use it to build your model.

### 5-1. Select Recipe

In [None]:
list_recipes_response = personalize.list_recipes()
list_recipes = [list_recipes_response]
while True:
    if list_recipes_response.get('nextToken'):
        list_recipes_response = personalize.list_recipes(nextToken=list_recipes_response['nextToken'])
        list_recipes.append(list_recipes_response)
    else:
        break

num = 1
for response in list_recipes:
    for i, recipes_name in enumerate(response['recipes']):
        name = recipes_name['name']
        recipeArn = recipes_name['recipeArn']

        if recipes_name.get('domain'):
            domain = recipes_name['domain']
            print(f"{num}.{name}, {recipeArn}, {domain}")
        else:
            print(f"{num}.{name}, {recipeArn}")
        num += 1
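
The `nextToken` loop above can also be written as a generator that hides the paging. This is a sketch under the assumption that each response is a dict with a `recipes` list and an optional `nextToken`, as the cell above already relies on. `iter_recipes` is a hypothetical helper name.

```python
def iter_recipes(list_recipes):
    """Yield every recipe summary, following nextToken until it is absent."""
    kwargs = {}
    while True:
        response = list_recipes(**kwargs)
        yield from response.get("recipes", [])
        next_token = response.get("nextToken")
        if not next_token:
            break
        kwargs = {"nextToken": next_token}

# Usage in this notebook (needs AWS credentials):
# for number, recipe in enumerate(iter_recipes(personalize.list_recipes), start=1):
#     print(f"{number}.{recipe['name']}, {recipe['recipeArn']}")
```

Taking the listing function as an argument keeps the generator independent of boto3, so it can be tried with a stand-in function that returns canned pages.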

### 5-2. Recipe - User Personalization
The [User-Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html) (aws-user-personalization) recipe is optimized for all USER_PERSONALIZATION recommendation scenarios. When recommending items, it uses automatic item exploration.

With automatic exploration, Amazon Personalize automatically tests different item recommendations, learns from how users interact with these recommended items, and boosts recommendations for items that drive better engagement and conversion. This improves item discovery and engagement when you have a fast-changing catalog, or when new items, such as news articles or promotions, are more relevant to users when fresh.

You can balance how much to explore (where items with less interactions data or relevance are recommended more frequently) against how much to exploit (where recommendations are based on what we know or relevance). Amazon Personalize automatically adjusts future recommendations based on implicit user feedback.

First, select the recipe by finding the ARN in the list of recipes above.

In [None]:
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization" # aws-user-personalization selected for demo purposes

### 5-3. Create Solution

First you will create the solution with the API, then you will create a version. It will take several minutes to train the model and thus create your version of a solution. Once it gets started and you are seeing the in progress notifications it is a good time to take a break, grab a coffee, etc.

In [None]:
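solution_name = f"personalize-sample-user-personalization-{date}"
try:
    create_solution_response = personalize.create_solution(
        name = solution_name,
        datasetGroupArn = dataset_group_arn,
        recipeArn = recipe_arn
    )

    solution_arn = create_solution_response['solutionArn']
    print(json.dumps(create_solution_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException as e:
    print(e)
    # The create call failed, so there is no response to read; look up the existing solution by name
    solutions = personalize.list_solutions(datasetGroupArn=dataset_group_arn)['solutions']
    solution_arn = next(s['solutionArn'] for s in solutions if s['name'] == solution_name)
    print(f"solution_arn : {solution_arn}")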

### 5-4. Create Solution Version

To retrain the Personalize model, change `trainingMode` to 'UPDATE' and run the steps below again.

In [None]:
trainingMode = "FULL"
# trainingMode = "UPDATE"

In [None]:
try:
    create_solution_version_response = personalize.create_solution_version(
        solutionArn = solution_arn,
        trainingMode=trainingMode 
    )

    solution_version_arn = create_solution_version_response['solutionVersionArn']
    print(json.dumps(create_solution_version_response, indent=2))
except Exception as e:
    print(e)

##### Wait for Solution Version to Have ACTIVE Status

This will take approximately 40-50 minutes.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print("SolutionVersion: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

### 5-5. Get Metrics of Solution Version

Now that your solution and version exist, you can obtain the metrics for the version to judge its performance. These metrics are not particularly good because this is a demo dataset, but with larger, more complex datasets you should see improvements.

In [None]:
try:
    get_solution_metrics_response = personalize.get_solution_metrics(
        solutionVersionArn = solution_version_arn
    )
    print(json.dumps(get_solution_metrics_response, indent=2))
except Exception as e:
    print(e)

We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.

You need to understand the following terms regarding evaluation in Personalize:

- *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.
- *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.
- *Query* refers to the internal equivalent of a GetRecommendations call.

The metrics produced by Personalize are:

- coverage: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).
- mean_reciprocal_rank_at_25: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
- normalized_discounted_cumulative_gain_at_K: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG such that NDCG is between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
- precision_at_K: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.
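
The definitions above can be made concrete with a small worked example for a single query. This is an illustrative sketch with made-up item IDs, assuming binary relevance. It is not how Amazon Personalize computes its metrics internally, and the reported mean reciprocal rank averages the per-query value over all queries.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Relevant items among the top k recommendations, divided by k."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def reciprocal_rank_at_k(recommended, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if none."""
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(recommended, relevant, k):
    """DCG with weight 1/log(1 + position), divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log(1 + position)
        for position, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    # The ideal ranking places every relevant item at the top of the list
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log(1 + position) for position in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0

recommended = ["1004", "1001", "1009", "1002", "1007"]  # made-up item IDs
relevant = {"1001", "1002"}

print(precision_at_k(recommended, relevant, 5))        # → 0.4
print(reciprocal_rank_at_k(recommended, relevant, 5))  # → 0.5
print(round(ndcg_at_k(recommended, relevant, 5), 3))
```

Because NDCG is a ratio of two sums that use the same logarithm, the choice of logarithm base does not change the result.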

## 6. Create and Wait for the Campaign

Now that you have a working solution version you will need to create a campaign to use it with your applications. A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum transactions per second (TPS) value (`minProvisionedTPS`). This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this demo, the minimum throughput threshold is set to 1. For more information, see the [pricing](https://aws.amazon.com/personalize/pricing/) page.

As mentioned above, the user-personalization recipe used for our solution supports automatic exploration of "cold" items. You can control how much exploration is performed when creating your campaign. The `itemExplorationConfig` data type supports `explorationWeight` and `explorationItemAgeCutOff` parameters. Exploration weight determines how frequently recommendations include items with less interactions data or relevance. The closer the value is to 1.0, the more exploration. At zero, no exploration occurs and recommendations are based on current data (relevance). Exploration item age cut-off determines which items are explored, based on the time since their latest interaction. Provide the maximum item age, in days since the latest interaction, to define the scope of item exploration. The larger the value, the more items are considered during exploration. For our campaign below, we'll specify an exploration weight of 0.3 and an item age cut-off of 20 days.
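
The exploration settings are passed as strings inside `campaignConfig`. A small builder can validate the values before they are sent. `build_exploration_config` is a hypothetical helper, its defaults mirror the values used in the Create Campaign cell in this notebook, and the range check reflects the 0 to 1.0 range described above.

```python
def build_exploration_config(exploration_weight=0.3, item_age_cutoff_days=20):
    """Build a campaignConfig dict, validating the exploration settings."""
    if not 0.0 <= exploration_weight <= 1.0:
        raise ValueError("exploration_weight must be between 0.0 and 1.0")
    if item_age_cutoff_days <= 0:
        raise ValueError("item_age_cutoff_days must be positive")
    return {
        "itemExplorationConfig": {
            # Both values are sent as strings, matching the campaign cell in this notebook
            "explorationWeight": str(exploration_weight),
            "explorationItemAgeCutOff": str(item_age_cutoff_days),
        }
    }

print(build_exploration_config())
# → {'itemExplorationConfig': {'explorationWeight': '0.3', 'explorationItemAgeCutOff': '20'}}
```

The returned dict could be passed as the `campaignConfig` argument of `create_campaign` or `update_campaign`.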

### 6-1. Create Campaign

In [None]:
try:
    if trainingMode == "FULL":
        create_campaign_response = personalize.create_campaign(
            name = f"personalize-sample-camp-{date}",
            solutionVersionArn = solution_version_arn,
            minProvisionedTPS = 1,
            campaignConfig = {
                "itemExplorationConfig": {
                    "explorationWeight": "0.3",   # Adjust this value to tune the exploration ratio
                    "explorationItemAgeCutOff": "20"
                }
            }
        )
    elif trainingMode == "UPDATE" and campaign_arn is not None:
        create_campaign_response = personalize.update_campaign(
            campaignArn = campaign_arn,
            solutionVersionArn = solution_version_arn,
            minProvisionedTPS = 1,
            campaignConfig = {
                "itemExplorationConfig": {
                    "explorationWeight": "0.3",   # Adjust this value to tune the exploration ratio
                    "explorationItemAgeCutOff": "20"
                }
            }
        )

    campaign_arn = create_campaign_response['campaignArn']
    print(json.dumps(create_campaign_response, indent=2))
except Exception as e:
    print(e)


##### Wait for Campaign to Have ACTIVE Status

This should take about 10 minutes.
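The same wait-until-ACTIVE loop appears again later for the filter, so it may be convenient to factor it out. The sketch below is one possible shape: `wait_for_status` is a hypothetical helper, and the `describe` argument is any zero-argument callable that returns the current status string. A stand-in iterator is used so the example runs without AWS access.

```python
import time


def wait_for_status(describe, terminal=("ACTIVE", "CREATE FAILED"),
                    interval_seconds=60, timeout_seconds=3 * 60 * 60):
    """Call describe() until it returns a terminal status or time runs out."""
    deadline = time.time() + timeout_seconds
    while True:
        status = describe()
        print("Status: {}".format(status))
        if status in terminal:
            return status
        if time.time() >= deadline:
            raise TimeoutError("last status: {}".format(status))
        time.sleep(interval_seconds)


# Stand-in for a describe call, so the sketch runs without AWS access.
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(fake_statuses), interval_seconds=0)
```

With the real client, `describe` could be `lambda: personalize.describe_campaign(campaignArn=campaign_arn)["campaign"]["status"]`.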

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

## 7. Get Sample Recommendations



In [None]:
# Getting a random user:
user_id, item_id, _ ,_ ,_ ,_ = interaction_pd_data.sample().values[0]
print("USER: {}".format(user_id))

### 7-1. Call GetRecommendations

Using the user that you obtained above, the lines below will get recommendations for you and return the list of items that are recommended.
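The cells in this section repeat the same steps each time: read `itemList` from the response, then collect the item ID and score of every entry. A small helper can capture that pattern. In the sketch below, `recommendation_rows` is a hypothetical name and `sample_response` is a hand-written stand-in with made-up values, not real output from a campaign.

```python
def recommendation_rows(response, limit=None):
    """Turn a GetRecommendations response into [itemId, score] rows."""
    rows = [[item["itemId"], item.get("score")] for item in response["itemList"]]
    return rows if limit is None else rows[:limit]


# Hand-written stand-in with the same shape as the itemList used in this notebook.
sample_response = {
    "itemList": [
        {"itemId": "101", "score": 0.42},
        {"itemId": "205", "score": 0.17},
    ],
    "recommendationId": "example-id",
}
rows = recommendation_rows(sample_response)
print(rows)
```

The rows can then be handed to `pd.DataFrame(rows, columns=[...])` with whichever column names a given step uses.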


In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list_first = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list_first:
    title = item['itemId']
    score = item['score']

    recommendation_list.append([title, score])
    
recommendations_df = pd.DataFrame(recommendation_list, columns = ['OriginalRecs-items','OriginalRecs-score'])
recommendations_df[:10]

### 7-2. Create new filter
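The filter created in this section excludes items that the user has already checked or purchased. As a small sketch, the hypothetical helper `build_exclude_expression` assembles that kind of expression string from a list of event types. The field name `Interactions.event_type` is copied from the expression used in this notebook; the helper itself is an illustration, not an Amazon Personalize API.

```python
def build_exclude_expression(event_types, field="Interactions.event_type"):
    """Build an EXCLUDE filter expression for the given event types."""
    if not event_types:
        raise ValueError("at least one event type is required")
    quoted = ", ".join('"{}"'.format(event_type) for event_type in event_types)
    return "EXCLUDE ItemID WHERE {} IN ({})".format(field, quoted)


expression = build_exclude_expression(["checked", "purchased"])
print(expression)
```

The resulting string could be passed as the `filterExpression` argument of `create_filter`.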

In [None]:
sts_client = boto3.client("sts")
account_id = sts_client.get_caller_identity()["Account"]
region_name = boto3.Session().region_name

In [None]:
filter_name = f'sample-filter-not-in-check-and-purchased-{date}'
filter_arn = f"arn:aws:personalize:{region_name}:{account_id}:filter/{filter_name}"
try:
    personalize.delete_filter(
        filterArn=filter_arn
    )
except:
    pass

In [None]:
try:
    res_filter = personalize.create_filter(
        name=filter_name,
        datasetGroupArn=dataset_group_arn,
        filterExpression='EXCLUDE ItemID WHERE Interactions.event_type IN ("checked", "purchased")'
    )
except Exception as e:
    print(e)

##### Wait for Filter to Have ACTIVE Status

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_filter_response = personalize.describe_filter(
        filterArn = filter_arn
    )
    status = describe_filter_response["filter"]["status"]
    print("Filter: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(10)

### 7-3. Call GetRecommendations with Filter (exclude 'checked' and 'purchased')

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    filterArn=res_filter['filterArn']
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = item['itemId']
    score = item['score']
    recommendation_list.append([title, score])
    

new_rec_DF = pd.DataFrame(recommendation_list, columns = ['Filtered-items','Filtered-score'])
try:
    recommendations_df.drop(['Filtered-items','Filtered-score'], axis=1, inplace=True)
except:
    pass

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df[:10]

In [None]:
filter_items_list = []
for inter_data in interaction_data:
    if inter_data[0] == user_id:
        filter_items_list.append(inter_data[1])
print(f"filter_items_list : {filter_items_list}")
for rec_list in recommendations_df['Filtered-items']:
    if rec_list in filter_items_list:
        print(f"rec_list : {rec_list}")

### 7-4. Creating an Event Tracker

Before your recommendation system can respond to real-time events, you need an event tracker. The code below creates one that you can use for the rest of this lab. Feel free to give it a more descriptive name.

In [None]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time
import uuid

In [None]:
# Setup and Config
# Recommendations from Event data
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's Event Streaming
personalize_events = boto3.client(service_name='personalize-events')

In [None]:
try:
    response = personalize.create_event_tracker(
        name=f'SampleTracker-{date}',
        datasetGroupArn=dataset_group_arn
    )
    event_tracker_arn = response['eventTrackerArn']
    TRACKING_ID = response['trackingId']
except Exception as e:
    # Creation failed (most likely because a tracker already exists), so look
    # up the existing tracker for this dataset group instead of relying on a
    # variable that may not be defined yet.
    print(e)
    trackers = personalize.list_event_trackers(datasetGroupArn=dataset_group_arn)['eventTrackers']
    event_tracker_arn = trackers[0]['eventTrackerArn']
    response = personalize.describe_event_tracker(eventTrackerArn=event_tracker_arn)
    TRACKING_ID = response['eventTracker']['trackingId']

print(f"event_tracker_arn : {event_tracker_arn} ,\nTRACKING_ID : {TRACKING_ID}")

#### 7-4-1. Put-Events with Filter
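Each call to `put_events` in this notebook sends a list of event dictionaries with a timestamp, an event type, and a JSON-encoded `properties` string, optionally with a `recommendationId` or an `impression` list. The sketch below builds one such dictionary. `build_event` is a hypothetical helper, and the property names `itemId` and `itemtype` simply mirror the cells in this notebook.

```python
import json
import time


def build_event(item_id, event_type, item_type, recommendation_id=None,
                impression=None, sent_at=None):
    """Build one entry for the eventList argument of put_events."""
    event = {
        "sentAt": int(time.time()) if sent_at is None else int(sent_at),
        "eventType": event_type,
        # As in the notebook cells, the item ID travels inside the JSON properties string.
        "properties": json.dumps({"itemId": str(item_id), "itemtype": item_type}),
    }
    if recommendation_id is not None:
        event["recommendationId"] = recommendation_id
    if impression is not None:
        event["impression"] = [str(item) for item in impression]
    return event


event = build_event("101", "checked", "AAA", sent_at=1700000000)
print(event)
```

The returned dictionary could be sent with `eventList=[event]`, alongside the `trackingId`, `userId`, and `sessionId` arguments.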

In [None]:
USER_ID = str(user_id)
ITEM_ID = recommendations_df['Filtered-items'][0]
print(f"USER_ID : {USER_ID}, ITEM_ID : {ITEM_ID}")

In [None]:
session_dict = {}

# Configure Session
try:
    session_ID = session_dict[USER_ID]
except:
    session_dict[USER_ID] = str(uuid.uuid1())
    session_ID = session_dict[USER_ID]
    
print(f"session_ID : {session_ID}")
print(f"USER_ID : {USER_ID}")
print(f"ITEM_ID : {ITEM_ID}")

# Configure Properties:
event = {
    "itemId": ITEM_ID,
    "itemtype": "AAA"
}
event_json = json.dumps(event)

# Make Call
personalize_events.put_events(
    trackingId = TRACKING_ID,
    userId = USER_ID,
    sessionId = session_ID,
    eventList = [{
        'sentAt': int(time.time()),
        'recommendationId': get_recommendations_response['recommendationId'],
        'eventType': 'checked',
        'properties': event_json
    }]
)

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    filterArn=res_filter['filterArn'] ### Filters
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = item['itemId']
    score = item['score']
    recommendation_list.append([title, score])
    

new_rec_DF = pd.DataFrame(recommendation_list, columns = ['Put-event-items','Put-event'])
try:
    recommendations_df.drop(['Put-event-items','Put-event'], axis=1, inplace=True)
except:
    pass

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df[:10]

### 7-5. Call GetRecommendations with Filter and Context

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    filterArn=res_filter['filterArn'],
    context={
          'ITEM_TYPE': 'AAA'
      },
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = item['itemId']
    score = item['score']
    recommendation_list.append([title, score])
    

new_rec_DF = pd.DataFrame(recommendation_list, columns = ['Context-items','Context-score'])

try:
    recommendations_df.drop(['Context-items','Context-score'], axis=1, inplace=True)
except:
    pass

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df[:10]

### 7-6. Call GetRecommendations with Impression

In [None]:
USER_ID = str(user_id)
ITEM_ID = recommendations_df['Context-items'][0]
print(f"USER_ID : {USER_ID}, ITEM_ID : {ITEM_ID}")

In [None]:
# Configure Session
try:
    session_ID = session_dict[USER_ID]
except:
    session_dict[USER_ID] = str(uuid.uuid1())
    session_ID = session_dict[USER_ID]
    
    
impression_list = [item[0] for item in recommendation_list[:5]]

print(f"session_ID : {session_ID}")
print(f"ITEM_ID : {ITEM_ID}")
print(f"impression_list : {impression_list}")

# Configure Properties:
event = {
    "itemId": ITEM_ID,
    "itemtype": "BBB"
}
event_json = json.dumps(event)

# Make Call
personalize_events.put_events(
    trackingId = TRACKING_ID,
    userId = USER_ID,
    sessionId = session_ID,
    eventList = [{
        'sentAt': int(time.time()),
        'impression': impression_list,  # Explicit impressions
        'eventType': 'purchased',
        'properties': event_json
    }]
)

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(user_id),
    filterArn=res_filter['filterArn'],
    context={
          'ITEM_TYPE': 'AAA'
      },
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", user_id)

item_list = get_recommendations_response['itemList']

recommendation_list = []

for item in item_list:
    title = item['itemId']
    score = item['score']
    recommendation_list.append([title, score])
    

new_rec_DF = pd.DataFrame(recommendation_list, columns = ['Impression-items','Impression-score'])

try:
    recommendations_df.drop(['Impression-items','Impression-score'], axis=1, inplace=True)
except:
    pass

print(f"impression_list : {impression_list}")
recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df

### 7-7. Call GetRecommendations after doing put_items
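This section adds two batches of items, one with a creation timestamp 10 days in the past and one 30 days in the past, while the campaign above sets `explorationItemAgeCutOff` to 20. The sketch below shows the arithmetic under a simplifying assumption: item age is measured from the creation timestamp to a reference time. How Amazon Personalize computes item age internally is not covered by this notebook, so treat the function names and the rule as illustrative only.

```python
from datetime import datetime, timedelta, timezone


def timestamp_days_ago(days, now):
    """Unix timestamp (seconds) for `days` before `now`."""
    return int((now - timedelta(days=days)).timestamp())


def within_age_cutoff(creation_timestamp, cutoff_days, now):
    """True if the item is at most cutoff_days old relative to `now` (assumed rule)."""
    age_seconds = now.timestamp() - creation_timestamp
    return age_seconds <= cutoff_days * 24 * 60 * 60


# A fixed, timezone-aware reference time keeps the example reproducible.
now = datetime(2022, 10, 23, 12, 0, 0, tzinfo=timezone.utc)
ts_10 = timestamp_days_ago(10, now)
ts_30 = timestamp_days_ago(30, now)
print(within_age_cutoff(ts_10, 20, now), within_age_cutoff(ts_30, 20, now))
```

Under this assumption, only the 10-day-old items fall inside a 20-day cut-off, which is the contrast the two batches are meant to show.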

In [None]:
from datetime import datetime, timedelta  # needed for the timestamp arithmetic below

TIMESTAMP_10 = time.mktime((datetime.now()-timedelta(days=10)).timetuple())
TIMESTAMP_10 = int(TIMESTAMP_10)

TIMESTAMP_30 = time.mktime((datetime.now()-timedelta(days=30)).timetuple())
TIMESTAMP_30 = int(TIMESTAMP_30)
print(f"TIMESTAMP_10 : {TIMESTAMP_10}, TIMESTAMP_30 : {TIMESTAMP_30}")

In [None]:
new_ITEM_ID_10 = item_data[-1][0] +30000
new_ITEM_ID_30 = item_data[-1][0] +20000
print(f"new_ITEM_ID_10 : {new_ITEM_ID_10} , new_ITEM_ID_30 : {new_ITEM_ID_30}")

In [None]:
for i in range(100):
    new_ITEM_ID_30 = item_data[-1][0] + 20000 + i
    # Configure Properties:
    item_meta = {
        'ITEM_META1': 'A_ITEM_1',
        'ITEM_META2':'B_ITEM_1',
        'CREATION_TIMESTAMP':TIMESTAMP_30
    }
    item_meta_json = json.dumps(item_meta)

    personalize_events.put_items(
        datasetArn=items_dataset_arn,
        items=[
            {
                'itemId': str(new_ITEM_ID_30),
                'properties': item_meta_json
            },
        ]
    )

    new_ITEM_ID_10 = item_data[-1][0] + 30000 + i
    # Configure Properties:
    item_meta = {
        'ITEM_META1': 'A_ITEM_1',
        'ITEM_META2':'B_ITEM_1',
        'CREATION_TIMESTAMP':TIMESTAMP_10
    }
    item_meta_json = json.dumps(item_meta)

    personalize_events.put_items(
        datasetArn=items_dataset_arn,
        items=[
            {
                'itemId': str(new_ITEM_ID_10),
                'properties': item_meta_json
            },
        ]
    )

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(USER_ID),
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", USER_ID)

item_list_first = get_recommendations_response['itemList']

recommendation_list_new_item = []

for item in item_list_first:
    title = item['itemId']
    score = item['score']
    recommendation_list_new_item.append([title, score])
    
new_rec_DF = pd.DataFrame(recommendation_list_new_item, columns = ['new-put-items','new-put-items-score'])

try:
    recommendations_df.drop(['new-put-items','new-put-items-score'], axis=1, inplace=True)
except:
    pass

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df

### 7-8. Call GetRecommendations after doing put_users

In [None]:
USER_ID

In [None]:
new_USER_ID = user_data[-1][0] + 100

print(f"USER_ID : {new_USER_ID}")

# Configure Properties:
user_meta = {
    'USER_META1': str(user_data[int(USER_ID)-1][1]),
    'USER_META2': user_data[int(USER_ID)-1][2],
    'CREATION_TIMESTAMP':int(time.time())
}
user_meta_json = json.dumps(user_meta)

personalize_events.put_users(
    datasetArn=users_dataset_arn,
    users=[
        {
            'userId': str(new_USER_ID),
            'properties': user_meta_json
        },
    ]
)

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(new_USER_ID),
)
# Update DF rendering
pd.set_option('display.max_rows', 30)

print("Recommendations for user: ", new_USER_ID)

item_list_first = get_recommendations_response['itemList']

recommendation_list_new_user = []

for item in item_list_first:
    title = item['itemId']
    score = item['score']
    recommendation_list_new_user.append([title, score])
    
new_rec_DF = pd.DataFrame(recommendation_list_new_user, columns = ['new-put-users','new-put-users-score'])

try:
    recommendations_df.drop(['new-put-users','new-put-users-score'], axis=1, inplace=True)
except:
    pass

recommendations_df = recommendations_df.join(new_rec_DF)
recommendations_df

In [None]:
for val in list(recommendations_df['OriginalRecs-items']):
    if val in list(recommendations_df['new-put-users']):
        print(val)

## Review

Using the code above, you have successfully trained a deep learning model to generate recommendations based on prior user behavior. Think about other types of problems where this kind of data is available, and what it might look like to build a similar system to offer recommendations there.


- [Amazon Personalize Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/c5a0c80f-1a42-442c-b2c0-956b38d4dc48/en-US) 
- [amazon-personalize-samples](https://github.com/aws-samples/amazon-personalize-samples)
- [Building a Real-Time Recommendation Site with Amazon Personalize (in Korean)](https://catalog.us-east-1.prod.workshops.aws/workshops/ed82a5d4-6630-41f0-a6a1-9345898fa6ec/ko-KR)