# Creating Solutions <a class="anchor" id="top"></a>

## Outline

1. [Introduction](#intro)
1. [How to train your Solution Versions](#recommenders)
1. [Create Solutions](#solutions)
1. [Evaluate Solutions](#eval)
1. [Using Evaluation Metrics](#usemetrics)
1. [Deploy a Campaign](#deploy)
1. [Create Filters](#filters)
1. [Storing Useful Variables](#vars)

To run this notebook, you need to have run [the previous notebook: `01_Data_Layer.ipynb`](01_Data_Layer.ipynb), where you prepared 2 datasets (item-interactions, and items) for use in Amazon Personalize. At the end of that notebook, you saved some variable values, which you now need to load into this notebook.

## Introduction <a class="anchor" id="intro"></a>

In the previous notebook we prepared 2 datasets (user interactions and news article metadata) that represent our fictional news site, and created Datasets in Amazon Personalize for this data.

In this notebook we will define our use-case, train models and create APIs to get recommendations.

## Define your Use Case <a class="anchor" id="usecase"></a>
[Back to top](#top)

The [minimum data requirements](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html) can be found in the documentation.

Most of the time this is easily attainable, and if you are low in one category, you can often make up for it by having a larger number in another category.

The user-item interaction data is key for getting started with the service.

### Train Models and create APIs to get recommendations

In this section we will create custom solutions, solution versions and campaigns for the following use cases:

1. [User Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html): this will be used to provide front-page or genre-specific recommendations.

2. [Personalized Ranking](https://docs.aws.amazon.com/personalize/latest/dg/personalized-ranking-recipes.html): this will be used to rank a list of items.

Both of these will be created within the same dataset group and with the same input data.

The following diagram shows the resources that we will create in this section. The part we are building in this notebook is highlighted in blue with a dashed outline.

![Workflow](Images/02_Training_Layer_Resources.jpg)

Similar to the previous notebook, start by importing the relevant packages, and set up a connection to Amazon Personalize using the SDK.

In [1]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random
import boto3
import botocore
from botocore.exceptions import ClientError
import pandas as pd

In [2]:
%store -r

#### Create clients

In [3]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

## How to train your Solution Versions <a class="anchor" id="recommenders"></a>
[Back to top](#top)

As mentioned previously, a dataset group, schemas, datasets, solutions, and campaigns have already been created for you. You can open another browser tab/window to view these resources in the Personalize AWS Console.

Below, we will walk you through the steps we used to create these resources. Since they are already created, we will only retrieve the variables from the automated deployment. However, you can also run this code to create and train the resources if you have not run the automation.

<div class="alert alert-block alert-warning">
<b>Note:</b> Please take into account that creating these resources in your own account will incur a cost. If you are not using the CloudFormation template, uploading the data and training the models through this notebook takes time (this can be several hours).
</div>

## Ready... Set... Train!

Now that the data is imported and ready for use, we will create custom solutions and solution versions for the [Personalized-Ranking](https://docs.aws.amazon.com/personalize/latest/dg/personalized-ranking-recipes.html) and [User Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html) use cases.

## Create Solutions <a class="anchor" id="solutions"></a>
[Back to top](#top)

While Personalize has streamlined solutions for the ecommerce and retail domains, some use cases, such as article recommendation, require a custom implementation.

In Amazon Personalize, a specific variation of an algorithm is called a recipe. Different recipes are suitable for different situations. A trained model is called a solution, and each solution can have many versions, each corresponding to the data that was available when the model was trained.

Let's look at all available recipes that are not of a specific domain and can be used to create custom solutions. 

In [4]:
# Page through list_recipes until the service stops returning a nextToken
display_available_recipes = []
next_token = None
while True:
    paging_args = {'nextToken': next_token} if next_token else {}
    available_recipes = personalize.list_recipes(**paging_args)
    display_available_recipes += available_recipes['recipes']
    next_token = available_recipes.get('nextToken')
    if not next_token:
        break

# Keep only the recipes that are not tied to a specific domain
display([recipe for recipe in display_available_recipes if 'domain' not in recipe])

[{'name': 'aws-item-affinity',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-item-affinity',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2021, 7, 15, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.datetime(2024, 6, 19, 16, 47, 19, 191000, tzinfo=tzlocal())},
 {'name': 'aws-item-attribute-affinity',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-item-attribute-affinity',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2021, 8, 25, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.datetime(2024, 6, 19, 16, 47, 19, 191000, tzinfo=tzlocal())},
 {'name': 'aws-next-best-action',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-next-best-action',
  'status': 'ACTIVE',
  'creationDateTime': datetime.datetime(2023, 8, 11, 0, 0, tzinfo=tzlocal()),
  'lastUpdatedDateTime': datetime.datetime(2024, 6, 19, 16, 47, 19, 191000, tzinfo=tzlocal())},
 {'name': 'aws-personalized-ranking',
  'recipeArn': 'arn:aws:personalize:::recipe/aws-personalize

## Create Solution

### User Personalization

These use cases require a custom implementation. 

We are going to create a Solution of the type [User Personalization](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-new-item-USER_PERSONALIZATION.html). This solution is a combination of a dataset group and a recipe, which is basically a set of instructions that lets Amazon Personalize know how to prepare a model to solve a specific type of business use case.

In [5]:
workshop_userpersonalization_recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"

In [6]:
try:
    user_personalization_create_solution_response = personalize.create_solution(
        name = workshop_userpersonalization_solution_name,
        datasetGroupArn = workshop_dataset_group_arn,
        recipeArn = workshop_userpersonalization_recipe_arn
    )

    workshop_userpersonalization_solution_arn = user_personalization_create_solution_response['solutionArn']
    print(json.dumps(workshop_userpersonalization_solution_arn, indent=2))


    print ('\nCreating the User Personalization Solution with workshop_userpersonalization_solution_arn = {}'.format(workshop_userpersonalization_solution_arn))

except personalize.exceptions.ResourceAlreadyExistsException as e:
    # The solution already exists, so rebuild its ARN from the region, account ID and solution name
    workshop_userpersonalization_solution_arn =  'arn:aws:personalize:'+region+':'+account_id+':solution/'+workshop_userpersonalization_solution_name
    print('The User Personalization Solution {} already exists.'.format(workshop_userpersonalization_solution_name))
    print ('\nWe will be using the existing User Personalization Solution with workshop_userpersonalization_solution_arn = {}'.format(workshop_userpersonalization_solution_arn))

The User Personalization Solution immersion_day_user_personalization_news already exists.

We will be using the existing User Personalization Solution with workshop_userpersonalization_solution_arn = arn:aws:personalize:us-east-1:908388459961:solution/immersion_day_user_personalization_news
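
When a resource already exists, the cell above rebuilds its ARN by string concatenation. As a minimal sketch, the same pattern can be wrapped in a small helper. The function name `build_personalize_arn` and the account ID in the example are hypothetical, and the ARN layout is simply the one used in this notebook.

```python
def build_personalize_arn(region: str, account_id: str, resource_type: str, name: str) -> str:
    """Assemble an ARN in the layout this notebook uses (hypothetical helper).

    resource_type is the path segment before the name, e.g. 'solution' or 'campaign'.
    """
    return "arn:aws:personalize:{}:{}:{}/{}".format(region, account_id, resource_type, name)


# Placeholder values for illustration only.
example_arn = build_personalize_arn("us-east-1", "123456789012", "solution", "my_solution")
print(example_arn)
```

Keeping the layout in one place avoids repeating the concatenation for solutions and campaigns.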


### Create the solution version

Once you have a solution, you need to create a version in order to complete the model training. Training can take a while to complete: at least 25 minutes, and about 35 minutes on average for this recipe with our dataset. A while loop is typically used to poll until the task is completed.

In [7]:
workshop_userpersonalization_solution_version_arn = None

solution_versions_list = personalize.list_solution_versions(
    solutionArn=workshop_userpersonalization_solution_arn,
    maxResults=10
)['solutionVersions']

for solution_vers in solution_versions_list:
    if solution_vers['status'] in ['CREATE PENDING', 'CREATE IN_PROGRESS', 'ACTIVE']:
        workshop_userpersonalization_solution_version_arn = solution_vers['solutionVersionArn']
    if workshop_userpersonalization_solution_version_arn:
        break

if workshop_userpersonalization_solution_version_arn:
    print ('\nWe will be using the existing User Personalization Solution Version with workshop_userpersonalization_solution_version_arn = {}'.format(workshop_userpersonalization_solution_version_arn))
else:
    user_personalization_create_solution_version_response = personalize.create_solution_version(
        solutionArn = workshop_userpersonalization_solution_arn
    )
    workshop_userpersonalization_solution_version_arn = user_personalization_create_solution_version_response['solutionVersionArn']
    print(json.dumps(user_personalization_create_solution_version_response, indent=2))

    print ('\nTraining the User Personalization Solution Version with workshop_userpersonalization_solution_version_arn = {}'.format(workshop_userpersonalization_solution_version_arn))


We will be using the existing User Personalization Solution Version with workshop_userpersonalization_solution_version_arn = arn:aws:personalize:us-east-1:908388459961:solution/immersion_day_user_personalization_news/39a271a4


### View Solution Version Creation Status

To view the status updates in the console:

* In another browser tab you should already have the AWS Console up from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `Dataset groups`.
* Click the name of your dataset group, if you did not change it, it is "".

Or simply run the cell below to keep track of the solution version creation status.

In [8]:
max_time = time.time() + 10*60*60 # 10 hours
while time.time() < max_time:

    # User Personalization Solution Version
    user_personalization_version_response = personalize.describe_solution_version(
        solutionVersionArn = workshop_userpersonalization_solution_version_arn
    )
    status_user_personalization_solution = user_personalization_version_response["solutionVersion"]["status"]

    # Stop polling as soon as training has either succeeded or failed
    if status_user_personalization_solution == "ACTIVE":
        print("Build succeeded for {}".format(workshop_userpersonalization_solution_version_arn))
        print("The User Personalization Solution Version is ACTIVE")
        break
    elif status_user_personalization_solution == "CREATE FAILED":
        print("Build failed for {}".format(workshop_userpersonalization_solution_version_arn))
        break

    print("User Personalization Solution Version build is still in progress")
    print()
    time.sleep(60)

Build succeeded for arn:aws:personalize:us-east-1:908388459961:solution/immersion_day_user_personalization_news/39a271a4
The User Personalization Solution Version is ACTIVE
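
The polling pattern above recurs for every long-running Personalize resource. As a rough sketch, it could be factored into a reusable helper. The name `wait_for_status` and its parameters are assumptions made for this illustration; they are not part of boto3. A stand-in status source is used so the sketch runs without AWS access.

```python
import time


def wait_for_status(describe_fn, success="ACTIVE", failure="CREATE FAILED",
                    timeout_s=10 * 60 * 60, interval_s=60, sleep_fn=time.sleep):
    """Poll describe_fn() until it returns the success or failure status.

    describe_fn is any zero-argument callable that returns a status string.
    Returns the final status, or raises TimeoutError once timeout_s has elapsed.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = describe_fn()
        if status in (success, failure):
            return status
        sleep_fn(interval_s)
    raise TimeoutError("status did not settle within {} seconds".format(timeout_s))


# Stand-in for a describe_* call: yields a fixed sequence of statuses.
_fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(_fake_statuses), sleep_fn=lambda seconds: None)
print(final_status)
```

In this notebook, `describe_fn` could be a lambda that calls `personalize.describe_solution_version(...)` and returns `["solutionVersion"]["status"]` from the response. Unlike the cell above, the helper raises an error on timeout instead of ending silently.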


## Deploy Campaigns <a class="anchor" id="deploy"></a>
[Back to top](#top)

Once a solution version is created, it is possible to get recommendations from it, and to get a feel for its overall behavior.

For real-time recommendations, after you prepare and import data and create a solution, you are ready to deploy your solution version to generate recommendations. You deploy a solution version by creating an Amazon Personalize campaign. If you are getting batch recommendations, you don't need to create a campaign. For more information see [Getting batch recommendations and user segments](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html).

We will deploy a campaign for the User Personalization solution version and our Personalized Ranking solution version.

### Create campaigns

A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum throughput per second (TPS) value. This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).

Once we're satisfied with our solution version, we need to create Campaigns for each solution version. When creating a campaign you specify the minimum transactions per second (`minProvisionedTPS`) that you expect to make against the service for this campaign. Personalize will automatically scale the inference endpoint up and down for the campaign to match demand but will never scale below `minProvisionedTPS`.

Let's create a campaign for our User Personalization solution version with `minProvisionedTPS` set to 1.

In [9]:
try:
    user_personalization_create_campaign_response = personalize.create_campaign(
        name = workshop_userpersonalization_campaign_name,
        solutionVersionArn = workshop_userpersonalization_solution_version_arn,
        minProvisionedTPS = 1,
        campaignConfig={
                'enableMetadataWithRecommendations': True
        }
    )
    metadata_flag = False
    workshop_userpersonalization_campaign_arn = user_personalization_create_campaign_response['campaignArn']
    print(json.dumps(user_personalization_create_campaign_response, indent=2))

    print ('\nCreating the user personalization campaign with arn = {}'.format(workshop_userpersonalization_campaign_arn))

except personalize.exceptions.ResourceAlreadyExistsException as e:
    workshop_userpersonalization_campaign_arn =  'arn:aws:personalize:'+region+':'+account_id+':campaign/'+workshop_userpersonalization_campaign_name
    print('The user personalization campaign {} already exists.'.format(workshop_userpersonalization_campaign_arn))
    print ('\nWe will be using the existing user personalization campaign with workshop_userpersonalization_campaign_arn = {}'.format(workshop_userpersonalization_campaign_arn))
    print('\nUpdating campaign to return metadata')
    try:
        response = personalize.update_campaign(
            campaignArn = workshop_userpersonalization_campaign_arn,
            campaignConfig={
                'enableMetadataWithRecommendations': True
            }
        )
        metadata_flag = True
    except personalize.exceptions.InvalidInputException as e:
        print('\nCampaign {} already returns metadata as desired'.format(workshop_userpersonalization_campaign_arn))
        metadata_flag = False

The user personalization campaign arn:aws:personalize:us-east-1:908388459961:campaign/immersion_day_user_personalization_news_campaign already exists.

We will be using the existing user personalization campaign with workshop_userpersonalization_campaign_arn = arn:aws:personalize:us-east-1:908388459961:campaign/immersion_day_user_personalization_news_campaign

Updating campaign to return metadata


### View campaign creation status

This is how you view the status updates in the console:

* In another browser tab you should already have the AWS Console open from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `Dataset groups`.
* Click the name of your dataset group.
* Click `Custom Resources`
* Click `Campaigns`.
* You will now see a list of all of the campaigns you created above, including a column with the status of the campaign. Once it is `Active`, your campaign is ready to be queried.

Or simply run the cell below to keep track of the campaign creation status of the campaign we created.

While you are waiting for this to complete you can learn more about campaigns in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html)

In [10]:
%time

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    user_personalization_campaign_response = personalize.describe_campaign(
        campaignArn = workshop_userpersonalization_campaign_arn
    )
    campaign = user_personalization_campaign_response['campaign']

    # If the existing campaign was updated above, track the status of that update;
    # otherwise track the status of the campaign itself.
    if metadata_flag:
        status_user_personalization = campaign['latestCampaignUpdate']['status']
        phase = 'update'
    else:
        status_user_personalization = campaign['status']
        phase = 'deployment'

    if status_user_personalization == 'ACTIVE':
        print('Build succeeded for {}'.format(workshop_userpersonalization_campaign_arn))
        break
    elif status_user_personalization == 'CREATE FAILED':
        print('Build failed for {}'.format(workshop_userpersonalization_campaign_arn))
        break

    print('The user personalization campaign {} is still in progress'.format(phase))
    time.sleep(60)

CPU times: user 2 μs, sys: 0 ns, total: 2 μs
Wall time: 5.48 μs
Build succeeded for arn:aws:personalize:us-east-1:908388459961:campaign/immersion_day_user_personalization_news_campaign


## Evaluate solution versions and recommenders <a class="anchor" id="eval"></a>
[Back to top](#top)

Personalize calculates evaluation metrics for each solution version based on a subset of the training data. The image below illustrates how Personalize splits the data. Given 10 users, with 10 interactions each (a circle represents an interaction), the interactions are ordered from oldest to newest based on the timestamp. Personalize uses all the interaction data from 90% of the users (blue circles) to train the solution version, and the remaining 10% for evaluation. For each of the users in the remaining 10%, 90% of their interaction data (green circles) is used as input for the call to the trained model. The remaining 10% of their data (orange circle) is compared to the output produced by the model and used to calculate the evaluation metrics.

![personalize metrics](Images/personalize_metrics.png)
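
To make the split concrete, here is a small sketch that reproduces the 10-users-by-10-interactions illustration. It only mirrors the description above. How Personalize selects the evaluation users internally is not documented here, so the random selection, the function name `holdout_split` and the synthetic data are all assumptions.

```python
import random


def holdout_split(interactions, test_user_frac=0.1, test_event_frac=0.1, seed=0):
    """Split (user_id, item_id, timestamp) tuples as described in the text.

    Returns (train, model_input, held_out):
      train       - all interactions of the training users (blue circles)
      model_input - oldest 90% of each evaluation user's items (green circles)
      held_out    - newest 10% of each evaluation user's items (orange circles)
    """
    by_user = {}
    for user_id, item_id, timestamp in interactions:
        by_user.setdefault(user_id, []).append((timestamp, item_id))

    users = sorted(by_user)
    random.Random(seed).shuffle(users)  # assumption: evaluation users chosen at random
    n_test_users = max(1, round(len(users) * test_user_frac))
    test_users = set(users[:n_test_users])

    train, model_input, held_out = [], {}, {}
    for user_id, events in by_user.items():
        events.sort()  # oldest to newest
        items = [item_id for _, item_id in events]
        if user_id in test_users:
            n_held = max(1, round(len(items) * test_event_frac))
            model_input[user_id] = items[:-n_held]
            held_out[user_id] = items[-n_held:]
        else:
            train.extend((user_id, item_id) for item_id in items)
    return train, model_input, held_out


# Synthetic data: 10 users with 10 interactions each.
data = [("u{}".format(u), "i{}_{}".format(u, t), t) for u in range(10) for t in range(10)]
train, model_input, held_out = holdout_split(data)
print(len(train), len(model_input), len(held_out))
```

With this data, 9 users contribute all of their interactions to training, and the one evaluation user contributes 9 items as model input and the newest item as the held-out target.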

We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.

You need to understand the following terms regarding evaluation in Personalize:

* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.
* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.
* *Query* refers to the internal equivalent of a GetRecommendations call.

The metrics produced by Personalize are:
* **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).
* **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
* **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG, so that NDCG is between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
* **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.
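
The definitions above can be sketched in a few lines of Python for a single query. This is an illustration of the formulas as described, not the Personalize implementation. The function names and the toy data are assumptions, and relevance is treated as binary (an item is either in the held-out set or not).

```python
import math


def precision_at_k(recommended, relevant, k):
    """Number of relevant items in the top k, divided by k."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def reciprocal_rank_at_k(recommended, relevant, k):
    """1/position of the first relevant item in the top k, or 0 if there is none."""
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """DCG with weights 1/log(1 + position), divided by the ideal DCG."""
    dcg = sum(1.0 / math.log(1 + position)
              for position, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    ideal_dcg = sum(1.0 / math.log(1 + position) for position in range(1, ideal_hits + 1))
    return dcg / ideal_dcg if ideal_dcg else 0.0


def coverage(all_recommendations, catalog):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended = set().union(*all_recommendations) if all_recommendations else set()
    return len(recommended & set(catalog)) / len(catalog)


# Toy query: five recommendations, two of which match the held-out data.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e"}
print("precision_at_5:", precision_at_k(recommended, relevant, 5))
print("reciprocal_rank_at_5:", reciprocal_rank_at_k(recommended, relevant, 5))
print("ndcg_at_5:", ndcg_at_k(recommended, relevant, 5))
print("coverage:", coverage([["a", "b"], ["b", "c"]], ["a", "b", "c", "d"]))
```

The reported metrics are averages over all queries, so `mean_reciprocal_rank_at_25` corresponds to the mean of `reciprocal_rank_at_k(..., k=25)` across the evaluation users. Because NDCG is a ratio, the base of the logarithm does not affect the result.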

Let's take a look at the evaluation metrics for each of the solutions produced in this notebook. Please note that your results might differ from the results described in the text of this notebook, due to the quality of the synthetic dataset. 

## User Personalization Metrics
Retrieve the evaluation metrics for the user personalization solution version.

In [11]:
userpersonalization_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = workshop_userpersonalization_solution_version_arn
)

for metric in userpersonalization_solution_metrics_response["metrics"]:
    print ("{}: {}".format(metric,userpersonalization_solution_metrics_response["metrics"][metric] ))

coverage: 0.2615
mean_reciprocal_rank_at_25: 0.2259
normalized_discounted_cumulative_gain_at_10: 0.2614
normalized_discounted_cumulative_gain_at_25: 0.3117
normalized_discounted_cumulative_gain_at_5: 0.2363
precision_at_10: 0.0452
precision_at_25: 0.0304
precision_at_5: 0.0687


## Using evaluation metrics <a class="anchor" id="usemetrics"></a>
[Back to top](#top)

It is important to use evaluation metrics carefully. There are a number of factors to keep in mind.

* If there is an existing recommendation system in place, it will have influenced the users' interaction history that you use to train your new solutions. This means the evaluation metrics are biased to favor the existing solution. If you work to push the evaluation metrics to match or exceed the existing solution, you may just be pushing the User Personalization solution to behave like the existing solution and might not end up with something better.


Keeping in mind these factors, the evaluation metrics produced by Personalize are generally useful for two cases:
1. Comparing the performance of solution versions trained on the same recipe, but with different values for the hyperparameters and features (impression data etc)
1. Comparing the performance of solution versions trained on different recipes. Here also keep in mind that the recipes answer different use cases and comparing them to each other might not make sense in your solution.

Properly evaluating a recommendation system is always best done through A/B testing while measuring actual business outcomes. Since recommendations generated by a system usually influence the user behavior which it is based on, it is better to run small experiments and apply A/B testing for longer periods of time. Over time, the bias from the existing model will fade.
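
As a hedged illustration of what measuring an A/B test can look like, the sketch below compares the click-through rates of two variants with a two-proportion z-test. The choice of click-through rate as the business outcome, the choice of test, the function name and the numbers are all assumptions made for this example. Your own experiment may call for a different metric or statistical method.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: variant A is the existing system, variant B the new campaign.
z, p_value = two_proportion_z_test(clicks_a=200, views_a=10000, clicks_b=260, views_b=10000)
print("z = {:.2f}, p = {:.4f}".format(z, p_value))
```

A small p-value suggests the difference is unlikely to be noise, but the test says nothing about the bias discussed above, which is why running the experiment over a longer period still matters.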

## Storing useful variables <a class="anchor" id="vars"></a>
[Back to top](#top)

Before exiting this notebook, run the following cells to save the version ARNs for use in the next notebook.

In [12]:
%store workshop_userpersonalization_solution_arn
%store workshop_userpersonalization_solution_version_arn
%store workshop_userpersonalization_campaign_arn

%store region
%store role_name
%store account_id

Stored 'workshop_userpersonalization_solution_arn' (str)
Stored 'workshop_userpersonalization_solution_version_arn' (str)
Stored 'workshop_userpersonalization_campaign_arn' (str)
Stored 'region' (str)
Stored 'role_name' (str)
Stored 'account_id' (str)


You're all set to move on to [the exploratory notebook `03_Inference_Layer.ipynb`](03_Inference_Layer.ipynb). Open it from the browser and you can start getting recommendations!