# Creating and Evaluating Solutions <a class="anchor" id="top"></a>

In this notebook, you will train several models using Amazon Personalize, specifically: 

1. User Personalization - what items are most relevant to a specific user.
1. Similar Items - given an item, what items are similar to it.
1. Personalized Ranking - given a user and a collection of items, in what order are they most relevant.

## Outline

1. [Introduction](#intro)
1. [Create solutions](#solutions)
1. [Evaluate solutions](#eval)
1. [Using evaluation metrics](#use)
1. [Storing useful variables](#vars)

## Introduction <a class="anchor" id="intro"></a>

To recap, the algorithms in Amazon Personalize (called recipes) look to solve different personalization use cases, explained here:

- **User Personalization** - predicts the items that a user will interact with based on Interactions, Items, and Users datasets. If you are building a recommendation system that provides personalized recommendations for each of your users, you should train your model with a User Personalization recipe.
    - User-Personalization - The `aws-user-personalization` recipe is optimized for all personalized recommendation scenarios. It predicts the items that a user will interact with based on Interactions, Items, and Users datasets. When recommending items, it uses automatic item exploration.
    - Popularity-Count - The `aws-popularity-count` recipe recommends the most popular items based on all of your user behavioral data. The most popular items have the most interactions with unique users. The recipe returns the same popular items for all users.
    - Legacy recipes - The functionality in the legacy `aws-hrnn`, `aws-hrnn-coldstart`, and `aws-hrnn-metadata` recipes has been unified and improved in the `aws-user-personalization` recipe. Therefore, it is recommended to use the `aws-user-personalization` recipe rather than the HRNN recipes.
- **Personalized Ranking** - provides recommendations in ranked order based on predicted interest level.
    - Personalized-Ranking - the `aws-personalized-ranking` recipe generates personalized rankings of items. A personalized ranking is a list of recommended items that are re-ranked for a specific user. This is useful if you have a collection of ordered items, such as search results, promotions, or curated lists, and you want to provide a personalized re-ranking for each of your users.
- **Related Items** - recipes that return items similar to an item that you specify when you get recommendations.
    - Similar-Items - the `aws-similar-items` recipe generates recommendations for items that are similar to an item you specify. Similarity is calculated based on both interactions data and item metadata. Use Similar-Items when you have a catalog that includes many new items with item metadata but little to no interactions data.
    - SIMS - the `aws-sims` recipe, or item-to-item similarities (SIMS) recipe, generates items similar to a given item based on the co-occurrence of the item in user history in your Interactions dataset. If sufficient user behavior data for an item isn't available, or if the specified item ID isn't found, the recipe returns popular items as recommendations. 

Regardless of the use case, the algorithms all share a base of learning on user-item-interaction data which is defined by 3 core attributes:

1. **UserID** - The user who interacted
1. **ItemID** - The item the user interacted with
1. **Timestamp** - The time at which the interaction occurred

We also support optional event types and event values defined by:

1. **Event Type** - Categorical label of an event (clicked, purchased, rated, listened, watched, etc).
1. **Event Value** - A numeric value corresponding to the event type that occurred. This value can be used to filter interactions that are included in model training by specifying a minimum threshold. For example, suppose you have an event type of `Rated` and the value is the user rating on a scale of 0 to 5. Since Personalize models on positive interactions, you can use an event value threshold of, say, 3 to only include interactions with an event type of `Rated` that have an event value of 3 or higher.

The event type and event value fields are additional data that can be used to filter the data sent for training the personalization model. In this particular exercise we will not use an event type or event value. For more information on how to use `eventValue` with `eventValueThreshold`, see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/recording-events.html).
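To make the thresholding idea concrete, here is a minimal local sketch of the filter described above. It does not call Amazon Personalize; the sample records, the field names, and the `filter_by_event_value` helper are hypothetical and only mirror the attributes listed in this section.

```python
# Hypothetical interaction records; the field names mirror the attributes described above.
interactions = [
    {"USER_ID": "1", "ITEM_ID": "10", "TIMESTAMP": 1612200000, "EVENT_TYPE": "Rated", "EVENT_VALUE": 4.5},
    {"USER_ID": "1", "ITEM_ID": "11", "TIMESTAMP": 1612200060, "EVENT_TYPE": "Rated", "EVENT_VALUE": 2.0},
    {"USER_ID": "2", "ITEM_ID": "10", "TIMESTAMP": 1612200120, "EVENT_TYPE": "Rated", "EVENT_VALUE": 3.0},
]


def filter_by_event_value(records, event_type, threshold):
    """Keep records of `event_type` whose event value is at or above `threshold`."""
    return [
        r for r in records
        if r["EVENT_TYPE"] == event_type and r["EVENT_VALUE"] >= threshold
    ]


# Keep only "Rated" events with a rating of 3 or higher
kept = filter_by_event_value(interactions, "Rated", 3)
print([(r["USER_ID"], r["ITEM_ID"]) for r in kept])
```

With a threshold of 3, the rating of 2.0 is dropped and the other two interactions remain available for training.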

To run this notebook, you need to have run the previous notebook, `01_Data_Layer.ipynb`, where you created a dataset and imported interaction and item metadata into Amazon Personalize. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook.

In [3]:
%store -r

## Create solutions <a class="anchor" id="solutions"></a>
[Back to top](#top)

In this notebook, you will create solutions with the following recipes:

1. User Personalization
1. SIMS
1. Personalized-Ranking

The Popularity-Count recipe is the simplest solution available in Amazon Personalize and should only be used as a fallback, so it will not be covered in this notebook.

Similar to the previous notebook, start by importing the relevant packages, and set up a connection to Amazon Personalize using the SDK.

In [4]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random

import boto3
import botocore
from botocore.exceptions import ClientError
import pandas as pd


In [5]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

In Amazon Personalize, a specific variation of an algorithm is called a recipe. Different recipes are suitable for different situations. A trained model is called a solution, and each solution can have many versions that relate to a given volume of data when the model was trained.

To start, we will list all the recipes that are supported. This will allow you to select one and use that to build your model.

In [4]:
personalize.list_recipes()

{'recipes': [{'name': 'aws-hrnn',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-coldstart',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-coldstart',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-metadata',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-metadata',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2021, 1, 5, 0, 8, 53, 800000, tzinfo=tzlocal())},
  {'name': 'aws-personalized-ranking',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-personalized-ranking',
   'status'

The output is just a JSON representation of all of the algorithms mentioned in the introduction.
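Rather than copying an ARN by hand, you could look it up by name in that response. The sketch below is one possible helper, not part of the original notebook: `find_recipe_arn` and `sample_response` are hypothetical names, and the sample dictionary is a trimmed-down stand-in for the real output.

```python
def find_recipe_arn(list_recipes_response, recipe_name):
    """Return the ARN of the recipe called `recipe_name`, or None if it is absent."""
    for recipe in list_recipes_response.get("recipes", []):
        if recipe["name"] == recipe_name:
            return recipe["recipeArn"]
    return None


# Trimmed-down stand-in for the response printed above (assumption: same key names)
sample_response = {
    "recipes": [
        {"name": "aws-hrnn",
         "recipeArn": "arn:aws:personalize:::recipe/aws-hrnn"},
        {"name": "aws-personalized-ranking",
         "recipeArn": "arn:aws:personalize:::recipe/aws-personalized-ranking"},
    ]
}

print(find_recipe_arn(sample_response, "aws-personalized-ranking"))
```

In the notebook you would pass the live response instead, for example `find_recipe_arn(personalize.list_recipes(), "aws-sims")`. Note that the response can be paginated, so a recipe missing from the first page is not necessarily unavailable.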

Next we will select specific recipes and build models with them.

### User Personalization
The User-Personalization (aws-user-personalization) recipe is optimized for all USER_PERSONALIZATION recommendation scenarios. When recommending items, it uses automatic item exploration.

With automatic exploration, Amazon Personalize automatically tests different item recommendations, learns from how users interact with these recommended items, and boosts recommendations for items that drive better engagement and conversion. This improves item discovery and engagement when you have a fast-changing catalog, or when new items, such as news articles or promotions, are more relevant to users when fresh.

You can balance how much to explore (where items with less interactions data or lower relevance are recommended more frequently) against how much to exploit previous interactions (where recommendations are based on what is already known about relevance). Amazon Personalize automatically adjusts future recommendations based on implicit user feedback.

First, select the recipe by finding the ARN in the list of recipes above.

In [6]:
user_personalization_recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization"

#### Create the solution

First you create a solution using the recipe. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model.

In [7]:
user_personalization_create_solution_response = personalize.create_solution(
    name = "personalize-poc-userpersonalization",
    datasetGroupArn = dataset_group_arn,
    recipeArn = user_personalization_recipe_arn
)

user_personalization_solution_arn = user_personalization_create_solution_response['solutionArn']


In [8]:
print(json.dumps(user_personalization_solution_arn, indent=2))

"arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-userpersonalization"


#### Create the solution version

Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete, upwards of 25 minutes, and an average of 90 minutes for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [9]:
userpersonalization_create_solution_version_response = personalize.create_solution_version(
    solutionArn = user_personalization_solution_arn
)

In [10]:
userpersonalization_solution_version_arn = userpersonalization_create_solution_version_response['solutionVersionArn']
print(json.dumps(userpersonalization_create_solution_version_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-userpersonalization",
  "ResponseMetadata": {
    "RequestId": "932aeb2f-d9be-4611-8bb8-5227f1696de2",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 20:14:15 GMT",
      "x-amzn-requestid": "932aeb2f-d9be-4611-8bb8-5227f1696de2",
      "content-length": "105",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### SIMS


SIMS is one of the oldest algorithms used within Amazon for recommendation systems. A core use case for it is when you have one item and you want to recommend items that have been interacted with in similar ways over your entire user base. This means the result is not personalized per user. Sometimes this leads to recommending mostly popular items, so there is a hyperparameter ([popularity_discount_factor](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html)) that can be tweaked which will reduce the popular items in your results. 

For our use case, using the MovieLens data, let's assume we pick a particular movie. We can then use SIMS to recommend other movies based on the interaction behavior of the entire user base. The results are not personalized per user, but instead differ depending on the movie we chose as our input.

Just like last time, we start by selecting the recipe.

In [11]:
SIMS_recipe_arn = "arn:aws:personalize:::recipe/aws-sims"

#### Create the solution

As with User Personalization, start by creating the solution first. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model.

In [12]:
sims_create_solution_response = personalize.create_solution(
    name = "personalize-poc-sims",
    datasetGroupArn = dataset_group_arn,
    recipeArn = SIMS_recipe_arn
)

sims_solution_arn = sims_create_solution_response['solutionArn']
print(json.dumps(sims_create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-sims",
  "ResponseMetadata": {
    "RequestId": "e329c7d2-5e16-4b4a-9ad9-9859864a1624",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 20:14:26 GMT",
      "x-amzn-requestid": "e329c7d2-5e16-4b4a-9ad9-9859864a1624",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create the solution version

Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete, upwards of 25 minutes, and an average of 35 minutes for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [13]:
sims_create_solution_version_response = personalize.create_solution_version(
    solutionArn = sims_solution_arn
)

In [14]:
sims_solution_version_arn = sims_create_solution_version_response['solutionVersionArn']
print(json.dumps(sims_create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-sims/43c1eddf",
  "ResponseMetadata": {
    "RequestId": "9eaaa908-6b99-49b6-a075-5e8b48258e3f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 20:14:28 GMT",
      "x-amzn-requestid": "9eaaa908-6b99-49b6-a075-5e8b48258e3f",
      "content-length": "106",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Personalized Ranking

Personalized Ranking is an interesting application of HRNN. Instead of just recommending what is most probable for the user in question, this algorithm takes in a list of items as well as a user. The items are then returned in order of most probable relevance for the user. This is useful when you want to filter on categories for which you have no item metadata to build a filter, or when you have a broad collection that you would like better ordered for a particular user.

For our use case, using the MovieLens data, we could imagine that a Video on Demand application may want to create a shelf of comic book movies, or movies by a specific director. We can generate these lists based on metadata we have. We would use personalized ranking to re-order the list of movies for each user. 
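The re-ordering step can be pictured with a small local sketch. The scores below are made up for illustration; in practice the trained Personalized-Ranking model decides the order, and `rerank`, `shelf`, and `user_scores` are hypothetical names that do not appear elsewhere in this notebook.

```python
def rerank(candidate_items, user_scores):
    """Order candidates by a per-user relevance score, highest first.

    Items without a score are placed last and keep their original relative order.
    """
    return sorted(
        candidate_items,
        key=lambda item: user_scores.get(item, float("-inf")),
        reverse=True,
    )


# A curated shelf (for example, movies by one director) and made-up scores for one user
shelf = ["movie_a", "movie_b", "movie_c", "movie_d"]
user_scores = {"movie_a": 0.1, "movie_b": 0.4, "movie_c": 0.7}

print(rerank(shelf, user_scores))
```

The shelf itself is unchanged; only its order differs per user, which is the behavior the recipe is meant to provide.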

Just like last time, we start by selecting the recipe.

In [15]:
rerank_recipe_arn = "arn:aws:personalize:::recipe/aws-personalized-ranking"

#### Create the solution

As with the previous solution, start by creating the solution first. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model.

In [16]:
rerank_create_solution_response = personalize.create_solution(
    name = "personalize-poc-rerank",
    datasetGroupArn = dataset_group_arn,
    recipeArn = rerank_recipe_arn
)

rerank_solution_arn = rerank_create_solution_response['solutionArn']
print(json.dumps(rerank_create_solution_response, indent=2))

{
  "solutionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-rerank",
  "ResponseMetadata": {
    "RequestId": "98878e7a-1de9-4faf-8944-2aa68ef310e6",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 20:14:34 GMT",
      "x-amzn-requestid": "98878e7a-1de9-4faf-8944-2aa68ef310e6",
      "content-length": "92",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create the solution version

Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete, upwards of 25 minutes, and an average of 35 minutes for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [17]:
rerank_create_solution_version_response = personalize.create_solution_version(
    solutionArn = rerank_solution_arn
)

In [18]:
rerank_solution_version_arn = rerank_create_solution_version_response['solutionVersionArn']
print(json.dumps(rerank_create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-rerank/ab00b6fb",
  "ResponseMetadata": {
    "RequestId": "2b4c6e17-8987-4140-98d3-cc921abfbb1f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 20:14:37 GMT",
      "x-amzn-requestid": "2b4c6e17-8987-4140-98d3-cc921abfbb1f",
      "content-length": "108",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### View solution creation status

As promised, how to view the status updates in the console:

* In another browser tab you should already have the AWS Console up from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `View dataset groups`.
* Click the name of your dataset group, most likely something with POC in the name.
* Click `Solutions and recipes`.
* You will now see a list of all of the solutions you created above, including a column with the status of the solution versions. Once it is `Active`, your solution is ready to be reviewed. It is also capable of being deployed.

Or simply run the cell below to keep track of the solution version creation status.

In [21]:
in_progress_solution_versions = [
    userpersonalization_solution_version_arn,
    sims_solution_version_arn,
    rerank_solution_version_arn
]

max_time = time.time() + 10*60*60 # 10 hours
while time.time() < max_time:
    # Iterate over a copy so finished builds can be removed from the original list safely
    for solution_version_arn in list(in_progress_solution_versions):
        version_response = personalize.describe_solution_version(
            solutionVersionArn = solution_version_arn
        )
        status = version_response["solutionVersion"]["status"]
        
        if status == "ACTIVE":
            print("Build succeeded for {}".format(solution_version_arn))
            in_progress_solution_versions.remove(solution_version_arn)
        elif status == "CREATE FAILED":
            print("Build failed for {}".format(solution_version_arn))
            in_progress_solution_versions.remove(solution_version_arn)
    
    if len(in_progress_solution_versions) <= 0:
        break
    else:
        print("At least one solution build is still in progress")
        
    time.sleep(60)

At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
Build succeeded for arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-sims/43c1eddf
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
At least one solution build is still in progress
Build succeeded for arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-userpersonalization/4078e9b0
At least one solution build is still in progress
Build succeeded for arn:aws:personalize:us-eas

### Hyperparameter tuning

Personalize offers the option of running hyperparameter tuning when creating a solution. Because of the additional computation required to perform hyperparameter tuning, this feature is turned off by default. Therefore, the solutions we created above simply use the default values of the hyperparameters for each recipe. For more information about hyperparameter tuning, see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html).

If you have settled on the correct recipe to use, and are ready to run hyperparameter tuning, the following code shows how you would do so, using SIMS as an example.

```python
sims_create_solution_response = personalize.create_solution(
    name = "personalize-poc-sims-hpo",
    datasetGroupArn = dataset_group_arn,
    recipeArn = SIMS_recipe_arn,
    performHPO=True
)

sims_solution_arn = sims_create_solution_response['solutionArn']
print(json.dumps(sims_create_solution_response, indent=2))
```

If you already know the values you want to use for a specific hyperparameter, you can also set this value when you create the solution. The code below shows how you could set the value for the `popularity_discount_factor` for the SIMS recipe.

```python
sims_create_solution_response = personalize.create_solution(
    name = "personalize-poc-sims-set-hp",
    datasetGroupArn = dataset_group_arn,
    recipeArn = SIMS_recipe_arn,
    solutionConfig = {
        'algorithmHyperParameters': {
            'popularity_discount_factor': '0.7'
        }
    }
)

sims_solution_arn = sims_create_solution_response['solutionArn']
print(json.dumps(sims_create_solution_response, indent=2))
```

## Evaluate solution versions <a class="anchor" id="eval"></a>
[Back to top](#top)

It should not take more than an hour to train all the solutions from this notebook. While training is in progress, we recommend taking the time to read up on the various algorithms (recipes) and their behavior in detail. This is also a good time to consider alternatives to how the data was fed into the system and what kind of results you expect to see.

When the solutions finish creating, the next step is to obtain the evaluation metrics. Personalize calculates these metrics based on a subset of the training data. The image below illustrates how Personalize splits the data. Given 10 users, with 10 interactions each (a circle represents an interaction), the interactions are ordered from oldest to newest based on the timestamp. Personalize uses all of the interaction data from 90% of the users (blue circles) to train the solution version, and the remaining 10% for evaluation. For each of the users in the remaining 10%, 90% of their interaction data (green circles) is used as input for the call to the trained model. The remaining 10% of their data (orange circle) is compared to the output produced by the model and used to calculate the evaluation metrics.

![personalize metrics](../../static/imgs/personalize_metrics.png)
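The split described above can be sketched locally as follows. This is an illustration under stated assumptions, not the service's implementation: the text does not say how the held-out users are chosen, so random selection with a fixed seed is assumed here, and `split_for_evaluation` is a hypothetical helper.

```python
import random


def split_for_evaluation(interactions_by_user, holdout_user_fraction=0.1,
                         holdout_interaction_fraction=0.1, seed=0):
    """Split {user_id: [(timestamp, item_id), ...]} into training data,
    evaluation input, and evaluation ground truth."""
    users = sorted(interactions_by_user)
    random.Random(seed).shuffle(users)  # assumption: held-out users are picked at random
    n_holdout = max(1, round(len(users) * holdout_user_fraction))
    holdout_users = set(users[:n_holdout])

    train, eval_input, eval_truth = {}, {}, {}
    for user, events in interactions_by_user.items():
        ordered = sorted(events)  # oldest to newest by timestamp
        if user not in holdout_users:
            train[user] = ordered  # all interactions of the 90% are used for training
            continue
        n_truth = max(1, round(len(ordered) * holdout_interaction_fraction))
        eval_input[user] = ordered[:-n_truth]  # oldest 90%: input to the trained model
        eval_truth[user] = ordered[-n_truth:]  # newest 10%: compared with the output
    return train, eval_input, eval_truth


# 10 users with 10 interactions each, as in the illustration
data = {f"user_{u}": [(t, f"item_{u}_{t}") for t in range(10)] for u in range(10)}
train, eval_input, eval_truth = split_for_evaluation(data)
print(len(train), len(eval_input), len(eval_truth))
```

For the 10-by-10 example this yields nine training users and one evaluation user, whose nine oldest interactions are the model input and whose newest interaction is the ground truth.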

We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.

You need to understand the following terms regarding evaluation in Personalize:

* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.
* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.
* *Query* refers to the internal equivalent of a GetRecommendations call.

The metrics produced by Personalize are:
* **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).
* **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
* **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG such that NDCG is between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
* **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.
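The definitions above can be written out for a single query as a rough sketch. These functions follow the wording of this section (binary relevance, a weight of 1/log(1 + position)) and are not the service's own implementation; the function names and the example lists are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Number of relevant items in the top k, divided by k."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def reciprocal_rank_at_k(recommended, relevant, k):
    """1 / position of the first relevant item in the top k, or 0 if there is none."""
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(recommended, relevant, k):
    """DCG over the top k divided by the ideal DCG, using 1/log(1 + position) weights."""
    def weight(position):
        return 1.0 / math.log(1 + position)

    dcg = sum(weight(p) for p, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    # Ideal ordering: all relevant items (up to k of them) at the top of the list
    idcg = sum(weight(p) for p in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0


def coverage(all_recommendation_lists, catalog):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended = set().union(*all_recommendation_lists)
    return len(recommended & set(catalog)) / len(catalog)


recommended = ["a", "b", "c", "d", "e"]  # model output for one query, best first
relevant = {"b", "e"}                    # held-out interactions for that user

print(precision_at_k(recommended, relevant, 5))
print(reciprocal_rank_at_k(recommended, relevant, 5))
print(ndcg_at_k(recommended, relevant, 5))
print(coverage([["a", "b"], ["b", "c"]], ["a", "b", "c", "d"]))
```

The `mean_reciprocal_rank_at_25` metric is then the average of `reciprocal_rank_at_k(..., 25)` over all queries. The base of the logarithm does not affect NDCG, because the same constant factor appears in both the DCG and the ideal DCG.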

Let's take a look at the evaluation metrics for each of the solutions produced in this notebook. Please note that your results might differ from the results described in the text of this notebook, due to the quality of the MovieLens dataset.

### User Personalization metrics

First, retrieve the evaluation metrics for the User Personalization solution version.

In [22]:
user_personalization_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = userpersonalization_solution_version_arn
)

print(json.dumps(user_personalization_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-userpersonalization/4078e9b0",
  "metrics": {
    "coverage": 0.0772,
    "mean_reciprocal_rank_at_25": 0.2879,
    "normalized_discounted_cumulative_gain_at_10": 0.3109,
    "normalized_discounted_cumulative_gain_at_25": 0.3537,
    "normalized_discounted_cumulative_gain_at_5": 0.2794,
    "precision_at_10": 0.0571,
    "precision_at_25": 0.0321,
    "precision_at_5": 0.0821
  },
  "ResponseMetadata": {
    "RequestId": "349c6847-5364-4501-8692-a39847878633",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:00:54 GMT",
      "x-amzn-requestid": "349c6847-5364-4501-8692-a39847878633",
      "content-length": "419",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In the output shown above, the normalized discounted cumulative gain at 5 items is roughly 0.28, which indicates that items from a user's held-out interaction history rarely appear near the top of the recommendations. Coverage is about 8%, meaning that only around 8% of the unique items are ever recommended, and precision in the top 5 recommended items is only about 8%.

This is clearly not a great model, but keep in mind that we had to use rating data for our interactions because MovieLens is an explicit dataset based on ratings. The timestamps are also from the time the movie was rated, not watched, so the order is not the same as the order in which a viewer would watch movies.

### SIMS metrics

Now, retrieve the evaluation metrics for the SIMS solution version.

In [23]:
sims_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = sims_solution_version_arn
)

print(json.dumps(sims_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-sims/43c1eddf",
  "metrics": {
    "coverage": 0.1858,
    "mean_reciprocal_rank_at_25": 0.1682,
    "normalized_discounted_cumulative_gain_at_10": 0.2278,
    "normalized_discounted_cumulative_gain_at_25": 0.2939,
    "normalized_discounted_cumulative_gain_at_5": 0.1869,
    "precision_at_10": 0.0618,
    "precision_at_25": 0.0429,
    "precision_at_5": 0.0794
  },
  "ResponseMetadata": {
    "RequestId": "21bea9f7-129f-4bd3-9cab-beba23679976",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:00:56 GMT",
      "x-amzn-requestid": "21bea9f7-129f-4bd3-9cab-beba23679976",
      "content-length": "404",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


In this example, precision at 5 items is about 7.9%, roughly on par with the User Personalization result. The difference is probably within the margin of error, but given that no effort was made to mask popularity, SIMS may just be returning very popular results that a large volume of users have interacted with in some way.

### Personalized ranking metrics

Now, retrieve the evaluation metrics for the personalized ranking solution version.

In [24]:
rerank_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = rerank_solution_version_arn
)

print(json.dumps(rerank_solution_metrics_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:832194813872:solution/personalize-poc-rerank/ab00b6fb",
  "metrics": {
    "coverage": 0.1091,
    "mean_reciprocal_rank_at_25": 0.1428,
    "normalized_discounted_cumulative_gain_at_10": 0.1823,
    "normalized_discounted_cumulative_gain_at_25": 0.2514,
    "normalized_discounted_cumulative_gain_at_5": 0.1599,
    "precision_at_10": 0.0321,
    "precision_at_25": 0.0272,
    "precision_at_5": 0.0491
  },
  "ResponseMetadata": {
    "RequestId": "0f9ded34-6eb6-41d0-84f9-4530bc605e90",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:01:00 GMT",
      "x-amzn-requestid": "0f9ded34-6eb6-41d0-84f9-4530bc605e90",
      "content-length": "406",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


Just a quick comment on this one: here precision at 5 items is about 4.9% (2.7% at 25 items). As this is based on User Personalization, that is to be expected. However, the sample items are not the same items used for validation, hence the low score.

## Using evaluation metrics <a class="anchor" id="use"></a>
[Back to top](#top)

It is important to use evaluation metrics carefully. There are a number of factors to keep in mind.

* If there is an existing recommendation system in place, this will have influenced the user's interaction history which you use to train your new solutions. This means the evaluation metrics are biased to favor the existing solution. If you work to push the evaluation metrics to match or exceed the existing solution, you may just be pushing the User Personalization to behave like the existing solution and might not end up with something better.
* The HRNN Coldstart recipe is difficult to evaluate using the metrics produced by Amazon Personalize. The aim of the recipe is to recommend items which are new to your business. Therefore, these items will not appear in the existing user transaction data which is used to compute the evaluation metrics. As a result, HRNN Coldstart will never appear to perform better than the other recipes when compared on the evaluation metrics alone. Note: The User Personalization recipe also includes improved cold start functionality.

Keeping in mind these factors, the evaluation metrics produced by Personalize are generally useful for two cases:
1. Comparing the performance of solution versions trained on the same recipe, but with different values for the hyperparameters and features (impression data etc)
1. Comparing the performance of solution versions trained on different recipes (except HRNN Coldstart). Here also keep in mind that the recipes answer different use cases and comparing them to each other might not make sense in your solution.
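
As a minimal sketch of such a comparison, the offline metrics of two solution versions can be placed side by side. The helper name `compare_metrics` and the second set of values are hypothetical and used only for illustration; in practice, both dictionaries would come from the `metrics` field of the `get_solution_metrics` responses.

```python
def compare_metrics(baseline, candidate):
    """Return {metric: candidate - baseline} for metrics present in both dicts."""
    shared = sorted(set(baseline) & set(candidate))
    return {name: round(candidate[name] - baseline[name], 4) for name in shared}


# Precision values taken from the metrics output shown earlier in this notebook
baseline_metrics = {
    "precision_at_5": 0.0491,
    "precision_at_10": 0.0321,
    "precision_at_25": 0.0272,
}

# Hypothetical values for a second solution version (illustration only)
candidate_metrics = {
    "precision_at_5": 0.0520,
    "precision_at_10": 0.0310,
    "precision_at_25": 0.0272,
}

deltas = compare_metrics(baseline_metrics, candidate_metrics)
for name, delta in deltas.items():
    # A positive delta means the candidate scored higher on that metric
    print(f"{name}: {delta:+.4f}")
```

A difference in these numbers only indicates which version better reproduces the held-out interactions; it does not replace an online test.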

Properly evaluating a recommendation system is always best done through A/B testing while measuring actual business outcomes. Since recommendations generated by a system usually influence the user behavior which it is based on, it is better to run small experiments and apply A/B testing for longer periods of time. Over time, the bias from the existing model will fade.
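
One possible way to run such a small experiment is to assign each user deterministically to a control or treatment group, so the same user always sees the same variant. The sketch below is an assumption about how this could be done, not part of Amazon Personalize; the function name `assign_variant`, the experiment label, and the 10% treatment share are all hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment="recs-ab-test", treatment_share=0.1):
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing the experiment name together with the user ID keeps the
    assignment stable across sessions and independent between experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform value in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# The same user always lands in the same group
print(assign_variant("429"))
```

Users in the treatment group would receive recommendations from the new campaign, while the control group keeps the existing experience; business outcomes are then compared between the two groups.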

# Deploying Campaigns and Filters<a class="anchor" id="top"></a>

In this notebook, you will deploy and interact with campaigns in Amazon Personalize.

1. [Introduction](#intro)
1. [Create campaigns](#create)
1. [Interact with campaigns](#interact)
1. [Batch recommendations](#batch)
1. [Wrap up](#wrapup)

## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

At this point, you should have several solutions and at least one solution version for each. Once a solution version is created, it is possible to get recommendations from it and to get a feel for its overall behavior.

This notebook starts off by deploying each of the solution versions from the previous notebook into individual campaigns. Once they are active, there are resources for querying the recommendations, and helper functions to digest the output into something more human-readable. 

As you work with your customer on Amazon Personalize, you can modify the helper functions to fit the structure of their data input files to keep the additional rendering working.

To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK.

In [3]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's event streaming
personalize_events = boto3.client(service_name='personalize-events')

## Create campaigns <a class="anchor" id="create"></a>
[Back to top](#top)

A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum throughput per second (TPS) value. This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).
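
If you later want to provision ahead for larger demand, the minimum throughput of an existing campaign can be raised with the `update_campaign` API. The following is a minimal sketch; the wrapper name `scale_campaign` is hypothetical, and the lower bound of 1 is an assumption based on the value used in this POC.

```python
def scale_campaign(personalize_client, campaign_arn, min_tps):
    """Request a new minimum provisioned TPS for an existing campaign.

    personalize_client is expected to be a boto3 'personalize' client.
    """
    # Assumption: reject values below the minimum of 1 used in this POC
    if min_tps < 1:
        raise ValueError("min_tps must be at least 1")
    return personalize_client.update_campaign(
        campaignArn=campaign_arn,
        minProvisionedTPS=min_tps,
    )
```

For example, `scale_campaign(personalize, userpersonalization_campaign_arn, 5)` would request a minimum of 5 TPS once that campaign exists. A higher minimum raises the cost of the campaign, so check the pricing page first.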

Let's start deploying the campaigns.

### User Personalization

Deploy a campaign for your User Personalization solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [4]:
userpersonalization_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-userpersonalization",
    solutionVersionArn = userpersonalization_solution_version_arn,
    minProvisionedTPS = 1
)

userpersonalization_campaign_arn = userpersonalization_create_campaign_response['campaignArn']
print(json.dumps(userpersonalization_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-userpersonalization",
  "ResponseMetadata": {
    "RequestId": "a76d9f06-e1dd-4e55-89ae-4e047b2cac87",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:01:18 GMT",
      "x-amzn-requestid": "a76d9f06-e1dd-4e55-89ae-4e047b2cac87",
      "content-length": "105",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### SIMS

Deploy a campaign for your SIMS solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [5]:
sims_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-SIMS",
    solutionVersionArn = sims_solution_version_arn,
    minProvisionedTPS = 1
)

sims_campaign_arn = sims_create_campaign_response['campaignArn']
print(json.dumps(sims_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-SIMS",
  "ResponseMetadata": {
    "RequestId": "e2b3b7d0-76d1-4f53-b54c-30325389cf61",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:01:20 GMT",
      "x-amzn-requestid": "e2b3b7d0-76d1-4f53-b54c-30325389cf61",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Personalized Ranking

Deploy a campaign for your personalized ranking solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console.

In [6]:
rerank_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-rerank",
    solutionVersionArn = rerank_solution_version_arn,
    minProvisionedTPS = 1
)

rerank_campaign_arn = rerank_create_campaign_response['campaignArn']
print(json.dumps(rerank_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-rerank",
  "ResponseMetadata": {
    "RequestId": "efb219b6-ca5a-4a1e-a3ff-11cf2d38271c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Mon, 01 Feb 2021 21:01:23 GMT",
      "x-amzn-requestid": "efb219b6-ca5a-4a1e-a3ff-11cf2d38271c",
      "content-length": "92",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### View campaign creation status

As promised, this is how you view the status updates in the console:

* In another browser tab you should already have the AWS Console open from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `View dataset groups`.
* Click the name of your dataset group, most likely something with POC in the name.
* Click `Campaigns`.
* You will now see a list of all of the campaigns you created above, including a column with the status of the campaign. Once it is `Active`, your campaign is ready to be queried.

Or simply run the cell below to keep track of the campaign creation status of the 3 campaigns we created.

In [7]:
in_progress_campaigns = [
    userpersonalization_campaign_arn,
    sims_campaign_arn,
    rerank_campaign_arn
]

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    for campaign_arn in list(in_progress_campaigns):  # iterate over a copy, because the list is modified below
        version_response = personalize.describe_campaign(
            campaignArn = campaign_arn
        )
        status = version_response["campaign"]["status"]
        
        if status == "ACTIVE":
            print("Build succeeded for {}".format(campaign_arn))
            in_progress_campaigns.remove(campaign_arn)
        elif status == "CREATE FAILED":
            print("Build failed for {}".format(campaign_arn))
            in_progress_campaigns.remove(campaign_arn)
    
    if len(in_progress_campaigns) <= 0:
        break
    else:
        print("At least one campaign build is still in progress")
        
    time.sleep(60)

At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
At least one campaign build is still in progress
Build succeeded for arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-SIMS
At least one campaign build is still in progress
Build succeeded for arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-rerank
At least one campaign build is still in progress
Build succeeded for arn:aws:personalize:us-east-1:832194813872:campaign/personalize-poc-userpersonalization


## Create Static Filters <a class="anchor" id="interact"></a>
[Back to top](#top)

Now that all campaigns are deployed and active, we can create filters. Filters can be created for fields of both Items and Events. Filters can also be created dynamically for `IN` and `=` operations. For range queries, you should use static filters. Range queries use the following operations: `NOT IN`, `<`, `>`, `<=`, and `>=`.

A few common use cases for static filters in Video On Demand are:

Categorical filters based on Item Metadata (that are range based) - Often your item metadata will have information about the title such as year, user rating, available date. Filtering on these can provide recommendations within that data, such as movies that are available after a specific date, movies rated over 3 stars, movies from the 1990s etc.

User Demographic ranges - you may want to recommend content to specific age demographics. For this, you can create a filter that is specific to an age range (over 18, over 18 AND under 30, etc.).
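
The range-based item metadata case can be sketched as a small helper that builds a static filter expression for a half-open numeric range. The helper name `range_filter_expression` is hypothetical; the expression syntax follows the `Items.YEAR` decade filters created later in this notebook.

```python
def range_filter_expression(field, low, high, action="INCLUDE"):
    """Build a static filter expression for low <= Items.<field> < high."""
    if action not in ("INCLUDE", "EXCLUDE"):
        raise ValueError("action must be 'INCLUDE' or 'EXCLUDE'")
    if low >= high:
        raise ValueError("low must be smaller than high")
    return "{} ItemID WHERE Items.{} >= {} AND Items.{} < {}".format(
        action, field, low, field, high
    )


# Movies from the 1990s
print(range_filter_expression("YEAR", 1990, 2000))
```

The resulting string can be passed as the `filterExpression` argument of `create_filter`.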

Let's look at the item metadata and user interactions, so we can get an idea of what type of filters we can create.

In [8]:
# Create a dataframe for the items by reading in the correct source CSV
items_meta_df = pd.read_csv(data_dir + '/item-meta.csv', sep=',', index_col=0)

# Render some sample data
items_meta_df.head(10)

ITEM_ID,GENRE,YEAR,CREATION_TIMESTAMP
1,Adventure|Animation|Children|Comedy|Fantasy,1995,0
2,Adventure|Children|Fantasy,1995,0
3,Comedy|Romance,1995,0
4,Comedy|Drama|Romance,1995,0
5,Comedy,1995,0
6,Action|Crime|Thriller,1995,0
7,Comedy|Romance,1995,0
8,Adventure|Children,1995,0
9,Action,1995,0
10,Action|Adventure|Thriller,1995,0


Since there are a lot of genres to filter on, we will create a dynamic filter using the dynamic variable `$GENRE`. This allows us to pass in the value at runtime rather than create a static filter for each genre.

In [9]:
creategenrefilter_response = personalize.create_filter(name='Genre',
    datasetGroupArn=dataset_group_arn,
    filterExpression='INCLUDE ItemID WHERE Items.GENRE IN ($GENRE)'
    )

In [10]:
genre_filter_arn = creategenrefilter_response['filterArn']
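
To illustrate how the `$GENRE` value is supplied at runtime, here is a minimal sketch around the `get_recommendations` call of the `personalize-runtime` client. The helper names `quote_filter_values` and `get_genre_recommendations` are hypothetical. The sketch assumes that each value passed in `filterValues` must be wrapped in double quotes, with multiple values separated by commas.

```python
def quote_filter_values(values):
    """Format strings for filterValues: each value double-quoted, comma-separated."""
    return ",".join('"{}"'.format(value) for value in values)


def get_genre_recommendations(runtime_client, campaign_arn, user_id,
                              filter_arn, genres, num_results=10):
    """Return recommended item IDs restricted to the given genres.

    runtime_client is expected to be a boto3 'personalize-runtime' client.
    """
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        filterArn=filter_arn,
        # The key matches the $GENRE placeholder in the filter expression
        filterValues={"GENRE": quote_filter_values(genres)},
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]


print(quote_filter_values(["Action", "Comedy"]))
```

Once the campaign and the filter are both active, a call such as `get_genre_recommendations(personalize_runtime, userpersonalization_campaign_arn, user_id, genre_filter_arn, ["Action"])` would return item IDs for a user ID taken from your interactions data.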

Since we have added the year to our item metadata, let's create a decade filter to recommend only movies released in a given decade. A soft limit of Personalize at this time is 10 total filters, so we will create 7 decade filters for this workshop, leaving room for additional static and dynamic filters.

Create a list for the metadata decade filters and then create the actual filters with the cells below. Note this will take a few minutes to complete.

In [11]:
decades_to_filter = [1950,1960,1970,1980,1990,2000,2010]

In [12]:
# Create a list for the filters:
meta_filter_decade_arns = []

In [13]:
from botocore.exceptions import ClientError  # needed for the except clause; harmless if already imported

# Iterate through the decades
for decade in decades_to_filter:
    # Start by creating a filter
    current_decade = str(decade)
    next_decade = str(decade + 10)
    try:
        createfilter_response = personalize.create_filter(
            name=current_decade + "s",
            datasetGroupArn=dataset_group_arn,
            filterExpression='INCLUDE ItemID WHERE Items.YEAR >= ' + current_decade + ' AND Items.YEAR < ' + next_decade
        )
        # Add the ARN to the list
        meta_filter_decade_arns.append(createfilter_response['filterArn'])
        print("Creating: " + createfilter_response['filterArn'])

    # If this fails, wait a bit
    except ClientError as error:
        # Only retry on throttling; print any other error and continue with the next decade
        if error.response['Error']['Code'] != 'LimitExceededException':
            print(error)
        else:
            # Throttled: wait two minutes, then retry once
            time.sleep(120)
            createfilter_response = personalize.create_filter(
                name=current_decade + "s",
                datasetGroupArn=dataset_group_arn,
                filterExpression='INCLUDE ItemID WHERE Items.YEAR >= ' + current_decade + ' AND Items.YEAR < ' + next_decade
            )
            # Add the ARN to the list
            meta_filter_decade_arns.append(createfilter_response['filterArn'])
            print("Creating: " + createfilter_response['filterArn'])

Creating: arn:aws:personalize:us-east-1:832194813872:filter/1950s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/1960s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/1970s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/1980s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/1990s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/2000s
Creating: arn:aws:personalize:us-east-1:832194813872:filter/2010s


Let's also create two event filters, one for watched and one for unwatched content.

In [14]:
# Create a dataframe for the interactions by reading in the correct source CSV
interactions_df = pd.read_csv(data_dir + '/interactions.csv', sep=',', index_col=0)

# Render some sample data
interactions_df.head(10)

USER_ID,ITEM_ID,TIMESTAMP,EVENT_TYPE
429,222,828124615,watch
429,227,828124615,click
429,595,828124615,watch
429,592,828124615,watch
429,590,828124615,watch
429,434,828124615,watch
429,421,828124615,watch
429,225,828124615,click
429,343,828124615,click
429,222,828124615,click


In [15]:
createwatchedfilter_response = personalize.create_filter(name='watched',
    datasetGroupArn=dataset_group_arn,
    filterExpression='INCLUDE ItemID WHERE Interactions.event_type IN ("watch")'
    )

createunwatchedfilter_response = personalize.create_filter(name='unwatched',
    datasetGroupArn=dataset_group_arn,
    filterExpression='EXCLUDE ItemID WHERE Interactions.event_type IN ("watch")'
    )

Before we move on we want to add those filters to a list as well so they can be used later.

In [16]:
interaction_filter_arns = [createwatchedfilter_response['filterArn'], createunwatchedfilter_response['filterArn']]
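
Filters take some time to become active after `create_filter` returns. As a minimal sketch, the status of a list of filters can be polled with `describe_filter` until each one is either `ACTIVE` or `CREATE FAILED`. The helper name `wait_for_filters`, the polling interval, the timeout, and the `TIMED OUT` label are assumptions made for illustration.

```python
import time


def wait_for_filters(personalize_client, filter_arns, poll_seconds=30,
                     timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_filter until every filter is ACTIVE or CREATE FAILED.

    Returns a dict mapping each filter ARN to its final status. Filters that
    are still pending at the deadline are reported as 'TIMED OUT', which is
    a label of this sketch and not a Personalize status.
    """
    pending = set(filter_arns)
    results = {}
    deadline = time.time() + timeout_seconds
    while pending and time.time() < deadline:
        # Iterate over a copy because the set is modified inside the loop
        for arn in list(pending):
            response = personalize_client.describe_filter(filterArn=arn)
            status = response["filter"]["status"]
            if status in ("ACTIVE", "CREATE FAILED"):
                results[arn] = status
                pending.discard(arn)
        if pending:
            sleep(poll_seconds)
    for arn in pending:
        results[arn] = "TIMED OUT"
    return results
```

In this notebook, it could be called as `wait_for_filters(personalize, meta_filter_decade_arns + interaction_filter_arns + [genre_filter_arn])` before the filters are used in the next notebook.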

## Storing useful variables <a class="anchor" id="vars"></a>
[Back to top](#top)

Before exiting this notebook, run the following cells to save the version ARNs for use in the next notebook.

In [25]:
%store userpersonalization_solution_version_arn
%store sims_solution_version_arn
%store rerank_solution_version_arn
%store user_personalization_solution_arn
%store sims_solution_arn
%store rerank_solution_arn
%store sims_campaign_arn
%store userpersonalization_campaign_arn
%store rerank_campaign_arn
%store meta_filter_decade_arns
%store genre_filter_arn
%store interaction_filter_arns

Stored 'userpersonalization_solution_version_arn' (str)
Stored 'sims_solution_version_arn' (str)
Stored 'rerank_solution_version_arn' (str)
Stored 'user_personalization_solution_arn' (str)
Stored 'sims_solution_arn' (str)
Stored 'rerank_solution_arn' (str)


You're all set to move on to the last exploratory notebook: `03_Inference_Layer.ipynb`. Open it from the browser and you can start interacting with the Campaigns and getting recommendations!