In [None]:
%store -r
%store

## Create solutions

In [14]:
import time
from time import sleep
import json
import pandas as pd
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 20)         # Keep the output on one page
pd.set_option('display.max_colwidth', 200)  
import boto3

In [15]:
# Create boto3 clients for Amazon Personalize (training and deployment) and the Personalize runtime (inference):
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

In [16]:
recipes = personalize.list_recipes()['recipes']

In [17]:
recipes_df = pd.DataFrame(recipes)
recipes_df[['name','recipeArn']]

| | name | recipeArn |
|---|---|---|
| 0 | aws-hrnn | arn:aws:personalize:::recipe/aws-hrnn |
| 1 | aws-hrnn-coldstart | arn:aws:personalize:::recipe/aws-hrnn-coldstart |
| 2 | aws-hrnn-metadata | arn:aws:personalize:::recipe/aws-hrnn-metadata |
| 3 | aws-personalized-ranking | arn:aws:personalize:::recipe/aws-personalized-ranking |
| 4 | aws-popularity-count | arn:aws:personalize:::recipe/aws-popularity-count |
| 5 | aws-sims | arn:aws:personalize:::recipe/aws-sims |
| 6 | aws-user-personalization | arn:aws:personalize:::recipe/aws-user-personalization |


### User Personalization
The User Personalization recipe (aws-user-personalization) is optimized for all USER_PERSONALIZATION recommendation scenarios. When recommending items, it uses automatic item exploration.

With automatic exploration, Amazon Personalize automatically tests different item recommendations, learns from how users interact with these recommended items, and boosts recommendations for items that drive better engagement and conversion. This improves item discovery and engagement when you have a rapidly changing catalog, or when new items, such as news articles or promotions, are most relevant to users while they are fresh.

You can balance how much to explore (recommending items with fewer interactions, less data, or lower relevance more often) against how much to exploit (basing recommendations on what is already known about relevance). Amazon Personalize automatically adjusts future recommendations based on implicit user feedback.
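As far as we know, this explore/exploit balance is not set on the solution itself but when a User Personalization solution version is deployed as a campaign, through the `itemExplorationConfig` map inside the `campaignConfig` argument of `create_campaign`, with `explorationWeight` (0.0 to 1.0) and `explorationItemAgeCutOff` (in days) passed as strings. The helper below is a hypothetical sketch that only builds and validates that dict; the campaign name and `minProvisionedTPS` in the commented call are placeholders, so check the API reference before relying on the key names.

```python
def build_exploration_campaign_config(exploration_weight: float = 0.3,
                                      item_age_cutoff_days: int = 30) -> dict:
    """Build a campaignConfig dict for a User Personalization campaign (hypothetical helper).

    exploration_weight: 0.0 means no exploration (only the most relevant items);
        values closer to 1.0 recommend items with fewer interactions more often.
    item_age_cutoff_days: maximum item age, in days, for items considered for exploration.
    """
    if not 0.0 <= exploration_weight <= 1.0:
        raise ValueError("exploration_weight must be between 0.0 and 1.0")
    if item_age_cutoff_days <= 0:
        raise ValueError("item_age_cutoff_days must be positive")
    # The API expects string values in this map, so convert the numbers explicitly.
    return {
        "itemExplorationConfig": {
            "explorationWeight": str(exploration_weight),
            "explorationItemAgeCutOff": str(item_age_cutoff_days),
        }
    }


config = build_exploration_campaign_config(exploration_weight=0.5)
print(config)

# Later, when deploying (name and minProvisionedTPS are placeholders):
# personalize.create_campaign(
#     name="personalize-anime-userpersonalization-campaign",
#     solutionVersionArn=userpersonalization_solution_version_arn,
#     minProvisionedTPS=1,
#     campaignConfig=config,
# )
```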

First, select the recipe by looking up its ARN in the list of recipes above.
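Instead of copying the ARN by hand, you could look it up by name in the `recipes` list returned by `list_recipes`. `find_recipe_arn` below is a hypothetical helper; the two-entry sample list mirrors the shape of the output above so the sketch runs without AWS credentials.

```python
def find_recipe_arn(recipes: list, recipe_name: str) -> str:
    """Return the recipeArn of the recipe whose 'name' equals recipe_name."""
    for recipe in recipes:
        if recipe["name"] == recipe_name:
            return recipe["recipeArn"]
    raise KeyError(f"No recipe named {recipe_name!r}")


# Sample entries in the same shape as personalize.list_recipes()['recipes'].
sample_recipes = [
    {"name": "aws-sims", "recipeArn": "arn:aws:personalize:::recipe/aws-sims"},
    {"name": "aws-user-personalization",
     "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization"},
]
print(find_recipe_arn(sample_recipes, "aws-user-personalization"))

# In the notebook: find_recipe_arn(recipes, "aws-user-personalization")
```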

In [18]:
user_personalization_recipe_arn = 'arn:aws:personalize:::recipe/aws-user-personalization'

#### Create the solution

First, you create a solution using the recipe. Although you provide the dataset group ARN in this step, the model is not yet trained; think of the solution as an identifier rather than a trained model.

In [None]:
user_personalization_create_solution_response = personalize.create_solution(
    name = "personalize-anime-userpersonalization",
    datasetGroupArn = dataset_group_arn,
    recipeArn = user_personalization_recipe_arn
)

user_personalization_solution_arn = user_personalization_create_solution_response['solutionArn']
print(json.dumps(user_personalization_solution_arn, indent=2))

#### Create the solution version

Once you have a solution, you need to create a solution version, which is the step that actually trains the model. Training can take a while: at least 25 minutes, and around 90 minutes on average for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However, that loop would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook, where you will also find instructions for viewing the progress in the AWS console.
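For reference, the single-solution polling loop mentioned above could look like the sketch below. `wait_for_solution_version` is a hypothetical helper, not part of the SDK: `describe` is any callable returning a dict shaped like the `describe_solution_version` response, which lets the function be tried offline with a stub. The set of terminal statuses is our assumption based on the documented lifecycle; `CREATE FAILED` matches the check used later in this notebook.

```python
import time
from typing import Callable

# Statuses after which a solution version will not change any more (assumed).
TERMINAL_STATUSES = {"ACTIVE", "CREATE FAILED", "CREATE STOPPED"}


def wait_for_solution_version(describe: Callable[[str], dict],
                              solution_version_arn: str,
                              poll_seconds: float = 60,
                              timeout_seconds: float = 3 * 60 * 60) -> str:
    """Block until the solution version reaches a terminal status, then return that status."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = describe(solution_version_arn)["solutionVersion"]["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{solution_version_arn} did not finish within {timeout_seconds} seconds")


# Offline demonstration with a stubbed describe function (no AWS calls):
_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
print(wait_for_solution_version(lambda arn: {"solutionVersion": {"status": next(_statuses)}},
                                "arn:example", poll_seconds=0))

# With the real client, this would block the notebook until training finishes:
# wait_for_solution_version(
#     lambda arn: personalize.describe_solution_version(solutionVersionArn=arn),
#     userpersonalization_solution_version_arn,
# )
```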

In [None]:
userpersonalization_create_solution_version_response = personalize.create_solution_version(
    solutionArn = user_personalization_solution_arn
)
userpersonalization_solution_version_arn = userpersonalization_create_solution_version_response['solutionVersionArn']
print(json.dumps(userpersonalization_create_solution_version_response, indent=2, default=str))  # print the version response, not the solution response

### SIMS


SIMS is one of the oldest algorithms used within Amazon for recommendation systems. A core use case is when you have one item and want to recommend items that have been interacted with in similar ways across your entire user base. This means the results are not personalized per user. Because similarity is driven by the whole user base, the results can be dominated by popular items, so there is a hyperparameter you can tune to reduce the number of popular items in your results.

For our use case, using the Movielens data, let's assume we pick a particular movie. We can then use SIMS to recommend other movies based on the interaction behavior of the entire user base. The results are not personalized per user, but instead, differ depending on the movie we chose as our input.
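If SIMS returns too many popular items, its hyperparameters can be overridden when creating the solution through `solutionConfig['algorithmHyperParameters']`. The sketch below assumes the SIMS hyperparameter is named `popularity_discount_factor` with a range of 0.0 to 1.0; check the SIMS recipe documentation for its default and for which direction reduces popularity before tuning it. `build_sims_solution_request` is a hypothetical helper that only assembles the keyword arguments for `personalize.create_solution`; note that SIMS campaigns are queried by `itemId` rather than `userId`.

```python
def build_sims_solution_request(name: str, dataset_group_arn: str,
                                popularity_discount_factor: float = 0.5) -> dict:
    """Keyword arguments for personalize.create_solution with a SIMS hyperparameter override."""
    if not 0.0 <= popularity_discount_factor <= 1.0:
        raise ValueError("popularity_discount_factor must be between 0.0 and 1.0")
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": "arn:aws:personalize:::recipe/aws-sims",
        "solutionConfig": {
            "algorithmHyperParameters": {
                # Hyperparameter values are passed to the API as strings.
                "popularity_discount_factor": str(popularity_discount_factor),
            }
        },
    }


request = build_sims_solution_request("personalize-anime-sims-tuned",
                                      "arn:example:dataset-group", 0.3)
print(request["solutionConfig"])

# In the notebook (hypothetical solution name, real dataset_group_arn):
# personalize.create_solution(**build_sims_solution_request(
#     "personalize-anime-sims-tuned", dataset_group_arn, 0.3))
# After deployment, query by item rather than by user:
# personalize_runtime.get_recommendations(campaignArn=sims_campaign_arn, itemId="1", numResults=10)
```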

Just like last time, we start by selecting the recipe.

#### Create the solution

As with User Personalization, start by creating the solution. Although you provide the dataset group ARN in this step, the model is not yet trained; think of the solution as an identifier rather than a trained model.

In [21]:
SIMS_recipe_arn = "arn:aws:personalize:::recipe/aws-sims"

In [None]:
sims_create_solution_response = personalize.create_solution(
    name = "personalize-anime-sims",
    datasetGroupArn = dataset_group_arn,
    recipeArn = SIMS_recipe_arn
)

sims_solution_arn = sims_create_solution_response['solutionArn']
print(json.dumps(sims_create_solution_response, indent=2))

#### Create the solution version

In [None]:
sims_create_solution_version_response = personalize.create_solution_version(
    solutionArn = sims_solution_arn
)

sims_solution_version_arn = sims_create_solution_version_response['solutionVersionArn']
print(json.dumps(sims_create_solution_version_response, indent=2))

### Personalized Ranking

Personalized Ranking is an interesting application of HRNN. Instead of just recommending what is most probable for a given user, this algorithm takes in a user and a list of items, and returns those items ordered by their predicted relevance for that user. This is useful when you want to rank items within a category for which you have no item metadata to build a filter, or when you have a broad collection that you would like better ordered for a particular user.
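Once a campaign is deployed from this solution, it would be queried with `get_personalized_ranking` on the `personalize-runtime` client, passing a `userId` and an `inputList` of item IDs; the response's `personalizedRanking` list holds the items in ranked order, each with an `itemId` and typically a `score`. The sketch below only parses that response. The campaign ARN, user ID, item IDs, and the stub response (which stands in for a real call) are all hypothetical.

```python
def ranked_item_ids(ranking_response: dict) -> list:
    """Extract item IDs, in ranked order, from a GetPersonalizedRanking response."""
    return [entry["itemId"] for entry in ranking_response["personalizedRanking"]]


# Real call, once a campaign exists (rerank_campaign_arn is hypothetical):
# response = personalize_runtime.get_personalized_ranking(
#     campaignArn=rerank_campaign_arn,
#     userId="42",
#     inputList=["101", "205", "333"],
# )

# Stub in the same shape, so the sketch runs offline:
stub_response = {
    "personalizedRanking": [
        {"itemId": "205", "score": 0.6},
        {"itemId": "101", "score": 0.3},
        {"itemId": "333", "score": 0.1},
    ]
}
print(ranked_item_ids(stub_response))
```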


In [24]:
rerank_recipe_arn = "arn:aws:personalize:::recipe/aws-personalized-ranking"

In [None]:
rerank_create_solution_response = personalize.create_solution(
    name = "personalize-anime-rerank",
    datasetGroupArn = dataset_group_arn,
    recipeArn = rerank_recipe_arn
)

rerank_solution_arn = rerank_create_solution_response['solutionArn']
print(json.dumps(rerank_create_solution_response, indent=2))

In [None]:
rerank_create_solution_version_response = personalize.create_solution_version(
    solutionArn = rerank_solution_arn
)
rerank_solution_version_arn = rerank_create_solution_version_response['solutionVersionArn']
print(json.dumps(rerank_create_solution_version_response, indent=2))

### View solution creation status

As promised, here is how to view the status updates in the console:

* In another browser tab you should already have the AWS Console up from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `View dataset groups`.
* Click the name of your dataset group, most likely something with POC in the name.
* Click `Solutions and recipes`.
* You will now see a list of all of the solutions you created above, including a column with the status of the solution versions. Once the status is `Active`, your solution is ready to be reviewed and can be deployed.

Or simply run the cell below to keep track of the solution version creation status.

In [None]:
in_progress_solution_versions = [
    userpersonalization_solution_version_arn,
    sims_solution_version_arn,
    rerank_solution_version_arn
]

max_time = time.time() + 10*60*60 # 10 hours
while time.time() < max_time:
    # Iterate over a copy: removing items from the list being iterated would skip entries.
    for solution_version_arn in list(in_progress_solution_versions):
        version_response = personalize.describe_solution_version(
            solutionVersionArn = solution_version_arn
        )
        status = version_response["solutionVersion"]["status"]
        
        if status == "ACTIVE":
            print("Build succeeded for {}".format(solution_version_arn))
            in_progress_solution_versions.remove(solution_version_arn)
        elif status == "CREATE FAILED":
            print("Build failed for {}".format(solution_version_arn))
            in_progress_solution_versions.remove(solution_version_arn)
    
    # Stop once every solution version has finished (succeeded or failed).
    if not in_progress_solution_versions:
        break
    print("At least one solution build is still in progress")
    
    time.sleep(60)

## Evaluate solution versions <a class="anchor" id="eval"></a>
[Back to top](#top)

Training all the solutions in this notebook can take an hour or more. While training is in progress, we recommend taking the time to read up on the various algorithms (recipes) and their behavior in detail. This is also a good time to consider alternative ways of feeding the data into the system and what kind of results you expect to see.


We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.

You need to understand the following terms regarding evaluation in Personalize:

* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.
* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.
* *Query* refers to the internal equivalent of a GetRecommendations call.

The metrics produced by Personalize are:
* **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).
* **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
* **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [discounted cumulative gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG, so that NDCG lies between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
* **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.
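The definitions above can be made concrete with a small pure-Python sketch. This is our own illustration of the documented formulas (binary relevance, weight 1/log(1 + position) with position starting at 1), not Personalize's internal implementation, and the item IDs are made up. It uses log base 2; the base does not affect NDCG because it cancels in the ratio of DCG to ideal DCG.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Number of relevant items among the top k recommendations, divided by k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def reciprocal_rank_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 25) -> float:
    """1 / position of the first relevant item in the top k, or 0 if there is none."""
    for position, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]], k: int = 25) -> float:
    """Average reciprocal rank over all queries."""
    ranks = [reciprocal_rank_at_k(rec, rel, k) for rec, rel in queries]
    return sum(ranks) / len(ranks)


def ndcg_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG of the top k divided by the DCG of an ideally ordered list."""
    dcg = sum(1.0 / math.log2(1 + position)
              for position, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    # Ideal list: all relevant items (up to k of them) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(1 + position) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def coverage(all_recommendations: Iterable[Sequence[str]], catalog: Set[str]) -> float:
    """Unique recommended items across all queries, as a fraction of the catalog."""
    recommended_items = {item for recs in all_recommendations for item in recs}
    return len(recommended_items & catalog) / len(catalog)


recommended = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(precision_at_k(recommended, relevant, 2))
print(reciprocal_rank_at_k(recommended, relevant))
print(round(ndcg_at_k(recommended, relevant, 4), 4))
```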

Let's take a look at the evaluation metrics for each of the solutions produced in this notebook. *Please note, your results might differ from the results described in the text of this notebook, due to the quality of the Movielens dataset.* 

### User Personalization metrics

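First, let's retrieve the metrics for the user personalization solution; the cell below also fetches the SIMS and Personalized Ranking metrics so that the three solutions can be compared.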

In [None]:
user_personalization_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = userpersonalization_solution_version_arn
)['metrics']
sims_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = sims_solution_version_arn
)['metrics']

rerank_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = rerank_solution_version_arn
)['metrics']

user_personalization_solution_metrics_response['solution'] = 'user personalization'
sims_solution_metrics_response['solution'] = 'similar items'
rerank_solution_metrics_response['solution'] = 're rank'


In [55]:
metrics_df = pd.DataFrame([user_personalization_solution_metrics_response, sims_solution_metrics_response,
                           rerank_solution_metrics_response]).set_index('solution')

# Shorten the metric names for display: MRR = mean reciprocal rank,
# NDCG = normalized discounted cumulative gain, P = precision; the suffix is K.
metrics_df.rename(columns = {'mean_reciprocal_rank_at_25': 'MRR_25',
                             'normalized_discounted_cumulative_gain_at_5': 'NDCG_5',
                             'normalized_discounted_cumulative_gain_at_10': 'NDCG_10',
                             'normalized_discounted_cumulative_gain_at_25': 'NDCG_25',
                             'precision_at_5': 'P_5', 'precision_at_10': 'P_10',
                             'precision_at_25': 'P_25'}, inplace = True)
metrics_df

| solution | coverage | MRR_25 | NDCG_10 | NDCG_25 | NDCG_5 | P_10 | P_25 | P_5 |
|---|---|---|---|---|---|---|---|---|
| user personalization | 0.1782 | 0.575 | 0.4652 | 0.5642 | 0.4054 | 0.2049 | 0.1308 | 0.277 |
| similar items | 0.4498 | 0.1822 | 0.19 | 0.258 | 0.1494 | 0.0535 | 0.0408 | 0.0644 |
| re rank | 0.1899 | 0.5837 | 0.4717 | 0.5708 | 0.4129 | 0.2079 | 0.1321 | 0.282 |
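To see at a glance which solution scores highest on each metric, pandas' `DataFrame.idxmax` returns the index label of the maximum in each column; in the notebook that is simply `metrics_df.idxmax()`. The sketch below rebuilds a subset of the table above from its reported values so it runs on its own; your own run will produce different numbers. Keep in mind that SIMS answers an item-to-item question, so a higher ranking score is not automatically a better model for your use case.

```python
import pandas as pd

# A subset of the values reported in the metrics table above.
reported = pd.DataFrame(
    {
        "coverage": [0.1782, 0.4498, 0.1899],
        "MRR_25": [0.575, 0.1822, 0.5837],
        "NDCG_5": [0.4054, 0.1494, 0.4129],
        "P_5": [0.277, 0.0644, 0.282],
    },
    index=pd.Index(["user personalization", "similar items", "re rank"], name="solution"),
)

# For each metric (column), the solution (row label) with the highest value.
best_per_metric = reported.idxmax()
print(best_per_metric)
```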


In [28]:
%store userpersonalization_solution_version_arn
%store sims_solution_version_arn
%store rerank_solution_version_arn
%store user_personalization_solution_arn
%store sims_solution_arn
%store rerank_solution_arn

Stored 'userpersonalization_solution_version_arn' (str)
Stored 'sims_solution_version_arn' (str)
Stored 'rerank_solution_version_arn' (str)
Stored 'user_personalization_solution_arn' (str)
Stored 'sims_solution_arn' (str)
Stored 'rerank_solution_arn' (str)


In [None]:
%store