# Validating Data <a class="anchor" id="top"></a>

In this notebook, you will upload your data to S3 and begin using it with Amazon Personalize.

1. [Configure an S3 bucket and an IAM role](#bucket_role)
1. [Create Dataset Group](#group_dataset)
1. [Create the Interactions Schema](#interact_schema)
1. [Create the Items (Movies) Schema](#items_schema)
1. [Create the Users Schema](#users_schema)
1. [Import the Interactions Data](#import_interactions)
1. [Import the Item Metadata](#import_items)
1. [Import the User Metadata](#import_users)
1. [Create Domain Recommenders](#recommenders)
1. [Create Solutions](#solutions)
1. [Evaluate Solutions](#eval)
1. [Using Evaluation Metrics](#use)
1. [Deploy a Campaign](#deploy)
1. [Create Filters](#interact)
1. [Storing useful variables](#wrapup)


## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

In Amazon Personalize, you start by creating a dataset group, which is a container for Amazon Personalize components. Your dataset group can be one of the following:

A Domain dataset group, where you create preconfigured resources for different business domains and use cases, such as getting recommendations for similar videos (VIDEO_ON_DEMAND domain) or best selling items (ECOMMERCE domain). You choose your business domain, import your data, and create recommenders. You use recommenders in your application to get recommendations.

Use a [Domain dataset group](https://docs.aws.amazon.com/personalize/latest/dg/domain-dataset-groups.html) if you have a video on demand or e-commerce application and want Amazon Personalize to find the best configurations for your use cases. If you start with a Domain dataset group, you can also add custom resources such as solutions with solution versions trained with recipes for custom use cases.

A [Custom dataset group](https://docs.aws.amazon.com/personalize/latest/dg/custom-dataset-groups.html), where you create configurable resources for custom use cases and batch recommendation workflows. You choose a recipe, train a solution version (model), and deploy the solution version with a campaign. You use a campaign in your application to get recommendations.

Use a Custom dataset group if you don't have a video on demand or e-commerce application or want to configure and manage only custom resources, or want to get recommendations in a batch workflow. If you start with a Custom dataset group, you can't associate it with a domain later. Instead, create a new Domain dataset group.

You can create and manage Domain dataset groups and Custom dataset groups with the AWS console, the AWS Command Line Interface (AWS CLI), or programmatically with the AWS SDKs. For the purposes of this workshop we will be using the AWS SDK.

In [None]:
%store -r

## Prepare the Interactions data <a class="anchor" id="prepare_interactions"></a>
[Back to top](#top)

The next thing to be done is to load the data and confirm the data is in a good state.

Python ships with a broad collection of libraries. We need to import some of those, along with the installed ones that help us: [boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python) and [Pandas](https://pandas.pydata.org/)/[NumPy](https://numpy.org/), which are core data science tools.

In [None]:
import time
from time import sleep
import json
from datetime import datetime
import boto3
import pandas as pd
import numpy as np
import uuid
import random
import botocore
from botocore.exceptions import ClientError

## Configure an S3 bucket and an IAM role <a class="anchor" id="bucket_role"></a>
[Back to top](#top)

So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS volume attached to the instance running this Jupyter notebook. To import this data into Amazon Personalize, we will upload it to an S3 bucket.

By default, the Personalize service does not have permission to access data in an S3 bucket in our account. To grant the Personalize service access to read our CSVs, we need to set a bucket policy and create an IAM role that the Amazon Personalize service will assume. Let's set all of that up.

Use the metadata stored on the instance underlying this Amazon SageMaker notebook to determine the region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, simply define the region as a string below. The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we create in this notebook.

In [None]:
with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:
    data = json.load(notebook_info)
    resource_arn = data['ResourceArn']
    region = resource_arn.split(':')[3]
print('region:', region)
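
The lookup above relies on the ARN layout `arn:partition:service:region:account-id:resource`, where the region is the fourth colon-separated field. As a minimal sketch, the helper below does the same split with a basic sanity check. The name `region_from_arn` is hypothetical, and the ARN is a made-up example with a placeholder account ID.

```python
def region_from_arn(arn):
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[3]


# Made-up notebook ARN; the account ID is a placeholder.
example_arn = "arn:aws:sagemaker:us-west-2:123456789012:notebook-instance/example"
print(region_from_arn(example_arn))
```

Outside of Amazon SageMaker the metadata file does not exist, so assigning the region string directly remains the simpler option.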

Amazon S3 bucket names are globally unique. To create a unique bucket name, the code below combines your AWS account number, the region, and the string `personalizepocvod`. It then creates a bucket with this name in the region discovered in the previous cell.

In [None]:
s3 = boto3.client('s3')
account_id = boto3.client('sts').get_caller_identity().get('Account')
bucket_name = account_id + "-" + region + "-" + "personalizepocvod"
print('bucket_name:', bucket_name)
try: 
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
            )
except s3.exceptions.BucketAlreadyOwnedByYou:
    print("Bucket already exists. Using bucket", bucket_name)
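
Because the bucket name is assembled from several parts, it can be worth checking it before calling `create_bucket`. The sketch below is a simplified check of the general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). It does not cover every rule, and `looks_like_valid_bucket_name` is a hypothetical helper, not part of boto3.

```python
import re

# Simplified pattern: 3-63 characters, lowercase letters, digits, dots, hyphens,
# with a letter or digit at both ends. Reserved prefixes and IP-style names
# are deliberately not checked here.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name):
    if not _BUCKET_NAME_RE.match(name):
        return False
    # Two adjacent dots are not allowed.
    return ".." not in name


# Placeholder account ID and region, joined the same way as in the cell above.
candidate = "123456789012" + "-" + "us-east-1" + "-" + "personalizepocvod"
print(candidate, looks_like_valid_bucket_name(candidate))
```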

### Set the S3 bucket policy
Amazon Personalize needs to be able to read the contents of your S3 bucket. Add a bucket policy that allows this.

In [None]:
policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:*Object",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}

s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))

### Create an IAM role

Amazon Personalize needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. Let's create an IAM role and attach the required policies to it. The code below attaches very permissive policies; please use more restrictive policies for any production application.

In [None]:
iam = boto3.client("iam")

role_name = account_id+"-PersonalizeS3-Immersion-Day"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "personalize.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

try:
    create_role_response = iam.create_role(
        RoleName = role_name,
        AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
    );
    
except iam.exceptions.EntityAlreadyExistsException as e:
    print('Warning: role already exists:', e)
    create_role_response = iam.get_role(
        RoleName = role_name
    );

role_arn = create_role_response["Role"]["Arn"]
    
print('IAM Role: {}'.format(role_arn))
    
attach_response = iam.attach_role_policy(
    RoleName = role_name,
    PolicyArn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
);

# Pause to allow role to be fully consistent
time.sleep(30)
print('Done.')
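
As noted above, `AmazonS3FullAccess` is far broader than this workshop needs. A minimal sketch of a narrower identity-based policy is shown below, assuming the role only has to read import files from a single bucket. The function name and the bucket name `example-bucket` are placeholders, and other workflows (for example, writing batch output) would need additional actions.

```python
import json


def build_scoped_s3_policy(bucket_name):
    """Identity-based policy limited to reading one bucket (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


policy_document = json.dumps(build_scoped_s3_policy("example-bucket"), indent=2)
print(policy_document)
```

A document like this could be attached as an inline policy with `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=policy_document)` instead of attaching the managed policy.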

### Upload data to S3

Now that your Amazon S3 bucket has been created, upload the CSV files containing the interactions, items, and users data.

In [None]:
interactions_file_path = data_dir + "/" + interactions_filename
boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_file_path)

items_file_path = data_dir + "/" + items_filename
boto3.Session().resource('s3').Bucket(bucket_name).Object(items_filename).upload_file(items_file_path)

users_file_path = data_dir + "/" + users_filename
boto3.Session().resource('s3').Bucket(bucket_name).Object(users_filename).upload_file(users_file_path)

## Create the Dataset Group <a class="anchor" id="group_dataset"></a>
[Back to top](#top)

The highest level of isolation and abstraction with Amazon Personalize is a *dataset group*. Information stored within one of these dataset groups has no impact on any other dataset group or models created from one - they are completely isolated. This allows you to run many experiments and is part of how we keep your models private and fully trained only on your data. 

Before importing the data prepared earlier, there needs to be a dataset group and a dataset added to it that handles the interactions.

Dataset groups can house the following types of information:

* User-item-interactions
* Event streams (real-time interactions)
* User metadata
* Item metadata

We need to create the dataset group that will contain our three datasets.

### Create the Dataset Group

The following cell will create a new dataset group with the name `workshop-personalize-poc-movielens`.

In [None]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
print("We can communicate with Personalize!")

In [None]:
create_dataset_group_response = personalize.create_dataset_group(
    name = "workshop-personalize-poc-movielens",
    domain='VIDEO_ON_DEMAND'
)

workshop_dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))

print(f'DatasetGroupArn = {workshop_dataset_group_arn}')

#### Wait for Dataset Group to Have ACTIVE Status 

Before we can use the dataset group in any of the steps below, it must be active. This can take a minute or two. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the dataset group every 60 seconds, up to a maximum of 3 hours.

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_group_response = personalize.describe_dataset_group(
        datasetGroupArn = workshop_dataset_group_arn
    )
    status = describe_dataset_group_response["datasetGroup"]["status"]
    print("DatasetGroup: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
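
The same poll-until-ACTIVE pattern recurs throughout this notebook. As a sketch only, it could be factored into one helper. `wait_for_status` and its parameters are hypothetical names, and the demonstration uses a fake status source so it runs without calling Personalize.

```python
import time


def wait_for_status(get_status, label, timeout_s=3 * 60 * 60, interval_s=60,
                    sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns ACTIVE or CREATE FAILED, or time runs out."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = get_status()
        print(f"{label}: {status}")
        if status == "ACTIVE":
            return status
        if status == "CREATE FAILED":
            raise RuntimeError(f"{label} failed to create")
        sleep(interval_s)
    raise TimeoutError(f"{label} did not become ACTIVE within {timeout_s} seconds")


# Offline demonstration: a fake status sequence and a no-op sleep.
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
result = wait_for_status(lambda: next(fake_statuses), "DatasetGroup",
                         sleep=lambda seconds: None)
```

In the notebook, `get_status` could be a lambda that calls `personalize.describe_dataset_group` and returns `["datasetGroup"]["status"]`. Raising on failure, rather than silently leaving the loop, makes a failed creation harder to miss.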

Now that you have a dataset group, you can create a dataset for the interaction data.

## Create the Interactions Schema <a class="anchor" id="interact_schema"></a>
[Back to top](#top)

Now that we've loaded and prepared our three datasets, we need to configure the Amazon Personalize service to understand our data so that it can be used to train models for generating recommendations. Amazon Personalize requires a schema for each dataset so it can map the columns in our CSVs to fields for model training. Each schema is declared in JSON using the [Apache Avro](https://avro.apache.org/) format.

First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

Here, you will create a schema for interactions data, which requires the `USER_ID`, `ITEM_ID`, and `TIMESTAMP` fields. These must be defined in the same order in the schema as they appear in the dataset.

The `TIMESTAMP` represents when the user interacted with an item and must be expressed in Unix epoch time (seconds). This dataset also has an `EVENT_TYPE` column.
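
Since `TIMESTAMP` must be in seconds, a common pitfall is supplying milliseconds instead. The sketch below shows one way to produce epoch seconds from a timezone-aware `datetime`, plus a rough heuristic for spotting millisecond values. Both function names are hypothetical, and the digit-count heuristic is an assumption, not a Personalize rule.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("Use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp())


def looks_like_milliseconds(value):
    """Heuristic: epoch values with 13 or more digits are probably milliseconds."""
    return abs(value) >= 10**12


watched_at = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(to_unix_seconds(watched_at))
print(looks_like_milliseconds(to_unix_seconds(watched_at) * 1000))
```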

In [None]:
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE", # "Watch", "Click", etc.
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

try:
    create_schema_response = personalize.create_schema(
        name = "workshop-personalize-poc-movielens-interactions-schema",
        schema = json.dumps(interactions_schema),
        domain='VIDEO_ON_DEMAND'
    )
    print(json.dumps(create_schema_response, indent=2))
    workshop_interactions_schema_arn = create_schema_response['schemaArn']
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema.')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "workshop-personalize-poc-movielens-interactions-schema":
            workshop_interactions_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {workshop_interactions_schema_arn}")
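
Before importing, it can help to compare the CSV header with the schema fields. The sketch below reports missing or unexpected columns and whether the order matches. `compare_columns_to_schema` is a hypothetical helper, the schema is a trimmed copy of the one above so the example runs on its own, and the column list is made up to show an ordering mismatch.

```python
def compare_columns_to_schema(columns, schema):
    """Return (missing, unexpected, same_order) for CSV columns versus schema fields."""
    schema_fields = [field["name"] for field in schema["fields"]]
    missing = [name for name in schema_fields if name not in columns]
    unexpected = [name for name in columns if name not in schema_fields]
    same_order = list(columns) == schema_fields
    return missing, unexpected, same_order


example_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

# Made-up header in which the last two columns are swapped.
csv_columns = ["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE"]
print(compare_columns_to_schema(csv_columns, example_schema))
```

With pandas, the header could be read cheaply with `pd.read_csv(path, nrows=0).columns`.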

### Create the Interactions Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset that uses the schema.

In [None]:
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
    name = "workshop-personalize-poc-movielens-interactions",
    datasetType = dataset_type,
    datasetGroupArn = workshop_dataset_group_arn,
    schemaArn = workshop_interactions_schema_arn
)

workshop_interactions_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

## Create the Items (Movies) Schema<a class="anchor" id="items_schema"></a>
[Back to top](#top)

First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

Here, you will create a schema for item metadata, defining the `ITEM_ID`, `GENRES`, `YEAR`, and `CREATION_TIMESTAMP` fields. These must be defined in the same order in the schema as they appear in the dataset.

In [None]:
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "GENRES",
            "type": "string",
            "categorical": True
        },
        {
            "name": "YEAR",
            "type": "int"
        },
        {
            "name": "CREATION_TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}
    
try:
    create_schema_response = personalize.create_schema(
        name = "workshop-personalize-poc-movielens-items-schema",
        schema = json.dumps(items_schema),
        domain='VIDEO_ON_DEMAND'
    )
    workshop_items_schema_arn = create_schema_response['schemaArn']
    print(json.dumps(create_schema_response, indent=2))
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema.')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "workshop-personalize-poc-movielens-items-schema":
            workshop_items_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {workshop_items_schema_arn}")
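
A categorical string field such as `GENRES` can hold several values for one item, separated by a vertical bar (`|`), which is also how MovieLens lists genres. As a small sketch, assuming genres are available as a Python list, the hypothetical helper below builds such a field value while dropping blanks and duplicates.

```python
def join_categories(values, separator="|"):
    """Join category labels into one field value, dropping blanks and duplicates."""
    seen = []
    for value in values:
        label = value.strip()
        if label and label not in seen:
            seen.append(label)
    return separator.join(seen)


print(join_categories(["Action", "Comedy", "Action", " "]))
```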


### Create the Items Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset that uses the schema.

In [None]:
dataset_type = "ITEMS"
create_dataset_response = personalize.create_dataset(
    name = "workshop-personalize-poc-movielens-items",
    datasetType = dataset_type,
    datasetGroupArn = workshop_dataset_group_arn,
    schemaArn = workshop_items_schema_arn
)

workshop_items_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

## Create the Users Schema<a class="anchor" id="users_schema"></a>
[Back to top](#top)

First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

Here, you will create a schema for user data, which requires the `USER_ID` field and an additional metadata field, in this case `GENDER`. These must be defined in the same order in the schema as they appear in the dataset.

In [None]:
users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "GENDER",
            "type": "string",
            "categorical": True
        }
    ],
    "version": "1.0"
}
    
try:
    create_schema_response = personalize.create_schema(
        name = "workshop-personalize-poc-movielens-users-schema",
        schema = json.dumps(users_schema),
        domain='VIDEO_ON_DEMAND'
    )
    print(json.dumps(create_schema_response, indent=2))
    workshop_users_schema_arn = create_schema_response['schemaArn']
except personalize.exceptions.ResourceAlreadyExistsException:
    print('You already created this schema.')
    schemas = personalize.list_schemas(maxResults=100)['schemas']
    for schema_response in schemas:
        if schema_response['name'] == "workshop-personalize-poc-movielens-users-schema":
            workshop_users_schema_arn = schema_response['schemaArn']
            print(f"Using existing schema: {workshop_users_schema_arn}")

### Create the Users Dataset

With a schema created, you can create a dataset within the dataset group. Note that this does not load the data yet; it only creates an empty dataset that uses the schema.

In [None]:
dataset_type = "USERS"
create_dataset_response = personalize.create_dataset(
    name = "workshop-personalize-poc-movielens-users",
    datasetType = dataset_type,
    datasetGroupArn = workshop_dataset_group_arn,
    schemaArn = workshop_users_schema_arn
)

workshop_users_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))

Let's wait until all the datasets have been created.

In [None]:
%%time

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_interactions_dataset_arn
    )
    status =  describe_dataset_response["dataset"]['status']
    print("Interactions Dataset: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
while time.time() < max_time:
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_items_dataset_arn
    )
    status =  describe_dataset_response["dataset"]['status']
    print("Items Dataset: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
while time.time() < max_time:
    describe_dataset_response = personalize.describe_dataset(
        datasetArn = workshop_users_dataset_arn
    )
    status =  describe_dataset_response["dataset"]['status']
    print("Users Dataset: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

## Import the Interactions Data <a class="anchor" id="import_interactions"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the interactions data from the S3 bucket into the Amazon Personalize dataset. 

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "workshop-personalize-poc-interactions-import",
    datasetArn = workshop_interactions_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket_name, interactions_filename)
    },
    roleArn = role_arn
)

workshop_interactions_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

## Import the Item Metadata <a class="anchor" id="import_items"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the item data from the S3 bucket into the Amazon Personalize dataset.

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "workshop-personalize-poc-items-import",
    datasetArn = workshop_items_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket_name, items_filename)
    },
    roleArn = role_arn
)

workshop_items_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))

## Import the User Metadata <a class="anchor" id="import_users"></a>
[Back to top](#top)

Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the user data from the S3 bucket into the Amazon Personalize dataset.

In [None]:
create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "workshop-personalize-poc-users-import",
    datasetArn = workshop_users_dataset_arn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket_name, users_filename)
    },
    roleArn = role_arn
)

workshop_users_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))
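
The three import cells differ only in the job name, dataset ARN, and file name. As a sketch, the request arguments could be built in one place. `build_import_job_request` is a hypothetical helper, and the ARNs, bucket, and file names below are placeholders so the example runs without AWS access.

```python
def build_import_job_request(job_name, dataset_arn, bucket, key, role):
    """Assemble keyword arguments in the shape used by the import cells above."""
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "dataSource": {"dataLocation": f"s3://{bucket}/{key}"},
        "roleArn": role,
    }


# Placeholder values only; substitute the real ARNs and file names.
example_datasets = {
    "interactions": ("arn:example:dataset/INTERACTIONS", "interactions.csv"),
    "items": ("arn:example:dataset/ITEMS", "items.csv"),
    "users": ("arn:example:dataset/USERS", "users.csv"),
}

import_requests = [
    build_import_job_request(
        job_name=f"workshop-personalize-poc-{kind}-import",
        dataset_arn=dataset_arn,
        bucket="example-bucket",
        key=filename,
        role="arn:example:role/placeholder",
    )
    for kind, (dataset_arn, filename) in example_datasets.items()
]

for request in import_requests:
    print(request["jobName"], "->", request["dataSource"]["dataLocation"])
```

Each dictionary could then be passed along as `personalize.create_dataset_import_job(**request)`.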

Before we can use the datasets, the import jobs must be active. Execute the cell below and wait for it to show the ACTIVE status for each job. It checks the status of each import job every minute, up to a maximum of 6 hours.

Importing the data can take some time, depending on the size of the dataset. In this workshop, the data import jobs should take around 15 minutes. While you're waiting, you can learn more about Datasets and Schemas in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).

In [None]:
%%time

max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_interactions_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("Interactions DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_items_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("Items DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
while time.time() < max_time:
    describe_dataset_import_job_response = personalize.describe_dataset_import_job(
        datasetImportJobArn = workshop_users_dataset_import_job_arn
    )
    status = describe_dataset_import_job_response["datasetImportJob"]['status']
    print("Users DatasetImportJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

### Ready... Set... Train!

Now that the data is imported and ready for use, we will create Video on Demand Domain Recommenders for the following use cases:

1. [More like X](https://docs.aws.amazon.com/personalize/latest/dg/VIDEO_ON_DEMAND-use-cases.html#more-like-y-use-case): recommendations for movies that are similar to a movie that you specify. With this use case, Amazon Personalize automatically filters movies the user watched based on the userId that you specify and Watch events.

1. [Top picks for you](https://docs.aws.amazon.com/personalize/latest/dg/VIDEO_ON_DEMAND-use-cases.html#top-picks-use-case): personalized content recommendations for a user that you specify. With this use case, Amazon Personalize automatically filters videos the user watched based on the userId that you specify and Watch events.

We will also create a custom solution and solution versions for the following use case:

3. [Personalized-Ranking](https://docs.aws.amazon.com/personalize/latest/dg/working-with-predefined-recipes.html): will be used to rerank a list of movies.

![Workflow](images/image2.png)


## Create Domain Recommenders <a class="anchor" id="recommenders"></a>
[Back to top](#top)


We'll start with pre-configured VIDEO_ON_DEMAND Recommenders that match some of our core use cases. Each domain has different use cases. When you create a recommender you create it for a specific use case, and each use case has different requirements for getting recommendations.

Let us look at the Recommenders supported for this domain:

In [None]:
# Page through list_recipes until no nextToken is returned
display_available_recipes = []
list_recipes_args = {'domain': 'VIDEO_ON_DEMAND'}
while True:
    available_recipes = personalize.list_recipes(**list_recipes_args)
    display_available_recipes += available_recipes['recipes']
    if 'nextToken' not in available_recipes:
        break
    list_recipes_args['nextToken'] = available_recipes['nextToken']
display(display_available_recipes)

[More use cases per domain](https://docs.aws.amazon.com/personalize/latest/dg/domain-use-cases.html).

### Create a "More like X" Recommender

We are going to create a recommender of the type "More like X". This type of recommender offers recommendations for videos that are similar to a video a user watched. With this use case, Amazon Personalize automatically filters videos the user watched based on the userId specified in the `get_recommendations` call. 

In [None]:
create_recommender_response = personalize.create_recommender(
  name = 'workshop-more-like-x',
  recipeArn = 'arn:aws:personalize:::recipe/aws-vod-more-like-x',
  datasetGroupArn = workshop_dataset_group_arn
)
workshop_recommender_more_like_x_arn = create_recommender_response["recommenderArn"]
print (json.dumps(create_recommender_response))

### Create a "Top picks for you" Recommender

We are going to create a second recommender of the type "Top picks for you". This type of recommender offers personalized streaming content recommendations for a user that you specify. With this use case, Amazon Personalize automatically filters videos the user watched based on the userId that you specify and Watch events.


In [None]:
create_recommender_response = personalize.create_recommender(
  name = 'workshop_top_picks_for_you',
  recipeArn = 'arn:aws:personalize:::recipe/aws-vod-top-picks',
  datasetGroupArn = workshop_dataset_group_arn
)
workshop_recommender_top_picks_arn = create_recommender_response["recommenderArn"]
print (json.dumps(create_recommender_response))

## Create Solutions <a class="anchor" id="solutions"></a>
[Back to top](#top)

Some use cases require a custom implementation. 

In Amazon Personalize, a specific variation of an algorithm is called a recipe. Different recipes are suitable for different situations. A trained model is called a solution, and each solution can have many versions, each reflecting the data that was available when the model was trained.

Let's look at all available recipes that are not of a specific domain and can be used to create custom solutions. 

In [None]:
# Page through list_recipes until no nextToken is returned
display_available_recipes = []
list_recipes_args = {}
while True:
    available_recipes = personalize.list_recipes(**list_recipes_args)
    display_available_recipes += available_recipes['recipes']
    if 'nextToken' not in available_recipes:
        break
    list_recipes_args['nextToken'] = available_recipes['nextToken']

display([recipe for recipe in display_available_recipes if 'domain' not in recipe])
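
Both recipe listings follow `nextToken` to fetch further pages. As a sketch, that paging could be wrapped in a small reusable function. `collect_all` is a hypothetical helper, and the demonstration uses a fake two-page response in place of `personalize.list_recipes` so it runs offline.

```python
def collect_all(fetch_page, items_key, **kwargs):
    """Call fetch_page repeatedly, following nextToken, and gather the items."""
    items = []
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["nextToken"] = token
        page = fetch_page(**params)
        items.extend(page.get(items_key, []))
        token = page.get("nextToken")
        if not token:
            return items


# Fake paged responses standing in for a real list call.
_fake_pages = {
    None: {"recipes": [{"name": "recipe-a"}], "nextToken": "page-2"},
    "page-2": {"recipes": [{"name": "recipe-b"}]},
}


def fake_list_recipes(nextToken=None):
    return _fake_pages[nextToken]


all_recipes = collect_all(fake_list_recipes, "recipes")
print([recipe["name"] for recipe in all_recipes])
```

Unlike a fixed two-call approach, this handles any number of pages, including a single page with no `nextToken`.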


We want to rank a list of items for a specific user. To implement this use case, we will create a custom solution using the Personalized-Ranking recipe.

The Personalized-Ranking recipe generates personalized rankings of items: a list of recommended items that are re-ranked for a specific user, in order of predicted interest level. This is useful if you have a collection of ordered items, such as search results, promotions, curated lists, or anything you cannot easily categorize, and you want to provide a personalized re-ranking for each of your users.

This custom solution will use the same datasets that we already imported, so all we need to do is create a solution and solution version for this recipe.


### Personalized Ranking

Personalized Ranking is an interesting application of HRNN. Instead of just recommending what is most probable for the user in question, this algorithm takes in a list of items as well as a user. The items are then returned in order of most probable relevance for the user. This is useful for filtering on unique categories for which you do not have the item metadata needed to create a filter, or when you have a broad collection that you would like better ordered for a particular user.

For our use case, using the MovieLens data, we could imagine that a Video on Demand application may want to create a shelf of comic book movies, or seasonal movies. We can generate these lists based on metadata we have. We would use personalized ranking to re-order the list of movies for each user.
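
To make the idea of re-ranking concrete, here is a toy illustration. It is not how the recipe computes relevance: the scores are made up, and `rerank` is a hypothetical function that simply reorders a curated shelf by a per-user score.

```python
def rerank(item_ids, scores):
    """Order item_ids by descending score; unscored items keep their order at the end."""
    scored = [item for item in item_ids if item in scores]
    unscored = [item for item in item_ids if item not in scores]
    return sorted(scored, key=lambda item: scores[item], reverse=True) + unscored


# A curated shelf and made-up relevance scores for one user.
shelf = ["movie-1", "movie-2", "movie-3", "movie-4"]
hypothetical_scores = {"movie-1": 0.10, "movie-2": 0.55, "movie-3": 0.35}
print(rerank(shelf, hypothetical_scores))
```

The shelf contents stay the same for every user; only the order changes according to that user's scores.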

We start by selecting the recipe.


In [None]:
workshop_rerank_recipe_arn = "arn:aws:personalize:::recipe/aws-personalized-ranking"

### Create the solution

First you create a solution using the recipe. Although you provide the dataset group ARN in this step, the model is not yet trained. Think of the solution as an identifier rather than a trained model.

In [None]:
rerank_create_solution_response = personalize.create_solution(
    name = "workshop_personalize-poc-rerank",
    datasetGroupArn = workshop_dataset_group_arn,
    recipeArn = workshop_rerank_recipe_arn
)

workshop_rerank_solution_arn = rerank_create_solution_response['solutionArn']
print(json.dumps(rerank_create_solution_response, indent=2))

### Create the solution version

Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete: at least 25 minutes, and about 35 minutes on average for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed.

In [None]:
rerank_create_solution_version_response = personalize.create_solution_version(
    solutionArn = workshop_rerank_solution_arn
)

In [None]:
workshop_rerank_solution_version_arn = rerank_create_solution_version_response['solutionVersionArn']
print(json.dumps(rerank_create_solution_version_response, indent=2))

### View solution and Recommender creation status

To view the status updates in the console:

* In another browser tab you should already have the AWS Console up from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `Dataset groups`.
* Click the name of your dataset group. If you did not change it, it is "personalize-poc-movielens".
* Click `Recommenders`.
* You will see a list of the two recommenders you created above, including a column with the status of the recommender. Once it is `Active`, your recommender is ready.
* Click on `Custom Resources`. This opens the list of custom resources that you have created.
* Click on `Solutions and Recipes` to see your re-ranking solution. If you click on `workshop_personalize-poc-rerank`, you can see the status of its solution versions. Once a solution version is `Active`, it is ready to be reviewed and can be deployed.

Or simply run the cell below to keep track of the recommenders and solution version creation status.

In [None]:
max_time = time.time() + 10*60*60 # 10 hours
while time.time() < max_time:

    # Recommender more_like_x
    version_response = personalize.describe_recommender(
        recommenderArn = workshop_recommender_more_like_x_arn
    )
    status_more_like_x = version_response["recommender"]["status"]

    if status_more_like_x == "ACTIVE":
        print("Build succeeded for {}".format(workshop_recommender_more_like_x_arn))
        
    elif status_more_like_x == "CREATE FAILED":
        print("Build failed for {}".format(workshop_recommender_more_like_x_arn))
        break

    if not status_more_like_x == "ACTIVE":
        print("The recommender more_like_x build is still in progress")
    else:
        print("The recommender more_like_x is ACTIVE")

    # Recommender top_picks_for_you
    version_response = personalize.describe_recommender(
        recommenderArn = workshop_recommender_top_picks_arn
    )
    status_top_picks = version_response["recommender"]["status"]

    if status_top_picks == "ACTIVE":
        print("Build succeeded for {}".format(workshop_recommender_top_picks_arn))
    elif status_top_picks == "CREATE FAILED":
        print("Build failed for {}".format(workshop_recommender_top_picks_arn))
        break

    if not status_top_picks == "ACTIVE":
        print("The recommender top_picks build is still in progress")
    else:
        print("The recommender top_picks is ACTIVE")
        
    # Reranking Solution 
    version_response = personalize.describe_solution_version(
        solutionVersionArn = workshop_rerank_solution_version_arn
    )
    status_rerank_solution = version_response["solutionVersion"]["status"]

    if status_rerank_solution == "ACTIVE":
        print("Build succeeded for {}".format(workshop_rerank_solution_version_arn))
        
    elif status_rerank_solution == "CREATE FAILED":
        print("Build failed for {}".format(workshop_rerank_solution_version_arn))
        break

    if not status_rerank_solution == "ACTIVE":
        print("Rerank Solution Version build is still in progress")
    else:
        print("The Rerank solution is ACTIVE")
        
    if status_more_like_x == "ACTIVE" and status_top_picks == 'ACTIVE' and status_rerank_solution == "ACTIVE":
        break

    print()
    time.sleep(60)

## Deploy a Campaign <a class="anchor" id="deploy"></a>
[Back to top](#top)

Once a solution version is created, you can get recommendations from it and get a feel for its overall behavior.

For real-time recommendations, after you prepare and import data and create a solution, you are ready to deploy your solution version to generate recommendations. You deploy a solution version by creating an Amazon Personalize campaign. If you are getting batch recommendations, you don't need to create a campaign. For more information, see [Getting batch recommendations and user segments](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html).

We will deploy a campaign for the solution version. 

### Create a campaign 

A campaign is a hosted solution version: an endpoint that you can query for recommendations. Pricing is based on throughput capacity (user requests for personalization per second). When deploying a campaign, you set a minimum provisioned transactions per second (TPS) value. This service, like many within AWS, scales automatically based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).

Once we're satisfied with our solution version, we need to create a campaign for it. When creating a campaign, you specify the minimum transactions per second (`minProvisionedTPS`) that you expect to make against the service for this campaign. Personalize automatically scales the inference endpoint up and down to match demand, but never below `minProvisionedTPS`.
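As a rough illustration of how a `minProvisionedTPS` value relates to traffic, the sketch below converts an expected peak request rate into a whole number of transactions per second. This is a back-of-the-envelope heuristic of our own, not AWS sizing guidance; the function name and the traffic figures are hypothetical.

```python
import math


def min_provisioned_tps(peak_requests_per_minute):
    """Smallest whole TPS that covers the given per-minute peak, never below 1."""
    return max(1, math.ceil(peak_requests_per_minute / 60))


# Example: a peak of 600 requests per minute corresponds to 10 TPS,
# while a low-traffic demo stays at the minimum of 1.
print(min_provisioned_tps(600))
print(min_provisioned_tps(20))

# The value can be changed later on an existing campaign, for example:
# personalize.update_campaign(campaignArn=workshop_rerank_campaign_arn,
#                             minProvisionedTPS=min_provisioned_tps(600))
```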

Let's create a campaign for our solution version with `minProvisionedTPS` set to 1.

In [None]:
rerank_create_campaign_response = personalize.create_campaign(
    name = "workshop-personalize-poc-rerank",
    solutionVersionArn = workshop_rerank_solution_version_arn,
    minProvisionedTPS = 1
)

workshop_rerank_campaign_arn = rerank_create_campaign_response['campaignArn']
print(json.dumps(rerank_create_campaign_response, indent=2))

In [None]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:

    # Check the current status of the campaign created above
    version_response = personalize.describe_campaign(
        campaignArn = workshop_rerank_campaign_arn
    )
    status = version_response['campaign']['status']

    if status == 'ACTIVE':
        print('Build succeeded for {}'.format(workshop_rerank_campaign_arn))
    elif status == "CREATE FAILED":
        print('Build failed for {}'.format(workshop_rerank_campaign_arn))
    
    if status == 'ACTIVE' or status == 'CREATE FAILED':
        break
    else:
        print('The campaign build is still in progress')
        
    time.sleep(60)

# Mark training as complete only if the campaign actually became ACTIVE
workshop_training_complete = (status == 'ACTIVE')

### View campaign creation status

This is how you view the status updates in the console:

* In another browser tab you should already have the AWS Console open from opening this notebook instance. 
* Switch to that tab and search at the top for the service `Personalize`, then go to that service page. 
* Click `Dataset groups`.
* Click the name of your dataset group.
* Click `Recommenders`.
* Click `Custom Resources`.
* Click `Campaigns`.
* You will now see a list of all of the campaigns you created above, including a column with the status of the campaign. Once it is `Active`, your campaign is ready to be queried.

Alternatively, the polling cell above keeps track of the creation status of the campaign we created.

While you are waiting for this to complete, you can learn more about campaigns in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html).
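The console steps above can also be mirrored in code. The sketch below lists every campaign deployed from a solution together with its status. The helper `campaign_statuses` is our own illustration; it relies only on the `list_campaigns` call and its `nextToken` pagination.

```python
def campaign_statuses(personalize_client, solution_arn):
    """Map campaign name to status for every campaign of the given solution."""
    statuses = {}
    kwargs = {"solutionArn": solution_arn}
    while True:
        page = personalize_client.list_campaigns(**kwargs)
        for summary in page.get("campaigns", []):
            statuses[summary["name"]] = summary["status"]
        next_token = page.get("nextToken")
        if not next_token:
            return statuses
        kwargs["nextToken"] = next_token


# Hypothetical usage with the notebook's client and solution:
# print(campaign_statuses(personalize, workshop_rerank_solution_arn))
```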

In [None]:
%store workshop_dataset_group_arn
%store workshop_interactions_dataset_arn
%store workshop_items_dataset_arn
%store workshop_users_dataset_arn
%store workshop_interactions_schema_arn
%store workshop_items_schema_arn
%store workshop_users_schema_arn
%store workshop_recommender_top_picks_arn
%store workshop_recommender_more_like_x_arn
%store workshop_rerank_campaign_arn
%store workshop_rerank_solution_version_arn
%store workshop_training_complete