# Creating Solution And Campaign

The rest of the notebook will cover generating both a solution and a campaign with Personalize on the Netflix Dataset.



In [1]:
# Imports
import boto3
import json
import numpy as np
import pandas as pd
import time

In [2]:
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

## Create the Solution and Version

In Amazon Personalize a trained model is called a Solution, each Solution can have many specific versions that relate to a given volume of data when the model was trained.
To begin we will list all the recipies that are supported, a recipie is an algorithm that has not been trained on your data yet. After listing you'll select one and use that to build your model.

In [3]:
list_recipes_response = personalize.list_recipes()
list_recipes_response

{'recipes': [{'name': 'aws-hrnn',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2019, 6, 20, 0, 39, 17, 65000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-coldstart',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-coldstart',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2019, 6, 20, 0, 39, 17, 64000, tzinfo=tzlocal())},
  {'name': 'aws-hrnn-metadata',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-hrnn-metadata',
   'status': 'ACTIVE',
   'creationDateTime': datetime.datetime(2019, 6, 10, 0, 0, tzinfo=tzlocal()),
   'lastUpdatedDateTime': datetime.datetime(2019, 6, 20, 0, 39, 17, 64000, tzinfo=tzlocal())},
  {'name': 'aws-personalized-ranking',
   'recipeArn': 'arn:aws:personalize:::recipe/aws-personalized-ranking',
   'stat

In [4]:
recipe_arn = "arn:aws:personalize:::recipe/aws-hrnn" # aws-hrnn selected for demo purposes


### Create and Wait for Solution

First you will create the solution with the API, then you will create a version. It will take several minutes to train the model and thus create your version of a solution. Once it gets started and you are seeing the in progress notifications it is a good time to take a break, grab a coffee, etc.


#### Create Solution


In [6]:
dataset_group_arn = "arn:aws:personalize:us-east-1:059124553121:dataset-group/personalize-nf-demo"

In [8]:
create_solution_response = personalize.create_solution(
    name = "netflix-demo-hrnn",
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn
)

solution_arn = create_solution_response['solutionArn']
print(json.dumps(create_solution_response, indent=2))


{
  "solutionArn": "arn:aws:personalize:us-east-1:059124553121:solution/netflix-demo-hrnn",
  "ResponseMetadata": {
    "RequestId": "d7b58d6a-696d-4402-a64d-c8d4dff1c8f0",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 14 Jul 2019 18:53:06 GMT",
      "x-amzn-requestid": "d7b58d6a-696d-4402-a64d-c8d4dff1c8f0",
      "content-length": "87",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Create Solution Version 

In [9]:
create_solution_version_response = personalize.create_solution_version(
    solutionArn = solution_arn
)

solution_version_arn = create_solution_version_response['solutionVersionArn']
print(json.dumps(create_solution_version_response, indent=2))

{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:059124553121:solution/netflix-demo-hrnn/0178c833",
  "ResponseMetadata": {
    "RequestId": "21001ae9-8be7-43bc-a474-66f38cc11df9",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 14 Jul 2019 18:53:26 GMT",
      "x-amzn-requestid": "21001ae9-8be7-43bc-a474-66f38cc11df9",
      "content-length": "103",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Solution Version to Have ACTIVE Status

This will take at least 20 minutes.

In [11]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_solution_version_response = personalize.describe_solution_version(
        solutionVersionArn = solution_version_arn
    )
    status = describe_solution_version_response["solutionVersion"]["status"]
    print("SolutionVersion: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

SolutionVersion: ACTIVE


####  Get Metrics of Solution Version

Now that your solution and version exists, you can obtain the metrics for it to judge its performance. These metrics are not particularly good as it is a demo set of data, but with larger more compelx datasets you should see improvements.

In [13]:
get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))


{
  "solutionVersionArn": "arn:aws:personalize:us-east-1:059124553121:solution/netflix-demo-hrnn/0178c833",
  "metrics": {
    "coverage": 0.4509,
    "mean_reciprocal_rank_at_25": 0.1057,
    "normalized_discounted_cumulative_gain_at_10": 0.1471,
    "normalized_discounted_cumulative_gain_at_25": 0.177,
    "normalized_discounted_cumulative_gain_at_5": 0.1251,
    "precision_at_10": 0.0222,
    "precision_at_25": 0.0137,
    "precision_at_5": 0.0314
  },
  "ResponseMetadata": {
    "RequestId": "4726714a-e226-4543-a1e1-bc8bb28d640c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 14 Jul 2019 22:25:04 GMT",
      "x-amzn-requestid": "4726714a-e226-4543-a1e1-bc8bb28d640c",
      "content-length": "400",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


### Create and Wait for the Campaign

Now that you have a working solution version you will need to create a campaign to use it with your applications. A campaign is simply a hosted copy of your model. Again there will be a short wait so after executing you can take a quick break while the infrastructure is being provisioned.


#### Create Campaign¶


In [14]:
create_campaign_response = personalize.create_campaign(
    name = "personalize-nf-camp",
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 1
)

campaign_arn = create_campaign_response['campaignArn']
print(json.dumps(create_campaign_response, indent=2))


{
  "campaignArn": "arn:aws:personalize:us-east-1:059124553121:campaign/personalize-nf-camp",
  "ResponseMetadata": {
    "RequestId": "66826dd9-c6ca-4519-9de2-454c6403ef5e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 14 Jul 2019 22:25:40 GMT",
      "x-amzn-requestid": "66826dd9-c6ca-4519-9de2-454c6403ef5e",
      "content-length": "89",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Wait for Campaign to Have ACTIVE Status

In [15]:
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_campaign_response = personalize.describe_campaign(
        campaignArn = campaign_arn
    )
    status = describe_campaign_response["campaign"]["status"]
    print("Campaign: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)

Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: CREATE IN_PROGRESS
Campaign: ACTIVE


#### Get Recommendations

The recommendations are returned as a list of Movie_IDs, so we will need to parse the original movie titles CSV to a dataframe and look them up.

Normally you would just use pd.read_csv like a sane data scientist but it seems the file does not play nicely with the econding flag.

This will yield the following error:

```
ParserError: Error tokenizing data. C error: Expected 3 fields in line 72, saw 4
```

To fix it the lines below build the dataframe from scratch first, then you can query it.

In [29]:
# Build movie df
movie_df = pd.DataFrame(columns=["ITEM_ID", "TITLE"])

# Load into RAM for querying
with open('netflix-prize-data/movie_titles.csv', encoding = "ISO-8859-1") as fileHandler:
    for line in fileHandler:
        # Clean and parse the line
        line = line.strip('\n').split(',')
        movie_df = movie_df.append({
             "ITEM_ID": line[0],
             "TITLE":  line[2]
              }, ignore_index=True)

# Show top of DF
movie_df.head(5)

Unnamed: 0,ITEM_ID,TITLE
0,1,Dinosaur Planet
1,2,Isle of Man TT 2004 Review
2,3,Character
3,4,Paula Abdul's Get Up & Dance
4,5,The Rise and Fall of ECW


In [32]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn,
    userId = str(155),
)


item_list = get_recommendations_response['itemList']
for item in item_list:
    print(movie_df.iloc[int(item['itemId'])]['TITLE'])
#print("Recommendations: {}".format(json.dumps(item_list, indent=2)))

The Masque of the Red Death / The Premature Burial
Free Willy
The Knights Templar
Happy End
ECW: Heatwave '98
Drive Well
Kati Patang
Suburbia
World Class Trains: The New Polar Express
The Alexander Technique
Seems Like Old Times
Beautiful Creatures
Hard Times
Mulan
The Horse Soldiers
Blast
Owning Mahowny
Godmother
Fearless Hyena 1 / Fearless Hyena 2
Around the World with Orson Welles
National Geographic: Inside Mecca
Warren Miller: Bloopers
An Evening With Edgar Allan Poe
Hocus Pocus
An Evening With Kevin Smith
