# Module 3: Deploying Campaigns and Interacting with Them

`
Rev Date           By       Description
PA1 2020-02-16     akirmak  Modified & extended version of PersonalizePoC created by Chris King (github: chrisking@)
`


At this point you have 3 trained Amazon Personalize solutions. In this module, we will deploy them as campaigns and get recommendations from them. We will use the Personalize API to get personalized recommendations both in real time and in batches.


## Initial Setup

To get started, once again import the required libraries, load the values stored by the previous notebooks, and set up the SDK clients.

In [1]:
import boto3
from time import sleep
import subprocess
import pandas as pd
import json
import time
import pprint
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
from datetime import datetime
import uuid


In [2]:
%store -r

In [3]:
# Setup and Config
# Recommendations from Event data
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's Event Streaming
personalize_events = boto3.client(service_name='personalize-events')

## Creating Campaigns

A campaign is a hosted solution version. Pricing is based on the throughput capacity you provision, measured in personalization requests from users per second. Like many AWS services, Personalize scales automatically with demand, but if latency is critical you may want to provision capacity ahead of an expected peak. Given that this is purely a POC and a demo, `minProvisionedTPS` is set to 1 for every campaign.

The code below only creates the campaigns. In previous notebooks you may have seen while loops that polled for completion; because we want to run multiple deployments at the same time, those loops have been removed. Check progress in another tab via the console, just as you did for the solution version creation. This time, instead of clicking `Solutions and recipes`, click the `Campaigns` link to the right to see their progress.

#### HRNN

In [4]:
hrnn_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-hrnn",
    solutionVersionArn = hrnn_solution_version_arn,
    minProvisionedTPS = 1
)

hrnn_campaign_arn = hrnn_create_campaign_response['campaignArn']
print(json.dumps(hrnn_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:924376141954:campaign/personalize-poc-hrnn",
  "ResponseMetadata": {
    "RequestId": "18d44801-c7c1-4e89-801b-540cf4a01bb3",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 16 Feb 2020 10:32:11 GMT",
      "x-amzn-requestid": "18d44801-c7c1-4e89-801b-540cf4a01bb3",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### SIMS

In [5]:
sims_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-SIMS",
    solutionVersionArn = sims_solution_version_arn,
    minProvisionedTPS = 1
)

sims_campaign_arn = sims_create_campaign_response['campaignArn']
print(json.dumps(sims_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:924376141954:campaign/personalize-poc-SIMS",
  "ResponseMetadata": {
    "RequestId": "70a9caac-b949-4704-b061-90ceece45c2c",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 16 Feb 2020 10:32:26 GMT",
      "x-amzn-requestid": "70a9caac-b949-4704-b061-90ceece45c2c",
      "content-length": "90",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


#### Personalized Ranking

In [6]:
rerank_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-rerank",
    solutionVersionArn = rerank_solution_version_arn,
    minProvisionedTPS = 1
)

rerank_campaign_arn = rerank_create_campaign_response['campaignArn']
print(json.dumps(rerank_create_campaign_response, indent=2))

{
  "campaignArn": "arn:aws:personalize:us-east-1:924376141954:campaign/personalize-poc-rerank",
  "ResponseMetadata": {
    "RequestId": "86cfd073-d28c-4122-9e72-82dd2ae32025",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "content-type": "application/x-amz-json-1.1",
      "date": "Sun, 16 Feb 2020 10:32:27 GMT",
      "x-amzn-requestid": "86cfd073-d28c-4122-9e72-82dd2ae32025",
      "content-length": "92",
      "connection": "keep-alive"
    },
    "RetryAttempts": 0
  }
}


This process should take no more than 15 minutes to complete for all your campaigns.
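
If you would rather wait from the notebook than watch the console, the polling approach used in earlier notebooks can be adapted to several campaigns at once. The sketch below is one possible way to do it, not part of the original notebook: `wait_for_campaigns` and its parameters are hypothetical names, and it assumes the client passed in exposes `describe_campaign` the way the boto3 `personalize` client does.

```python
import time


def wait_for_campaigns(personalize_client, campaign_arns,
                       poll_seconds=60, timeout_seconds=3600):
    """Poll describe_campaign until every campaign is ACTIVE or CREATE FAILED.

    Returns a dict mapping each campaign ARN to its last observed status.
    Campaigns still in progress when the timeout expires keep their last status.
    """
    deadline = time.time() + timeout_seconds
    statuses = {}
    pending = list(campaign_arns)
    while pending:
        still_pending = []
        for arn in pending:
            response = personalize_client.describe_campaign(campaignArn=arn)
            status = response["campaign"]["status"]
            statuses[arn] = status
            if status not in ("ACTIVE", "CREATE FAILED"):
                still_pending.append(arn)
        pending = still_pending
        # Stop when everything has settled or the time budget is used up
        if not pending or time.time() >= deadline:
            break
        time.sleep(poll_seconds)
    return statuses
```

In this notebook it could be called as `wait_for_campaigns(personalize, [hrnn_campaign_arn, sims_campaign_arn, rerank_campaign_arn])`.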

## Interacting with Campaigns

Now that they are all deployed and active, we can start to get recommendations via API calls. Each campaign behaves slightly differently because each serves a different use case. The order is switched up a bit so that the campaigns are covered in ascending order of complexity (simplest first).

That said you may need a few supporting functions to help make sense of the results from the service. Personalize returns only an `item_id`. This is great for keeping data compact but it means you need to query the real DB or some lookup table to get a human readable result for the notebooks. The first few cells are going to create that for this particular example. 

In [7]:
# Create a dataframe for the items by reading in the correct source CSV.
items_df = pd.read_csv(data_dir + '/artists.dat', delimiter='\t', index_col=0)
# Render some sample data
items_df.head(5)

| id | name | url | pictureURL |
|---|---|---|---|
| 1 | MALICE MIZER | http://www.last.fm/music/MALICE+MIZER | http://userserve-ak.last.fm/serve/252/10808.jpg |
| 2 | Diary of Dreams | http://www.last.fm/music/Diary+of+Dreams | http://userserve-ak.last.fm/serve/252/3052066.jpg |
| 3 | Carpathian Forest | http://www.last.fm/music/Carpathian+Forest | http://userserve-ak.last.fm/serve/252/40222717... |
| 4 | Moi dix Mois | http://www.last.fm/music/Moi+dix+Mois | http://userserve-ak.last.fm/serve/252/54697835... |
| 5 | Bella Morte | http://www.last.fm/music/Bella+Morte | http://userserve-ak.last.fm/serve/252/14789013... |


By defining the ID column as the index column it is trivial to return an artist by just doing this:
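
A lookup of this kind might look like the following sketch. It uses a tiny stand-in dataframe (`example_items_df`, a hypothetical name with two rows taken from the data shown in this notebook) so that it runs on its own; in the notebook you would index `items_df` directly.

```python
import pandas as pd

# Stand-in for items_df: a tiny frame indexed by artist ID (illustrative only)
example_items_df = pd.DataFrame(
    {"name": ["MALICE MIZER", "Earth, Wind & Fire"]},
    index=pd.Index([1, 987], name="id"),
)

# Personalize returns item IDs as strings, so convert before the label lookup
artist_name = example_items_df.loc[int("987")]["name"]
print(artist_name)
```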

That isn't terrible but would get messy to repeat everywhere in our code so the function below will clean that up.

In [9]:
def get_artist_by_id(artist_id, artist_df=items_df):
    """
    This takes in an artist_id from Personalize so it will be a string,
    converts it to an int, and then does a lookup in a default or specified
    dataframe.
    
    A really broad try/except clause was added in case of anything going wrong.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    try:
        return artist_df.loc[int(artist_id)]['name']
    except Exception:
        return "Error obtaining artist"

To test that out, try a known good value (an ID that is missing from the dataframe would return the error string instead):

In [11]:
# A known good id
print(get_artist_by_id(artist_id="987"))

Earth, Wind & Fire


Great, now we have a way of rendering results. Next, select 5 random artists from the dataframe so that we can look at their SIMS results.

In [21]:
samples = items_df.sample(5)
samples

| id | name | url | pictureURL |
|---|---|---|---|
| 2489 | Kaada | http://www.last.fm/music/Kaada | http://userserve-ak.last.fm/serve/252/590412.jpg |
| 2784 | Hanoi Rocks | http://www.last.fm/music/Hanoi+Rocks | http://userserve-ak.last.fm/serve/252/16193681... |
| 18038 | Blue Stone | http://www.last.fm/music/Blue+Stone | http://userserve-ak.last.fm/serve/252/3397224.jpg |
| 1449 | Pitbull | http://www.last.fm/music/Pitbull | http://userserve-ak.last.fm/serve/252/34084349... |
| 15016 | PRO ARTE ET MUSICA | http://www.last.fm/music/PRO+ARTE+ET+MUSICA | http://userserve-ak.last.fm/serve/252/147813.jpg |


### SIMS

SIMS requires just an item and returns items that your users interact with in similar ways. In this particular case the item is an artist. The cells below handle getting recommendations from SIMS and rendering the results.

Start by getting recommendations for just the first known item (Earth, Wind & Fire).

To quote Wikipedia: "Earth, Wind & Fire (abbreviated as EW&F or simply EWF) is an American band that has spanned the musical genres of R&B, soul, funk, jazz, disco, pop, rock, dance, Latin, and Afro pop. They have been described as one of the most innovative and commercially successful acts of all time. Rolling Stone called them 'innovative, precise yet sensual, calculated yet galvanizing' and declared that the band 'changed the sound of black pop'."

In [14]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = sims_campaign_arn,
    itemId = str(987),
)

In [15]:
item_list = get_recommendations_response['itemList']

In [16]:
for item in item_list:
    print(get_artist_by_id(artist_id=item['itemId']))

The Byrds
Johnny Cash
Lacrimas Profundere
Neil Young
Jethro Tull
Amorphis
Bob Dylan
George Harrison
Motörhead
Bruce Springsteen
John Lennon
The Who
The Rolling Stones


This is an OK list, but it would be really cool to see how the collection of artists renders in a nice dataframe. The code below will do just that.

In [23]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df(recommendations_df, artist_ID):
    # Get the artist name
    artist_name = get_artist_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = sims_campaign_arn,
        itemId = str(artist_ID),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

sims_recommendations_df = pd.DataFrame()

artists = samples.index.tolist()


for artist in artists:
    sims_recommendations_df = get_new_recommendations_df(sims_recommendations_df, artist)

sims_recommendations_df

|   | Kaada | Hanoi Rocks | Blue Stone | Pitbull | PRO ARTE ET MUSICA |
|---|---|---|---|---|---|
| 0 | Britney Spears | Velvet Revolver | Frank Van Bogaert | Ellie Goulding | Britney Spears |
| 1 | Depeche Mode | Mötley Crüe | Paul Schwartz | Jay Sean | Depeche Mode |
| 2 | Lady Gaga | Aerosmith | Jason Edward Dudley | M. Pokora | Lady Gaga |
| 3 | Madonna | Alice Cooper | Magna Canta | Sade | Madonna |
| 4 | Christina Aguilera |  | Rue du Soleil | Ne-Yo | Christina Aguilera |
| 5 | Muse |  | Green Sun | Kelly Clarkson | Muse |
| 6 | The Beatles |  | Tosca | Jennifer Lopez | The Beatles |
| 7 | Rihanna |  | Jens Gad | Justin Timberlake | Rihanna |
| 8 | Radiohead |  | Prometheus | Beyoncé | Radiohead |
| 9 | Coldplay |  | Iloyd |  | Coldplay |


You may notice that many of the items look the same; hopefully not all of them do. This is a good time to think about leveraging the popularity discounting hyperparameter in your next revision, which would allow for a bit more nuance in the results. This parameter and its behavior will be unique to every dataset you encounter and to the goals of the business. Iterate on it until you find a mix that achieves your objectives.
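
To put a number on how much two recommendation lists repeat each other, one option is the Jaccard overlap between them. The sketch below is an illustration rather than part of the original notebook: `jaccard_overlap` is a hypothetical helper, and the lists are the first rows of the SIMS output shown above.

```python
def jaccard_overlap(list_a, list_b):
    """Return |A intersect B| / |A union B| for two lists (0.0 if both are empty)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# First rows of three columns from the SIMS output above
kaada = ["Britney Spears", "Depeche Mode", "Lady Gaga", "Madonna", "Christina Aguilera"]
pro_arte = ["Britney Spears", "Depeche Mode", "Lady Gaga", "Madonna", "Christina Aguilera"]
hanoi_rocks = ["Velvet Revolver", "Mötley Crüe", "Aerosmith", "Alice Cooper"]

same_lists = jaccard_overlap(kaada, pro_arte)        # identical lists
different_lists = jaccard_overlap(kaada, hanoi_rocks)  # no artists in common
print(same_lists, different_lists)
```

A value close to 1 for unrelated seed artists suggests that the results are dominated by popular items.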

The remaining campaigns rely on having a sample of users as well, so we will load the user data below; 3 users are then selected at random in the HRNN cells.

In [24]:
users_df = pd.read_csv(data_dir + '/user_artists.dat', delimiter='\t', index_col=0)
# Render some sample data
users_df.head(5)

| userID | artistID | weight |
|---|---|---|
| 2 | 51 | 13883 |
| 2 | 52 | 11690 |
| 2 | 53 | 11351 |
| 2 | 54 | 10300 |
| 2 | 55 | 8983 |


### HRNN

HRNN is one of the more advanced algorithms provided by Amazon Personalize. It personalizes items for a specific user based on their past behavior and can take in real-time events in order to alter recommendations for a user without retraining.

First the cells below will render the recommendations for 3 random users. After that we will explore real-time interactions before moving on to Personalized Ranking.

#### API Call Results

In [28]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df_users(recommendations_df, user_id):
    # Get the artist name
    #artist_name = get_artist_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    #print(recommendation_list)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [user_id])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

recommendations_df_users = pd.DataFrame()

users = users_df.sample(3).index.tolist()
print(users)

for user in users:
    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)

recommendations_df_users

[675, 489, 1420]


|   | 675 | 489 | 1420 |
|---|---|---|---|
| 0 | Skrillex | Led Zeppelin | Skrillex |
| 1 | Rebecca Black | Iron Maiden | deadmau5 |
| 2 | Pleq | Nirvana | Ien Oblique |
| 3 | Ien Oblique | Coldplay | Pleq |
| 4 | Christina Aguilera | Queen | The Chemical Brothers |
| 5 | deadmau5 | Arctic Monkeys | T.I. |
| 6 | Thalía | Guns N' Roses | Natalie Imbruglia |
| 7 | Britney Spears | Metallica | Yelle |
| 8 | Depeche Mode | Dave Gahan | James Morrison |
| 9 | Yelle | Duran Duran | Composition Of Sound |


Here we clearly see that each user gets a different list of recommendations. If you need a cache for these results, you could start by running the API calls for all your users and storing the results yourself, or use a batch export, which is covered after Personalized Ranking.
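
As a rough illustration of the "store the results yourself" option, the sketch below keeps each user's list in memory for a limited time. It is only one possible design: `RecommendationCache`, `fetch_fn`, and `ttl_seconds` are hypothetical names, and `fetch_fn` stands for whatever function you use to call the campaign for a single user.

```python
import time


class RecommendationCache:
    """Keep per-user recommendation lists in memory for a limited time."""

    def __init__(self, fetch_fn, ttl_seconds=300):
        self._fetch_fn = fetch_fn        # callable: user_id -> list of item IDs
        self._ttl_seconds = ttl_seconds
        self._entries = {}               # user_id -> (expiry_timestamp, items)

    def get(self, user_id):
        now = time.time()
        entry = self._entries.get(user_id)
        # Serve the stored list while it is still fresh
        if entry is not None and entry[0] > now:
            return entry[1]
        # Otherwise fetch again and remember the result
        recommendations = self._fetch_fn(user_id)
        self._entries[user_id] = (now + self._ttl_seconds, recommendations)
        return recommendations
```

A short time-to-live matters here because, as the next section shows, HRNN recommendations can change after real-time events.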

The next topic here is real-time events. Personalize can listen to events from your application in order to update what your users will be shown. This is especially useful in media workloads like video on demand, where a customer's intent may be to sit down and watch a show with their children now or a more serious program later.

Additionally, the events recorded via this system are stored until you issue a delete call, and they are used as historical data alongside the other interaction data you provided when you train your next models.
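
Each recorded event is a small payload: a timestamp, an event type, and a JSON string of properties that carries the item ID. The sketch below builds one such entry in the shape this notebook passes to `put_events`. `build_event` is a hypothetical helper, and the `"click"` event type and fixed timestamp are assumptions chosen for illustration.

```python
import json
import time


def build_event(item_id, event_type, sent_at=None):
    """Build one entry for the eventList argument of put_events."""
    return {
        # Seconds since the epoch; defaults to the current time
        "sentAt": int(time.time()) if sent_at is None else int(sent_at),
        "eventType": event_type,
        # Properties travel as a JSON-encoded string, with the item ID as a string
        "properties": json.dumps({"itemId": str(item_id)}),
    }


# Arbitrary fixed timestamp so the example output is reproducible
example_event = build_event(item_id=987, event_type="click", sent_at=1581849600)
print(example_event)
```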

#### Real Time Events

Start by creating an event tracker that is attached to the dataset group:

In [29]:
response = personalize.create_event_tracker(
    name='ArtistTracker',
    datasetGroupArn=dataset_group_arn
)
print(response['eventTrackerArn'])
print(response['trackingId'])
TRACKING_ID = response['trackingId']
event_tracker_arn = response['eventTrackerArn']


arn:aws:personalize:us-east-1:924376141954:event-tracker/d81ff613
63ed6e75-1172-4c42-9b94-43faed30f652


The cell below provides a code sample that simulates a user interacting with a particular item. After each interaction you request recommendations again, and they may differ from those you started with.


In [30]:
session_dict = {}

def send_artist_click(USER_ID, ITEM_ID):
    """
    Simulates a click as an event
    to send to Amazon Personalize's Event Tracker
    """
    # Configure Session: reuse the user's session ID, or create one on first use
    try:
        session_ID = session_dict[str(USER_ID)]
    except KeyError:
        session_dict[str(USER_ID)] = str(uuid.uuid1())
        session_ID = session_dict[str(USER_ID)]

    # Configure Properties:
    event = {
        "itemId": str(ITEM_ID),
    }
    event_json = json.dumps(event)

    # Make Call
    personalize_events.put_events(
        trackingId = TRACKING_ID,
        userId = str(USER_ID),
        sessionId = session_ID,
        eventList = [{
            'sentAt': int(time.time()),
            'eventType': 'EVENT_TYPE',
            'properties': event_json
        }]
    )

def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id):
    # Get the artist name (header of column)
    artist_name = get_artist_by_id(item_id)
    # Interact with the artist
    send_artist_click(USER_ID=user_id, ITEM_ID=item_id)
    # Get the recommendations (note you should have a base recommendation DF created before)
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    #print(recommendation_list)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

Those are just supporting functions. Before calling them, build a simple dataframe of the user's baseline (non-session-based) recommendations:

In [33]:
# First pick a user:
user_id = users_df.sample(1).index.tolist()[0]

In [34]:

get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
# Build a new dataframe of recommendations
item_list = get_recommendations_response['itemList']
recommendation_list = []
for item in item_list:
    artist = get_artist_by_id(item['itemId'])
    recommendation_list.append(artist)
user_recommendations_df = pd.DataFrame(recommendation_list, columns = [user_id])
user_recommendations_df

|   | 1817 |
|---|---|
| 0 | Skrillex |
| 1 | deadmau5 |
| 2 | The Chemical Brothers |
| 3 | IAMX |
| 4 | Muse |
| 5 | James Morrison |
| 6 | Janet Jackson |
| 7 | Natasha Bedingfield |
| 8 | Thalía |
| 9 | Depeche Mode |


In [35]:
# Next generate 3 random artists to interact with:
artists = items_df.sample(3).index.tolist()

In [36]:
# Note this will take about 15 seconds to complete due to the sleeps.
for artist in artists:
    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, artist)
    time.sleep(5)
user_recommendations_df

|   | 1817 | Christina Perri | Knifehandchop | Sad King |
|---|---|---|---|---|
| 0 | Skrillex | Skrillex | Skrillex | Skrillex |
| 1 | deadmau5 | deadmau5 | deadmau5 | Depeche Mode |
| 2 | The Chemical Brothers | The Chemical Brothers | The Chemical Brothers | deadmau5 |
| 3 | IAMX | IAMX | IAMX | The Chemical Brothers |
| 4 | Muse | Muse | Muse | Cesária Évora |
| 5 | James Morrison | James Morrison | James Morrison | IAMX |
| 6 | Janet Jackson | Janet Jackson | Janet Jackson | Ennio Morricone |
| 7 | Natasha Bedingfield | Natasha Bedingfield | Natasha Bedingfield | Thalía |
| 8 | Thalía | Thalía | Thalía | Dragonette |
| 9 | Depeche Mode | Depeche Mode | Depeche Mode | Sleeping With Sirens |


In the output above, the first column after the index holds the user's default recommendations from HRNN. Each later column is headed by the artist the user interacted with via a real-time event and lists the recommendations returned after that interaction.

The behavior may not shift very much after the second interaction; this is due to the relatively limited nature of this dataset. If you want to understand this better, simulating clicks on random artists from random genres would have a more pronounced impact.
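
To see exactly what an interaction changed, you can compare two columns of that output. The sketch below is an illustration with a hypothetical helper, `diff_recommendations`; the two lists are copied from the baseline column (user 1817) and the final column (after the Sad King event) above.

```python
def diff_recommendations(before, after):
    """Return (added, removed) artists between two recommendation lists."""
    before_set, after_set = set(before), set(after)
    added = [item for item in after if item not in before_set]
    removed = [item for item in before if item not in after_set]
    return added, removed


baseline = ["Skrillex", "deadmau5", "The Chemical Brothers", "IAMX", "Muse",
            "James Morrison", "Janet Jackson", "Natasha Bedingfield",
            "Thalía", "Depeche Mode"]
after_sad_king = ["Skrillex", "Depeche Mode", "deadmau5", "The Chemical Brothers",
                  "Cesária Évora", "IAMX", "Ennio Morricone", "Thalía",
                  "Dragonette", "Sleeping With Sirens"]

added, removed = diff_recommendations(baseline, after_sad_king)
print("Added:", added)
print("Removed:", removed)
```

Four artists were swapped out by the third event, while the first two events left the list unchanged.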

Time for the last campaign.

### Personalized Ranking

Again the core use case for this is to take a collection of items and to render them in priority or probable order of interest for a user. To demonstrate this we will need a random user and a random collection of 25 items.

In [50]:
rerank_user = users_df.sample(1).index.tolist()[0]
rerank_items = items_df.sample(25).index.tolist()
print (rerank_user)

762


Now build a nice dataframe that shows the input data. We will sort this list based on the user.

In [51]:
rerank_list = []
for item in rerank_items:
    artist = get_artist_by_id(item)
    rerank_list.append(artist)
rerank_df = pd.DataFrame(rerank_list, columns = [rerank_user])
rerank_df

|   | 762 |
|---|---|
| 0 | Hair Original Broadway Cast |
| 1 | The Silvertones |
| 2 | Mohsen Chavoshi |
| 3 | A Loss For Words |
| 4 | Alejandro de Pinedo |
| 5 | Clarence "Gatemouth" Brown |
| 6 | Keef Baker |
| 7 | Eminem feat. Drake, Kanye West, Lil Wayne |
| 8 | ghouti |
| 9 | Anúna |


Now make the personalized-ranking API call:

In [52]:
# Convert user to string:
user_id = str(rerank_user)
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))

In [53]:
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = user_id,
        inputList = rerank_item_list
)

In [56]:
# get_recommendations_response_rerank

The only remaining step is to add them to the dataframe.

In [58]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    artist = get_artist_by_id(item['itemId'])
    ranked_list.append(artist)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

|   | 762 | Re-Ranked | Re-Ranked.1 | Re-Ranked.2 |
|---|---|---|---|---|
| 0 | Hair Original Broadway Cast | Kazik | Kazik | Kazik |
| 1 | The Silvertones | ghouti | ghouti | ghouti |
| 2 | Mohsen Chavoshi | Impellitteri | Impellitteri | Impellitteri |
| 3 | A Loss For Words | A Loss For Words | A Loss For Words | A Loss For Words |
| 4 | Alejandro de Pinedo | One Day as a Lion | One Day as a Lion | One Day as a Lion |
| 5 | Clarence "Gatemouth" Brown | Gosia Andrzejewicz | Gosia Andrzejewicz | Gosia Andrzejewicz |
| 6 | Keef Baker | Example | Example | Example |
| 7 | Eminem feat. Drake, Kanye West, Lil Wayne | 境亜寿香 | 境亜寿香 | 境亜寿香 |
| 8 | ghouti | Clarence "Gatemouth" Brown | Clarence "Gatemouth" Brown | Clarence "Gatemouth" Brown |
| 9 | Anúna | ksandr and I.M.M.U.R.E. | ksandr and I.M.M.U.R.E. | ksandr and I.M.M.U.R.E. |


You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface to a user, such as a list of promotions, or when you are filtering on a category and want to show the items most likely to interest them.
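
If you re-rank lists in several places, the call and the string conversions can be wrapped in one function. The sketch below is a possible wrapper, not part of the original notebook: `rank_items_for_user` is a hypothetical name, and it assumes the client passed in behaves like the boto3 `personalize-runtime` client used above.

```python
def rank_items_for_user(runtime_client, campaign_arn, user_id, item_ids):
    """Return item IDs re-ordered for one user by a Personalized Ranking campaign."""
    response = runtime_client.get_personalized_ranking(
        campaignArn=campaign_arn,
        # Personalize expects user and item IDs as strings
        userId=str(user_id),
        inputList=[str(item_id) for item_id in item_ids],
    )
    return [entry["itemId"] for entry in response["personalizedRanking"]]
```

In this notebook it could be called as `rank_items_for_user(personalize_runtime, rerank_campaign_arn, rerank_user, rerank_items)`.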

## Batch Recommendations

There are many cases where you may want a larger dataset of exported recommendations, from caching to just digging into the results to learn more. Recently Amazon Personalize launched Batch Recommendations as a way to export a collection of recommendations to S3. For simplicity's sake, this example walks through how to do this for the HRNN solution.

Full info can be found here: https://docs.aws.amazon.com/personalize/latest/dg/getting-recommendations.html#recommendations-batch

This feature applies to all algorithms, though the output will vary, again see the docs for a full breakdown.

A simple implementation looks like this:

```python
import boto3

personalize_rec = boto3.client(service_name='personalize')

# Every quoted value below is a placeholder to replace with your own
personalize_rec.create_batch_inference_job(
    solutionVersionArn = "Solution version ARN",
    jobName = "Batch job name",
    roleArn = "IAM role ARN",
    jobInput =
       {"s3DataSource": {"path": "S3 input path"}},
    jobOutput =
       {"s3DataDestination": {"path": "S3 output path"}}
)
```

The SDK import, the solution version arn, and role arns have all been determined. This just leaves an input, an output, and a job name to be defined.

Starting with the input for HRNN, it looks like:


```JSON
{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
```

This should yield something like this as output:

```JSON
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
```

This file is not a single JSON document; each line is a separate JSON object, so the results need to be parsed one line at a time when they come back.
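
Parsing a file in this layout line by line can be sketched as follows. `parse_batch_output` is a hypothetical helper, and the sample text is the example output shown above.

```python
import json


def parse_batch_output(text):
    """Map each userId to its recommended item IDs, reading one JSON object per line."""
    results = {}
    for raw_line in text.splitlines():
        raw_line = raw_line.strip()
        if not raw_line:
            continue  # skip blank lines, such as a trailing newline
        record = json.loads(raw_line)
        results[record["input"]["userId"]] = record["output"]["recommendedItems"]
    return results


sample_output = """\
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
"""

parsed = parse_batch_output(sample_output)
print(parsed["663"])
```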

##### Building the Input File

When you use the batch feature, you specify the users that you would like to receive recommendations for once the job has completed. That is done with the input schema shown above. The cell below will again select a few random users, build the file, and save it to disk.

From there you will upload it to S3 to use in the API call later.
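
The cell below builds each line by string concatenation, which is fine for numeric IDs. As an alternative sketch, `json.dumps` produces the same lines and also escapes any characters that would otherwise break the JSON; `build_batch_input` is a hypothetical helper and the user IDs are the ones from the example schema above.

```python
import json


def build_batch_input(user_ids):
    """Return the batch input text: one {"userId": "..."} object per line."""
    return "".join(
        json.dumps({"userId": str(user_id)}) + "\n" for user_id in user_ids
    )


batch_input_text = build_batch_input([4638, 663, 3384])
print(batch_input_text, end="")
```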

In [59]:
# Get the user list
batch_users = users_df.sample(3).index.tolist()

# Write the file to disk
json_input_filename = "json_input.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for user_id in batch_users:
        json_input.write('{"userId": "' + str(user_id) + '"}\n')

In [60]:
# Showcase the input file:
!cat $data_dir"/"$json_input_filename

{"userId": "1747"}
{"userId": "1978"}
{"userId": "1521"}


Upload the file to S3 and save the path as a variable for later.

In [61]:
# Upload files to S3
boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+"/"+json_input_filename)
s3_input_path = "s3://" + bucket_name + "/" + json_input_filename
print(s3_input_path)

s3://hba-ai-personalizepoc/json_input.json


Define the output path for the API call:

In [62]:
# Define the output path
s3_output_path = "s3://" + bucket_name + "/"
print(s3_output_path)

s3://hba-ai-personalizepoc/


Now just make the call to kick off the batch export process.

In [63]:
batchInferenceJobArn = personalize.create_batch_inference_job (
    solutionVersionArn = hrnn_solution_version_arn,
    jobName = "POC-Batch-Inference-Job-HRNN",
    roleArn = role_arn,
    jobInput = 
     {"s3DataSource": {"path": s3_input_path}},
    jobOutput = 
     {"s3DataDestination":{"path": s3_output_path}}
)
batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']

Wait for the job to complete here. This may take a few minutes because infrastructure has to be created to perform the task. Exporting in bulk would be quite quick; we are wasting that potential here by exporting recommendations for only a handful of users, which is just to show the process.

In [None]:
current_time = datetime.now()
print("Import Started on: ", current_time.strftime("%I:%M:%S %p"))

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(
        batchInferenceJobArn = batchInferenceJobArn
    )
    status = describe_dataset_inference_job_response["batchInferenceJob"]['status']
    print("DatasetInferenceJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
        
    time.sleep(60)
    
current_time = datetime.now()
print("Import Completed on: ", current_time.strftime("%I:%M:%S %p"))

Import Started on:  11:27:29 AM
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS


With the data successfully exported, grab the file and parse it:

In [None]:
s3 = boto3.client('s3')
export_name = json_input_filename + ".out"
s3.download_file(bucket_name, export_name, data_dir+"/"+export_name)

# Update DF rendering
pd.set_option('display.max_rows', 30)
bulk_recommendations_df = pd.DataFrame()
with open(data_dir+"/"+export_name) as json_file:
    # The export holds one JSON object per line, so parse it line by line
    for raw_line in json_file:
        if not raw_line.strip():
            continue
        line = json.loads(raw_line)
        # extract the user ID
        col_header = "User: " + line['input']['userId']
        # Create a list for all the artists
        recommendation_list = []
        # Add all the entries
        for item in line['output']['recommendedItems']:
            artist = get_artist_by_id(item)
            recommendation_list.append(artist)
        # Add this user's column to the combined dataframe
        new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])
        bulk_recommendations_df = pd.concat([bulk_recommendations_df, new_rec_DF], axis=1)
bulk_recommendations_df

## Wrap Up

Congratulations. With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios as well as the skills to manipulate customer data to better integrate with the service and a knowledge of how to do all this over APIs and leveraging open source data science tools.

Use the notebooks as a guide to getting started with your customers for POCs and as you find missing components or discover new approaches, cut a pull request and provide any additional helpful components that may be missing from this collection.

Good luck!

## Appendix: Optional Module: Advanced Example using HRNN-Metadata Recipe with MovieLens Data
If you are interested in doing another exercise, this time using the HRNN-Metadata Recipe, please proceed to the next module (which is a modified version of an advanced example from the Amazon Personalize samples at github AWS-samples).