# Interacting with Recommenders, Campaigns and Filters <a class="anchor" id="top"></a>

In this notebook, you will interact with campaigns and filters in Amazon Personalize.

1. [Introduction](#intro)
1. [Interact with Recommenders](#interact-recommenders)
1. [Interact with Campaigns](#interact-campaigns)
1. [Creating Filters](#creating-filters)
1. [Using Filters](#using-filters)
1. [Real-time Events](#real-time)
1. [Batch Recommendations](#batch)
1. [Wrap Up](#wrapup)

## Introduction <a class="anchor" id="intro"></a>
[Back to top](#top)

At this point, you should have 2 Recommenders and one deployed campaign. Once they are active, there are resources for querying the recommendations, and helper functions to digest the output into something more human-readable. 


In this Notebook we will interact with Recommenders and Campaigns and get recommendatiosn. We will interact with filters and send live data to Amazon Personalize to see the effect on recommendations.

![Workflow](images/image3.png)

To run this notebook, you need to have run the previous notebooks, `01_Data_Layer.ipynb`, and `02_Training_Layer.ipynb`, where you created a dataset and imported interaction, item, and user metadata data into Amazon Personalize, created recommenders, solutions and campaigns. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook.

As you work with Amazon Personalize, you can modify the helper functions to fit the structure of your data input files to keep the additional rendering working.

To get started, once again, we need to import libraries, load values from previous notebooks, and load the SDK.

In [1]:
import time
from time import sleep
import json
from datetime import datetime
import uuid
import random
import boto3
import pandas as pd

In [8]:
%store -r

In [3]:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's event streaming
personalize_events = boto3.client(service_name='personalize-events')

First, let's create a supporting function to help make sense of the results returned by a Personalize recommender or campaign. Personalize returns only an `item_id`. This is great for keeping data compact, but it means you need to query a database or lookup table to get a human-readable result for the notebooks. We will create a helper function to return a human-readable result from the Movielens dataset.

Start by loading in the dataset which we can use for our lookup table.

In [4]:
# Create a dataframe for the items by reading in the correct source CSV
items_df = pd.read_csv(dataset_dir + '/movies.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "object", 'title': "str"},index_col=0)

# Render some sample data
items_df.head(5)

Unnamed: 0_level_0,title
movieId,Unnamed: 1_level_1
1,Toy Story (1995)
2,Jumanji (1995)
3,Grumpier Old Men (1995)
4,Waiting to Exhale (1995)
5,Father of the Bride Part II (1995)


By defining the ID column as the index column it is trivial to return a movie by just querying the ID. Movie #589 should be Terminator 2: Judgment Day.

In [5]:
movieIdExample = 589
title = items_df.loc[movieIdExample]['title']
print(title)

Terminator 2: Judgment Day (1991)


That isn't terrible, but it would get messy to repeat this everywhere in our code, so the function below will clean that up.

In [6]:
def get_movie_by_id(movieId, movie_df=items_df):
    """
    This takes in an artist_id from Personalize so it will be a string,
    converts it to an int, and then does a lookup in a default or specified
    dataframe.
    
    A really broad try/except clause was added in case anything goes wrong.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    try:
        return movie_df.loc[int(movieId)]['title']
    except:
        return "Error obtaining title"

Now let's test a few simple values to check our error catching.

In [7]:
# A known good id (The Princess Bride)
print(get_movie_by_id(movieId="1197"))
# A bad type of value
print(get_movie_by_id(movieId="987.9393939"))
# Really bad values
print(get_movie_by_id(movieId="Steve"))

Princess Bride, The (1987)
Error obtaining title
Error obtaining title


Great! Now we have a way of rendering results. 

## Interact with recommenders <a class="anchor" id="interact-recommenders"></a>
[Back to top](#top)

Now that the recommenders have been trained, lets have a look at the recommendations we can get for our users!

### "More like X" Recommender

'More like X' requires an item and a user as input, and it will return items which users interact with in similar ways to their interaction with the input item. In this particular case the item is a movie. 

The cells below will handle getting recommendations from the "More like X" Recommender and rendering the results. Let's see what the recommendations are for the first item we looked at earlier in this notebook (Terminator 2: Judgment Day).

We will be using the `recommenderArn`, the `itemId`, the `userId` as well as the number or results we want, `numResults`.

In [9]:
# First pick a user
testUserId = "1"

In [10]:
get_recommendations_response = personalize_runtime.get_recommendations(
    recommenderArn = recommender_more_like_x_arn,
    itemId = str(589),
    userId = testUserId,
    numResults = 20
)

In [11]:
itemList = get_recommendations_response['itemList']
for item in itemList:
    print(get_movie_by_id(movieId=item['itemId']))

Speed (1994)
Independence Day (a.k.a. ID4) (1996)
Jurassic Park (1993)
Lion King, The (1994)
Firm, The (1993)
Matrix, The (1999)
Mrs. Doubtfire (1993)
Hunt for Red October, The (1990)
Quiz Show (1994)
Babe (1995)
Braveheart (1995)
Die Hard (1988)
Natural Born Killers (1994)
Clueless (1995)
Sound of Music, The (1965)
Get Shorty (1995)
True Lies (1994)
Inception (2010)
Patriot Games (1992)
Mask, The (1994)


Congrats, this is your first list of recommendations! This list is fine, but it would be better to see the recommendations for similar movies render in a nice dataframe. Again, let's create a helper function to achieve this.

In [12]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df(recommendations_df, movie_id, user_id):
    # Get the movie name
    movie_name = get_movie_by_id(movie_id)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_more_like_x_arn,
        itemId = str(movie_id),
        userId = user_id
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [movie_name])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

Now, let's test the helper function with several different movies. Let's sample some data from our dataset to test our "More like X" Recommender. Grab 5 random movies from our dataframe.

Note: We are going to show similar titles, so you may want to re-run the sample until you recognize some of the movies listed

In [13]:
samples = items_df.sample(5)
samples

Unnamed: 0_level_0,title
movieId,Unnamed: 1_level_1
166492,Office Christmas Party (2016)
117887,Paddington (2014)
89190,Conan the Barbarian (2011)
6898,Sweet Sixteen (2002)
3896,"Way of the Gun, The (2000)"


In [14]:
more_like_x_recommendations_df = pd.DataFrame()
movies = samples.index.tolist()

for movie in movies:
    more_like_x_recommendations_df = get_new_recommendations_df(more_like_x_recommendations_df, movie, testUserId)

more_like_x_recommendations_df

Unnamed: 0,Office Christmas Party (2016),Paddington (2014),Conan the Barbarian (2011),Sweet Sixteen (2002),"Way of the Gun, The (2000)"
0,Hunt for the Wilderpeople (2016),Kingsman: The Secret Service (2015),The Hunger Games: Catching Fire (2013),Interstate 60 (2002),Star Wars: Episode II - Attack of the Clones (...
1,Zootopia (2016),"Hangover Part II, The (2011)","Amazing Spider-Man, The (2012)",Donnie Darko (2001),Minority Report (2002)
2,Inside Out (2015),Coco (2017),Ender's Game (2013),Minority Report (2002),Interstate 60 (2002)
3,The Nut Job 2: Nutty by Nature (2017),Deadpool (2016),Star Trek (2009),Killing Me Softly (2002),Spider-Man (2002)
4,Wonder Woman (2017),Inside Out (2015),Oblivion (2013),Irreversible (IrrÃ©versible) (2002),Clockstoppers (2002)
5,Kingsman: The Secret Service (2015),Zootopia (2016),G.I. Joe: Retaliation (2013),Star Wars: Episode II - Attack of the Clones (...,Planet of the Apes (2001)
6,Coco (2017),Guardians of the Galaxy (2014),After Earth (2013),Insomnia (2002),"Matrix Revolutions, The (2003)"
7,The Boss Baby (2017),Edge of Tomorrow (2014),Journey 2: The Mysterious Island (2012),Joint Security Area (Gongdong gyeongbi guyeok ...,Jurassic Park III (2001)
8,Ice Age: The Great Egg-Scapade (2016),The Nut Job 2: Nutty by Nature (2017),Star Wars: Episode VII - The Force Awakens (2015),Dead or Alive: Final (2002),"Matrix Reloaded, The (2003)"
9,No Game No Life: Zero (2017),The Man from U.N.C.L.E. (2015),"Avengers, The (2012)",Blood Work (2002),Spider-Man 2 (2004)


You may notice that some of the items look the same, hopefully not all of them do (this is more likely with a smaller # of interactions, which will be more common with the movielens small dataset). 

### "Top picks for you" Recommender

"Top picks for you" supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. 

Since "Top picks for you" relies on having a sampling of users, let's load the data we need for that and select 3 random users. Since Movielens does not include user data, we will select 3 random numbers from the range of user id's in the dataset.

In [15]:
if not USE_FULL_MOVIELENS:
    users = random.sample(range(1, 600), 3)
else:
    users = random.sample(range(1, 162000), 3)
users

[53, 556, 421]

Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.

"Top picks for you" requires only a user as input, and it will return items that are relevant for that particular user. In this particular case the item is a movie.

The cells below will handle getting recommendations from the "Top picks for you" Recommender and rendering the results. 

We will be using the `recommenderArn`, the `userId` as well as the number or results we want, `numResults`.

Again, we create a helper function to render the results in a nice dataframe.

#### API call results

In [16]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df_users(recommendations_df, user_id):
    # Get the movie name
    #movie_name = get_movie_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        numResults = 20
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [user_id])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

In [17]:
recommendations_df_users = pd.DataFrame()

for user in users:
    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)

recommendations_df_users

Unnamed: 0,53,556,421
0,Mr. Holland's Opus (1995),Mad Max: Fury Road (2015),"African Queen, The (1951)"
1,Sense and Sensibility (1995),Deadpool (2016),2001: A Space Odyssey (1968)
2,"Birdcage, The (1996)",Zootopia (2016),Saving Private Ryan (1998)
3,Leaving Las Vegas (1995),Up (2009),"Lord of the Rings: The Two Towers, The (2002)"
4,"Piano, The (1993)","Grand Budapest Hotel, The (2014)","Wizard of Oz, The (1939)"
5,"American President, The (1995)",Inside Out (2015),Lawrence of Arabia (1962)
6,First Knight (1995),Titanic (1997),Dr. Strangelove or: How I Learned to Stop Worr...
7,Circle of Friends (1995),Big Hero 6 (2014),"Bridge on the River Kwai, The (1957)"
8,"Bridges of Madison County, The (1995)",Forrest Gump (1994),Groundhog Day (1993)
9,Little Women (1994),Saving Private Ryan (1998),"Lord of the Rings: The Return of the King, The..."


Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.

## Interact with Campaigns <a class="anchor" id="interact-campaigns"></a>
[Back to top](#top)

Now that the reranking campaign is deployed and active, we can start to get recommendations via an API call. 

### Personalized Ranking

The core use case for personalized ranking is to take a collection of items and to render them in priority or probable order of interest for a user. For a VOD application you want dynamically render a personalized shelf/rail/carousel based on some information (director, location, superhero franchise, movie time period, etc...). This may not be information that you have in your metadata, so an item metadata filter will not work, however you may have this information within you system to generate the item list. 

To demonstrate this, we will use the same user from before and a random collection of items.

In [18]:
rerank_user = user
rerank_items = items_df.sample(25).index.tolist()

Now build a nice dataframe that shows the input data.

In [19]:
rerank_list = []
for item in rerank_items:
    movie = get_movie_by_id(item)
    rerank_list.append(movie)
rerank_df = pd.DataFrame(rerank_list, columns = ['Un-Ranked'])
rerank_df

Unnamed: 0,Un-Ranked
0,Napoleon Dynamite (2004)
1,Meshes of the Afternoon (1943)
2,Religulous (2008)
3,Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995)
4,"Jerk, The (1979)"
5,Arthur (1981)
6,Vampire in Brooklyn (1995)
7,Dreamer: Inspired by a True Story (2005)
8,No Mercy (1986)
9,"Passenger, The (Professione: reporter) (1975)"


Then make the personalized ranking API call.

In [20]:
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))
    
# Get recommended reranking
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = str(rerank_user),
        inputList = rerank_item_list
)

Now add the reranked items as a second column to the original dataframe, for a side-by-side comparison.

In [21]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    movie = get_movie_by_id(item['itemId'])
    ranked_list.append(movie)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

Unnamed: 0,Un-Ranked,Re-Ranked
0,Napoleon Dynamite (2004),Shutter Island (2010)
1,Meshes of the Afternoon (1943),"Exorcist, The (1973)"
2,Religulous (2008),Pride and Prejudice (1995)
3,Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995),Napoleon Dynamite (2004)
4,"Jerk, The (1979)",Jerry Maguire (1996)
5,Arthur (1981),"Jerk, The (1979)"
6,Vampire in Brooklyn (1995),Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995)
7,Dreamer: Inspired by a True Story (2005),Control Room (2004)
8,No Mercy (1986),"Babysitter, The (1995)"
9,"Passenger, The (Professione: reporter) (1975)","Passenger, The (Professione: reporter) (1975)"


You can see above how each entry was re-ordered based on the model's understanding of the user. This is a popular task when you have a collection of items to surface a user that cannot be easily categorized in your metadata, for instance "Critics picks" which are curated by a person.

## Creating Filters <a class="anchor" id="using-filters"></a>

## Create Filters <a class="anchor" id="interact"></a>
[Back to top](#top)

Amazon Personalize supports the ability to create [filters](https://docs.aws.amazon.com/personalize/latest/dg/filter.html) that can be used to exclude items from being recommended that meet a filter expression. 

Now that all campaigns are deployed and active amd the recommenders have been trained we can create filters. Filters can be created for fields of both Items and Events. Filters can also be created staticly or dynamically. Static filters have the filter conditions hardcoded into the filter, and dynamic filters can have filter conditions passed in at runtime.

A few common use cases for filters in Video On Demand are:

Categorical filters based on Item Metadata (that are range based) - Often your item metadata will have information about the title such as year, user rating, available date. Filtering on these can provide recommendations within that data, such as movies that are available after a specific date, movies rated over 3 stars, movies from the 1990s etc.

User Demographic ranges - you may want to recommend content to specific age demographics, for this you can create a filter that is specific to a age range like over 18, over 18 AND under 30, etc).

Event Filters - you may want to filter items based on the interactions that have occured, for instance filter movies that have been watched out so the used gets fresh recommendations.


In [22]:
# Create a dataframe for the items by reading in the correct source CSV
items_meta_df = pd.read_csv(data_dir + '/item-meta.csv', sep=',', index_col=0)

# Render some sample data
items_meta_df.head(10)

Unnamed: 0_level_0,GENRES,YEAR,CREATION_TIMESTAMP
ITEM_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Adventure|Animation|Children|Comedy|Fantasy,1995,1640995200
2,Adventure|Children|Fantasy,1995,1640995200
3,Comedy|Romance,1995,1640995200
4,Comedy|Drama|Romance,1995,1640995200
5,Comedy,1995,1640995200
6,Action|Crime|Thriller,1995,1640995200
7,Comedy|Romance,1995,1640995200
8,Adventure|Children,1995,1640995200
9,Action,1995,1640995200
10,Action|Adventure|Thriller,1995,1640995200


Since there are a lot of genres to filter on, we will create a dynamic filter using the dynamic variable $GENRE, this will allow us to pass in the variable at runtime rather than create a static filter for each genre.

In [23]:
create_genre_filter_response = personalize.create_filter(name='Genre',
    datasetGroupArn = dataset_group_arn,
    filterExpression = 'INCLUDE ItemID WHERE Items.GENRES IN ($GENRE)'
    )

In [24]:
genre_filter_arn = create_genre_filter_response['filterArn']

Personalize can also filter based on numerical ranges. This can be helpful if you want to look for items that are within a given time window, above a certain rating etc. For that we will create a filter for decades.

In [25]:
create_genre_filter_response = personalize.create_filter(name='YearRange',
    datasetGroupArn = dataset_group_arn,
    filterExpression = 'INCLUDE ItemID WHERE Items.YEAR >= $YEAR1 AND Items.YEAR < $YEAR2'
    )

In [26]:
year_range_filter_arn = create_genre_filter_response['filterArn']

Lets also create 2 event filters for watched and unwatched content. The Top picks for you and More like X already have a filter on implemented to filter out watched events.

In [27]:
createwatchedfilter_response = personalize.create_filter(name='watched',
    datasetGroupArn = dataset_group_arn,
    filterExpression = 'INCLUDE ItemID WHERE Interactions.event_type IN ("Watch")'
    )

createunwatchedfilter_response = personalize.create_filter(name='unwatched',
    datasetGroupArn = dataset_group_arn,
    filterExpression = 'EXCLUDE ItemID WHERE Interactions.event_type IN ("Watch")'
    )

efore we move on we want to add those filters to a list as well so they can be used later.

In [28]:
interaction_filter_arns = [createwatchedfilter_response['filterArn'], createunwatchedfilter_response['filterArn']]

## Using Filters <a class="anchor" id="using-filters"></a>
[Back to top](#top)

Personalize can utilize either static or dynamic filters. Static filters are where the filter properties are built into the filter itself, which makes invocation simpler, but gives less flexibility. An example of this would be a "Horror" movie filter, which invokes the get_recommendations_response api with the specific filter of GENRE = Horror. In order to create a recommendation for each filter that would require 10+ filters. Personalize also supports dynamic filters, where the values can be passed at runtime, allowing for a single filter of GENRE, where the actual genre is passed at runtime. 

A few common use cases for dynamic filters in Video On Demand are:

Categorical filters based on Item Metadata - Often your item metadata will have information about the title such as Genre, Keyword, Year, Director, Actor etc. Filtering on these can provide recommendations within that data, such as action movies, Steven Spielberg movies, Movies from 1995 etc.

Range based filters based on Item Metadata - Personalize supports range operations in both static and dynamic filters. Filtering based on a range can be used to create recommendations such as "Whats on now" in live tv scenarios, best of decade, Movies rated over 8/10 stars etc

Events - you may want to filter out certain events and provide results based on those events, such as moving a title from a "suggestions to watch" recommendation to a "watch again" recommendations.


First lets create functions to get recommendations and pass in dynamic filter values.


In [29]:
def get_new_recommendations_df_by_dynamic_filter(recommendations_df, user_id, genre_filter_arn, filter_values):
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        filterArn = genre_filter_arn,
        filterValues = { "GENRE": "\"" + filter_values + "\""}
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    filter_name = genre_filter_arn.split('/')[1]
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [filter_values])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

In [30]:
def get_new_recommendations_df_by_dynamic_range_filter(recommendations_df, user_id, year_range_filter_arn, filter_value1, filter_value2):
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(user_id),
        filterArn = range_filter_arn,
        filterValues = {"YEAR1": filter_value1,"YEAR2": filter_value2}
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)
    filter_name = range_filter_arn.split('/')[1]
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [filter_value1])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

In [31]:
decades_to_filter = [1930,1940,1950,1960,1970,1980,1990,2000,2010]

In [32]:
# Iterate through Decades
recommendations_df_decade_shelves = pd.DataFrame()
for decade in decades_to_filter:
    recommendations_df_decade_shelves = get_new_recommendations_df_by_dynamic_range_filter(recommendations_df_decade_shelves, user, range_filter_arn , str(decade), str(decade+10))
    
recommendations_df_decade_shelves

Unnamed: 0,1930,1940,1950,1960,1970,1980,1990,2000,2010
0,"Wizard of Oz, The (1939)",Citizen Kane (1941),"African Queen, The (1951)",2001: A Space Odyssey (1968),Monty Python and the Holy Grail (1975),"Princess Bride, The (1987)",Saving Private Ryan (1998),"Lord of the Rings: The Two Towers, The (2002)",The Hunger Games (2012)
1,Gone with the Wind (1939),Twelve O'Clock High (1949),"Bridge on the River Kwai, The (1957)",Lawrence of Arabia (1962),"Deer Hunter, The (1978)",Star Wars: Episode VI - Return of the Jedi (1983),Groundhog Day (1993),"Lord of the Rings: The Return of the King, The...",The Martian (2015)
2,Duck Soup (1933),"Best Years of Our Lives, The (1946)",From Here to Eternity (1953),Dr. Strangelove or: How I Learned to Stop Worr...,Patton (1970),NausicaÃ¤ of the Valley of the Wind (Kaze no t...,Fargo (1996),Battlestar Galactica (2003),Harry Potter and the Deathly Hallows: Part 1 (...
3,Mutiny on the Bounty (1935),Lifeboat (1944),Seven Samurai (Shichinin no samurai) (1954),"Great Escape, The (1963)",M*A*S*H (a.k.a. MASH) (1970),Full Metal Jacket (1987),"Green Mile, The (1999)","Downfall (Untergang, Der) (2004)","Girl with the Dragon Tattoo, The (2011)"
4,Chapayev (1934),Germany Year Zero (Germania anno zero) (Deutsc...,Sabrina (1954),Breakfast at Tiffany's (1961),Alien (1979),Star Wars: Episode V - The Empire Strikes Back...,Austin Powers: The Spy Who Shagged Me (1999),Slumdog Millionaire (2008),Black Swan (2010)
5,Modern Times (1936),"Philadelphia Story, The (1940)",Forbidden Planet (1956),Psycho (1960),Apocalypse Now (1979),"Good Morning, Vietnam (1987)",Being John Malkovich (1999),"Pan's Labyrinth (Laberinto del fauno, El) (2006)",Intouchables (2011)
6,All Quiet on the Western Front (1930),Paisan (PaisÃ ) (1946),Some Like It Hot (1959),"Battle of Algiers, The (La battaglia di Algeri...",Close Encounters of the Third Kind (1977),Stand by Me (1986),"English Patient, The (1996)",Pirates of the Caribbean: The Curse of the Bla...,X-Men: First Class (2011)
7,Captains Courageous (1937),"Great Dictator, The (1940)",The Diary of Anne Frank (1959),"Hustler, The (1961)","Godfather: Part II, The (1974)",Back to the Future (1985),Office Space (1999),Eternal Sunshine of the Spotless Mind (2004),Deadpool (2016)
8,Of Mice and Men (1939),I Know Where I'm Going! (1945),The Wooden Horse (1950),Doctor Zhivago (1965),Tora! Tora! Tora! (1970),Dead Poets Society (1989),"Lock, Stock & Two Smoking Barrels (1998)",Hotel Rwanda (2004),"Avengers, The (2012)"
9,City Lights (1931),"Objective, Burma! (1945)","Man Escaped, A (Un condamnÃ© Ã mort s'est Ã©...",El Cid (1961),Five Easy Pieces (1970),Gandhi (1982),Star Wars: Episode I - The Phantom Menace (1999),"Pianist, The (2002)",Moonrise Kingdom (2012)


In [33]:
# Create a dataframe for the items by reading in the correct source CSV
items_meta_df = pd.read_csv(data_dir + '/item-meta.csv', sep=',', index_col=0)

# Render some sample data
items_meta_df.head(10)

Unnamed: 0_level_0,GENRES,YEAR,CREATION_TIMESTAMP
ITEM_ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Adventure|Animation|Children|Comedy|Fantasy,1995,1640995200
2,Adventure|Children|Fantasy,1995,1640995200
3,Comedy|Romance,1995,1640995200
4,Comedy|Drama|Romance,1995,1640995200
5,Comedy,1995,1640995200
6,Action|Crime|Thriller,1995,1640995200
7,Comedy|Romance,1995,1640995200
8,Adventure|Children,1995,1640995200
9,Action,1995,1640995200
10,Action|Adventure|Thriller,1995,1640995200


Now what we want to do is determine the genres to filter on, for that we need a list of all genres. First we will get all the unique values of the column GENRE, then split strings on | if they exist, everyone will then get added to a long list which will be converted to a set for efficiency. That set will then be made into a list so that it can be iterated, and we can then use the get recommendatioins API.

In [34]:
unique_genre_field_values = items_meta_df['GENRES'].unique()

genre_val_list = []

def process_for_bar_char(val, val_list):
    if '|' in val:
        values = val.split('|')
        for item in values:
            val_list.append(item)
    elif '(' in val:
        pass
    else:
        val_list.append(val)
    return val_list
    

for val in unique_genre_field_values:
    genre_val_list = process_for_bar_char(val, genre_val_list)

genres_to_filter = list(set(genre_val_list))

In [68]:
genres_to_filter

['Children',
 'Western',
 'Horror',
 'Thriller',
 'Film-Noir',
 'War',
 'IMAX',
 'Crime',
 'Fantasy',
 'Comedy',
 'Mystery',
 'Sci-Fi',
 'Documentary',
 'Romance',
 'Animation',
 'Adventure',
 'Drama',
 'Musical',
 'Action']

In [35]:
# Iterate through Genres
recommendations_df_genre_shelves = pd.DataFrame()
for genre in genres_to_filter:
    recommendations_df_genre_shelves = get_new_recommendations_df_by_dynamic_filter(recommendations_df_genre_shelves, user, genre_filter_arn , genre)
    
recommendations_df_genre_shelves

Unnamed: 0,Fantasy,Horror,Action,Documentary,Drama,Animation,Musical,War,Western,IMAX,Comedy,Romance,Crime,Film-Noir,Mystery,Adventure,Children,Sci-Fi,Thriller
0,"Lord of the Rings: The Two Towers, The (2002)",Psycho (1960),Saving Private Ryan (1998),Fog of War: Eleven Lessons from the Life of Ro...,2001: A Space Odyssey (1968),NausicaÃ¤ of the Valley of the Wind (Kaze no t...,"Wizard of Oz, The (1939)","African Queen, The (1951)","Horse Soldiers, The (1959)",Apollo 13 (1995),"African Queen, The (1951)","African Queen, The (1951)",Fargo (1996),Sunset Blvd. (a.k.a. Sunset Boulevard) (1950),Citizen Kane (1941),"African Queen, The (1951)","Wizard of Oz, The (1939)",2001: A Space Odyssey (1968),Fargo (1996)
1,"Wizard of Oz, The (1939)",Alien (1979),"Great Escape, The (1963)",Titicut Follies (1967),Saving Private Ryan (1998),Princess Mononoke (Mononoke-hime) (1997),Duck Soup (1933),Saving Private Ryan (1998),Butch Cassidy and the Sundance Kid (1969),Harry Potter and the Deathly Hallows: Part 1 (...,Dr. Strangelove or: How I Learned to Stop Worr...,Groundhog Day (1993),"Green Mile, The (1999)",Dark City (1998),North by Northwest (1959),2001: A Space Odyssey (1968),Toy Story (1995),Battlestar Galactica (2003),"Lock, Stock & Two Smoking Barrels (1998)"
2,Groundhog Day (1993),"Birds, The (1963)","Lord of the Rings: The Return of the King, The...","Blood of the Beasts (Sang des bÃªtes, Le) (1949)",Lawrence of Arabia (1962),Toy Story (1995),South Pacific (1958),Lawrence of Arabia (1962),Once Upon a Time in the West (C'era una volta ...,"Matrix Reloaded, The (2003)",Groundhog Day (1993),Gone with the Wind (1939),Psycho (1960),"Killers, The (1946)",Rashomon (RashÃ´mon) (1950),"Lord of the Rings: The Two Towers, The (2002)",Toy Story 2 (1999),Star Wars: Episode VI - Return of the Jedi (1983),Pulp Fiction (1994)
3,"Lord of the Rings: The Return of the King, The...","Shining, The (1980)",Austin Powers: The Spy Who Shagged Me (1999),Super Size Me (2004),"Bridge on the River Kwai, The (1957)",Toy Story 2 (1999),"Hard Day's Night, A (1964)",Dr. Strangelove or: How I Learned to Stop Worr...,Major Dundee (1965),"Avengers, The (2012)",Fargo (1996),Breakfast at Tiffany's (1961),Office Space (1999),Call Northside 777 (1948),"Saragossa Manuscript, The (Rekopis znaleziony ...","Wizard of Oz, The (1939)",Aladdin (1992),Forbidden Planet (1956),"Matrix, The (1999)"
4,Monty Python and the Holy Grail (1975),King Kong (1933),Seven Samurai (Shichinin no samurai) (1954),Night and Fog (Nuit et brouillard) (1955),Fargo (1996),Spirited Away (Sen to Chihiro no kamikakushi) ...,West Side Story (1961),"Bridge on the River Kwai, The (1957)",High Noon (1952),Harry Potter and the Prisoner of Azkaban (2004),Austin Powers: The Spy Who Shagged Me (1999),From Here to Eternity (1953),"Lock, Stock & Two Smoking Barrels (1998)",This Gun for Hire (1942),Charade (1963),Lawrence of Arabia (1962),Finding Nemo (2003),Star Wars: Episode I - The Phantom Menace (1999),Independence Day (a.k.a. ID4) (1996)
5,Being John Malkovich (1999),Reptilicus (1961),"Princess Bride, The (1987)",Bowling for Columbine (2002),"Lord of the Rings: The Return of the King, The...",How the Grinch Stole Christmas! (1966),Singin' in the Rain (1952),"Great Escape, The (1963)","Treasure of the Sierra Madre, The (1948)",Harry Potter and the Goblet of Fire (2005),Monty Python and the Holy Grail (1975),"English Patient, The (1996)",Pulp Fiction (1994),Strangers on a Train (1951),"Game, The (1997)","Bridge on the River Kwai, The (1957)",Field of Dreams (1989),"Matrix, The (1999)",North by Northwest (1959)
6,"Princess Bride, The (1987)",Aliens (1986),"Dirty Dozen, The (1967)",Touching the Void (2003),"Great Escape, The (1963)",Aladdin (1992),Night and Day (1946),Gone with the Wind (1939),Shenandoah (1965),Batman Begins (2005),Being John Malkovich (1999),"Princess Bride, The (1987)",American History X (1998),Touch of Evil (1958),"City of Lost Children, The (CitÃ© des enfants ...","Lord of the Rings: The Return of the King, The...",101 Dalmatians (One Hundred and One Dalmatians...,"Truman Show, The (1998)","Pan's Labyrinth (Laberinto del fauno, El) (2006)"
7,NausicaÃ¤ of the Valley of the Wind (Kaze no t...,"Last Man on Earth, The (Ultimo uomo della Terr...",Star Wars: Episode VI - Return of the Jedi (1983),Control Room (2004),"Green Mile, The (1999)",101 Dalmatians (One Hundred and One Dalmatians...,Holiday Inn (1942),From Here to Eternity (1953),Stagecoach (1939),Journey 2: The Mysterious Island (2012),Office Space (1999),Doctor Zhivago (1965),Trainspotting (1996),L.A. Confidential (1997),Arsenic and Old Lace (1944),"Great Escape, The (1963)","Chronicles of Narnia: The Lion, the Witch and ...",Star Wars: Episode V - The Empire Strikes Back...,"Manchurian Candidate, The (1962)"
8,"Pan's Labyrinth (Laberinto del fauno, El) (2006)","Sixth Sense, The (1999)",Three Kings (1999),'Hellboy': The Seeds of Creation (2004),Citizen Kane (1941),Yellow Submarine (1968),Show Boat (1951),Battlestar Galactica (2003),"Alamo, The (1960)",Cloud Atlas (2012),"Lock, Stock & Two Smoking Barrels (1998)",Mutiny on the Bounty (1962),Some Like It Hot (1959),"Double Life, A (1947)",To Catch a Thief (1955),Austin Powers: The Spy Who Shagged Me (1999),Jumanji (1995),Independence Day (a.k.a. ID4) (1996),Snatch (2000)
9,Pirates of the Caribbean: The Curse of the Bla...,"Mummy, The (1999)",Star Wars: Episode I - The Phantom Menace (1999),Sympathy for the Devil (1968),Gone with the Wind (1939),Akira (1988),"Court Jester, The (1956)","English Patient, The (1996)",Northwest Passage (1940),Spider-Man 2 (2004),"Princess Bride, The (1987)",Sabrina (1954),Slumdog Millionaire (2008),Double Indemnity (1944),"Maltese Falcon, The (a.k.a. Dangerous Female) ...",Monty Python and the Holy Grail (1975),"Goonies, The (1985)",NausicaÃ¤ of the Valley of the Wind (Kaze no t...,Lord of War (2005)


## Real-time Events<a class="anchor" id="real-time"></a>
[Back to top](#top)

The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on if they are watching with their children or on their own.

Additionally the events that are recorded via this system are stored until a delete call from you is issued, and they are used as historical data alongside the other interaction data you provided when you train your next models.

Start by creating an event tracker that is attached to the dataset group. This event tracker will add information to the dataset and will influence the recommendations.

In [36]:
response = personalize.create_event_tracker(
    name='MovieTracker',
    datasetGroupArn=dataset_group_arn
)
print(response['eventTrackerArn'])
print(response['trackingId'])
trackingId = response['trackingId']
event_tracker_arn = response['eventTrackerArn']

arn:aws:personalize:us-east-1:051545784337:event-tracker/103c08dd
e59cabd9-b33d-4d25-ab96-cce30eaed850


We will create some code that simulates a user interacting with a particular item. After running this code, you will get recommendations that differ from the results above.

We start by creating some methods for the simulation of real time events.

In [37]:
sessionDict = {}

def send_movie_click(userId, itemId, eventType):
    """
    Simulates a click as an envent
    to send an event to Amazon Personalize's Event Tracker
    """
    # Configure Session
    try:
        sessionId = sessionDict[str(userId)]
    except:
        sessionDict[str(userId)] = str(uuid.uuid1())
        sessionId = sessionDict[str(userId)]
        
    # Configure Properties:
    event = {
    "itemId": str(itemId),
    }
    event_json = json.dumps(event)
        
    # Make Call
    
    personalize_events.put_events(
    trackingId = trackingId,
    userId= str(userId),
    sessionId = sessionId,
    eventList = [{
        'sentAt': int(time.time()),
        'eventType': str(eventType),
        'properties': event_json
        }]
    )

def get_new_recommendations_df_users_real_time(recommendations_df, userId, itemId, eventType):
    # Get the artist name (header of column)
    movieName = get_movie_by_id(itemId)
    
    # Interact with different movies
    print('sending event ' + eventType + ' for ' + get_movie_by_id(itemId))
    send_movie_click(userId=userId, itemId=itemId,eventType=eventType)
    # Get the recommendations (note you should have a base recommendation DF created before)
    get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(userId),
    )
    # Build a new dataframe of recommendations
    itemList = get_recommendations_response['itemList']
    recommendation_list = []
    for item in itemList:
        artist = get_movie_by_id(item['itemId'])
        recommendation_list.append(artist)
    new_rec_df = pd.DataFrame(recommendation_list, columns = [movieName])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

At this point, we haven't generated any real-time events yet; we have only set up the code. To compare the recommendations before and after the real-time events, let's pick one user and generate the original recommendations for them.

In [38]:
# Get recommendations for the user
get_recommendations_response = personalize_runtime.get_recommendations(
        recommenderArn = recommender_top_picks_arn,
        userId = str(rerank_user),
    )

# Build a new dataframe for the recommendations
itemList = get_recommendations_response['itemList']
recommendationList = []
for item in item_list:
    artist = get_movie_by_id(item['itemId'])
    recommendationList.append(artist)
user_recommendations_df = pd.DataFrame(recommendationList, columns = [rerank_user])
user_recommendations_df

Unnamed: 0,421
0,Shutter Island (2010)
1,"Exorcist, The (1973)"
2,Pride and Prejudice (1995)
3,Napoleon Dynamite (2004)
4,Jerry Maguire (1996)
5,"Jerk, The (1979)"
6,Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995)
7,Control Room (2004)
8,"Babysitter, The (1995)"
9,"Passenger, The (Professione: reporter) (1975)"


Ok, so now we have a list of recommendations for this user before we have applied any real-time events. Now let's pick 3 random artists which we will simulate our user interacting with, and then see how this changes the recommendations.

In [39]:
# Next generate 3 random movies
movies = items_df.sample(3).index.tolist()

In [41]:
# Note this will take about 15 seconds to complete due to the sleeps
for movie in movies:
    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, rerank_user, movie,'click')
    time.sleep(5)
    

sending event click for Inglorious Bastards (Quel maledetto treno blindato) (1978)
sending event click for Analyze That (2002)
sending event click for Five-Year Engagement, The (2012)


Now we can look at how the click events changed the recommendations.

In [42]:
user_recommendations_df

Unnamed: 0,421,Inglorious Bastards (Quel maledetto treno blindato) (1978),Analyze That (2002),"Five-Year Engagement, The (2012)"
0,Shutter Island (2010),"African Queen, The (1951)","African Queen, The (1951)","O Brother, Where Art Thou? (2000)"
1,"Exorcist, The (1973)",2001: A Space Odyssey (1968),2001: A Space Odyssey (1968),City of God (Cidade de Deus) (2002)
2,Pride and Prejudice (1995),Saving Private Ryan (1998),Saving Private Ryan (1998),Snatch (2000)
3,Napoleon Dynamite (2004),Lawrence of Arabia (1962),"Wizard of Oz, The (1939)","African Queen, The (1951)"
4,Jerry Maguire (1996),"Bridge on the River Kwai, The (1957)","Lord of the Rings: The Two Towers, The (2002)",Ocean's Eleven (2001)
5,"Jerk, The (1979)","Lord of the Rings: The Two Towers, The (2002)",Dr. Strangelove or: How I Learned to Stop Worr...,Blood Diamond (2006)
6,Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995),Dr. Strangelove or: How I Learned to Stop Worr...,Lawrence of Arabia (1962),Gangs of New York (2002)
7,Control Room (2004),"Wizard of Oz, The (1939)","Bridge on the River Kwai, The (1957)","Lord of the Rings: The Two Towers, The (2002)"
8,"Babysitter, The (1995)",Groundhog Day (1993),Fargo (1996),Blow (2001)
9,"Passenger, The (Professione: reporter) (1975)",Fargo (1996),Groundhog Day (1993),"Great Escape, The (1963)"


In the cell above, the first column after the index is the user's default recommendations from the "Top pics for you" recommender, and each column after that has as a header of the movie that they interacted with via a real time event, and the recommendations after this event occurred. 

The behavior may not shift very much or a lot; this is due to the relatively limited nature of this dataset and effect of a few random clicks. If you wanted to better understand this, try simulating clicking more movies to see the impact.

Now lets look at the event filters, which allow you to filter items based on the interaction data. For this dataset, it could be click or watch based on the data we imported, but could be based on whatever interaction schema you design (click, rate, like, watch, purchase etc.) 

We will create a new helper function to use the personalized ranking campaign, sice the Recommenders already filter out watched content.

In [43]:
def get_new_ranked_recommendations_df_by_static_filter(recommendations_df, user_id, rerank_item_list, filter_arn):
    
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = str(user_id),
        inputList = rerank_item_list,
        filterArn = filter_arn
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['personalizedRanking']
    recommendation_list = []
    for item in item_list:
        movie = get_movie_by_id(item['itemId'])
        recommendation_list.append(movie)

    filter_name = filter_arn.split('/')[1]
    new_rec_df = pd.DataFrame(recommendation_list, columns = [filter_name])
    # Add this dataframe to the old one
    recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)
    return recommendations_df

In [44]:
recommendations_df_events = pd.DataFrame()
for filter_arn in interaction_filter_arns:
    recommendations_df_events = get_new_ranked_recommendations_df_by_static_filter(recommendations_df_events, rerank_user, rerank_item_list, filter_arn)
    
recommendations_df_events

Unnamed: 0,watched,unwatched
0,,Jerry Maguire (1996)
1,,Pride and Prejudice (1995)
2,,Arthur (1981)
3,,"Jerk, The (1979)"
4,,"Quiet Man, The (1952)"
5,,Napoleon Dynamite (2004)
6,,Very Bad Things (1998)
7,,She's So Lovely (1997)
8,,Control Room (2004)
9,,Shutter Island (2010)


Now lets send a watch event in for the top 4 unwatched recommendations, which would simulate watching 4 movies. In a VOD application, you may choose to send in an event after they have watched a significant amount (over 75%) of a piece of content. Sending at 100% complete could miss people that stop short of the credits.

In [45]:
ranked_unwatched_recommendations_response = personalize_runtime.get_personalized_ranking(
    campaignArn = rerank_campaign_arn,
    userId = str(rerank_user),
    inputList = rerank_item_list,
    filterArn = filter_arn)

item_list = ranked_unwatched_recommendations_response['personalizedRanking'][:4]

for item in item_list:
    print('sending event watch for ' + get_movie_by_id(item['itemId']))
    send_movie_click(userId=rerank_user, itemId=item['itemId'], eventType='Watch')
    time.sleep(10)

sending event watch for Jerry Maguire (1996)
sending event watch for Pride and Prejudice (1995)
sending event watch for Arthur (1981)
sending event watch for Jerk, The (1979)


Now we can look at the event filters to see the updated watched and unwatched recommendations 

In [46]:
recommendations_df_events = pd.DataFrame()
for filter_arn in interaction_filter_arns:
    recommendations_df_events = get_new_ranked_recommendations_df_by_static_filter(recommendations_df_events, rerank_user, rerank_item_list, filter_arn)
recommendations_df_events

Unnamed: 0,watched,unwatched
0,"Jerk, The (1979)",Napoleon Dynamite (2004)
1,Jerry Maguire (1996),"Exorcist, The (1973)"
2,Arthur (1981),Religulous (2008)
3,Pride and Prejudice (1995),Gena the Crocodile (1969)
4,,Shutter Island (2010)
5,,"Passenger, The (Professione: reporter) (1975)"
6,,"Quiet Man, The (1952)"
7,,Ghost in the Shell (KÃ´kaku kidÃ´tai) (1995)
8,,"Babysitter, The (1995)"
9,,Control Room (2004)


###### Wrap up <a class="anchor" id="wrapup"></a>
[Back to top](#top)

With that you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate customer data to better integrate with the service, and a knowledge of how to do all this over APIs and by leveraging open source data science tools.

Use these notebooks as a guide to getting started with your customers for POCs. As you find missing components, or discover new approaches, make a pull request and provide any additional helpful components that may be missing from this collection.

You can choose to head to `06_Operations_Layer.ipynb` to go deeper into ML Ops and what a production solution can look like with an automation pipeline.

You'll want to make sure that you clean up all of the resources deployed during this POC. We have provided a separate notebook which shows you how to identify and delete the resources in `07_Clean_Up.ipynb`.

In [54]:
%store event_tracker_arn


Stored 'event_tracker_arn' (str)
Stored 'batchInferenceJobArn' (str)
