# Using Personalize campaigns  on synthetic cars data
This notebook exercises campaigns that have been built in the other notebooks. The intent is to demonstrate
that the results coming back from requests for recommendations are accurate given the dataset we provided.

There are specific sections for each of the following models:

1. [Personalized Ranking](#Exercise-the-personalized-ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata]()
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Imports, overall settings, initialization

In [62]:
account_num        = '<your-account-num>'

In [63]:
import json
import boto3
import time
import datetime
import pandas as pd
from sklearn.utils import shuffle

region   = boto3.Session().region_name # or replace with your preferred region
print(region)

dataset_group_name = 'car-dg'

dg_arn = 'arn:aws:personalize:{}:{}:dataset-group/{}'.format(region, 
                                                             account_num, 
                                                             dataset_group_name)

cars_filename         = 'car_items.csv'
users_filename        = 'users.csv'
interactions_filename = 'interactions.csv'
int_exp_filename      = 'interactions_expanded.csv'

ranking_arn           = 'arn:aws:personalize:{}:{}:campaign/car-personalized-ranking'.format(region, account_num)
sims_arn              = 'arn:aws:personalize:{}:{}:campaign/car-sims'.format(region, account_num)
hrnn_arn              = 'arn:aws:personalize:{}:{}:campaign/car-hrnn'.format(region, account_num)
hrnn_metadata_arn     = 'arn:aws:personalize:{}:{}:campaign/car-hrnn-metadata'.format(region, account_num)
pop_arn               = 'arn:aws:personalize:{}:{}:campaign/car-popularity-count'.format(region, account_num)

us-east-1


In [64]:
personalize           = boto3.client('personalize')
personalize_runtime   = boto3.client('personalize-runtime')
personalize_events    = boto3.client('personalize-events')

In [65]:
def show_item_interaction_history(int_df, item_id):
    _tmp_df = int_df[int_df.ITEM_ID == item_id].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','GENDER','SALARY']]

In [66]:
def show_user_interaction_history(int_df, user_id):
    _tmp_df = int_df[int_df.USER_ID == int(user_id)].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','PRICE','MILEAGE']]

In [67]:
def date_to_string(ts):
    return datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')

In [68]:
def is_campaign_active(c):
    _is_active = False
    
    try:
        _resp = personalize.describe_campaign(campaignArn = c)
        _campaign_status = _resp['campaign']['status']
        if _campaign_status == 'ACTIVE':
            _is_active = True
    except Exception as e:
        pass
        
    return _is_active

In [69]:
int_expanded_df = pd.read_csv(int_exp_filename)

int_expanded_df['WHEN'] = int_expanded_df['TIMESTAMP'].apply(date_to_string)

NUM_CLUSTERS = len(int_expanded_df.FAV_CLUSTER.value_counts())
print('{} clusters'.format(NUM_CLUSTERS))

100 clusters


In [70]:
items_to_rank = int_expanded_df.sample(10)
items_to_rank.head(3)

Unnamed: 0,USER_ID,ITEM_ID,TIMESTAMP,SESSION_ID,MAKE,MODEL,YEAR,MILEAGE,PRICE,AGE,GENDER,LOCATION,SALARY,FAV_CLUSTER,FAV_MODEL,FAV,HOURS_AGO,WHEN
55057,22549,21220,1563207798,48831,Chevrolet,Suburban,2014,78448,33654,44,MALE,10468,26852,44,22,NEWISH-Chevrolet-Suburban,43.85,2019-07-15 16:23:18
675905,17040,29498,1562824530,35336,GMC,Acadia,2015,63545,36952,47,FEMALE,93257,30995,22,11,NEWISH-GMC-Acadia,150.313333,2019-07-11 05:55:30
32449,8361,21618,1562770493,16680,Chrysler,Pacifica,2014,78482,34866,41,FEMALE,94544,49452,46,23,NEWISH-Chrysler-Pacifica,165.323611,2019-07-10 14:54:53


In [71]:
def print_item(item_id):
    tmp = int_expanded_df[int_expanded_df.ITEM_ID == item_id].iloc[0]
    print('Id: {}, Make: {}, Model: {}, Fav: {}, Year: {}'.format(item_id,
         tmp['MAKE'], tmp['MODEL'], tmp['FAV'], tmp['YEAR']))

Skip ahead to try out various campaigns:

1. [Personalized Ranking](#Exercise-the-personalized-ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata]()
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Exercise the Personalized Ranking campaign
Here we want to see Personalize re-rank a set of search results. For our sample, we will pass
a user that likes oldish cars and would expect oldish cars to appear closer to the top. Likewise, we will
pass a user that likes newish cars and expect the higher ranked cars to be newish.

In [72]:
full_df = pd.DataFrame(columns=['USER_ID','FAV','FAV_CLUSTER'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['USER_ID','FAV','FAV_CLUSTER']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_user_df = shuffle(full_df)
ranking_user_list = ranking_user_df['USER_ID'].values.astype(str).tolist()
ranking_user_df.head(NUM_CLUSTERS)

Unnamed: 0,USER_ID,FAV,FAV_CLUSTER
717562,14548,NEWISH-Chevrolet-Traverse,88
670482,8358,OLDISH-Volkswagen-Tiguan,83
722202,20635,OLDISH-Volkswagen-Jetta,91
748958,8768,NEWISH-Buick-Enclave,4
745926,8486,OLDISH-Nissan-Leaf,93
749953,11493,NEWISH-Dodge-Durango,98
739866,7909,OLDISH-Kia-Sedona,7
720179,14083,OLDISH-Jeep-Grand Cherokee,87
189199,13855,NEWISH-Ford-Explorer,52
272750,10893,NEWISH-Hyundai-Elantra,70


In [73]:
full_df = pd.DataFrame(columns=['ITEM_ID','FAV'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['ITEM_ID','FAV']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_item_df = shuffle(full_df)
ranking_item_list = ranking_item_df['ITEM_ID'].values.astype(str).tolist()
ranking_item_df.head()

Unnamed: 0,ITEM_ID,FAV
656811,24821,OLDISH-BMW-430i
749166,28108,OLDISH-BMW-330i
58590,20861,NEWISH-Chevrolet-Suburban
749383,25620,OLDISH-Toyota-Camry
592259,20193,NEWISH-Volvo-240


In [74]:
def print_ranking_target_df(user_id, input_df, target_cluster):
    print('\nRanking for user: {}'.format(user_id))
    
    _input_list = input_df['ITEM_ID'].values.astype(str).tolist()
    
    personalized_ranking_response = personalize_runtime.get_personalized_ranking(
        campaignArn = ranking_arn, userId = str(user_id), inputList = _input_list)
    
    i = 0
    _rank = len(_input_list)
    for item in personalized_ranking_response['personalizedRanking']:
        item_id = item['itemId']
        tmp = int_expanded_df[int_expanded_df.ITEM_ID == int(item_id)].iloc[0]
        _fav_cluster = tmp['FAV_CLUSTER']
        if (target_cluster == _fav_cluster) & (_rank == len(_input_list)):
            _rank = i
        print('Id: {}, Make: {}, Model: {}, Year: {}, Fav: {}'.format(item_id,
         tmp['MAKE'], tmp['MODEL'], tmp['YEAR'], tmp['FAV']))
        i += 1
    return _rank

In [75]:
full_df = pd.DataFrame(columns=['ITEM_ID','FAV','FAV_CLUSTER'])

random_item_df   = shuffle(int_expanded_df[['ITEM_ID','FAV', 'FAV_CLUSTER']].sample(25))
random_item_list = random_item_df['ITEM_ID'].values.astype(str).tolist()
random_item_df.head()

Unnamed: 0,ITEM_ID,FAV,FAV_CLUSTER
258220,21042,OLDISH-Hyundai-Elantra,71
654781,16728,NEWISH-Ford-F150,38
738965,31119,OLDISH-Volvo-V60,81
646754,22499,OLDISH-Chevrolet-Silverado,35
329913,26998,OLDISH-Honda-Pilot,37


#### Try personalized ranking on a curated set of items with each car cluster covered
Here we take a curated set of items, with one item for each car cluster. Personalize
should be able to re-rank in such a way that the specific item that would best match
the user rises to the 0th position.

In [76]:
if is_campaign_active(ranking_arn):
    rank_total = 0
    for i in range(NUM_CLUSTERS):
        user_fav = ranking_user_df.iloc[i]['FAV']
        user_fav_cluster = ranking_user_df.iloc[i]['FAV_CLUSTER']
        print('\nRanking for user that prefers: {}'.format(user_fav))
        rank = print_ranking_target_df(ranking_user_list[i], ranking_item_df, user_fav_cluster)
        if (random_item_df.shape[0] == rank):
            print('**desired cluster was not found in the item set')
            rank = 0 # reset to not penalize when no item was available
        else:
            print('**rank {}'.format(rank))
        rank_total += rank

    print('\nRank average: {:.2f}'.format(rank_total/len(ranking_item_list)))
else:
    print('Personalized ranking campaign not active: {}'.format(ranking_arn))


Ranking for user that prefers: NEWISH-Chevrolet-Traverse

Ranking for user: 14548
Id: 24271, Make: Chevrolet, Model: Traverse, Year: 2015, Fav: NEWISH-Chevrolet-Traverse
Id: 20159, Make: Honda, Model: Odyssey, Year: 2015, Fav: NEWISH-Honda-Odyssey
Id: 20902, Make: Nissan, Model: Altima, Year: 2016, Fav: NEWISH-Nissan-Altima
Id: 27015, Make: Nissan, Model: Altima, Year: 2013, Fav: OLDISH-Nissan-Altima
Id: 16182, Make: Dodge, Model: Caravan, Year: 2014, Fav: NEWISH-Dodge-Caravan
Id: 28609, Make: Buick, Model: LaCrosse, Year: 2013, Fav: OLDISH-Buick-LaCrosse
Id: 21081, Make: Volkswagen, Model: Tiguan, Year: 2013, Fav: OLDISH-Volkswagen-Tiguan
Id: 24972, Make: Subaru, Model: Outback, Year: 2013, Fav: OLDISH-Subaru-Outback
Id: 22501, Make: Volvo, Model: V60, Year: 2014, Fav: NEWISH-Volvo-V60
Id: 28128, Make: Ford, Model: Explorer, Year: 2014, Fav: NEWISH-Ford-Explorer
Id: 31508, Make: Chevrolet, Model: Silverado, Year: 2016, Fav: NEWISH-Chevrolet-Silverado
Id: 33936, Make: Toyota, Model: P

## Exercise the hrnn campaign
Here we try out the hrnn campaign. We ask Personalize for recommendations for a particular user. Our hope is that
it would detect that this user likes old or new cars and would return a list accordingly. 
Best case is that the recommended list of cars entirely matches the user's preferred car
cluster.

In [77]:
users_to_try = int_expanded_df.sample(5)
users_to_try[['USER_ID','FAV']].head(3)

Unnamed: 0,USER_ID,FAV
559258,1877,OLDISH-Subaru-Outback
539277,14341,OLDISH-Ford-Mustang
657923,16068,OLDISH-BMW-430i


In [78]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(20, 18)


Unnamed: 0,USER_ID,ITEM_ID,WHEN,FAV,YEAR,PRICE,MILEAGE
559255,1877,26802,2019-07-15 00:27:47,OLDISH-Subaru-Outback,2011,17583,125579
559251,1877,26247,2019-07-15 00:29:47,OLDISH-Subaru-Outback,2012,28235,111179
559260,1877,30193,2019-07-15 00:33:47,OLDISH-Subaru-Outback,2012,27084,105120
559266,1877,30058,2019-07-15 00:39:47,OLDISH-Subaru-Outback,2010,20205,143042
559263,1877,30442,2019-07-15 00:47:47,OLDISH-Subaru-Outback,2012,20331,105673
559250,1877,24479,2019-07-15 00:57:47,OLDISH-Subaru-Outback,2013,29524,91895
559259,1877,20784,2019-07-15 01:09:47,OLDISH-Subaru-Outback,2013,23835,94644
559252,1877,27320,2019-07-15 01:23:47,OLDISH-Subaru-Outback,2013,30890,97988
559264,1877,29458,2019-07-15 01:39:47,OLDISH-Subaru-Outback,2012,20213,114945
559256,1877,29252,2019-07-15 01:57:47,OLDISH-Subaru-Outback,2012,23048,106259


In [79]:
if is_campaign_active(hrnn_arn):
    for i in range(5):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        items = response['itemList']

        match = 0
        actual_num_results = len(items)

        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

Getting recommendations for user: 1877, who likes: OLDISH-Subaru-Outback
Id: 24062, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2012
Id: 23782, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2011
Id: 24753, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2012
Id: 24892, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2007
Id: 25534, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2013
Id: 25579, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2013
Id: 22404, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2013
Id: 23395, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2012
Id: 31489, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2012
Id: 25481, Make: Subaru, Model: Outback, Fav: OLDISH-Subaru-Outback, Year: 2011
Matched 100% (10/10)

Getting recommendations for user: 14341, who likes: OLDISH-Ford-Mustang
Id: 29081, Make: Ford, Model: Mus

## Exercise the SIMS campaign
Here we experiment with the SIMS campaign. We loop through a list of items that have at 
least some interactions historically. 
For each car, we would expect similar cars to be similar in age, make and model.
We leverage car clusters and would like to see Personalize generate a list of similar cars
that entirely come from the same car cluster.

In [80]:
items_to_try = int_expanded_df.sample(5)
items_to_try[['ITEM_ID','FAV']].head(5)

Unnamed: 0,ITEM_ID,FAV
356080,29108,NEWISH-Ford-Mustang
590823,21072,OLDISH-Ford-Mustang
447720,24989,NEWISH-Chevrolet-Camaro
293836,28483,NEWISH-Chrysler-200
102345,29564,OLDISH-Chrysler-Pacifica


In [81]:
if is_campaign_active(sims_arn):
    desired_num_results = 10
    for i in range(items_to_try.shape[0]):
        item_id     = str(items_to_try.iloc[i]['ITEM_ID'])
        fav         = items_to_try.iloc[i]['FAV']
        fav_cluster = items_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting items similar to: {}, which is a: {}'.format(item_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=sims_arn, 
                                                           itemId=item_id, 
                                                           numResults=desired_num_results)
        items = response['itemList']
        
        match = 0
        actual_num_results = len(items)
        
        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('SIMS campaign not active: {}'.format(sims_arn))

Getting items similar to: 29108, which is a: NEWISH-Ford-Mustang
Id: 25852, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2014
Id: 21475, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2015
Id: 20513, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2017
Id: 24048, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2017
Id: 29908, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2015
Id: 21853, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2014
Id: 16900, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2015
Id: 27087, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2016
Id: 22862, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2014
Id: 23408, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2014
Matched 100% (10/10)

Getting items similar to: 21072, which is a: OLDISH-Ford-Mustang
Id: 25379, Make: Ford, Model: Mustang, Fav: OLDISH-Ford-Mustang, Year: 2011
Id: 28227, M

## Exercise the hrnn-metadata campaign
Here we try out the hrnn-metadata campaign. 
We ask Personalize for recommendations for a particular user. Our hope is that
it would detect that this user likes old or new cars and would return a list accordingly.

In [82]:
users_to_try = int_expanded_df.sample(10)
users_to_try[['USER_ID','FAV']].head(3)

Unnamed: 0,USER_ID,FAV
206482,16330,NEWISH-Ford-F150
18067,17005,NEWISH-Buick-Regal
632842,18701,NEWISH-Dodge-Caravan


In [83]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(70, 18)


Unnamed: 0,USER_ID,ITEM_ID,WHEN,FAV,YEAR,PRICE,MILEAGE
206526,16330,27202,2019-07-10 20:00:04,NEWISH-Ford-F150,2014,26485,75679
206509,16330,26692,2019-07-10 20:02:04,NEWISH-Ford-F150,2015,31745,67243
206512,16330,26963,2019-07-10 20:06:04,NEWISH-Ford-F150,2016,34469,46700
206492,16330,21428,2019-07-10 20:12:04,NEWISH-Ford-F150,2017,36431,35074
206522,16330,28404,2019-07-10 20:20:04,NEWISH-Ford-F150,2016,34149,52041
206543,16330,13487,2019-07-10 20:30:04,NEWISH-Ford-F150,2014,28671,80201
206487,16330,25537,2019-07-10 20:42:04,NEWISH-Ford-F150,2015,32648,68781
206500,16330,23574,2019-07-10 20:56:04,NEWISH-Ford-F150,2015,34683,63132
206527,16330,27129,2019-07-10 21:12:04,NEWISH-Ford-F150,2014,32072,77843
206530,16330,27536,2019-07-10 21:30:04,NEWISH-Ford-F150,2015,30918,62663


In [84]:
if is_campaign_active(hrnn_metadata_arn):
    for i in range(10):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_metadata_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        match = 0
        actual_num_results = len(items)

        items = response['itemList']
        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Getting recommendations for user: 16330, who likes: NEWISH-Ford-F150
Id: 28300, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 23667, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2014
Id: 24603, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 26218, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 23504, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 25537, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 22832, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 29796, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 29352, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Id: 27536, Make: Ford, Model: F150, Fav: NEWISH-Ford-F150, Year: 2015
Matched 100% (10/10)

Getting recommendations for user: 17005, who likes: NEWISH-Buick-Regal
Id: 20047, Make: Buick, Model: Regal, Fav: NEWISH-Buick-Regal, Year: 2015
Id: 28006, Make: Buick, Model: Regal, Fav: NEWISH-Buick-Regal, Y

## Exercise the popularity campaign
Personalize provides a baseline recommender which leverages simple popularity of an item. 
Here we will
compare its results with our own definition of "popular". 

Our popularity is driven simply by total count of
interactions for that item. We expect significant overlap between our list and the one from Personalize.

#### First let's get the results from Personalize

In [85]:
personalized_pop = []
pop_items = []
NUM_MOST_POPULAR = 10

if is_campaign_active(pop_arn):
    popularity_response = personalize_runtime.get_recommendations(campaignArn=pop_arn, 
                                                                  userId='0', 
                                                                  numResults=NUM_MOST_POPULAR)
    pop_items = popularity_response['itemList']
    for item in pop_items:
        print_item(int(item['itemId']))    
else:
    print('Popularity campaign not active: {}'.format(pop_arn))

for p in pop_items:
    personalized_pop.append(str(p['itemId']))

Id: 25714, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2015
Id: 24588, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2014
Id: 23545, Make: BMW, Model: 430i, Fav: OLDISH-BMW-430i, Year: 2011
Id: 24366, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2013
Id: 25012, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014
Id: 23735, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014
Id: 23649, Make: Chrysler, Model: Pacifica, Fav: OLDISH-Chrysler-Pacifica, Year: 2012
Id: 23791, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2016
Id: 25812, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 25104, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2016


#### Now let's get the actual popularity counts of the historical interactions

In [86]:
most_popular = pd.DataFrame(int_expanded_df['ITEM_ID'].value_counts().reset_index())
most_popular.drop(['ITEM_ID'], axis=1, inplace=True)
ten_most_popular = most_popular.head(10)

#### Now compare the two lists

In [87]:
if is_campaign_active(pop_arn):
    print('We asked Personalize for {} most popular.'.format(NUM_MOST_POPULAR))
    print('{}'.format(personalized_pop))
    print('We computed it ourselves also')
    print(ten_most_popular['index'])

    overlap_items     = ten_most_popular[ten_most_popular['index'].isin(personalized_pop)]
    overlap_count     = overlap_items.shape[0]
    not_overlap_items = ten_most_popular[~ten_most_popular['index'].isin(personalized_pop)]
    not_overlap_count = not_overlap_items.shape[0]
    
    print('\nOf the actual most popular, {} are selected by Personalize also.'.format(overlap_items))
    print('\nPersonalize did not think this list was truly top 10:'.format(not_overlap_count))
    print(not_overlap_items.head())

We asked Personalize for 10 most popular.
['25714', '24588', '23545', '24366', '25012', '23735', '23649', '23791', '25812', '25104']
We computed it ourselves also
0    25714
1    26377
2    25104
3    24588
4    25589
5    23791
6    24366
7    25012
8    26299
9    23735
Name: index, dtype: int64

Of the actual most popular,    index
0  25714
2  25104
3  24588
5  23791
6  24366
7  25012
9  23735 are selected by Personalize also.

Personalize did not think this list was truly top 10:
   index
1  26377
4  25589
8  26299


## Use real time events
Here we use the event tracker mechanism of personalize to add some events on the fly after deployment of 
a campaign. We then show the impact on the recommendations.

In [88]:
def is_tracker_active(tracker_name):
    _is_active = False
    _event_tracker_arn = ''
    _tracking_id = ''

    resp = personalize.list_event_trackers()
    trackers = resp['eventTrackers']

    for t in trackers:
        if t['name'] == tracker_name:
            _is_active = True
            _event_tracker_arn = t['eventTrackerArn']
            d_resp = personalize.describe_event_tracker(eventTrackerArn = _event_tracker_arn)

            _tracking_id = d_resp['eventTracker']['trackingId']
    
    return _is_active, _event_tracker_arn, _tracking_id

In [89]:
(exists, tracker_arn, tracking_id) = is_tracker_active('CarClickTracker')
if not exists:
    response = personalize.create_event_tracker(
        name='CarClickTracker',
        datasetGroupArn=dg_arn
    )
    print(response['eventTrackerArn'])
    print(response['trackingId'])

    TRACKING_ID = response['trackingId']
else:
    TRACKING_ID = tracking_id

In [90]:
session_dict = {}

In [91]:
import uuid

def send_car_click(user_id, item_id, ts):
    """
    Simulates a click to send an event to Amazon Personalize's Event Tracker
    """
    # Configure Session
    try:
        session_ID = session_dict[user_id]
    except:
        session_dict[user_id] = str(uuid.uuid1())
        session_ID = session_dict[user_id]
        
    # Configure Properties:
    event = {
        'itemId': str(item_id)
    }
    event_json = json.dumps(event)
        
    # Make Call
    personalize_events.put_events(
        trackingId = TRACKING_ID,
        userId     = str(user_id),
        sessionId  = session_ID,
        eventList  = [{
            'sentAt': ts,
            'eventType': 'EVENT_TYPE',
            'properties': event_json
            }]
)

In [92]:
def send_car_clicks(user_id, items):
    # TODO: send all events in a single array instead of one call for each item
    i = 0
    for item in items:
        send_car_click(user_id, item, time.time())
        i += 1

In [93]:
def recommend_cars(user_id, campaign_arn):
    response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                       userId=str(user_id), 
                                                       numResults=10)
    items = response['itemList']
    for item in items:
        print_item(int(item['itemId']))
    print('')

In [94]:
sample_user = int_expanded_df.sample(1).iloc[0]['USER_ID']
sample_user_cluster = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV_CLUSTER']
sample_user_fav = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV']
print('Here is a sample user for exercising real time events:\n')
print('  user: {}, cluster: {}, fav: {}'.format(sample_user, sample_user_cluster, sample_user_fav))

Here is a sample user for exercising real time events:

  user: 11035, cluster: 57, fav: OLDISH-Buick-Regal


In [95]:
new_cluster = sample_user_cluster + 1
if (new_cluster == NUM_CLUSTERS):
    new_cluster = 0
new_fav = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].iloc[0]['FAV']
print('Now, we pick a new preferred set of cars that we will use when sending events:\n')
print('  new cluster: {}, new fav: {}'.format(new_cluster, new_fav))

Now, we pick a new preferred set of cars that we will use when sending events:

  new cluster: 58, new fav: NEWISH-Hyundai-Santa Fe


In [96]:
print('Before any real time events, Personalize should recommend {} cars...\n'.format(sample_user_fav))

if is_campaign_active(hrnn_arn):
    print('First using {}\n'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}\n'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Before any real time events, Personalize should recommend OLDISH-Buick-Regal cars...

First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn

Id: 25423, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2012
Id: 23990, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2009
Id: 25286, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2007
Id: 25218, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2013
Id: 26524, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2013
Id: 24760, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2012
Id: 25151, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2013
Id: 24195, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2012
Id: 26163, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2012
Id: 26037, Make: Buick, Model: Regal, Fav: OLDISH-Buick-Regal, Year: 2013

Next using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn-metadata

Id: 25423, Make: 

In [97]:
new_car_cluster = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].sample(100)
new_car_cluster[['FAV','ITEM_ID','YEAR','PRICE']].head(3)

Unnamed: 0,FAV,ITEM_ID,YEAR,PRICE
600670,NEWISH-Hyundai-Santa Fe,25154,2015,33144
493022,NEWISH-Hyundai-Santa Fe,24548,2015,35434
597732,NEWISH-Hyundai-Santa Fe,22639,2015,36570


In [98]:
new_items_clicked = new_car_cluster['ITEM_ID'].values
new_items_clicked

array([25154, 24548, 22639, 24017, 34693, 26981, 28407, 24079, 29705,
       31756, 21275, 30665, 30679, 34563, 16610, 30810, 25881, 30273,
       28192, 21073, 24364, 23093, 14950, 12734, 22127, 25474, 16611,
       27318, 22327, 23281, 31461, 28533, 30257, 17369, 33432, 29030,
       22759, 28192, 23281, 23533, 28606, 23229, 26557, 21620, 33414,
       22631, 18217, 19827, 21727, 23281, 29336, 35398, 19630, 21805,
       29166, 24548, 17706, 25330, 20842, 22327, 23921, 23669, 23494,
       28472, 24016, 16142, 29660, 30940, 21095, 30487, 27974, 32078,
       24017, 25474, 32074, 22466, 26543, 19491, 24017, 19280, 33293,
       36745, 32901, 18677, 20123, 26717, 25330, 24028, 25758, 24710,
       23533, 23281, 15145, 22631, 22172, 17691, 29336, 23333, 20148,
       21477])

In [99]:
send_car_clicks(sample_user, new_items_clicked)

In [100]:
int_expanded_df[int_expanded_df.USER_ID == sample_user]['FAV'].value_counts()

OLDISH-Buick-Regal    10
Name: FAV, dtype: int64

In [101]:
print('Now this same user has started to like {} cars.'.format(new_fav))
print('Lets see if Personalize picks up on this real time change in intent...\n')

if is_campaign_active(hrnn_arn):
    print('First using {}\n'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}\n'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Now this same user has started to like NEWISH-Hyundai-Santa Fe cars.
Lets see if Personalize picks up on this real time change in intent...

First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn

Id: 25342, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2014
Id: 25696, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2016
Id: 28067, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 24364, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 25758, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2016
Id: 29778, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 22816, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 23494, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 23565, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2014
Id: 27710, Make: Hyundai, Mod