# Using Personalize campaigns on synthetic car data
This notebook exercises the campaigns built in the other notebooks. The intent is to demonstrate
that the recommendations coming back from Personalize are accurate given the dataset we provided.

There are specific sections for each of the following models:

1. [Personalized Ranking](#Exercise-the-Personalized-Ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata](#Exercise-the-hrnn-metadata-campaign)
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Imports, overall settings, initialization

In [168]:
account_num = '<your-account-num>'

In [169]:
import json
import boto3
import time
import datetime
import pandas as pd
from sklearn.utils import shuffle

region   = boto3.Session().region_name # or replace with your preferred region
print(region)

dataset_group_name = 'car-dg'

dg_arn = 'arn:aws:personalize:{}:{}:dataset-group/{}'.format(region, 
                                                             account_num, 
                                                             dataset_group_name)

cars_filename         = 'car_items.csv'
users_filename        = 'users.csv'
interactions_filename = 'interactions.csv'
int_exp_filename      = 'interactions_expanded.csv'

ranking_arn           = 'arn:aws:personalize:{}:{}:campaign/car-personalized-ranking'.format(region, account_num)
sims_arn              = 'arn:aws:personalize:{}:{}:campaign/car-sims'.format(region, account_num)
hrnn_arn              = 'arn:aws:personalize:{}:{}:campaign/car-hrnn'.format(region, account_num)
hrnn_metadata_arn     = 'arn:aws:personalize:{}:{}:campaign/car-hrnn-metadata'.format(region, account_num)
pop_arn               = 'arn:aws:personalize:{}:{}:campaign/car-popularity-count'.format(region, account_num)

us-east-1


In [170]:
personalize           = boto3.client('personalize')
personalize_runtime   = boto3.client('personalize-runtime')
personalize_events    = boto3.client('personalize-events')

In [171]:
def show_item_interaction_history(int_df, item_id):
    _tmp_df = int_df[int_df.ITEM_ID == item_id].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','GENDER','SALARY']]

In [172]:
def show_user_interaction_history(int_df, user_id):
    _tmp_df = int_df[int_df.USER_ID == int(user_id)].sort_values('TIMESTAMP')
    print(_tmp_df.shape)
    return _tmp_df[['USER_ID','ITEM_ID','WHEN',
                    'FAV','YEAR','PRICE','MILEAGE']]

In [173]:
def date_to_string(ts):
    return datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')

In [174]:
def is_campaign_active(c):
    """Return True if campaign ARN c exists and its status is ACTIVE."""
    try:
        _resp = personalize.describe_campaign(campaignArn = c)
    except Exception:
        return False   # campaign missing, inaccessible, or the call failed
    return _resp['campaign']['status'] == 'ACTIVE'

In [175]:
int_expanded_df = pd.read_csv(int_exp_filename)

int_expanded_df['WHEN'] = int_expanded_df['TIMESTAMP'].apply(date_to_string)

NUM_CLUSTERS = len(int_expanded_df.FAV_CLUSTER.value_counts())
print('{} clusters'.format(NUM_CLUSTERS))

100 clusters


In [176]:
items_to_rank = int_expanded_df.sample(10)
items_to_rank.head(3)

| | USER_ID | ITEM_ID | TIMESTAMP | SESSION_ID | MAKE | MODEL | YEAR | MILEAGE | PRICE | AGE | GENDER | LOCATION | SALARY | FAV_CLUSTER | FAV_MODEL | FAV | HOURS_AGO | WHEN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 262107 | 28041 | 19781 | 1563005830 | 9739 | Hyundai | Elantra | 2012 | 105945 | 27137 | 33 | FEMALE | 90044 | 22091 | 71 | 35 | OLDISH-Hyundai-Elantra | 99.952222 | 2019-07-13 08:17:10 |
| 18780 | 23295 | 26304 | 1563159215 | 3470 | Buick | Regal | 2016 | 52177 | 35670 | 37 | FEMALE | 75217 | 31532 | 56 | 28 | NEWISH-Buick-Regal | 57.345278 | 2019-07-15 02:53:35 |
| 726700 | 15499 | 31119 | 1563188328 | 16836 | Volvo | V60 | 2013 | 92456 | 30720 | 44 | FEMALE | 95076 | 45354 | 81 | 40 | OLDISH-Volvo-V60 | 49.258333 | 2019-07-15 10:58:48 |


In [177]:
def print_item(item_id):
    tmp = int_expanded_df[int_expanded_df.ITEM_ID == item_id].iloc[0]
    print('Id: {}, Make: {}, Model: {}, Fav: {}, Year: {}'.format(item_id,
         tmp['MAKE'], tmp['MODEL'], tmp['FAV'], tmp['YEAR']))

Skip ahead to try out various campaigns:

1. [Personalized Ranking](#Exercise-the-Personalized-Ranking-campaign)
2. [HRNN](#Exercise-the-hrnn-campaign)
3. [SIMS](#Exercise-the-SIMS-campaign)
4. [HRNN-Metadata](#Exercise-the-hrnn-metadata-campaign)
5. [Popularity Count](#Exercise-the-popularity-campaign)

In addition, we have a section for experimenting with 
[Personalize Event Tracker](#Use-real-time-events).

## Exercise the Personalized Ranking campaign
Here we want to see Personalize re-rank a set of search results. For our sample, we pass
a user who likes oldish cars and expect oldish cars to appear closer to the top. Likewise, we
pass a user who likes newish cars and expect the highest-ranked cars to be newish.
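Cells 178 and 179 below build the test users and candidate items by drawing one row from each `FAV_CLUSTER`. As a minimal stdlib sketch of that sampling step, assuming the data is available as `(item_id, cluster)` pairs, the hypothetical helper `sample_one_per_cluster` picks one item per cluster and then shuffles the picks, so the input order Personalize receives carries no information about the clusters.

```python
import random
from collections import defaultdict

def sample_one_per_cluster(rows, seed=None):
    """Pick one (item_id, cluster) pair at random from each cluster, in shuffled order.

    rows: iterable of (item_id, cluster) tuples, e.g. built from the
    ITEM_ID and FAV_CLUSTER columns (an assumption for this sketch).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for item_id, cluster in rows:
        by_cluster[cluster].append(item_id)
    # One random representative per cluster
    picks = [(rng.choice(items), cluster) for cluster, items in by_cluster.items()]
    # Shuffle so the candidate order does not reveal the cluster grouping
    rng.shuffle(picks)
    return picks

rows = [(101, 0), (102, 0), (201, 1), (301, 2), (302, 2)]   # toy data
print(sample_one_per_cluster(rows, seed=7))
```

With pandas 1.1 or later, `int_expanded_df.groupby('FAV_CLUSTER').sample(1)` should give the same one-row-per-cluster result without the explicit loop.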

In [178]:
full_df = pd.DataFrame(columns=['USER_ID','FAV','FAV_CLUSTER'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['USER_ID','FAV','FAV_CLUSTER']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_user_df = shuffle(full_df)
ranking_user_list = ranking_user_df['USER_ID'].values.astype(str).tolist()
ranking_user_df.head(NUM_CLUSTERS)

| | USER_ID | FAV | FAV_CLUSTER |
|---|---|---|---|
| 89194 | 28866 | NEWISH-Nissan-Rogue | 42 |
| 370228 | 18249 | NEWISH-Volkswagen-Golf | 28 |
| 502065 | 15654 | OLDISH-Volvo-240 | 27 |
| 578268 | 17130 | OLDISH-Chrysler-300 | 21 |
| 267395 | 15490 | NEWISH-Ford-Fusion | 72 |
| 431092 | 14014 | NEWISH-Jeep-Grand Cherokee | 86 |
| 747497 | 17899 | NEWISH-Jeep-Cherokee | 8 |
| 716625 | 28187 | OLDISH-Toyota-Corrola | 41 |
| 176580 | 12929 | OLDISH-Toyota-Prius | 79 |
| 573979 | 11671 | OLDISH-GMC-Yukon | 25 |


In [179]:
full_df = pd.DataFrame(columns=['ITEM_ID','FAV'])

for i in range(NUM_CLUSTERS):
    tmp_df  = int_expanded_df[int_expanded_df.FAV_CLUSTER == i][['ITEM_ID','FAV']].sample(1)
    full_df = pd.concat([full_df, tmp_df])
ranking_item_df = shuffle(full_df)
ranking_item_list = ranking_item_df['ITEM_ID'].values.astype(str).tolist()
ranking_item_df.head()

| | ITEM_ID | FAV |
|---|---|---|
| 628195 | 16190 | NEWISH-Honda-Civic |
| 723071 | 23634 | NEWISH-Volkswagen-Jetta |
| 734270 | 24797 | OLDISH-Buick-LaCrosse |
| 686353 | 28972 | OLDISH-Ford-Escape |
| 749472 | 17473 | OLDISH-Chevrolet-Traverse |


In [180]:
def print_ranking_target_df(user_id, input_df, target_cluster):
    """Re-rank input_df's items for user_id, print them, and return the 0-based position
    of the first item from target_cluster (or len(input list) if there is none)."""
    print('\nRanking for user: {}'.format(user_id))

    _input_list = input_df['ITEM_ID'].values.astype(str).tolist()

    personalized_ranking_response = personalize_runtime.get_personalized_ranking(
        campaignArn = ranking_arn, userId = str(user_id), inputList = _input_list)

    _rank = len(_input_list)   # sentinel: no item from target_cluster seen yet
    for i, item in enumerate(personalized_ranking_response['personalizedRanking']):
        item_id = item['itemId']
        tmp = int_expanded_df[int_expanded_df.ITEM_ID == int(item_id)].iloc[0]
        if tmp['FAV_CLUSTER'] == target_cluster and _rank == len(_input_list):
            _rank = i
        print('Id: {}, Make: {}, Model: {}, Year: {}, Fav: {}'.format(item_id,
             tmp['MAKE'], tmp['MODEL'], tmp['YEAR'], tmp['FAV']))
    return _rank

In [181]:
random_item_df   = shuffle(int_expanded_df[['ITEM_ID','FAV', 'FAV_CLUSTER']].sample(25))
random_item_list = random_item_df['ITEM_ID'].values.astype(str).tolist()
random_item_df.head()

| | ITEM_ID | FAV | FAV_CLUSTER |
|---|---|---|---|
| 378109 | 21597 | NEWISH-Chevrolet-Silverado | 34 |
| 134552 | 19763 | NEWISH-Chevrolet-Suburban | 44 |
| 485368 | 24122 | OLDISH-Ford-Fusion | 73 |
| 12230 | 25523 | OLDISH-Nissan-Rogue | 43 |
| 133173 | 25899 | NEWISH-Chevrolet-Suburban | 44 |


#### Try personalized ranking on a curated set of items with each car cluster covered
Here we take a curated set of items containing one item from each car cluster. Personalize
should re-rank the set so that the item from the user's preferred cluster rises to the top
(position 0).
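The score reported for this experiment is, for each user, the position of the first candidate from the user's own cluster, averaged over users; 0 is the best possible value. A stdlib sketch of that metric, assuming a hypothetical `item_cluster` dict from item ID to cluster, which returns `None` (rather than a sentinel position) when the user's cluster is absent from the candidates so that such users can simply be left out of the average:

```python
def first_match_rank(ranked_ids, item_cluster, target_cluster):
    """0-based position of the first item whose cluster is target_cluster, or None."""
    for position, item_id in enumerate(ranked_ids):
        if item_cluster.get(item_id) == target_cluster:
            return position
    return None

def average_rank(ranks):
    """Mean of the ranks, skipping users with no match (None); None if nothing matched."""
    found = [r for r in ranks if r is not None]
    return sum(found) / len(found) if found else None

item_cluster = {'a': 1, 'b': 2, 'c': 3}                 # toy data
r1 = first_match_rank(['b', 'a', 'c'], item_cluster, 1)  # 1
r2 = first_match_rank(['c', 'b', 'a'], item_cluster, 3)  # 0
r3 = first_match_rank(['a', 'b'], item_cluster, 9)       # None: cluster not among candidates
print(average_rank([r1, r2, r3]))                        # 0.5
```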

In [182]:
if is_campaign_active(ranking_arn):
    rank_total = 0
    num_ranked = 0
    for i in range(NUM_CLUSTERS):
        user_fav = ranking_user_df.iloc[i]['FAV']
        user_fav_cluster = ranking_user_df.iloc[i]['FAV_CLUSTER']
        print('\nRanking for user that prefers: {}'.format(user_fav))
        rank = print_ranking_target_df(ranking_user_list[i], ranking_item_df, user_fav_cluster)
        if rank == len(ranking_item_list):   # sentinel: no candidate from the user's cluster
            print('**desired cluster was not found in the item set')
        else:
            print('**rank {}'.format(rank))
            rank_total += rank
            num_ranked += 1

    if num_ranked > 0:   # average only over users whose cluster was present
        print('\nRank average: {:.2f} (over {} users)'.format(rank_total / num_ranked, num_ranked))
else:
    print('Personalized ranking campaign not active: {}'.format(ranking_arn))


Ranking for user that prefers: NEWISH-Nissan-Rogue

Ranking for user: 28866
Id: 20256, Make: Ford, Model: Edge, Year: 2009, Fav: OLDISH-Ford-Edge
Id: 27047, Make: Toyota, Model: Corrola, Year: 2015, Fav: NEWISH-Toyota-Corrola
Id: 18210, Make: Jeep, Model: Grand Cherokee, Year: 2014, Fav: NEWISH-Jeep-Grand Cherokee
Id: 22385, Make: Chrysler, Model: Pacifica, Year: 2014, Fav: NEWISH-Chrysler-Pacifica
Id: 23268, Make: Chevrolet, Model: Suburban, Year: 2012, Fav: OLDISH-Chevrolet-Suburban
Id: 24364, Make: Hyundai, Model: Santa Fe, Year: 2015, Fav: NEWISH-Hyundai-Santa Fe
Id: 29286, Make: Toyota, Model: Corrola, Year: 2010, Fav: OLDISH-Toyota-Corrola
Id: 23768, Make: Ford, Model: Mustang, Year: 2014, Fav: NEWISH-Ford-Mustang
Id: 18592, Make: Toyota, Model: Sienna, Year: 2016, Fav: NEWISH-Toyota-Sienna
Id: 26112, Make: Chrysler, Model: Pacifica, Year: 2009, Fav: OLDISH-Chrysler-Pacifica
Id: 29351, Make: Ford, Model: Explorer, Year: 2012, Fav: OLDISH-Ford-Explorer
Id: 24211, Make: Nissan, Mo

## Exercise the hrnn campaign
Here we try out the HRNN campaign by asking Personalize for recommendations for a particular user.
We hope it detects whether this user prefers older or newer cars (and which make and model)
and returns a list accordingly. In the best case, every recommended car comes from the user's
preferred car cluster.
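The "Matched" percentage printed for each user is the share of recommended items whose cluster equals the user's preferred cluster. A small sketch of that calculation, assuming a hypothetical `item_cluster` dict from item ID to cluster; unlike the inline computation in the cell below, it returns 0.0 for an empty recommendation list instead of dividing by zero.

```python
def cluster_match_rate(recommended_ids, item_cluster, target_cluster):
    """Fraction of recommended items in target_cluster; 0.0 for an empty list."""
    if not recommended_ids:
        return 0.0
    matches = sum(1 for item_id in recommended_ids
                  if item_cluster.get(item_id) == target_cluster)
    return matches / len(recommended_ids)

item_cluster = {'10': 5, '11': 5, '12': 7, '13': 5}   # toy data
print(cluster_match_rate(['10', '11', '12', '13'], item_cluster, 5))  # 0.75
```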

In [183]:
users_to_try = int_expanded_df.sample(5)
users_to_try[['USER_ID','FAV']].head(3)

| | USER_ID | FAV |
|---|---|---|
| 528574 | 3070 | OLDISH-Chrysler-200 |
| 245072 | 11097 | OLDISH-Dodge-Ram |
| 677878 | 7897 | OLDISH-BMW-430i |


In [184]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(20, 18)


| | USER_ID | ITEM_ID | WHEN | FAV | YEAR | PRICE | MILEAGE |
|---|---|---|---|---|---|---|---|
| 528574 | 3070 | 22038 | 2019-07-10 14:00:55 | OLDISH-Chrysler-200 | 2012 | 27869 | 110288 |
| 528573 | 3070 | 18703 | 2019-07-10 14:02:55 | OLDISH-Chrysler-200 | 2011 | 17192 | 127144 |
| 528564 | 3070 | 18983 | 2019-07-10 14:06:55 | OLDISH-Chrysler-200 | 2012 | 23548 | 110023 |
| 528567 | 3070 | 24068 | 2019-07-10 14:12:55 | OLDISH-Chrysler-200 | 2010 | 17556 | 135164 |
| 528560 | 3070 | 31364 | 2019-07-10 14:20:55 | OLDISH-Chrysler-200 | 2011 | 21037 | 122204 |
| 528570 | 3070 | 25834 | 2019-07-10 14:30:55 | OLDISH-Chrysler-200 | 2012 | 22256 | 105244 |
| 528579 | 3070 | 11546 | 2019-07-10 14:42:55 | OLDISH-Chrysler-200 | 2011 | 21494 | 120052 |
| 528576 | 3070 | 20779 | 2019-07-10 14:56:55 | OLDISH-Chrysler-200 | 2012 | 25275 | 105894 |
| 528563 | 3070 | 25795 | 2019-07-10 15:12:55 | OLDISH-Chrysler-200 | 2012 | 23489 | 109328 |
| 528562 | 3070 | 28626 | 2019-07-10 15:30:55 | OLDISH-Chrysler-200 | 2011 | 19351 | 126405 |


In [185]:
if is_campaign_active(hrnn_arn):
    for i in range(users_to_try.shape[0]):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        items = response['itemList']

        match = 0
        actual_num_results = len(items)

        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

Getting recommendations for user: 3070, who likes: OLDISH-Chrysler-200
Id: 29140, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2012
Id: 25002, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2013
Id: 21045, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2012
Id: 23948, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2010
Id: 22926, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2012
Id: 22616, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2013
Id: 29709, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2010
Id: 23457, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2010
Id: 28747, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2011
Id: 24914, Make: Chrysler, Model: 200, Fav: OLDISH-Chrysler-200, Year: 2012
Matched 100% (10/10)

Getting recommendations for user: 11097, who likes: OLDISH-Dodge-Ram
Id: 28216, Make: Dodge, Model: Ram, Fav: OLDISH-Dodge-Ram, Year: 2012
Id: 2831

## Exercise the SIMS campaign
Here we experiment with the SIMS campaign. We loop through a sample of items that have at
least some historical interactions.
For each car, we expect the similar cars to match it in age, make, and model.
Each car cluster captures exactly that combination (for example, OLDISH-GMC-Acadia), so ideally
every similar car Personalize returns comes from the same car cluster.
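A single match percentage does not show where the misses go. For SIMS it can help to count the clusters of the returned items: a miss in the same make and model but the other age bucket is a much nearer miss than an unrelated car. A sketch using `collections.Counter`, assuming a hypothetical `item_cluster` dict from item ID to cluster label (the toy data below is illustrative only):

```python
from collections import Counter

def cluster_breakdown(similar_ids, item_cluster):
    """Count the clusters of the similar items, most common first.

    IDs missing from item_cluster are counted under None.
    """
    return Counter(item_cluster.get(item_id) for item_id in similar_ids).most_common()

# Toy data for illustration only
item_cluster = {'1': 'OLDISH-GMC-Acadia', '2': 'OLDISH-GMC-Acadia', '3': 'NEWISH-GMC-Acadia'}
breakdown = cluster_breakdown(['1', '2', '3'], item_cluster)
print(breakdown)  # [('OLDISH-GMC-Acadia', 2), ('NEWISH-GMC-Acadia', 1)]
```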

In [186]:
items_to_try = int_expanded_df.sample(5)
items_to_try[['ITEM_ID','FAV']].head(5)

| | ITEM_ID | FAV |
|---|---|---|
| 680854 | 24547 | OLDISH-GMC-Acadia |
| 694075 | 29030 | NEWISH-Hyundai-Santa Fe |
| 217670 | 24930 | OLDISH-Ford-F150 |
| 135555 | 22067 | NEWISH-Chevrolet-Suburban |
| 634994 | 28493 | NEWISH-Ford-Edge |


In [187]:
if is_campaign_active(sims_arn):
    desired_num_results = 10
    for i in range(items_to_try.shape[0]):
        item_id     = str(items_to_try.iloc[i]['ITEM_ID'])
        fav         = items_to_try.iloc[i]['FAV']
        fav_cluster = items_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting items similar to: {}, which is a: {}'.format(item_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=sims_arn, 
                                                           itemId=item_id, 
                                                           numResults=desired_num_results)
        items = response['itemList']
        
        match = 0
        actual_num_results = len(items)
        
        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('SIMS campaign not active: {}'.format(sims_arn))

Getting items similar to: 24547, which is a: OLDISH-GMC-Acadia
Id: 28608, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2013
Id: 24170, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2013
Id: 26221, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2013
Id: 26130, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2012
Id: 26257, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2012
Id: 24438, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2012
Id: 19456, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2013
Id: 24218, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2010
Id: 24672, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2013
Id: 29514, Make: GMC, Model: Acadia, Fav: OLDISH-GMC-Acadia, Year: 2012
Matched 100% (10/10)

Getting items similar to: 29030, which is a: NEWISH-Hyundai-Santa Fe
Id: 24364, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2015
Id: 23005, Make: Hyundai, Model: Santa Fe,

## Exercise the hrnn-metadata campaign
Here we try out the HRNN-Metadata campaign, which also draws on user and item metadata.
As with HRNN, we ask Personalize for recommendations for a particular user and hope that
it detects whether this user prefers older or newer cars and returns a list accordingly.
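"Detecting" the user's preference can be made concrete by deriving the expected label from the user's own history, which is what the interaction-history table below shows. A sketch, assuming the history is a non-empty list of FAV labels and that labels follow the AGE-Make-Model pattern seen in this dataset (for example `OLDISH-Ford-F150`); `preferred_cluster` and `age_bucket` are hypothetical helpers:

```python
from collections import Counter

def preferred_cluster(history_favs):
    """Return (most frequent FAV label, its share of the history) for a non-empty history."""
    label, count = Counter(history_favs).most_common(1)[0]
    return label, count / len(history_favs)

def age_bucket(fav_label):
    """Return the age part of a label such as 'OLDISH-Ford-F150' (here 'OLDISH')."""
    return fav_label.split('-', 1)[0]

history = ['OLDISH-Ford-F150'] * 9 + ['NEWISH-Ford-F150']   # toy history
label, share = preferred_cluster(history)
print(label, share, age_bucket(label))  # OLDISH-Ford-F150 0.9 OLDISH
```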

In [188]:
users_to_try = int_expanded_df.sample(10)
users_to_try[['USER_ID','FAV']].head(3)

| | USER_ID | FAV |
|---|---|---|
| 241113 | 21855 | OLDISH-Ford-F150 |
| 460002 | 9391 | OLDISH-Ford-Explorer |
| 66668 | 22270 | NEWISH-Chevrolet-Tahoe |


In [189]:
show_user_interaction_history(int_expanded_df, users_to_try.iloc[0]['USER_ID'])

(10, 18)


| | USER_ID | ITEM_ID | WHEN | FAV | YEAR | PRICE | MILEAGE |
|---|---|---|---|---|---|---|---|
| 241118 | 21855 | 19123 | 2019-07-15 05:12:25 | OLDISH-Ford-F150 | 2013 | 25437 | 94331 |
| 241111 | 21855 | 33360 | 2019-07-15 05:14:25 | OLDISH-Ford-F150 | 2012 | 23486 | 107885 |
| 241114 | 21855 | 30046 | 2019-07-15 05:18:25 | OLDISH-Ford-F150 | 2012 | 26776 | 108834 |
| 241119 | 21855 | 26156 | 2019-07-15 05:24:25 | OLDISH-Ford-F150 | 2011 | 19504 | 128913 |
| 241116 | 21855 | 25088 | 2019-07-15 05:32:25 | OLDISH-Ford-F150 | 2012 | 26156 | 110343 |
| 241110 | 21855 | 27339 | 2019-07-15 05:42:25 | OLDISH-Ford-F150 | 2013 | 24430 | 90904 |
| 241115 | 21855 | 17932 | 2019-07-15 05:54:25 | OLDISH-Ford-F150 | 2012 | 23657 | 109671 |
| 241112 | 21855 | 31424 | 2019-07-15 06:08:25 | OLDISH-Ford-F150 | 2008 | 16511 | 166766 |
| 241113 | 21855 | 27125 | 2019-07-15 06:24:25 | OLDISH-Ford-F150 | 2013 | 26761 | 95914 |
| 241117 | 21855 | 21221 | 2019-07-15 06:42:25 | OLDISH-Ford-F150 | 2013 | 25735 | 98633 |


In [190]:
if is_campaign_active(hrnn_metadata_arn):
    for i in range(users_to_try.shape[0]):
        user_id     = str(users_to_try.iloc[i]['USER_ID'])
        fav         = users_to_try.iloc[i]['FAV']
        fav_cluster = users_to_try.iloc[i]['FAV_CLUSTER']

        print('Getting recommendations for user: {}, who likes: {}'.format(user_id, fav))
        response = personalize_runtime.get_recommendations(campaignArn=hrnn_metadata_arn, 
                                                           userId=user_id, 
                                                           numResults=10)
        items = response['itemList']

        match = 0
        actual_num_results = len(items)

        for item in items:
            _curr_item_id  = int(item['itemId'])
            _curr_cluster  = int_expanded_df[int_expanded_df.ITEM_ID == _curr_item_id].iloc[0]['FAV_CLUSTER']
            if fav_cluster == _curr_cluster:
                match += 1
            print_item(_curr_item_id)
        print('Matched {:.0f}% ({}/{})'.format(100 * match/actual_num_results, match, actual_num_results))
        print('')
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Getting recommendations for user: 21855, who likes: OLDISH-Ford-F150
Id: 25327, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 25282, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2013
Id: 22281, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 26107, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2012
Id: 16589, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 21916, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 33809, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 30250, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2013
Id: 23627, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2011
Id: 23341, Make: Ford, Model: F150, Fav: OLDISH-Ford-F150, Year: 2013
Matched 100% (10/10)

Getting recommendations for user: 9391, who likes: OLDISH-Ford-Explorer
Id: 25024, Make: Ford, Model: Explorer, Fav: OLDISH-Ford-Explorer, Year: 2011
Id: 24582, Make: Ford, Model: Explorer, Fav: OLDISH-Ford-Ex

## Exercise the popularity campaign
Personalize provides a baseline recommender that ranks items by simple popularity.
Here we compare its results with our own definition of "popular".

Our popularity is driven simply by the total count of
interactions for each item. We expect significant overlap between our list and the one from Personalize.
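Our definition of popularity reduces to counting interactions per item and keeping the top N. A stdlib sketch of that count and of the overlap check, assuming the item IDs come from the `ITEM_ID` column; IDs are compared as strings because the runtime returns `itemId` values as strings, which is also why the notebook calls `int(item['itemId'])` before looking items up:

```python
from collections import Counter

def top_n_by_count(interaction_item_ids, n):
    """Item IDs as strings, ordered by number of interactions (highest first), top n only."""
    return [str(item_id) for item_id, _ in Counter(interaction_item_ids).most_common(n)]

def overlap(ours, theirs):
    """Split `ours` into items also in `theirs` and items missing from it (order of `ours` kept)."""
    theirs_set = {str(x) for x in theirs}
    shared = [x for x in ours if x in theirs_set]
    missing = [x for x in ours if x not in theirs_set]
    return shared, missing

interactions = [1, 2, 2, 3, 3, 3, 4]      # toy ITEM_ID column
ours = top_n_by_count(interactions, 2)
shared, missing = overlap(ours, ['3', '4'])
print(ours, shared, missing)  # ['3', '2'] ['3'] ['2']
```

`Counter.most_common` breaks ties by first appearance, so the relative order of equally popular items is not meaningful.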

#### First let's get the results from Personalize

In [191]:
personalized_pop = []
pop_items = []
NUM_MOST_POPULAR = 10

if is_campaign_active(pop_arn):
    popularity_response = personalize_runtime.get_recommendations(campaignArn=pop_arn, 
                                                                  userId='0', 
                                                                  numResults=NUM_MOST_POPULAR)
    pop_items = popularity_response['itemList']
    for item in pop_items:
        print_item(int(item['itemId']))    
else:
    print('Popularity campaign not active: {}'.format(pop_arn))

for p in pop_items:
    personalized_pop.append(str(p['itemId']))

Id: 25714, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2015
Id: 24588, Make: Hyundai, Model: Santa Fe, Fav: NEWISH-Hyundai-Santa Fe, Year: 2014
Id: 23545, Make: BMW, Model: 430i, Fav: OLDISH-BMW-430i, Year: 2011
Id: 24366, Make: Nissan, Model: Rogue, Fav: OLDISH-Nissan-Rogue, Year: 2013
Id: 25012, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014
Id: 23735, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2014
Id: 23649, Make: Chrysler, Model: Pacifica, Fav: OLDISH-Chrysler-Pacifica, Year: 2012
Id: 23791, Make: Ford, Model: Mustang, Fav: NEWISH-Ford-Mustang, Year: 2016
Id: 25812, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 25104, Make: Nissan, Model: Rogue, Fav: NEWISH-Nissan-Rogue, Year: 2016


#### Now let's get the actual popularity counts of the historical interactions

In [192]:
item_counts      = int_expanded_df['ITEM_ID'].value_counts()   # interactions per item, sorted descending
ten_most_popular = item_counts.head(NUM_MOST_POPULAR).index.astype(str).tolist()   # str IDs, to match Personalize's itemIds

#### Now compare the two lists

In [193]:
if is_campaign_active(pop_arn):
    print('We asked Personalize for the {} most popular items:'.format(NUM_MOST_POPULAR))
    print(personalized_pop)
    print('We also computed them ourselves from the interaction counts:')
    print(ten_most_popular)

    overlap_items     = [i for i in ten_most_popular if i in personalized_pop]
    not_overlap_items = [i for i in ten_most_popular if i not in personalized_pop]

    print('\nOf our {} most popular items, {} were also returned by Personalize.'.format(
        NUM_MOST_POPULAR, len(overlap_items)))
    print('\nThese {} items are in our top {} but not in the Personalize list:'.format(
        len(not_overlap_items), NUM_MOST_POPULAR))
    print(not_overlap_items)

We asked Personalize for the 10 most popular items:
['25714', '24588', '23545', '24366', '25012', '23735', '23649', '23791', '25812', '25104']
We also computed them ourselves from the interaction counts:
['25714', '26377', '25104', '24588', '25589', '23791', '24366', '25012', '26299', '23735']

Of our 10 most popular items, 7 were also returned by Personalize.

These 3 items are in our top 10 but not in the Personalize list:
['26377', '25589', '26299']


## Use real time events
Here we use Personalize's event tracker to send new interaction events after a campaign has been
deployed. We then show the impact on the recommendations, demonstrating that Personalize can
react to a user's changing preferences in real time.
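One way to quantify the "impact" is to compare the share of recommendations drawn from the newly preferred cluster before and after the events are sent. A sketch under the assumption that both recommendation lists and a hypothetical `item_cluster` dict (item ID to cluster number) are available; the toy cluster numbers are illustrative only:

```python
def share_in_cluster(recommended_ids, item_cluster, cluster):
    """Fraction of recommendations that belong to `cluster` (0.0 for an empty list)."""
    if not recommended_ids:
        return 0.0
    return sum(item_cluster.get(i) == cluster for i in recommended_ids) / len(recommended_ids)

def preference_shift(before_ids, after_ids, item_cluster, new_cluster):
    """Change in the share of recommendations from new_cluster after the real-time events."""
    return (share_in_cluster(after_ids, item_cluster, new_cluster)
            - share_in_cluster(before_ids, item_cluster, new_cluster))

item_cluster = {'a': 54, 'b': 54, 'c': 55, 'd': 55}   # toy data
print(preference_shift(['a', 'b'], ['c', 'd'], item_cluster, new_cluster=55))  # 1.0
```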

In [194]:
def is_tracker_active(tracker_name):
    """Look up an event tracker by name in this dataset group.

    Returns (exists, event_tracker_arn, tracking_id). 'exists' only means the tracker
    was found; it may not have reached ACTIVE status yet.
    """
    _is_active = False
    _event_tracker_arn = ''
    _tracking_id = ''

    resp = personalize.list_event_trackers(datasetGroupArn = dg_arn)
    for t in resp['eventTrackers']:
        if t['name'] == tracker_name:
            _is_active = True
            _event_tracker_arn = t['eventTrackerArn']
            d_resp = personalize.describe_event_tracker(eventTrackerArn = _event_tracker_arn)
            _tracking_id = d_resp['eventTracker']['trackingId']
            break

    return _is_active, _event_tracker_arn, _tracking_id

In [195]:
(exists, tracker_arn, tracking_id) = is_tracker_active('CarClickTracker')
if not exists:
    response = personalize.create_event_tracker(
        name='CarClickTracker',
        datasetGroupArn=dg_arn
    )
    print(response['eventTrackerArn'])
    print(response['trackingId'])

    tracker_arn = response['eventTrackerArn']
    TRACKING_ID = response['trackingId']
else:
    TRACKING_ID = tracking_id
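A newly created event tracker is not usable immediately: like other Personalize resources it moves through statuses such as `CREATE PENDING` and `CREATE IN_PROGRESS` before reaching `ACTIVE` (or `CREATE FAILED`). A generic polling sketch, with the status lookup passed in as a function so it can be exercised offline; `wait_until_active` and its parameters are hypothetical, not part of boto3:

```python
import time

def wait_until_active(get_status, timeout_s=600, poll_s=15,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'ACTIVE'.

    Returns True once ACTIVE is seen, False if timeout_s elapses first,
    and raises RuntimeError if the status contains 'FAILED'.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == 'ACTIVE':
            return True
        if 'FAILED' in status:
            raise RuntimeError('Resource entered status {}'.format(status))
        if clock() >= deadline:
            return False
        sleep(poll_s)

# Usage against the real service (needs the tracker's ARN, e.g. response['eventTrackerArn']):
#   wait_until_active(lambda: personalize.describe_event_tracker(
#       eventTrackerArn=tracker_arn)['eventTracker']['status'])

# Offline demo with a fake status sequence and no real sleeping
statuses = iter(['CREATE PENDING', 'CREATE IN_PROGRESS', 'ACTIVE'])
print(wait_until_active(lambda: next(statuses), sleep=lambda s: None))  # True
```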

In [196]:
import uuid

session_dict = {}

In [197]:
def configure_session(user_id):
    """Return this user's session ID, creating one on first use so all of the user's events share a session."""
    if user_id not in session_dict:
        session_dict[user_id] = str(uuid.uuid1())
    return session_dict[user_id]

In [198]:
def send_one_car_click(user_id, item_id, ts):
    """
    Simulates a click to send an event to Amazon Personalize's Event Tracker
    """
    session_ID = configure_session(user_id)
        
    # Configure Properties:
    event = {
        'itemId': str(item_id)
    }
    event_json = json.dumps(event)
        
    # Make Call
    personalize_events.put_events(
        trackingId = TRACKING_ID,
        userId     = str(user_id),
        sessionId  = session_ID,
        eventList  = [{
            'sentAt': ts,
            'eventType': 'EVENT_TYPE',
            'properties': event_json
            }]
    )

In [199]:
def send_car_clicks(user_id, items):
    _send_as_a_batch = True
    
    if _send_as_a_batch:
        # Pretend each click took one second: stamp the first event len(items) seconds ago
        # and each following event one second later, so the last one lands just before now.
        _base_ts = time.time() - len(items)

        _session_ID = configure_session(user_id)

        _event_list  = []
        _max_batch_size = 10   # PutEvents accepts at most 10 events per call
        _i = 0
        for _item_id in items:
            _event_properties = {
                'itemId': str(_item_id)
            }
            _event_json = json.dumps(_event_properties)
            
            _event = {'sentAt': _base_ts + _i,
                      'eventType': 'EVENT_TYPE',
                      'properties': _event_json
                     }
            _event_list.append(_event)

            _i += 1
            if _i % _max_batch_size == 0:
                # a full batch has accumulated: send it
                personalize_events.put_events(
                    trackingId = TRACKING_ID,
                    userId     = str(user_id),
                    sessionId  = _session_ID,
                    eventList  = _event_list
                )
                # reset for the next batch
                _event_list = []


        # send last batch
        if _event_list:
            personalize_events.put_events(
                trackingId = TRACKING_ID,
                userId     = str(user_id),
                sessionId  = _session_ID,
                eventList  = _event_list
            )        

    else: # send one at a time
        for item in items:
            send_one_car_click(user_id, item, time.time())
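The timestamp arithmetic and the 10-events-per-call chunking in `send_car_clicks` can be separated from the network calls, which makes them easy to check. A sketch with a hypothetical `build_event_batches` helper that mirrors the event shape used above (`sentAt`, `eventType`, and a JSON `properties` string); each returned batch could then be passed as `eventList` to `personalize_events.put_events`:

```python
import json
import time

MAX_EVENTS_PER_CALL = 10  # PutEvents accepts at most 10 events per request

def build_event_batches(item_ids, now=None, batch_size=MAX_EVENTS_PER_CALL):
    """One event per item, one second apart, with the last event one second before `now`,
    split into lists of at most batch_size events."""
    now = time.time() if now is None else now
    base_ts = now - len(item_ids)
    events = [{'sentAt': base_ts + i,
               'eventType': 'EVENT_TYPE',
               'properties': json.dumps({'itemId': str(item_id)})}
              for i, item_id in enumerate(item_ids)]
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]

batches = build_event_batches(list(range(23)), now=1000.0)
print([len(b) for b in batches])                           # [10, 10, 3]
print(batches[0][0]['sentAt'], batches[-1][-1]['sentAt'])  # 977.0 999.0
```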

In [200]:
def recommend_cars(user_id, campaign_arn):
    response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                       userId=str(user_id), 
                                                       numResults=10)
    items = response['itemList']
    for item in items:
        print_item(int(item['itemId']))
    print('')

In [201]:
sample_user = int_expanded_df.sample(1).iloc[0]['USER_ID']
sample_user_cluster = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV_CLUSTER']
sample_user_fav = int_expanded_df[int_expanded_df.USER_ID == sample_user].iloc[0]['FAV']
print('Here is a sample user for exercising real time events:\n')
print('  user: {}, cluster: {}, fav: {}'.format(sample_user, sample_user_cluster, sample_user_fav))

Here is a sample user for exercising real time events:

  user: 13875, cluster: 54, fav: NEWISH-Chevrolet-Tahoe


In [202]:
new_cluster = (sample_user_cluster + 1) % NUM_CLUSTERS   # next cluster, wrapping from the last one back to 0
new_fav = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].iloc[0]['FAV']
print('Now, we pick a new preferred set of cars that we will use when sending events:\n')
print('  new cluster: {}, new fav: {}'.format(new_cluster, new_fav))

Now, we pick a new preferred set of cars that we will use when sending events:

  new cluster: 55, new fav: OLDISH-Chevrolet-Tahoe


In [203]:
print('Before any real time events, Personalize should recommend {} cars...\n'.format(sample_user_fav))

if is_campaign_active(hrnn_arn):
    print('First using {}\n'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}\n'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_metadata_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Before any real time events, Personalize should recommend NEWISH-Chevrolet-Tahoe cars...

First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn

Id: 26104, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2015
Id: 25071, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2018
Id: 24128, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2015
Id: 27365, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2014
Id: 27688, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2015
Id: 25932, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2014
Id: 23245, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2017
Id: 22727, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2018
Id: 24688, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2015
Id: 21920, Make: Chevrolet, Model: Tahoe, Fav: NEWISH-Chevrolet-Tahoe, Year: 2017

Next using arn:

In [204]:
new_car_cluster = int_expanded_df[int_expanded_df.FAV_CLUSTER == new_cluster].sample(100)
new_car_cluster[['FAV','ITEM_ID','YEAR','PRICE']].head(3)

| | FAV | ITEM_ID | YEAR | PRICE |
|---|---|---|---|---|
| 584385 | OLDISH-Chevrolet-Tahoe | 33778 | 2013 | 25780 |
| 320122 | OLDISH-Chevrolet-Tahoe | 23028 | 2013 | 23776 |
| 308636 | OLDISH-Chevrolet-Tahoe | 26517 | 2011 | 22385 |


In [205]:
new_items_clicked = new_car_cluster['ITEM_ID'].values
new_items_clicked

array([33778, 23028, 26517, 23307, 25094, 21760, 27565, 26709, 27571,
       29478, 27887, 27777, 21448, 17218, 33579, 28211, 30818, 13254,
       30099, 16843, 31985, 29599, 23179, 20503, 26820, 28963, 26517,
       25694, 22528, 24527, 24400, 25060, 23454, 21630, 31528, 42523,
       32619, 23307, 27434, 20721, 25694, 20617, 24527, 26429, 19558,
       26429, 23179, 20998, 26709, 29169, 23307, 24150, 14772, 27105,
       31997, 30508, 30122, 26340, 20792, 21423, 32746, 23552, 23860,
       24527, 30277, 18110, 25437, 20948, 25094, 25437, 23330, 20948,
       27434, 25730, 29046, 28581, 25056, 17492, 28542, 26339, 28963,
       22906, 32665, 25338, 27624, 24561, 16707, 23860, 17723, 16353,
       25812, 28336, 27565, 30818, 23179, 24121, 27579, 19374, 30818,
       21913])

In [206]:
send_car_clicks(sample_user, new_items_clicked)

In [210]:
print('Here is the number of historical interactions for the sample user against each car cluster...')
int_expanded_df[int_expanded_df.USER_ID == sample_user]['FAV'].value_counts()

Here is the number of historical interactions for the sample user against each car cluster...


NEWISH-Chevrolet-Tahoe    70
Name: FAV, dtype: int64

In [211]:
print('Now this same user has started to like {} cars.'.format(new_fav))
print("Let's see if Personalize picks up on this real-time change in intent...\n")

if is_campaign_active(hrnn_arn):
    print('First using {}\n'.format(hrnn_arn))
    recommend_cars(sample_user, hrnn_arn)
else:
    print('HRNN campaign not active: {}'.format(hrnn_arn))

if is_campaign_active(hrnn_metadata_arn):
    print('Next using {}\n'.format(hrnn_metadata_arn))
    recommend_cars(sample_user, hrnn_metadata_arn)
else:
    print('HRNN-metadata campaign not active: {}'.format(hrnn_metadata_arn))

Now this same user has started to like OLDISH-Chevrolet-Tahoe cars.
Let's see if Personalize picks up on this real-time change in intent...

First using arn:aws:personalize:us-east-1:355151823911:campaign/car-hrnn

Id: 23196, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2013
Id: 26429, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2013
Id: 21063, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 26517, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2011
Id: 28616, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 23675, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2013
Id: 24302, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 26340, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2012
Id: 23393, Make: Chevrolet, Model: Tahoe, Fav: OLDISH-Chevrolet-Tahoe, Year: 2013
Id: 24312, Make: Chevrolet, Model: Tahoe, Fav: O