## Recommender

The ALS model and text embeddings will be used to make recommendations for a given guest history

### Imports

In [1]:
import pandas as pd
import numpy as np

import pickle
import gzip

from typing import cast
from implicit.cpu.als import AlternatingLeastSquares

### Load Datasets

In [2]:
items = pd.read_csv('../datasets/slimmed/items.csv')
items.head()

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
0,Phantasmagoria: A Puzzle of Flesh,['Windows 95'],[],[],"{'Best Sellers Rank': {'Video Games': 137612, ...",[{'thumb': 'https://m.media-amazon.com/images/...,B00069EVOG,"['Video Games', 'PC', 'Games']",4.1,18,Video Games,Sierra,
1,NBA 2K17 - Early Tip Off Edition - PlayStation 4,['The #1 rated NBA video game simulation serie...,['Following the record-breaking launch of NBA ...,[{'title': 'NBA 2K17 - Kobe: Haters vs Players...,"{'Release date': 'September 16, 2016', 'Best S...",[{'thumb': 'https://m.media-amazon.com/images/...,B00Z9TLVK0,"['Video Games', 'PlayStation 4', 'Games']",4.3,223,Video Games,2K,58.0
2,Nintendo Selects: The Legend of Zelda Ocarina ...,['Authentic Nintendo Selects: The Legend of Ze...,[],[],"{'Best Sellers Rank': {'Video Games': 51019, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07SZJZV88,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.9,22,Video Games,Amazon Renewed,37.42
3,"Spongebob Squarepants, Vol. 1",['Bubblestand: SpongeBob shows Patrick and Squ...,['Now you can watch the wild underwater antics...,[],"{'Release date': 'August 15, 2004', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B0001ZNU56,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.4,32,Video Games,Majesco,33.98
4,eXtremeRate Soft Touch Top Shell Front Housing...,['Compatibility Models: Ultra fits for Xbox On...,[],[],"{'Best Sellers Rank': {'Video Games': 48130, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07H93H878,"['Video Games', 'Xbox One', 'Accessories', 'Fa...",4.5,3061,Video Games,eXtremeRate,17.59


### Loading ALS Model 

In [3]:
# Load the compressed file
with gzip.open('../data_structures/als_model.pkl', 'rb') as f:
	als_model = cast(AlternatingLeastSquares, pickle.load(f))

### Loading Text Embeddings

In [4]:
# Load the embeddings
item_text_embeddings = np.load('../data_structures/item_text_embeddings.npy')

### Guest Vector

The ALS model and text embeddings will be used to make recommendations for a given guest history (from least recent to most)

In [5]:
guest_vector = [
	"B07KRWJCQW", "B07ZJ6RY1W", "B07JGVX9D6", 
	"B075YBBQMM", "B0BN942894", "B077GG9D5D", 
	"B00ZQB28XK", "B014R4KYMS", "B07YBXFF5C"
]

History would be most recent to least (exponential decay weights could be attached to items seen long ago)

In [7]:
guest_vector_history = guest_vector[::-1]
guest_vector_history

['B07YBXFF5C',
 'B014R4KYMS',
 'B00ZQB28XK',
 'B077GG9D5D',
 'B0BN942894',
 'B075YBBQMM',
 'B07JGVX9D6',
 'B07ZJ6RY1W',
 'B07KRWJCQW']

History may have repeated items so a unique list is created

In [9]:
guest_history_unique = [item for idx, item in enumerate(guest_vector_history) if idx == guest_vector_history.index(item)]
guest_history_unique

['B07YBXFF5C',
 'B014R4KYMS',
 'B00ZQB28XK',
 'B077GG9D5D',
 'B0BN942894',
 'B075YBBQMM',
 'B07JGVX9D6',
 'B07ZJ6RY1W',
 'B07KRWJCQW']

ALS model has the items with their numerical ids only so the string ids must be mapped

In [8]:
item_map = pd.read_csv('../datasets/mappings/item_map.csv')
item_map.head()

Unnamed: 0,parent_asin
0,B07DK1H3H5
1,B07SRWRH5D
2,B07MFMFW34
3,B0BCHWZX95
4,B00HUWA45W


The ids are mapped but the numbers are in ascending order (doesn't match the guest history necessarily)

In [9]:
guest_history_mapped = item_map[item_map['parent_asin'].isin(guest_history_unique)]
guest_history_mapped

Unnamed: 0,parent_asin
100,B075YBBQMM
187,B07ZJ6RY1W
225,B077GG9D5D
323,B014R4KYMS
573,B07YBXFF5C
601,B0BN942894
634,B07KRWJCQW
1020,B00ZQB28XK
83187,B07JGVX9D6


A longer approach using pandas Categorical ensures that they will be in the right order

In [10]:
guest_history_mapped = item_map[item_map['parent_asin'].isin(guest_history_unique)].copy()

guest_history_mapped['order'] = pd.Categorical(
	guest_history_mapped['parent_asin'],
	categories=guest_history_unique,
	ordered=True
)

guest_history_mapped = guest_history_mapped.sort_values('order').drop(columns='order').index
guest_history_mapped

Index([573, 323, 1020, 225, 601, 100, 83187, 187, 634], dtype='int64')

### Item-Based Collaborative Filtering

Given the (mapped) guest vector, similar items to each item can be found 

In [11]:
def find_similar_items(mapped_guest_vector: np.ndarray, N=10):
	personalized_items = als_model.similar_items(mapped_guest_vector, N=N+1)

	recommended_items, sim_scores = personalized_items
	similar_items = list(zip(recommended_items, sim_scores)) # do not include the item itself

	return similar_items

Given the similar items and their scores, the top N similar items by weighted scoring can be found

In [12]:
from collections import defaultdict

def find_most_similar_items(mapped_guest_history, N=10, decay_rate=0.3):
	similar_items = find_similar_items(mapped_guest_history)
	weighted_scores = defaultdict(float)

	for age, (recommended_items, scores) in enumerate(similar_items):
		# Items further into history should contribute less to the recommended items
		decay = np.exp(-decay_rate * age)

		for idx, item in enumerate(recommended_items):
			if item in mapped_guest_history:
				continue

			weighted_scores[item] += scores[idx] * decay

	# Sort by score and return the top N
	top_n = sorted(weighted_scores.items(), key=lambda x: x[1], reverse=True)[:N]
	return top_n

Given a list of numerical ids for items, the item informations are returned

In [13]:
def get_items(item_ids: list[int]):
	return items.loc[item_ids]

Given the (mapped) guest vector, the ids of the most relevant items are returned

In [14]:
def item_collaborative_filtering(mapped_guest_history: list[int], N=10, decay_rate=0.3):
	found_similar_items, scores = zip(*find_most_similar_items(mapped_guest_history, N=10, decay_rate=decay_rate))
	found_similar_items = list(found_similar_items)

	return get_items(found_similar_items), scores

Guest history consisted of these items

In [15]:
get_items(guest_history_mapped)['title']

573                                    Zoo Vet jc - PC/Mac
323      Limited Edition Replacement Full Cover Shell C...
1020     753 Keyboard Mouse Converter for PS3/PS4/PS5/X...
225      Nintendo Switch Case, Abida PU Leather Protect...
601      Game Card Storage Holder Hard Case for New Nin...
100      PDP Xbox One Mars Starter Pack - Big Buck Hunt...
83187    Nintendo NES Game Cartridge Dust Cover/sleeve ...
187                              CSDC Legacy Of The Stones
634      E-MODS GAMING Switch Charging Dock, Foldable T...
Name: title, dtype: object

The most relevant items are found

In [18]:
item_collaborative_filtering(guest_history_mapped)[0]['title']

25023     Naruto: Clash of Ninja Revolution - Nintendo Wii
2815     PS4 Wired Controller,Wired PS4 Game Controller...
1036      12 Month Xbox Music Pass - Xbox One Digital Code
2551             PowerA MOGA Hero Power - Electronic Games
4573     ENDGAME GEAR XM1 RGB Gaming Mouse, Programmabl...
40833    AULA Dragon Tooth 3 Color Backlit LED Illumina...
7536     RG353V Handheld Game Console 3.5 Inch IPS Scre...
10974    Rocksmith 2014 Edition with Real Tone Cable (PS4)
21159    KIWI design Upgraded Face Cushion Cover Pad Co...
12891    Portable Dock for Nintendo Switch - innoAura R...
Name: title, dtype: object

### Content-Based Collaborative Filtering

In [22]:
def similar_items_by_content(item_id: int, N=10):
	sim_item_id = np.dot(item_text_embeddings, item_text_embeddings[item_id]) \
		/ (np.linalg.norm(item_text_embeddings, axis=1) * np.linalg.norm(item_text_embeddings[item_id]))

	top_idx = np.argsort(-sim_item_id)[1:N+1]
	return [top_idx, sim_item_id[top_idx]]

In [23]:
def find_similar_items_by_content(mapped_guest_vector: np.ndarray, N=10):
	# e^-5 < 0.01 so decay for these items would be very high
	personalized_items = [similar_items_by_content(item_id, N=N) for item_id in mapped_guest_vector[:5]]
	return personalized_items

In [24]:
def find_most_similar_items_by_content(mapped_guest_history, N=10, decay_rate=0.3):
	similar_items = find_similar_items_by_content(mapped_guest_history)
	weighted_scores = defaultdict(float)

	for age, (recommended_items, scores) in enumerate(similar_items):
		# Items further into history should contribute less to the recommended items
		decay = np.exp(-decay_rate * age)

		for idx, item in enumerate(recommended_items):
			if item in mapped_guest_history:
				continue

			weighted_scores[item] += scores[idx] * decay

	# Sort by score and return the top N
	top_n = sorted(weighted_scores.items(), key=lambda x: x[1], reverse=True)[:N]
	return top_n

In [25]:
def content_collaborative_filtering(mapped_guest_history: list[int], N=10, decay_rate=0.3):
	found_similar_items, scores = zip(*find_most_similar_items_by_content(mapped_guest_history, N=N, decay_rate=decay_rate))
	found_similar_items = list(found_similar_items)
	
	return get_items(found_similar_items), scores

In [26]:
get_items(guest_history_mapped)['title']

573                                    Zoo Vet jc - PC/Mac
323      Limited Edition Replacement Full Cover Shell C...
1020     753 Keyboard Mouse Converter for PS3/PS4/PS5/X...
225      Nintendo Switch Case, Abida PU Leather Protect...
601      Game Card Storage Holder Hard Case for New Nin...
100      PDP Xbox One Mars Starter Pack - Big Buck Hunt...
83187    Nintendo NES Game Cartridge Dust Cover/sleeve ...
187                              CSDC Legacy Of The Stones
634      E-MODS GAMING Switch Charging Dock, Foldable T...
Name: title, dtype: object

In [31]:
content_collaborative_filtering(guest_history_mapped, decay_rate=0.1)[0]['title']

77801     Front+Back Shell Housing Case Cover Protector ...
16106     Gamepad Shell DIY Controller Housing Case Cove...
16928     Front+Back Shell Housing Case Cover Protector ...
24523     LICHIFIT Gamepad Shell DIY Controller Housing ...
118055    Front + Back Housing Shell Case Cover Replacem...
56111     Controller Front Shell for PS4 Controller - Ca...
84728                              Vet Emergency 2 - PC/Mac
74897     Replacement Full Housing Shell Cover Case for ...
38036                                         Vet Emergency
60079     New Replacement Top Upper Housing Shell Case C...
Name: title, dtype: object

### Hybrid Approach

In [86]:
als_model.similar_items(itemid=5912, items=guest_history_mapped, N=11)

(array([83187,   601,   323,   573,   225,   100,  1020,   634,   187,
          573,   573]),
 array([0.8376184 , 0.10792866, 0.10594105, 0.10467561, 0.0969936 ,
        0.09594233, 0.08967514, 0.08800711, 0.07285354, 0.        ,
        0.        ], dtype=float32))

In [87]:
def find_rating_similarity_scores(item_id: int, items: list[int], N=10):
	sim_item_id, scores = als_model.similar_items(itemid=item_id, items=items, N=N+1)
	return [sim_item_id[1:], scores[1:]]

In [36]:
def find_content_similarity_scores(item_id: int, items: list[int]):
	sim_item_id = np.dot(item_text_embeddings[items], item_text_embeddings[item_id]) \
		/ (np.linalg.norm(item_text_embeddings[items], axis=1) * np.linalg.norm(item_text_embeddings[item_id]))

	top_idx = np.argsort(-sim_item_id)
	return [items[top_idx], sim_item_id[top_idx]]

In [126]:
def dynamic_rating_weight(history_len, max_history=5):
    return min(0.8, history_len / max_history)  # max out at 0.8 weight

In [165]:
def recommender(mapped_guest_history, N=10, similar_per_item=10, item_decay_rate=0.3, content_decay_rate=0.1, rating_weight=0.5):
	# Get most similar items by rating and content
	similar_items_by_rating, rating_scores = item_collaborative_filtering(mapped_guest_history, N=similar_per_item, decay_rate=item_decay_rate)
	similar_items_by_content, content_scores = content_collaborative_filtering(mapped_guest_history, N=similar_per_item, decay_rate=content_decay_rate)

	# Create two DataFrames to keep track of similarity scores
	df_rating = similar_items_by_rating[['parent_asin', 'title']].rename(columns={'score': 'score_rating'})
	df_content = similar_items_by_content[['parent_asin', 'title']].rename(columns={'score': 'score_content'})

	df_rating['score_rating'] = rating_scores
	df_content['score_content'] = content_scores

	# Merge recommendations (outer to include all)
	merged_df = pd.merge(df_rating, df_content, on=['parent_asin', 'title'], how='outer')
	merged_df.fillna(0, inplace=True)

	# score_rating or score_content may be 0 if the item did not appear in both recommendations
	# so they must be calculated before the top N recommendations are taken
	def fill_rating_score(row):
		if row['score_rating'] == 0 or pd.isna(row['score_rating']):
			numerical_parent_asin = item_map[item_map['parent_asin'] == row['parent_asin']].index[0]
			max_rating_sim_score = np.max(find_content_similarity_scores(numerical_parent_asin, mapped_guest_history)[1])

			return max_rating_sim_score
		
		return row['score_rating']

	def fill_content_score(row):
		if row['score_content'] == 0 or pd.isna(row['score_content']):
			numerical_parent_asin = item_map[item_map['parent_asin'] == row['parent_asin']].index[0]
			max_content_sim_score = np.max(find_content_similarity_scores(numerical_parent_asin, mapped_guest_history)[1])

			return max_content_sim_score
		
		return row['score_content']

	merged_df['score_rating'] = merged_df.apply(fill_rating_score, axis=1)
	merged_df['score_content'] = merged_df.apply(fill_content_score, axis=1)

	# Normalize the scores in the DataFrame to remove bias
	merged_df['score_rating'] = (merged_df['score_rating'] - merged_df['score_rating'].min()) / (merged_df['score_rating'].max() - merged_df['score_rating'].min() + 1e-6)
	merged_df['score_content'] = (merged_df['score_content'] - merged_df['score_content'].min()) / (merged_df['score_content'].max() - merged_df['score_content'].min() + 1e-6)

	# The final score for each item is calculated via weighted average of the scores and the top N is taken afterwards
	RATING_WEIGHT = rating_weight if rating_weight is not None else dynamic_rating_weight(len(mapped_guest_history))
	CONTENT_WEIGHT = 1 - RATING_WEIGHT

	merged_df['final_score'] = (RATING_WEIGHT * merged_df['score_rating'] + CONTENT_WEIGHT * merged_df['score_content'])
	merged_df = merged_df.sort_values(by='final_score', ascending=False)

	top_n_rating = df_rating.sort_values(by='score_rating', ascending=False).iloc[:N]
	top_n_content = df_content.sort_values(by='score_content', ascending=False).iloc[:N]

	def explain_contributor(row):
		r = RATING_WEIGHT * row['score_rating']
		c = CONTENT_WEIGHT * row['score_content']
		return 'both' if abs(r - c) < 0.05 else ('rating' if r > c else 'content')

	top_n_hybrid = merged_df.iloc[:N]
	top_n_hybrid.loc[:, 'main_contributor'] = top_n_hybrid.apply(explain_contributor, axis=1)

	return top_n_rating, top_n_content, top_n_hybrid

In [166]:
top_n_rating, top_n_content, top_n_hybrid = recommender(guest_history_mapped, N=20)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_n_hybrid.loc[:, 'main_contributor'] = top_n_hybrid.apply(explain_contributor, axis=1)


In [167]:
top_n_rating

Unnamed: 0,parent_asin,title,score_rating
25023,B000S8JXAM,Naruto: Clash of Ninja Revolution - Nintendo Wii,0.387601
2815,B0925VYS82,"PS4 Wired Controller,Wired PS4 Game Controller...",0.371856
1036,B009VIKJUS,12 Month Xbox Music Pass - Xbox One Digital Code,0.365042
2551,B00GIK3YAO,PowerA MOGA Hero Power - Electronic Games,0.339077
4573,B083TZM57S,"ENDGAME GEAR XM1 RGB Gaming Mouse, Programmabl...",0.338954
40833,B00FAOKV1Y,AULA Dragon Tooth 3 Color Backlit LED Illumina...,0.329759
7536,B0BNJZDJGP,RG353V Handheld Game Console 3.5 Inch IPS Scre...,0.327544
10974,B00KJGJPU6,Rocksmith 2014 Edition with Real Tone Cable (PS4),0.317871
21159,B08NT6NMZ3,KIWI design Upgraded Face Cushion Cover Pad Co...,0.316609
12891,B07QYRF333,Portable Dock for Nintendo Switch - innoAura R...,0.313389


In [168]:
top_n_content

Unnamed: 0,parent_asin,title,score_content
77801,B07FJZL85Q,Front+Back Shell Housing Case Cover Protector ...,0.84515
16106,B088TLK92N,Gamepad Shell DIY Controller Housing Case Cove...,0.823098
16928,B07HVFRVP2,Front+Back Shell Housing Case Cover Protector ...,0.820399
24523,B08DY7MZ1X,LICHIFIT Gamepad Shell DIY Controller Housing ...,0.801288
118055,B0BG92X25C,Front + Back Housing Shell Case Cover Replacem...,0.796818
56111,B00WPKR6AA,Controller Front Shell for PS4 Controller - Ca...,0.792864
84728,B00008I8RE,Vet Emergency 2 - PC/Mac,0.7904
74897,B077QJPJQ6,Replacement Full Housing Shell Cover Case for ...,0.78818
38036,B000EA6VX6,Vet Emergency,0.787053
60079,B081V6T61D,New Replacement Top Upper Housing Shell Case C...,0.785323


In [169]:
top_n_hybrid

Unnamed: 0,parent_asin,title,score_rating,score_content,final_score,main_contributor
14,B088TLK92N,Gamepad Shell DIY Controller Housing Case Cove...,0.999997,0.95985,0.979924,both
18,B0BG92X25C,Front + Back Housing Shell Case Cover Replacem...,0.715899,0.912005,0.813952,content
9,B07FJZL85Q,Front+Back Shell Housing Case Cover Protector ...,0.569819,0.999998,0.784909,content
10,B07HVFRVP2,Front+Back Shell Housing Case Cover Protector ...,0.608591,0.954937,0.781764,content
1,B000EA6VX6,Vet Emergency,0.654552,0.894227,0.77439,content
8,B077QJPJQ6,Replacement Full Housing Shell Cover Case for ...,0.61704,0.896279,0.756659,content
15,B08DY7MZ1X,LICHIFIT Gamepad Shell DIY Controller Housing ...,0.568416,0.920143,0.744279,content
12,B081V6T61D,New Replacement Top Upper Housing Shell Case C...,0.506672,0.891076,0.698874,content
0,B00008I8RE,Vet Emergency 2 - PC/Mac,0.382812,0.90032,0.641566,content
7,B00WPKR6AA,Controller Front Shell for PS4 Controller - Ca...,0.158524,0.904806,0.531665,content


In [170]:
get_items(guest_history_mapped)['title']

573                                    Zoo Vet jc - PC/Mac
323      Limited Edition Replacement Full Cover Shell C...
1020     753 Keyboard Mouse Converter for PS3/PS4/PS5/X...
225      Nintendo Switch Case, Abida PU Leather Protect...
601      Game Card Storage Holder Hard Case for New Nin...
100      PDP Xbox One Mars Starter Pack - Big Buck Hunt...
83187    Nintendo NES Game Cartridge Dust Cover/sleeve ...
187                              CSDC Legacy Of The Stones
634      E-MODS GAMING Switch Charging Dock, Foldable T...
Name: title, dtype: object

In [171]:
top_n_hybrid['title']

14    Gamepad Shell DIY Controller Housing Case Cove...
18    Front + Back Housing Shell Case Cover Replacem...
9     Front+Back Shell Housing Case Cover Protector ...
10    Front+Back Shell Housing Case Cover Protector ...
1                                         Vet Emergency
8     Replacement Full Housing Shell Cover Case for ...
15    LICHIFIT Gamepad Shell DIY Controller Housing ...
12    New Replacement Top Upper Housing Shell Case C...
0                              Vet Emergency 2 - PC/Mac
7     Controller Front Shell for PS4 Controller - Ca...
5             PowerA MOGA Hero Power - Electronic Games
16    KIWI design Upgraded Face Cushion Cover Pad Co...
3      12 Month Xbox Music Pass - Xbox One Digital Code
17    PS4 Wired Controller,Wired PS4 Game Controller...
6     Rocksmith 2014 Edition with Real Tone Cable (PS4)
19    RG353V Handheld Game Console 3.5 Inch IPS Scre...
2      Naruto: Clash of Ninja Revolution - Nintendo Wii
11    Portable Dock for Nintendo Switch - innoAu