## Non-Personalized Recommendations

### Load Datasets

In [1]:
import pandas as pd

reviews = pd.read_csv('../datasets/slimmed/reviews.csv')
items = pd.read_csv('../datasets/slimmed/items.csv')

In [2]:
reviews.head()

Unnamed: 0,timestamp,asin,title,user_id,images,text,parent_asin,rating
0,2020-12-17 06:33:24.795,B07DJWBYKP,It’s pretty sexual. Not my fav,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,[],I’m playing on ps5 and it’s interesting. It’s...,B07DK1H3H5,4
1,2020-04-16 15:31:54.941,B00ZS80PC2,Good. A bit slow,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,[],Nostalgic fun. A bit slow. I hope they don’t...,B07SRWRH5D,5
2,2017-03-30 12:37:11.000,B01FEHJYUU,... an order for my kids & they have really en...,AGXVBIUFLFGMVLATYXHJYL4A5Q7Q,[],This was an order for my kids & they have real...,B07MFMFW34,5
3,2019-12-29 16:40:34.017,B07GXJHRVK,Great alt to pro controller,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],"These work great, They use batteries which is ...",B0BCHWZX95,5
4,2015-03-29 01:18:52.000,B00HUWA45W,solid product,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],I would recommend to anyone looking to add jus...,B00HUWA45W,5


In [3]:
items.head()

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
0,Phantasmagoria: A Puzzle of Flesh,['Windows 95'],[],[],"{'Best Sellers Rank': {'Video Games': 137612, ...",[{'thumb': 'https://m.media-amazon.com/images/...,B00069EVOG,"['Video Games', 'PC', 'Games']",4.1,18,Video Games,Sierra,
1,NBA 2K17 - Early Tip Off Edition - PlayStation 4,['The #1 rated NBA video game simulation serie...,['Following the record-breaking launch of NBA ...,[{'title': 'NBA 2K17 - Kobe: Haters vs Players...,"{'Release date': 'September 16, 2016', 'Best S...",[{'thumb': 'https://m.media-amazon.com/images/...,B00Z9TLVK0,"['Video Games', 'PlayStation 4', 'Games']",4.3,223,Video Games,2K,58.0
2,Nintendo Selects: The Legend of Zelda Ocarina ...,['Authentic Nintendo Selects: The Legend of Ze...,[],[],"{'Best Sellers Rank': {'Video Games': 51019, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07SZJZV88,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.9,22,Video Games,Amazon Renewed,37.42
3,"Spongebob Squarepants, Vol. 1",['Bubblestand: SpongeBob shows Patrick and Squ...,['Now you can watch the wild underwater antics...,[],"{'Release date': 'August 15, 2004', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B0001ZNU56,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.4,32,Video Games,Majesco,33.98
4,eXtremeRate Soft Touch Top Shell Front Housing...,['Compatibility Models: Ultra fits for Xbox On...,[],[],"{'Best Sellers Rank': {'Video Games': 48130, '...",[{'thumb': 'https://m.media-amazon.com/images/...,B07H93H878,"['Video Games', 'Xbox One', 'Accessories', 'Fa...",4.5,3061,Video Games,eXtremeRate,17.59


### Ratings/Views Table

A table of item IDs (`parent_asin`) with their average rating and number of ratings is built

In [None]:
ratings_views = items[['parent_asin', 'average_rating', 'rating_number']].set_index('parent_asin')
ratings_views.head()

parent_asin,average_rating,rating_number
B00069EVOG,4.1,18
B00Z9TLVK0,4.3,223
B07SZJZV88,4.9,22
B0001ZNU56,4.4,32
B07H93H878,4.5,3061


### Weighted Scoring

The maximum average rating and the maximum number of ratings are collected; both are used as normalization constants in the scoring step

In [5]:
max_average_rating = ratings_views['average_rating'].max()
max_number_reviews = ratings_views['rating_number'].max()

max_average_rating, max_number_reviews

(np.float64(5.0), np.int64(278574))

Weights for the average rating and number of reviews are defined

In [6]:
W_avg_rating = 0.5
W_num_reviews = 1 - W_avg_rating

The weighted scores are calculated and displayed in descending order

In [None]:
rating_views_scored = ratings_views.copy()

rating_views_scored['normalized_average_rating'] = rating_views_scored['average_rating'] / max_average_rating 
rating_views_scored['normalized_rating_number'] = rating_views_scored['rating_number'] / max_number_reviews

rating_views_scored['weighted_score'] = W_avg_rating * rating_views_scored['normalized_average_rating'] + W_num_reviews * rating_views_scored['normalized_rating_number']
rating_views_scored = rating_views_scored.sort_values(by='weighted_score', ascending=False)

rating_views_scored[['average_rating', 'rating_number', 'weighted_score']]

parent_asin,average_rating,rating_number,weighted_score
B0C3KYVDWT,4.9,278574,0.990000
B0BL65X86R,4.7,261278,0.938956
B07KRWJCQW,4.6,171822,0.768396
B00PB8N364,4.7,129962,0.703263
B07V8YSBFG,4.5,139255,0.699943
...,...,...,...
B01JOR7110,1.0,1,0.100002
B06W59951K,1.0,1,0.100002
B017M9IPDK,1.0,1,0.100002
B007BBZZZG,1.0,1,0.100002
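
As a quick check of the formula, the sketch below recomputes the score of one row from the table above using only values shown in this notebook. The helper name `weighted_score` and the upper-case constants are assumptions introduced for illustration; they are not part of the notebook.

```python
# Hypothetical helper (not part of the notebook) mirroring the scoring cell.
MAX_AVERAGE_RATING = 5.0      # max_average_rating from the notebook output
MAX_NUMBER_REVIEWS = 278574   # max_number_reviews from the notebook output
W_AVG_RATING = 0.5
W_NUM_REVIEWS = 1 - W_AVG_RATING

def weighted_score(average_rating: float, rating_number: int) -> float:
    """Blend the normalized rating and the normalized rating count."""
    normalized_rating = average_rating / MAX_AVERAGE_RATING
    normalized_count = rating_number / MAX_NUMBER_REVIEWS
    return W_AVG_RATING * normalized_rating + W_NUM_REVIEWS * normalized_count

# Row B0BL65X86R from the table above: rating 4.7 with 261278 ratings
score = weighted_score(4.7, 261278)
print(round(score, 6))
```

With equal weights, the rating count tends to drive the ranking: normalized ratings in this dataset fall between 0.2 and 1, while normalized counts span several orders of magnitude.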


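Display the 10 highest-scoring items. Because `isin` only filters `items`, the rows appear in their original `items` order rather than ranked by score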

In [None]:
items[items['parent_asin'].isin(rating_views_scored.index[:10])]

Unnamed: 0,title,features,description,videos,details,images,parent_asin,categories,average_rating,rating_number,main_category,store,price
1606,$25 PlayStation Store Gift Card [Digital Code],['Must be 18+yrs and have an account for PlayS...,['Redeem against anything on PlayStation Store...,[{'title': 'Know how to redeem before purchase...,"{'Release date': 'January 1, 2008', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B0BL65X86R,"['Video Games', 'Online Game Services', 'PlayS...",4.7,261278,Video Games,PlayStation,25.0
2107,$40 Xbox Gift Card [Digital Code],['XBOX GIFT CARD: Buy full digital game downlo...,['Buy an Xbox Gift Card for yourself or a frie...,[{'title': 'Before You Buy an Xbox Gift Card K...,"{'Release date': 'September 6, 2013', 'Best Se...",[{'thumb': 'https://m.media-amazon.com/images/...,B07KRWJCQW,"['Video Games', 'Online Game Services', 'Xbox ...",4.6,171822,Video Games,Xbox,40.0
4014,"Roblox Digital Gift Code for 1,200 Robux [Rede...",['Get a virtual item when you redeem a Robux d...,[],[{'title': 'Roblox Gift Cards!!! Super easy an...,"{'Release date': 'September 6, 2022', 'Best Se...",[{'thumb': 'https://m.media-amazon.com/images/...,B07V8YSBFG,"['Video Games', 'Digital for the Holidays', 'D...",4.5,139255,Video Games,Roblox,15.0
18299,$45 Nintendo eShop Gift Card [Digital Code],['Give the gift of fun with a Nintendo eShop g...,[],[{'title': 'How To Redeem A Nintendo Gift Card...,"{'Release date': 'January 1, 1970', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B07ZJ6RY1W,"['Video Games', 'Online Game Services', 'Ninte...",4.7,103760,Video Games,Nintendo,45.0
23525,"SanDisk 128GB microSDXC-Card, Licensed for Nin...",['Incredible speeds in a microSD card official...,"['With incredible speed, the officially licens...","[{'title': 'DO NOT BUY', 'url': 'https://www.a...","{'Brand': 'SanDisk', 'Series': 'Nintendo®-Lice...",[{'thumb': 'https://m.media-amazon.com/images/...,B0C3KYVDWT,"['Video Games', 'Nintendo Switch', 'Accessories']",4.9,278574,Computers,SanDisk,14.99
39249,To Kill A Mockingbird,[],['To Kill a Mockingbird'],[],"{'Package Dimensions': '6 x 4 x 2 inches', 'Da...",[{'thumb': 'https://m.media-amazon.com/images/...,B00PB8N364,"['Video Games', 'Legacy Systems', 'Nintendo Sy...",4.7,129962,Digital Music,Harper Lee,
45030,DualShock 4 Wireless Controller for PlayStatio...,"['The feel, shape, and sensitivity of the dual...","[""The DualShock 4 Wireless Controller features...",[{'title': 'PS4 Dualshock 4 Controller PROS an...,"{'Release date': 'October 5, 2016', 'Best Sell...",[{'thumb': 'https://m.media-amazon.com/images/...,B077GG9D5D,"['Video Games', 'PlayStation 4', 'Accessories'...",4.7,124073,Video Games,PlayStation,57.0
61908,Dr. Seuss' The Grinch / Le Grincheux,[],['3'],[],"{'Genre': 'Kids & Family, Animation', 'Format'...",[{'thumb': 'https://m.media-amazon.com/images/...,B07KCS4WNQ,"['Video Games', 'PlayStation 4', 'Games']",4.8,106015,Movies & TV,"Benedict Cumberbatch (Actor), Cameron See...",14.98
63372,Born a Crime: Stories from a South African Chi...,"['#1', 'NEW YORK TIMES', 'BESTSELLER • More th...","['Review', '“A soul-nourishing pleasure . . . ...",[{'title': 'Born a Crime: Stories from a South...,{'Publisher': 'One World; Later Printing editi...,[{'large': 'https://m.media-amazon.com/images/...,0399588175,"['Video Games', 'PC', 'Games']",4.7,102896,Books,Trevor Noah (Author),14.63
63502,amFilm Tempered Glass Screen Protector for Nin...,['Specifically designed for the 6.2-inch Ninte...,[],"[{'title': ""I've Been Using This Since 2017! ""...","{'Product Dimensions': '7.09""L x 3.94""W', 'Ite...",[{'thumb': 'https://m.media-amazon.com/images/...,B01N3ASPNV,"['Video Games', 'Nintendo Switch', 'Accessorie...",4.8,110368,All Electronics,amFilm,8.91


### Average User Rating

In [9]:
# Group by user_id and compute both mean rating and count
grouped = reviews.groupby('user_id')['rating'].agg(
    average_rating='mean',
    num_ratings='count'
).reset_index()

# Select the columns in a fixed order
average_user_ratings = grouped[['user_id', 'average_rating', 'num_ratings']]
average_user_ratings

Unnamed: 0,user_id,average_rating,num_ratings
0,AE2225K3KY4D3KKSN6I2AHBVR4QQ,2.00,1
1,AE2225U265E7XPS3LRUYSJFT7OLQ,5.00,1
2,AE2227QSJSF6HV3XTTIGQAPA6WHQ,4.50,2
3,AE222GHCPAOQIDZDVHIMMGZ6WZSA,5.00,1
4,AE222HFZDH6BPTYFOUWGGU63YSIQ,5.00,7
...,...,...,...
2282088,AHZZZX5LNNKLAZCTZNB6F445ZIIQ,4.00,1
2282089,AHZZZY2XVWEUJUTYPGGL4WXH6CSA,2.50,4
2282090,AHZZZYCUOTRYW4ZQFIFAEZGBOY4A,4.25,4
2282091,AHZZZYE2256FFHPFB54DUDOQL3IA,5.00,1


In [10]:
average_user_ratings = average_user_ratings.reset_index(drop=True)
average_user_ratings

Unnamed: 0,user_id,average_rating,num_ratings
0,AE2225K3KY4D3KKSN6I2AHBVR4QQ,2.00,1
1,AE2225U265E7XPS3LRUYSJFT7OLQ,5.00,1
2,AE2227QSJSF6HV3XTTIGQAPA6WHQ,4.50,2
3,AE222GHCPAOQIDZDVHIMMGZ6WZSA,5.00,1
4,AE222HFZDH6BPTYFOUWGGU63YSIQ,5.00,7
...,...,...,...
2282088,AHZZZX5LNNKLAZCTZNB6F445ZIIQ,4.00,1
2282089,AHZZZY2XVWEUJUTYPGGL4WXH6CSA,2.50,4
2282090,AHZZZYCUOTRYW4ZQFIFAEZGBOY4A,4.25,4
2282091,AHZZZYE2256FFHPFB54DUDOQL3IA,5.00,1


### Saving Dataset

The weighted scores are saved to a PostgreSQL table for use by the backend

In [11]:
rating_views_scored

parent_asin,average_rating,rating_number,normalized_average_rating,normalized_rating_number,weighted_score
B0C3KYVDWT,4.9,278574,0.98,1.000000,0.990000
B0BL65X86R,4.7,261278,0.94,0.937912,0.938956
B07KRWJCQW,4.6,171822,0.92,0.616791,0.768396
B00PB8N364,4.7,129962,0.94,0.466526,0.703263
B07V8YSBFG,4.5,139255,0.90,0.499885,0.699943
...,...,...,...,...,...
B01JOR7110,1.0,1,0.20,0.000004,0.100002
B06W59951K,1.0,1,0.20,0.000004,0.100002
B017M9IPDK,1.0,1,0.20,0.000004,0.100002
B007BBZZZG,1.0,1,0.20,0.000004,0.100002


In [12]:
from sqlalchemy import create_engine

# Extract the item id and weighted score of all items
weighted_scores = pd.DataFrame(data={'parent_asin': rating_views_scored.index.values, 'weighted_score': rating_views_scored['weighted_score'].values})

# Create the SQLAlchemy engine and save data
engine = create_engine('postgresql://postgres:root@localhost:5432/AppForge')
weighted_scores.to_sql('weighted_scores', engine, if_exists='replace', index=False)

820

In [13]:
from sqlalchemy import create_engine

# Create the SQLAlchemy engine and save data
engine = create_engine('postgresql://postgres:root@localhost:5432/AppForge')
average_user_ratings.to_sql('average_user_ratings', engine, if_exists='replace', index=False)

93

### Associations/Patterns

Import user-item sparse matrix from `user_item_matrix.ipynb` (all variables and data will be stored in the `uim` dictionary)

In [14]:
import pandas as pd
import nbformat

# Load the notebook
with open('user_item_matrix.ipynb', 'r', encoding='utf-8') as f:
	nb = nbformat.read(f, as_version=4)

# Execute all code cells and store data in the uim dict
uim = {}
for cell in nb.cells:
	if cell.cell_type == 'code':
		exec(cell.source, uim)

Convert the binary user-item sparse matrix to a sparse dataframe

In [15]:
df_sparse_binary = pd.DataFrame.sparse.from_spmatrix(uim['sparse_matrix_csr_binary'], columns=uim['item_map'].keys())
df_sparse_binary.head()

Unnamed: 0,B07DK1H3H5,B07SRWRH5D,B07MFMFW34,B0BCHWZX95,B00HUWA45W,B073SC6V1D,B004RMK57U,B0BYVN9ZK2,B08L6782X9,B017V6YVDC,...,B00CNWBOZS,B000GFSXUI,B00004NHFG,B001DC64UY,B009VURV2U,B00DSO5JN8,B00458GXJA,B00008KTN4,B0030EFVCK,B007U0KJJU
0,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,1,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Because the dataframe is so large, a sparse representation is essential

In [16]:
print('{:,}'.format(df_sparse_binary.size), 'elements')

278,004,569,260 elements
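
To put that element count in perspective, the following back-of-the-envelope sketch estimates the memory a dense version would need. It assumes one byte per element (for example `int8`), which is an assumption for illustration; the shape is the one this notebook reports for `df_sparse_binary`.

```python
# Shape of df_sparse_binary as reported in this notebook
num_users, num_items = 2_282_093, 121_820

n_elements = num_users * num_items

# Assumption: a dense matrix storing 1 byte per element (e.g. int8)
dense_gib = n_elements / 2**30

print(f'{n_elements:,} elements, roughly {dense_gib:.0f} GiB if stored densely')
```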


Get the number of reviews for all items

In [17]:
item_num_reviews = items[['parent_asin', 'rating_number']].set_index('parent_asin')
item_num_reviews = pd.Series(index=item_num_reviews.index, data=item_num_reviews['rating_number']).sort_values(ascending=False)

item_num_reviews

parent_asin
B0C3KYVDWT    278574
B0BL65X86R    261278
B07KRWJCQW    171822
B07V8YSBFG    139255
B00PB8N364    129962
               ...  
B00MOSEV2E         1
B000094VL5         1
B00MPZELRQ         1
B098P656LY         1
B07V5NS93Y         1
Name: rating_number, Length: 121820, dtype: int64

Number of users & items in dataset

In [18]:
num_users, num_items = df_sparse_binary.shape
num_users, num_items

(2282093, 121820)

#### Finding Support of Items

Compute the support of each item: its number of ratings divided by the number of users

In [19]:
supports = item_num_reviews / num_users
supports

parent_asin
B0C3KYVDWT    1.220695e-01
B0BL65X86R    1.144905e-01
B07KRWJCQW    7.529141e-02
B07V8YSBFG    6.102074e-02
B00PB8N364    5.694860e-02
                  ...     
B00MOSEV2E    4.381942e-07
B000094VL5    4.381942e-07
B00MPZELRQ    4.381942e-07
B098P656LY    4.381942e-07
B07V5NS93Y    4.381942e-07
Name: rating_number, Length: 121820, dtype: float64

Define a minimum support threshold

In [20]:
MINIMUM_SUPPORT_THRESHOLD = 1e-3

Get the items that meet the threshold

In [21]:
item_passes = (supports >= MINIMUM_SUPPORT_THRESHOLD)
satisfying_items = item_passes[item_passes].index

satisfying_items.size

2271
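
The threshold is easier to interpret as a rating count. The sketch below converts it using the user count reported above; the variable names `min_ratings`, `support_of_top_item`, and `support_of_single_rating` are introduced here for illustration only.

```python
import math

num_users = 2_282_093              # from the notebook output
MINIMUM_SUPPORT_THRESHOLD = 1e-3

# Smallest integer rating count whose support reaches the threshold
min_ratings = math.ceil(MINIMUM_SUPPORT_THRESHOLD * num_users)

# Sanity checks against the supports printed above
support_of_top_item = 278_574 / num_users     # B0C3KYVDWT
support_of_single_rating = 1 / num_users      # items with one rating

print(min_ratings, support_of_top_item, support_of_single_rating)
```

In other words, an item needs a few thousand ratings to count as frequent, which is why only 2271 of the 121820 items remain.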

#### Create List of Frequent Items in Descending Order

A Series of the frequent items (those that meet the minimum support) is created; it keeps the descending support order of `supports`

In [22]:
frequent_items = supports[satisfying_items]
frequent_items

parent_asin
B0C3KYVDWT    0.122070
B0BL65X86R    0.114491
B07KRWJCQW    0.075291
B07V8YSBFG    0.061021
B00PB8N364    0.056949
                ...   
B019TQJHE0    0.001002
B00GO4PTR0    0.001002
B09T5VN7D1    0.001001
B08MWKRB9Z    0.001001
B01GY35UK6    0.001001
Name: rating_number, Length: 2271, dtype: float64

A subset of the sparse matrix with the frequent items is obtained

In [23]:
subset_df_sparse_binary = df_sparse_binary[frequent_items.index]
subset_df_sparse_binary

Unnamed: 0,B0C3KYVDWT,B0BL65X86R,B07KRWJCQW,B07V8YSBFG,B00PB8N364,B077GG9D5D,B01N3ASPNV,B07KCS4WNQ,B07ZJ6RY1W,0399588175,...,B0C94D6KST,B08MBKGKF4,B085217GYP,B08Z6ZQFRR,B09LTYXWYD,B019TQJHE0,B00GO4PTR0,B09T5VN7D1,B08MWKRB9Z,B01GY35UK6
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2282088,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2282089,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2282090,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2282091,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### FP-Growth Tree

In [24]:
class FPNode:
	def __init__(self, value):
		self.value = value
		self.count = 1
		self.children = {}

class FPGrowthTree:
	def __init__(self):
		self.root = FPNode(None)

	def insert(self, indices):
		current_node = self.root

		for idx in indices:			
			if idx in current_node.children:
				current_node = current_node.children[idx]
				current_node.count += 1
			else:
				current_node.children[idx] = FPNode(idx)
				current_node = current_node.children[idx]
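
A small usage sketch may help show how shared prefixes are compressed into a single branch. The transactions are made up, and the classes are repeated only so the snippet runs on its own. It assumes every transaction lists its item indices in the same global order, which is what lets common prefixes line up.

```python
class FPNode:
    def __init__(self, value):
        self.value = value
        self.count = 1          # a node is created by its first transaction
        self.children = {}

class FPGrowthTree:
    def __init__(self):
        self.root = FPNode(None)

    def insert(self, indices):
        current_node = self.root
        for idx in indices:
            if idx in current_node.children:
                # Existing branch: follow it and bump its counter
                current_node = current_node.children[idx]
                current_node.count += 1
            else:
                # New branch: create the node (count starts at 1)
                current_node.children[idx] = FPNode(idx)
                current_node = current_node.children[idx]

# Toy transactions (made up), item indices in ascending order
toy_tree = FPGrowthTree()
for transaction in ([0, 1, 2], [0, 1], [0, 3]):
    toy_tree.insert(transaction)

node_0 = toy_tree.root.children[0]
node_1 = node_0.children[1]
print(node_0.count, node_1.count, sorted(node_0.children))
```

All three transactions share item 0, so they share one node whose count is 3; only two of them continue through item 1.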

The FP Growth Tree is constructed

In [25]:
item_map = {i: j for j, i in enumerate(subset_df_sparse_binary.columns)}
reverse_item_map = {v:k for k, v in item_map.items()}

item_idx = subset_df_sparse_binary.columns.map(item_map)

In [26]:
coo_matrix = subset_df_sparse_binary.sparse.to_coo()
csr_matrix = coo_matrix.tocsr()

fp_tree = FPGrowthTree()

for i in range(csr_matrix.shape[0]):
	start, end = csr_matrix.indptr[i], csr_matrix.indptr[i + 1]  # Get row slice
	
	if start < end:
		indices = csr_matrix.indices[start:end]  # Extract non-zero indices (items)
		fp_tree.insert(indices)  # Insert into FP-Tree

The FP Growth Tree is then saved

In [27]:
import pickle
import gzip

# Save to a pickle file
with gzip.open('../data_structures/fp_growth_tree.pkl', 'wb', compresslevel=5) as f:
	pickle.dump(fp_tree, f)

Load FP Growth Tree

In [28]:
import pickle
import gzip

# Load the compressed file
with gzip.open('../data_structures/fp_growth_tree.pkl', 'rb') as f:
	fp_tree = pickle.load(f)

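Conditional pattern bases are then constructed. This traversal records a prefix path only when it reaches a leaf node, so each entry maps a leaf item to the paths that lead to it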

In [29]:
conditional_pattern_bases = {}

def explore_fp(current_node, current_path):
	if current_node is None:
		return
	
	num_children = len(current_node.children)
	if num_children == 0:
		current_node_value = current_node.value

		if current_node_value not in conditional_pattern_bases:
			conditional_pattern_bases[current_node_value] = []

		current_path.pop() # remove current_node_value from path
		conditional_pattern_bases[current_node_value].append(current_path)
	else:
		for child_idx, child_node in current_node.children.items():
			new_path = list(current_path)
			new_path.append(child_idx)

			explore_fp(child_node, new_path)

explore_fp(fp_tree.root, [])

In [30]:
conditional_pattern_bases

{np.int32(1642): [[np.int32(436), np.int32(1270)],
  [np.int32(436), np.int32(676)],
  [np.int32(436)],
  [np.int32(91)],
  [np.int32(87), np.int32(631), np.int32(948), np.int32(1487)],
  [np.int32(1055)],
  [np.int32(5), np.int32(29)],
  [np.int32(5), np.int32(29), np.int32(1178)],
  [np.int32(5),
   np.int32(37),
   np.int32(252),
   np.int32(282),
   np.int32(521),
   np.int32(1159)],
  [np.int32(5),
   np.int32(57),
   np.int32(282),
   np.int32(687),
   np.int32(881),
   np.int32(1000),
   np.int32(1318)],
  [np.int32(5), np.int32(67)],
  [np.int32(5), np.int32(32), np.int32(519), np.int32(651)],
  [np.int32(5), np.int32(689)],
  [np.int32(5), np.int32(973)],
  [np.int32(500)],
  [np.int32(881), np.int32(1440)],
  [np.int32(881), np.int32(1487)],
  [np.int32(881), np.int32(1369)],
  [np.int32(1487)],
  [np.int32(434), np.int32(436), np.int32(881), np.int32(983)],
  [np.int32(434), np.int32(465)],
  [np.int32(25), np.int32(29), np.int32(595)],
  [np.int32(25), np.int32(184)],
  [np

Conditional frequent pattern bases are then constructed by keeping, for each item, only the items common to all of its prefix paths

In [31]:
conditional_frequent_pattern_bases = {}

for item_idx, item_paths in conditional_pattern_bases.items():
	# The intersection of all the item paths is taken
	common_items = set(item_paths[0])
	
	for path in item_paths[1:]:
		common_items = common_items.intersection(path)
	
	len_common_items = len(common_items)
	if len_common_items > 0:
		conditional_frequent_pattern = {}

		for item in common_items:
			conditional_frequent_pattern[item] = len_common_items

		conditional_frequent_pattern_bases[item_idx] = conditional_frequent_pattern
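
The intersection step can be illustrated on a toy input. The paths below are made up and the `toy_` names are hypothetical; the logic follows the cell above, including its choice to store the size of the common set as the value for each common item.

```python
# Toy prefix paths (made up), keyed by item index
toy_pattern_bases = {
    15: [[0, 6], [0, 6, 9]],
    30: [[8], [8, 12]],
    40: [[1, 2], [3]],      # no item is shared by every path
}

toy_frequent_bases = {}
for item, paths in toy_pattern_bases.items():
    # Items present in every prefix path of this item
    common = set(paths[0]).intersection(*paths[1:])
    if common:
        # As in the cell above, the value is the size of the common set
        toy_frequent_bases[item] = {i: len(common) for i in common}

print(toy_frequent_bases)
```

Item 40 is dropped because its paths have nothing in common, which mirrors how items without a shared prefix produce no entry in `conditional_frequent_pattern_bases`.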

In [32]:
conditional_frequent_pattern_bases

{np.int32(849): {np.int32(436): 1},
 np.int32(1352): {np.int32(36): 1},
 np.int32(471): {np.int32(43): 2, np.int32(198): 2},
 np.int32(15): {np.int32(0): 2, np.int32(6): 2},
 np.int32(1898): {np.int32(1897): 2, np.int32(148): 2},
 np.int32(1800): {np.int32(437): 1},
 np.int32(1660): {np.int32(657): 1},
 np.int32(30): {np.int32(8): 1},
 np.int32(848): {np.int32(8): 1},
 np.int32(1956): {np.int32(5): 7,
  np.int32(1327): 7,
  np.int32(1935): 7,
  np.int32(657): 7,
  np.int32(850): 7,
  np.int32(1947): 7,
  np.int32(255): 7},
 np.int32(539): {np.int32(5): 1},
 np.int32(2245): {np.int32(1425): 2, np.int32(5): 2},
 np.int32(1136): {np.int32(677): 2, np.int32(31): 2},
 np.int32(2095): {np.int32(1030): 2, np.int32(647): 2},
 np.int32(186): {np.int32(25): 1},
 np.int32(883): {np.int32(350): 1},
 np.int32(863): {np.int32(19): 1},
 np.int32(204): {np.int32(19): 1},
 np.int32(497): {np.int32(66): 2, np.int32(363): 2},
 np.int32(1531): {np.int32(66): 2, np.int32(1522): 2},
 np.int32(98): {np.int32

Frequent patterns are then constructed

In [33]:
from itertools import combinations

frequent_patterns = {}

for item_idx, cfp in conditional_frequent_pattern_bases.items():
	keys = cfp.keys()
	frequent_patterns[item_idx] = []

	for i in range(len(keys)):
		for comb in combinations(cfp.keys(), i + 1):
			frequent_patterns[item_idx].append((comb, set(cfp.values()).pop()))
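
The enumeration above produces every non-empty subset of a conditional frequent pattern base. A minimal sketch on a toy base (values chosen for illustration, names hypothetical) shows the shape of the result and how quickly it grows.

```python
from itertools import combinations

cfp = {43: 2, 198: 2}              # toy conditional frequent pattern base
value = set(cfp.values()).pop()    # every entry carries the same value

# All non-empty subsets of the keys, smallest subsets first
patterns = [
    (comb, value)
    for size in range(1, len(cfp) + 1)
    for comb in combinations(cfp, size)
]
print(patterns)

# A base with k items yields 2**k - 1 patterns
n_patterns_for_7_items = 2**7 - 1
print(n_patterns_for_7_items)
```

The growth is exponential, so a base with seven items already expands to 127 patterns.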

In [34]:
frequent_patterns

{np.int32(849): [((np.int32(436),), 1)],
 np.int32(1352): [((np.int32(36),), 1)],
 np.int32(471): [((np.int32(43),), 2),
  ((np.int32(198),), 2),
  ((np.int32(43), np.int32(198)), 2)],
 np.int32(15): [((np.int32(0),), 2),
  ((np.int32(6),), 2),
  ((np.int32(0), np.int32(6)), 2)],
 np.int32(1898): [((np.int32(1897),), 2),
  ((np.int32(148),), 2),
  ((np.int32(1897), np.int32(148)), 2)],
 np.int32(1800): [((np.int32(437),), 1)],
 np.int32(1660): [((np.int32(657),), 1)],
 np.int32(30): [((np.int32(8),), 1)],
 np.int32(848): [((np.int32(8),), 1)],
 np.int32(1956): [((np.int32(5),), 7),
  ((np.int32(1327),), 7),
  ((np.int32(1935),), 7),
  ((np.int32(657),), 7),
  ((np.int32(850),), 7),
  ((np.int32(1947),), 7),
  ((np.int32(255),), 7),
  ((np.int32(5), np.int32(1327)), 7),
  ((np.int32(5), np.int32(1935)), 7),
  ((np.int32(5), np.int32(657)), 7),
  ((np.int32(5), np.int32(850)), 7),
  ((np.int32(5), np.int32(1947)), 7),
  ((np.int32(5), np.int32(255)), 7),
  ((np.int32(1327), np.int32(1935

The association rules can now be created. A minimum confidence is set first

In [35]:
MIN_ASSOCIATION_CONFIDENCE = 1e-4

In [36]:
def count_antecedent_support(rule, consequent):
	# Get the set of users for each antecedent item
	antecedent_users = [set(csr_matrix[:, item].nonzero()[0]) for item in rule]

	# Find users who have interacted with ALL antecedent items (intersection of sets)
	common_users = set.intersection(*antecedent_users)

	# Find users who interacted with the consequent
	consequent_users = set(csr_matrix[:, consequent].nonzero()[0])

	# Find users who have both the antecedent and the consequent
	final_users = common_users.intersection(consequent_users)

	# Joint support: fraction of all users who have every antecedent item and the consequent
	antecedent_support = len(final_users) / num_users

	return antecedent_support

In [37]:
association_rules = {}

for consequent, antecedents in frequent_patterns.items():
	for antecedent in antecedents:
		rule, _ = antecedent

		antecedent_support = count_antecedent_support(rule, consequent)
		consequent_support = supports[reverse_item_map[consequent]]
		confidence = antecedent_support / consequent_support

		if confidence >= MIN_ASSOCIATION_CONFIDENCE:
			association_rules[(rule, consequent)] = {
				'antecedent_support': antecedent_support,
				'consequent_support': consequent_support,
				'confidence': confidence
			}

In [38]:
association_rules

{((np.int32(436),),
  np.int32(849)): {'antecedent_support': 4.3819423660648363e-07, 'consequent_support': np.float64(0.002349159302447359), 'confidence': np.float64(0.00018653236336504383)},
 ((np.int32(36),),
  np.int32(1352)): {'antecedent_support': 4.3819423660648363e-07, 'consequent_support': np.float64(0.0015687353670512114), 'confidence': np.float64(0.00027932960893854746)},
 ((np.int32(43),),
  np.int32(471)): {'antecedent_support': 4.3819423660648363e-07, 'consequent_support': np.float64(0.0037461225287488283), 'confidence': np.float64(0.00011697274535033338)},
 ((np.int32(198),),
  np.int32(471)): {'antecedent_support': 4.3819423660648363e-07, 'consequent_support': np.float64(0.0037461225287488283), 'confidence': np.float64(0.00011697274535033338)},
 ((np.int32(43), np.int32(198)),
  np.int32(471)): {'antecedent_support': 4.3819423660648363e-07, 'consequent_support': np.float64(0.0037461225287488283), 'confidence': np.float64(0.00011697274535033338)},
 ((np.int32(0),), np.int

Finally, the lift of each association rule is calculated, and only rules at or above a minimum lift are kept

In [39]:
MIN_LIFT_THRESHOLD = 2

In [40]:
import numpy as np

lifts = {}

for rule, rule_data in association_rules.items():
	antecedents = rule[0]

	rule_item_supports = [supports[reverse_item_map[item]] for item in antecedents]
	lift = rule_data['antecedent_support'] / np.array(rule_item_supports).prod()

	if lift >= MIN_LIFT_THRESHOLD:
		lifts[rule] = lift
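
For reference, the textbook definitions for a rule $A \Rightarrow C$ over $N$ users are given below, where $I_u$ (a symbol introduced here) is the set of items user $u$ interacted with. The cells above use variants of these: confidence divides the joint support by the support of the consequent, and lift divides the joint support by the product of the individual antecedent item supports. The values reported in this notebook are therefore not directly comparable to the textbook quantities.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\left|\{\, u : X \subseteq I_u \,\}\right|}{N} \\
\operatorname{conf}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)} \\
\operatorname{lift}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)\,\operatorname{supp}(C)}
\end{aligned}
```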

The number of rules that meet the lift threshold is counted

In [41]:
len(lifts)

108

A helper function to get the title of an item from its parent ASIN

In [42]:
def get_item_name_from_id(parent_asin):
	return items[items['parent_asin'] == parent_asin]['title'].unique()[0]

Display the significant rules

In [43]:
for rule, lift in lifts.items():
	antecedents = list(map(lambda idx: get_item_name_from_id(reverse_item_map[idx]), rule[0]))
	consequent = reverse_item_map[rule[1]]

	print(f'Rule ({antecedents}) => {get_item_name_from_id(consequent)} has a lift of {lift}')

Rule (['DualShock 4 Wireless Controller for PlayStation 4 - Jet Black', 'KONKY -PS4 Controller Charger Dock Station Controller Charging Dock Stand, USB Dual Charger Station Accessory with LED Indicator for Playstation 4 / PS4 Slim Pro and PSVR Controller, Black', 'Ratchet & Clank: Rift Apart - PlayStation 5']) => Ratchet & Clank: Rift Apart - [PlayStation 5] has a lift of 4.396487758237522
Rule (['DualShock 4 Wireless Controller for PlayStation 4 - Jet Black', 'KONKY -PS4 Controller Charger Dock Station Controller Charging Dock Stand, USB Dual Charger Station Accessory with LED Indicator for Playstation 4 / PS4 Slim Pro and PSVR Controller, Black', 'Gran Turismo Sport Hits - PlayStation 4']) => Ratchet & Clank: Rift Apart - [PlayStation 5] has a lift of 2.14693099678639
Rule (['DualShock 4 Wireless Controller for PlayStation 4 - Jet Black', 'KONKY -PS4 Controller Charger Dock Station Controller Charging Dock Stand, USB Dual Charger Station Accessory with LED Indicator for Playstation 4

#### Saving Lifts

The lifts will be saved to a Postgres table

In [70]:
lifts_table_entries = []

for rule, lift in lifts.items():
	antecedents = list(map(lambda idx: reverse_item_map[idx], rule[0]))
	consequent = reverse_item_map[rule[1]]

	lifts_table_entries.append((antecedents, consequent, lift))

lifts_table = pd.DataFrame(data=lifts_table_entries, columns=['antecedents', 'consequent', 'lift'])
lifts_table.head()

Unnamed: 0,antecedents,consequent,lift
0,"[B077GG9D5D, B01KXERLDQ, B095T8C99C]",B09ZNPDF7L,4.396488
1,"[B077GG9D5D, B01KXERLDQ, B0B5SWS9ZW]",B09ZNPDF7L,2.146931
2,"[B077GG9D5D, B01KXERLDQ, B00ZQB28XK]",B09ZNPDF7L,4.42696
3,"[B077GG9D5D, B095T8C99C, B00BGA9X9W]",B09ZNPDF7L,2.455496
4,"[B077GG9D5D, B095T8C99C, B0B5SWS9ZW]",B09ZNPDF7L,2.997493


In [72]:
from sqlalchemy.types import String, Float, ARRAY, TEXT

# Map the DataFrame's dtypes to SQLAlchemy column types
lifts_table_dtype = {
    'antecedents': ARRAY(TEXT),   # Store as TEXT[]
    'consequent': String(),
    'lift': Float(),              # Store as a floating-point column
}

In [73]:
from sqlalchemy import create_engine

# Create the SQLAlchemy engine and save data
engine = create_engine('postgresql://postgres:root@localhost:5432/AppForge')
lifts_table.to_sql('frequent_patterns', engine, if_exists='replace', index=False, dtype=lifts_table_dtype)

108

### Co-Visitation Matrix

Build a co-occurrence matrix of items that are frequently reviewed by the same user, used here as a proxy for items viewed or bought together

First, the number of items each user has reviewed is collected

In [48]:
users_review_count = reviews.groupby('user_id')['parent_asin'].count()
users_review_count

user_id
AE2225K3KY4D3KKSN6I2AHBVR4QQ    1
AE2225U265E7XPS3LRUYSJFT7OLQ    1
AE2227QSJSF6HV3XTTIGQAPA6WHQ    2
AE222GHCPAOQIDZDVHIMMGZ6WZSA    1
AE222HFZDH6BPTYFOUWGGU63YSIQ    7
                               ..
AHZZZX5LNNKLAZCTZNB6F445ZIIQ    1
AHZZZY2XVWEUJUTYPGGL4WXH6CSA    4
AHZZZYCUOTRYW4ZQFIFAEZGBOY4A    4
AHZZZYE2256FFHPFB54DUDOQL3IA    1
AHZZZYSH5MQZKLWDUVLMVQUVFIIQ    1
Name: parent_asin, Length: 2282093, dtype: int64

Only users with a minimum number of reviewed items will be kept

In [49]:
MIN_ITEMS = 2

retained_users = users_review_count[users_review_count >= MIN_ITEMS].index
filtered_users = reviews[reviews['user_id'].isin(retained_users)]

filtered_users

Unnamed: 0,timestamp,asin,title,user_id,images,text,parent_asin,rating
0,2020-12-17 06:33:24.795,B07DJWBYKP,It’s pretty sexual. Not my fav,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,[],I’m playing on ps5 and it’s interesting. It’s...,B07DK1H3H5,4
1,2020-04-16 15:31:54.941,B00ZS80PC2,Good. A bit slow,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,[],Nostalgic fun. A bit slow. I hope they don’t...,B07SRWRH5D,5
3,2019-12-29 16:40:34.017,B07GXJHRVK,Great alt to pro controller,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],"These work great, They use batteries which is ...",B0BCHWZX95,5
4,2015-03-29 01:18:52.000,B00HUWA45W,solid product,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],I would recommend to anyone looking to add jus...,B00HUWA45W,5
8,2021-05-19 18:24:30.253,B08L6782X9,Price bumps it up from 4 stars,AG6BAEKWLCWH2TW3KKLVK773YF6A,[],*it fits TWO wired Retro-bit 6 button controll...,B08L6782X9,5
...,...,...,...,...,...,...,...,...
3572394,2016-02-11 21:24:30.000,B00U6Y7ISC,Three Stars,AFM7PEZ36OGXMVRC5QPUAFMY372A,[],Not the american version i wanted.,B011ALF4XK,3
3572398,2016-03-27 16:25:18.000,B015NHBBOS,Brand new in box with super fast shipping,AF3ZRTABECSRE3PYB5EX4ESKX6KQ,[],Brand new in box with super fast shipping . ....,B015NHBBOS,5
3572399,2016-03-27 16:21:41.000,B015NGZFWS,Five Stars,AF3ZRTABECSRE3PYB5EX4ESKX6KQ,[],Great product and Great shipping. . . definit...,B015NGZFWS,5
3572406,2016-07-16 19:44:33.000,B015NHBBOS,Five Stars,AEYLVNS5EXYKHZDWJ4ZPLWU6JWCQ,[],Thank,B015NHBBOS,5


The average rating given by each retained user is computed

In [50]:
user_avg_ratings = filtered_users.groupby('user_id')['rating'].mean()

Only reviews whose rating is at or above the user's average rating are kept

In [51]:
filtered_users = filtered_users.copy()
filtered_users['user_avg_rating'] = filtered_users['user_id'].map(user_avg_ratings)

In [52]:
filtered_users = filtered_users[filtered_users['rating'] >= filtered_users['user_avg_rating']]
filtered_users

Unnamed: 0,timestamp,asin,title,user_id,images,text,parent_asin,rating,user_avg_rating
1,2020-04-16 15:31:54.941,B00ZS80PC2,Good. A bit slow,AGCI7FAH4GL5FI65HYLKWTMFZ2CQ,[],Nostalgic fun. A bit slow. I hope they don’t...,B07SRWRH5D,5,4.50
3,2019-12-29 16:40:34.017,B07GXJHRVK,Great alt to pro controller,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],"These work great, They use batteries which is ...",B0BCHWZX95,5,5.00
4,2015-03-29 01:18:52.000,B00HUWA45W,solid product,AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q,[],I would recommend to anyone looking to add jus...,B00HUWA45W,5,5.00
8,2021-05-19 18:24:30.253,B08L6782X9,Price bumps it up from 4 stars,AG6BAEKWLCWH2TW3KKLVK773YF6A,[],*it fits TWO wired Retro-bit 6 button controll...,B08L6782X9,5,2.75
10,2018-04-09 22:47:52.900,B001AW156U,good,AG6BAEKWLCWH2TW3KKLVK773YF6A,[],"good, a tad bit too bulky, but good. they stay...",B001AW156U,4,2.75
...,...,...,...,...,...,...,...,...,...
3572393,2017-04-04 23:24:04.000,B01N357OCG,Good case but smelly,AFM7PEZ36OGXMVRC5QPUAFMY372A,[],"Very nice case, quite protective and it looks ...",B01N357OCG,4,3.50
3572398,2016-03-27 16:25:18.000,B015NHBBOS,Brand new in box with super fast shipping,AF3ZRTABECSRE3PYB5EX4ESKX6KQ,[],Brand new in box with super fast shipping . ....,B015NHBBOS,5,5.00
3572399,2016-03-27 16:21:41.000,B015NGZFWS,Five Stars,AF3ZRTABECSRE3PYB5EX4ESKX6KQ,[],Great product and Great shipping. . . definit...,B015NGZFWS,5,5.00
3572406,2016-07-16 19:44:33.000,B015NHBBOS,Five Stars,AEYLVNS5EXYKHZDWJ4ZPLWU6JWCQ,[],Thank,B015NHBBOS,5,5.00


The remaining users have their items placed into pairs

In [53]:
from itertools import combinations

def generate_all_pairs(items_df: pd.DataFrame):
	# Sort the unique items so every pair has one canonical order: (A, B), never (B, A)
	unique_items = sorted(items_df['parent_asin'].unique())

	# Keeping only above-average items may leave too few items to pair (e.g. 0 or 1), so NA is returned
	if len(unique_items) < MIN_ITEMS:
		return pd.NA

	return list(combinations(unique_items, 2))

item_pairs = filtered_users.groupby('user_id')[['parent_asin']].apply(generate_all_pairs)

In [54]:
item_pairs

user_id
AE2227QSJSF6HV3XTTIGQAPA6WHQ                                                 <NA>
AE222HFZDH6BPTYFOUWGGU63YSIQ    [(B01GY35T4S, B07QX99XJJ), (B01GY35T4S, B07SNN...
AE222HYUASC3KWAIUKZV2QISUX3A                           [(B082R1GMV2, B0C5K4M7WJ)]
AE222T2BWBACWZJFX62EIFSY67YA                           [(B077GG9D5D, B0791BVNLL)]
AE222X475JC6ONXMIKZDFGQ7IAUA                           [(B00MNP9PD8, B09TXR8F3C)]
                                                      ...                        
AHZZZF7F5TA2BMROXTQTMUNV53RQ                                                 <NA>
AHZZZIPM6S6HH56B5HGRVZQWIF5Q                           [(B08CY14VGD, B0C679SKTH)]
AHZZZOXVLHIR56MPV7TR35JTXY7Q                           [(B01KQDL4D2, B078N4Y2BW)]
AHZZZY2XVWEUJUTYPGGL4WXH6CSA                                                 <NA>
AHZZZYCUOTRYW4ZQFIFAEZGBOY4A    [(B001EYUX2Q, B00Z9TIGCG), (B001EYUX2Q, B09PNL...
Length: 585330, dtype: object
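
Sorting the items before calling `combinations` matters for the counting step that follows: it guarantees that two users who liked the same two items produce the identical tuple, whatever order their reviews appear in. A minimal standard-library sketch, where `toy_pairs_for_user` and the item IDs are made up for illustration:

```python
from itertools import combinations

def toy_pairs_for_user(item_ids):
    # Hypothetical helper: de-duplicate, sort, then emit every unordered pair once
    return list(combinations(sorted(set(item_ids)), 2))

# Two users who share items A1 and C3, listed in different orders
toy_user_a = toy_pairs_for_user(['B2', 'A1', 'C3'])
toy_user_b = toy_pairs_for_user(['C3', 'A1'])

print(toy_user_a)
print(toy_user_b)
```

Both users yield the tuple `('A1', 'C3')`, so it is later counted as one pair rather than as two distinct keys. A user with n items contributes n(n-1)/2 pairs.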

Extract users with valid item pairs

In [55]:
# Keep only the users for whom a list of pairs was generated (drop the NA entries)
filtered_item_pairs = item_pairs[item_pairs.notna()]
filtered_item_pairs

user_id
AE222HFZDH6BPTYFOUWGGU63YSIQ    [(B01GY35T4S, B07QX99XJJ), (B01GY35T4S, B07SNN...
AE222HYUASC3KWAIUKZV2QISUX3A                           [(B082R1GMV2, B0C5K4M7WJ)]
AE222T2BWBACWZJFX62EIFSY67YA                           [(B077GG9D5D, B0791BVNLL)]
AE222X475JC6ONXMIKZDFGQ7IAUA                           [(B00MNP9PD8, B09TXR8F3C)]
AE22323PICKFDFFMANILNBTZYXUA    [(B074JZPS39, B08G4FDHT7), (B074JZPS39, B0B7B7...
                                                      ...                        
AHZZYR74YSTBTJMWGZFMU5B6HWZA    [(B003O6G114, B0072HYRNK), (B003O6G114, B00E36...
AHZZZBOM3X5DXCWGJ27I2JEREBPA    [(B00DBDPOZ4, B00K5I323O), (B00DBDPOZ4, B00KSV...
AHZZZIPM6S6HH56B5HGRVZQWIF5Q                           [(B08CY14VGD, B0C679SKTH)]
AHZZZOXVLHIR56MPV7TR35JTXY7Q                           [(B01KQDL4D2, B078N4Y2BW)]
AHZZZYCUOTRYW4ZQFIFAEZGBOY4A    [(B001EYUX2Q, B00Z9TIGCG), (B001EYUX2Q, B09PNL...
Length: 398926, dtype: object

Count the occurrence of each pair across all users

In [56]:
from collections import Counter

# Flatten list of lists into one big list of all item pairs
all_pairs = [pair for sublist in filtered_item_pairs for pair in sublist]

# Count frequencies of each pair
pair_counts = Counter(all_pairs)

In [57]:
pair_counts

Counter({('B01N3ASPNV', 'B087NNPYP3'): 417,
         ('B01N3ASPNV', 'B081243BT6'): 375,
         ('B077GG9D5D', 'B07PZ8NZSZ'): 319,
         ('B087NN2K41', 'B087NNPYP3'): 307,
         ('B01N3ASPNV', 'B07624RBWB'): 295,
         ('B087NNPYP3', 'B087SHFL9B'): 284,
         ('B01N3ASPNV', 'B087NN2K41'): 238,
         ('B01LRLJV28', 'B077GG9D5D'): 227,
         ('B087NN2K41', 'B087SHFL9B'): 222,
         ('B07624RBWB', 'B087NNPYP3'): 221,
         ('B01N3ASPNV', 'B087SHFL9B'): 197,
         ('B004RMK5QG', 'B077GG9D5D'): 196,
         ('B01N3ASPNV', 'B0C3KYVDWT'): 183,
         ('B016XBGWAQ', 'B01MG8P418'): 167,
         ('B004RMK57U', 'B004RMK5QG'): 161,
         ('B01N3ASPNV', 'B07CV6LH3V'): 158,
         ('B00NETKFJU', 'B00NEU3Z1Y'): 157,
         ('B013OW09WY', 'B014GWNTCI'): 157,
         ('B00BGA9WK2', 'B00BGA9X9W'): 154,
         ('B07624RBWB', 'B087NN2K41'): 149,
         ('B01N3ASPNV', 'B08D3XL1KF'): 148,
         ('B00NETKFJU', 'B00NEU02JW'): 143,
         ('B01N3ASPNV', 'B072V47
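
The flatten-and-count step can be checked on a small example. The sketch below uses made-up per-user pair lists and `itertools.chain.from_iterable` as an alternative to the nested list comprehension; `Counter.most_common` then returns the pairs ordered by frequency.

```python
from collections import Counter
from itertools import chain

# Hypothetical per-user pair lists, one inner list per user
toy_per_user_pairs = [
    [('A1', 'B2'), ('A1', 'C3'), ('B2', 'C3')],
    [('A1', 'C3')],
    [('A1', 'C3'), ('B2', 'C3')],
]

# chain.from_iterable lazily flattens the lists, so no intermediate list is built
toy_pair_counts = Counter(chain.from_iterable(toy_per_user_pairs))

# The two most frequent pairs, as (pair, count) tuples
toy_top = toy_pair_counts.most_common(2)
print(toy_top)
```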

The first and second item of all pairs are extracted

In [58]:
first_items, second_items = zip(*pair_counts.keys())

A table is created for `pair_counts`

In [59]:
pair_counts_df = pd.DataFrame(data=list(zip(first_items, second_items, pair_counts.values())), columns=['item_one', 'item_two', 'num_count'])
pair_counts_df

Unnamed: 0,item_one,item_two,num_count
0,B01GY35T4S,B07QX99XJJ,2
1,B01GY35T4S,B07SNN8GV5,3
2,B01GY35T4S,B07STYRBY5,1
3,B01GY35T4S,B082R1RGZF,1
4,B01GY35T4S,B0BJVLBNCP,1
...,...,...,...
2332809,B00K5I323O,B00KSVXSZU,1
2332810,B08CY14VGD,B0C679SKTH,1
2332811,B01KQDL4D2,B078N4Y2BW,1
2332812,B001EYUX2Q,B00Z9TIGCG,1


Many pairs occur only once, so a minimum count is set and pairs below it are dropped

In [60]:
MIN_COUNT = 5

In [62]:
pair_counts_df = pair_counts_df[pair_counts_df['num_count'] >= MIN_COUNT].sort_values(by='num_count', ascending=False)
pair_counts_df

Unnamed: 0,item_one,item_two,num_count
6135,B01N3ASPNV,B087NNPYP3,417
13909,B01N3ASPNV,B081243BT6,375
7508,B077GG9D5D,B07PZ8NZSZ,319
4564,B087NN2K41,B087NNPYP3,307
8740,B01N3ASPNV,B07624RBWB,295
...,...,...,...
1705541,B01KBL4ISW,B06XG4ZDSL,5
1705558,B002DZKZ5K,B07N1XKY1L,5
1718362,B010KYDNDG,B01LXC1QL0,5
1721947,B01MDUYKDO,B0B96RSG2Y,5
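
Because each pair is stored once in sorted order, a given item can appear in either `item_one` or `item_two`. Any lookup of "items most often liked together with X" therefore has to check both columns. The sketch below shows one way this might be done; `co_rated_with` is a hypothetical helper and the toy frame stands in for `pair_counts_df`.

```python
import pandas as pd

# Toy stand-in (hypothetical values) for the filtered pair_counts_df
toy_pairs = pd.DataFrame({
    'item_one': ['A', 'A', 'B'],
    'item_two': ['B', 'C', 'C'],
    'num_count': [9, 7, 5],
})

def co_rated_with(pairs_df: pd.DataFrame, asin: str, top_n: int = 5) -> pd.Series:
    """Hypothetical helper: items most often paired with `asin`, with their counts."""
    # The queried item may sit in either column, so collect the partner from both sides
    as_first = pairs_df.loc[pairs_df['item_one'] == asin, ['item_two', 'num_count']]
    as_second = pairs_df.loc[pairs_df['item_two'] == asin, ['item_one', 'num_count']]
    as_first = as_first.rename(columns={'item_two': 'other_item'})
    as_second = as_second.rename(columns={'item_one': 'other_item'})

    combined = pd.concat([as_first, as_second])
    return combined.set_index('other_item')['num_count'].nlargest(top_n)

toy_result = co_rated_with(toy_pairs, 'B')
print(toy_result)
```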


#### Saving Counts

In [63]:
from sqlalchemy import create_engine

# Create the SQLAlchemy engine and write the table;
# if_exists='replace' drops any existing 'pair_counts' table before writing
engine = create_engine('postgresql://postgres:root@localhost:5432/AppForge')
pair_counts_df.to_sql('pair_counts', engine, if_exists='replace', index=False)

359