## Sparse User-Item Matrix

### Load Datasets

In [1]:
import pandas as pd

reviews = pd.read_csv('../datasets/slimmed/reviews.csv')
items = pd.read_csv('../datasets/slimmed/items.csv')

### Creating Sparse Matrix

A sparse matrix is created with `user_id`s as rows, items (`parent_asin`) as columns, and ratings as the elements.

In [2]:
from scipy.sparse import coo_matrix

# Map each user_id and item ID (parent_asin) to a contiguous 0-based index
user_map = {u: i for i, u in enumerate(reviews['user_id'].unique())}
item_map = {i: j for j, i in enumerate(items['parent_asin'].tolist())}

user_idx = reviews['user_id'].map(user_map)
item_idx = reviews['parent_asin'].map(item_map)
ratings = reviews['rating'].astype('float')

# Create a sparse matrix; pass the shape explicitly so that items without any
# review still get a column (otherwise scipy infers it from the largest index seen)
sparse_matrix = coo_matrix((ratings, (user_idx, item_idx)),
                           shape=(len(user_map), len(item_map)))

# Convert to CSR format, which supports fast row slicing and matrix-vector products
sparse_matrix_csr = sparse_matrix.tocsr()
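The same construction can be sketched on a tiny, hypothetical dataset (the toy `reviews` and `items` frames below are invented for illustration). It also shows why the shape matters: when the last item in `items` has no reviews, `coo_matrix` silently drops its column unless `shape` is given.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy data standing in for reviews.csv and items.csv
reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "parent_asin": ["A", "C", "B"],
    "rating": [5, 3, 4],
})
items = pd.DataFrame({"parent_asin": ["A", "B", "C", "D"]})  # "D" has no reviews

# Same index mappings as in the notebook
user_map = {u: i for i, u in enumerate(reviews["user_id"].unique())}
item_map = {a: j for j, a in enumerate(items["parent_asin"].tolist())}

rows = reviews["user_id"].map(user_map).to_numpy()
cols = reviews["parent_asin"].map(item_map).to_numpy()
vals = reviews["rating"].astype(float).to_numpy()

# Without shape, scipy infers max index + 1 per axis, so item "D" gets no column
inferred = coo_matrix((vals, (rows, cols)))
explicit = coo_matrix((vals, (rows, cols)), shape=(len(user_map), len(item_map)))

print(inferred.shape)  # (2, 3)
print(explicit.shape)  # (2, 4)
print(explicit.tocsr().toarray())
```

With the explicit shape, every item index in `item_map` is a valid column, so the reverse mapping later covers the full matrix.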

The matrix is extremely sparse: *99.9987335557076*% of its cells hold no rating, so only about 0.0013% of all user-item pairs are observed.

In [3]:
# Sparsity = percentage of cells that hold no rating
count_non_zero = sparse_matrix.count_nonzero()
size = sparse_matrix.shape[0] * sparse_matrix.shape[1]

print(f'{(100 - ((count_non_zero / size) * 100)):.6f}')

99.998734
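The calculation above is sparsity = 100 × (1 − non-zero cells / (rows × columns)). A minimal check on a hypothetical 3 × 4 matrix with three stored ratings makes the arithmetic concrete: 3 / 12 = 0.25, so the sparsity is 75%.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3 x 4 ratings matrix with 3 ratings
dense = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 0],
])
m = csr_matrix(dense)

n_rows, n_cols = m.shape
# count_nonzero() ignores any explicitly stored zeros, unlike nnz
density = m.count_nonzero() / (n_rows * n_cols)
sparsity_pct = 100 * (1 - density)
print(f"{sparsity_pct:.6f}")  # 75.000000
```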


Reverse mappings are needed to translate matrix indices back to the original user and item IDs.

In [4]:
reverse_user_map = {v:k for k, v in user_map.items()}
reverse_item_map = {v:k for k, v in item_map.items()}
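As a sketch of how the reverse maps get used, the hypothetical helper `items_rated_by` below (not part of the notebook) reads one user's row straight out of a toy CSR matrix and turns the column indices back into `parent_asin` values. In CSR, `indptr[row]:indptr[row + 1]` delimits a row's stored entries and `indices` holds their column numbers.

```python
from scipy.sparse import csr_matrix

# Hypothetical forward maps, built the same way as user_map / item_map
user_map = {"u1": 0, "u2": 1}
item_map = {"A": 0, "B": 1, "C": 2, "D": 3}

# Invert them: matrix index -> original ID
reverse_user_map = {v: k for k, v in user_map.items()}
reverse_item_map = {v: k for k, v in item_map.items()}

# Toy CSR matrix: u1 rated A and C, u2 rated B
ratings = csr_matrix(([5.0, 3.0, 4.0], ([0, 0, 1], [0, 2, 1])), shape=(2, 4))

def items_rated_by(user_id):
    """Hypothetical helper: parent_asin of every item the user has rated."""
    row = user_map[user_id]
    # Slice this row's stored entries and take their column indices
    start, end = ratings.indptr[row], ratings.indptr[row + 1]
    cols = sorted(ratings.indices[start:end])
    return [reverse_item_map[c] for c in cols]

print(items_rated_by("u1"))  # ['A', 'C']
print(reverse_user_map[1])   # u2
```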

Next, a binary version of the sparse matrix is created, in which every non-zero rating is set to 1.

In [5]:
# Set every non-zero rating to 1 (implicit feedback)
ratings_binary = ratings.astype(bool).astype(int)

# Create a binary sparse matrix with the same shape as the explicit one
sparse_matrix_binary = coo_matrix((ratings_binary, (user_idx, item_idx)),
                                  shape=(len(user_map), len(item_map)))

# Convert to CSR format for fast row slicing and matrix-vector products
sparse_matrix_csr_binary = sparse_matrix_binary.tocsr()
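One caveat worth checking: `coo_matrix.tocsr()` sums duplicate (row, column) entries. If a user reviewed the same item more than once, the "binary" matrix would hold a 2 in that cell. A minimal sketch on hypothetical data, assuming clamping to 1 is the desired behaviour:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical reviews where user 0 rated item 1 twice (ratings 4 and 5)
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])
ratings = np.array([4.0, 5.0, 3.0])

# Same recipe as in the notebook: every non-zero rating becomes 1
binary = ratings.astype(bool).astype(int)
m = coo_matrix((binary, (rows, cols)), shape=(2, 3)).tocsr()
print(m.toarray())  # cell (0, 1) holds 2 because tocsr() summed the duplicates

# Clamp the stored values back to 1 so the matrix is truly binary
m.data = np.minimum(m.data, 1)
print(m.toarray())  # cell (0, 1) now holds 1
```

Whether duplicates occur in `reviews.csv` is not shown here; `reviews.duplicated(['user_id', 'parent_asin']).any()` is a quick way to find out.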

The item mapping is saved to disk so that later notebooks can translate matrix indices (`als_id`) back to `parent_asin` values.

In [6]:
item_map_series = pd.Series(reverse_item_map)
item_map_series.name = 'parent_asin'

item_map_series

0         B00069EVOG
1         B00Z9TLVK0
2         B07SZJZV88
3         B0001ZNU56
4         B07H93H878
             ...    
121815    B014RXTSDK
121816    B07JDT455V
121817    B09XQJS4CZ
121818    B07DGPTGNV
121819    B00HUWCQBW
Name: parent_asin, Length: 121820, dtype: object

In [7]:
item_map_series.to_csv('../datasets/mappings/item_map.csv', index_label='als_id', header=True)

In [8]:
item_map = pd.read_csv('../datasets/mappings/item_map.csv', index_col='als_id')

In [9]:
item_map

| als_id | parent_asin |
|---|---|
| 0 | B00069EVOG |
| 1 | B00Z9TLVK0 |
| 2 | B07SZJZV88 |
| 3 | B0001ZNU56 |
| 4 | B07H93H878 |
| ... | ... |
| 121815 | B014RXTSDK |
| 121816 | B07JDT455V |
| 121817 | B09XQJS4CZ |
| 121818 | B07DGPTGNV |


In [10]:
item_map.loc[440]

parent_asin    B07232FS95
Name: 440, dtype: object
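Because `read_csv` returns a one-column DataFrame, `item_map.loc[440]` yields a Series rather than the bare ID. A small in-memory round trip (toy data, with `io.StringIO` standing in for the CSV file) sketches the save/load step and two ways to get plain values back:

```python
import io
import pandas as pd

# Hypothetical reverse map: matrix column index -> parent_asin
reverse_item_map = {0: "A", 1: "B", 2: "C"}
s = pd.Series(reverse_item_map, name="parent_asin")

# Write as in the notebook, but into an in-memory buffer instead of a file
buf = io.StringIO()
s.to_csv(buf, index_label="als_id", header=True)
buf.seek(0)

# Reading back yields a one-column DataFrame indexed by als_id
loaded = pd.read_csv(buf, index_col="als_id")
print(loaded.loc[1])                 # a one-row Series
print(loaded.loc[1, "parent_asin"])  # the scalar 'B'

# Select the single column to get a Series that behaves like the original map
restored = loaded["parent_asin"]
print(restored.to_dict() == reverse_item_map)  # True
```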

In [11]:
items[items['parent_asin'] == 'B07232FS95']

| Unnamed: 0 | title | features | description | videos | details | images | parent_asin | categories | average_rating | rating_number | main_category | store | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 440 | Carrying Case for Nintendo Switch | ['Excellent protection: Hard EVA shell keeps N... | [] | [] | {'Pricing': 'The strikethrough price is the Li... | [{'thumb': 'https://m.media-amazon.com/images/... | B07232FS95 | ['Video Games', 'Nintendo Switch', 'Accessorie... | 3.2 | 11 | Video Games | hahage | |
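The spot check on row 440 confirms that `als_id` matches the item's row position in `items`. Since `item_map` was built by enumerating `items['parent_asin']` in order, the same property can be checked for every row at once. A sketch with hypothetical stand-in frames:

```python
import pandas as pd

# Hypothetical stand-ins for items.csv and the reloaded item_map.csv
items = pd.DataFrame({"parent_asin": ["A", "B", "C"]})
item_map = pd.DataFrame(
    {"parent_asin": ["A", "B", "C"]},
    index=pd.Index([0, 1, 2], name="als_id"),
)

# Every als_id should equal the item's row position in items
aligned = (
    len(item_map) == len(items)
    and (item_map["parent_asin"].to_numpy() == items["parent_asin"].to_numpy()).all()
)
print(aligned)  # True
```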
