![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares algorithm and some other useful functions (such as the area under the curve calculation). Other parts of the tutorial were written for an earlier version of the Implicit library and had to be reworked.

The dataset used for this work is from Kaggle [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data):

## Context

This is item purchase transaction data with 8 columns.
It is intended to familiarize you with transaction data.

## Content

Data description:

* UserId - unique ID of the user
* TransactionId - unique ID of the transaction
* TransactionTime - time of the transaction
* ItemCode - code of the purchased item
* ItemDescription - description of the item
* NumberOfItemPurchased - number of units purchased
* CostPerItem - cost per unit purchased
* Country - country where the item was purchased
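The `transactions` dataset loaded later in this notebook uses the columns `order_id`, `product_id`, `description` and `quantity` instead of these raw names. That preprocessing happens in the Kedro pipeline and is not shown here. The sketch below is a hypothetical `to_transactions` helper that assumes a plain rename plus dropping non-positive quantities (the filter is an assumption, not confirmed by the source); the toy rows are made up.

```python
import pandas as pd

# Hypothetical mapping from the raw Kaggle columns to this notebook's schema;
# the real Kedro pipeline may do more (or different) cleaning.
RAW_TO_CLEAN = {
    "TransactionId": "order_id",
    "ItemCode": "product_id",
    "ItemDescription": "description",
    "NumberOfItemPurchased": "quantity",
}

def to_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename the raw columns and keep only lines with a positive quantity (assumed filter)."""
    df = raw.rename(columns=RAW_TO_CLEAN)[list(RAW_TO_CLEAN.values())]
    return df[df["quantity"] > 0].reset_index(drop=True)

# Made-up rows in the raw layout described above
raw = pd.DataFrame({
    "UserId": [1, 1, 2],
    "TransactionId": [10, 10, 11],
    "TransactionTime": ["t1", "t1", "t2"],
    "ItemCode": [100, 101, 100],
    "ItemDescription": ["MUG", "FRAME", "MUG"],
    "NumberOfItemPurchased": [3, -3, 6],
    "CostPerItem": [1.5, 2.0, 1.5],
    "Country": ["United Kingdom"] * 3,
})
clean = to_transactions(raw)
print(clean)
```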


# Global Imports

In [8]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [9]:
%reload_kedro

2022-02-03 23:25:05,319 - kedro.framework.session.store - INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
2022-02-03 23:25:05,352 - root - INFO - ** Kedro project productrec
2022-02-03 23:25:05,353 - root - INFO - Defined global variable `context`, `session` and `catalog`
2022-02-03 23:25:05,359 - root - INFO - Registered line magic `run_viz`


In [10]:
%run Common-Functions.ipynb

In [11]:
catalog.list()

['products',
 'transactions',
 'clean_transaction',
 'train',
 'test',
 'products_altered',
 'transactions_altered',
 'user_factors',
 'item_factors',
 'product_train',
 'score',
 'brazilian_kaggle_order_data',
 'brazilian_kaggle_product_data',
 'brazilian_kaggle_customer_data',
 'brazilian_hyperparameters',
 'ecommerce_kaggle_data',
 'ecommerce_hyperparameters',
 'electronics_kaggle_data',
 'electronics_hyperparameters',
 'instacart_kaggle_order_data',
 'instacart_kaggle_product_data',
 'instacart_hyperparameters',
 'jewelry_kaggle_data',
 'jewelry_hyperparameters',
 'journey_kaggle_transaction_data',
 'journey_kaggle_product_data',
 'journey_hyperparameters',
 'retailrocket_kaggle_event_data',
 'retailrocket_hyperparameters',
 'vipin20_kaggle_data',
 'vipin20_hyperparameters',
 'parameters',
 'params:example_test_data_ratio',
 'params:example_num_train_iter',
 'params:example_learning_rate',
 'params:alpha',
 'params:factors',
 'params:regularization',
 'params:iterations']

In [12]:
transactions = catalog.load("transactions")
print('Loaded',len(transactions),'rows')
transactions.head()

2022-02-03 23:25:14,042 - kedro.io.data_catalog - INFO - Loading data from `transactions` (CSVDataSet)...
Loaded 1016476 rows


|   | order_id | product_id | description | quantity |
|---|---|---|---|---|
| 0 | 6355745 | 465549 | FAMILY ALBUM WHITE PICTURE FRAME | 6 |
| 1 | 6283376 | 482370 | LONDON BUS COFFEE MUG | 3 |
| 2 | 6385599 | 490728 | SET 12 COLOUR PENCILS DOLLY GIRL | 72 |
| 3 | 6044973 | 459186 | UNION JACK FLAG LUGGAGE TAG | 3 |
| 4 | 6143225 | 1733592 | WASHROOM METAL SIGN | 3 |


In [71]:
transaction_list = list(np.sort(transactions.order_id.unique())) # Unique orders; each order plays the role of a "user"
item_list = list(transactions.product_id.unique()) # Unique products that were purchased
quantity_list = list(transactions.quantity) # Quantity on every order line

# Map each order to a row index and each product to a column index
rows = transactions.order_id.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
cols = transactions.product_id.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
# Orders x products matrix of purchased quantities (repeated order lines are summed)
purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(transaction_list), len(item_list)))
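To see what the categorical codes and the `csr_matrix` constructor do, here is the same construction on a made-up three-order toy table. Each order becomes a row, each product a column, and two lines for the same (order, product) pair end up summed into one cell.

```python
import numpy as np
import pandas as pd
import scipy.sparse
from pandas.api.types import CategoricalDtype

# Toy order lines: order 10 contains product 100 on two separate lines
toy = pd.DataFrame({
    "order_id":   [10, 10, 11, 10],
    "product_id": [100, 200, 100, 100],
    "quantity":   [3, 1, 2, 4],
})

orders = list(np.sort(toy.order_id.unique()))
items = list(toy.product_id.unique())

# Category codes turn IDs into 0-based row and column positions
rows = toy.order_id.astype(CategoricalDtype(categories=orders, ordered=True)).cat.codes.to_numpy()
cols = toy.product_id.astype(CategoricalDtype(categories=items, ordered=True)).cat.codes.to_numpy()

# Duplicate (row, col) pairs are summed when the matrix is built
m = scipy.sparse.csr_matrix((toy.quantity.to_numpy(), (rows, cols)), shape=(len(orders), len(items)))
print(m.toarray())
```

Order 10 ends up with 3 + 4 = 7 units of product 100 in a single cell.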

In [72]:
matrix_size = purchases_sparse.shape[0]*purchases_sparse.shape[1] # Number of possible (order, product) cells
num_purchases = len(purchases_sparse.nonzero()[0]) # Number of non-zero cells, i.e. observed interactions
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity

99.21026329374317
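The sparsity figure is simply the percentage of empty cells, 100 × (1 − non-zero cells / (rows × columns)). A small worked check of the same formula on a hypothetical 2 × 4 matrix with 3 filled cells:

```python
import numpy as np
import scipy.sparse

def sparsity_pct(m):
    """Percentage of cells in a sparse matrix that hold no interaction."""
    n_rows, n_cols = m.shape
    return 100 * (1 - m.count_nonzero() / (n_rows * n_cols))

toy = scipy.sparse.csr_matrix(np.array([[7, 1, 0, 0],
                                        [2, 0, 0, 0]]))
print(sparsity_pct(toy))  # 3 of 8 cells filled
```

A value above 99% for the real matrix means fewer than 1 in 100 (order, product) pairs was ever purchased.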

# Training & Test Datasets

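We use the `make_train` function, adapted from the tutorial linked at the top, to create training and test datasets. The training set hides a percentage of the purchases (here `pct_test = 0.211`), while the test set keeps them, so we can later check whether the model recommends the hidden items.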

In [73]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.211)
print('Total number of masked items:',product_test.count_nonzero()-product_train.count_nonzero())


Total number of masked items: 102617
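The masking idea can be sketched as follows. This is a hypothetical `mask_interactions` helper, not the notebook's `make_train` (which returns four values and whose exact sampling rule is not shown here). It hides a random share of the stored purchases in the training copy, keeps an untouched test copy, and records which rows (orders) lost something.

```python
import numpy as np
import scipy.sparse

def mask_interactions(ratings, pct_test=0.2, seed=0):
    """Hypothetical train/test split that hides a share of the stored purchases.

    Returns (train, test, altered_rows): `test` is an unchanged copy of the
    input, `train` has the sampled entries removed, and `altered_rows` lists
    the row (order) indices that lost at least one purchase.
    """
    test = scipy.sparse.csr_matrix(ratings, copy=True)
    coo = test.tocoo(copy=True)
    rng = np.random.default_rng(seed)
    n_mask = int(np.ceil(pct_test * coo.nnz))
    picked = rng.choice(coo.nnz, size=n_mask, replace=False)
    coo.data[picked] = 0                      # hide the sampled purchases
    train = coo.tocsr()
    train.eliminate_zeros()                   # drop the now-explicit zeros
    altered_rows = np.unique(coo.row[picked]).tolist()
    return train, test, altered_rows

purchases = scipy.sparse.csr_matrix(np.array([[3, 0, 1],
                                              [0, 2, 0],
                                              [5, 0, 0]]))
train, test, altered = mask_interactions(purchases, pct_test=0.5)
print('Masked:', test.count_nonzero() - train.count_nonzero(), 'rows altered:', altered)
```

The masked count is computed the same way as in the cell above: non-zeros in the test copy minus non-zeros in the training copy.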


# Implicit Recommendation Model

The code below creates and trains one of the models available in the Implicit package, Alternating Least Squares. The hyperparameters are values suggested by various tutorials and have not been tuned.

In [74]:
alpha = 3
factors = 256
regularization = 0.003
iterations = 200

model = implicit.als.AlternatingLeastSquares(factors=factors,
                                    regularization=regularization,
                                    iterations=iterations, calculate_training_loss=True)

## BayesianPersonalizedRanking was pretty bad
# model = implicit.bpr.BayesianPersonalizedRanking(factors=31,
#                                     regularization=0.1,
#                                     iterations=50)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.to_cpu().user_factors
item_vecs = model.to_cpu().item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)


2022-02-04 00:36:51,976 - implicit - INFO - Final training loss 0.0085


In [75]:
print(model.user_factors.shape)
print(model.item_factors.shape)

(18995, 256)
(3242, 256)
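For intuition about what `AlternatingLeastSquares` optimizes, below is a from-scratch dense NumPy sketch of the implicit-feedback ALS updates described by Hu, Koren and Volinsky, not the Implicit library's implementation. It assumes preferences p = 1 for any purchase and confidence c = 1 + α·r (the notebook instead scales `product_train` by `alpha` and lets the library handle the rest). It then alternates closed-form solves x_u = (YᵀC_uY + λI)⁻¹YᵀC_u p_u for orders and the symmetric solve for items. Building a full diagonal matrix per row is only practical for toy data, nowhere near the 18995 × 3242 matrix above.

```python
import numpy as np

def implicit_als(R, factors=2, alpha=3.0, reg=0.01, iterations=10, seed=0):
    """Dense ALS for implicit feedback on a small orders x items count matrix R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)                            # preference: 1 if purchased, else 0
    C = 1.0 + alpha * R                                  # confidence: 1 + alpha * count
    X = rng.normal(scale=0.1, size=(n_users, factors))   # order (user) factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))   # item factors
    I = np.eye(factors)
    for _ in range(iterations):
        # Hold Y fixed and solve a weighted ridge regression for every order
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg * I, Y.T @ Cu @ P[u])
        # Hold X fixed and solve the same problem for every item
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg * I, X.T @ Ci @ P[:, i])
    return X, Y

# Two groups of orders that buy two disjoint groups of items
R = np.array([[1, 2, 0, 0],
              [3, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 4]], dtype=float)
X, Y = implicit_als(R)
scores = X @ Y.T
print(np.round(scores, 2))
```

Purchased cells should score close to 1 and unpurchased cells close to 0. The shapes of `X` and `Y` mirror `model.user_factors` and `model.item_factors` above.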


In [76]:
catalog.save("user_factors", user_vecs)
catalog.save("item_factors", item_vecs)
catalog.save("product_train", product_train*alpha)

2022-02-04 00:37:50,890 - kedro.io.data_catalog - INFO - Saving data to `user_factors` (PickleDataSet)...
2022-02-04 00:37:50,906 - kedro.io.data_catalog - INFO - Saving data to `item_factors` (PickleDataSet)...
2022-02-04 00:37:50,915 - kedro.io.data_catalog - INFO - Saving data to `product_train` (PickleDataSet)...


# Scoring the Model

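Following the tutorial, we score the model with the area under the Receiver Operating Characteristic curve (AUC) for the orders that had purchases masked, and compare it with the score obtained by always recommending the most popular items.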
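The per-order AUC can be sketched without the notebook's helpers. This hypothetical `auc_for_row` only considers items absent from the order's training row: masked purchases are positives and everything else is negative. It computes AUC as the probability that a random positive outscores a random negative (ties count half), which equals the area under the ROC curve that `metrics.roc_curve` and `metrics.auc` would give. The scores and popularity counts are made up. Averaging this over all altered orders gives a mean AUC.

```python
import numpy as np

def auc_for_row(train_row, test_row, scores):
    """AUC for one order, ranking only items that are absent from its training row."""
    candidates = train_row == 0                  # items the model did not see for this order
    labels = test_row[candidates] > 0            # True for purchases that were masked
    pos = scores[candidates][labels]
    neg = scores[candidates][~labels]
    # Share of (positive, negative) pairs ranked correctly; ties count as half
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

train_row = np.array([3, 0, 0, 0, 1])            # item 1 was masked from this order
test_row = np.array([3, 2, 0, 0, 1])
model_scores = np.array([0.9, 0.8, 0.1, 0.3, 0.7])
popularity = np.array([50, 5, 40, 30, 20])       # e.g. total purchases per item

print(auc_for_row(train_row, test_row, model_scores))
print(auc_for_row(train_row, test_row, popularity))
```

Here the model ranks the hidden item first (AUC 1.0) while the popularity baseline ranks it last (AUC 0.0).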

In [77]:
# Rows of product_train are orders, so the order (user) factors come first
# and the transposed item factors second.
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(user_vecs), scipy.sparse.csr_matrix(item_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')

# Spot Checking

With the model trained, we can spot check a few things, such as finding items similar to a given item and checking the recommendations for an existing order.

In [78]:
item_lookup = catalog.load("products")
item_lookup.head()

2022-02-04 00:39:11,520 - kedro.io.data_catalog - INFO - Loading data from `products` (CSVDataSet)...


|   | product_id | description |
|---|---|---|
| 0 | 465549 | FAMILY ALBUM WHITE PICTURE FRAME |
| 1 | 482370 | LONDON BUS COFFEE MUG |
| 2 | 490728 | SET 12 COLOUR PENCILS DOLLY GIRL |
| 3 | 459186 | UNION JACK FLAG LUGGAGE TAG |
| 4 | 1733592 | WASHROOM METAL SIGN |


In [79]:
item_lookup

|   | product_id | description |
|---|---|---|
| 0 | 465549 | FAMILY ALBUM WHITE PICTURE FRAME |
| 1 | 482370 | LONDON BUS COFFEE MUG |
| 2 | 490728 | SET 12 COLOUR PENCILS DOLLY GIRL |
| 3 | 459186 | UNION JACK FLAG LUGGAGE TAG |
| 4 | 1733592 | WASHROOM METAL SIGN |
| ... | ... | ... |
| 3973 | 1784391 | LARGE HEART FLOWERS HOOK |
| 3974 | 1665783 | PINK CHERRY LIGHTS |
| 3975 | 1893738 | PURPLE CHUNKY GLASS+BEAD NECKLACE |
| 3976 | 752493 | 4 GOLD FLOCK CHRISTMAS BALLS |


In [81]:
related = model.similar_items(1284)  # (item indices, similarity scores) of the most similar items
print(related)
print(len(item_list))
for index, prob in zip(related[0], related[1]):
    print("Index {}".format(index))
    product = item_list[index]  # Map the matrix column back to a product_id
    print("Product {}".format(product))
    item = item_lookup[item_lookup.product_id == product].values
    print("{} {}% {}".format(index, prob, item[0][1]))



(array([1284, 1222,  602, 1372, 1089, 1878, 2049, 1979,  606, 2000],
      dtype=int32), array([1.0000001 , 0.7263095 , 0.6888725 , 0.6261926 , 0.35959667,
       0.19607987, 0.19275524, 0.16532014, 0.16308948, 0.15900104],
      dtype=float32))
3242
Index 1284
Product 489153
1284 1.0000001192092896% SET OF 12 FAIRY CAKE BAKING CASES
Index 1222
Product 489195
1222 0.7263094782829285% SET OF 12 MINI LOAF BAKING CASES
Index 602
Product 489174
602 0.6888725161552429% SET OF 6 SNACK LOAF BAKING CASES
Index 1372
Product 489216
1372 0.6261926293373108% SET OF 6 TEA TIME BAKING CASES
Index 1089
Product 489237
1089 0.3595966696739197% SET 40 HEART SHAPE PETIT FOUR CASES
Index 1878
Product 786702
1878 0.19607986509799957% PET MUG, GOLDFISH
Index 2049
Product 495894
2049 0.19275523722171783% SET 10 CARDS 12 DAYS WRAP  17058
Index 1979
Product 491799
1979 0.16532014310359955% HOME SWEET HOME BOTTLE 
Index 606
Product 474537
606 0.16308948397636414% MUSICAL ZINC HEART DECORATION 
Index 2000
Produc
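The self-score of about 1.0 for item 1284 is consistent with `similar_items` ranking items by cosine similarity of their factor vectors, though the exact computation depends on the installed Implicit version. A minimal NumPy sketch of cosine top-N, using a hypothetical `top_similar` helper and made-up 2-dimensional factors:

```python
import numpy as np

def top_similar(item_factors, item_id, n=10):
    """Rank items by cosine similarity of their factor vectors to `item_id`."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10                      # avoid division by zero
    normed = item_factors / norms[:, None]
    sims = normed @ normed[item_id]
    n = min(n, len(sims))
    best = np.argpartition(-sims, n - 1)[:n]       # the n best, in arbitrary order
    best = best[np.argsort(-sims[best])]           # sort those n by score
    return best, sims[best]

factors = np.array([[1.0, 0.0],
                    [0.9, 0.1],
                    [0.0, 1.0],
                    [-1.0, 0.0]])
ids, scores = top_similar(factors, 0, n=3)
print(ids, scores)
```

The query item always comes back first with a score of 1, which is why item 1284 heads its own list above.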

In [104]:
# Orders x products confidence matrix; row i holds the training purchases of order i
user_items = (product_train * alpha).astype('double').tocsr()

def recommend(order):
    order_lines = transactions[transactions.order_id == transaction_list[order]]
    # Column indices of the products already in this order, so they are not recommended again
    indices = [item_list.index(product) for product in order_lines['product_id'].values]

    print('Order Contents:')
    print(order_lines.loc[:, ['product_id', 'description']])
    print('Recommendations:')
    recommendations = model.recommend(order, user_items[order], filter_items=indices)
    print(recommendations)
    for index, prob in zip(recommendations[0], recommendations[1]):
        item = item_lookup[item_lookup.product_id == item_list[index]].values
        print("{} {}% {}".format(index, prob, item[0][1]))
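Conceptually, a matrix-factorization recommendation for one order scores every item by the dot product of the order's factor vector with each item's factor vector, removes items the order already contains, and keeps the top N. The hypothetical `recommend_for_row` below sketches that idea on made-up factors. The library's `recommend` has more options (for example `filter_items`, as used above) and may differ in detail.

```python
import numpy as np
import scipy.sparse

def recommend_for_row(user_vec, item_factors, liked_row, n=10):
    """Rank items for one order by dot-product score, skipping items it already has."""
    scores = item_factors @ user_vec
    scores[liked_row.indices] = -np.inf           # never recommend items already in the order
    n = min(n, int(np.isfinite(scores).sum()))
    best = np.argsort(-scores)[:n]                # highest scores first
    return best, scores[best]

item_factors = np.array([[1.0, 0.0],
                         [0.9, 0.1],
                         [0.0, 1.0],
                         [-1.0, 0.0]])
order_vec = np.array([1.0, 0.0])
liked = scipy.sparse.csr_matrix(np.array([[2, 0, 0, 0]]))   # the order already contains item 0
ids, scores = recommend_for_row(order_vec, item_factors, liked, n=2)
print(ids, scores)
```

Item 0 would score highest but is skipped because the order already contains it.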


In [105]:
recommend(776)

Order Contents:
        product_id                          description
1743        458052        GARLAND WITH HEARTS AND BELLS
5675        465129           FELTCRAFT 6 FLOWER FRIENDS
20456       445032          WHITE BELL HONEYCOMB PAPER 
21512       481740            FELTCRAFT CHRISTMAS FAIRY
44683       464961       CHRISTMAS CRAFT TREE TOP ANGEL
...            ...                                  ...
894856      470883             REGENCY CAKESTAND 3 TIER
959898      464961       CHRISTMAS CRAFT TREE TOP ANGEL
965005      359247  FLOWER FAIRY,5 SUMMER B'DRAW LINERS
972343      467712                 FELTCRAFT DOLL MARIA
998123      481740            FELTCRAFT CHRISTMAS FAIRY

[74 rows x 2 columns]
Recommendations:
(array([ 306,  297, 1750,  885,  270,  338, 1157, 2607,  870,  761],
      dtype=int32), array([0.7320588 , 0.6685181 , 0.6154464 , 0.61283517, 0.6091131 ,
       0.59640044, 0.56918526, 0.5569538 , 0.5539358 , 0.5538241 ],
      dtype=float32))
306 0.7320588231086731% PA

In [106]:
recommend(2200)

Order Contents:
         product_id                          description
12129        470694         DOORMAT NEIGHBOURHOOD WITCH 
24238        446019                   PINK  POLKADOT CUP
26924        447573          GLASS HEART T-LIGHT HOLDER 
37213        465570   PICTURE FRAME WOOD TRIPLE PORTRAIT
100714       469686                     DOORMAT AIRMAIL 
109737      1787079       JUMBO  BAG BAROQUE BLACK WHITE
133568      1011948              DOORMAT WELCOME PUPPIES
135397       316218         PINK POLKADOT GARDEN PARASOL
138833       477120    SET OF 3 CAKE TINS PANTRY DESIGN 
141468       446040                    BLUE POLKADOT CUP
142390       434973                       JUMBO BAG OWLS
151244       434973                       JUMBO BAG OWLS
163737       450030   SET/3 RED GINGHAM ROSE STORAGE BOX
203681       446019                   PINK  POLKADOT CUP
248822       451185     RETROSPOT HEART HOT WATER BOTTLE
254921      1787079       JUMBO  BAG BAROQUE BLACK WHITE
325261       46

In [None]:
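# Assumes a `price` column (CostPerItem in the raw Kaggle data); the cleaned
# `transactions` dataset loaded above does not include one.
transactions['total'] = transactions['quantity'] * transactions['price']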

In [None]:
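# Assumes a hypothetical `price_lookup` DataFrame with `product_id` and `price` columns;
# it is not defined earlier in this notebook.
train_rows = (product_train * alpha).astype('double').tocsr()  # orders x products
recommended_price = []
for user in range(len(transaction_list)):
    ids, scores = model.recommend(user, train_rows[user], N=1)  # top recommendation only
    price = price_lookup.loc[price_lookup.product_id == item_list[ids[0]], 'price'].values
    recommended_price.append(price[0])

total_recommended = np.sum(recommended_price)

# Treats the mean AUC (`test`) as the chance that each top recommendation is bought,
# a rough heuristic rather than a calibrated probability.
print('After recommending',len(transaction_list),'items, there would be an increase of',
      "${:,.2f}".format(total_recommended*test),'in additional purchases.')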

In [None]:
# Uses the `total` column (quantity * price) created above
totals = transactions.groupby('order_id')['total'].sum()
total = totals.sum()

print('Added to the initial total of all',len(transaction_list),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be',
      "{:,.4f}%".format(100 * total_recommended * test / total))

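The revenue estimate in these cells rests on a simple heuristic: the summed price of each order's top recommendation, scaled by the model's AUC (`test`), is treated as added revenue and then compared with the total value of all orders. The sketch below reproduces that arithmetic on made-up numbers with a hypothetical `revenue_uplift` helper. It also shows that turning the ratio into a percentage requires multiplying by 100.

```python
def revenue_uplift(top_item_prices, mean_auc, baseline_revenue):
    """Return (added revenue, percent increase) under the AUC-as-probability heuristic.

    Assumes each order's top recommendation is bought with probability equal
    to the mean AUC, which is a strong simplification, not a calibrated estimate.
    """
    added = mean_auc * sum(top_item_prices)
    return added, 100 * added / baseline_revenue

added, pct = revenue_uplift([2.5, 4.0, 3.5], mean_auc=0.8, baseline_revenue=200.0)
print(f"${added:,.2f} added, {pct:.2f}% increase")
```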