![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from E-Commerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares (ALS) algorithm and some other useful functions, such as the area under the curve (AUC) calculation. Other parts of the tutorial were written for an earlier version of the Implicit library and had to be reworked.

The dataset used for this work is [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data) from Kaggle, which is described as follows:

## Context

This is a dataset of item purchase transactions with 8 columns.
It is useful for getting familiar with transaction data.

## Content

The columns are:

* UserId - A unique ID for each user
* TransactionId - A unique ID for each transaction
* TransactionTime - The time of the transaction
* ItemCode - The code of the purchased item
* ItemDescription - A description of the item
* NumberOfItemPurchased - The total number of items purchased
* CostPerItem - The cost of each item purchased
* Country - The country where the item was purchased


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse  # import the submodule explicitly so scipy.sparse is always available
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [2]:
%run Common-Functions.ipynb

In [3]:
transactions = catalog.load("vipin20_transactions")
print('Loaded',len(transactions),'rows')
transactions.head()

2021-09-15 23:23:40,593 - kedro.io.data_catalog - INFO - Loading data from `vipin20_transactions` (CSVDataSet)...
Loaded 1016476 rows


|   | order_id | product_id | description | quantity |
|---|----------|------------|-------------|----------|
| 0 | 6355745 | 465549 | FAMILY ALBUM WHITE PICTURE FRAME | 6 |
| 1 | 6283376 | 482370 | LONDON BUS COFFEE MUG | 3 |
| 2 | 6385599 | 490728 | SET 12 COLOUR PENCILS DOLLY GIRL | 72 |
| 3 | 6044973 | 459186 | UNION JACK FLAG LUGGAGE TAG | 3 |
| 4 | 6143225 | 1733592 | WASHROOM METAL SIGN | 3 |


In [4]:
transaction_list = list(np.sort(transactions.order_id.unique())) # Get our unique orders (transactions)
item_list = list(transactions.product_id.unique()) # Get our unique products that were purchased
quantity_list = list(transactions.quantity) # Quantity on every purchase line

# Column index of each purchase line: one column per order
cols = transactions.order_id.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
# Row index of each purchase line: one row per product
rows = transactions.product_id.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
# Items x orders matrix; quantities for repeated (product, order) pairs are summed
purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(item_list), len(transaction_list)))

In [5]:
matrix_size = purchases_sparse.shape[0]*purchases_sparse.shape[1] # Number of possible interactions in the matrix
num_purchases = len(purchases_sparse.nonzero()[0]) # Number of (product, order) pairs with a purchase
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity

99.21026329374317
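To make the matrix construction concrete, here is a small self-contained sketch on made-up toy data (the order and product IDs are invented, and every variable is prefixed with `toy_` so nothing in the notebook is overwritten). It uses the same `CategoricalDtype` approach to turn IDs into row and column indices, shows that `csr_matrix` sums repeated (product, order) entries, and computes sparsity the same way as the cell above.

```python
import numpy as np
import pandas as pd
import scipy.sparse
from pandas.api.types import CategoricalDtype

# Hypothetical toy transactions: order 10 lists product 7 on two separate lines
toy = pd.DataFrame({
    "order_id":   [10, 10, 10, 11, 12],
    "product_id": [7, 7, 8, 9, 7],
    "quantity":   [1, 2, 3, 1, 5],
})

toy_orders = list(np.sort(toy.order_id.unique()))  # columns: orders 10, 11, 12
toy_products = list(toy.product_id.unique())       # rows: products 7, 8, 9

# Category codes map each product to a row index and each order to a column index
toy_rows = toy.product_id.astype(CategoricalDtype(categories=toy_products)).cat.codes
toy_cols = toy.order_id.astype(CategoricalDtype(categories=toy_orders)).cat.codes

# Items x orders matrix; the two (product 7, order 10) lines are summed to 3
toy_matrix = scipy.sparse.csr_matrix(
    (toy.quantity.to_numpy(), (toy_rows.to_numpy(), toy_cols.to_numpy())),
    shape=(len(toy_products), len(toy_orders)),
)

toy_size = toy_matrix.shape[0] * toy_matrix.shape[1]
toy_nonzero = len(toy_matrix.nonzero()[0])
toy_sparsity = 100 * (1 - toy_nonzero / toy_size)

print(toy_matrix.toarray())
print(toy_sparsity)  # 4 of 9 cells are filled, so roughly 55.6% sparse
```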

# Training & Test Datasets

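We use the `make_train` function, adapted from the tutorial linked at the top, to create training and test datasets. It hides a percentage of the purchases (here 21.1%) from the training set while keeping them in the test set, so that the model's recommendations can later be checked against them.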

In [6]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.211)
print('Total number of masked items:',product_test.count_nonzero()-product_train.count_nonzero())


Total number of masked items: 102617
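The masking step can be sketched as follows. This is a simplified, hypothetical stand-in (`mask_interactions` is not a function from the tutorial or from the notebook's helpers, and the notebook's `make_train` returns four values rather than three). It binarizes a copy of the matrix as the test set, then zeroes out a random `pct_test` share of the stored values in the training copy using a seeded NumPy generator.

```python
import numpy as np
import scipy.sparse

def mask_interactions(ratings, pct_test=0.2, seed=0):
    """Hypothetical sketch of a make_train-style split.

    Returns (train, test, altered_rows):
      train        - copy of ratings with a random pct_test share of its stored values set to zero
      test         - binarized copy of ratings (1 wherever an interaction was observed)
      altered_rows - indices of the rows that lost at least one value
    """
    train = scipy.sparse.csr_matrix(ratings, dtype=float, copy=True)
    train.eliminate_zeros()
    test = train.copy()
    test.data[:] = 1.0  # binarize the test set

    rng = np.random.default_rng(seed)
    n_mask = int(np.ceil(pct_test * train.nnz))
    picked = rng.choice(train.nnz, size=n_mask, replace=False)  # positions in train.data

    # Row index of every stored value, aligned with train.data
    row_of_value = np.repeat(np.arange(train.shape[0]), np.diff(train.indptr))
    altered_rows = np.unique(row_of_value[picked])

    train.data[picked] = 0.0  # hide the chosen interactions from training
    train.eliminate_zeros()
    return train, test, altered_rows

toy_R = scipy.sparse.csr_matrix(np.array([[2, 0, 1, 0], [0, 3, 0, 0], [1, 1, 0, 4]]))
toy_train, toy_test, toy_altered = mask_interactions(toy_R, pct_test=0.5, seed=1)
print('Masked:', toy_test.count_nonzero() - toy_train.count_nonzero())
```

The masked count is computed the same way as in the cell above: nonzeros in the test set minus nonzeros left in the training set.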


# Implicit Recommendation Model

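The code below creates and trains one of the models available in the Implicit package. The hyperparameters are values suggested by various tutorials and have not been tuned. Multiplying the training matrix by `alpha` increases the confidence the model places on observed purchases relative to unobserved ones.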

In [7]:
alpha = 3
factors = 64
regularization = 0.003
iterations = 84

model = implicit.als.AlternatingLeastSquares(factors=factors,
                                    regularization=regularization,
                                    iterations=iterations)

## BayesianPersonalizedRanking was pretty bad
# model = implicit.bpr.BayesianPersonalizedRanking(factors=31,
#                                     regularization=0.1,
#                                     iterations=50)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.user_factors
item_vecs = model.item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)



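For intuition about what ALS solves, here is a minimal NumPy sketch of one half-step of the implicit-feedback formulation by Hu, Koren and Volinsky: with the item factors held fixed, each user's (here, each order's) factor vector has a closed-form least-squares solution. It assumes confidence weights `c = 1 + alpha * r` and binary preferences `p = (r > 0)`; the function name `als_user_update` is hypothetical, and the Implicit library's optimized implementation differs in details (including how input values become confidences), so treat this as an illustration rather than its code.

```python
import numpy as np

def als_user_update(Y, r_u, alpha, reg):
    """Solve for one user's factor vector with the item factors Y held fixed.

    Y     : (n_items, k) item factor matrix
    r_u   : (n_items,) raw interaction values (e.g. quantities) for this user
    alpha : confidence scaling, c_ui = 1 + alpha * r_ui (assumed form)
    reg   : L2 regularization strength (lambda)
    """
    p_u = (r_u > 0).astype(float)  # binary preference: purchased or not
    c_u = 1.0 + alpha * r_u        # confidence in each preference
    k = Y.shape[1]
    # Normal equations: (Y^T C_u Y + lambda I) x_u = Y^T C_u p_u, with C_u = diag(c_u)
    A = Y.T @ (c_u[:, None] * Y) + reg * np.eye(k)
    b = Y.T @ (c_u * p_u)
    return np.linalg.solve(A, b)

# Toy usage: 5 items with 3 latent factors; the user bought items 0 and 3
rng = np.random.default_rng(42)
Y = rng.normal(size=(5, 3))
r_u = np.array([2.0, 0.0, 0.0, 1.0, 0.0])
x_u = als_user_update(Y, r_u, alpha=3.0, reg=0.003)
print(Y @ x_u)  # predicted preference scores for the 5 items
```

The full algorithm alternates: it applies this update to every user, then solves the mirror-image problem for every item with the user factors fixed, and repeats for the configured number of iterations.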

In [8]:
print(model.user_factors)

[[  3.846817    -3.093744    -0.6750135  ...  -0.42854515   2.4837477
   -2.6642807 ]
 [  2.9065492    1.010623     0.42633152 ...  -0.06613469   2.6422431
   -1.4846625 ]
 [ -2.2314541   -0.6588304   -2.3950474  ...  -4.3340955    0.6007222
   -3.8372843 ]
 ...
 [  8.902557     4.297842   -11.967431   ...  -6.265571    -0.6548658
    6.30774   ]
 [ -4.344634     3.8334672    4.2198963  ...  -1.7165313   -0.27013808
    1.4491554 ]
 [  0.3448894    3.838586    -6.0606213  ...   8.618336     1.1179744
    1.3991079 ]]


In [9]:
catalog.save("vipin20_user_factors", user_vecs)
catalog.save("vipin20_item_factors", item_vecs)
catalog.save("vipin20_product_train", product_train*alpha)

2021-09-15 23:25:44,787 - kedro.io.data_catalog - INFO - Saving data to `vipin20_user_factors` (PickleDataSet)...
2021-09-15 23:25:44,866 - kedro.io.data_catalog - INFO - Saving data to `vipin20_item_factors` (PickleDataSet)...
2021-09-15 23:25:44,886 - kedro.io.data_catalog - INFO - Saving data to `vipin20_product_train` (PickleDataSet)...


# Scoring the Model

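Following the tutorial, we score the model with the mean area under the Receiver Operating Characteristic curve (AUC), computed over the purchases that were masked from the training set, and compare it with a baseline that ranks by popularity.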

In [10]:
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')

Our model scored 0.8779849223896371 versus a score of 0.8038330946271777 if we always recommended the most popular item.
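The per-row scoring can be sketched as follows, in the spirit of the tutorial's `calc_mean_auc` (the helper name `row_auc` and the toy numbers are hypothetical). For one row of the interaction matrix, only entries with no interaction in the training data are ranked, the masked interactions are the positives, and the AUC comes from scikit-learn's `metrics.roc_curve` and `metrics.auc`. The popularity baseline is the same calculation with overall purchase counts as the scores, and the reported figures are averages of this value over the altered rows.

```python
import numpy as np
from sklearn import metrics

def row_auc(scores, train_row, test_row):
    """ROC AUC for one row: do the masked interactions outrank the never-observed ones?"""
    candidates = np.flatnonzero(train_row == 0)      # entries with no interaction in training
    y_true = (test_row[candidates] > 0).astype(int)  # 1 = interaction hidden by the mask
    fpr, tpr, _ = metrics.roc_curve(y_true, scores[candidates])
    return metrics.auc(fpr, tpr)

# Made-up row: entry 0 stayed in training, entry 1 was masked
train_row = np.array([1, 0, 0, 0, 0])
test_row = np.array([1, 1, 0, 0, 0])
model_scores = np.array([0.9, 0.7, 0.1, 0.2, 0.3])
popularity = np.array([50, 2, 40, 30, 20])  # overall purchase counts

print(row_auc(model_scores, train_row, test_row))  # 1.0: the masked entry ranks first
print(row_auc(popularity, train_row, test_row))    # 0.0: the masked entry ranks last
```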


# Spot Checking

With an overall performance score in hand, we can spot-check individual results: finding items similar to a given item, and checking the recommendations for an existing order.

In [11]:
item_lookup = catalog.load("vipin20_products")
item_lookup.head()

2021-09-15 23:26:41,558 - kedro.io.data_catalog - INFO - Loading data from `vipin20_products` (CSVDataSet)...


|   | product_id | description |
|---|------------|-------------|
| 0 | 465549 | FAMILY ALBUM WHITE PICTURE FRAME |
| 1 | 482370 | LONDON BUS COFFEE MUG |
| 2 | 490728 | SET 12 COLOUR PENCILS DOLLY GIRL |
| 3 | 459186 | UNION JACK FLAG LUGGAGE TAG |
| 4 | 1733592 | WASHROOM METAL SIGN |


In [12]:
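# similar_items returns (item index, similarity score) pairs; the first result is the query item itself
related = model.similar_items(1284)
for index, score in related:
    item = item_lookup[item_lookup.product_id == item_list[index]].values
    print(score, item[0][1])  # column 1 of item_lookup is the description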

1.0000001 SET OF 12 FAIRY CAKE BAKING CASES
0.9606847 SET OF 6 SNACK LOAF BAKING CASES
0.95668566 SET OF 12 MINI LOAF BAKING CASES
0.94624066 SET OF 6 TEA TIME BAKING CASES
0.65419 SET 40 HEART SHAPE PETIT FOUR CASES
0.5878977 SET OF 36 DOILIES PANTRY DESIGN
0.5415114 PANTRY MAGNETIC  SHOPPING LIST
0.52832097 PLACE SETTING WHITE STAR
0.52668613 SET OF 60 PANTRY DESIGN CAKE CASES 
0.51879877 SET OF 60 VINTAGE LEAF CAKE CASES 
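As far as we know, for the ALS model in the Implicit version used here, `similar_items` ranks items by the cosine similarity of their factor vectors, which is why the query item comes back first with a score of about 1.0. A minimal NumPy sketch of that ranking, using a hypothetical helper `top_similar` and a made-up factor matrix:

```python
import numpy as np

def top_similar(item_factors, item_index, n=3):
    """Return (index, cosine similarity) pairs for the n items closest to item_index."""
    norms = np.linalg.norm(item_factors, axis=1)
    scores = item_factors @ item_factors[item_index] / (norms * norms[item_index])
    best = np.argsort(-scores)[:n]  # highest similarity first
    return [(int(i), float(scores[i])) for i in best]

toy_item_factors = np.array([
    [1.0, 0.0],   # item 0
    [0.9, 0.1],   # item 1: nearly the same direction as item 0
    [0.0, 1.0],   # item 2: orthogonal to item 0
    [-1.0, 0.0],  # item 3: opposite direction
])
print(top_similar(toy_item_factors, 0))
```

Cosine similarity ignores vector length, so an item that is bought far more often does not automatically look similar to everything.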


In [13]:
# Implicit's recommend() takes a users x items matrix, so transpose the items x orders training data
user_items = (product_train * alpha).astype('double').T.tocsr()

def recommend(order):
    """Print the contents of an order, then the model's top recommendations for it."""
    print('Order Contents:')
    print(transactions[transactions.order_id == transaction_list[order]].loc[:, ['product_id', 'description']])
    print('Recommendations:')
    # Each recommendation is an (item index, score) pair; scores are not probabilities
    recommendations = model.recommend(order, user_items)
    for index, score in recommendations:
        stock_code = item_list[index]
        item = item_lookup[item_lookup.product_id == stock_code].values
        print(score, stock_code, item[0][1])

In [14]:
recommend(1)

Order Contents:
        product_id                description
80357       475272  HAND WARMER RED POLKA DOT
586706      475293     HAND WARMER UNION JACK
658425      475293     HAND WARMER UNION JACK
958566      475272  HAND WARMER RED POLKA DOT
Recommendations:
0.8014458 475293 HAND WARMER UNION JACK
0.7894545 480165 HAND WARMER OWL DESIGN
0.77924496 480186 HAND WARMER SCOTTY DOG DESIGN
0.77875805 480207 HAND WARMER BIRD DESIGN
0.6991601 492219 HAND WARMER RED LOVE HEART
0.49658978 479514 HAND WARMER BABUSHKA DESIGN
0.41598326 464394 HOT WATER BOTTLE TEA AND SYMPATHY
0.40100765 464331 SCOTTIE DOG HOT WATER BOTTLE
0.397192 1764609 KNITTED UNION FLAG HOT WATER BOTTLE
0.3851803 490455 HOT WATER BOTTLE KEEP CALM
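The ranking behind `model.recommend` can be sketched as a dot product between the order's factor vector and every item's factor vector, skipping items already present in that order's row of `user_items` (as far as we know, the Implicit 0.4-era API used here filters these by default). A hypothetical NumPy version on toy data, with the helper name `recommend_sketch` invented for illustration:

```python
import numpy as np
import scipy.sparse

def recommend_sketch(user_factors, item_factors, user_items, user, n=3):
    """Top-n unseen items for one user, ranked by factor dot product."""
    scores = item_factors @ user_factors[user]
    already = set(user_items[user].indices.tolist())  # items in this user's CSR row
    ranked = [int(i) for i in np.argsort(-scores) if int(i) not in already]
    return [(i, float(scores[i])) for i in ranked[:n]]

toy_user_factors = np.array([[1.0, 0.5]])
toy_item_factors = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]])
toy_user_items = scipy.sparse.csr_matrix(np.array([[1, 0, 0, 0]]))  # user already has item 0

print(recommend_sketch(toy_user_factors, toy_item_factors, toy_user_items, 0))
# [(1, 1.5), (2, 0.5), (3, -1.0)]
```

Item 0 has the highest raw score (2.0) but is excluded because it is already in the user's row.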


In [15]:
recommend(2200)

Order Contents:
         product_id                          description
12129        470694         DOORMAT NEIGHBOURHOOD WITCH 
24238        446019                   PINK  POLKADOT CUP
26924        447573          GLASS HEART T-LIGHT HOLDER 
37213        465570   PICTURE FRAME WOOD TRIPLE PORTRAIT
100714       469686                     DOORMAT AIRMAIL 
109737      1787079       JUMBO  BAG BAROQUE BLACK WHITE
133568      1011948              DOORMAT WELCOME PUPPIES
135397       316218         PINK POLKADOT GARDEN PARASOL
138833       477120    SET OF 3 CAKE TINS PANTRY DESIGN 
141468       446040                    BLUE POLKADOT CUP
142390       434973                       JUMBO BAG OWLS
151244       434973                       JUMBO BAG OWLS
163737       450030   SET/3 RED GINGHAM ROSE STORAGE BOX
203681       446019                   PINK  POLKADOT CUP
248822       451185     RETROSPOT HEART HOT WATER BOTTLE
254921      1787079       JUMBO  BAG BAROQUE BLACK WHITE
325261       46

In [36]:
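# Revenue per purchase line; assumes a unit `price` column, which is not among the columns shown by head() above
transactions['total'] = transactions['quantity'] * transactions['price']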

In [17]:
recommended_price = []
for user in range(0, len(transaction_list)):
    # Top-ranked recommendation for this order: (item index, score)
    recommendations = model.recommend(user, user_items)
    index = recommendations[0][0]
    # price_lookup (ItemCode, price) is not defined in the cells shown here; it is assumed to be loaded elsewhere
    price = price_lookup[price_lookup.ItemCode == str(item_list[index])].values
    recommended_price.append(price[0][1])

total_recommended = np.sum(recommended_price)

# Heuristic: scale by the mean AUC (`test`) as a rough proxy for the chance a recommendation is bought
print('After recommending',len(transaction_list),'items, there would be an increase of',
      "${:,.2f}".format(total_recommended*test),'in additional purchases.')

After recommending 18995 items, there would be an increase of $74,043.98 in additional purchases.


In [19]:
totals = transactions.groupby('order_id')['total'].sum()  # revenue per order, from the 'total' column above
total = totals.sum()

# The '%' format spec multiplies the ratio by 100, so no manual scaling is needed
print('Added to the initial total of all',len(transaction_list),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be', "{:.4%}".format(total_recommended / total))


Added to the initial total of all 18995 purchases valued at $59,859,564.72 , the percentage increase in revenue would be 0.1409%
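The revenue estimate above reduces to a little arithmetic: sum the price of each order's top recommendation, scale it by the mean AUC, and divide by the baseline revenue. Below is a hypothetical helper, `revenue_lift`, with made-up numbers. Scaling by AUC is a rough heuristic (AUC measures ranking quality, not purchase probability), and Python's `%` format spec already multiplies by 100, so the ratio should not be scaled again before printing.

```python
def revenue_lift(top_rec_prices, auc, baseline_total):
    """Expected added revenue and its share of baseline revenue (heuristic)."""
    added = sum(top_rec_prices) * auc
    return added, added / baseline_total

toy_added, toy_share = revenue_lift([2.50, 4.00, 3.50], auc=0.8, baseline_total=1000.0)
print("${:,.2f} added, a {:.2%} increase".format(toy_added, toy_share))
# $8.00 added, a 0.80% increase
```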
