![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares algorithm and some other useful functions (like the area under the curve calculation). Other parts of the tutorial are based on a previous version of the Implicit library and had to be reworked.

The dataset used for this work is from Kaggle [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data):

## Context

This is item purchase transaction data. It has 8 columns.
This data is intended to make you familiar with transaction data.

## Content

The data description is:

* UserId - A unique ID for each user
* TransactionId - A unique ID for each transaction
* TransactionTime - The time of the transaction
* ItemCode - The code of the item that was purchased
* ItemDescription - The item description
* NumberOfItemsPurchased - The total number of items purchased
* CostPerItem - The cost per item purchased
* Country - The country where the item was purchased


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse # Import the submodule explicitly, since scipy.sparse.csr_matrix is used below
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [2]:
%run Common-Functions.ipynb

In [3]:
transactions = pd.read_pickle('../data/interim/vipin20/transactions.gz')
print('Loaded',len(transactions),'rows')

Loaded 1016476 rows


In [4]:
transaction_list = list(np.sort(transactions.TransactionId.unique())) # Get our unique transactions
item_list = list(transactions.ItemCode.unique()) # Get our unique products that were purchased
quantity_list = list(transactions.NumberOfItemsPurchased) # All of our purchases

# Get the associated column indices (one column per transaction)
cols = transactions.TransactionId.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
# Get the associated row indices (one row per item)
rows = transactions.ItemCode.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(item_list), len(transaction_list)))
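
To make the index construction concrete, here is a minimal sketch on a toy table. The four-row DataFrame and its values are made up for illustration; only the column names come from the dataset. It also shows that repeated (item, transaction) pairs are summed when the matrix is built, which matters because the same item can appear more than once in a transaction.

```python
import numpy as np
import pandas as pd
import scipy.sparse
from pandas.api.types import CategoricalDtype

# Toy data (hypothetical values); item "A" appears twice in transaction 10.
toy = pd.DataFrame({
    "TransactionId": [10, 10, 20, 10],
    "ItemCode": ["A", "B", "A", "A"],
    "NumberOfItemsPurchased": [2, 1, 5, 3],
})

toy_transactions = list(np.sort(toy.TransactionId.unique()))  # sorted transaction ids
toy_items = list(toy.ItemCode.unique())                       # items in order of appearance

# Categorical codes map every table row to a matrix position.
toy_cols = toy.TransactionId.astype(
    CategoricalDtype(categories=toy_transactions, ordered=True)).cat.codes
toy_rows = toy.ItemCode.astype(
    CategoricalDtype(categories=toy_items, ordered=True)).cat.codes

# Rows are items and columns are transactions, as in the notebook.
toy_sparse = scipy.sparse.csr_matrix(
    (toy.NumberOfItemsPurchased.to_numpy(), (toy_rows.to_numpy(), toy_cols.to_numpy())),
    shape=(len(toy_items), len(toy_transactions)),
)
toy_dense = toy_sparse.toarray()
print(toy_dense)  # the two "A" rows of transaction 10 end up in a single cell
```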

In [5]:
matrix_size = purchases_sparse.shape[0]*purchases_sparse.shape[1] # Number of possible interactions in the matrix
num_purchases = len(purchases_sparse.nonzero()[0]) # Number of non-zero item/transaction pairs
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity

99.21026329374317
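
The same calculation can be checked by hand on a small matrix. This is a sketch with made-up values; `sparsity_percent` is a hypothetical helper name, not something defined in the notebook.

```python
import numpy as np
import scipy.sparse

def sparsity_percent(matrix):
    """Percentage of cells in a sparse matrix that hold no interaction."""
    n_cells = matrix.shape[0] * matrix.shape[1]
    n_filled = matrix.count_nonzero()
    return 100 * (1 - n_filled / n_cells)

# 3 items x 4 transactions with 3 filled cells out of 12.
toy_matrix = scipy.sparse.csr_matrix(np.array([[5, 0, 0, 0],
                                               [0, 1, 0, 0],
                                               [0, 0, 0, 2]]))
print(sparsity_percent(toy_matrix))
```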

# Training & Test Datasets

We will use the `make_train` function from the tutorial linked at the top to create training and test datasets. A percentage of the purchases is masked in the training dataset so that it can be tested later against the recommendations.

In [6]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.211)
print('Total number of masked items:',product_test.count_nonzero()-product_train.count_nonzero())


Total number of masked items: 102617
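
The masking idea can be sketched as follows. This is not the notebook's `make_train` (which lives in Common-Functions.ipynb and returns its own set of values); `mask_interactions`, its return values, and the toy matrix are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse

def mask_interactions(ratings, pct_test=0.2, seed=0):
    """Hypothetical stand-in for make_train: hide a share of the non-zero entries.

    Returns (train, test, masked_rows, masked_cols). The test matrix is an
    untouched copy; the train matrix has the sampled entries set to zero.
    """
    rng = np.random.default_rng(seed)
    test = ratings.copy().tocsr()
    train = ratings.copy().tolil()           # LIL format allows cheap item assignment
    rows, cols = ratings.nonzero()           # coordinates of every stored interaction
    n_masked = int(np.ceil(pct_test * len(rows)))
    picked = rng.choice(len(rows), size=n_masked, replace=False)
    train[rows[picked], cols[picked]] = 0    # hide the sampled interactions
    train = train.tocsr()
    train.eliminate_zeros()                  # drop any explicit zeros left behind
    return train, test, rows[picked], cols[picked]

# Toy matrix with 10 non-zero entries (made-up values).
toy_ratings = scipy.sparse.csr_matrix(np.array([[1, 0, 2, 0, 3],
                                                [0, 4, 0, 5, 0],
                                                [6, 0, 7, 0, 8],
                                                [0, 9, 0, 1, 0]]))
toy_train, toy_test, masked_rows, masked_cols = mask_interactions(toy_ratings, pct_test=0.2)
print('Masked:', toy_test.count_nonzero() - toy_train.count_nonzero())
```

Keeping the masked coordinates makes it possible to score only the rows or columns that actually lost an interaction.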


# Implicit Recommendation Model

The code below creates and trains one of the models available from the Implicit package. It currently uses hyperparameters suggested by various tutorials, with no tuning.

In [7]:
alpha = 3
factors = 64
regularization = 0.003
iterations = 84

model = implicit.als.AlternatingLeastSquares(factors=factors,
                                    regularization=regularization,
                                    iterations=iterations)

## BayesianPersonalizedRanking was pretty bad
# model = implicit.bpr.BayesianPersonalizedRanking(factors=31,
#                                     regularization=0.1,
#                                     iterations=50)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.user_factors
item_vecs = model.item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)
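
For intuition about what the model is fitting, here is a small dense NumPy sketch of alternating least squares for implicit feedback. It assumes the common textbook formulation in which confidence is `1 + alpha * r` and preference is 1 wherever a purchase exists; the Implicit library's optimized implementation may differ in details. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def implicit_loss(R, X, Y, alpha, reg):
    """Confidence-weighted squared error plus an L2 penalty on the factors."""
    C = 1 + alpha * R                  # confidence grows with purchase quantity
    P = (R > 0).astype(float)          # binary preference
    err = P - X @ Y.T
    return np.sum(C * err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2))

def solve_side(R, fixed, alpha, reg):
    """Solve one regularized weighted least-squares problem per row of R."""
    n_factors = fixed.shape[1]
    out = np.zeros((R.shape[0], n_factors))
    for u in range(R.shape[0]):
        c_u = 1 + alpha * R[u]                       # confidence for every column
        p_u = (R[u] > 0).astype(float)               # preference for every column
        A = (fixed.T * c_u) @ fixed + reg * np.eye(n_factors)
        b = (fixed.T * c_u) @ p_u
        out[u] = np.linalg.solve(A, b)
    return out

def als(R, n_factors=2, alpha=3.0, reg=0.1, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(R.shape[0], n_factors))   # row factors
    Y = rng.normal(scale=0.1, size=(R.shape[1], n_factors))   # column factors
    losses = [implicit_loss(R, X, Y, alpha, reg)]
    for _ in range(iterations):
        X = solve_side(R, Y, alpha, reg)       # fix Y, solve for X
        Y = solve_side(R.T, X, alpha, reg)     # fix X, solve for Y
        losses.append(implicit_loss(R, X, Y, alpha, reg))
    return X, Y, losses

# Toy matrix (made-up values): rows play the user role, columns are items.
R_toy = np.array([[2, 1, 0, 0],
                  [1, 3, 0, 0],
                  [0, 0, 4, 1],
                  [0, 0, 1, 2]], dtype=float)
X, Y, losses = als(R_toy)
print('Loss before:', losses[0], 'after:', losses[-1])
```

Because each half-step solves its subproblem exactly, the loss should never increase from one iteration to the next.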




In [8]:
np.save('../data/interim/vipin20/user_factors', user_vecs)
np.save('../data/interim/vipin20/item_factors', item_vecs)
np.save('../data/interim/vipin20/product_train', product_train*alpha)

# Scoring the Model

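Following the tutorial, we will score the model using the mean area under the Receiver Operating Characteristic curve (AUC) and compare it against a baseline that always recommends the most popular items.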

In [9]:
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')

Our model scored 0.8790581980962291 versus a score of 0.8038330946271777 if we always recommended the most popular item.
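
The scoring idea can be sketched in plain NumPy. This is an illustration of the approach rather than the `calc_mean_auc` helper used above: `auc`, `mean_auc`, the dense user-by-item layout, and the toy arrays are all assumptions.

```python
import numpy as np

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def mean_auc(train, test, predictions, masked_users):
    """Average AUC of the model and of a popularity baseline over masked users.

    train, test and predictions are dense user-by-item arrays of the same shape.
    """
    popularity = train.sum(axis=0)            # total purchases per item in training
    model_aucs, popular_aucs = [], []
    for u in masked_users:
        candidates = train[u] == 0            # only items not seen in training
        actual = test[u, candidates] > 0      # which candidates were really bought
        if actual.all() or not actual.any():
            continue                          # AUC is undefined without both classes
        model_aucs.append(auc(predictions[u, candidates], actual))
        popular_aucs.append(auc(popularity[candidates], actual))
    return float(np.mean(model_aucs)), float(np.mean(popular_aucs))

# Toy example (made-up values): one purchase per user is hidden from training.
toy_test = np.array([[1, 1, 0], [0, 1, 1]])
toy_train = np.array([[1, 0, 0], [0, 1, 0]])
toy_pred = np.array([[0.9, 0.8, 0.1], [0.2, 0.9, 0.7]])
model_auc, popular_auc = mean_auc(toy_train, toy_test, toy_pred, masked_users=[0, 1])
print(model_auc, popular_auc)
```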


# Spot Checking

Now that we have a pretty good idea of the model performance overall, we can spot check a few things like finding similar items and checking item recommendations for an existing invoice.

In [10]:
item_lookup = transactions[['ItemCode', 'ItemDescription']].drop_duplicates() # Only get unique item/description pairs
item_lookup['ItemCode'] = item_lookup.ItemCode.astype(str) # Encode as strings for future lookup ease

price_lookup = transactions[['ItemCode', 'CostPerItem']].drop_duplicates() # Only get unique item/price pairs
price_lookup['ItemCode'] = price_lookup.ItemCode.astype(str) # Encode as strings for future lookup ease


In [11]:
related = model.similar_items(1284)
for rel in related:
    index = rel[0]
    prob = rel[1]
    item = item_lookup[item_lookup.ItemCode == str(item_list[index])].values
    print(prob, item[0][1])

1.0 SET OF 12 FAIRY CAKE BAKING CASES
0.9680726 SET OF 6 TEA TIME BAKING CASES
0.9662833 SET OF 12 MINI LOAF BAKING CASES
0.9614151 SET OF 6 SNACK LOAF BAKING CASES
0.65862846 SET 40 HEART SHAPE PETIT FOUR CASES
0.604555 SET OF 36 DOILIES PANTRY DESIGN
0.5940598 SET OF 60 VINTAGE LEAF CAKE CASES 
0.568571 VINTAGE CHRISTMAS CAKE FRILL
0.5450308 SET OF 60 PANTRY DESIGN CAKE CASES 
0.5358343 BOX OF 6 CHRISTMAS CAKE DECORATIONS
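
The score of 1.0 for the query item itself is what a cosine-style similarity between factor vectors would produce. The sketch below assumes cosine similarity and uses a hypothetical helper with made-up factors; the library's own `similar_items` may be implemented differently depending on the version.

```python
import numpy as np

def similar_items_sketch(item_factors, item_index, n=10):
    """Rank items by cosine similarity of their factor vectors to one item."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10                      # guard against division by zero
    scores = item_factors @ item_factors[item_index] / (norms * norms[item_index])
    best = np.argsort(-scores, kind="stable")[:n]  # highest similarity first
    return [(int(i), float(scores[i])) for i in best]

# Made-up 2-factor vectors for four items.
toy_item_factors = np.array([[1.0, 0.0],
                             [2.0, 0.0],
                             [0.0, 1.0],
                             [1.0, 1.0]])
toy_related = similar_items_sketch(toy_item_factors, 0, n=4)
for index, score in toy_related:
    print(score, index)
```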


In [12]:
user_items = (product_train * alpha).astype('double').T.tocsr()
def recommend(order):
    print('Order Contents:')
    print(transactions[transactions.TransactionId == transaction_list[order]].loc[:, ['ItemCode', 'ItemDescription']])
    print('Recommendations:')
    recommendations = model.recommend(order, user_items)
    for rec in recommendations:
        index = rec[0]
        prob = rec[1]
        stock_code = item_list[index]
        item = item_lookup[item_lookup.ItemCode == str(item_list[index])].values
        print(prob, stock_code, item[0][1])

In [13]:
recommend(1)

Order Contents:
         ItemCode            ItemDescription
85667      475272  HAND WARMER RED POLKA DOT
625518     475293     HAND WARMER UNION JACK
701952     475293     HAND WARMER UNION JACK
1022129    475272  HAND WARMER RED POLKA DOT
Recommendations:
0.7987602 475293 HAND WARMER UNION JACK
0.7852349 480186 HAND WARMER SCOTTY DOG DESIGN
0.78475666 480165 HAND WARMER OWL DESIGN
0.78356117 480207 HAND WARMER BIRD DESIGN
0.68987393 492219 HAND WARMER RED LOVE HEART
0.52073014 479514 HAND WARMER BABUSHKA DESIGN
0.43594316 464394 HOT WATER BOTTLE TEA AND SYMPATHY
0.43562314 464331 SCOTTIE DOG HOT WATER BOTTLE
0.41559 464352 CHOCOLATE HOT WATER BOTTLE
0.4010531 479535 HOT WATER BOTTLE I AM SO POORLY
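
Conceptually, a recommendation of this kind scores every item by the dot product of the order's factor vector with each item's factor vector and drops the items already present in the training data. The sketch below assumes exactly that and nothing more; `recommend_top_n` and the toy factors are hypothetical and are not the Implicit API.

```python
import numpy as np

def recommend_top_n(user_factors, item_factors, purchased, user_index, n=10):
    """Score all items for one user and return the best unseen ones."""
    scores = item_factors @ user_factors[user_index]
    # Items already purchased in the training data are excluded from the ranking.
    scores = np.where(purchased[user_index] > 0, -np.inf, scores)
    best = np.argsort(-scores, kind="stable")[:n]
    return [(int(i), float(scores[i])) for i in best]

# Made-up factors: one user, four items, the first item already purchased.
toy_user_factors = np.array([[1.0, 0.0]])
toy_item_factors = np.array([[0.9, 0.0],
                             [0.5, 0.0],
                             [0.7, 0.0],
                             [0.1, 0.0]])
toy_purchased = np.array([[1, 0, 0, 0]])
toy_recs = recommend_top_n(toy_user_factors, toy_item_factors, toy_purchased, 0, n=2)
print(toy_recs)
```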


In [15]:
recommend(2200)

Order Contents:
         ItemCode                      ItemDescription
12930      470694         DOORMAT NEIGHBOURHOOD WITCH 
25851      446019                   PINK  POLKADOT CUP
28710      447573          GLASS HEART T-LIGHT HOLDER 
39685      465570   PICTURE FRAME WOOD TRIPLE PORTRAIT
107312     469686                     DOORMAT AIRMAIL 
116889    1787079       JUMBO  BAG BAROQUE BLACK WHITE
142277    1011948              DOORMAT WELCOME PUPPIES
144231     316218         PINK POLKADOT GARDEN PARASOL
147885     477120    SET OF 3 CAKE TINS PANTRY DESIGN 
150689     446040                    BLUE POLKADOT CUP
151681     434973                       JUMBO BAG OWLS
161172     434973                       JUMBO BAG OWLS
174505     450030   SET/3 RED GINGHAM ROSE STORAGE BOX
217133     446019                   PINK  POLKADOT CUP
265312     451185     RETROSPOT HEART HOT WATER BOTTLE
271808    1787079       JUMBO  BAG BAROQUE BLACK WHITE
346807     467082        5 HOOK HANGER MAGIC TOAD

In [16]:
transactions['ItemTotal'] = transactions['NumberOfItemsPurchased'] * transactions['CostPerItem']

In [17]:
recommended_price = []
for user in range(len(transaction_list)):
    # Take the top recommendation for each transaction and look up its price
    recommendations = model.recommend(user, user_items)
    index = recommendations[0][0]
    price = price_lookup[price_lookup.ItemCode == str(item_list[index])].values
    recommended_price.append(price[0][1]) # Use the first price listed for the item
    
total_recommended = np.sum(recommended_price)

print('After recommending',len(transaction_list),'items, there would be an increase of',
      "${:,.2f}".format(total_recommended*test),'in additional purchases.')

After recommending 18995 items, there would be an increase of $74,043.98 in additional purchases.


In [19]:
totals = transactions.groupby(transactions.TransactionId)['ItemTotal'].sum()
total = totals.sum()

# Multiply the ratio by 100 so that the printed value is a true percentage
print('Added to the initial total of all',len(transaction_list),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be', "{:,.4f}%".format(100 * total_recommended / total ))


Added to the initial total of all 18995 purchases valued at $59,859,564.72 , the percentage increase in revenue would be 0.1407%
