![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares (ALS) algorithm and other useful functions, such as the area under the curve (AUC) calculation. Parts of the tutorial were written against an earlier version of the Implicit library and had to be reworked here.

The dataset used for this work is from Kaggle: [E-Commerce Data, Actual transactions from UK retailer](https://www.kaggle.com/carrie1/ecommerce-data).


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [2]:
%run Common-Functions.ipynb

In [3]:
selected_df = pd.read_pickle('../data/interim/ecommerce/selected_invoices.gz')
item_lookup = pd.read_pickle('../data/interim/ecommerce/item_lookup.gz')

print('Loaded',len(selected_df),'rows')

Loaded 72875 rows


In [4]:
invoices = list(np.sort(selected_df.InvoiceNo.unique())) # Get our unique invoices (these play the role of users)
products = list(selected_df.StockCode.unique()) # Get our unique products that were purchased
quantity = list(selected_df.Quantity) # All of our purchases

# Column indices: one column per invoice
cols = selected_df.InvoiceNo.astype(CategoricalDtype(categories=invoices, ordered=True)).cat.codes
# Row indices: one row per product
rows = selected_df.StockCode.astype(CategoricalDtype(categories=products, ordered=True)).cat.codes
# Product-by-invoice matrix of purchase quantities
purchases_sparse = scipy.sparse.csr_matrix((quantity, (rows, cols)), shape=(len(products), len(invoices)))
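
To make the index mapping concrete, here is a minimal sketch of the same idea on made-up data. It uses plain NumPy and a dense array as a stand-in for the `CategoricalDtype` codes and `csr_matrix` used in the cell above; the IDs and quantities are hypothetical and not taken from the dataset.

```python
import numpy as np

# Toy transactions (hypothetical data, not from the dataset)
invoice_ids = np.array(["A1", "A1", "B2", "C3", "C3", "C3"])
stock_codes = np.array(["p10", "p20", "p10", "p30", "p20", "p20"])
quantities = np.array([2, 1, 5, 3, 4, 1])

# np.unique returns the sorted unique labels plus an integer code for each row
invoice_labels, invoice_codes = np.unique(invoice_ids, return_inverse=True)
product_labels, product_codes = np.unique(stock_codes, return_inverse=True)

# Dense stand-in for the sparse matrix: rows are products, columns are invoices
toy_matrix = np.zeros((len(product_labels), len(invoice_labels)), dtype=int)
np.add.at(toy_matrix, (product_codes, invoice_codes), quantities)  # sums repeated pairs
print(toy_matrix)
```

As with `csr_matrix`, repeated (product, invoice) pairs are summed, so the two `p20` lines on invoice `C3` collapse into a single entry of 5.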

In [5]:
matrix_size = purchases_sparse.shape[0]*purchases_sparse.shape[1] # Number of possible interactions in the matrix
num_purchases = len(purchases_sparse.nonzero()[0]) # Number of items interacted with
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity

99.63446551712478
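
The sparsity figure is the percentage of product/invoice cells that hold no purchase. A small sketch on a hypothetical 3 × 3 matrix (not the real data) shows the same arithmetic:

```python
import numpy as np

# Hypothetical product-by-invoice matrix with 5 of 9 cells filled
toy = np.array([[2, 5, 0],
                [1, 0, 5],
                [0, 0, 3]])

possible = toy.shape[0] * toy.shape[1]   # every product/invoice combination
observed = np.count_nonzero(toy)         # combinations with at least one purchase
toy_sparsity = 100 * (1 - observed / possible)
print(f"{toy_sparsity:.2f}")             # 44.44
```

At 99.63% sparsity, only about 0.37% of the possible product/invoice pairs in the real matrix contain a purchase.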

# Training & Test Datasets

We will use the `make_train` function from the tutorial linked at the top to create training and test datasets. It masks a percentage of the purchases (10% here, via `pct_test = 0.1`) so that they can be tested later against the model's recommendations.

In [6]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.1)
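
`make_train` is loaded from `Common-Functions.ipynb` and is not shown in this notebook. The sketch below illustrates the general masking idea on a dense NumPy array, assuming the helper works roughly like the one in the tutorial; the function name `mask_interactions`, its return values, and the `seed` parameter are hypothetical.

```python
import numpy as np

def mask_interactions(matrix, pct_test=0.1, seed=0):
    """Hypothetical helper: hide a fraction of the nonzero entries of a dense matrix."""
    rng = np.random.default_rng(seed)
    test = (matrix != 0).astype(int)       # binary targets: 1 where a purchase happened
    train = matrix.copy()
    rows, cols = np.nonzero(matrix)        # coordinates of every observed purchase
    n_masked = int(np.ceil(pct_test * len(rows)))
    chosen = rng.choice(len(rows), size=n_masked, replace=False)
    train[rows[chosen], cols[chosen]] = 0  # hide the sampled purchases from training
    return train, test, np.unique(cols[chosen])

toy = np.array([[2, 5, 0],
                [1, 0, 5],
                [0, 0, 3]])
train, test, altered_invoices = mask_interactions(toy, pct_test=0.4)
print(train)
print(altered_invoices)  # columns (invoices) that had at least one purchase hidden
```

The model is then fit on `train` only, and the hidden entries serve as the positives it should rank highly at evaluation time.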

# Implicit Recommendation Model

The code below creates and trains one of the models available from the Implicit package. The hyperparameters currently in use were suggested by various tutorials and have not been tuned.

In [7]:
alpha = 29
factors = 64
regularization = 0.117
iterations = 73

model = implicit.als.AlternatingLeastSquares(factors=factors,
                                    regularization=regularization,
                                    iterations=iterations)

## BayesianPersonalizedRanking was pretty bad
# model = implicit.bpr.BayesianPersonalizedRanking(factors=31,
#                                     regularization=0.1,
#                                     iterations=50)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.user_factors
item_vecs = model.item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)



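
For context, scaling the purchase counts by `alpha` follows the confidence weighting in the implicit-feedback ALS formulation of Hu, Koren, and Volinsky, on which the tutorial is based. As a sketch in that paper's notation (an assumption here; the Implicit library may handle the weights somewhat differently internally), with r_ui the quantity of item i on invoice u:

```latex
\min_{x_*,\,y_*} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr),
\qquad
c_{ui} = 1 + \alpha\, r_{ui},
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases}
```

In this reading, λ corresponds to `regularization` and the length of the vectors x_u and y_i to `factors`; a larger `alpha` makes observed purchases count more heavily relative to unobserved ones.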

In [8]:
np.save('../data/interim/ecommerce/user_factors', user_vecs)
np.save('../data/interim/ecommerce/item_factors', item_vecs)
np.save('../data/interim/ecommerce/product_train', product_train*alpha)

# Scoring the Model

Following the tutorial, we score the model with the area under the Receiver Operating Characteristic curve (AUC), computed on the masked purchases and compared against a popularity baseline.

In [9]:
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')

Our model scored 0.8002130407904754 versus a score of 0.7365393729154979 if we always recommended the most popular item.
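
`calc_mean_auc` comes from `Common-Functions.ipynb` and is not shown here. As a rough illustration of the metric itself, the sketch below computes the AUC for a single hypothetical invoice as the fraction of (purchased, not purchased) item pairs that the scores rank correctly; the function name and the data are made up for the example.

```python
import numpy as np

def auc_from_scores(labels, scores):
    """Chance that a random positive item outranks a random negative one (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of items that were actually purchased
    neg = scores[~labels]   # scores of items that were not
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

held_out = [1, 0, 0, 1, 0]                  # 1 = masked purchase for this invoice
model_scores = [0.9, 0.2, 0.4, 0.3, 0.1]    # hypothetical predicted scores
toy_auc = auc_from_scores(held_out, model_scores)
print(toy_auc)  # 5 of the 6 positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to a random ordering and 1.0 to a perfect ranking, which is the scale on which the 0.80 for the model and 0.74 for the popularity baseline should be read.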


# Spot Checking

Now that we have a pretty good idea of the model performance overall, we can spot check a few things like finding similar items and checking item recommendations for an existing invoice.

When this notebook runs automatically as a DVC stage, the code below is commented out.

In [10]:
related = model.similar_items(1284) # (item index, similarity score) pairs for item 1284
for rel in related:
    index = rel[0]
    score = rel[1] # Similarity score, not a probability
    item = item_lookup[item_lookup.StockCode == products[index]].values
    print(score, item[0][1])


0.9999999 SKULLS PARTY BAG + STICKER SET
0.70554185 PACK OF 6 SKULL PAPER CUPS
0.6804175 DINOSAUR PARTY BAG + STICKER SET
0.6699035 RETROSPOT PARTY BAG + STICKER SET
0.6458982 3D SHEET OF DOG STICKERS
0.6016882 PACK OF 6 SKULL PAPER PLATES
0.55474675 SET OF 9 BLACK SKULL BALLOONS
0.54204106 FLYING PIG WATERING CAN
0.53096527 WOODLAND PARTY BAG + STICKER SET
0.5161282 BLUE PARTY BAGS 
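
The scores returned by `similar_items` look like cosine similarities between item factor vectors (the queried item scores about 1.0 against itself). Assuming that is the measure used, the ranking can be sketched in NumPy as follows; `most_similar` and the toy factors are hypothetical.

```python
import numpy as np

def most_similar(item_factors, item_index, n=3):
    """Hypothetical helper: rank items by cosine similarity to one item's factor vector."""
    norms = np.linalg.norm(item_factors, axis=1)
    scores = item_factors @ item_factors[item_index] / (norms * norms[item_index])
    order = np.argsort(-scores)[:n]   # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

# Four made-up items with two latent factors each
toy_factors = np.array([[1.0, 0.0],
                        [0.9, 0.1],
                        [0.0, 1.0],
                        [0.5, 0.5]])
similar = most_similar(toy_factors, 0)
print(similar)
```

Because the measure depends only on the direction of the factor vectors, an item can be similar to another even if one sells far more often than the other.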


In [11]:
user_items = (product_train * alpha).astype('double').T.tocsr() # Transpose to invoices x products

def recommend(order):
    print('Order Contents:')
    print(selected_df[selected_df.InvoiceNo == str(invoices[order])].loc[:, ['StockCode', 'Description']])
    print('Recommendations:')
    recommendations = model.recommend(order, user_items)
    for rec in recommendations:
        index = rec[0]
        score = rec[1] # Model score, not a probability
        stock_code = products[index]
        item = item_lookup[item_lookup.StockCode == stock_code].values
        print(score, stock_code, item[0][1])


In [12]:
recommend(1)

Order Contents:
  StockCode                Description
7     22633     HAND WARMER UNION JACK
8     22632  HAND WARMER RED POLKA DOT
Recommendations:
0.75235134 22865 HAND WARMER OWL DESIGN
0.74426055 22867 HAND WARMER BIRD DESIGN
0.7241447 22866 HAND WARMER SCOTTY DOG DESIGN
0.6827718 23439 HAND WARMER RED LOVE HEART
0.41684604 22834 HAND WARMER BABUSHKA DESIGN
0.39577496 23355 HOT WATER BOTTLE KEEP CALM
0.3705899 22114 HOT WATER BOTTLE TEA AND SYMPATHY
0.3326301 84029E RED WOOLLY HOTTIE WHITE HEART.
0.3141568 21481 FAWN BLUE HOT WATER BOTTLE
0.2843492 22111 SCOTTIE DOG HOT WATER BOTTLE
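
Conceptually, a recommendation score in a matrix factorization model is the dot product of an invoice's factor vector with each item's factor vector, with items already on the invoice filtered out. The following is a minimal NumPy sketch of that idea, under the assumption that `model.recommend` behaves this way with its default settings; `top_n` and the toy vectors are hypothetical.

```python
import numpy as np

def top_n(user_vec, item_factors, already_purchased, n=2):
    """Hypothetical helper: rank items not yet purchased by dot-product score."""
    scores = item_factors @ user_vec
    scores[list(already_purchased)] = -np.inf  # never recommend items already on the invoice
    order = np.argsort(-scores)[:n]            # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Four made-up items and one made-up invoice, two latent factors each
item_factors = np.array([[1.0, 0.0],
                         [0.8, 0.2],
                         [0.0, 1.0],
                         [0.5, 0.5]])
user_vec = np.array([0.9, 0.1])
recs = top_n(user_vec, item_factors, already_purchased={0})
print(recs)
```

This matches the pattern in the output above, where an order of two hand warmers is answered with other hand warmers rather than the items already bought.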


In [13]:
recommend(2340)

Order Contents:
       StockCode                         Description
189024     21122  SET/10 PINK POLKADOT PARTY CANDLES
189025     22953  BIRTHDAY PARTY CORDON BARRIER TAPE
189026     22435      SET OF 9 HEART SHAPED BALLOONS
189027     22436          12 COLOURED PARTY BALLOONS
Recommendations:
0.38086239 23154 SET OF 4 JAM JAR MAGNETS
0.360277 23159 SET OF 5 PANCAKE DAY MAGNETS
0.34165066 22766 PHOTO FRAME CORNICE
0.29829544 21679 SKULLS  STICKERS
0.2912522 21361 LOVE LARGE WOOD LETTERS 
0.28766885 23158 SET OF 5 LUCKY CAT MAGNETS 
0.28373688 21430 SET/3 RED GINGHAM ROSE STORAGE BOX
0.28058022 23121 PACK OF 6 COCKTAIL PARASOL STRAWS
0.27489558 22834 HAND WARMER BABUSHKA DESIGN
0.2726527 21209 MULTICOLOUR HONEYCOMB FAN


In [14]:
selected_df.head()

| | InvoiceNo | StockCode | Quantity | UnitPrice | Description |
|---|---|---|---|---|---|
| 0 | 536365 | 85123A | 6 | 2.55 | WHITE HANGING HEART T-LIGHT HOLDER |
| 1 | 536365 | 71053 | 6 | 3.39 | WHITE METAL LANTERN |
| 2 | 536365 | 84406B | 8 | 2.75 | CREAM CUPID HEARTS COAT HANGER |
| 3 | 536365 | 84029G | 6 | 3.39 | KNITTED UNION FLAG HOT WATER BOTTLE |
| 4 | 536365 | 84029E | 6 | 3.39 | RED WOOLLY HOTTIE WHITE HEART. |


In [15]:
price_lookup = selected_df[['StockCode', 'UnitPrice']].drop_duplicates() # Only get unique item/price pairs
price_lookup['StockCode'] = price_lookup.StockCode.astype(str) # Encode as strings for future lookup ease

recommended_price = []
for user in range(0, len(invoices)):
    recommendations = model.recommend(user, user_items)
    index = recommendations[0][0]
    price = price_lookup[price_lookup.StockCode == products[index]].values
    recommended_price.append(price[0][1])
    
total_recommended = np.sum(recommended_price)

print('After recommending',len(invoices),'items, there would be an increase of',
      "${:,.2f}".format(total_recommended),'in additional purchases.')


After recommending 5761 items, there would be an increase of $24,107.95 in additional purchases.


In [16]:
selected_df['StockTotal'] = selected_df['Quantity'] * selected_df['UnitPrice']
totals = selected_df.groupby(selected_df.InvoiceNo)['StockTotal'].sum()
total = totals.sum()

print('Added to the initial total of all',len(invoices),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be', "{:,.2f}%".format(100 * total_recommended / total))


Added to the initial total of all 5761 purchases valued at $928,261.46 , the percentage increase in revenue would be 2.60%
