![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares algorithm and some other useful functions (such as the area under the curve calculation). Other parts of the tutorial rely on a previous version of the Implicit library and had to be reworked.

The dataset used for this work is from Kaggle: [E-Commerce Data, Actual transactions from UK retailer](https://www.kaggle.com/carrie1/ecommerce-data).


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [2]:
%run Common-Functions.ipynb

In [None]:
selected_df = pd.read_pickle('../data/interim/ecommerce/selected_invoices.gz')
item_lookup = pd.read_pickle('../data/interim/ecommerce/item_lookup.gz')

print('Loaded',len(selected_df),'rows')

In [None]:
invoices = list(np.sort(selected_df.InvoiceNo.unique())) # Unique invoices (one matrix column each)
products = list(selected_df.StockCode.unique()) # Unique products that were purchased (one matrix row each)
quantity = list(selected_df.Quantity) # Quantity for every purchase line

# Column indices: the position of each row's invoice in `invoices`
cols = selected_df.InvoiceNo.astype(CategoricalDtype(categories=invoices, ordered=True)).cat.codes
# Row indices: the position of each row's product in `products`
rows = selected_df.StockCode.astype(CategoricalDtype(categories=products, ordered=True)).cat.codes
purchases_sparse = scipy.sparse.csr_matrix((quantity, (rows, cols)), shape=(len(products), len(invoices)))

In [None]:
matrix_size = purchases_sparse.shape[0]*purchases_sparse.shape[1] # Number of possible interactions in the matrix
num_purchases = len(purchases_sparse.nonzero()[0]) # Number of nonzero (product, invoice) entries
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity
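
To make the matrix construction and the sparsity figure concrete, here is a minimal sketch on a toy table. The data and the `toy_*` names are made up for illustration and are not taken from the Kaggle dataset; the layout (products as rows, invoices as columns) mirrors the cell above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype
from scipy import sparse

# Toy transactions (made-up values, not from the Kaggle dataset)
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "StockCode": ["p1", "p2", "p1", "p3"],
    "Quantity": [2, 1, 5, 3],
})

toy_invoices = sorted(toy.InvoiceNo.unique())
toy_products = list(toy.StockCode.unique())

# Rows index products, columns index invoices
row_codes = toy.StockCode.astype(CategoricalDtype(categories=toy_products)).cat.codes.to_numpy()
col_codes = toy.InvoiceNo.astype(CategoricalDtype(categories=toy_invoices)).cat.codes.to_numpy()

toy_sparse = sparse.csr_matrix(
    (toy.Quantity.to_numpy(), (row_codes, col_codes)),
    shape=(len(toy_products), len(toy_invoices)),
)

# Percentage of (product, invoice) cells with no purchase
toy_cells = toy_sparse.shape[0] * toy_sparse.shape[1]
toy_sparsity = 100 * (1 - toy_sparse.nnz / toy_cells)

print(toy_sparse.toarray())
print(round(toy_sparsity, 2))
```

Four of the nine cells hold a purchase, so the toy matrix is about 55.56% sparse. Real purchase matrices are usually far sparser than this.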

# Training & Test Datasets

We will use the `make_train` function, taken from the tutorial linked at the top, to create the training and test datasets. A percentage of the purchases (10% here) is masked out of the training data so that those purchases can be tested later against the model's recommendations.

In [None]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.1)
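
The masking idea can be sketched as follows. This is an illustration only: `mask_interactions` is a hypothetical helper, not the `make_train` used above, which is not shown in this notebook and may differ, for example in what it returns.

```python
import random

import numpy as np
from scipy import sparse


def mask_interactions(ratings, pct_test=0.2, seed=0):
    """Hypothetical helper: hide a share of the nonzero entries for testing."""
    # Test set: the full matrix, binarized to purchased / not purchased
    test_set = ratings.copy()
    test_set.eliminate_zeros()
    test_set.data[:] = 1

    # Training set: a copy in which a random sample of purchases is zeroed out
    training_set = ratings.copy().tolil()
    row_idx, col_idx = ratings.nonzero()
    pairs = list(zip(row_idx, col_idx))
    random.seed(seed)
    n_masked = int(np.ceil(pct_test * len(pairs)))
    masked = random.sample(pairs, n_masked)
    for r, c in masked:
        training_set[r, c] = 0

    training_set = training_set.tocsr()
    training_set.eliminate_zeros()
    return training_set, test_set, masked


# Made-up product x invoice matrix with 9 purchases
toy_matrix = sparse.csr_matrix(np.array([
    [2, 0, 1, 0],
    [0, 3, 0, 0],
    [1, 0, 0, 4],
    [0, 5, 2, 0],
    [6, 0, 0, 1],
]))

toy_train, toy_test, toy_masked = mask_interactions(toy_matrix, pct_test=0.2)
print(toy_matrix.nnz, "purchases,", len(toy_masked), "masked,", toy_train.nnz, "left for training")
```

The model only ever sees the training matrix. The masked pairs are then used to check whether the hidden purchases rank highly among the recommendations.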

# Implicit Recommendation Model

The code below creates and trains one of the models available from the Implicit package. The hyperparameters currently come from values suggested by various tutorials and have not been tuned.

In [None]:
alpha = 29
factors = 64
regularization = 0.117
iterations = 73

model = implicit.als.AlternatingLeastSquares(factors=factors,
                                    regularization=regularization,
                                    iterations=iterations)

## BayesianPersonalizedRanking was pretty bad
# model = implicit.bpr.BayesianPersonalizedRanking(factors=31,
#                                     regularization=0.1,
#                                     iterations=50)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.user_factors
item_vecs = model.item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)

In [None]:
np.save('../data/interim/ecommerce/user_factors', user_vecs)
np.save('../data/interim/ecommerce/item_factors', item_vecs)
np.save('../data/interim/ecommerce/product_train', product_train*alpha)

# Scoring the Model

Following the tutorial, we will use the area under the Receiver Operating Characteristic curve. 

In [None]:
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')
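
For intuition, the area under the ROC curve for a single ranking equals the probability that a randomly chosen hidden purchase is scored above a randomly chosen non-purchase, with ties counting as one half. The sketch below computes that directly with NumPy on made-up scores. `auc_score` is a hypothetical helper, and it is only an assumption that `calc_mean_auc` (not shown in this notebook) averages a per-invoice score of this kind.

```python
import numpy as np


def auc_score(scores, actual):
    """AUC as P(score of a random positive > score of a random negative); ties count half."""
    scores = np.asarray(scores, dtype=float)
    actual = np.asarray(actual, dtype=bool)
    pos = scores[actual]    # scores of the hidden purchases
    neg = scores[~actual]   # scores of everything else
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Made-up example for one invoice and five candidate products
hidden_purchases = np.array([1, 0, 1, 0, 0])        # 1 = purchase masked from training
model_scores = np.array([0.9, 0.2, 0.4, 0.5, 0.1])  # hypothetical model predictions
popularity = np.array([10, 50, 5, 30, 1])           # hypothetical purchase counts per product

model_auc = auc_score(model_scores, hidden_purchases)
popular_auc = auc_score(popularity, hidden_purchases)
print(round(model_auc, 3), round(popular_auc, 3))
```

A score of 0.5 corresponds to random ordering, so a useful model should land well above both 0.5 and the popularity baseline.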

# Spot Checking

Now that we have a pretty good idea of the model performance overall, we can spot check a few things like finding similar items and checking item recommendations for an existing invoice.

The code below is commented out to be able to run automatically from a dvc stage.

In [None]:
related = model.similar_items(1284)
for rel in related:
    index = rel[0]
    prob = rel[1]
    item = item_lookup[item_lookup.StockCode == products[index]].values
    print(prob, item[0][1])


In [None]:
user_items = (product_train * alpha).astype('double').T.tocsr()

def recommend(order):
    print('Order Contents:')
    print(selected_df[selected_df.InvoiceNo == str(invoices[order])].loc[:, ['StockCode', 'Description']])
    print('Recommendations:')
    recommendations = model.recommend(order, user_items)
    for rec in recommendations:
        index = rec[0]
        prob = rec[1]
        stock_code = products[index]
        item = item_lookup[item_lookup.StockCode == stock_code].values
        print(prob, stock_code, item[0][1])    


In [None]:
recommend(1)

In [None]:
recommend(2340)

In [None]:
selected_df.head()

In [None]:
price_lookup = selected_df[['StockCode', 'UnitPrice']].drop_duplicates() # Only get unique item/price pairs
price_lookup['StockCode'] = price_lookup.StockCode.astype(str) # Encode as strings for future lookup ease

recommended_price = []
for user in range(0, len(invoices)):
    recommendations = model.recommend(user, user_items)
    index = recommendations[0][0]
    price = price_lookup[price_lookup.StockCode == products[index]].values
    recommended_price.append(price[0][1])
    
total_recommended = np.sum(recommended_price)

print('After recommending',len(invoices),'items, there would be an increase of',
      "${:,.2f}".format(total_recommended),'in additional purchases.')


In [None]:
selected_df['StockTotal'] = selected_df['Quantity'] * selected_df['UnitPrice']
totals = selected_df.groupby(selected_df.InvoiceNo)['StockTotal'].sum()
total = totals.sum()

# Multiply the ratio by 100 so the value printed next to '%' is an actual percentage
print('Added to the initial total of all',len(invoices),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be', "{:,.2f}%".format(100 * total_recommended / total))
