![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares algorithm and some other useful functions, such as the area-under-the-curve calculation. Other parts of it were written against an earlier version of the Implicit library and had to be reworked here.

The dataset used for this work is from Kaggle [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data):

## Context

This is item purchase transaction data. It has 8 columns.
It is meant to familiarize you with transaction data.

## Content

The columns are:

* UserId - A unique ID for each user
* TransactionId - A unique ID for each transaction
* TransactionTime - The time of the transaction
* ItemCode - The code of the item purchased
* ItemDescription - A description of the item
* NumberOfItemPurchased - The total number of items purchased
* CostPerItem - The cost per item purchased
* Country - The country where the item was purchased


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse  # explicit submodule import; scipy.sparse.csr_matrix is used below
from sklearn import metrics
from pandas.api.types import CategoricalDtype

In [2]:
%run Common-Functions.ipynb

In [3]:
transactions = pd.read_pickle('../data/interim/jewelry/transactions.gz')
print('Loaded',len(transactions),'rows')

Loaded 112905 rows


In [33]:
transaction_list = list(np.sort(transactions.order_id.unique())) # Unique orders, sorted
item_list = list(transactions.product_id.unique()) # Unique products that were purchased
quantity_list = list(transactions.quantity) # Quantity for each purchase row

# Column index (order) for each purchase row
cols = transactions.order_id.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
# Row index (product) for each purchase row
rows = transactions.product_id.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
# Items x orders matrix of purchased quantities
purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(item_list), len(transaction_list)))

total_count = len(transactions)
denominator = len(transaction_list) * len(item_list)
sparsity = 100*(1 - total_count*1.0/denominator)
print("The transactions dataframe is ", "%.4f" % sparsity + "% empty.")

The transactions dataframe is  99.9868% empty.
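
To make the index mapping and the sparsity figure printed above concrete, here is a minimal sketch on a made-up five-row purchase log. It uses `np.unique` in place of the pandas categorical codes and a dense array in place of the sparse matrix; the toy IDs and quantities are invented for illustration.

```python
import numpy as np

# Toy purchase log (hypothetical values): one entry per purchased line item.
order_ids = np.array([101, 101, 102, 103, 103])
product_ids = np.array(["ring", "pendant", "ring", "brooch", "ring"])
quantities = np.array([1, 2, 1, 1, 3])

# return_inverse gives, for every log entry, the position of its ID in the
# sorted list of unique IDs. These positions serve as matrix indices.
orders, col_idx = np.unique(order_ids, return_inverse=True)
items, row_idx = np.unique(product_ids, return_inverse=True)

# Dense items x orders matrix; np.add.at sums repeated (item, order) pairs.
purchases = np.zeros((len(items), len(orders)))
np.add.at(purchases, (row_idx, col_idx), quantities)

# Share of (item, order) cells that hold no purchase.
sparsity = 100 * (1 - np.count_nonzero(purchases) / purchases.size)
print(f"{sparsity:.2f}% of the matrix is empty")
```

With real data the matrix is far too large to hold densely, which is why the notebook builds a `csr_matrix` instead.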


# Training & Test Datasets

We will use the `make_train` function, from the tutorial linked at the top, to create training and test datasets. The training dataset has a percentage of the purchases masked; these held-out purchases are tested later against the recommendations.

In [6]:
product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = 0.211)
print('Total number of masked items:',product_test.count_nonzero()-product_train.count_nonzero())


Total number of masked items: 22901
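
The `make_train` helper itself lives in `Common-Functions.ipynb` and is not shown here. As a rough illustration of the masking idea only, the sketch below zeroes out a random share of the non-zero cells in a small dense matrix. The function name `mask_interactions`, the dense NumPy representation, and the toy data are assumptions, not the notebook's implementation.

```python
import numpy as np

def mask_interactions(matrix, pct_test=0.2, seed=0):
    """Return (train, test, masked_cells) for a dense interaction matrix.

    Hypothetical helper: `test` is a binarised copy of the input, `train`
    is a copy with a random pct_test share of non-zero cells set to zero.
    """
    rng = np.random.default_rng(seed)
    test = (matrix != 0).astype(float)   # 1 where any purchase happened
    train = matrix.astype(float)         # astype returns a copy

    nonzero_cells = np.argwhere(matrix != 0)          # (row, col) pairs
    n_masked = int(np.ceil(pct_test * len(nonzero_cells)))
    chosen = rng.choice(len(nonzero_cells), size=n_masked, replace=False)
    masked_cells = nonzero_cells[chosen]

    # Hide the chosen purchases from the training copy only.
    train[masked_cells[:, 0], masked_cells[:, 1]] = 0
    return train, test, masked_cells

toy = np.array([[1, 0, 2, 0],
                [0, 3, 0, 1],
                [1, 1, 0, 0],
                [0, 0, 4, 1],
                [2, 0, 0, 1]])
train, test, masked = mask_interactions(toy, pct_test=0.2)
print("Masked cells:", np.count_nonzero(test) - np.count_nonzero(train))
```

The difference in non-zero counts between the two matrices is the number of held-out purchases, which is the same quantity the cell above prints.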


# Implicit Recommendation Model

The code below creates and trains one of the models available from the Implicit package. The hyperparameters are values suggested by various tutorials and have not been tuned.

In [34]:
alpha = 15
factors = 64
regularization = 0.003
iterations = 50

# model = implicit.als.AlternatingLeastSquares(factors=factors,
#                                     regularization=regularization,
#                                     iterations=iterations)

## BayesianPersonalizedRanking was pretty bad
model = implicit.bpr.BayesianPersonalizedRanking(factors=factors,
                                     regularization=regularization,
                                     iterations=iterations)


# model = implicit.lmf.LogisticMatrixFactorization(factors=32,
#                                     regularization=0.1,
#                                     iterations=50)

model.fit((product_train * alpha).astype('double'))

user_vecs = model.user_factors
item_vecs = model.item_factors

# Deprecated function below
# user_vecs, item_vecs = implicit.alternating_least_squares((product_train*alpha).astype('double'), 
#                                                           factors=32, 
#                                                           regularization = 0.1, 
#                                                           iterations = 50)


In [8]:
np.save('../data/interim/jewelry/user_factors', user_vecs)
np.save('../data/interim/jewelry/item_factors', item_vecs)
np.save('../data/interim/jewelry/product_train', product_train*alpha)
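
Once the factor matrices exist, a predicted preference is the dot product of one row from each. The sketch below shows that step with small random matrices standing in for `user_vecs` and `item_vecs`; the shapes and values are made up, and it does not filter out items the order already contains.

```python
import numpy as np

rng = np.random.default_rng(42)
n_orders, n_items, n_factors = 5, 8, 4

# Stand-ins for the learned factors (random here, learned in the notebook).
user_factors = rng.normal(size=(n_orders, n_factors))
item_factors = rng.normal(size=(n_items, n_factors))

# Score of every item for every order: one dot product per (order, item) pair.
scores = user_factors @ item_factors.T      # shape (n_orders, n_items)

# Indices of the three highest-scoring items for order 0, best first.
top_items = np.argsort(scores[0])[::-1][:3]
print(top_items, scores[0, top_items])
```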

# Scoring the Model

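Following the tutorial, we will score the model with the area under the Receiver Operating Characteristic curve (AUC) and compare it against the score obtained by always recommending the most popular item.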

In [35]:
test, popular = calc_mean_auc(product_train, products_altered, 
              [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)


print('Our model scored',test,'versus a score of',popular,'if we always recommended the most popular item.')

Our model scored 0.5078597689872814 versus a score of 0.5980001232330087 if we always recommended the most popular item.
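
For reference, the AUC for a single order can be read as the probability that a randomly chosen held-out purchase is ranked above a randomly chosen item that was not purchased. The sketch below computes it by direct pair counting. `pairwise_auc` and the toy scores are illustrative assumptions, not the `calc_mean_auc` helper from `Common-Functions.ipynb`.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. labels are 0/1, scores are numbers.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Toy example: 1 marks a held-out purchase, 0 an item never purchased.
held_out = [1, 0, 0, 1, 0]
model_scores = [0.9, 0.2, 0.4, 0.3, 0.1]
popularity = [5, 9, 1, 2, 7]   # hypothetical purchase counts per item

print("model AUC:", pairwise_auc(held_out, model_scores))
print("popularity AUC:", pairwise_auc(held_out, popularity))
```

A score of 0.5 corresponds to random ranking, so values close to 0.5 indicate that the ranking carries little information about the held-out purchases.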


# Spot Checking

Now that we have an overall measure of model performance, we can spot check a few things, such as finding similar items and checking item recommendations for an existing order.

In [10]:
transactions.head()

| Unnamed: 0 | event_time | order_id | product_id | quantity | category_id | category_code | brand | price | user_id | gender | color | metal | gem | datetime | stock_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-11-29 16:30:45 UTC | 1923415742179443254 | 1836250225916772582 | 1 | 1.806829e+18 | jewelry.pendant | 0.0 | 67.78 | 1.515916e+18 | | red | gold | diamond | 20181129 | 67.78 |
| 1 | 2018-11-29 16:52:07 UTC | 1923426489303302817 | 1836015460420681761 | 1 | 1.806829e+18 | jewelry.pendant | 0.0 | 32.63 | 1.515916e+18 | | red | gold | | 20181129 | 32.63 |
| 2 | 2018-11-29 17:58:37 UTC | 1923459963229831173 | 1806829194936582544 | 1 | 1.806829e+18 | jewelry.ring | 1.0 | 75.21 | 1.515916e+18 | | red | gold | amethyst | 20181129 | 75.21 |
| 3 | 2018-11-29 20:25:52 UTC | 1923534078074684181 | 1835566854668550661 | 1 | 1.806829e+18 | jewelry.earring | 0.0 | 131.37 | 1.515916e+18 | f | red | gold | | 20181129 | 131.37 |
| 4 | 2018-11-29 20:30:01 UTC | 1923536169069445939 | 1836568752905257618 | 1 | 1.806829e+18 | jewelry.bracelet | 0.0 | 102.6 | 1.515916e+18 | f | red | gold | | 20181129 | 102.6 |


In [11]:
item_lookup = transactions[['product_id', 'category_code']].drop_duplicates() # Only get unique product/category pairs
item_lookup['product_id'] = item_lookup.product_id.astype(str) # Encode as strings for future lookup ease

price_lookup = transactions[['product_id', 'price']].drop_duplicates() # Only get unique product/price pairs
price_lookup['product_id'] = price_lookup.product_id.astype(str) # Encode as strings for future lookup ease


In [12]:
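# Items most similar to the item at index 1284 in item_list
related = model.similar_items(1284)
for rel in related:
    index = rel[0]
    score = rel[1]  # similarity score, not a probability
    item = item_lookup[item_lookup.product_id == str(item_list[index])].values
    print(score, item[0][1])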

0.99999994 jewelry.pendant
0.8663837 jewelry.pendant
0.84383583 jewelry.pendant
0.82434195 jewelry.pendant
0.8063963 jewelry.pendant
0.7890969 jewelry.pendant
0.7809124 nan
0.7705171 jewelry.pendant
0.7628399 jewelry.pendant
0.7584341 jewelry.ring


In [13]:
# Orders x items matrix, the orientation passed to model.recommend below
user_items = (product_train * alpha).astype('double').T.tocsr()

def recommend(order):
    """Print an order's contents and the model's recommendations for it."""
    print('Order Contents:')
    print(transactions[transactions.order_id == transaction_list[order]].loc[:, ['product_id', 'category_code']])
    print('Recommendations:')
    recommendations = model.recommend(order, user_items)
    for rec in recommendations:
        index = rec[0]  # position of the item in item_list
        prob = rec[1]   # model score for this item
        stock_code = item_list[index]
        item = item_lookup[item_lookup.product_id == str(stock_code)].values
        print(prob, stock_code, item[0][1])

In [14]:
recommend(1)

Order Contents:
            product_id    category_code
1  1836015460420681761  jewelry.pendant
Recommendations:
3.9715745e-11 1956663845585944582 jewelry.ring
3.6347956e-11 1956663847666319760 jewelry.ring
2.0686551e-11 1956663845862769039 jewelry.ring
2.0593572e-11 1956663836207481430 jewelry.ring
1.3452031e-11 1956663836207481431 jewelry.ring
1.3161472e-11 1352907200745439279 jewelry.ring
1.2399197e-11 1956663840309510725 jewelry.ring
1.0012997e-11 1956663836400419658 jewelry.earring
8.514214e-12 1956663845845991794 jewelry.earring
8.215916e-12 1956663831199482752 jewelry.earring


In [15]:
recommend(2200)

Order Contents:
               product_id   category_code
2734  1956663848320631776  jewelry.brooch
2735  1956663846374474515  jewelry.brooch
2736  1956663847397883910  jewelry.brooch
Recommendations:
0.53057414 1893645719638639197 jewelry.brooch
0.4254006 1956663830721331445 jewelry.earring
0.40232134 1956663848329020389 jewelry.earring
0.37513334 1884732184250548463 jewelry.brooch
0.34064856 1956663836031320509 jewelry.bracelet
0.3392904 1956663836249424522 jewelry.earring
0.3215841 1515966222752328632 jewelry.pendant
0.31941366 1956663835989377425 jewelry.brooch
0.31266576 1913814474649764091 jewelry.earring
0.3068612 1956663845745328352 jewelry.ring


In [16]:
transactions['ItemTotal'] = transactions['quantity'] * transactions['price']

In [17]:
recommended_price = []
# For every order, take the price of its top-ranked recommendation
for order in range(0, len(transaction_list)):
    recommendations = model.recommend(order, user_items)
    index = recommendations[0][0]
    price = price_lookup[price_lookup.product_id == str(item_list[index])].values
    recommended_price.append(price[0][1])

total_recommended = np.sum(recommended_price)


In [18]:
accept_rate = 0.3
print('After recommending',len(transaction_list),'items with a',accept_rate,'acceptance rate, there would be an increase of',
      "${:,.2f}".format(total_recommended*accept_rate),'in additional purchases.')

After recommending 85348 items with a 0.3 acceptance rate, there would be an increase of $5,827,424.46 in additional purchases.


In [19]:
totals = transactions.groupby(transactions.order_id)['ItemTotal'].sum()
total = totals.sum()

print('Added to the initial total of all',len(transaction_list),'purchases valued at',
      "${:,.2f}".format(total),', the percentage increase in revenue would be', "{:,.4f}%".format(total_recommended*accept_rate / total * 100 ))


Added to the initial total of all 85348 purchases valued at $32,997,331.17 , the percentage increase in revenue would be 17.6603%
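
The uplift figure can be checked by hand from the numbers printed above. The snippet below redoes the arithmetic using those printed values; the variable names are only for illustration.

```python
# Values copied from the printed output above.
n_orders = 85348
accept_rate = 0.3
accepted_revenue = 5_827_424.46      # total_recommended * accept_rate
baseline_revenue = 32_997_331.17     # sum of ItemTotal over all orders

# Undo the acceptance rate to recover the summed price of all top picks.
total_recommended = accepted_revenue / accept_rate
avg_recommended_price = total_recommended / n_orders

uplift_pct = accepted_revenue / baseline_revenue * 100
print(f"Average top recommendation price: ${avg_recommended_price:,.2f}")
print(f"Revenue uplift: {uplift_pct:.4f}%")
```

The estimate scales linearly with `accept_rate`, so halving the assumed acceptance rate halves the projected uplift.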
