![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg Woods. That tutorial includes an implementation of the Alternating Least Squares (ALS) algorithm and some other useful functions, such as the area under the ROC curve (AUC) calculation. Other parts of the tutorial were written against an earlier version of the Implicit library and had to be reworked for this notebook.
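
For reference, the ALS step can be sketched in a few lines of NumPy using the confidence-weighted formulation of Hu, Koren and Volinsky. Preferences are 1 for observed entries and 0 otherwise, confidences are 1 + alpha * r, and the user and item factors are solved for in turn. The function name `implicit_als`, its defaults and the dense per-row solve are assumptions for illustration. This is neither the tutorial's code nor the Implicit library's implementation.

```python
import numpy as np
import scipy.sparse as sparse


def implicit_als(ratings, factors=10, regularization=0.1, alpha=40.0,
                 iterations=10, seed=0):
    """Hypothetical helper: implicit-feedback ALS on a users x items matrix.

    ratings holds raw interaction strengths (e.g. quantities). Returns
    (user_vecs, item_vecs) whose product approximates the 0/1 preferences.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    X = rng.normal(scale=0.01, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.01, size=(n_items, factors))  # item factors

    # Store alpha * r_ui for observed entries; confidence is c_ui = 1 + alpha * r_ui
    conf = sparse.csr_matrix(ratings, dtype=np.float64) * alpha
    conf.eliminate_zeros()
    conf_t = conf.T.tocsr()  # same values with items as rows
    reg = regularization * np.eye(factors)

    def solve(fixed, conf_rows):
        # For each row u: (F^T F + F^T (C_u - I) F + reg) x_u = F^T C_u p_u
        ftf = fixed.T @ fixed
        out = np.zeros((conf_rows.shape[0], factors))
        for u in range(conf_rows.shape[0]):
            start, end = conf_rows.indptr[u], conf_rows.indptr[u + 1]
            idx = conf_rows.indices[start:end]      # observed columns of row u
            c = conf_rows.data[start:end]           # alpha * r for those columns
            f = fixed[idx]
            a = ftf + (f.T * c) @ f + reg
            b = (f.T * (1.0 + c)).sum(axis=1)       # p = 1 only where observed
            out[u] = np.linalg.solve(a, b)
        return out

    for _ in range(iterations):
        X = solve(Y, conf)      # fix items, update users
        Y = solve(X, conf_t)    # fix users, update items
    return X, Y


R = sparse.csr_matrix(np.array([[3, 0, 1, 0],
                                [0, 2, 0, 1],
                                [1, 0, 4, 0]], dtype=float))
user_vecs, item_vecs = implicit_als(R, factors=3, alpha=10.0, iterations=15)
scores = user_vecs @ item_vecs.T  # higher score = stronger predicted preference
```

The identity Y^T C_u Y = Y^T Y + Y^T (C_u - I) Y is what keeps each row's solve cheap: only the observed entries contribute beyond the shared Y^T Y term.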

The dataset used for this work is from Kaggle: [E-Commerce Data, Actual transactions from UK retailer](https://www.kaggle.com/carrie1/ecommerce-data).


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse  # import the submodule explicitly; scipy.sparse.csr_matrix is used below
from sklearn import metrics
from pandas.api.types import CategoricalDtype

import wandb

In [2]:
%run Common-Functions.ipynb
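
`make_train` and `calc_mean_auc` come from `Common-Functions.ipynb`, which isn't shown here. Judging from how they are called and from the referenced tutorial, `make_train` appears to hide a fraction of the nonzero entries from training, and `calc_mean_auc` appears to score the model's ranking of each altered row against those hidden entries with ROC AUC, alongside a popularity baseline. The sketch below is a hypothetical reconstruction of that idea. The names `make_train_sketch` and `mean_auc_sketch`, the dense factor arrays and the seeding are assumptions. The notebook's `make_train` also returns a fourth value (`transactions_altered`), which this sketch omits.

```python
import numpy as np
import scipy.sparse as sparse
from sklearn import metrics


def make_train_sketch(ratings, pct_test=0.2, seed=0):
    """Hypothetical stand-in for make_train: hide a random fraction of the
    nonzero entries from the training matrix and keep a binarized copy of the
    full matrix for testing."""
    ratings = sparse.csr_matrix(ratings, dtype=np.float64, copy=True)
    ratings.eliminate_zeros()
    test_set = ratings.copy()
    test_set.data[:] = 1.0                      # 1 wherever an interaction exists
    rows, cols = ratings.nonzero()
    rng = np.random.default_rng(seed)
    n_hidden = int(np.ceil(pct_test * len(rows)))
    hidden = rng.choice(len(rows), size=n_hidden, replace=False)
    training_set = ratings.tolil()              # LIL supports cheap element assignment
    for r, c in zip(rows[hidden], cols[hidden]):
        training_set[r, c] = 0.0                # mask this interaction
    training_set = training_set.tocsr()
    training_set.eliminate_zeros()
    altered_rows = sorted(set(rows[hidden].tolist()))
    return training_set, test_set, altered_rows


def mean_auc_sketch(training_set, altered_rows, row_vecs, col_vecs, test_set):
    """Hypothetical stand-in for calc_mean_auc: for each altered row, rank the
    columns that row never touched in training and compare that ranking with
    the hidden interactions via ROC AUC; a popularity ranking is the baseline."""
    popularity = np.asarray(test_set.sum(axis=0)).ravel()
    model_auc, pop_auc = [], []
    for r in altered_rows:
        candidates = np.where(training_set[r].toarray().ravel() == 0)[0]
        actual = test_set[r].toarray().ravel()[candidates]
        if actual.min() == actual.max():
            continue                            # AUC is undefined with one class
        preds = row_vecs[r] @ col_vecs.T
        model_auc.append(metrics.roc_auc_score(actual, preds[candidates]))
        pop_auc.append(metrics.roc_auc_score(actual, popularity[candidates]))
    return float(np.mean(model_auc)), float(np.mean(pop_auc))


# Toy usage: two groups of rows that each interact with one block of columns.
R = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]])
train, test, altered = make_train_sketch(R, pct_test=0.25, seed=1)
row_vecs = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
col_vecs = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
model_auc, pop_auc = mean_auc_sketch(train, altered, row_vecs, col_vecs, test)
# The block factors separate hidden items perfectly (AUC 1.0); every column is
# equally popular here, so the popularity baseline ties at 0.5.
```

Only columns with no training interaction are ranked, so the model is never rewarded for recovering purchases it was trained on.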

## Hyperparameter Tuning with Weights & Biases

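Inside `sweep()`, the invoice lines are turned into a products × invoices sparse matrix by mapping each `StockCode` and `InvoiceNo` to its integer category code, and the quantities are then multiplied by `alpha` before fitting. Here is a small sketch of that same pattern on a hypothetical five-line DataFrame. The toy values and the choice of `alpha = 15` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse
from pandas.api.types import CategoricalDtype

# Hypothetical invoice lines in the dataset's format; values are illustrative only.
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "536367", "536367"],
    "StockCode": ["85123A", "71053", "85123A", "22633", "71053"],
    "Quantity": [6, 6, 2, 12, 3],
})

invoices = list(np.sort(toy.InvoiceNo.unique()))  # column labels, sorted
products = list(toy.StockCode.unique())            # row labels, first-seen order

# .cat.codes gives each line's 0-based position in the category list
cols = toy.InvoiceNo.astype(CategoricalDtype(categories=invoices, ordered=True)).cat.codes
rows = toy.StockCode.astype(CategoricalDtype(categories=products, ordered=True)).cat.codes

# products x invoices matrix of quantities (duplicate coordinates would be summed)
purchases = sparse.csr_matrix(
    (toy.Quantity.to_numpy(), (rows.to_numpy(), cols.to_numpy())),
    shape=(len(products), len(invoices)),
)

alpha = 15  # arbitrary value inside the sweep's alpha range
confidence = (purchases * alpha).astype("double")  # same scaling as in sweep()

print(purchases.toarray())  # rows = products, columns = invoices: [[6, 2, 0], [6, 0, 3], [0, 0, 12]]
```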

In [None]:
sweep_config = {
    "method": "bayes",  # alternatives: "grid", "random"
    "metric": {"name": "prediction_auc", "goal": "maximize"},
    "parameters": {
        "percent_test": {"min": 0.1, "max": 0.3},
        "alpha": {"min": 1, "max": 30},
        "factors": {"values": [64, 128]},
        "regularization": {"min": 0.01, "max": 0.2},
        "iterations": {"min": 20, "max": 100},
    },
}

sweep_id = wandb.sweep(sweep_config, project="ecommerce")

def sweep():

    # Initialize a new wandb run
    with wandb.init() as run:

        selected_df = pd.read_pickle('../data/interim/ecommerce/selected_invoices.gz')

        # Hyperparameters sampled by the sweep are read from wandb.config below
        invoices = list(np.sort(selected_df.InvoiceNo.unique())) # Get our unique invoices (transactions)
        products = list(selected_df.StockCode.unique()) # Get our unique products that were purchased
        quantity = list(selected_df.Quantity) # Quantity for every purchase line

        # Column index: position of each line's invoice in `invoices`
        cols = selected_df.InvoiceNo.astype(CategoricalDtype(categories=invoices, ordered=True)).cat.codes
        # Row index: position of each line's product in `products`
        rows = selected_df.StockCode.astype(CategoricalDtype(categories=products, ordered=True)).cat.codes
        purchases_sparse = scipy.sparse.csr_matrix((quantity, (rows, cols)), shape=(len(products), len(invoices)))
        product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = wandb.config['percent_test'])

        model = implicit.als.AlternatingLeastSquares(factors=wandb.config['factors'],
                                                     regularization=wandb.config['regularization'],
                                                     iterations=wandb.config['iterations'])

        # Scale quantities by alpha to form confidence values. Implicit versions
        # before 0.5 expect an items x users matrix here, which product_train is
        alpha = wandb.config['alpha']
        model.fit((product_train * alpha).astype('double'))

        user_vecs = model.user_factors  # one vector per invoice
        item_vecs = model.item_factors  # one vector per product

        test, popular = calc_mean_auc(product_train, products_altered,
                                      [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)],
                                      product_test)

        wandb.log({
            'prediction_auc': test
        })


wandb.agent(sweep_id, sweep, count=100)


Create sweep with ID: q0maelzd
Sweep URL: https://wandb.ai/jperiodlangley/ecommerce/sweeps/q0maelzd


[34m[1mwandb[0m: Agent Starting Run: cte0bi68 with config:
[34m[1mwandb[0m: 	alpha: 29
[34m[1mwandb[0m: 	factors: 128
[34m[1mwandb[0m: 	iterations: 50
[34m[1mwandb[0m: 	percent_test: 0.17899711738223037
[34m[1mwandb[0m: 	regularization: 0.1436015367974845
[34m[1mwandb[0m: Currently logged in as: [33mjperiodlangley[0m (use `wandb login --relogin` to force relogin)





| Metric | Value |
|---|---|
| prediction_auc | 0.76485 |
| _runtime (s) | 52.0 |
| _timestamp | 1629573344.0 |
| _step | 0.0 |




[34m[1mwandb[0m: Agent Starting Run: 4vlcyv02 with config:
[34m[1mwandb[0m: 	alpha: 10
[34m[1mwandb[0m: 	factors: 128
[34m[1mwandb[0m: 	iterations: 60
[34m[1mwandb[0m: 	percent_test: 0.23302660348215537
[34m[1mwandb[0m: 	regularization: 0.16724601072407524



| Metric | Value |
|---|---|
| prediction_auc | 0.75555 |
| _runtime (s) | 77.0 |
| _timestamp | 1629573425.0 |
| _step | 0.0 |




[34m[1mwandb[0m: Agent Starting Run: nybn0o3g with config:
[34m[1mwandb[0m: 	alpha: 25
[34m[1mwandb[0m: 	factors: 128
[34m[1mwandb[0m: 	iterations: 84
[34m[1mwandb[0m: 	percent_test: 0.15427211557385098
[34m[1mwandb[0m: 	regularization: 0.06535730988590596



| Metric | Value |
|---|---|
| prediction_auc | 0.77472 |
| _runtime (s) | 80.0 |
| _timestamp | 1629573508.0 |
| _step | 0.0 |




[34m[1mwandb[0m: Agent Starting Run: f7vy6u7n with config:
[34m[1mwandb[0m: 	alpha: 12
[34m[1mwandb[0m: 	factors: 64
[34m[1mwandb[0m: 	iterations: 84
[34m[1mwandb[0m: 	percent_test: 0.19551055842321036
[34m[1mwandb[0m: 	regularization: 0.039556539473122036



| Metric | Value |
|---|---|
| prediction_auc | 0.77393 |
| _runtime (s) | 58.0 |
| _timestamp | 1629573569.0 |
| _step | 0.0 |




[34m[1mwandb[0m: Agent Starting Run: 1ivhe2n8 with config:
[34m[1mwandb[0m: 	alpha: 22
[34m[1mwandb[0m: 	factors: 128
[34m[1mwandb[0m: 	iterations: 84
[34m[1mwandb[0m: 	percent_test: 0.17081870135814453
[34m[1mwandb[0m: 	regularization: 0.053777593686969605

