![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares (ALS) algorithm and other useful functions, such as the area under the curve (AUC) calculation. Parts of the tutorial were written against an earlier version of the Implicit library and had to be reworked here.

The dataset used for this work is [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data) from Kaggle, described as follows:

## Context

This is item purchase transaction data with 8 columns.
It is meant to familiarize you with transaction data.

## Content

The columns are:

* UserId - a unique ID for each user
* TransactionId - a unique ID for each transaction
* TransactionTime - the time of the transaction
* ItemCode - the code of the item purchased
* ItemDescription - a description of the item
* NumberOfItemsPurchased - the number of items purchased
* CostPerItem - the cost per item purchased
* Country - the country where the item was purchased
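
As a quick orientation, the sketch below builds a tiny hand-made frame with the same eight columns and derives two simple quantities from it. The rows, the `toy` frame, and the `LineTotal` column are invented for illustration and are not taken from the Kaggle file.

```python
import pandas as pd

# Invented rows that mirror the eight columns described above.
toy = pd.DataFrame({
    "UserId": [101, 101, 202],
    "TransactionId": [5001, 5001, 5002],
    "TransactionTime": pd.to_datetime(
        ["2019-02-02 12:50", "2019-02-02 12:50", "2019-02-03 09:15"]
    ),
    "ItemCode": [11, 22, 11],
    "ItemDescription": ["MUG", "PLATE", "MUG"],
    "NumberOfItemsPurchased": [6, 3, 12],
    "CostPerItem": [1.5, 4.0, 1.5],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

# Total spend for each purchase line (hypothetical helper column).
toy["LineTotal"] = toy["NumberOfItemsPurchased"] * toy["CostPerItem"]

# Number of items bought in each transaction.
items_per_transaction = toy.groupby("TransactionId")["NumberOfItemsPurchased"].sum()
```

The purchase quantity is the column the sweep later in this notebook treats as implicit feedback.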


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse  # ensures scipy.sparse.csr_matrix is available
from sklearn import metrics
from pandas.api.types import CategoricalDtype
import wandb

In [2]:
%run Common-Functions.ipynb
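
`Common-Functions.ipynb` is not reproduced here, so the helpers it defines (`make_train` and `calc_mean_auc`) are not visible in this notebook. As a rough guide to what the train/test split does, the sketch below follows the masking idea from the tutorial cited above: hide a random fraction of the nonzero entries for training and keep a binarized copy of the full matrix for testing. The name `mask_interactions`, the `seed` parameter, and the exact return values are assumptions chosen to mirror how `make_train` is unpacked in the sweep cell; the real helper may differ.

```python
import random

import numpy as np
import scipy.sparse as sparse


def mask_interactions(ratings, pct_test=0.2, seed=0):
    """Hide a random fraction of the nonzero entries of `ratings`.

    Hypothetical stand-in for make_train. Returns the masked training
    matrix, a binarized copy of the full matrix, and the sorted row and
    column indices that had at least one entry hidden.
    """
    # Test matrix records only whether an interaction happened.
    test_set = ratings.copy().tocsr()
    test_set.eliminate_zeros()
    test_set.data[:] = 1

    # LIL format supports cheap element-wise assignment.
    training_set = ratings.copy().tolil()

    rows, cols = ratings.nonzero()
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
    rng = random.Random(seed)
    n_hidden = int(np.ceil(pct_test * len(pairs)))
    hidden = rng.sample(pairs, n_hidden)

    # Zero out the sampled interactions in the training copy only.
    for r, c in hidden:
        training_set[r, c] = 0
    training_set = training_set.tocsr()
    training_set.eliminate_zeros()

    altered_rows = sorted({r for r, _ in hidden})
    altered_cols = sorted({c for _, c in hidden})
    return training_set, test_set, altered_rows, altered_cols
```

A model trained on the masked matrix can then be scored on how highly it ranks the hidden interactions, which is the role the AUC metric plays in the sweep.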

## Hyperparameter Tuning with Weights & Biases
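
The sweep cell in this section first turns the transaction table into a sparse item-by-transaction matrix, using categorical codes as row and column indices. To make that encoding step concrete, here is a toy version of the same construction. The three rows are invented, and the `toy`, `item_by_transaction`, and `dense` names are only for illustration.

```python
import pandas as pd
import scipy.sparse as sparse
from pandas.api.types import CategoricalDtype

# Invented purchases: three purchase lines across two transactions.
toy = pd.DataFrame({
    "TransactionId": [5002, 5001, 5001],
    "ItemCode": [11, 11, 22],
    "NumberOfItemsPurchased": [12, 6, 3],
})

transaction_list = sorted(toy.TransactionId.unique())  # sorted transaction IDs
item_list = list(toy.ItemCode.unique())                # items in order of first appearance

# cat.codes maps each value to its position in the category list.
cols = toy.TransactionId.astype(
    CategoricalDtype(categories=transaction_list, ordered=True)
).cat.codes.to_numpy()
rows = toy.ItemCode.astype(
    CategoricalDtype(categories=item_list, ordered=True)
).cat.codes.to_numpy()

# Rows are items, columns are transactions, values are quantities.
item_by_transaction = sparse.csr_matrix(
    (toy.NumberOfItemsPurchased.to_numpy(), (rows, cols)),
    shape=(len(item_list), len(transaction_list)),
)
dense = item_by_transaction.toarray()
```

In this toy case item 11 appears in both transactions and item 22 only in transaction 5001, so the second row of `dense` has a single nonzero entry.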


In [3]:
sweep_config = {
    "name": "vipin20-sweep",
    "method": "bayes",  # alternatives: grid, random
    "metric": {"name": "prediction_auc", "goal": "maximize"},
    "parameters": {
        "percent_test": {"min":0.1, "max":0.3},
        "alpha": {"min":1, "max":50 },
        "regularization": {"min":0.001, "max":.3},
        "iterations": {"min":20, "max":100}
    },
}

sweep_id = wandb.sweep(sweep_config, project="vipin20")

def sweep():

    # Initialize a new wandb run
    with wandb.init() as run:

        transactions = pd.read_pickle('../data/interim/vipin20/transactions.gz')

        transaction_list = list(np.sort(transactions.TransactionId.unique())) # Unique transactions, sorted
        item_list = list(transactions.ItemCode.unique()) # Unique items that were purchased
        quantity_list = list(transactions.NumberOfItemsPurchased) # Quantity for every purchase line

        # Column indices: position of each line's transaction in transaction_list
        cols = transactions.TransactionId.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
        # Row indices: position of each line's item in item_list
        rows = transactions.ItemCode.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
        # Sparse item-by-transaction matrix of purchase quantities
        purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(item_list), len(transaction_list)))

        product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = wandb.config['percent_test'])

        model = implicit.als.AlternatingLeastSquares(factors=64,
                                        regularization=wandb.config['regularization'],
                                        iterations=wandb.config['iterations'])

        alpha = wandb.config['alpha']                                                                  
        model.fit((product_train * alpha).astype('double'))

        user_vecs = model.user_factors
        item_vecs = model.item_factors

        test, popular = calc_mean_auc(product_train, products_altered, 
                      [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)
        print('Logging Test Value:',test)
        wandb.log({
            'prediction_auc': test
        })


wandb.agent(sweep_id, function=sweep, count=100)

Create sweep with ID: b1xepl0k
Sweep URL: https://wandb.ai/jperiodlangley/vipin20/sweeps/b1xepl0k


[34m[1mwandb[0m: Agent Starting Run: n2g3m49y with config:
[34m[1mwandb[0m: 	alpha: 6
[34m[1mwandb[0m: 	iterations: 27
[34m[1mwandb[0m: 	percent_test: 0.29996418164015204
[34m[1mwandb[0m: 	regularization: 0.18367796470780554
[34m[1mwandb[0m: Currently logged in as: [33mjperiodlangley[0m (use `wandb login --relogin` to force relogin)




  0%|          | 0/27 [00:00<?, ?it/s]

Logging Test Value: 0.8540146483821729


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.85401
_runtime,90.0
_timestamp,1629485764.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: pcsnth5o with config:
[34m[1mwandb[0m: 	alpha: 17
[34m[1mwandb[0m: 	iterations: 99
[34m[1mwandb[0m: 	percent_test: 0.11529949010102865
[34m[1mwandb[0m: 	regularization: 0.27788805504385244


  0%|          | 0/99 [00:00<?, ?it/s]

Logging Test Value: 0.8851870094601174


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88519
_runtime,214.0
_timestamp,1629485982.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: l37nsale with config:
[34m[1mwandb[0m: 	alpha: 20
[34m[1mwandb[0m: 	iterations: 98
[34m[1mwandb[0m: 	percent_test: 0.120313701011623
[34m[1mwandb[0m: 	regularization: 0.13607005725622778


  0%|          | 0/98 [00:00<?, ?it/s]

Logging Test Value: 0.8811534214407395


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88115
_runtime,195.0
_timestamp,1629486182.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: tw8k6uiw with config:
[34m[1mwandb[0m: 	alpha: 42
[34m[1mwandb[0m: 	iterations: 90
[34m[1mwandb[0m: 	percent_test: 0.11593275299898827
[34m[1mwandb[0m: 	regularization: 0.29162796789634277


  0%|          | 0/90 [00:00<?, ?it/s]

Logging Test Value: 0.8799646401472598


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.87996
_runtime,184.0
_timestamp,1629486370.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 8eq43nb9 with config:
[34m[1mwandb[0m: 	alpha: 6
[34m[1mwandb[0m: 	iterations: 86
[34m[1mwandb[0m: 	percent_test: 0.10581114993109318
[34m[1mwandb[0m: 	regularization: 0.2688695534011146


  0%|          | 0/86 [00:00<?, ?it/s]

Logging Test Value: 0.8853776440297852


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88538
_runtime,175.0
_timestamp,1629486561.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 5ejb6pc2 with config:
[34m[1mwandb[0m: 	alpha: 4
[34m[1mwandb[0m: 	iterations: 99
[34m[1mwandb[0m: 	percent_test: 0.10308777446711591
[34m[1mwandb[0m: 	regularization: 0.1525805133133311


  0%|          | 0/99 [00:00<?, ?it/s]

Logging Test Value: 0.8900086875503848


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.89001
_runtime,199.0
_timestamp,1629486763.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 3l6otovy with config:
[34m[1mwandb[0m: 	alpha: 1
[34m[1mwandb[0m: 	iterations: 59
[34m[1mwandb[0m: 	percent_test: 0.10084316646728433
[34m[1mwandb[0m: 	regularization: 0.0070055656991231405


  0%|          | 0/59 [00:00<?, ?it/s]

Logging Test Value: 0.8812901316463518


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88129
_runtime,134.0
_timestamp,1629486902.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: q0g6tn3w with config:
[34m[1mwandb[0m: 	alpha: 1
[34m[1mwandb[0m: 	iterations: 96
[34m[1mwandb[0m: 	percent_test: 0.1409726867749291
[34m[1mwandb[0m: 	regularization: 0.05866636886619058


  0%|          | 0/96 [00:00<?, ?it/s]

Logging Test Value: 0.87552111978591


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.87552
_runtime,187.0
_timestamp,1629487094.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 13sk68z0 with config:
[34m[1mwandb[0m: 	alpha: 45
[34m[1mwandb[0m: 	iterations: 33
[34m[1mwandb[0m: 	percent_test: 0.10713693400773652
[34m[1mwandb[0m: 	regularization: 0.017589891574095906


  0%|          | 0/33 [00:00<?, ?it/s]

Logging Test Value: 0.8672429353391732


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.86724
_runtime,93.0
_timestamp,1629487191.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 97c15p3i with config:
[34m[1mwandb[0m: 	alpha: 2
[34m[1mwandb[0m: 	iterations: 98
[34m[1mwandb[0m: 	percent_test: 0.10197416665368829
[34m[1mwandb[0m: 	regularization: 0.22724718509807346


  0%|          | 0/98 [00:00<?, ?it/s]

Logging Test Value: 0.8877003200325867


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.8877
_runtime,192.0
_timestamp,1629487387.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 6prvoff9 with config:
[34m[1mwandb[0m: 	alpha: 7
[34m[1mwandb[0m: 	iterations: 20
[34m[1mwandb[0m: 	percent_test: 0.10009295108698649
[34m[1mwandb[0m: 	regularization: 0.28049909395311995


  0%|          | 0/20 [00:00<?, ?it/s]

Logging Test Value: 0.8815161578930027


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88152
_runtime,77.0
_timestamp,1629487469.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: p89zmiay with config:
[34m[1mwandb[0m: 	alpha: 4
[34m[1mwandb[0m: 	iterations: 69
[34m[1mwandb[0m: 	percent_test: 0.10537613614469413
[34m[1mwandb[0m: 	regularization: 0.17052866612478138


  0%|          | 0/69 [00:00<?, ?it/s]

Logging Test Value: 0.8868775482434587


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88688
_runtime,145.0
_timestamp,1629487619.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: 8bby3a35 with config:
[34m[1mwandb[0m: 	alpha: 13
[34m[1mwandb[0m: 	iterations: 100
[34m[1mwandb[0m: 	percent_test: 0.10370879354089185
[34m[1mwandb[0m: 	regularization: 0.17121299591674657


  0%|          | 0/100 [00:00<?, ?it/s]

Logging Test Value: 0.8861683535403817


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88617
_runtime,175.0
_timestamp,1629487799.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: jyhl3ffi with config:
[34m[1mwandb[0m: 	alpha: 48
[34m[1mwandb[0m: 	iterations: 93
[34m[1mwandb[0m: 	percent_test: 0.27535940948817683
[34m[1mwandb[0m: 	regularization: 0.2550768413927618


  0%|          | 0/93 [00:00<?, ?it/s]

Logging Test Value: 0.8546270640552974


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.85463
_runtime,164.0
_timestamp,1629487967.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: sxfyw5p7 with config:
[34m[1mwandb[0m: 	alpha: 46
[34m[1mwandb[0m: 	iterations: 32
[34m[1mwandb[0m: 	percent_test: 0.1006949680088612
[34m[1mwandb[0m: 	regularization: 0.28425681898089716


  0%|          | 0/32 [00:00<?, ?it/s]

Logging Test Value: 0.874580778316381


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.87458
_runtime,92.0
_timestamp,1629488063.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: uehrud0m with config:
[34m[1mwandb[0m: 	alpha: 2
[34m[1mwandb[0m: 	iterations: 88
[34m[1mwandb[0m: 	percent_test: 0.10293408430865875
[34m[1mwandb[0m: 	regularization: 0.19151952301971548


  0%|          | 0/88 [00:00<?, ?it/s]

Logging Test Value: 0.8881049513571511


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.8881
_runtime,158.0
_timestamp,1629488226.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: q42f8x5w with config:
[34m[1mwandb[0m: 	alpha: 3
[34m[1mwandb[0m: 	iterations: 36
[34m[1mwandb[0m: 	percent_test: 0.10583276718805162
[34m[1mwandb[0m: 	regularization: 0.18620152053849426


  0%|          | 0/36 [00:00<?, ?it/s]

Logging Test Value: 0.8853267339388581


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88533
_runtime,98.0
_timestamp,1629488328.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Agent Starting Run: gkxnb1pc with config:
[34m[1mwandb[0m: 	alpha: 6
[34m[1mwandb[0m: 	iterations: 63
[34m[1mwandb[0m: 	percent_test: 0.10030978742423674
[34m[1mwandb[0m: 	regularization: 0.2883315586642588


  0%|          | 0/63 [00:00<?, ?it/s]

Logging Test Value: 0.8887782510120875


VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
prediction_auc,0.88878
_runtime,143.0
_timestamp,1629488476.0
_step,0.0


0,1
prediction_auc,▁
_runtime,▁
_timestamp,▁
_step,▁


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.
