![HSV-AI Logo](https://github.com/HSV-AI/hugo-website/blob/master/static/images/logo_v9.png?raw=true)

# Implicit Recommendation from ECommerce Data

Some of the material for this work is based on [A Gentle Introduction to Recommender Systems with Implicit Feedback](https://jessesw.com/Rec-System/) by Jesse Steinweg-Woods. That tutorial includes an implementation of the Alternating Least Squares (ALS) algorithm and other useful functions, such as the area under the curve (AUC) calculation. Parts of it were written against an earlier version of the Implicit library and had to be reworked here.

The dataset used for this work is [Vipin Kumar Transaction Data](https://www.kaggle.com/vipin20/transaction-data) from Kaggle:

## Context

This is item purchase transaction data with 8 columns.
It is intended to help you become familiar with transaction data.

## Content

The columns are:

* UserId - A unique ID for each user
* TransactionId - A unique ID for each transaction
* TransactionTime - The time of the transaction
* ItemCode - The code of the item purchased
* ItemDescription - A description of the item
* NumberOfItemPurchased - The total number of items purchased
* CostPerItem - The cost per item purchased
* Country - The country where the item was purchased
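
To make the schema concrete, here is a minimal sketch of how columns like these could be reduced to the user, item, and quantity triples that an implicit-feedback model consumes. The rows are made up rather than taken from the Kaggle file, only the columns needed for the aggregation are included, and the `purchases` variable name is an illustrative assumption.

```python
import pandas as pd

# Made-up rows that follow the column names listed above (not real data).
transactions = pd.DataFrame({
    "UserId": [1, 1, 2, 2, 3],
    "TransactionId": [100, 101, 102, 102, 103],
    "ItemCode": [10, 10, 10, 20, 20],
    "NumberOfItemPurchased": [2, 1, 5, 1, 4],
})

# Total quantity each user bought of each item: the implicit "rating".
purchases = (
    transactions
    .groupby(["UserId", "ItemCode"], as_index=False)["NumberOfItemPurchased"]
    .sum()
)
print(purchases)
```

User 1 bought item 10 in two separate transactions, so those rows collapse into a single entry with the summed quantity.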


# Global Imports

In [1]:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import implicit
import scipy.sparse  # explicitly load the submodule used below as scipy.sparse.csr_matrix
from sklearn import metrics
from pandas.api.types import CategoricalDtype
import wandb

In [2]:
%run Common-Functions.ipynb

## Hyperparameter Tuning with Weights & Biases
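
The sweep function in the next cell turns the transaction table into a sparse product-by-order matrix by mapping IDs to integer codes. The sketch below shows that mapping on a small made-up table. The data values are invented for illustration, while the column names `order_id`, `product_id`, and `quantity` follow the cell's code.

```python
import pandas as pd
import scipy.sparse
from pandas.api.types import CategoricalDtype

# Made-up transactions (illustrative values only).
toy = pd.DataFrame({
    "order_id": [7, 7, 9],
    "product_id": ["ring", "chain", "ring"],
    "quantity": [1, 2, 3],
})

order_ids = sorted(toy.order_id.unique())    # sorted unique order IDs
product_ids = list(toy.product_id.unique())  # unique product IDs, in order of appearance

# The position of each ID in its category list becomes a matrix index.
cols = toy.order_id.astype(CategoricalDtype(categories=order_ids, ordered=True)).cat.codes
rows = toy.product_id.astype(CategoricalDtype(categories=product_ids, ordered=True)).cat.codes

# Rows are products, columns are orders, values are quantities.
matrix = scipy.sparse.csr_matrix(
    (toy.quantity.to_numpy(), (rows.to_numpy(), cols.to_numpy())),
    shape=(len(product_ids), len(order_ids)),
)
print(matrix.toarray())
```

In this toy matrix the `ring` row has an entry for both orders, while the `chain` row has an entry only for order 7.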


In [4]:
sweep_config = {
    "name": "jewelry-sweep",
    "method": "bayes",  # grid, random
    "metric": {"name": "prediction_auc", "goal": "maximize"},
    "parameters": {
        "percent_test": {"min": 0.1, "max": 0.3},
        "alpha": {"min": 1.0, "max": 50.0},
        "regularization": {"min": 0.001, "max": 0.3},
        "iterations": {"min": 20, "max": 100}
    },
}

sweep_id = wandb.sweep(sweep_config, project="jewelry")

def sweep():

    # Initialize a new wandb run
    with wandb.init() as run:

        transactions = pd.read_pickle('../data/interim/jewelry/transactions.gz')

        # Unique order IDs (sorted) and unique product IDs define the matrix axes
        transaction_list = list(np.sort(transactions.order_id.unique()))
        item_list = list(transactions.product_id.unique())
        quantity_list = list(transactions.quantity)  # purchase quantities (matrix values)

        # Column indices: position of each row's order ID in transaction_list
        cols = transactions.order_id.astype(CategoricalDtype(categories=transaction_list, ordered=True)).cat.codes
        # Row indices: position of each row's product ID in item_list
        rows = transactions.product_id.astype(CategoricalDtype(categories=item_list, ordered=True)).cat.codes
        # Sparse product-by-order matrix of purchase quantities
        purchases_sparse = scipy.sparse.csr_matrix((quantity_list, (rows, cols)), shape=(len(item_list), len(transaction_list)))

        product_train, product_test, products_altered, transactions_altered = make_train(purchases_sparse, pct_test = wandb.config['percent_test'])

        model = implicit.als.AlternatingLeastSquares(factors=64,
                                        regularization=wandb.config['regularization'],
                                        iterations=wandb.config['iterations'])

        # Scale the purchase quantities by alpha; ALS treats the matrix values as confidence weights
        alpha = wandb.config['alpha']
        model.fit((product_train * alpha).astype('double'))

        user_vecs = model.user_factors
        item_vecs = model.item_factors

        test, popular = calc_mean_auc(product_train, products_altered, 
                      [scipy.sparse.csr_matrix(item_vecs), scipy.sparse.csr_matrix(user_vecs.T)], product_test)
        print('Logging Test Value:',test)
        wandb.log({
            'prediction_auc': test
        })


wandb.agent(sweep_id, function=sweep, count=100)

Create sweep with ID: fruslu9x
Sweep URL: https://wandb.ai/jperiodlangley/jewelry/sweeps/fruslu9x
wandb: Agent Starting Run: lulytiso with config:
wandb: 	alpha: 40.3188987308305
wandb: 	iterations: 44
wandb: 	percent_test: 0.23350583591293908
wandb: 	regularization: 0.1252363678675451

Logging Test Value: 0.5500579862660521

| Metric | Value |
| --- | --- |
| prediction_auc | 0.55006 |
| _runtime | 305.0 |
| _timestamp | 1630375516.0 |
| _step | 0.0 |

wandb: Agent Starting Run: bpti9gwc with config:
wandb: 	alpha: 9.758198035269368
wandb: 	iterations: 58
wandb: 	percent_test: 0.23218566162369142
wandb: 	regularization: 0.158060211381857

Logging Test Value: 0.5408465753565608

| Metric | Value |
| --- | --- |
| prediction_auc | 0.54085 |
| _runtime | 335.0 |
| _timestamp | 1630375855.0 |
| _step | 0.0 |

wandb: Sweep Agent: Waiting for job.
wandb: Sweep Agent: Exiting.

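The `prediction_auc` values logged above come from `calc_mean_auc`, which is loaded from Common-Functions.ipynb and is not shown here. As a reminder of what an AUC score measures, the sketch below computes it for a single made-up user as the fraction of (purchased, not purchased) item pairs that the scores rank correctly. The function name `pairwise_auc` and all values are illustrative assumptions, not the notebook's implementation.

```python
import numpy as np

def pairwise_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = held-out purchase, 0 = item never purchased.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
auc = pairwise_auc(labels, scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to a random ranking, so the two runs above, at roughly 0.55 and 0.54, rank held-out purchases only slightly better than chance.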