## Popularity Model

Note: this notebook must be run in a Linux environment; we used **WSL**.

In [106]:
import turicreate as tc
import numpy as np
import pandas as pd

from sklearn import metrics

In [57]:
interactions = tc.SFrame.read_csv('data_clean_recommendation.csv', verbose=False)

In [104]:
def compute_auc_at_k(precision_at_k: np.ndarray, recall_at_k: np.ndarray) -> float:
    '''
        Computes the AUC@k from the per-user precision_at_k and recall_at_k arrays
    '''
    # Sort each array in ascending order (metrics.auc requires a monotonic x).
    # Note: the two arrays are sorted independently, so per-user pairs are not preserved.
    recall_sorted = np.sort(recall_at_k)
    precision_sorted = np.sort(precision_at_k)

    # Trapezoidal area with precision on the x-axis and recall on the y-axis
    auc_at_k = np.round(metrics.auc(precision_sorted, recall_sorted), decimals=2)

    return auc_at_k
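
Because `compute_auc_at_k` sorts precision and recall separately, a precision value can end up next to a recall value from a different user. The following is a minimal sketch of an alternative that sorts by recall and reorders precision with the same index, so each pair stays together. The name `paired_pr_auc` is hypothetical, the trapezoid rule is written out by hand instead of calling `metrics.auc`, and this variant was not used to produce the numbers reported in this notebook.

```python
import numpy as np

def paired_pr_auc(precision: np.ndarray, recall: np.ndarray) -> float:
    """Trapezoidal area under precision vs. recall, keeping each pair aligned."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)

    # One sort order, taken from recall, is applied to both arrays
    order = np.argsort(recall, kind="stable")
    r = recall[order]
    p = precision[order]

    # Trapezoid rule: width of each recall step times the mean precision over it
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

# Toy values (made up for illustration)
area = paired_pr_auc(np.array([0.5, 1.0, 0.25]), np.array([0.5, 0.0, 1.0]))
```

With recall on the x-axis the result is the usual precision-recall area; swapping the arguments would give the axis orientation used in `compute_auc_at_k`.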

### Create train and test subsets

In [7]:
interactions_main = interactions[['cac', 'product_code', 'volume_primary_units']]
training_data, test_data = tc.recommender.util.random_split_by_user(interactions_main, 'cac', 'product_code')

## Modeling

### Popularity Model

In [8]:
# https://medium.com/@acalamea/introduction-to-product-recommender-with-apples-turi-create-7f9f02fd0063
popularity_model = tc.recommender.popularity_recommender.create(training_data,
                                        user_id='cac',
                                        item_id='product_code')
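
To make the idea behind a popularity recommender concrete, here is a minimal pure-Python sketch. It assumes popularity is measured by how many interaction rows mention each item, which is our understanding of how the model behaves when no target column is passed; it is an illustration of the idea, not Turi Create's implementation. The function name `top_k_popular` and the toy data are hypothetical.

```python
from collections import Counter

def top_k_popular(interactions, k):
    """Rank items by number of interaction rows and return the k most frequent."""
    counts = Counter(item for _user, item in interactions)
    # Sort by descending count, then by item id so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked[:k]]

# Toy (user, item) pairs, made up for illustration
toy = [("cac_1", "A"), ("cac_1", "B"), ("cac_2", "A"),
       ("cac_3", "A"), ("cac_3", "C"), ("cac_2", "B")]
top_two = top_k_popular(toy, 2)
```

Every user receives the same ranked list under this scheme, which is why a popularity model is typically used as a baseline.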

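Compute precision and recall at cutoffs k = 1, 5, 10 and 15. The mean precision at each cutoff is what is referred to as MAP@k in the rest of this notebook.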
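
As a reminder of what the cutoff metrics mean for a single user, here is a small sketch. It assumes precision@k is the number of hits in the top k divided by k, and recall@k is the number of hits divided by the number of held-out items. The helper `precision_recall_at_k` and the toy lists are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user's ranked recommendations."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 5 recommendations, 4 held-out items, 2 of them recommended
p, r = precision_recall_at_k(["A", "B", "C", "D", "E"], {"B", "E", "Y", "Z"}, 5)
```

Averaging these per-user values over all test users gives one summary value per cutoff.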

In [12]:
popularity_model.evaluate(test_data, cutoffs = [1, 5, 10, 15])




Precision and recall summary statistics by cutoff
+--------+----------------------+------------------------+
| cutoff |    mean_precision    |      mean_recall       |
+--------+----------------------+------------------------+
|   1    | 0.01548387096774194  | 5.2714209114485375e-05 |
|   5    | 0.010838709677419357 |  0.000925989022967075  |
|   10   | 0.009032258064516128 | 0.0027976787293925802  |
|   15   | 0.009806451612903224 |  0.004014982131571899  |
+--------+----------------------+------------------------+
[4 rows x 3 columns]



{'precision_recall_by_user': Columns:
 	cac	str
 	cutoff	int
 	precision	float
 	recall	float
 	count	int
 
 Rows: 3100
 
 Data:
 +--------+--------+-----------+--------+-------+
 |  cac   | cutoff | precision | recall | count |
 +--------+--------+-----------+--------+-------+
 | cac_5  |   1    |    0.0    |  0.0   |  516  |
 | cac_5  |   5    |    0.0    |  0.0   |  516  |
 | cac_5  |   10   |    0.0    |  0.0   |  516  |
 | cac_5  |   15   |    0.0    |  0.0   |  516  |
 | cac_22 |   1    |    0.0    |  0.0   |  195  |
 | cac_22 |   5    |    0.0    |  0.0   |  195  |
 | cac_22 |   10   |    0.0    |  0.0   |  195  |
 | cac_22 |   15   |    0.0    |  0.0   |  195  |
 | cac_24 |   1    |    0.0    |  0.0   |  375  |
 | cac_24 |   5    |    0.0    |  0.0   |  375  |
 +--------+--------+-----------+--------+-------+
 [3100 rows x 5 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.,
 'precision_rec

## Popularity Model - Profitability

In [107]:
precision_recall = popularity_model.evaluate(test_data, cutoffs=[1, 5, 10, 15], verbose=False)
precision_recall = precision_recall['precision_recall_by_user'].to_dataframe()
interactions_df = interactions.to_dataframe()

shuffle_count: int = 50
# Take the first 20% of the invoiced sales as the test split (train_split is not used below)
test_split, train_split = np.split(interactions_df['invoiced_sales'], [int(0.2 * len(interactions_df))])

results = {}

for k in precision_recall['cutoff'].unique().tolist():
    avg_profit_at_k: float = 0.0

    # Per-user precision and recall at the current cutoff
    precision_recall_at_k = precision_recall[precision_recall['cutoff'] == k]
    map_at_k = precision_recall_at_k['precision'].mean()
    auc_at_k = compute_auc_at_k(precision_recall_at_k['precision'].values, precision_recall_at_k['recall'].values)

    # Average the sales weighted by MAP@k over the shuffles
    for i in range(shuffle_count):
        test_split = test_split.sample(frac=1)
        avg_profit_at_k = avg_profit_at_k + (test_split * map_at_k).sum()

    avg_profit_at_k = avg_profit_at_k / shuffle_count

    results[k] = [auc_at_k, map_at_k, avg_profit_at_k]

    print('AUC@', k, ': ', auc_at_k)
    print('MAP@', k, ": ", map_at_k)
    print('Profitability @', k, ": ", avg_profit_at_k)



AUC@ 1 :  0.0
MAP@ 1 :  0.015483870967741935
Profitability @ 1 :  7168383.208517057
AUC@ 5 :  0.01
MAP@ 5 :  0.010838709677419355
Profitability @ 5 :  5017868.245961936
AUC@ 10 :  0.0
MAP@ 10 :  0.00903225806451613
Profitability @ 10 :  4181556.871634952
AUC@ 15 :  0.06
MAP@ 15 :  0.009806451612903225
Profitability @ 15 :  4539976.032060802
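
The profitability figures are proportional to MAP@k: for example, Profitability@5 / Profitability@1 is 0.7, the same ratio as MAP@5 / MAP@1. This follows from the loop, since a sum does not depend on the order of its terms, so shuffling the sales before summing leaves the result unchanged up to floating-point rounding. The sketch below illustrates this with made-up numbers; `sales` and `map_at_k` here are placeholder values, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)

sales = np.array([100.0, 250.0, 50.0, 600.0])  # made-up invoiced sales
map_at_k = 0.25                                # made-up MAP@k

# Direct computation: MAP@k times total sales
direct = map_at_k * sales.sum()

# Same quantity averaged over 50 random shuffles of the sales
shuffled = [(rng.permutation(sales) * map_at_k).sum() for _ in range(50)]
averaged = float(np.mean(shuffled))
```

Both `direct` and `averaged` come out to the same value, so under these assumptions the profitability at each cutoff reduces to MAP@k multiplied by the total sales in the test split.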


## Conclusion

The conclusions for this model are written up in the main notebook, "Recommendation Engine - Modeling.ipynb".

Remember to uninstall tensorflow, turicreate, jupyter and pip3 afterwards.