In [None]:
!pip install hnswlib

## Recommender model training

We run a small experiment to demonstrate how Approximate Nearest Neighbor (ANN) search can be performed in Cornac. First, we need to train a model that supports ANN search. Here we choose MF for simplicity of illustration; other models that support ANN search should work in a similar fashion.

In [2]:
import cornac
from cornac.data import Reader
from cornac.datasets import netflix
from cornac.eval_methods import RatioSplit

data = netflix.load_feedback(variant="small", reader=Reader(bin_threshold=1.0))

ratio_split = RatioSplit(
    data=data,
    test_size=0.1,
    rating_threshold=1.0,
    exclude_unknowns=True,
    verbose=True,
)

mf = cornac.models.MF(
    k=50, 
    max_iter=25, 
    learning_rate=0.01, 
    lambda_reg=0.02, 
    use_bias=False,
    verbose=True,
    seed=123,
)

auc = cornac.metrics.AUC()
rec_20 = cornac.metrics.Recall(k=20)

cornac.Experiment(
    eval_method=ratio_split,
    models=[mf],
    metrics=[auc, rec_20],
    user_based=True,
).run()

rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 9986
Number of items = 4927
Number of ratings = 547022
Max rating = 1.0
Min rating = 1.0
Global mean = 1.0
---
Test data:
Number of users = 9986
Number of items = 4927
Number of ratings = 60753
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 9986
Total items = 4927

[MF] Training started!


  0%|          | 0/25 [00:00<?, ?it/s]

Optimization finished!

[MF] Evaluation started!


Ranking:   0%|          | 0/8214 [00:00<?, ?it/s]


TEST:
...
   |    AUC | Recall@20 | Train (s) | Test (s)
-- + ------ + --------- + --------- + --------
MF | 0.8542 |    0.0591 |    1.0683 |  11.7926



## Building index for ANN recommender

After the MF model is trained, we wrap it with an ANN recommender. We use Cornac's built-in HNSWLibANN, which implements the [HNSW algorithm](https://arxiv.org/abs/1603.09320) for building the index and performing approximate K-nearest-neighbor search. See https://github.com/nmslib/hnswlib for guidance on tuning the hyper-parameters.

In [3]:
from cornac.models import HNSWLibANN

ann = HNSWLibANN(
    recom=mf,
    M=16,
    ef_construction=100,
    ef=50,
    num_threads=-1,
)
ann.build_index()
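
To see what the index approximates: with `use_bias=False`, it is reasonable to assume that an MF score is the inner product of a user vector and an item vector, so exact top-K recommendation means scoring every item and keeping the K largest. The following is a minimal pure-Python sketch of that brute-force search on toy vectors. `exact_top_k` and the toy data are hypothetical names for illustration, not part of Cornac or hnswlib.

```python
import heapq


def exact_top_k(user_vec, item_vecs, k):
    """Brute-force maximum inner product search: score every item, keep the k best."""
    scores = [
        (sum(u * v for u, v in zip(user_vec, item_vec)), idx)
        for idx, item_vec in enumerate(item_vecs)
    ]
    # Scoring all n items costs O(n * d) per user; this full scan is the
    # part an ANN index tries to avoid.
    return [idx for _, idx in heapq.nlargest(k, scores)]


# Toy 2-dimensional factors (illustrative values only).
toy_items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_user = [1.0, 0.5]
top2 = exact_top_k(toy_user, toy_items, k=2)
print(top2)
```

HNSW replaces the full scan with a graph traversal that visits only a fraction of the items, which is the source of both the speedup and the small loss in recall.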

## Time/accuracy tradeoff

Here we measure the tradeoff between efficiency and accuracy. Let's say we are making top-20 recommendations for 10,000 users.

In [4]:
import numpy as np

K = 20
N = 10000
# np.random.choice samples with replacement by default, so some users repeat
test_users = np.random.choice(mf.user_ids, size=N)
mf_recs = []
ann_recs = []

In [5]:
%%time
for uid in test_users:
    mf_recs.append(mf.recommend(uid, k=K))

CPU times: user 3min 12s, sys: 51.4 ms, total: 3min 12s
Wall time: 6.05 s


In [6]:
%%time
for uid in test_users:
    ann_recs.append(ann.recommend(uid, k=K))

CPU times: user 358 ms, sys: 0 ns, total: 358 ms
Wall time: 354 ms


MF took 6.05 seconds of wall time to complete the task, while ANN took only 354 ms, a speedup of about 17 times. Note that our dataset contains fewer than 5,000 items. We would expect an even bigger improvement with more items and higher-dimensional factors.
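
The speedup can be checked directly from the wall times reported in the two cells above. The snippet below is a small worked calculation, not part of the original experiment; the variable names are illustrative.

```python
mf_wall_s = 6.05     # wall time reported for the MF loop
ann_wall_s = 0.354   # wall time reported for the ANN loop
n_users = 10_000     # number of recommend() calls in each loop

speedup = mf_wall_s / ann_wall_s
mf_ms_per_user = mf_wall_s / n_users * 1000
ann_ms_per_user = ann_wall_s / n_users * 1000

print(f"speedup: {speedup:.1f}x")
print(f"MF: {mf_ms_per_user:.3f} ms/user, ANN: {ann_ms_per_user:.4f} ms/user")
```

Per-call latency is often the more useful figure when sizing an online serving system, since it does not depend on how many users were sampled.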

In [7]:
# Share of the exact (MF) top-K that ANN also returned, averaged over users, in percent
recalls = []
for mf_rec, ann_rec in zip(mf_recs, ann_recs):
    recalls.append(len(set(mf_rec) & set(ann_rec[0])) / len(mf_rec))
print(np.mean(recalls) * 100.0)

97.95049999999999


In terms of recall, we see only a small drop of about 2%, meaning the recommendations from the two are very similar. While ANN is almost a free lunch in this case, the numbers may differ in other settings. It is always good to verify that ANN produces recommendations consistent with the base model.
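
One way to make this consistency check routine is to track the worst-case user as well as the mean, since a high average can hide a few users whose lists differ substantially. Below is a minimal sketch using the hypothetical helper names `overlap_at_k` and `summarize_overlap` and toy item IDs rather than the notebook's actual results.

```python
def overlap_at_k(exact_rec, approx_rec):
    """Fraction of the exact top-K items that also appear in the approximate top-K."""
    if not exact_rec:
        return 0.0
    return len(set(exact_rec) & set(approx_rec)) / len(exact_rec)


def summarize_overlap(exact_recs, approx_recs):
    """Return (mean, minimum) overlap across users."""
    overlaps = [overlap_at_k(e, a) for e, a in zip(exact_recs, approx_recs)]
    return sum(overlaps) / len(overlaps), min(overlaps)


# Toy item-ID lists for three users (illustrative values only).
exact = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j", "k", "l"]]
approx = [["a", "b", "c", "d"], ["e", "f", "g", "x"], ["i", "j", "y", "z"]]

mean_overlap, worst_overlap = summarize_overlap(exact, approx)
print(mean_overlap, worst_overlap)
```

If the minimum is much lower than the mean, raising `ef` (and, at index-build time, `M` or `ef_construction`) is the usual lever for trading some speed back for accuracy.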