Support negative scores #114

Closed
ravimody opened this issue May 24, 2018 · 9 comments

@ravimody

This library should support negative scores. They can be quite useful, and fit elegantly into the original ALS algorithm. Spark's ALS implemented this change years ago: https://gite.lirmm.fr/yagoubi/spark/commit/9e63f80e75bb6d9bbe6df268908c3219de6852d9
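For context, the way negative scores typically slot into the implicit ALS formulation (and roughly what the linked Spark change does, as far as I can tell) is to keep the preference at 0 for disliked items while raising their confidence. A minimal sketch of that mapping, with purely illustrative names:

```python
# Sketch of the preference/confidence split with negative feedback (illustrative only):
# each raw score r becomes a binary preference and a confidence weight.
def preference_and_confidence(r, alpha=40.0):
    # positive r -> preference 1, confidence 1 + alpha * r
    # negative r -> preference 0, confidence 1 + alpha * |r|  (a "confident zero")
    # missing r  -> preference 0, confidence 1 (the usual implicit-ALS default)
    preference = 1.0 if r > 0 else 0.0
    confidence = 1.0 + alpha * abs(r)
    return preference, confidence
```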

@benfred
Owner

benfred commented May 25, 2018

You're the second person to ask for this : #90 =)

I've run some experiments on doing this before (at a previous job working on news recommendations) and found that it didn't make an appreciable difference in the results for me. I had much better luck using only the implicit positive signals to rank items, and then using a different model with the positive/negative values to filter the ranked list. However, the link you supplied has a different point of view and says that they found it helpful 🤷‍♂️
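A rough sketch of that rank-then-filter idea (purely illustrative - `dislike_model` and its `predict_dislike` method are hypothetical, not part of this library):

```python
# Rank candidates with an implicit model trained on positive signals only, then
# drop items that a second model (trained on the positive/negative feedback)
# flags as likely dislikes.
def recommend_filtered(als_model, dislike_model, userid, user_items, N=10, threshold=0.5):
    candidates = als_model.recommend(userid, user_items, N=N * 5)  # over-fetch candidates
    kept = [(itemid, score) for itemid, score in candidates
            if dislike_model.predict_dislike(userid, itemid) < threshold]  # hypothetical API
    return kept[:N]
```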

Since it is an easy change to make that multiple people are asking for, I'll add this sometime soon.

@lucidyan

@benfred
I think a lot of people previously used Spark's implementation, where this is supported out of the box, and they may already have built some business logic around negative feedback. FYI, in our case it showed some benefit in production. So this would be a reasonable step.

I admit I became one of those people: instead of convergence I got NaNs, negative loss, and confusing errors, until I understood what the problem was. Perhaps handling this type of error in the positive-confidence-only case would be a good start.
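Something like the following guard is presumably what's meant - a minimal sketch of a fail-fast check (hypothetical, not the library's actual code), so that unsupported negative confidences raise an error instead of silently turning into NaNs:

```python
import numpy as np
from scipy.sparse import csr_matrix


def check_nonnegative(item_users):
    # Hypothetical guard: reject negative confidence values while they are
    # unsupported, rather than letting the solver diverge into NaNs/negative loss.
    data = csr_matrix(item_users).data
    if np.any(data < 0):
        raise ValueError("negative confidence values are not supported - "
                         "remove them from the input matrix")
```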

@ravimody
Author

To add a data point, at iHeartRadio we had quite a bit of thumbs-down data. I found adding the negative information was useful, especially for users with complex taste in music. For example, a user who likes punk but thumbed down emo would understandably expect less emo in their recommendations/playlists.

@benfred
Owner

benfred commented May 26, 2018

I have a rough draft of this here - https://github.com/benfred/implicit/compare/negative_als_prefs

I ran a simple experiment to test this out using the movielens datasets. The idea is to pass in low ratings (1 or 2 stars) as negative items, and compare including them as negative values versus removing them entirely. I experimented with different confidence values (multiplying by a constant alpha).

Anyways - this code improved things in 3 cases and hurt performance in 5 cases. I will run some more tests on larger datasets; I only included the smaller movielens ones to test out quickly.

| dataset | alpha | p@10 with negatives | p@10 without negatives | ratio |
|---|---|---|---|---|
| movielens-100k | 20 | 0.3036 | 0.3238 | 0.9377 |
| movielens-1m | 20 | 0.2671 | 0.2813 | 0.9493 |
| movielens-100k | 10 | 0.3287 | 0.3402 | 0.9662 |
| movielens-1m | 10 | 0.2996 | 0.3075 | 0.9743 |
| movielens-100k | 5 | 0.3239 | 0.3190 | 1.0153 |
| movielens-1m | 5 | 0.3311 | 0.3330 | 0.9943 |
| movielens-100k | 2 | 0.2435 | 0.2432 | 1.0013 |
| movielens-1m | 2 | 0.3209 | 0.3205 | 1.0010 |
| movielens-100k | 1 | 0.1909 | 0.1909 | 1.0000 |
| movielens-1m | 1 | 0.2556 | 0.2556 | 1.0000 |

The code for this experiment is this:

import numpy as np

from implicit.evaluation import precision_at_k, train_test_split
from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens

import logging
logging.basicConfig(level=logging.DEBUG)


# Train model including negative prefs and report p@10
# Note: setting alpha=1 should have identical results (since missing entries are assumed
# negative with a confidence of 1)
SEED = 32
FACTORS = 128
REGULARIZATION = 50
USE_CG = True

stats = []
for ALPHA in [20, 10, 5, 2, 1]:
    for dataset in ["100k", "1m"]:
        movies, ratings = get_movielens(dataset)

        # remove ratings between 2 and 4 (ambiguous)
        # set ratings >= 4 as being positive
        # set ratings <= 2 as being negative
        ratings.data[(ratings.data > 2) & (ratings.data < 4)] = 0
        ratings.eliminate_zeros()
        ratings.data[ratings.data <= 2] = -ALPHA
        ratings.data[ratings.data >= 4] = ALPHA

        # generate a random test/train split (removing negatives from test set)
        np.random.seed(SEED)
        train, test = train_test_split(ratings)
        test.data[test.data < 0] = 0
        test.eliminate_zeros()

        # train a first model with the negatives in the training set, and get p@10
        np.random.seed(SEED)
        model = AlternatingLeastSquares(factors=FACTORS, regularization=REGULARIZATION,
                                        use_cg=USE_CG,
                                        calculate_training_loss=False)
        model.fit(train)
        p_neg = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

        # train another model without the negatives and get p@10
        positive_train = train.copy()
        positive_train.data[train.data < 0] = 0
        positive_train.eliminate_zeros()

        np.random.seed(SEED)
        model = AlternatingLeastSquares(factors=FACTORS, regularization=REGULARIZATION,
                                        use_cg=USE_CG,
                                        calculate_training_loss=False)
        model.fit(positive_train)
        p_pos = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

        # write out some stats
        print("ALPHA %s" % ALPHA)
        print("Have %i positive entries, %i negative entries" %
              (len(ratings.data[ratings.data > 0]), len(ratings.data[ratings.data < 0])))
        print("p@10 (including negatives): %.4f" % p_neg)
        print("p@10 (excluding negatives): %.4f" % p_pos)
        print("Ratio %.4f" % (p_neg/p_pos))

        stats.append(("movielens-" + dataset.ljust(4), str(ALPHA),
                      "%.4f" % p_neg, "%.4f" % p_pos, "%.4f" % (p_neg/p_pos)))

# print out markdown table body
for row in stats:
    print("| " + (" | ".join(row)) + " |")

It's possible I messed up somewhere, but I got similar results using both the CG and cholesky optimizer - and the change for supporting this in the cholesky optimizer is pretty simple.

Edit: A better test might be to check whether including the negative items prevents the withheld negative items from being recommended. Also, just testing on movielens probably isn't sufficient. Will try those out later.

@lucidyan

I decided to conduct an experiment on my own data. Here's what I got:

Parameters:

  • Regularization: 0.1
  • Iterations: 10
  • Factors: 40

Data:

  • Implicit feedback

  • Negative/Positive interactions ratio: 0.004266

  • Split by date ~ 90 / 10

  • Train data
    <3940135x25367 sparse matrix of type '<class 'numpy.float64'>'

  • Test data
    <3949967x25433 sparse matrix of type '<class 'numpy.float64'>'

It shows small improvements on my implicit data:

Implicit ALS (positive feedback only)
['recall_at', 'precision_at', 'f1_at', 'success_at']
[[0.0306  0.0897  0.04054 0.0897 ]
 [0.05156 0.07805 0.05359 0.1561 ]
 [0.06889 0.0713  0.06032 0.2139 ]
 [0.08381 0.06545 0.06312 0.2618 ]
 [0.09788 0.06148 0.06508 0.3074 ]
 [0.10962 0.05793 0.06561 0.3476 ]
 [0.12102 0.05513 0.06585 0.3859 ]
 [0.13202 0.05257 0.06573 0.4206 ]
 [0.14143 0.0505  0.06534 0.4545 ]]

Implicit ALS  (With negative feedback)
['recall_at', 'precision_at', 'f1_at', 'success_at']
[[0.03048 0.0902  0.04038 0.0902 ]
 [0.05267 0.0792  0.05477 0.1584 ]
 [0.07034 0.07193 0.06119 0.2158 ]
 [0.08487 0.06632 0.06405 0.2653 ]
 [0.09879 0.0619  0.06567 0.3095 ]
 [0.11111 0.0584  0.06631 0.3504 ]
 [0.12109 0.0552  0.06602 0.3864 ]
 [0.13186 0.05305 0.0661  0.4244 ]
 [0.14207 0.05084 0.06573 0.4576 ]]

Spark ALS (positive feedback only)
['recall_at', 'precision_at', 'f1_at', 'success_at']
[[0.00904 0.0285  0.01202 0.0285 ]
 [0.01881 0.02855 0.01948 0.0571 ]
 [0.02917 0.02933 0.02495 0.088  ]
 [0.04095 0.03125 0.03045 0.125  ]
 [0.05183 0.03172 0.03395 0.1586 ]
 [0.06395 0.03245 0.03733 0.1947 ]
 [0.07618 0.03337 0.04047 0.2336 ]
 [0.08914 0.03422 0.04343 0.2738 ]
 [0.1021  0.03457 0.04563 0.3111 ]]

Spark ALS (With negative feedback)
['recall_at', 'precision_at', 'f1_at', 'success_at']
[[0.00886 0.0272  0.01205 0.0272 ]
 [0.01921 0.02985 0.02024 0.0597 ]
 [0.02903 0.0306  0.02562 0.0918 ]
 [0.03983 0.0312  0.03004 0.1248 ]
 [0.05142 0.03208 0.03419 0.1604 ]
 [0.06304 0.03297 0.03764 0.1978 ]
 [0.0752  0.03357 0.04061 0.235  ]
 [0.08738 0.03406 0.04312 0.2725 ]
 [0.10022 0.0345  0.04544 0.3105 ]]

But the speed left much to be desired:
01:52/05:36 (Negative Feedback)

However, I understand this was a draft implementation.

@lucidyan

@benfred
I also think that MovieLens wasn't the best dataset. The ratio of negatives (<= 2) to positives (>= 4) there is 40 (1M) and 27 (100K)!

@benfred
Owner

benfred commented May 28, 2018

@lucidyan: thanks for running the experiment on your data! If I'm reading that right, implicit seems to generate more accurate recommendations than Spark? =)

I think movielens has a pretty ordinary ratings distribution. I might be calculating this wrong - but for the 1m dataset it seems like ~16% are negative (<=2 stars) and ~57% are positive (>= 4 stars):

In [1]: from implicit.datasets.movielens import get_movielens

In [2]: _, ratings = get_movielens("100k")

In [3]: len(ratings.data), len(ratings.data[ratings.data <= 2]), len(ratings.data[ratings.data >= 4])
Out[3]: (100000, 17480, 55375)

In [4]: _, ratings = get_movielens("1m")

In [5]: len(ratings.data), len(ratings.data[ratings.data <= 2]), len(ratings.data[ratings.data >= 4])
Out[5]: (1000209, 163731, 575281)

A long time ago I wrote a blog post looking at recommender dataset distributions: https://www.benfrederickson.com/rating-set-distributions/ I looked at datasets from netflix/yelp/amazon/reddit and found they all broke down roughly like that. (Also datasets from companies I’ve worked at have been similarly skewed).

I still want to try out different datasets though, movielens seems to be truncated to only contain users with at least 20 ratings.

benfred added a commit that referenced this issue May 29, 2018
Allow negative confidence values to be passed in, signifying that the
user disliked the item. This lets us set higher confidence
values for the known negatives.

Experiments here indicate setting known negatives to a higher confidence
level slightly reduces p@10, but significantly reduces the likelihood
of withheld negative items being recommended: #114
@benfred
Owner

benfred commented May 29, 2018

I ran this also on the ml-10m and ml-20m datasets:

| dataset | alpha | p@10 with negatives | p@10 without negatives | ratio |
|---|---|---|---|---|
| movielens-100k | 20 | 0.3034 | 0.3233 | 0.9386 |
| movielens-1m | 20 | 0.2672 | 0.2812 | 0.9500 |
| movielens-10m | 20 | 0.2734 | 0.2783 | 0.9824 |
| movielens-20m | 20 | 0.2707 | 0.2742 | 0.9873 |
| movielens-100k | 10 | 0.3287 | 0.3402 | 0.9662 |
| movielens-1m | 10 | 0.2997 | 0.3074 | 0.9747 |
| movielens-10m | 10 | 0.2954 | 0.2965 | 0.9962 |
| movielens-20m | 10 | 0.2900 | 0.2904 | 0.9985 |
| movielens-100k | 5 | 0.3239 | 0.3190 | 1.0153 |
| movielens-1m | 5 | 0.3312 | 0.3331 | 0.9942 |
| movielens-10m | 5 | 0.2958 | 0.2954 | 1.0014 |
| movielens-20m | 5 | 0.2902 | 0.2896 | 1.0021 |
| movielens-100k | 2 | 0.2435 | 0.2432 | 1.0013 |
| movielens-1m | 2 | 0.3209 | 0.3205 | 1.0010 |
| movielens-10m | 2 | 0.2794 | 0.2794 | 1.0000 |
| movielens-20m | 2 | 0.2661 | 0.2659 | 1.0008 |
| movielens-100k | 1 | 0.1909 | 0.1909 | 1.0000 |
| movielens-1m | 1 | 0.2556 | 0.2556 | 1.0000 |
| movielens-10m | 1 | 0.3081 | 0.3081 | 1.0000 |
| movielens-20m | 1 | 0.2600 | 0.2600 | 1.0000 |

Including the negative ratings seems to reduce p@10 in most cases, but the impact is pretty small for the larger datasets.

However, the interesting thing is that including the negatives with higher confidence also seems to reduce how often the withheld negatives in the test set get recommended. Changing the above code to only test for withheld negatives (by filtering out positive test data with test.data[test.data > 0] = 0), the same code then computes something like the false discovery rate@k:

| dataset | alpha | fdr@10 with negatives | fdr@10 without negatives | ratio |
|---|---|---|---|---|
| movielens-100k | 20 | 0.0329 | 0.0391 | 0.8426 |
| movielens-1m | 20 | 0.0175 | 0.0251 | 0.6977 |
| movielens-100k | 10 | 0.0344 | 0.0406 | 0.8482 |
| movielens-1m | 10 | 0.0180 | 0.0238 | 0.7571 |
| movielens-100k | 5 | 0.0406 | 0.0485 | 0.8358 |
| movielens-1m | 5 | 0.0180 | 0.0220 | 0.8186 |
| movielens-100k | 2 | 0.0402 | 0.0438 | 0.9174 |
| movielens-1m | 2 | 0.0193 | 0.0203 | 0.9520 |
| movielens-100k | 1 | 0.0348 | 0.0348 | 1.0000 |
| movielens-1m | 1 | 0.0178 | 0.0178 | 1.0000 |
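For reference, the change to the earlier experiment script is roughly this (a sketch reusing its variables, not the exact code used):

```python
# fdr@10 variant: keep only the withheld *negative* test entries, so that
# precision_at_k now measures how often known dislikes land in the top 10
# (lower is better). This replaces the earlier test-set filtering step.
train, test = train_test_split(ratings)
test.data[test.data > 0] = 0      # drop positives instead of negatives
test.eliminate_zeros()

model.fit(train)
fdr_at_10 = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)
```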

So for the movielens datasets, p@10 is only slightly reduced by including known negatives at a higher confidence level - but withheld negatives are much less likely to be recommended.

I also ran this on the reddit dataset I just added here, and got similar results. Setting alpha=20:

Reddit, alpha=20
Have 19446392 positive entries, 3633062 negative entries
p@10 (including negatives): 0.0617
p@10 (excluding negatives): 0.0618
fdr@10 (including negatives): 0.0041
fdr@10 (excluding negatives): 0.0075
Ratio 0.5383

So p@10 for the reddit dataset here is basically unchanged, while the negatives are almost half as likely to be recommended.

Anyways, I'm probably going to merge the PR. It doesn't seem to hurt p@K much - and can significantly reduce the number of bad results being recommended.

edit: link to pr #119

benfred added a commit that referenced this issue Jun 4, 2018
Allow negative confidence values to be passed in, signifying that the
user disliked the item. This lets us set higher confidence
values for the known negatives.

Experiments here indicate setting known negatives to a higher confidence
level slightly reduces p@10, but significantly reduces the likelihood
of withheld negative items being recommended: #114
@benfred
Owner

benfred commented Jun 4, 2018

Done here: #119. Support will be in the next version.
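For anyone finding this later: usage is just passing negative values in the item/user confidence matrix, the same way the experiment script above does. A minimal sketch (the toy data here is made up):

```python
import numpy as np
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares

# rows = items, cols = users (the layout the experiment script above uses);
# positive entries mean "liked with this confidence", negative entries mean
# "disliked with this confidence", and zeros stay as unobserved.
item_users = csr_matrix(np.array([
    [4.0, 0.0, 2.0],     # item 0: liked by users 0 and 2
    [-4.0, 1.0, 0.0],    # item 1: disliked by user 0, liked by user 1
]))

model = AlternatingLeastSquares(factors=32)
model.fit(item_users)
```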
