Support negative scores #114
You're the second person to ask for this: #90 =) I've run some experiments on doing this before (at a previous job working on news recommendations) and found that it didn't make an appreciable difference in the results for me. I had much better luck using only the implicit positive signals to rank items, and then using a different model with the positive/negative values to filter the ranked list. However, the link you supplied has a different point of view and says that they found it helpful 🤷‍♂️ Since it is an easy change to make that multiple people are asking for, I'll add this sometime soon.
@benfred I admit I became one of those people: instead of convergence I got NaNs, a negative loss, and confusing errors, until I understood what the problem was. Perhaps handling this type of error for the positive-confidence case would be a good start.
To add a data point, at iheartradio we had quite a bit of thumbs-down data. I found adding the negative information was useful, especially for users with complex taste in music. For example, a user who likes punk but thumbed down emo would understandably expect less emo in their recommendations/playlists.
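Thumbs-up/thumbs-down feedback like this can be encoded directly as signed confidence values in a sparse user-item matrix, which is the representation the experiments below pass to the model. A minimal sketch — the event tuples, names, and `ALPHA` here are purely illustrative:

```python
import numpy as np
from scipy.sparse import coo_matrix

ALPHA = 10.0  # illustrative confidence multiplier

# (user, item, feedback) triples: +1 = thumbs up, -1 = thumbs down
events = [(0, 0, +1), (0, 1, -1), (1, 1, +1), (2, 0, -1)]

rows = [u for u, i, f in events]
cols = [i for u, i, f in events]
data = [ALPHA * f for u, i, f in events]  # signed confidences

# 3 users x 2 items; thumbs-down entries become negative values
ratings = coo_matrix((data, (rows, cols)), shape=(3, 2)).tocsr()
```

Unobserved entries stay at zero, so the matrix distinguishes "never seen" from "actively disliked" — the distinction this issue is about.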
I have a rough draft of this here: https://github.com/benfred/implicit/compare/negative_als_prefs I ran a simple experiment to test this out using the movielens datasets. The idea is to pass in low ratings (1 or 2 stars) as negative items, and compare including these as negative values versus just removing them entirely. I experimented with different confidence values (multiplying by a constant alpha). Anyways, this code improved things in 3 cases and hurt performance in 5 cases. I will run some more tests on larger datasets; I only included the smaller movielens ones to test quickly.
The code for this experiment is:

```python
import logging

import numpy as np

from implicit.als import AlternatingLeastSquares
from implicit.datasets.movielens import get_movielens
from implicit.evaluation import precision_at_k, train_test_split

logging.basicConfig(level=logging.DEBUG)

# Train model including negative prefs and report p@10
# Note: setting alpha=1 should have identical results (since missing entries
# are assumed negative with a confidence of 1)
SEED = 32
FACTORS = 128
REGULARIZATION = 50
USE_CG = True

stats = []
for ALPHA in [20, 10, 5, 2, 1]:
    for dataset in ["100k", "1m"]:
        movies, ratings = get_movielens(dataset)

        # remove ratings between 2 and 4 (ambiguous),
        # set ratings >= 4 as positive and ratings <= 2 as negative
        ratings.data[(ratings.data > 2) & (ratings.data < 4)] = 0
        ratings.eliminate_zeros()
        ratings.data[ratings.data <= 2] = -ALPHA
        ratings.data[ratings.data >= 4] = ALPHA

        # generate a random test/train split (removing negatives from the test set)
        np.random.seed(SEED)
        train, test = train_test_split(ratings)
        test.data[test.data < 0] = 0
        test.eliminate_zeros()

        # train a first model with the negatives in the training set, and get p@10
        np.random.seed(SEED)
        model = AlternatingLeastSquares(factors=FACTORS, regularization=REGULARIZATION,
                                        use_cg=USE_CG, calculate_training_loss=False)
        model.fit(train)
        p_neg = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

        # train another model without the negatives and get p@10
        positive_train = train.copy()
        positive_train.data[train.data < 0] = 0
        positive_train.eliminate_zeros()
        np.random.seed(SEED)
        model = AlternatingLeastSquares(factors=FACTORS, regularization=REGULARIZATION,
                                        use_cg=USE_CG, calculate_training_loss=False)
        model.fit(positive_train)
        p_pos = precision_at_k(model, train.T.tocsr(), test.T.tocsr(), K=10, num_threads=4)

        # write out some stats
        print("ALPHA %s" % ALPHA)
        print("Have %i positive entries, %i negative entries" %
              (len(ratings.data[ratings.data > 0]), len(ratings.data[ratings.data < 0])))
        print("p@10 (including negatives): %.4f" % p_neg)
        print("p@10 (excluding negatives): %.4f" % p_pos)
        print("Ratio %.4f" % (p_neg / p_pos))

        stats.append(("movielens-" + dataset.ljust(4), str(ALPHA),
                      "%.4f" % p_neg, "%.4f" % p_pos, "%.4f" % (p_neg / p_pos)))

# print out the markdown table body
for row in stats:
    print("| " + (" | ".join(row)) + " |")
```

It's possible I messed up somewhere, but I got similar results using both the CG and Cholesky optimizers, and the change for supporting this in the Cholesky optimizer is pretty simple.

Edit: A better test might be to check whether including the negative items prevents the withheld negative items from being recommended. Also, just testing on movielens probably isn't sufficient. Will try those out later.
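The "better test" suggested in the edit — checking how often withheld disliked items still make it into the top-K — could be measured with a small pure-numpy helper like the following. This is a hypothetical sketch, not part of implicit's API; it assumes you already have the learned user and item factor matrices and a per-user set of withheld negative item ids:

```python
import numpy as np

def withheld_negative_rate(user_factors, item_factors, withheld_neg, K=10):
    """Fraction of users whose top-K recommendations contain at least one
    withheld (known-disliked) item. Lower is better."""
    scores = user_factors @ item_factors.T        # (n_users, n_items)
    top_k = np.argsort(-scores, axis=1)[:, :K]    # top-K item ids per user
    hits = sum(1 for user, negs in withheld_neg.items()
               if negs & set(top_k[user]))
    return hits / len(withheld_neg)

# tiny worked example: user 0's disliked item 2 scores highly, user 1's doesn't
U = np.array([[1.0, 0.0], [0.0, 1.0]])
I = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
rate = withheld_negative_rate(U, I, {0: {2}, 1: {0}}, K=2)
```

Comparing this rate between the with-negatives and positives-only models would directly quantify the filtering effect, independently of p@10.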
I decided to run an experiment on my own data. Here's what I got.
Parameters:
Data:
It shows small improvements on my implicit data, but the speed left much to be desired. However, I understand this was a draft implementation.
@benfred
@lucidyan: thanks for running the experiment on your data! If I'm reading that right, implicit seems to generate more accurate recommendations than spark? =)

I think movielens has a pretty typical ratings distribution. I might be calculating this wrong, but for the 1m dataset it seems like ~16% of ratings are negative (<= 2 stars) and ~57% are positive (>= 4 stars):

```python
In [1]: from implicit.datasets.movielens import get_movielens

In [2]: _, ratings = get_movielens("100k")

In [3]: len(ratings.data), len(ratings.data[ratings.data <= 2]), len(ratings.data[ratings.data >= 4])
Out[3]: (100000, 17480, 55375)

In [4]: _, ratings = get_movielens("1m")

In [5]: len(ratings.data), len(ratings.data[ratings.data <= 2]), len(ratings.data[ratings.data >= 4])
Out[5]: (1000209, 163731, 575281)
```

A long time ago I wrote a blog post looking at recommender dataset distributions: https://www.benfrederickson.com/rating-set-distributions/ I looked at datasets from netflix/yelp/amazon/reddit and found they all broke down roughly like that. (Datasets from companies I've worked at have been similarly skewed.) I still want to try out different datasets though; movielens seems to be truncated to only contain users with at least 20 ratings.
Allow negative confidence values to be passed in, signifying that the user disliked the item. This lets us set higher confidence values for the known negatives. Experiments here indicate setting known negatives to a higher confidence level slightly reduces p@10, but significantly reduces the likelihood of withheld negative items being recommended: #114
I ran this also on the ml-10m and ml-20m datasets:
Including the negative ratings seems to reduce p@10 in most cases, but the impact is pretty small for the larger datasets. However, the interesting thing is that including the negatives with higher confidence also seems to reduce how often the withheld negatives in the test set get recommended, changing the above code to only test for withheld negatives (by filtering out the positive test data).

So for the movielens datasets, p@10 is only slightly reduced by including known negatives at a higher confidence level, but withheld negatives are much less likely to be recommended. I also ran this on the reddit dataset I just added here, and got similar results setting alpha=20.

So p@10 for the reddit dataset here is basically unchanged, while the negatives are almost half as likely to be recommended. Anyways, I'm probably going to merge the PR. It doesn't seem to hurt p@K much, and can significantly reduce the number of bad results being recommended.

edit: link to PR #119
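The filtering described above — restricting the test split to the withheld negatives so the evaluation measures how often known-disliked items get recommended — could be sketched as follows. The helper name is hypothetical; it assumes negatives are stored as negative values, as in the earlier experiment code:

```python
import numpy as np
from scipy.sparse import csr_matrix

def negatives_only(test):
    """Keep only the withheld negative entries of a test split, flipping
    their sign so they can serve as the 'relevant' items when measuring
    how often disliked items are recommended."""
    neg = test.copy()
    neg.data[neg.data > 0] = 0      # drop withheld positives
    neg.eliminate_zeros()
    neg.data = np.abs(neg.data)     # negatives remain, with positive weight
    return neg

# toy split: user 0 disliked item 0 (-2) and liked item 1 (+3);
# user 1 disliked item 1 (-1)
test = csr_matrix(np.array([[-2.0, 3.0], [0.0, -1.0]]))
neg_test = negatives_only(test)
```

Passing a matrix like `neg_test` into the same precision-at-k evaluation would then report how often the disliked items surface in the top-K, rather than how often the liked ones do.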
Done in #119; support will be in the next version.
This library should support negative scores. They can be quite useful, and work fit elegantly with the original ALS algorithm. Spark's ALS implemented this change years ago: https://gite.lirmm.fr/yagoubi/spark/commit/9e63f80e75bb6d9bbe6df268908c3219de6852d9
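The reason signed scores fit the ALS formulation so cleanly: in the Hu et al. implicit-feedback model, preference comes from the sign of the rating while confidence comes from its magnitude, so a strong dislike simply becomes a high-confidence zero preference. A hedged numpy sketch of one closed-form user update under that convention (illustrative only, not implicit's actual code):

```python
import numpy as np

def user_update(Y, r_u, alpha=40.0, reg=0.01):
    """One least-squares user step of implicit-feedback ALS with signed
    ratings r_u (dense, one entry per item; 0 = unobserved).

    p_ui = 1 if r_ui > 0 else 0        (binary preference)
    c_ui = 1 + alpha * |r_ui|          (confidence from magnitude)
    """
    p = (r_u > 0).astype(np.float64)
    c = 1.0 + alpha * np.abs(r_u)
    A = (Y.T * c) @ Y + reg * np.eye(Y.shape[1])   # Y^T C Y + reg*I
    b = (Y.T * c) @ p                              # Y^T C p
    return np.linalg.solve(A, b)

# trivial check with orthonormal item factors: a liked item pulls the user
# vector toward it, a disliked one pushes the corresponding component to 0
Y = np.eye(2)
x = user_update(Y, np.array([5.0, -5.0]), alpha=1.0, reg=0.0)
```

With unsigned data this reduces to the standard Hu et al. update, which is why passing alpha=1 negatives should behave identically to leaving entries missing.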