AlgoBase.predict() returns the same value for all uid and iid values #140

Closed
skurzhanskyi opened this issue Feb 2, 2018 · 10 comments

Comments

@skurzhanskyi

skurzhanskyi commented Feb 2, 2018

Description

I tried to test matrix factorization-based algorithms on my own dataset and noticed that AlgoBase.predict() returns the same value for all uid and iid values. I then tested on the default dataset and the result was the same. Maybe there's a mistake in my code, but I can't find it at the moment.

Steps/Code to Reproduce

from surprise import SVD
from surprise import Dataset


data = Dataset.load_builtin('ml-100k')
algo = SVD()
trainset = data.build_full_trainset()
algo.fit(trainset)

users_ids = trainset.all_users()
items_ids = trainset.all_items()

ratings = []
for i in range(min(100, len(users_ids))):
    for j in range(min(100, len(items_ids))):
        ratings.append(algo.predict(users_ids[i], items_ids[j]).est)
print(len(set(ratings)))

Expected Results

10000 or somewhat fewer (not 1)

Actual Results

1

Versions

Darwin-16.7.0-x86_64-i386-64bit
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 12:01:12)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
surprise 1.0.5

@NicolasHug
Owner

That's because predict() expects raw ids but trainset.all_users/items() returns inner ids. Please see this note.
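
A minimal sketch of the fix this reply points to, assuming the `to_raw_uid` and `to_raw_iid` methods of Surprise's `Trainset` (the latter is mentioned further down this thread) are available in the installed version: convert each inner id back to its raw id before calling `predict()`. The helper name `estimate_grid` and its parameters are hypothetical, not part of Surprise.

```python
from itertools import islice


def estimate_grid(algo, trainset, max_users=100, max_items=100):
    """Collect estimates for the first users and items of a fitted trainset.

    `estimate_grid` is a hypothetical helper, not part of Surprise.
    all_users() / all_items() yield inner ids, while predict() expects
    raw ids, so each inner id is converted back before predicting.
    """
    estimates = []
    for inner_uid in islice(trainset.all_users(), max_users):
        raw_uid = trainset.to_raw_uid(inner_uid)
        for inner_iid in islice(trainset.all_items(), max_items):
            raw_iid = trainset.to_raw_iid(inner_iid)
            estimates.append(algo.predict(raw_uid, raw_iid).est)
    return estimates
```

With the fitted `algo` and the `trainset` from the report above, `len(set(estimate_grid(algo, trainset)))` should then come out greater than 1.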

@skurzhanskyi
Author

Thank you for your swift answer!
I really hadn't noticed that. Actually, I first tried to use data from my own file, but I got something like "this item not in unknown". Unfortunately, I am not at my work computer at the moment. Would it be OK if I describe my problem in detail on Monday?

@NicolasHug
Owner

Yes, if you tried but didn't find a solution on your own, feel free to ask (but please do some research first, of course :) )

@skurzhanskyi
Author

Of course. The thing is that I spent half a day trying to solve this problem. I wouldn't have written here if I had spent less.

@skurzhanskyi
Author

skurzhanskyi commented Feb 5, 2018

You were actually right: the problem was in my dataset. Some items contained non-ASCII characters, so the unicode and str values differed. to_raw_iid returns a str, but the field in my DataFrame is unicode. If you compare the two for equality you get True, yet the result of predict depends on the type.

Thanks for your help anyway. The issue may be closed. Still, I think converting to string in predict would be great.

@elaine-peiru

elaine-peiru commented May 5, 2018

@skurzhanskyi I have the same problem when using my own dataset. Can you explain how you solved it? I also get the same estimated rating every time. I understand the raw ids and inner ids described in the note, but I don't know what I should change in the original code: I retrieved the inner ids of the user and item, converted them, and passed them to predict(), but I still got est=3.44 every time.
It would be great if you could share your experience. Thank you :)
[Screenshot: screen shot 2018-05-05 at 16 48 37]

@skurzhanskyi
Author

@elaine-peiru, this code looks strange. You first get the inner id and then convert back to the raw id. Maybe the problem is with str().

@elaine-peiru

@skurzhanskyi thanks! The problem was solved by removing the str().

@alpalalpal

Solved this issue by converting the ids from string to integer or float.
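
The `str()` and type-conversion fixes reported in this thread seem to share one cause: the id passed to `predict()` has to match the raw id used at training time in type as well as in value. The sketch below is a toy model of that lookup, not Surprise's actual code. `ToyModel` and all of its members are hypothetical, and it assumes (as the symptoms in this thread suggest) that an unknown id falls back to a single default estimate.

```python
class ToyModel:
    """Toy stand-in for a fitted model (hypothetical, not Surprise's code)."""

    def __init__(self, ratings):
        # ratings: list of (raw_uid, raw_iid, rating) tuples
        self.global_mean = sum(r for _, _, r in ratings) / len(ratings)
        self.known = {(u, i): r for u, i, r in ratings}

    def predict(self, raw_uid, raw_iid):
        # An id of the wrong type never matches a stored key, so every
        # such call silently returns the same default value.
        return self.known.get((raw_uid, raw_iid), self.global_mean)


model = ToyModel([(1, 10, 5.0), (2, 20, 2.0)])

matched = model.predict(1, 10)         # ids have the training type (int): 5.0
mismatched = model.predict("1", "10")  # same ids wrapped in str(): 3.5
```

Checking `type()` of one raw id from the training data against the value passed to `predict()` is a quick way to spot this kind of mismatch.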

@olegyablokov

I had the same issue, but the cause turned out to be NaNs in the matrix I used to fit the model. After I fixed that, it worked.
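
One way to catch this case before fitting is to check the ratings for NaNs. A minimal sketch using only the standard library; `split_nan_ratings` is a hypothetical helper, and the `(user, item, rating)` tuple layout is an assumption about how the ratings are held.

```python
import math


def split_nan_ratings(rows):
    """Split (user, item, rating) rows into clean rows and NaN-rated rows."""
    clean, bad = [], []
    for row in rows:
        rating = row[2]
        is_nan = isinstance(rating, float) and math.isnan(rating)
        (bad if is_nan else clean).append(row)
    return clean, bad


rows = [("u1", "i1", 4.0), ("u1", "i2", float("nan")), ("u2", "i1", 3.0)]
clean, bad = split_nan_ratings(rows)
```

Only the `clean` rows would then be passed on to build the training set, while `bad` shows which entries need fixing at the source.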
