
Calling the fit method on the load_from_df dataset produces AttributeError: DatasetAutoFolds instance has no attribute 'global_mean' #161

Closed
sachinpuranik99 opened this issue Apr 1, 2018 · 6 comments

Comments


sachinpuranik99 commented Apr 1, 2018

Description

Calling the fit method on a dataset created with load_from_df (without any split) produces AttributeError: DatasetAutoFolds instance has no attribute 'global_mean'.

Steps/Code to Reproduce

import pandas as pd

from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise import SVD
from surprise.model_selection import cross_validate, KFold

# Creation of the dataframe. Column names are irrelevant.
ratings_dict = {'itemID': [1, 1, 1, 2, 2],
                'userID': [9, 32, 2, 45, 'user_foo'],
                'rating': [3, 2, 4, 3, 1]}
df1 = pd.DataFrame(ratings_dict)

reader = Reader()
data = Dataset.load_from_df(df1[['userID', 'itemID', 'rating']], reader)

algo = SVD()
algo.fit(data)

Expected Results

I expected it to fit the model on this training data; I was then planning to run the predict method on the test data to get the actual predictions.

Actual Results

AttributeError Traceback (most recent call last)
<ipython-input> in <module>()
17
18 algo = SVD()
---> 19 algo.fit(data)
20
21 #kf = KFold(n_splits=3)

C:\Users\Sachin\Anaconda2\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.SVD.fit()
153
154 AlgoBase.fit(self, trainset)
--> 155 self.sgd(trainset)
156
157 return self

C:\Users\Sachin\Anaconda2\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.SVD.sgd()
202 cdef int u, i, f
203 cdef double r, err, dot, puf, qif
--> 204 cdef double global_mean = self.trainset.global_mean
205
206 cdef double lr_bu = self.lr_bu

AttributeError: DatasetAutoFolds instance has no attribute 'global_mean'

Versions

Windows-10-10.0.16299
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)]
surprise 1.0.5

Additional info.

However, if I run fit and predict using a KFold split, it works properly. My intention is this: I have separate test and train data, and I want to fit the model on the train data without any KFolding and then run predict on the test data.

@NicolasHug (Owner)

You need to use build_full_trainset.
Please refer to the doc.
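
For example (a minimal sketch, reusing the data and SVD objects from the snippet above):

# Build a full Trainset from the whole Dataset before fitting.
trainset = data.build_full_trainset()

algo = SVD()
algo.fit(trainset)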

@kirtigroover

Hi, were you able to find a solution for your query?
I am in a similar situation, where I already have test and train data separated; I just want to train the model on my trainset and then test it on the test data.
I see that there are functions to build a trainset but no function to build a testset (other than split).

It would be great if you could share how you overcame this.

@xinyuewang1

> Hi, were you able to find a solution for your query? I am in a similar situation, where I already have test and train data separated […]

Hi kirtigroover,
I came across a similar situation, and so far I think a working solution is to write a for loop that feeds each test row into the model to get a prediction.

I'm pretty sure you know how to do that; for reference:

uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

To retrieve the estimated rating, use pred.est.

Hope this helps. :)
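
A rough sketch of that loop (assuming a hypothetical pandas DataFrame test_df with userID and itemID columns, and an algo already fitted on the full trainset):

predictions = []
for _, row in test_df.iterrows():
    # raw ids should match the type used when building the trainset
    pred = algo.predict(str(row['userID']), str(row['itemID']))
    predictions.append(pred.est)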

@kirtigroover

Thanks for your reply!! :-)

I have also been trying to use the file reader, which allows user-defined test and train data. Hopefully it will work.

@07priyayadav

> Hi, were you able to find a solution for your query? I am in a similar situation, where I already have test and train data separated […]

> Hi kirtigroover, I came across a similar situation, and so far I think a working solution is to write a for loop that feeds each test row into the model to get a prediction. […]

Hi, were you able to find a solution for your query?
I am in a similar situation, where I already have test and train data separated; I just want to train the model on my trainset and then test it on the test data. My test dataset doesn't have a rating column.

It would be great if you could share how you overcame this.

@kirtigroover

> Hi, were you able to find a solution for your query? […] My test dataset doesn't have a rating column. […]

Yes, I was able to find a solution. Go through the documentation and look for load_from_folds(); this is the function you will be using. Note that it can only read from files, not directly from a dataframe.

So just write your dataframes to files. You also need to make sure your files have four columns (userid, itemid, rating, timestamp); if you don't have data for any of these columns, just leave them blank.

Now, as you mentioned, your testset doesn't have ratings, so you would need to add a rating column as well as a timestamp column.

You can also use xinyuewang1's approach: train on your train data using build_full_trainset(), and then traverse your test dataset with a for loop, getting predictions one by one instead of getting predictions for the entire testset in one go.

I hope this makes sense to you. Hope it helps!
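
A minimal sketch of the load_from_folds() route (the file names train.csv and test.csv are placeholders; both files are assumed to contain the four comma-separated columns described above):

from surprise import Dataset, Reader, SVD
from surprise.model_selection import PredefinedKFold

# Reader describing the on-disk format: four comma-separated fields.
reader = Reader(line_format='user item rating timestamp', sep=',')

# One predefined train/test fold, read from the two files.
data = Dataset.load_from_folds([('train.csv', 'test.csv')], reader)

algo = SVD()
for trainset, testset in PredefinedKFold().split(data):
    algo.fit(trainset)
    predictions = algo.test(testset)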
