
Calling the fit method on the load_from_df dataset produces AttributeError: DatasetAutoFolds instance has no attribute 'global_mean' #161

Closed
sachinpuranik99 opened this issue Apr 1, 2018 · 6 comments

Comments


sachinpuranik99 commented Apr 1, 2018

Description

Calling the fit method on a dataset created with load_from_df (without any split) produces AttributeError: DatasetAutoFolds instance has no attribute 'global_mean'.

Steps/Code to Reproduce

import pandas as pd

from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise import SVD
from surprise.model_selection import cross_validate, KFold

# Creation of the dataframe. Column names are irrelevant.
ratings_dict = {'itemID': [1, 1, 1, 2, 2],
                'userID': [9, 32, 2, 45, 'user_foo'],
                'rating': [3, 2, 4, 3, 1]}
df1 = pd.DataFrame(ratings_dict)

reader = Reader()
data = Dataset.load_from_df(df1[['userID', 'itemID', 'rating']], reader)

algo = SVD()
algo.fit(data)

Expected Results

I expected it to fit the model on this training data; I was then planning to run the predict method on the test data to get the actual predictions.

Actual Results

AttributeError Traceback (most recent call last)
<ipython-input> in <module>()
17
18 algo = SVD()
---> 19 algo.fit(data)
20
21 #kf = KFold(n_splits=3)

C:\Users\Sachin\Anaconda2\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.SVD.fit()
153
154 AlgoBase.fit(self, trainset)
--> 155 self.sgd(trainset)
156
157 return self

C:\Users\Sachin\Anaconda2\lib\site-packages\surprise\prediction_algorithms\matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.SVD.sgd()
202 cdef int u, i, f
203 cdef double r, err, dot, puf, qif
--> 204 cdef double global_mean = self.trainset.global_mean
205
206 cdef double lr_bu = self.lr_bu

AttributeError: DatasetAutoFolds instance has no attribute 'global_mean'

Versions

Windows-10-10.0.16299
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)]
surprise 1.0.5

Additional info.

However, if I run fit and predict using a KFold split, it works properly. My intention is this: I have separate test and train data, and I want to fit the model on the train data without any KFolding and then run predict on the test data.

@NicolasHug (Owner)

You need to use build_full_trainset.
Please refer to the doc.
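
For example (a minimal sketch, reusing the data and SVD objects from the snippet above):

# Build a full Trainset from the whole Dataset before fitting.
trainset = data.build_full_trainset()

algo = SVD()
algo.fit(trainset)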

@kirtigroover

Hi, were you able to find a solution for your query?
I am in a similar situation, where I already have test and train data separated; I just want to train the model on my trainset and then test it on the test data.
I see that there are functions to build a trainset but no function to build a testset (other than split).

It would be great if you could share how you overcame this.

@xinyuewang1

> Hi, were you able to find a solution for your query? I am in a similar situation, where I already have test and train data separated […]

Hi kirtigroover,
I came across a similar situation, and so far I think a working solution is to write a for loop that feeds each test row into the model to get a prediction.

I'm pretty sure you know how to do that; for reference:

uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

To retrieve the estimated rating, use pred.est.

Hope this helps. :)
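
A rough sketch of that loop (assuming a hypothetical pandas DataFrame test_df with userID and itemID columns, and an algo already fitted on the full trainset):

predictions = []
for _, row in test_df.iterrows():
    # raw ids should match the type used when building the trainset
    pred = algo.predict(str(row['userID']), str(row['itemID']))
    predictions.append(pred.est)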

@kirtigroover

Thanks for your reply!! :-)

I have also been trying to use the file reader, which allows user-defined test and train data. Hopefully it will work.

@07priyayadav

> Hi, were you able to find a solution for your query? I am in a similar situation, where I already have test and train data separated […]

> Hi kirtigroover, I came across a similar situation, and so far I think a working solution is to write a for loop that feeds each test row into the model to get a prediction. […]

Hi, were you able to find a solution for your query?
I am in a similar situation, where I already have test and train data separated; I just want to train the model on my trainset and then test it on the test data. My test dataset doesn't have a rating column.

It would be great if you could share how you overcame this.

@kirtigroover

> Hi, were you able to find a solution for your query? […] My test dataset doesn't have a rating column. […]

Yes, I was able to find a solution. Go through the documentation and look for load_from_folds(); this is the function you will be using. Note that it can only read from files, not directly from a dataframe.

So just write your dataframes to files. You also need to make sure your files have four columns (userid, itemid, rating, timestamp); if you don't have data for any of these columns, just leave them blank.

Now, as you mentioned, your testset doesn't have ratings, so you would need to add a rating column as well as a timestamp column.

You can also use xinyuewang1's approach: train on your train data using build_full_trainset(), and then traverse your test dataset with a for loop, getting predictions one by one instead of getting predictions for the entire testset in one go.

I hope this makes sense to you. Hope it helps!
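
A minimal sketch of the load_from_folds() route (the file names train.csv and test.csv are placeholders; both files are assumed to contain the four comma-separated columns described above):

from surprise import Dataset, Reader, SVD
from surprise.model_selection import PredefinedKFold

# Reader describing the on-disk format: four comma-separated fields.
reader = Reader(line_format='user item rating timestamp', sep=',')

# One predefined train/test fold, read from the two files.
data = Dataset.load_from_folds([('train.csv', 'test.csv')], reader)

algo = SVD()
for trainset, testset in PredefinedKFold().split(data):
    algo.fit(trainset)
    predictions = algo.test(testset)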
