Commit 86cf445: Moved Reader and Trainset classes in their own module

NicolasHug committed Dec 25, 2017
1 parent aa92831
Showing 17 changed files with 484 additions and 439 deletions.
14 changes: 7 additions & 7 deletions doc/source/FAQ.rst
@@ -130,11 +130,11 @@ On trainset creation, each raw id is mapped to a unique
 integer called inner id, which is a lot more suitable for `Surprise
 <https://nicolashug.github.io/Surprise/>`_ to manipulate. Conversions between
 raw and inner ids can be done using the :meth:`to_inner_uid()
-<surprise.dataset.Trainset.to_inner_uid>`, :meth:`to_inner_iid()
-<surprise.dataset.Trainset.to_inner_iid>`, :meth:`to_raw_uid()
-<surprise.dataset.Trainset.to_raw_uid>`, and :meth:`to_raw_iid()
-<surprise.dataset.Trainset.to_raw_iid>` methods of the :class:`trainset
-<surprise.dataset.Trainset>`.
+<surprise.Trainset.to_inner_uid>`, :meth:`to_inner_iid()
+<surprise.Trainset.to_inner_iid>`, :meth:`to_raw_uid()
+<surprise.Trainset.to_raw_uid>`, and :meth:`to_raw_iid()
+<surprise.Trainset.to_raw_iid>` methods of the :class:`trainset
+<surprise.Trainset>`.


Can I use my own dataset with Surprise, and can it be a pandas dataframe
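The raw-to-inner mapping described in this hunk is easy to picture. Below is a minimal, self-contained sketch of the bookkeeping such a trainset performs; `IdMapper` and its method names are illustrative stand-ins, not surprise's actual implementation:

```python
# Sketch of the raw <-> inner id bookkeeping a Trainset maintains.
# NOT the surprise implementation, just an illustration of the idea.

class IdMapper:
    """Assigns a dense integer (inner id) to each raw id on first sight."""

    def __init__(self):
        self._raw2inner = {}
        self._inner2raw = {}

    def to_inner(self, raw_id):
        # First time we see a raw id, give it the next free integer.
        if raw_id not in self._raw2inner:
            inner = len(self._raw2inner)
            self._raw2inner[raw_id] = inner
            self._inner2raw[inner] = raw_id
        return self._raw2inner[raw_id]

    def to_raw(self, inner_id):
        return self._inner2raw[inner_id]


users = IdMapper()
print(users.to_inner('196'))  # 0: first raw id seen becomes inner id 0
print(users.to_inner('186'))  # 1
print(users.to_inner('196'))  # 0 again: the mapping is stable
print(users.to_raw(1))        # 186
```

The real `to_inner_uid()` raises an error for unknown ids instead of creating them, since the mapping is frozen once the trainset is built.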
@@ -155,8 +155,8 @@ How to get accuracy measures on the training set
 ------------------------------------------------

 You can use the :meth:`build_testset()
-<surprise.dataset.Trainset.build_testset()>` method of the :class:`Trainset
-<surprise.dataset.Trainset>` object to build a testset that can be then used
+<surprise.Trainset.build_testset()>` method of the :class:`Trainset
+<surprise.Trainset>` object to build a testset that can be then used
 with the :meth:`test()
 <surprise.prediction_algorithms.algo_base.AlgoBase.test>` method:

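Conceptually, `build_testset()` just turns the trainset's stored ratings back into plain `(user, item, rating)` triples that `test()` accepts. A standalone sketch, using hypothetical data structures rather than surprise's internals:

```python
# Sketch of what build_testset() conceptually does: flatten the stored
# ratings into the (user, item, rating) triples that test() consumes.
# The dict layout below is hypothetical, not surprise's internal format.

train_ratings = {              # user -> list of (item, rating)
    'u1': [('i1', 4.0), ('i2', 3.0)],
    'u2': [('i1', 5.0)],
}

def build_testset(ratings):
    # sorted() makes the output order deterministic for this demo
    return [(uid, iid, r)
            for uid, items in sorted(ratings.items())
            for iid, r in items]

testset = build_testset(train_ratings)
print(testset)
# [('u1', 'i1', 4.0), ('u1', 'i2', 3.0), ('u2', 'i1', 5.0)]
```

Feeding such a testset to `test()` then yields predictions on the very ratings the model was trained on, which is what "accuracy on the training set" means here.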
2 changes: 1 addition & 1 deletion doc/source/building_custom_algo.rst
@@ -68,7 +68,7 @@ Once the base class :meth:`train()
 <surprise.prediction_algorithms.algo_base.AlgoBase.train>` method has returned,
 all the info you need about the current training set (rating values, etc...) is
 stored in the ``self.trainset`` attribute. This is a :class:`Trainset
-<surprise.dataset.Trainset>` object that has many attributes and methods of
+<surprise.Trainset>` object that has many attributes and methods of
 interest for prediction.

 To illustrate its usage, let's make an algorithm that predicts an average
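As a standalone preview of the example this doc builds, here is a toy "predict the average" algorithm written without surprise so it runs on its own; the real version would subclass `AlgoBase` and read ratings from `self.trainset`:

```python
# Toy version of the "predict the average" algorithm sketched in the docs.
# Standalone stand-in: a real implementation would subclass AlgoBase and
# iterate over self.trainset.all_ratings() instead of a plain list.

class GlobalMeanAlgo:
    def train(self, ratings):
        """ratings: iterable of (user, item, rating) triples."""
        values = [r for _, _, r in ratings]
        self.the_mean = sum(values) / len(values)

    def estimate(self, u, i):
        # Every (user, item) pair gets the same prediction: the global mean.
        return self.the_mean


algo = GlobalMeanAlgo()
algo.train([('u1', 'i1', 4), ('u1', 'i2', 2), ('u2', 'i1', 3)])
print(algo.estimate('u2', 'i2'))  # 3.0
```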
1 change: 0 additions & 1 deletion doc/source/dataset.rst
@@ -6,4 +6,3 @@ dataset module
 .. automodule:: surprise.dataset
     :members:
     :exclude-members: BuiltinDataset, read_ratings, DatasetUserFolds,
-                      parse_line
8 changes: 4 additions & 4 deletions doc/source/getting_started.rst
@@ -47,7 +47,7 @@ dataset:
 - or if your dataset is already split into predefined folds, you can specify a
   list of files for training and testing.

-Either way, you will need to define a :class:`Reader <surprise.dataset.Reader>`
+Either way, you will need to define a :class:`Reader <surprise.reader.Reader>`
 object for `Surprise <https://nicolashug.github.io/Surprise/>`_ to be able to
 parse the file(s) or the dataframe. We'll see now how to handle both cases.

@@ -65,7 +65,7 @@ Load an entire dataset from a file or a dataframe
     :lines: 17-26

 For more details about readers and how to use them, see the :class:`Reader
-class <surprise.dataset.Reader>` documentation.
+class <surprise.reader.Reader>` documentation.

 .. note::
     As you already know from the previous section, the Movielens-100k dataset
@@ -76,7 +76,7 @@ Load an entire dataset from a file or a dataframe

 - To load a dataset from a pandas dataframe, you will need the
   :meth:`load_from_df() <surprise.dataset.Dataset.load_from_df>` method. You
-  will also need a :class:`Reader<surprise.dataset.Reader>` object, but only
+  will also need a :class:`Reader<surprise.reader.Reader>` object, but only
   the ``rating_scale`` parameter must be specified. The dataframe must have
   three columns, corresponding to the user (raw) ids, the item (raw) ids, and
   the ratings in this order. Each row thus corresponds to a given rating. This
@@ -241,7 +241,7 @@ performing cross-validation (i.e. there is no test set).
 The latter is pretty straightforward: all you need is to load a dataset, and
 the :meth:`build_full_trainset()
 <surprise.dataset.DatasetAutoFolds.build_full_trainset>` method to build the
-:class:`trainset <surprise.dataset.Trainset>` and train you algorithm:
+:class:`trainset <surprise.trainset.Trainset>` and train you algorithm:

 .. literalinclude:: ../../examples/query_for_predictions.py
     :caption: From file ``examples/query_for_predictions.py``
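The `line_format` and `sep` parameters mentioned in these hunks drive how a Reader splits each line of a ratings file. A self-contained sketch of that parsing job (a hypothetical helper, not `surprise.reader.Reader` itself):

```python
# Sketch of the parsing work a Reader does, driven by the line_format and
# sep parameters from the docs. Hypothetical helper, not surprise's code.

def parse_line(line, line_format='user item rating timestamp', sep='\t'):
    fields = line.rstrip('\n').split(sep)
    names = line_format.split()
    if len(fields) != len(names):
        raise ValueError('line does not match line_format')
    record = dict(zip(names, fields))
    # A Reader ultimately yields (user, item, rating) for each line.
    return record['user'], record['item'], float(record['rating'])


# A line in the ml-100k tab-separated format:
uid, iid, rating = parse_line('196\t242\t3\t881250949')
print(uid, iid, rating)  # 196 242 3.0
```

With a dataframe, by contrast, only `rating_scale` is needed because there is no text to split: the three columns already are the user ids, item ids, and ratings.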
2 changes: 2 additions & 0 deletions doc/source/index.rst
@@ -44,5 +44,7 @@ to contribute and send pull requests (see `GitHub page
    similarities
    accuracy
    dataset
+   trainset
+   reader
    evaluate
    dump
9 changes: 9 additions & 0 deletions doc/source/reader.rst
@@ -0,0 +1,9 @@
+.. _reader:
+
+Reader class
+============
+
+.. autoclass:: surprise.reader.Reader
+    :members:
+    :exclude-members: parse_line
+
7 changes: 7 additions & 0 deletions doc/source/trainset.rst
@@ -0,0 +1,7 @@
+.. _trainset:
+
+Trainset class
+==============
+
+.. autoclass:: surprise.Trainset
+    :members:
6 changes: 3 additions & 3 deletions surprise/__init__.py
@@ -17,9 +17,9 @@
 from .prediction_algorithms import Prediction

 from .dataset import Dataset
-from .dataset import Reader
-from .dataset import Trainset
-from .dataset import get_dataset_dir
+from .reader import Reader
+from .trainset import Trainset
+from .builtin_datasets import get_dataset_dir
 from .evaluate import evaluate
 from .evaluate import print_perf
 from .evaluate import GridSearch
8 changes: 4 additions & 4 deletions surprise/__main__.py
@@ -22,7 +22,7 @@
 from surprise.prediction_algorithms import CoClustering
 import surprise.dataset as dataset
 from surprise.dataset import Dataset
-from surprise.dataset import Reader  # noqa
+from surprise.builtin_datasets import get_dataset_dir
 from surprise.evaluate import evaluate
 from surprise import __version__

@@ -137,11 +137,11 @@ def error(self, message):
                         default=None,
                         help='Where to dump the files. Ignored if ' +
                         'with-dump is not set. Default is ' +
-                        os.path.join(dataset.get_dataset_dir(), 'dumps/')
+                        os.path.join(get_dataset_dir(), 'dumps/')
                         )

     parser.add_argument('--clean', dest='clean', action='store_true',
-                        help='Remove the ' + dataset.get_dataset_dir() +
+                        help='Remove the ' + get_dataset_dir() +
                         ' directory and exit.'
                         )

@@ -151,7 +151,7 @@ def error(self, message):
     args = parser.parse_args()

     if args.clean:
-        folder = dataset.get_dataset_dir()
+        folder = get_dataset_dir()
         shutil.rmtree(folder)
         print('Removed', folder)
         exit()
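The `--clean` hunks above parse a boolean flag and wipe the dataset directory. A standalone sketch of the same pattern, pointed at a throwaway temporary directory (not the real `~/.surprise_data`) so it is safe to run:

```python
# Standalone sketch of the --clean flag wired up in __main__.py above:
# parse a store_true option, then remove a target directory.
# Uses a temp dir as a stand-in for get_dataset_dir().

import argparse
import os
import shutil
import tempfile

def make_parser(default_dir):
    parser = argparse.ArgumentParser(description='sketch of the surprise CLI')
    parser.add_argument('--clean', dest='clean', action='store_true',
                        help='Remove the ' + default_dir +
                        ' directory and exit.')
    return parser

data_dir = tempfile.mkdtemp()   # stand-in for get_dataset_dir()
args = make_parser(data_dir).parse_args(['--clean'])
if args.clean:
    shutil.rmtree(data_dir)
    print('Removed', data_dir)

print(os.path.exists(data_dir))  # False
```

Note a subtlety the real code inherits: because the help string calls `get_dataset_dir()` eagerly, merely building the parser touches the dataset directory.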
66 changes: 66 additions & 0 deletions surprise/builtin_datasets.py
@@ -0,0 +1,66 @@
+from six.moves.urllib.request import urlretrieve
+import zipfile
+from collections import namedtuple
+import os
+from os.path import join
+
+
+def get_dataset_dir():
+    '''Return folder where downloaded datasets and other data are stored.
+    Default folder is ~/.surprise_data/, but it can also be set by the
+    environment variable ``SURPRISE_DATA_FOLDER``.
+    '''
+
+    folder = os.environ.get('SURPRISE_DATA_FOLDER', os.path.expanduser('~') +
+                            '/.surprise_data/')
+    if not os.path.exists(folder):
+        os.makedirs(folder)
+
+    return folder
+
+
+# a builtin dataset has
+# - an url (where to download it)
+# - a path (where it is located on the filesystem)
+# - the parameters of the corresponding reader
+BuiltinDataset = namedtuple('BuiltinDataset', ['url', 'path', 'reader_params'])
+
+BUILTIN_DATASETS = {
+    'ml-100k':
+        BuiltinDataset(
+            url='http://files.grouplens.org/datasets/movielens/ml-100k.zip',
+            path=join(get_dataset_dir(), 'ml-100k/ml-100k/u.data'),
+            reader_params=dict(line_format='user item rating timestamp',
+                               rating_scale=(1, 5),
+                               sep='\t')
+        ),
+    'ml-1m':
+        BuiltinDataset(
+            url='http://files.grouplens.org/datasets/movielens/ml-1m.zip',
+            path=join(get_dataset_dir(), 'ml-1m/ml-1m/ratings.dat'),
+            reader_params=dict(line_format='user item rating timestamp',
+                               rating_scale=(1, 5),
+                               sep='::')
+        ),
+    'jester':
+        BuiltinDataset(
+            url='http://eigentaste.berkeley.edu/dataset/jester_dataset_2.zip',
+            path=join(get_dataset_dir(), 'jester/jester_ratings.dat'),
+            reader_params=dict(line_format='user item rating',
+                               rating_scale=(-10, 10))
+        )
+}
+
+
+def download_builtin_dataset(name, dataset):
+
+    print('Trying to download dataset from ' + dataset.url + '...')
+    tmp_file_path = join(get_dataset_dir(), 'tmp.zip')
+    urlretrieve(dataset.url, tmp_file_path)
+
+    with zipfile.ZipFile(tmp_file_path, 'r') as tmp_zip:
+        tmp_zip.extractall(join(get_dataset_dir(), name))
+
+    os.remove(tmp_file_path)
+    print('Done! Dataset', name, 'has been saved to',
+          join(get_dataset_dir(), name))

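The `SURPRISE_DATA_FOLDER` override introduced in `get_dataset_dir()` is easy to check in isolation. The function body below is copied from the diff, but redefined standalone and pointed at a temporary directory, so running this does not create the real `~/.surprise_data`:

```python
# Demonstrates the SURPRISE_DATA_FOLDER lookup used by get_dataset_dir()
# above, re-implemented standalone and aimed at a temp dir so the real
# home directory is left untouched.

import os
import tempfile

def get_dataset_dir():
    folder = os.environ.get('SURPRISE_DATA_FOLDER',
                            os.path.expanduser('~') + '/.surprise_data/')
    if not os.path.exists(folder):
        os.makedirs(folder)
    return folder

override = os.path.join(tempfile.mkdtemp(), 'my_data')
os.environ['SURPRISE_DATA_FOLDER'] = override

folder = get_dataset_dir()
print(folder == override)     # True: the env var wins over the default
print(os.path.isdir(folder))  # True: the folder is created on first use
```

This is also why `BUILTIN_DATASETS` paths pick up the override: each `path=join(get_dataset_dir(), ...)` entry calls the function at import time.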