FAQ

You will find here the Frequently Asked Questions, as well as some other use-case examples that are not part of the User Guide.

How to get the top-N recommendations for each user

Here is an example where we retrieve the top-10 items with the highest rating prediction for each user in the MovieLens-100k dataset. We first train an SVD algorithm on the whole dataset, and then predict all the ratings for the pairs (user, item) that are not in the training set. We then retrieve the top-10 predictions for each user.

../../examples/top_n_recommendations.py
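
If you only need the gist of it, here is a condensed sketch of the linked example (assuming the built-in ml-100k dataset; on older Surprise versions, fit() is called train())::

    from collections import defaultdict

    from surprise import SVD, Dataset


    def get_top_n(predictions, n=10):
        """Map each user to their n items with the highest estimated rating."""
        top_n = defaultdict(list)
        for uid, iid, true_r, est, _ in predictions:
            top_n[uid].append((iid, est))
        for uid, user_ratings in top_n.items():
            user_ratings.sort(key=lambda x: x[1], reverse=True)
            top_n[uid] = user_ratings[:n]
        return top_n


    data = Dataset.load_builtin('ml-100k')
    trainset = data.build_full_trainset()
    algo = SVD()
    algo.fit(trainset)  # algo.train(trainset) on older versions

    # Predict ratings for all (user, item) pairs that are NOT in the training set.
    testset = trainset.build_anti_testset()
    predictions = algo.test(testset)

    top_n = get_top_n(predictions, n=10)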

How to get the k nearest neighbors of a user (or item)

You can use the get_neighbors() <surprise.prediction_algorithms.algo_base.AlgoBase.get_neighbors> method of the algorithm object. This is only relevant for algorithms that use a similarity measure, such as the k-NN algorithms <pred_package_knn_inpired>.

Here is an example where we retrieve the 10 nearest neighbors of the movie Toy Story from the MovieLens-100k dataset. The output is::

    The 10 nearest neighbors of Toy Story are:
    Beauty and the Beast (1991)
    Raiders of the Lost Ark (1981)
    That Thing You Do! (1996)
    Lion King, The (1994)
    Craft, The (1996)
    Liar Liar (1997)
    Aladdin (1992)
    Cool Hand Luke (1967)
    Winnie the Pooh and the Blustery Day (1968)
    Indiana Jones and the Last Crusade (1989)

There's a lot of boilerplate because of the conversions between movie names and their raw/inner ids (see this note <raw_inner_note>), but it all boils down to the use of get_neighbors() <surprise.prediction_algorithms.algo_base.AlgoBase.get_neighbors>:

../../examples/k_nearest_neighbors.py

Naturally, the same can be done for users with minor modifications.
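For reference, a minimal sketch of the item-based case (assuming '1' is Toy Story's raw id in the ml-100k ratings file, and using the current fit() API) could look like::

    from surprise import KNNBaseline, Dataset

    data = Dataset.load_builtin('ml-100k')
    trainset = data.build_full_trainset()

    # Use an item-based similarity so that neighbors are items, not users.
    sim_options = {'name': 'pearson_baseline', 'user_based': False}
    algo = KNNBaseline(sim_options=sim_options)
    algo.fit(trainset)  # algo.train(trainset) on older versions

    # get_neighbors() works with inner ids, so convert the raw id first.
    toy_story_inner_id = trainset.to_inner_iid('1')  # '1' assumed to be Toy Story's raw id
    neighbors_inner_ids = algo.get_neighbors(toy_story_inner_id, k=10)
    neighbors_raw_ids = [trainset.to_raw_iid(inner_id) for inner_id in neighbors_inner_ids]

    # For user neighbors, set 'user_based': True and use to_inner_uid()/to_raw_uid() instead.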

How to serialize an algorithm

Prediction algorithms can be serialized and loaded back using the dump() <surprise.dump.dump> and load() <surprise.dump.load> functions. Here is a small example where the SVD algorithm is trained on a dataset and serialized. It is then reloaded and can be used again for making predictions:

../../examples/serialize_algorithm.py
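
In short, dumping and reloading boils down to something like this (a sketch assuming the ml-100k dataset and a dump file in the home directory)::

    import os

    from surprise import SVD, Dataset, dump

    data = Dataset.load_builtin('ml-100k')
    trainset = data.build_full_trainset()
    algo = SVD()
    algo.fit(trainset)  # algo.train(trainset) on older versions
    predictions = algo.test(trainset.build_testset())

    # Dump the algorithm together with its predictions.
    file_name = os.path.expanduser('~/dump_file')
    dump.dump(file_name, predictions=predictions, algo=algo)

    # ... later on, reload everything and keep predicting.
    loaded_predictions, loaded_algo = dump.load(file_name)
    pred = loaded_algo.predict(uid='196', iid='302')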

Algorithms can be serialized along with their predictions, so that they can be further analyzed or compared with other algorithms using pandas dataframes. Some examples are given in the two following notebooks:

How to build my own prediction algorithm

There's a whole guide here <building_custom_algo>.

What are raw and inner ids

See this note <raw_inner_note>.

Can I use my own dataset with Surprise

Yes, you can. See the user guide <load_custom>.

How to tune an algorithm's parameters

You can tune the parameters of an algorithm with the GridSearch <surprise.evaluate.GridSearch> class as described here <tuning_algorithm_parameters>. After the tuning, you may want to have an unbiased estimate of your algorithm's performance <unbiased_estimate_after_tuning>.
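
As a rough sketch, a grid search over a few SVD parameters could look like the following (this assumes the GridSearch class referenced above; recent Surprise versions replace it with GridSearchCV from surprise.model_selection, whose usage differs slightly)::

    from surprise import SVD, Dataset
    from surprise.evaluate import GridSearch

    # Candidate values for each parameter we want to tune.
    param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005], 'reg_all': [0.4, 0.6]}
    grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])

    data = Dataset.load_builtin('ml-100k')
    data.split(n_folds=3)  # with this API, cross-validation folds are defined on the dataset
    grid_search.evaluate(data)

    print(grid_search.best_score['RMSE'])
    print(grid_search.best_params['RMSE'])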

How to get accuracy measures on the training set

You can use the build_testset() <surprise.dataset.Trainset.build_testset()> method of the Trainset <surprise.dataset.Trainset> object to build a testset that can then be used with the test() <surprise.prediction_algorithms.algo_base.AlgoBase.test> method:

../../examples/evaluate_on_trainset.py

Check out the example file for more usage examples.
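
The core of it is just a few lines (a sketch assuming the ml-100k dataset and the current fit() API)::

    from surprise import SVD, Dataset, accuracy

    data = Dataset.load_builtin('ml-100k')
    trainset = data.build_full_trainset()
    algo = SVD()
    algo.fit(trainset)  # algo.train(trainset) on older versions

    # Evaluate on the very same ratings the algorithm was trained on.
    trainset_as_testset = trainset.build_testset()
    predictions = algo.test(trainset_as_testset)
    accuracy.rmse(predictions)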

How to save some data for unbiased accuracy estimation

If your goal is to tune the parameters of an algorithm, you may want to set aside some data to get an unbiased estimate of its performance. For instance, you may want to split your data into two sets A and B: A is used for parameter tuning with grid search, and B is used for the unbiased estimation. This can be done as follows:

../../examples/split_data_for_unbiased_estimation.py
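
The idea is simply to shuffle the raw ratings and split the list, which could be sketched as follows (the 90/10 split and the use of SVD are arbitrary choices here)::

    import random

    from surprise import SVD, Dataset, accuracy

    data = Dataset.load_builtin('ml-100k')
    raw_ratings = data.raw_ratings  # list of (user, item, rating, timestamp) tuples
    random.shuffle(raw_ratings)

    # A = 90% of the data for tuning, B = 10% held out for the final estimate.
    threshold = int(.9 * len(raw_ratings))
    A_raw_ratings = raw_ratings[:threshold]
    B_raw_ratings = raw_ratings[threshold:]

    data.raw_ratings = A_raw_ratings  # data is now set A

    # ... tune the algorithm's parameters on A (e.g. with a grid search) here ...

    algo = SVD()  # would use the best parameters found on A
    algo.fit(data.build_full_trainset())  # retrain on the whole of A

    # Evaluate on B, which the algorithm has never seen.
    testset = data.construct_testset(B_raw_ratings)
    predictions = algo.test(testset)
    accuracy.rmse(predictions)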