Nearest Neighbours Recommendations #14

Merged
benfred merged 2 commits into master from nearest_neighbours on Feb 12, 2017

benfred commented Dec 27, 2016

This adds a fast and memory-efficient implementation of Item-Item KNN recommendation models.

The similarity matrix calculation is based on the algorithm described in the
paper 'Sparse Matrix Multiplication Package (SMMP)'
(www.i2m.univ-amu.fr/~bradji/multp_sparse.pdf), but modified so that only the
top K entries of each row are kept, selected using a heap. This means that we
can calculate the similarity matrix even when the full similarity matrix
wouldn't fit in available memory. Unlike the sparse matrix multiply in scipy,
this calculation is also parallelized.
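
As a rough illustration of the idea (not the code from this PR), here is a minimal pure-Python sketch of a row-by-row sparse multiply that keeps only the K largest entries of each output row in a bounded min-heap. The names `to_csr` and `top_k_product` are hypothetical, the matrices are plain CSR triples rather than scipy objects, and a dict stands in for the accumulator that SMMP uses. Each output row is computed independently of the others, which is what makes this kind of calculation easy to parallelize.

```python
import heapq


def to_csr(dense):
    """Convert a dense list-of-rows matrix into CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                indices.append(col)
                data.append(value)
        indptr.append(len(indices))
    return indptr, indices, data


def top_k_product(a, b, k):
    """Multiply CSR matrices a and b, keeping only the k largest entries per row."""
    a_indptr, a_indices, a_data = a
    b_indptr, b_indices, b_data = b
    result = []
    for i in range(len(a_indptr) - 1):
        # Accumulate row i of the product; only nonzero columns are touched.
        sums = {}
        for p in range(a_indptr[i], a_indptr[i + 1]):
            u, w = a_indices[p], a_data[p]
            for q in range(b_indptr[u], b_indptr[u + 1]):
                j = b_indices[q]
                sums[j] = sums.get(j, 0.0) + w * b_data[q]

        # Min-heap holding at most k entries: the smallest kept score is heap[0].
        heap = []
        for j, score in sums.items():
            if len(heap) < k:
                heapq.heappush(heap, (score, j))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, j))
        # Store the kept (score, column) pairs, best first.
        result.append(sorted(heap, reverse=True))
    return result


# Toy item-user matrix: rows are items, columns are users (made-up weights).
item_users = [
    [2, 1, 0],
    [1, 2, 1],
    [0, 0, 3],
]
items = to_csr(item_users)
users = to_csr(list(zip(*item_users)))  # the transpose, as a user-item matrix

# Item-item similarity as (item_users x item_users^T), truncated to 2 per row.
neighbours = top_k_product(items, users, k=2)
```

Memory use per row is bounded by the number of nonzero products for that row plus K heap entries, so the full item-item matrix is never materialized.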

Also switch to using C++ instead of C for Cython, run flake8 on the Cython code,
add an isort check and cpplint check, and fix some issues with the ALS unittest
intermittently failing.

benfred commented Dec 27, 2016

still todo:

  • parallelize calculation
  • add scorer class
  • example usage
  • add save/load to scorer

benfred added some commits Dec 27, 2016

Nearest Neighbours Recommendations

benfred force-pushed the nearest_neighbours branch from 2230bc6 to bc548a0 Feb 6, 2017

benfred changed the title from "first draft nearest neighbours code" to "Nearest Neighbours Recommendations" Feb 6, 2017

benfred merged commit f5a3cdc into master Feb 12, 2017

4 checks passed

  • continuous-integration/appveyor/branch: AppVeyor build succeeded
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • continuous-integration/travis-ci/push: The Travis CI build passed

benfred deleted the nearest_neighbours branch Feb 12, 2017

chapleau commented Feb 14, 2017

Thanks for providing this very neat package.
I was just wondering whether, from a performance point of view, going from C to C++ for Cython makes a significant improvement? Are the APIs/functions backward compatible?
Thanks!

benfred commented Feb 14, 2017

Performance should be identical between C++ and C.

The APIs and functions are also compatible from Python. I changed to C++ mainly to use the heap functions provided with the STL: https://github.com/benfred/implicit/blob/master/implicit/nearest_neighbours.h#L21
