All benchmarks reported here were run on an Intel i7-7820X CPU. GPU benchmarks were run on an NVIDIA GTX 1080 Ti.
The benchmark_spark.py script compares the AlternatingLeastSquares model in implicit to the implementation in Spark MLlib.
To run this comparison, you should first compile Spark with native BLAS support.
This benchmark compares the Conjugate Gradient solver in implicit, on both the CPU and the GPU, to the Cholesky solver used in Spark.
The reported time per iteration is the average over 5 iterations.
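For readers unfamiliar with the solver being benchmarked, here is a minimal textbook sketch of conjugate gradient on a small symmetric positive definite system. It is a generic illustration in pure Python, not the code used in implicit or Spark; the function name `conjugate_gradient` and its `steps` and `tol` parameters are assumptions made for this example.

```python
def conjugate_gradient(A, b, steps=3, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive definite A.

    Hypothetical helper for illustration only: runs at most `steps`
    conjugate gradient updates starting from x = 0.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # initial search direction
    rs_old = sum(v * v for v in r)
    for _ in range(steps):
        if rs_old < tol:  # residual is already negligible
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

A Cholesky solver factorizes the matrix and solves the system exactly, while conjugate gradient refines an approximate solution with a few cheap matrix-vector products, so the two approaches trade accuracy per step against cost per step.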
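The averaging described above can be sketched as follows. This is a minimal sketch and not the code in benchmark_spark.py: `average_iteration_time` and `run_iteration` are hypothetical names, where `run_iteration` stands for whatever callable performs one training iteration.

```python
import time


def average_iteration_time(run_iteration, iterations=5):
    """Return the mean wall-clock time in seconds of `iterations` calls.

    `run_iteration` is a hypothetical zero-argument callable that
    performs a single training iteration.
    """
    elapsed = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_iteration()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)
```

Timing each iteration separately, rather than timing the whole run and dividing, also makes it possible to inspect the per-iteration spread if needed.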
last.fm 360k dataset
For the last.fm dataset at 256 factors, implicit on the CPU is 30x faster than Spark and the GPU version of implicit is 93x faster than Spark:
MovieLens 20M dataset
For the ml20m dataset at 256 factors, implicit on the CPU is 8x faster than Spark, while the GPU version is 68x faster than Spark:
Note that for all versions this dataset was filtered down to positive reviews (4+ stars), to simulate a truly implicit dataset.
Implicit on the CPU seems to suffer a bit here relative to the other options. There may be a single-threaded bottleneck somewhere that is worth examining later.
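The filtering step can be illustrated with a small sketch. It assumes ratings are available as `(user, item, rating)` tuples; the function name `keep_positive` and the choice to replace each kept rating with a constant confidence of 1.0 are assumptions for this example, not necessarily what the benchmark script does.

```python
def keep_positive(ratings, threshold=4.0):
    """Keep only interactions rated at or above `threshold`.

    `ratings` is an iterable of (user, item, rating) tuples. Each kept
    interaction is given a confidence of 1.0 (an assumption made for
    this sketch), so the star values themselves are discarded.
    """
    return [
        (user, item, 1.0)
        for user, item, rating in ratings
        if rating >= threshold
    ]
```

For example, `keep_positive([(0, 10, 5.0), (0, 11, 2.5)])` keeps only the first interaction, since 2.5 is below the 4-star threshold.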