ReplacingSparseVectors
Sparse vectors are confusing.
LensKit would be easier to learn with more normal data structures.
In DetailedResults, there is a proposal to change how we report recommendation and scoring results to remove the external need for sparse vectors. This page discusses removing the internal need.
Rather than having our weird SparseVector
, we can do something else:
- Use a normal mapping, such as Fastutil
Long2DoubleMap
or HPPC'sLongDoubleMap
interface in LensKit's internals. - Provide a package of utility functions for statistical and algebraic operations, such as Euclidean lengths and dot products.
- Provide an implementation of the long-double map using our sorted array strategy, to provide very good memory use and performance characteristics.
- Special-case our statistical and algebraic operations so that they are optimized for use with our sorted-array implementation.
Then the code will be implemented in terms of familiar abstractions (maps), but can still be very fast by using custom maps and map builders and optimized statistical operations.
We can stay with Fastutil
Michael is also interested in considering HPPC. HPPC forgoes Java collections compatibility and some other run-time safety checks in favor of increased performance, and simplified, focused interfaces.
We would want to re-introduce JCF compatibility through wrapper classes, but that is not hard.
HPPC is much smaller than Fastutil, and also has some useful operations (e.g. add
methods on its LongDoubleMap
) that make them a very appealing replacement for sparse vectors.