Skip to content
This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

ReplacingSparseVectors

Michael Ekstrand edited this page Feb 27, 2015 · 1 revision

Sparse vectors are confusing.

LensKit would be easier to learn with more normal data structures.

In DetailedResults, there is a proposal to change how we report recommendation and scoring results to remove the external need for sparse vectors. This page discusses removing the internal need.

Basic Idea

Rather than having our weird SparseVector, we can do something else:

  • Use a normal mapping, such as Fastutil Long2DoubleMap or HPPC's LongDoubleMap interface in LensKit's internals.
  • Provide a package of utility functions for statistical and algebraic operations, such as Euclidean lengths and dot products.
  • Provide an implementation of the long-double map using our sorted array strategy, to provide very good memory use and performance characteristics.
  • Special-case our statistical and algebraic operations so that they are optimized for use with our sorted-array implementation.

Then the code will be implemented in terms of familiar abstractions (maps), but can still be very fast by using custom maps and map builders and optimized statistical operations.

Implementation

We can stay with Fastutil

Michael is also interested in considering HPPC. HPPC forgoes Java collections compatibility and some other run-time safety checks in favor of increased performance, and simplified, focused interfaces.

We would want to re-introduce JCF compatibility through wrapper classes, but that is not hard.

HPPC is much smaller than Fastutil, and also has some useful operations (e.g. add methods on its LongDoubleMap) that make them a very appealing replacement for sparse vectors.