Creating Algorithms
This is a draft of a documentation page about how to create algorithms for LensKit. It is probably full of errors and lies.
So you want to create an algorithm for LensKit. Great!
But where do you start?
LensKit algorithms can be complex beasts. But it is very possible — and highly recommended — to build a simple algorithm implementation and let it grow as needed.
When we are building a LensKit algorithm, what we are often trying to do is define a new way to compute scores for items. There are other interesting types of algorithms to create, such as new ways of producing recommendation lists that don't just rank items by score, but even a lot of those reduce to a method for adjusting item scores. And once we've built a new scoring method, it will be a lot easier to build other kinds of algorithms.
LensKit computes personalized scores for items with item scorers. These scorers are defined by classes implementing the ItemScorer interface.
So what we will do is create a new implementation of ItemScorer, which can then be used to score items; these scores can then be turned into rating predictions by the rating predictor or recommendation lists by the top-N item recommender, and all we have to do is create one class.
To get started, we'll create a simple scorer that scores items using the user's mean rating. Yes, this kind of scorer already exists in LensKit, but we'll write a new one to see how it works.
The ItemScorer interface has three score methods to make it easier for applications to compute scores in different contexts. But they should all implement the same logic; AbstractItemScorer makes this easy by letting us implement just one method:
    package org.grouplens.demo;

    import org.grouplens.lenskit.basic.AbstractItemScorer;
    import org.grouplens.lenskit.vectors.*;

    public class CustomItemScorer extends AbstractItemScorer {
        @Override
        public void score(long user, MutableSparseVector scores) {
            // TODO Compute the scores for items and put them in scores
        }
    }
The other two score methods will automatically be implemented by calling our score method. Now, this method is a little strange: it takes just two parameters and returns nothing. The first parameter is a user ID: the user for whom we want to compute scores. Since we are building a recommender, not just a popular movies list, we probably want there to be some personalization, and the user ID tells the scorer which user we are trying to personalize for.
The scores parameter is a vector. It is used for both input and output in this method; this is somewhat unconventional Java, but it lets us integrate multiple item scorers efficiently (and it's perfectly normal Fortran, for what it's worth). Sparse vectors have a key domain: a set of IDs for which they can contain values. The caller of this method will have created the vector so that its key domain is the items for which it wants scores. The job of our score method, therefore, is to compute scores for all the items and put them in the vector. The caller will extract them from the vector and do something useful with them.
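As a rough illustration of this calling convention, here is a sketch in plain Java that does not use LensKit at all. The names SketchScorer and ConstantSketchScorer are made up for this example, and a Map whose keys are filled in ahead of time stands in for a sparse vector's key domain; LensKit's real classes differ in detail. It shows how two convenience methods can be derived from a single in/out method.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an abstract scorer base class; NOT LensKit's API.
// The map's keys play the role of the key domain; a null value means "unset".
abstract class SketchScorer {
    // The one method a subclass implements: fill in a score for every key.
    abstract void score(long user, Map<Long, Double> scores);

    // Convenience method for a set of items, derived from the in/out method.
    Map<Long, Double> score(long user, Collection<Long> items) {
        Map<Long, Double> scores = new LinkedHashMap<>();
        for (long item : items) {
            scores.put(item, null); // key present, value not yet set
        }
        score(user, scores);
        return scores;
    }

    // Convenience method for one item, also derived from the in/out method.
    double score(long user, long item) {
        Map<Long, Double> scores = new LinkedHashMap<>();
        scores.put(item, null);
        score(user, scores);
        Double s = scores.get(item);
        return s == null ? Double.NaN : s;
    }
}

// A trivial scorer that gives every requested item the same score.
class ConstantSketchScorer extends SketchScorer {
    @Override
    void score(long user, Map<Long, Double> scores) {
        for (Map.Entry<Long, Double> e : scores.entrySet()) {
            e.setValue(3.5);
        }
    }
}
```

The subclass only ever sees the in/out form: it is handed a container that already lists the items to score, and its whole job is to fill in the values.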
We want to fill the output vector with something. To do that, we'll loop over its entries and set them:
    public void score(long user, MutableSparseVector scores) {
        double mean = 0; // TODO compute the user's mean
        for (VectorEntry e: scores.fast(VectorEntry.State.EITHER)) {
            scores.set(e, mean);
        }
    }
This loop iterates over all the entries (key-value pairs) in the vector. The fast method is a performance optimization: it means that each vector entry is only valid within the loop body (we can't save vector entries somewhere). The parameter, EITHER, tells the sparse vector that we want to iterate over all the keys, not just the ones that have values. Usually, the score vector will be empty, but have a key domain full of items to score. The sparse vector documentation says more about this. For now, just write the loop like the example, and don't put the vector entries in a list or something like that.
The scores.set method sets the value for the entry. In this case, we set it to the user's mean rating, which we don't yet know how to compute.
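To see why entries from a fast iteration should not be stored, it may help to look at one common way such an optimization is built: the iterator hands out the same entry object on every step and just overwrites its contents. The sketch below is plain Java with made-up classes (ReusedEntry, FastEntries, FastIterationDemo). It is not LensKit's implementation, and whether LensKit works exactly this way internally is an assumption here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable entry; NOT LensKit's VectorEntry.
class ReusedEntry {
    long key;
    double value;
}

// Hypothetical "fast" iterable: every call to next() returns the same object.
class FastEntries implements Iterable<ReusedEntry> {
    private final long[] keys;
    private final double[] values;

    FastEntries(long[] keys, double[] values) {
        this.keys = keys;
        this.values = values;
    }

    @Override
    public Iterator<ReusedEntry> iterator() {
        return new Iterator<ReusedEntry>() {
            private final ReusedEntry entry = new ReusedEntry(); // allocated once
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < keys.length;
            }

            @Override
            public ReusedEntry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                entry.key = keys[pos];
                entry.value = values[pos];
                pos++;
                return entry; // same object, new contents
            }
        };
    }
}

class FastIterationDemo {
    // Wrong: the list ends up holding one object several times over.
    static List<ReusedEntry> saveEntries(FastEntries entries) {
        List<ReusedEntry> saved = new ArrayList<>();
        for (ReusedEntry e : entries) {
            saved.add(e);
        }
        return saved;
    }

    // Right: copy the data we need out of the entry inside the loop body.
    static List<Long> saveKeys(FastEntries entries) {
        List<Long> saved = new ArrayList<>();
        for (ReusedEntry e : entries) {
            saved.add(e.key);
        }
        return saved;
    }
}
```

With keys 1, 2, 3, every element returned by saveEntries reports key 3, because they are all the same object in its final state, while saveKeys yields 1, 2, 3 as intended.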
In order to compute the user's mean rating, we're going to need their rating history. However, all we have is a user ID! How do we get their ratings?
We can get user data from the UserEventDAO (data access object). But first we have to get one of those.
LensKit algorithms are organized around components. Our item scorer is one such component: a fairly simple component, and one that doesn't work yet, but it is a component.
Algorithm components can depend on other components. This is what enables LensKit's flexible configuration capabilities. As it turns out, the user event DAO is also a component. And getting components is easy: just ask for them!
Components ask for other components in their constructors. So let's add a constructor, and a field to store the component we get:
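    // Field to store the user event DAO component
    private final UserEventDAO userEventDAO;

    // Needs javax.inject.Inject (and UserEventDAO) imported at the top of the file
    @Inject
    public CustomItemScorer(UserEventDAO dao) {
        userEventDAO = dao;
    }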
That's it! There are several pieces here:
- The @Inject annotation basically just tells LensKit to use this constructor when it is creating an item scorer from our class. Even though this is the only constructor defined, we still have to provide @Inject since it isn't a default constructor. It makes things nice and explicit.
- The dao parameter establishes our dependency on the user event DAO. LensKit will look at the constructor, see that it needs a user event DAO, create one (or look up the one it already created), and pass it in to the constructor when it creates our item scorer.
- The field just lets us remember the component so that we can use it when computing scores.
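If constructor injection is new to you, it can help to see the same shape with the wiring done by hand. The sketch below is plain Java with no injection framework. SketchRatingDao, InMemoryRatingDao, and DependentScorer are made-up names for this example, not LensKit classes, and the explicit new calls stand in for the work that LensKit does for us when it sees @Inject.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DAO component; NOT LensKit's UserEventDAO.
interface SketchRatingDao {
    List<Double> ratingsForUser(long user);
}

// In-memory implementation of the hypothetical DAO.
class InMemoryRatingDao implements SketchRatingDao {
    private final Map<Long, List<Double>> ratings = new HashMap<>();

    void put(long user, Double... values) {
        ratings.put(user, Arrays.asList(values));
    }

    @Override
    public List<Double> ratingsForUser(long user) {
        // Unknown users simply have no ratings in this sketch.
        return ratings.getOrDefault(user, Collections.emptyList());
    }
}

// A component that declares its dependency as a constructor parameter.
class DependentScorer {
    private final SketchRatingDao dao;

    DependentScorer(SketchRatingDao dao) {
        this.dao = dao;
    }

    int ratingCount(long user) {
        return dao.ratingsForUser(user).size();
    }
}

class ManualWiringDemo {
    public static void main(String[] args) {
        // By hand: build the dependency first, then pass it to the component.
        InMemoryRatingDao dao = new InMemoryRatingDao();
        dao.put(7L, 4.0, 5.0);
        DependentScorer scorer = new DependentScorer(dao);
        System.out.println(scorer.ratingCount(7L)); // prints 2
    }
}
```

The component never decides which DAO it gets; it only states that it needs one. That is what lets the same scorer run against different data sources without any change to its own code.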
Now that we have a way to get the user's data, we can finish our score method:
    @Override
    public void score(long user, MutableSparseVector scores) {
        // get the user's ratings
        UserHistory<Rating> userRatings =
                userEventDAO.getEventsForUser(user, Rating.class);
        if (userRatings == null) {
            // the DAO may return null for an unknown user; nothing to average
            return;
        }
        // convert it to a vector
        SparseVector userVector = Ratings.userRatingVector(userRatings);
        // and finally take the mean
        double mean = userVector.mean();
        for (VectorEntry e: scores.fast(VectorEntry.State.EITHER)) {
            scores.set(e, mean);
        }
    }
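To check the arithmetic without setting up LensKit, here is the same idea in plain Java. UserMeanSketch is a made-up class for this example: a list of doubles stands in for the user's rating vector, and a Map with preset keys stands in for the score vector. Returning 0 for a user with no ratings is a choice made for this sketch, not a statement about what LensKit does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java version of the user-mean scorer; NOT LensKit code.
class UserMeanSketch {
    // Mean of the user's ratings; 0 when there are none (a choice for this sketch).
    static double mean(List<Double> ratings) {
        if (ratings.isEmpty()) {
            return 0;
        }
        double sum = 0;
        for (double r : ratings) {
            sum += r;
        }
        return sum / ratings.size();
    }

    // Give every requested item the user's mean rating as its score.
    static void score(List<Double> userRatings, Map<Long, Double> scores) {
        double mean = mean(userRatings);
        for (Map.Entry<Long, Double> e : scores.entrySet()) {
            e.setValue(mean);
        }
    }

    public static void main(String[] args) {
        // Items 101 and 102 are the ones the caller wants scored.
        Map<Long, Double> scores = new LinkedHashMap<>();
        scores.put(101L, null);
        scores.put(102L, null);
        score(Arrays.asList(3.0, 4.0, 5.0), scores);
        System.out.println(scores); // prints {101=4.0, 102=4.0}
    }
}
```

A user who rated three items 3, 4, and 5 has a mean of 4, so every item we ask about gets a score of 4. The scores are personalized to the user, but not yet to the item.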
to be written