This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

CreatingAlgorithms

Michael Ekstrand edited this page Aug 15, 2014 · 7 revisions

This is a draft of a documentation page about how to create algorithms for LensKit. It is probably full of errors and lies.

So you want to create an algorithm for LensKit. Great!

But where do you start?

LensKit algorithms can be complex beasts. But it is very possible — and highly recommended — to build a simple algorithm implementation and let it grow as needed.

Goal: Create an ItemScorer

When we are building a LensKit algorithm, what we are often trying to do is define a new way to compute scores for items. There are other interesting types of algorithms to create, such as new ways of producing recommendation lists that don't just rank items by score, but even a lot of those reduce to a method for adjusting item scores. And once we've built a new scoring method, it will be a lot easier to build other kinds of algorithms.

LensKit computes personalized scores for items with item scorers. These scorers are defined by classes implementing the ItemScorer interface.

So what we will do is create a new implementation of ItemScorer. Its scores can then be turned into rating predictions by the rating predictor or into recommendation lists by the top-N item recommender; all we have to do is write one class.
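To make this division of labor concrete, here is a minimal, stdlib-only sketch of what a consumer might do with a map of item scores. The helper names topN and clampToScale are hypothetical; they are not LensKit's rating predictor or top-N recommender, and only illustrate ranking items by score and turning a raw score into a bounded prediction, assuming a 1 to 5 rating scale.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ScoreConsumersSketch {
    // Hypothetical top-N helper: highest score first, ties broken by smaller item ID.
    static List<Long> topN(Map<Long, Double> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Double>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Hypothetical rating-prediction helper: clamp a raw score into [min, max].
    static double clampToScale(double score, double min, double max) {
        return Math.max(min, Math.min(max, score));
    }

    public static void main(String[] args) {
        Map<Long, Double> scores = new HashMap<>();
        scores.put(1L, 4.2);
        scores.put(2L, 3.1);
        scores.put(3L, 4.8);

        System.out.println(topN(scores, 2));             // [3, 1]
        System.out.println(clampToScale(5.7, 1.0, 5.0)); // 5.0
    }
}
```

The point of the sketch is that both consumers need nothing but scores, which is why writing one scorer is enough to get both predictions and recommendations.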

Creating the Class

To get started, we'll create a simple scorer that scores items using the user's mean rating. Yes, this kind of scorer already exists in LensKit, but we'll write a new one to see how it works.

The ItemScorer interface has three score methods to make it easier for applications to compute scores in different contexts. But they should all implement the same logic; AbstractItemScorer makes this easy by letting us implement just one method:

package org.grouplens.demo;

import org.grouplens.lenskit.basic.AbstractItemScorer;
import org.grouplens.lenskit.vectors.*;

public class CustomItemScorer extends AbstractItemScorer {
    @Override
    public void score(long user, MutableSparseVector scores) {
        // TODO Compute the scores for items and put them in scores
    }
}

The other two score methods will automatically be implemented by calling our score method. Now, this method is a little strange: it takes just two parameters and returns nothing. The first parameter is a user ID: the user for whom we want to compute scores. Since we are building a recommender, not just a popular movies list, we probably want some personalization, and the user ID tells the scorer which user we are personalizing for.
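Deriving several entry points from one abstract method is the classic template-method pattern. The stdlib-only sketch below shows the pattern with a hypothetical SimpleAbstractScorer base class that uses a Map in place of LensKit's MutableSparseVector; it illustrates the idea and is not AbstractItemScorer's actual code. Returning NaN for an unscored item is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TemplateScorerDemo {
    // Hypothetical simplified base class; NOT the LensKit API.
    // Subclasses implement one method that fills in scores for the requested items.
    abstract static class SimpleAbstractScorer {
        protected abstract void scoreInto(long user, Set<Long> items, Map<Long, Double> scores);

        // Single-item convenience method, derived from scoreInto.
        public double score(long user, long item) {
            Map<Long, Double> out = new HashMap<>();
            scoreInto(user, Collections.singleton(item), out);
            Double s = out.get(item);
            return s == null ? Double.NaN : s;
        }

        // Collection convenience method, also derived from scoreInto.
        public Map<Long, Double> score(long user, Collection<Long> items) {
            Map<Long, Double> out = new HashMap<>();
            scoreInto(user, new HashSet<>(items), out);
            return out;
        }
    }

    // A trivial subclass: every requested item gets the same constant score.
    static class ConstantScorer extends SimpleAbstractScorer {
        @Override
        protected void scoreInto(long user, Set<Long> items, Map<Long, Double> scores) {
            for (long item : items) {
                scores.put(item, 3.5);
            }
        }
    }

    public static void main(String[] args) {
        SimpleAbstractScorer scorer = new ConstantScorer();
        System.out.println(scorer.score(42L, 7L));                    // 3.5
        System.out.println(scorer.score(42L, Arrays.asList(1L, 2L))); // {1=3.5, 2=3.5}
    }
}
```

Because the subclass writes its logic once, the convenience methods can never disagree with it.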

The scores parameter is a vector. It is used for both input and output in this method; this is somewhat unconventional Java, but it lets us integrate multiple item scorers efficiently (and it's perfectly normal Fortran, for what it's worth). Sparse vectors have a key domain: a set of IDs for which they can contain values. The caller of this method will have created the vector so that its key domain is the items for which it wants scores. The job of our score method, therefore, is to compute scores for all the items and put them in the vector. The caller will extract them from the vector and do something useful with them.
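To make the key-domain idea concrete, here is a toy, stdlib-only model of a vector whose key domain is fixed when it is created and whose keys may or may not have values. ToyScoreVector and its methods are hypothetical; LensKit's MutableSparseVector has a much richer API, and returning NaN for unset keys is just a choice made here.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical toy model of a sparse vector with a fixed key domain; NOT LensKit's MutableSparseVector.
public class ToyScoreVector {
    private final long[] keys;      // sorted, distinct key domain (fixed at construction)
    private final double[] values;  // stored value per key; meaningful only where isSet is true
    private final boolean[] isSet;  // whether each key currently has a value

    public ToyScoreVector(Collection<Long> domain) {
        keys = domain.stream().mapToLong(Long::longValue).distinct().sorted().toArray();
        values = new double[keys.length];
        isSet = new boolean[keys.length];
    }

    /** The full key domain, whether or not values are set. */
    public long[] keyDomain() {
        return keys.clone();
    }

    /** Set a value; fails if the key is outside the key domain. */
    public void set(long key, double value) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0) {
            throw new IllegalArgumentException("key " + key + " is not in the key domain");
        }
        values[i] = value;
        isSet[i] = true;
    }

    /** True only if the key is in the domain and has a value. */
    public boolean containsKey(long key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 && isSet[i];
    }

    /** The value for a key, or NaN if it is unset or outside the domain. */
    public double get(long key) {
        return containsKey(key) ? values[Arrays.binarySearch(keys, key)] : Double.NaN;
    }

    /** Number of keys that currently have values. */
    public int size() {
        int n = 0;
        for (boolean b : isSet) {
            if (b) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The caller chooses which items it wants scored by choosing the key domain...
        ToyScoreVector scores = new ToyScoreVector(Arrays.asList(30L, 10L, 20L));
        System.out.println(scores.size()); // 0: three keys in the domain, no values yet

        // ...the scorer fills in values for keys in that domain...
        for (long item : scores.keyDomain()) {
            scores.set(item, 3.5);
        }

        // ...and the caller reads them back.
        System.out.println(scores.get(10L));         // 3.5
        System.out.println(scores.containsKey(99L)); // false: 99 is outside the domain
    }
}
```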

Filling the Output

We want to fill the output vector with something. To do that, we'll loop over its entries and set them:

public void score(long user, MutableSparseVector scores) {
    double mean = 0; // TODO compute the user's mean
    for (VectorEntry e: scores.fast(VectorEntry.State.EITHER)) {
        scores.set(e, mean);
    }
}

This loop iterates over all the entries (key-value pairs) in the vector. The fast method is a performance optimization: the vector may reuse a single entry object across iterations, so each vector entry is only valid within the loop body (we can't save vector entries somewhere). The parameter, EITHER, tells the sparse vector that we want to iterate over all the keys, not just the ones that have values. Usually, the score vector will be empty, but have a key domain full of items to score. The sparse vector documentation says more about this. For now, just write the loop like the example, and don't put the vector entries in a list or anything like that.
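Why must entries stay inside the loop body? A common way to make iteration fast is to reuse one mutable entry object instead of allocating a new one per element. The sketch below is a hypothetical, stdlib-only illustration of that flyweight style, not LensKit's implementation: using each entry inside the loop works, but saving entries in a list leaves you with several references to the same object, which holds whatever the last iteration wrote.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FastIterationDemo {
    // Hypothetical mutable entry object, reused across iterations.
    static final class Entry {
        long key;
        double value;
    }

    // Hypothetical "fast" iteration: one Entry is overwritten in place for every key.
    static Iterable<Entry> fast(long[] keys, double[] values) {
        return () -> new Iterator<Entry>() {
            private final Entry shared = new Entry();
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < keys.length;
            }

            @Override
            public Entry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.key = keys[pos];
                shared.value = values[pos];
                pos++;
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        long[] keys = {10L, 20L, 30L};
        double[] values = {1.0, 2.0, 3.0};

        // Correct: use each entry only inside the loop body.
        double sum = 0;
        for (Entry e : fast(keys, values)) {
            sum += e.value;
        }
        System.out.println(sum); // 6.0

        // Wrong: every saved reference points at the same shared object.
        List<Entry> saved = new ArrayList<>();
        for (Entry e : fast(keys, values)) {
            saved.add(e);
        }
        System.out.println(saved.get(0).key); // 30, not 10
    }
}
```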

The scores.set method sets the value for that entry: in this case, to the user's mean rating, which we don't yet know how to compute.

Accessing Data: Computing the Mean

In order to compute the user's mean rating, we're going to need their rating history. However, all we have is a user ID! How do we get their ratings?

We can get user data from the UserEventDAO (data access object). But first we have to get one of those.

LensKit algorithms are organized around components. Our item scorer is one such component: a fairly simple component, and one that doesn't work yet, but it is a component.

Algorithm components can depend on other components. This is what enables LensKit's flexible configuration capabilities. As it turns out, the user event DAO is also a component. And getting components is easy: just ask for them!

Components ask for other components in their constructors. So let's add a constructor, and a field to store the component we get:

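// Requires: import javax.inject.Inject;
//           import org.grouplens.lenskit.data.dao.UserEventDAO;

// Field to store the user event DAO component
private final UserEventDAO userEventDAO;

@Inject
public CustomItemScorer(UserEventDAO dao) {
    userEventDAO = dao;
}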

That's it! There are several pieces here:

  • The @Inject annotation tells LensKit to use this constructor when it creates an item scorer from our class. Even though this is the only constructor defined, we still have to provide @Inject because it isn't a default (no-argument) constructor. It makes things nice and explicit.

  • The dao parameter establishes our dependency on the user event DAO. LensKit will look at the constructor, see that it needs a user event DAO, create one (or look up the one it already created), and pass it in to the constructor when it creates our item scorer.

  • The field just lets us remember the component so that we can use it when computing scores.
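Conceptually, the injector reads the annotated constructor, finds a component for each parameter type, and calls the constructor with them. The stdlib-only sketch below mimics that with reflection. Everything in it is hypothetical: Inject is a local stand-in for javax.inject.Inject, RatingSource stands in for a DAO, and the flat registry lookup is far simpler than the dependency injection LensKit actually performs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyInjectorDemo {
    // Local stand-in for javax.inject.Inject so the sketch needs no external jars.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Inject {}

    // Hypothetical data-access component (NOT LensKit's UserEventDAO).
    interface RatingSource {
        List<Double> ratingsFor(long user);
    }

    // A trivial in-memory implementation holding one user's ratings.
    static class InMemoryRatingSource implements RatingSource {
        private final Map<Long, List<Double>> data = new HashMap<>();

        InMemoryRatingSource() {
            data.put(1L, Arrays.asList(4.0, 2.0));
        }

        @Override
        public List<Double> ratingsFor(long user) {
            return data.getOrDefault(user, Collections.emptyList());
        }
    }

    // A component that declares its dependency in its @Inject constructor.
    static class MeanComponent {
        private final RatingSource source;

        @Inject
        MeanComponent(RatingSource source) {
            this.source = source;
        }

        double meanFor(long user) {
            return source.ratingsFor(user).stream()
                    .mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
        }
    }

    // Toy injector: find the @Inject constructor and pass in registered components by type.
    static <T> T create(Class<T> type, Map<Class<?>, Object> registry) throws ReflectiveOperationException {
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (!c.isAnnotationPresent(Inject.class)) {
                continue;
            }
            Class<?>[] paramTypes = c.getParameterTypes();
            Object[] deps = new Object[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                Object dep = registry.get(paramTypes[i]);
                if (dep == null) {
                    throw new IllegalStateException("no component registered for " + paramTypes[i].getName());
                }
                deps[i] = dep;
            }
            c.setAccessible(true);
            return type.cast(c.newInstance(deps));
        }
        throw new IllegalStateException("no @Inject constructor on " + type.getName());
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<Class<?>, Object> registry = new HashMap<>();
        registry.put(RatingSource.class, new InMemoryRatingSource());

        MeanComponent component = create(MeanComponent.class, registry);
        System.out.println(component.meanFor(1L)); // 3.0
    }
}
```

The component never constructs its own data source; it just states what it needs, which is what lets the configuration swap in a different implementation.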

Now that we can get the user's data, we can finish our score method:

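// Requires: import org.grouplens.lenskit.data.event.Rating;
//           import org.grouplens.lenskit.data.event.Ratings;
//           import org.grouplens.lenskit.data.history.UserHistory;
@Override
public void score(long user, MutableSparseVector scores) {
    // get the user's ratings (may be null if the user is unknown)
    UserHistory<Rating> userRatings =
        userEventDAO.getEventsForUser(user, Rating.class);
    if (userRatings == null) {
        return; // unknown user: leave all items unscored
    }
    // convert it to a vector (one rating per item)
    SparseVector userVector = Ratings.userRatingVector(userRatings);
    if (userVector.isEmpty()) {
        return; // no ratings, so there is no mean to use
    }
    // and finally take the mean
    double mean = userVector.mean();

    for (VectorEntry e: scores.fast(VectorEntry.State.EITHER)) {
        scores.set(e, mean);
    }
}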
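For experimenting without a LensKit setup, here is a stdlib-only sketch of the same logic. ToyRating and scoreWithMean are hypothetical stand-ins for LensKit's Rating events, rating vector, and score vector. Like a rating vector, the sketch keeps one rating per item before averaging (the last one seen, assuming the history is in chronological order), and it leaves the scores empty when the user has no ratings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class MeanScoreSketch {
    // Hypothetical rating event: just an item ID and a value.
    static final class ToyRating {
        final long item;
        final double value;

        ToyRating(long item, double value) {
            this.item = item;
            this.value = value;
        }
    }

    // Put the user's mean rating into 'scores' for every requested item.
    // A null or empty history leaves 'scores' untouched (no scores).
    static void scoreWithMean(List<ToyRating> history, Set<Long> requestedItems, Map<Long, Double> scores) {
        if (history == null || history.isEmpty()) {
            return;
        }
        // One value per item: a later rating of the same item overwrites an earlier one.
        Map<Long, Double> perItem = new LinkedHashMap<>();
        for (ToyRating r : history) {
            perItem.put(r.item, r.value);
        }
        double sum = 0;
        for (double v : perItem.values()) {
            sum += v;
        }
        double mean = sum / perItem.size();
        for (long item : requestedItems) {
            scores.put(item, mean);
        }
    }

    public static void main(String[] args) {
        List<ToyRating> history = Arrays.asList(
                new ToyRating(1L, 4.0),
                new ToyRating(2L, 2.0),
                new ToyRating(1L, 5.0)); // re-rating of item 1 replaces the 4.0
        Map<Long, Double> scores = new TreeMap<>();
        scoreWithMean(history, new TreeSet<>(Arrays.asList(10L, 20L)), scores);
        System.out.println(scores); // {10=3.5, 20=3.5}
    }
}
```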

Adding a Model

to be written