co2_filter

Co2Filter is a Ruby gem that combines collaborative and content-based filtering. It optionally exposes key points in the logic chain so that you can add your own calculations or store intermediate results for later.

Collaborative filtering recommends based on the ratings of other users and their similarity to you.

Content-based filtering recommends based on the attributes of items and your opinions of them.

Both strategies return a predicted rating for each item the user has not yet rated. Each can be used alone with this gem, or the two can be combined, either by averaging their predicted ratings or through the content-boosted collaborative filtering technique.
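As a rough illustration of the averaging option, the sketch below merges two plain hashes of predictions. The method name average_predictions is hypothetical, and keeping a lone prediction unchanged when only one filter rated an item is an assumption, not necessarily what the gem does.

```ruby
# Hypothetical helper, not part of the gem: averages two prediction hashes
# of the form { item_id => predicted_rating }.
def average_predictions(collaborative, content_based)
  (collaborative.keys | content_based.keys).each_with_object({}) do |item_id, combined|
    # Assumption: if only one filter produced a prediction, keep it as-is.
    available = [collaborative[item_id], content_based[item_id]].compact
    combined[item_id] = available.sum.to_f / available.size
  end
end

collaborative = { 'item4' => 4.0, 'item5' => 2.0 }
content_based = { 'item4' => 2.0, 'item6' => 5.0 }
average_predictions(collaborative, content_based)
# => {"item4"=>3.0, "item5"=>2.0, "item6"=>5.0}
```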

Content-boosted collaborative filtering first applies content-based filtering to fill out a sparse user rating set, then applies collaborative filtering on the resulting dense data.

Rating ranges are mostly irrelevant to the algorithm, because rating averages are the key point of reference. Feel free to use whatever scale works best for your app. (Be warned, however, that a prediction may fall slightly outside the actual range.)
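To see how an average-based prediction can leave the rating scale, consider the sketch below. It uses a generic mean-centered formula (the user's own average plus the similarity-weighted deviations of neighbors from their averages). The predict and mean methods are hypothetical and are not taken from the gem's source.

```ruby
# Hypothetical mean-centered prediction, for illustration only.
def mean(values)
  values.sum.to_f / values.size
end

# neighbors: array of [similarity, neighbor_ratings] pairs
def predict(user_ratings, neighbors, item_id)
  weighted = 0.0
  total_weight = 0.0
  neighbors.each do |similarity, ratings|
    next unless ratings.key?(item_id)
    # How far above or below their own average the neighbor rated the item.
    weighted += similarity * (ratings[item_id] - mean(ratings.values))
    total_weight += similarity.abs
  end
  return mean(user_ratings.values) if total_weight.zero?
  mean(user_ratings.values) + weighted / total_weight
end

generous_user = { 'item1' => 5, 'item2' => 4 }           # average 4.5
neighbor = { 'item1' => 2, 'item2' => 1, 'item3' => 3 }  # average 2.0
predict(generous_user, [[1.0, neighbor]], 'item3')
# => 5.5, which is above a nominal 1..5 scale
```

If out-of-range values are a problem for your app, clamping the result afterwards (for example with Comparable#clamp) is one option.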

Installation

Add this line to your application’s Gemfile:

gem 'co2_filter'

And then execute:

$ bundle

Usage

The most basic usage follows this pattern:

recommended = Co2Filter.filter(current_user: current_user, other_users: other_users, items: items)

And the results can be used as follows:

most_recommended_item_id = recommended.ids_by_rating.first
top_20_recommended_items = recommended.ids_by_rating.take(20)
predicted_user_rating = recommended[most_recommended_item_id]

The return type is a simple wrapper for a results hash. You can extract the inner hash if necessary with to_hash.
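For readers who want a mental model of that wrapper, here is a minimal stand-in that supports the three calls used above. The class name SketchResults is hypothetical, and the gem's real results class may differ in its details.

```ruby
# Hypothetical stand-in for the gem's results wrapper, for illustration only.
class SketchResults
  def initialize(ratings)
    @ratings = ratings # { item_id => predicted_rating }
  end

  # Look up the predicted rating for a single item.
  def [](item_id)
    @ratings[item_id]
  end

  # Item ids ordered from highest to lowest predicted rating.
  def ids_by_rating
    @ratings.sort_by { |_item_id, rating| -rating }.map(&:first)
  end

  # Return a copy of the inner hash.
  def to_hash
    @ratings.dup
  end
end

recommended = SketchResults.new({ 'item4' => 3.2, 'item5' => 4.7, 'item6' => 1.9 })
recommended.ids_by_rating.first # => "item5"
recommended['item5']            # => 4.7
```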

This gem is ORM-agnostic and expects you to select your relevant data on your own. The data you pass in should look like:

current_user = {
    # item_id => rating
    'item1' => 5,
    'item2' => 1,
    'item3' => 3
    # ...
}

other_users = {
    # user_id => { item_id => rating }
    'user1' => {
        'item1' => 2,
        'item2' => 5,
        'item4' => 2
    },
    'user2' => {
        'item1' => 5,
        'item2' => 1,
        'item4' => 5,
        'item5' => 1
    }
    # ...
}

# A set of all items from the dataset
items = {
    # item_id => { attribute_id => strength }
}

Ids are arbitrary to the algorithm and can be strings as easily as numbers. Ratings and strengths should be numbers of some type, and the range should be consistent across rating types (i.e. item ratings, attribute strengths), but there is no range restriction enforced by the algorithm.

Attribute strength refers to a situation where attributes are applied in varying degrees rather than a simple “off” or “on” state. If this does not apply to your app, I suggest setting all strengths to 1.
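As an example of the on/off case, the sketch below turns plain tag lists into an items hash in which every strength is 1. The helper name binary_strengths and the attribute ids ('comedy', 'romance', 'horror') are made up for illustration.

```ruby
# Hypothetical helper: converts { item_id => [attribute_id, ...] } into
# { item_id => { attribute_id => 1 } }, the shape described above.
def binary_strengths(tags_by_item)
  tags_by_item.transform_values do |tags|
    tags.to_h { |tag| [tag, 1] }
  end
end

items = binary_strengths(
  'item1' => ['comedy', 'romance'],
  'item2' => ['horror'],
  'item3' => ['comedy']
)
# => {"item1"=>{"comedy"=>1, "romance"=>1}, "item2"=>{"horror"=>1}, "item3"=>{"comedy"=>1}}
```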

Using Individual Filters

Collaborative Filter

To implement only the collaborative filter, just use:

Co2Filter::Collaborative.filter(current_user: current_user, other_users: other_users)

The collaborative filter accepts a measure argument that selects how similarity between users is calculated:

Co2Filter::Collaborative.filter(current_user: current_user, other_users: other_users, measure: :euclidean)

:euclidean means that users will be considered similar based on the Euclidean distance between their rating sets. This is a straightforward comparison that makes sense to most people, but it may overlook some subtleties.
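A minimal sketch of a Euclidean-distance similarity follows. Restricting the comparison to items both users rated, and mapping distance to a score with 1 / (1 + distance), are common conventions assumed here. The gem's exact formula may differ, and euclidean_similarity is a hypothetical name.

```ruby
# Hypothetical sketch of a Euclidean-distance similarity between two rating
# hashes of the form { item_id => rating }.
def euclidean_similarity(ratings_a, ratings_b)
  shared = ratings_a.keys & ratings_b.keys
  return 0.0 if shared.empty? # assumption: no overlap means no similarity

  distance = Math.sqrt(shared.sum { |id| (ratings_a[id] - ratings_b[id])**2 })
  # Assumption: map distance 0 to similarity 1.0, and large distances toward 0.
  1.0 / (1.0 + distance)
end

current_user = { 'item1' => 5, 'item2' => 1, 'item3' => 3 }
user1 = { 'item1' => 2, 'item2' => 5, 'item4' => 2 }
user2 = { 'item1' => 5, 'item2' => 1, 'item4' => 5, 'item5' => 1 }

euclidean_similarity(current_user, user1) # distance 5.0, so 1/6 (about 0.167)
euclidean_similarity(current_user, user2) # => 1.0 (identical on shared items)
```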

:cosine means that users’ similarity is determined by a mean-based cosine coefficient. This essentially means that the curve formed by a user’s rating set is compared by shape to the curves of other users. This technique is more reliable in some cases, but it suffers considerably more on sparse data sets.
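One way to picture a mean-based cosine coefficient is the sketch below: each rating is replaced by its deviation from that user's average, and the cosine of the angle between the two deviation vectors is taken over shared items. Centering on each user's overall average is an assumption, the gem may center differently, and centered_cosine is a hypothetical name.

```ruby
# Hypothetical sketch of a mean-centered cosine similarity between two
# rating hashes of the form { item_id => rating }.
def centered_cosine(ratings_a, ratings_b)
  shared = ratings_a.keys & ratings_b.keys
  return 0.0 if shared.empty?

  # Assumption: center on each user's average over all of their ratings.
  mean_a = ratings_a.values.sum.to_f / ratings_a.size
  mean_b = ratings_b.values.sum.to_f / ratings_b.size

  dot = shared.sum { |id| (ratings_a[id] - mean_a) * (ratings_b[id] - mean_b) }
  norm_a = Math.sqrt(shared.sum { |id| (ratings_a[id] - mean_a)**2 })
  norm_b = Math.sqrt(shared.sum { |id| (ratings_b[id] - mean_b)**2 })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

current_user = { 'item1' => 5, 'item2' => 1, 'item3' => 3 }
user1 = { 'item1' => 2, 'item2' => 5, 'item4' => 2 }
user2 = { 'item1' => 5, 'item2' => 1, 'item4' => 5, 'item5' => 1 }

centered_cosine(current_user, user1) # about -0.95 (opposite tastes)
centered_cosine(current_user, user2) # about 1.0 (same shape)
```

The sparse-data weakness is visible in this sketch: when two users share only one item, the result can only be 1.0, -1.0, or 0.0, regardless of how strongly they actually agree.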

:hybrid is the default measure and averages the two measures above. This inherently makes it slower, but the averaging may prove more reliable overall. Unfortunately, it may also smooth over what each technique is uniquely good at, reducing the chance of a surprisingly good (or surprisingly bad) recommendation.

You may also feed precalculated similarity coefficients into the filter using the similarity_coefficients argument:

similarity_coefficients = {
    'user1' => 0.56,
    'user2' => 0.8,
    'user3' => -0.4
    # ...
}
Co2Filter::Collaborative.filter(current_user: current_user, other_users: other_users, similarity_coefficients: similarity_coefficients)

Note that, in this case, you should provide a hash of coefficients that has already been trimmed by some kind of nearest-neighbor calculation. An untrimmed set results in many extra calculations for coefficients that are probably too small to matter.
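One simple way to trim is to keep only the k highest coefficients, as sketched below. The helper name nearest_neighbors and the choice to rank by raw value (rather than by absolute value or by a threshold) are assumptions for illustration.

```ruby
# Hypothetical helper: keeps the `count` users with the highest similarity
# coefficients from a hash of the form { user_id => coefficient }.
def nearest_neighbors(coefficients, count)
  coefficients.max_by(count) { |_user_id, coefficient| coefficient }.to_h
end

similarity_coefficients = {
  'user1' => 0.56,
  'user2' => 0.8,
  'user3' => -0.4
}
nearest_neighbors(similarity_coefficients, 2)
# => {"user2"=>0.8, "user1"=>0.56}
```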

Content-Based Filter

To implement only the content-based filter, use:

Co2Filter::ContentBased.filter(user: current_user, items: items)

The content-based filtering process consists of two steps:

  1. Constructing a user profile

  2. Using the user profile to determine recommendations
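The two steps above can be sketched in plain Ruby as follows. This is a generic illustration, not the gem's implementation: the method names build_profile and recommend are hypothetical, the profile is assumed to store the user's average rating plus a per-attribute preference (the strength-weighted average deviation from that average), and strengths are assumed to be positive.

```ruby
# Step 1 (hypothetical): build a profile from the user's ratings.
# Returns { mean: Float, preferences: { attribute_id => Float } }.
def build_profile(user_ratings, items)
  mean = user_ratings.values.sum.to_f / user_ratings.size
  totals = Hash.new(0.0)
  weights = Hash.new(0.0)
  user_ratings.each do |item_id, rating|
    (items[item_id] || {}).each do |attribute_id, strength|
      next if strength.zero?
      totals[attribute_id] += strength * (rating - mean)
      weights[attribute_id] += strength
    end
  end
  preferences = totals.to_h { |attribute_id, total| [attribute_id, total / weights[attribute_id]] }
  { mean: mean, preferences: preferences }
end

# Step 2 (hypothetical): predict a rating for every unrated item that has at
# least one attribute the profile knows about.
def recommend(profile, items, rated_ids)
  items.each_with_object({}) do |(item_id, attributes), predictions|
    next if rated_ids.include?(item_id)
    known = attributes.select do |attribute_id, strength|
      strength != 0 && profile[:preferences].key?(attribute_id)
    end
    next if known.empty?
    total_strength = known.values.sum.to_f
    offset = known.sum { |attribute_id, strength| strength * profile[:preferences][attribute_id] } / total_strength
    predictions[item_id] = profile[:mean] + offset
  end
end

items = {
  'item1' => { 'comedy' => 1 },
  'item2' => { 'horror' => 1 },
  'item3' => { 'comedy' => 1, 'horror' => 1 },
  'item4' => { 'comedy' => 1 }
}
user_ratings = { 'item1' => 5, 'item2' => 1 }

profile = build_profile(user_ratings, items)
recommend(profile, items, user_ratings.keys)
# => {"item3"=>3.0, "item4"=>5.0}
```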

If you are interested in doing this process piecemeal (for instance, to save the user profile to the database for later use), you can do so:

user_profile = Co2Filter::ContentBased.ratings_to_profile(user_ratings: current_user, items: items)
Co2Filter::ContentBased.filter(user: user_profile, items: items)

Note that the user argument must be a Co2Filter::ContentBased::UserProfile object, like the one returned by ratings_to_profile, to trigger this shortcut.

The results of a separately run content-based filter can be fed into the base (hybrid) filter by passing them (as a Co2Filter::Results object) as follows:

# content_based_results: a Co2Filter::Results object from an earlier content-based run
recommended = Co2Filter.filter(current_user: current_user, other_users: other_users, content_based_results: content_based_results)

Content-Boosted Collaborative Filtering

Content-boosted collaborative filtering can be used as follows:

Co2Filter.content_boosted_collaborative_filter(current_user: current_user, other_users: other_users, items: items)

This is the most processor-intensive algorithm, but it too can be split up into multiple pieces if you wish:

boosted_users = Co2Filter::ContentBased.boost_ratings(users: other_users, items: items)
Co2Filter::Collaborative.filter(current_user: current_user, other_users: boosted_users)

Note that the second step is simply the basic collaborative filter. To break up the boost_ratings method even further, run Co2Filter::ContentBased.filter on each user individually. (See the definition of boost_ratings.)
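Conceptually, the boosting step fills in each user's missing ratings with content-based predictions while leaving their real ratings untouched. The sketch below shows that idea with a pluggable predictor. The name boost_ratings_sketch and the placeholder mean_predictor are hypothetical, and in practice the predictor would be the content-based filter run for each user.

```ruby
# Hypothetical sketch of the boosting idea. `predictor` is any callable that
# takes (ratings, items) and returns { item_id => predicted_rating } for the
# items that user has not rated.
def boost_ratings_sketch(users, items, predictor)
  users.transform_values do |ratings|
    predicted = predictor.call(ratings, items)
    # Real ratings take precedence over predictions.
    predicted.merge(ratings)
  end
end

# Placeholder predictor for the example: guesses the user's own average for
# every unrated item. A content-based filter would go here instead.
mean_predictor = lambda do |ratings, items|
  mean = ratings.values.sum.to_f / ratings.size
  (items.keys - ratings.keys).to_h { |item_id| [item_id, mean] }
end

other_users = { 'user1' => { 'item1' => 2, 'item2' => 4 } }
items = { 'item1' => {}, 'item2' => {}, 'item3' => {} }

boost_ratings_sketch(other_users, items, mean_predictor)
# => {"user1"=>{"item3"=>3.0, "item1"=>2, "item2"=>4}}
```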

Contributing

Bug reports and pull requests are welcome on GitHub at github.com/comatose-turtle/co2_filter. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.
