xLearn

xLearn - the high performance machine learning library - for Ruby

Supports:

  • Linear models
  • Factorization machines
  • Field-aware factorization machines

Installation

Add this line to your application’s Gemfile:

gem 'xlearn'

Getting Started

Prep your data

x = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]

Train a model

model = XLearn::Linear.new(task: "reg")
model.fit(x, y)

Use XLearn::FM for factorization machines and XLearn::FFM for field-aware factorization machines

Make predictions

model.predict(x)
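
To get a rough sense of how close the predictions are to the labels, you can compare them yourself. The sketch below computes the root mean squared error in plain Ruby. The `rmse` helper and the sample predictions are hypothetical and not part of the gem; the array stands in for whatever `model.predict(x)` returns.

```ruby
# Hypothetical helper (not part of xlearn): root mean squared error
# between predicted and actual values.
def rmse(predictions, actual)
  raise ArgumentError, "length mismatch" unless predictions.size == actual.size
  sum_sq = predictions.zip(actual).sum { |p, a| (p - a)**2 }
  Math.sqrt(sum_sq / predictions.size.to_f)
end

y = [1, 2, 3, 4]
# Stand-in for the output of model.predict(x); made-up numbers
predictions = [1.5, 2.0, 2.5, 4.0]
error = rmse(predictions, y)
```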

Save the model to a file

model.save_model("model.bin")

Load the model from a file

model.load_model("model.bin")

Save a text version of the model

model.save_txt("model.txt")

Pass a validation set

model.fit(x_train, y_train, eval_set: [x_val, y_val])
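
If you do not already have separate training and validation data, one way to produce them is a shuffled split. This is a minimal sketch in plain Ruby; `train_val_split`, `val_ratio`, and `seed` are hypothetical names, not part of the gem.

```ruby
# Hypothetical helper (not part of xlearn): shuffle rows and hold out a
# fraction of them for validation.
def train_val_split(x, y, val_ratio: 0.25, seed: 1)
  raise ArgumentError, "x and y must have the same length" unless x.size == y.size
  indices = (0...x.size).to_a.shuffle(random: Random.new(seed))
  val_size = (x.size * val_ratio).round
  val_idx = indices.first(val_size)
  train_idx = indices.drop(val_size)
  [x.values_at(*train_idx), y.values_at(*train_idx),
   x.values_at(*val_idx), y.values_at(*val_idx)]
end

x = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]
x_train, y_train, x_val, y_val = train_val_split(x, y)
```

The four arrays can then be passed to `fit` with `eval_set` as shown above.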

Train online

model.partial_fit(x_train, y_train)
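
Online training is typically used when data arrives in chunks. The sketch below shows one way to feed batches to `partial_fit`. The `fit_in_batches` helper and the `RecordingModel` stand-in are hypothetical; the stand-in exists only so the example runs without the gem and shows which calls would be made.

```ruby
# Hypothetical helper (not part of xlearn): feed data to a model in
# fixed-size batches. `model` is any object that responds to partial_fit(x, y).
def fit_in_batches(model, x, y, batch_size: 2)
  x_batches = x.each_slice(batch_size).to_a
  y_batches = y.each_slice(batch_size).to_a
  x_batches.zip(y_batches).each do |x_batch, y_batch|
    model.partial_fit(x_batch, y_batch)
  end
  model
end

# Stand-in used only to show the call pattern; with the gem installed you
# would pass an XLearn model instead.
class RecordingModel
  attr_reader :calls

  def initialize
    @calls = []
  end

  def partial_fit(x, y)
    @calls << [x, y]
  end
end

x = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]
recorder = fit_in_batches(RecordingModel.new, x, y, batch_size: 2)
```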

Get the bias term, linear term, and latent factors

model.bias_term
model.linear_term
model.latent_factors # fm and ffm only
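
To illustrate what these three pieces mean, here is the textbook factorization machine score written in plain Ruby. The parameter values are made up, and the array shapes are an assumption: the exact layout returned by `model.latent_factors` is not documented here, and the library may preprocess inputs, so do not expect this sketch to reproduce `model.predict` output exactly.

```ruby
# Textbook factorization machine score:
#   bias + sum_i w[i] * x[i] + sum_{i<j} dot(v[i], v[j]) * x[i] * x[j]
def fm_score(x, bias, linear, factors)
  score = bias + x.each_with_index.sum { |xi, i| linear[i] * xi }
  (0...x.size).to_a.combination(2) do |i, j|
    dot = factors[i].zip(factors[j]).sum { |a, b| a * b }
    score += dot * x[i] * x[j]
  end
  score
end

# Made-up parameters for two features with k = 2 latent factors each
bias = 0.5
linear = [1.0, 2.0]
factors = [[1.0, 0.0], [2.0, 1.0]]
score = fm_score([1.0, 3.0], bias, linear, factors)
```

With these numbers the linear part contributes 0.5 + 1.0 + 6.0 and the single pairwise interaction contributes 2.0 * 1.0 * 3.0.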

Parameters

Pass parameters - default values below

XLearn::FM.new(
  task: "binary",      # binary (classification), reg (regression)
  metric: nil,         # acc, prec, recall, f1, auc, mae, mape, rmse, rmsd
  lr: 0.2,             # learning rate
  lambda: 0.00002,     # lambda for l2 regularization
  k: 4,                # latent factors for fm and ffm
  alpha: 0.3,          # hyper parameter for ftrl
  beta: 1.0,           # hyper parameter for ftrl
  lambda_1: 0.00001,   # hyper parameter for ftrl
  lambda_2: 0.00002,   # hyper parameter for ftrl
  epoch: 10,           # number of epochs
  fold: 3,             # number of folds
  opt: "adagrad",      # sgd, adagrad, ftrl
  block_size: 500,     # block size for on-disk training in MB
  early_stop: true,    # use early stopping
  stop_window: 2,      # size of stop window for early stopping
  sign: false,         # convert prediction output to 0 and 1
  sigmoid: false,      # convert prediction output using sigmoid
  seed: 1              # random seed to shuffle data set
)
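
The `sign` and `sigmoid` options both post-process raw prediction scores. As a rough illustration of the two transformations in plain Ruby: the logistic function is standard, while thresholding at zero for `sign` is an assumption about the library's behavior. The helper names and scores are hypothetical.

```ruby
# Logistic sigmoid: maps any real score into (0, 1)
def sigmoid(score)
  1.0 / (1.0 + Math.exp(-score))
end

# Assumed thresholding for `sign`: positive scores become 1, others 0
def to_label(score)
  score > 0 ? 1 : 0
end

raw_scores = [-2.0, 0.0, 3.0] # made-up raw outputs
probabilities = raw_scores.map { |s| sigmoid(s) }
labels = raw_scores.map { |s| to_label(s) }
```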

Cross-Validation

Run cross-validation

model.cv(x, y)

Specify the number of folds

model.cv(x, y, folds: 5)
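
For readers new to the idea, k-fold cross-validation partitions the rows into groups and holds out each group in turn. The sketch below shows that partitioning in plain Ruby. `k_fold_indices` is a hypothetical helper; the library may shuffle and split rows differently.

```ruby
# Hypothetical helper (not part of xlearn): split row indices into `folds`
# groups and return [train_indices, test_indices] pairs, one per fold.
def k_fold_indices(n, folds: 3)
  groups = Array.new(folds) { [] }
  (0...n).each { |i| groups[i % folds] << i }
  groups.each_index.map do |f|
    test = groups[f]
    train = (groups[0...f] + groups[(f + 1)..-1]).flatten
    [train, test]
  end
end

splits = k_fold_indices(6, folds: 3)
```

Each row index appears in exactly one test group, and the matching training set contains all the other rows.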

Data

Data can be an array of arrays

[[1, 2, 3], [4, 5, 6]]

Or a Daru data frame

Daru::DataFrame.from_csv("houses.csv")

Or a Numo NArray

Numo::DFloat.new(3, 2).seq

Performance

For large datasets, read data directly from files

model.fit("train.txt", eval_set: "validate.txt")
model.predict("test.txt")
model.cv("train.txt")

For linear models and factorization machines, use CSV:

label,value_1,value_2,...,value_n

Or the libsvm format (better for sparse data):

label index_1:value_1 index_2:value_2 ... index_n:value_n

You can also use commas instead of spaces for separators
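
If your data starts out as arrays, the two layouts above can be generated with a few lines of plain Ruby. This is a sketch under stated assumptions: `to_csv_line` and `to_libsvm_line` are hypothetical helpers, and the 0-based feature indices are an assumption, so check which index base your data pipeline expects.

```ruby
# Hypothetical helpers (not part of xlearn) that turn rows into text lines.

# CSV: label followed by every value
def to_csv_line(label, row)
  ([label] + row).join(",")
end

# libsvm: label followed by index:value pairs, skipping zeros.
# Indices are 0-based here; that base is an assumption.
def to_libsvm_line(label, row)
  pairs = row.each_with_index.reject { |value, _| value.zero? }
             .map { |value, index| "#{index}:#{value}" }
  ([label] + pairs).join(" ")
end

x = [[1, 0, 3], [0, 5, 6]]
y = [1, 0]
csv_lines = x.zip(y).map { |row, label| to_csv_line(label, row) }
libsvm_lines = x.zip(y).map { |row, label| to_libsvm_line(label, row) }
# File.write("train.txt", libsvm_lines.join("\n") + "\n") would save them
```

Skipping zero values is what makes the libsvm layout compact for sparse data.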

For field-aware factorization machines, use the libffm format:

label field_1:index_1:value_1 field_2:index_2:value_2 ...

You can also use commas instead of spaces for separators
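
The libffm layout adds a field id in front of each index. A minimal sketch of producing one such line from a row, assuming a hypothetical `fields` array that maps each column to its field and assuming 0-based indices:

```ruby
# Hypothetical helper (not part of xlearn): `fields[i]` is the field id of
# column i. Indices are 0-based here, which is an assumption.
def to_libffm_line(label, row, fields)
  triples = row.each_with_index.reject { |value, _| value.zero? }
               .map { |value, index| "#{fields[index]}:#{index}:#{value}" }
  ([label] + triples).join(" ")
end

fields = [0, 0, 1] # columns 0 and 1 share a field, column 2 has its own
line = to_libffm_line(1, [0.5, 0, 2], fields)
```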

You can also write predictions directly to a file

model.predict("test.txt", out_path: "predictions.txt")
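
To load written predictions back into Ruby, something like the following may work. It assumes the output file holds one numeric prediction per line, which is not confirmed here; `read_predictions` is a hypothetical helper, and the temporary file with made-up values only stands in for `predictions.txt` so the sketch runs without the gem.

```ruby
require "tempfile"

# Hypothetical helper: parse a predictions file, assuming one number per line
def read_predictions(path)
  File.readlines(path, chomp: true).reject(&:empty?).map { |line| Float(line) }
end

# Stand-in file with made-up values so the sketch runs without the gem
file = Tempfile.new("predictions")
file.write("0.25\n0.75\n")
file.close
predictions = read_predictions(file.path)
file.unlink
```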

Credits

This library is modeled after xLearn’s Scikit-learn API.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development and testing:

git clone https://github.com/ankane/xlearn.git
cd xlearn
bundle install
bundle exec rake vendor:all
bundle exec rake test