Carrot2

Ruby client for Carrot2 - the open-source document clustering server

Installation

First, download and run the Carrot2 Document Clustering Server (DCS). On the Carrot2 download page, it is the file whose name begins with carrot2-dcs.

With Homebrew, use:

brew install carrot2
brew services start carrot2

Then add this line to your application’s Gemfile:

gem 'carrot2'

How to Use

To cluster documents, use:

documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

carrot2 = Carrot2.new
carrot2.cluster(documents)

This returns:

{
  "processing-time-total"=>1,
  "clusters"=> [
    {
      "id"=>0,
      "size"=>3,
      "phrases"=>["Coupon"],
      "score"=>0.06462323710740674,
      "documents"=>[0, 1, 2],
      "attributes"=>{"score"=>0.06462323710740674}
    },
    {
      "id"=>1,
      "size"=>2,
      "phrases"=>["Exclusive"],
      "score"=>0.05873148311034013,
      "documents"=>[0, 1],
      "attributes"=>{"score"=>0.05873148311034013}
    },
    {
      "id"=>2,
      "size"=>1,
      "phrases"=>["Other Topics"],
      "score"=>0.0,
      "documents"=>[3],
      "attributes"=>{"other-topics"=>true, "score"=>0.0}
    }
  ],
  "processing-time-algorithm"=>1,
  "query"=>nil
}

Documents are numbered in the order provided, starting with 0.
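Because documents are numbered by position, the indices in each cluster's "documents" array can be mapped back to the original texts. The following is a minimal plain-Ruby sketch that runs without a server. It uses a trimmed copy of the sample response above and assumes, based on that sample, that the catch-all cluster is the one flagged with the "other-topics" attribute. The names `response` and `labeled` are illustrative only.

```ruby
# Sample input and a trimmed copy of the example response shown above.
documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

response = {
  "clusters" => [
    {"id" => 0, "phrases" => ["Coupon"], "documents" => [0, 1, 2],
     "attributes" => {"score" => 0.06462323710740674}},
    {"id" => 1, "phrases" => ["Exclusive"], "documents" => [0, 1],
     "attributes" => {"score" => 0.05873148311034013}},
    {"id" => 2, "phrases" => ["Other Topics"], "documents" => [3],
     "attributes" => {"other-topics" => true, "score" => 0.0}}
  ]
}

# Map each cluster label to the texts it contains. The catch-all
# cluster (assumed to be flagged by "other-topics") is skipped.
labeled = response["clusters"]
  .reject { |c| c.dig("attributes", "other-topics") }
  .to_h { |c| [c["phrases"].join(", "), c["documents"].map { |i| documents[i] }] }

labeled.each do |label, texts|
  puts "#{label}:"
  texts.each { |t| puts "  - #{t}" }
end
```

A document can appear in more than one cluster. In this sample, documents 0 and 1 belong to both "Coupon" and "Exclusive".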

Specify a language with:

carrot2.cluster(documents, language: "FRENCH")

All languages supported by Carrot2 can be used.

For other requests, pass DCS parameters directly. For example, to send documents in Carrot2 XML format:

carrot2.request(
  "dcs.c2stream" => xml_str
)
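This README does not define `xml_str`. The following is a minimal sketch of one way to build it. It assumes the DCS expects the Carrot2 3.x XML format: a `searchresult` root with `document` elements that may contain `title`, `url`, and `snippet`. The helper `c2stream_xml` is hypothetical, and it fills only `snippet`. Text is escaped with Ruby's built-in `String#encode(xml: :text)`.

```ruby
# Hypothetical helper: builds an XML document stream in the
# (assumed) Carrot2 3.x "searchresult" format from an array of strings.
def c2stream_xml(documents, query: nil)
  xml = +%(<?xml version="1.0" encoding="UTF-8"?>\n<searchresult>\n)
  # Optional query element, escaped for XML text content
  xml << "  <query>#{query.encode(xml: :text)}</query>\n" if query
  documents.each_with_index do |text, i|
    # Each document gets its position as its id, matching how results are numbered
    xml << %(  <document id="#{i}">\n)
    xml << "    <snippet>#{text.encode(xml: :text)}</snippet>\n"
    xml << "  </document>\n"
  end
  xml << "</searchresult>\n"
end

xml_str = c2stream_xml([
  "Sign up for an exclusive coupon.",
  "Coupons & deals are going fast."
])
puts xml_str
```

The resulting string can then be passed as `carrot2.request("dcs.c2stream" => xml_str)`.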

Configuration

To specify the Carrot2 server, set ENV["CARROT2_URL"] or use:

Carrot2.new(url: "http://localhost:8080")

Set timeouts (unreleased, currently on the master branch only):

Carrot2.new(open_timeout: 3, read_timeout: 5)

Heroku

Carrot2 can be deployed to Heroku because Heroku supports WAR deployment.

The .war file is in the war directory of the DCS download. Then run:

heroku plugins:install heroku-cli-deploy
heroku create <app_name>
heroku war:deploy carrot2-dcs.war --app <app_name>

Finally, set ENV["CARROT2_URL"] in your application to point to the deployed server.

History

See CHANGELOG.md.

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help: