aurelian / ruby-stemmer

Expose libstemmer_c to Ruby


Ruby-Stemmer

Ruby-Stemmer exposes the SnowBall API to Ruby.

This package includes the libstemmer_c library, which is released under the BSD licence and freely available at: snowball.tartarus.org/dist/libstemmer_c.tgz.

For more details about libstemmer_c please visit the SnowBall website.

Usage

  require 'rubygems'
  require 'lingua/stemmer'

  stemmer = Lingua::Stemmer.new(:language => "ro")
  stemmer.stem("netăgăduit") #=> "netăgădu"
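
A common reason to stem words is to match different inflections of the same word, for example in a simple search. The sketch below is only an illustration: stem_set, stem_match? and ToyStemmer are hypothetical names that are not part of Ruby-Stemmer. ToyStemmer is a crude stand-in that lets the example run without the gem; it is not the Snowball algorithm and its stems differ from the real ones. Any object that responds to stem, such as the stemmer created above, should be usable in its place.

```ruby
require 'set'

# Hypothetical stand-in so the sketch runs without the gem installed.
# It only strips a few English suffixes and is NOT the Snowball algorithm.
class ToyStemmer
  SUFFIXES = %w[ation ing ed s].freeze

  def stem(word)
    suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
    suffix ? word[0...-suffix.length] : word
  end
end

# Reduce a text to the set of stems of its words.
def stem_set(text, stemmer)
  text.downcase.scan(/[[:alpha:]]+/).map { |word| stemmer.stem(word) }.to_set
end

# True when every stem in the query also occurs in the document.
def stem_match?(query, document, stemmer)
  stem_set(query, stemmer).subset?(stem_set(document, stemmer))
end

stemmer = ToyStemmer.new
puts stem_match?("installing", "He installed the package", stemmer) # prints true
puts stem_match?("removal", "He installed the package", stemmer)    # prints false
```

Passing the stemmer in as an argument keeps the matching code independent of the language and encoding chosen when the stemmer was created.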

Alternative

  require 'rubygems'
  require 'lingua/stemmer'

  Lingua.stemmer( %w(incontestabil neîndoielnic), :language => "ro" ) #=> ["incontest", "neîndoieln"]
  Lingua.stemmer("installation") #=> "instal"
  Lingua.stemmer("installation", :language => "fr", :encoding => "ISO_8859_1") do | word |
    puts "~> #{word}" #=> "instal"
  end # => #<Lingua::Stemmer:0x102501e48>

Rails

  # in config/environment.rb:
  config.gem 'ruby-stemmer', :version => '>=0.6.2', :lib => 'lingua/stemmer'

More details

Install

Standard install with:

 gem install ruby-stemmer

Please note that Windows is not supported at this time.

Development version

  $ git clone git://github.com/aurelian/ruby-stemmer.git
  $ cd ruby-stemmer
  $ rake -T #<== see what we've got
  $ rake compile #<== builds the extension do'h
  $ rake test

NOT A BUG

Stemming is an algorithm for finding the stem of a word, not its root. For further reference on stem vs. root, please check the Wikipedia articles on the topic.
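
To make the point concrete, the short sketch below collects the word and stem pairs quoted in the usage examples of this README and checks that each stem is simply a shortened form of its word. The constant name README_STEMS is an assumption made for illustration; the script does not call the library and uses only the outputs already shown in this README. Note that "instal" is not the dictionary word "install".

```ruby
# Word => stem pairs copied from the usage examples in this README.
README_STEMS = {
  "installation"  => "instal",
  "incontestabil" => "incontest",
  "neîndoielnic"  => "neîndoieln",
  "netăgăduit"    => "netăgădu"
}.freeze

# Keep the pairs where the stem is a proper prefix of the word,
# i.e. a truncation rather than a separate dictionary form.
truncated = README_STEMS.select do |word, stem|
  word.start_with?(stem) && word != stem
end

truncated.each { |word, stem| puts "#{word} -> #{stem}" }
```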

TODO

Note on Patches/Pull Requests

  • Fork the project on GitHub.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don’t break it in a future version unintentionally.
  • Commit, but do not mess with the Rakefile, version, or history.

    If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.

  • Send me a pull request. Bonus points for topic branches.

Alternative Stemmers for Ruby

Copyright

Copyright © 2008,2009 Aurelian Oancea. See MIT-LICENSE for details.

Contributors

  • Aurelian Oancea
  • Yury Korolev - various bug fixes
  • Aaron Patterson - rake compiler (windows support), code cleanup

Real life usage

  • planet33.ru uses Ruby-Stemmer together with Classifier to automatically rate places based on users' comments.