fuzzy string matching library for ruby
benchmark
lib
original
test
.gemtest
.gitignore
LICENSE.txt
README.md
Rakefile
VERSION.yml
fuzzy-string-match.gemspec
fuzzy-string-match_pure.gemspec


What is fuzzy-string-match

  • fuzzy-string-match is a fuzzy string matching library for Ruby.
  • It is fast (the native version is written in C with RubyInline).
  • It supports only the Jaro-Winkler distance algorithm.
  • It was ported by hand from lucene-3.0.2 (Lucene is a Java product).
  • If you want to add another string distance algorithm, please port it yourself and contact me at kiyoka@sumibi.org.
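To make the algorithm concrete, here is a minimal pure-Ruby sketch of Jaro-Winkler distance. It follows the shape of Lucene 3.0.x's `JaroWinklerDistance` as I understand it: characters match only within a window of max(longer length / 2 - 1, 0) positions, the Winkler prefix boost is applied only when the Jaro score reaches 0.7, and the prefix weight is min(0.1, 1 / longer length). The method name `jaro_winkler_sketch` is hypothetical, and this is not the gem's own source.

```ruby
# Sketch of Jaro-Winkler distance modeled on Lucene 3.0.x's
# JaroWinklerDistance (assumption: not the gem's actual source).
# It indexes strings by character, so UTF-8 input is compared per character.
def jaro_winkler_sketch(s1, s2, threshold = 0.7)
  longer, shorter = s1.length > s2.length ? [s1, s2] : [s2, s1]
  # Characters only count as matching within this window of positions.
  range = [longer.length / 2 - 1, 0].max
  match_indexes = Array.new(shorter.length, -1)
  match_flags = Array.new(longer.length, false)
  matches = 0

  shorter.each_char.with_index do |c, si|
    lo = [si - range, 0].max
    hi = [si + range + 1, longer.length].min
    (lo...hi).each do |li|
      next if match_flags[li] || c != longer[li]
      match_indexes[si] = li
      match_flags[li] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Matched characters in the order they occur in each string; every
  # position where the two sequences disagree is half a transposition.
  ms1 = (0...shorter.length).select { |i| match_indexes[i] != -1 }.map { |i| shorter[i] }
  ms2 = (0...longer.length).select { |i| match_flags[i] }.map { |i| longer[i] }
  transpositions = ms1.zip(ms2).count { |a, b| a != b } / 2

  # Length of the common prefix of the two original strings.
  prefix = 0
  prefix += 1 while prefix < shorter.length && s1[prefix] == s2[prefix]

  m = matches.to_f
  jaro = (m / s1.length + m / s2.length + (m - transpositions) / m) / 3.0
  return jaro if jaro < threshold
  # Winkler boost for a shared prefix, weighted by min(0.1, 1 / longer length).
  jaro + [0.1, 1.0 / longer.length].min * prefix * (1 - jaro)
end
```

With these choices the sketch returns 1.0 for "al"/"al" and about 0.8133 for "dixon"/"dicksonx", in line with the irb session further down. For "ああ"/"あい" the Jaro score is 2/3, below the 0.7 threshold, so no prefix boost is applied.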

Why I developed fuzzy-string-match

  • I tried amatch-0.2.5, but it had some issues:
    1. Some memory leaks.
    2. I found it difficult to maintain.
  • So I decided to create another gem by porting lucene-3.0.x.

Installing

  1. gem install fuzzy-string-match

Features

  • Calculates the Jaro-Winkler distance between two strings.
    • The pure Ruby version can handle both ASCII and UTF-8 strings (but is slow).
    • The native version can handle only ASCII strings (but is fast).

Sample code

  • Native version

require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
p jarow.getDistance( "jones", "johnson" )

  • Pure ruby version

# encoding: utf-8
# (the magic comment above lets Ruby 1.9 parse the non-ASCII literals below)
require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :pure )
p jarow.getDistance( "ああ", "あい" )
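Because the native version handles only ASCII, a caller with mixed input may want to choose the implementation per pair of strings. A minimal sketch, assuming a hypothetical helper `implementation_for` that is not part of the gem; it relies only on Ruby's built-in `String#ascii_only?` (available since 1.9) and returns the symbol you would pass to `create`:

```ruby
# Hypothetical helper (not part of fuzzy-string-match): returns :native when
# every input is plain ASCII, otherwise :pure, following the README's note
# that the native version only handles ASCII strings.
def implementation_for(*strings)
  strings.all?(&:ascii_only?) ? :native : :pure
end

# Possible use with the gem installed (not executed here):
#   require 'fuzzystringmatch'
#   jarow = FuzzyStringMatch::JaroWinkler.new.create( implementation_for(s1, s2) )
#   p jarow.getDistance( s1, s2 )

p implementation_for("jones", "johnson")  # => :native
```

Rather than calling `create` for every pair, you may prefer to create one `:native` and one `:pure` matcher up front and pick between them with the helper.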

Sample on irb

irb(main):001:0> require 'fuzzystringmatch'
=> true

irb(main):002:0> jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
=> #<FuzzyStringMatch::JaroWinklerNative:0x000001011b0010>

irb(main):003:0> jarow.getDistance( "al",        "al"        )
=> 1.0

irb(main):004:0> jarow.getDistance( "dixon",     "dicksonx"  )
=> 0.8133333333333332
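The "dixon"/"dicksonx" result can be checked by hand with the textbook Jaro-Winkler formula, where $m$ is the number of matching characters, $t$ is half the number of transposed matches, $\ell$ is the common-prefix length and $p = 0.1$ is the prefix scaling factor (this notation is mine, not the README's). The matching characters are d, i, o and n; the x in dixon sits 5 positions away from the x in dicksonx, outside the match window of $\lfloor 8/2 \rfloor - 1 = 3$. The common prefix is "di", so $\ell = 2$:

```latex
\begin{aligned}
d_j &= \tfrac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
     = \tfrac{1}{3}\left(\frac{4}{5} + \frac{4}{8} + \frac{4 - 0}{4}\right)
     = \frac{2.3}{3} \approx 0.76667 \\
d_w &= d_j + \ell\,p\,(1 - d_j) = 0.76667 + 2 \cdot 0.1 \cdot 0.23333 \approx 0.81333
\end{aligned}
```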

Benchmarks

$ rake bench
ruby ./benchmark/vs_amatch.rb
---
--- Each match functions will be called 1Mega times. ---
---
[Amatch]
      user     system      total        real
  1.160000   0.050000   1.210000 (  1.218259)
[this Module (pure)]
      user     system      total        real
 39.940000   0.160000  40.100000 ( 40.542448)
[this Module (native)]
      user     system      total        real
  0.480000   0.000000   0.480000 (  0.484187)
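Timings in this layout can be produced with Ruby's standard `benchmark` library. The sketch below is a hypothetical harness, not the repository's `benchmark/vs_amatch.rb`; the names `time_calls` and `report` are mine, and the placeholder scorer stands in for a real matcher.

```ruby
require 'benchmark'

# Hypothetical timing harness (not the repository's benchmark/vs_amatch.rb):
# calls a distance function `n` times on one pair of strings and returns
# the Benchmark::Tms holding user/system/total/real times.
def time_calls(n, s1, s2, &distance)
  Benchmark.measure { n.times { distance.call(s1, s2) } }
end

# Prints a result as "[label]" followed by Benchmark's own caption line
# and its default user/system/total/(real) format.
def report(label, tms)
  puts "[#{label}]"
  print Benchmark::CAPTION, tms.format
end

# Placeholder scorer; swap in a real matcher, e.g.
# ->(a, b) { jarow.getDistance(a, b) } after creating jarow as in the samples.
exact = ->(a, b) { a == b ? 1.0 : 0.0 }
report("exact match (placeholder)", time_calls(100_000, "dixon", "dicksonx", &exact))
```

Timing each implementation on the same pair with the same `n` keeps the comparison fair; absolute numbers will of course depend on the machine and Ruby version.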

Requires

  • RubyInline
  • Ruby 1.9.1 or higher

Author

  • Copyright (C) Kiyoka Nishiyama kiyoka@sumibi.org
  • I ported it from the Java source code of lucene-3.0.2.

See also

License

  • Apache 2.0 LICENSE