A distance-based hash (one where similar input gives similar output, the opposite of a cryptographic hash), suitable for text applications.

nilsimsa
--------
Nilsimsa is a distance-based hash, the opposite of more familiar hashes
like MD5.  In a cryptographic hash, a small change to the input produces
a completely different output (to avoid collisions); a distance-based
hash instead gives similar inputs similar outputs.  This makes it useful
for detecting near-duplicate documents without storing the original text.
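Because similar texts produce similar digests, two documents are compared by comparing their digests bit by bit. The sketch below is a hypothetical helper, not part of this gem's documented API. It assumes 256-bit digests written as 64 hex characters, as returned by `hexdigest`, and follows the scoring convention common to Nilsimsa implementations: 128 minus the number of differing bits. Under that convention, identical digests score 128 and completely opposite digests score -128.

```ruby
# Hypothetical helper (not part of the nilsimsa gem's documented API).
# Assumes 256-bit digests given as 64 hex characters, and the common
# Nilsimsa convention: score = 128 - (number of differing bits),
# so scores range from -128 (every bit differs) to 128 (identical).
def nilsimsa_similarity(hex_a, hex_b)
  [hex_a, hex_b].each do |hex|
    unless hex.match?(/\A\h{64}\z/)
      raise ArgumentError, "digest must be 64 hex characters: #{hex.inspect}"
    end
  end

  # XOR the digests as integers; each 1 bit marks a position that differs.
  differing_bits = (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count("1")
  128 - differing_bits
end

puts nilsimsa_similarity("f" * 64, "f" * 64)  # 128
puts nilsimsa_similarity("0" * 64, "f" * 64)  # -128
```

A threshold on this score (for example, treating anything above some chosen value as a near-duplicate) is an application-level choice. The right cutoff depends on the corpus.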

Standard usage is as follows:

  require 'nilsimsa'

  n1 = Nilsimsa.new
  text1 = "The quick brown fox"
  n1.update(text1)
  puts "Text '#{text1}': #{n1.hexdigest}"