Levenshtein edit distance in linear memory (also turns out to be faster than C++)

0xd34df00d/edit-distance-linear

edit-distance-linear

A pure Haskell implementation of the Levenshtein edit distance with linear space complexity.

Comparison

Several other implementations already exist, but their goals and design decisions differ. In particular, this package is intended to:

  • compare long strings (think tens of thousands of characters), which drives the implementation to live in the ST monad and to aim at linear space complexity in order to lower GC pressure;
  • ignore Unicode, accepting ByteStrings and comparing them byte by byte rather than character by character (or glyph by glyph, or whatever the right notion of an edit is for Unicode).
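
To make the two goals above concrete, here is a minimal sketch of a single-row Levenshtein computation over ByteStrings in the ST monad. It is an illustration only, not this package's actual code or API: the names `levenshteinBytes`, `newRow` and `example` are made up for the example, and it uses `STUArray` from the `array` package, which is an assumption about how one might do this rather than a description of the real implementation.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Hypothetical helper: a mutable row initialised to [0 .. n], i.e. the
-- distances from the empty prefix of the first string to each prefix
-- of the second.
newRow :: Int -> ST s (STUArray s Int Int)
newRow n = newListArray (0, n) [0 .. n]

-- Hypothetical name, not the package's API. Byte-wise Levenshtein distance
-- that keeps a single row of (length b + 1) unboxed Ints in memory.
levenshteinBytes :: BS.ByteString -> BS.ByteString -> Int
levenshteinBytes a b = runST go
  where
    m = BS.length a
    n = BS.length b

    go :: ST st Int
    go = do
      row <- newRow n
      forM_ [1 .. m] $ \i -> do
        -- diagRef tracks the previous row's value at column (j - 1),
        -- which is about to be overwritten in place.
        diagRef <- newSTRef =<< readArray row 0
        writeArray row 0 i
        forM_ [1 .. n] $ \j -> do
          diag <- readSTRef diagRef
          up   <- readArray row j        -- previous row, same column
          left <- readArray row (j - 1)  -- current row, previous column
          let cost = if BS.index a (i - 1) == BS.index b (j - 1) then 0 else 1
          writeSTRef diagRef up
          writeArray row j (minimum [up + 1, left + 1, diag + cost])
      readArray row n

-- Usage: the classic "kitten" / "sitting" pair has distance 3.
example :: Int
example = levenshteinBytes (BC.pack "kitten") (BC.pack "sitting")
```

Only one row of the dynamic programming table is alive at any time, so memory use grows with the length of the second string rather than with the product of both lengths.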

Among the alternatives:

  • text-metrics: uses a similar algorithm but handles Unicode, which makes it 4-5 times slower.
  • edit-distance: uses a very different algorithm that tends to consume more memory (I'm not up for estimating its space asymptotics, though). We might implement that algorithm here one day, with potentially huge benefits.
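
What ignoring Unicode means in practice can be seen from the byte encodings themselves. The snippet below is a small illustration with made-up names (`plainE`, `accentedE`, `byteLengths`); the UTF-8 bytes are written out by hand so that no encoding library is needed.

```haskell
import qualified Data.ByteString as BS

-- UTF-8 encodings, spelled out byte by byte.
plainE, accentedE :: BS.ByteString
plainE    = BS.pack [0x65]        -- "e"
accentedE = BS.pack [0xC3, 0xA9]  -- "é" (U+00E9), two bytes in UTF-8

-- Lengths as seen by a byte-oriented comparison.
byteLengths :: (Int, Int)
byteLengths = (BS.length plainE, BS.length accentedE)  -- (1, 2)
```

A byte-by-byte comparison of these two values needs one substitution plus one insertion, so it reports a distance of 2, whereas a character-aware comparison would count a single substitution. For ASCII-only input the two notions agree.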
