Optimize Levenshtein.distance #8324

wooster0 · 2019-10-14T08:01:16Z

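This avoids two expensive array allocations (.chars) because the strings can already be accessed directly.
This improves speed and greatly reduces memory allocation: in the benchmark below, the new version is about 2.26× faster and allocates 195kB/op instead of 0.95MB/op.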

Benchmark.ips do |bm|
  bm.report "old" do
    Levenshtein.distance("algorithm", "altruistic")
    Levenshtein.distance("hello", "hallo")
    Levenshtein.distance("こんにちは", "こんちは")
    Levenshtein.distance("hey", "hey")
    Levenshtein.distance("hippo", "zzzzzzzz")
    Levenshtein.distance("a" * 100000, "hello")
    Levenshtein.distance("hello", "a" * 100000)
  end

  bm.report "new" do
    Levenshtein.new_distance("algorithm", "altruistic")
    Levenshtein.new_distance("hello", "hallo")
    Levenshtein.new_distance("こんにちは", "こんちは")
    Levenshtein.new_distance("hey", "hey")
    Levenshtein.new_distance("hippo", "zzzzzzzz")
    Levenshtein.new_distance("a" * 100000, "hello")
    Levenshtein.new_distance("hello", "a" * 100000)
  end
end

old 295.78  (  3.38ms) (± 8.86%)  0.95MB/op   2.26× slower
new 669.21  (  1.49ms) (± 8.87%)   195kB/op        fastest

src/levenshtein.cr

asterite · 2019-10-14T10:12:32Z

This is wrong. to_unsafe works at the byte level, while chars works at the codepoint level. That the specs pass is a coincidence.
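
To make the byte-versus-codepoint point concrete, here is a minimal sketch. The helper `seq_distance` is hypothetical and not part of the PR; it runs the same two-row algorithm over any indexable sequence, so it can be fed either the chars or the raw UTF-8 bytes of a string. For non-ASCII input the two results can differ:

```crystal
# Hypothetical helper (not from the PR): Levenshtein distance over any two
# indexable sequences, so the same code can compare chars or raw bytes.
def seq_distance(a : Indexable, b : Indexable) : Int32
  return b.size if a.empty?
  return a.size if b.empty?

  costs = Array(Int32).new(b.size + 1) { |i| i }
  a.each_with_index do |x, i|
    last_cost = i + 1
    b.each_with_index do |y, j|
      sub_cost = x == y ? 0 : 1
      cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
      costs[j] = last_cost
      last_cost = cost
    end
    costs[b.size] = last_cost
  end
  costs[b.size]
end

accented = "\u00E9" # "é": one codepoint, two bytes in UTF-8

puts seq_distance(accented.chars, "e".chars)       # codepoint level: 1
puts seq_distance(accented.to_slice, "e".to_slice) # byte level: 2
```

At the codepoint level a single substitution is enough, while at the byte level the two-byte encoding needs one substitution plus one deletion.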

straight-shoota · 2019-10-14T11:44:35Z

Maybe this could be optimized using Char::Reader. But slice is not going to work. This just means specs need to be improved as well. There's only a single example with non-ASCII characters, and that seems to pass for some reason.
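
As a rough illustration of the Char::Reader idea (a sketch only, not the code that ended up in the PR), a reader decodes one codepoint at a time from the string's bytes, so no intermediate Array(Char) is built as it would be with String#chars:

```crystal
# Illustration only: walk a string codepoint by codepoint with Char::Reader.
text = "h\u00E9llo" # "héllo": 5 codepoints, 6 bytes in UTF-8

reader = Char::Reader.new(text)
codepoints = 0
reader.each do |char|
  # Each yielded value is a full Char, even when it spans several bytes.
  codepoints += 1
end

puts codepoints    # 5
puts text.bytesize # 6
```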

asterite · 2019-10-14T11:55:23Z

However, it's true that if both strings are ascii_only?, a different, more efficient path could be written, and that path could use to_unsafe or to_slice.

wooster0 · 2019-10-14T17:21:46Z

Okay, I think I figured it out now.

src/levenshtein.cr

asterite · 2021-06-04T23:42:48Z

Can you try using Slice instead of Pointer and seeing if there's any performance drop?

oprypin · 2021-06-04T23:45:22Z

@asterite The pointer stuff is pre-existing code, by the way -- it is just repeated twice in the new branches. That was actually going to be my one comment: the line costs = Pointer(Int32).malloc(t_size + 1) { |i| i } could be written once at the top instead of twice.
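
A sketch of what that suggestion could look like, combined with the Slice request above. The name `distance_hoisted` is hypothetical, and this sketch uses the public `ascii_only?`, `to_slice` and `each_char_with_index` methods as stand-ins rather than claiming to match the merged code:

```crystal
# Sketch only: both branches share one cost row, allocated once as a Slice.
def distance_hoisted(string1 : String, string2 : String) : Int32
  return 0 if string1 == string2

  s_size = string1.size
  t_size = string2.size
  return t_size if s_size == 0
  return s_size if t_size == 0

  # Keep the shorter string as the row so less memory is allocated.
  if t_size > s_size
    string1, string2 = string2, string1
    t_size, s_size = s_size, t_size
  end

  # Allocated once, before choosing the byte or codepoint path.
  costs = Slice(Int32).new(t_size + 1) { |i| i }
  last_cost = 0

  if string1.ascii_only? && string2.ascii_only?
    # ASCII only: one byte per character, so bytes can be compared directly.
    s = string1.to_slice
    t = string2.to_slice
    s_size.times do |i|
      last_cost = i + 1
      t_size.times do |j|
        sub_cost = s[i] == t[j] ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end
  else
    # Decode the second string once; iterate the first codepoint by codepoint.
    chars = string2.chars
    string1.each_char_with_index do |char1, i|
      last_cost = i + 1
      chars.each_with_index do |char2, j|
        sub_cost = char1 == char2 ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end
  end

  last_cost
end

puts distance_hoisted("hello", "hallo") # 1
```

Because `costs` is a Slice, an out-of-range index raises instead of reading arbitrary memory, which is the safety property the Slice request is after.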

wooster0 · 2021-06-05T06:12:10Z

require "benchmark"

Benchmark.ips do |x|
  x.report("1") { distance1("漢字", "かんじ"); distance1("hello", "HELLO") }
  x.report("2") { distance2("漢字", "かんじ"); distance2("hello", "HELLO") }
end

def distance1(string1 : String, string2 : String) : Int32
  return 0 if string1 == string2

  s_size = string1.size
  t_size = string2.size

  return t_size if s_size == 0
  return s_size if t_size == 0

  # This is to allocate less memory
  if t_size > s_size
    string1, string2 = string2, string1
    t_size, s_size = s_size, t_size
  end

  if string1.single_byte_optimizable? && string2.single_byte_optimizable?
    s = string1.to_unsafe
    t = string2.to_unsafe

    costs = Slice(Int32).new(t_size + 1) { |i| i }

    last_cost = 0
    s_size.times do |i|
      last_cost = i + 1

      t_size.times do |j|
        sub_cost = s[i] == t[j] ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  else
    reader = Char::Reader.new(string1)

    # Use an array instead of a reader to decode the second string only once
    chars = string2.chars

    costs = Slice(Int32).new(t_size + 1) { |i| i }

    last_cost = 0
    reader.each_with_index do |char1, i|
      last_cost = i + 1

      chars.each_with_index do |char2, j|
        sub_cost = char1 == char2 ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  end
end

def distance2(string1 : String, string2 : String) : Int32
  return 0 if string1 == string2

  s_size = string1.size
  t_size = string2.size

  return t_size if s_size == 0
  return s_size if t_size == 0

  # This is to allocate less memory
  if t_size > s_size
    string1, string2 = string2, string1
    t_size, s_size = s_size, t_size
  end

  if string1.single_byte_optimizable? && string2.single_byte_optimizable?
    s = string1.to_unsafe
    t = string2.to_unsafe

    costs = Pointer(Int32).malloc(t_size + 1) { |i| i }

    last_cost = 0
    s_size.times do |i|
      last_cost = i + 1

      t_size.times do |j|
        sub_cost = s[i] == t[j] ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  else
    reader = Char::Reader.new(string1)

    # Use an array instead of a reader to decode the second string only once
    chars = string2.chars

    costs = Pointer(Int32).malloc(t_size + 1) { |i| i }

    last_cost = 0
    reader.each_with_index do |char1, i|
      last_cost = i + 1

      chars.each_with_index do |char2, j|
        sub_cost = char1 == char2 ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  end
end

1   5.82M (171.81ns) (± 9.03%)  96.0B/op        fastest
2   5.62M (178.05ns) (± 7.41%)  96.0B/op   1.04× slower

I've changed it.
to_unsafe cannot be removed, though.

caspiano · 2021-06-05T07:19:45Z

I tried with to_slice and got the following, where old is to_unsafe and new is to_slice:

old 459.71  (  2.18ms) (± 9.33%)  195kB/op        fastest
new 435.59  (  2.30ms) (± 9.91%)  195kB/op   1.06× slower

wooster0 · 2021-06-05T07:51:12Z

This does not look like noise but rather a consistent slowdown. In the runs below, 1 is always the version that uses to_slice and Slice.

$ crystal run --release ../a.cr
1   5.24M (191.00ns) (± 8.74%)  96.0B/op   1.03× slower
2   5.37M (186.34ns) (± 9.72%)  96.0B/op        fastest
$ crystal run --release ../a.cr
1   5.27M (189.82ns) (± 7.37%)  96.0B/op   1.02× slower
2   5.38M (185.84ns) (± 9.13%)  96.0B/op        fastest
$ crystal run --release ../a.cr
1   5.19M (192.85ns) (± 7.41%)  96.0B/op   1.07× slower
2   5.55M (180.04ns) (± 8.09%)  96.0B/op        fastest
$ crystal run --release ../a.cr
2   5.53M (180.86ns) (± 7.69%)  96.0B/op        fastest
1   5.29M (188.96ns) (± 7.04%)  96.0B/op   1.04× slower
$ crystal run --release ../a.cr
2   5.34M (187.09ns) (±12.38%)  96.0B/op        fastest
1   4.73M (211.50ns) (±12.11%)  96.0B/op   1.13× slower
$ crystal run --release ../a.cr
2   5.55M (180.18ns) (± 7.90%)  96.0B/op        fastest
1   5.23M (191.14ns) (± 8.05%)  96.0B/op   1.06× slower

Is that worth it?

asterite · 2021-06-05T09:40:27Z

Thanks for the Slice stuff. I didn't realize it was using pointers before. I just wanted to make sure we don't end up with undefined behavior in case the code has a bug.

Commit 83a3fc9: Optimize Levenshtein.distance

yxhuvud reviewed Oct 14, 2019 (src/levenshtein.cr, outdated, resolved)

Commit 7d3d9d1: Add a non-ASCII path

Commit a569a94: More specs

wooster0 commented Oct 14, 2019 (src/levenshtein.cr, outdated, resolved)

asterite reviewed Oct 14, 2019 (src/levenshtein.cr, outdated, resolved)

wooster0 added 3 commits October 15, 2019 13:49

Commit e7cc245: Use both .chars and Char::Reader

Commit cb7ad9e: Improve some variable names

Commit f768697: Format the file and add "second"

wooster0 requested a review from asterite October 19, 2019 14:46

Commit c2dcbf9: Remove unnecessary index definition

jhass reviewed May 19, 2020 (src/levenshtein.cr, outdated, resolved)

straight-shoota reviewed May 6, 2021 (src/levenshtein.cr, outdated, resolved)

straight-shoota reviewed May 6, 2021 (src/levenshtein.cr, outdated, resolved)

straight-shoota added performance topic:stdlib:text labels May 6, 2021

wooster0 added 2 commits May 6, 2021 18:52

Commit e531ae6: Swap string1 and string2 + use single_byte_optimizable

Commit a74fabb: Fix specs and change the algorithm back to the original

straight-shoota reviewed May 6, 2021 (src/levenshtein.cr, resolved)

Commit 52157a7: Format + readd comment

straight-shoota approved these changes May 6, 2021

oprypin approved these changes Jun 4, 2021

Commit 9a0ca6c: Change costs to be a Slice

Commit 3560708: malloc -> new

oprypin approved these changes Jun 5, 2021

asterite added this to the 1.1.0 milestone Jun 5, 2021

asterite merged commit 35a36b1 into crystal-lang:master Jun 5, 2021

caspiano mentioned this pull request Oct 27, 2021

Implement Myers algorithm for Levenshtein distance calculation #11370
