Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Levenshtein.distance #8324

Merged
merged 12 commits into from
Jun 5, 2021
Merged

Conversation

wooster0
Copy link
Contributor

This avoids two expensive array allocations (.chars) because the strings can already be accessed directly.
This improves speed and vastly improves memory allocation.

Benchmark.ips do |bm|
  bm.report "old" do
    Levenshtein.distance("algorithm", "altruistic")
    Levenshtein.distance("hello", "hallo")
    Levenshtein.distance("こんにちは", "こんちは")
    Levenshtein.distance("hey", "hey")
    Levenshtein.distance("hippo", "zzzzzzzz")
    Levenshtein.distance("a" * 100000, "hello")
    Levenshtein.distance("hello", "a" * 100000)
  end

  bm.report "new" do
    Levenshtein.new_distance("algorithm", "altruistic")
    Levenshtein.new_distance("hello", "hallo")
    Levenshtein.new_distance("こんにちは", "こんちは")
    Levenshtein.new_distance("hey", "hey")
    Levenshtein.new_distance("hippo", "zzzzzzzz")
    Levenshtein.new_distance("a" * 100000, "hello")
    Levenshtein.new_distance("hello", "a" * 100000)
  end
end
old 295.78  (  3.38ms) (± 8.86%)  0.95MB/op   2.26× slower
new 669.21  (  1.49ms) (± 8.87%)   195kB/op        fastest

src/levenshtein.cr Outdated Show resolved Hide resolved
@asterite
Copy link
Member

This is wrong. to_unsafe will work at the byte level. chars work at the codepoint level. That specs pass is a coincidence.

@straight-shoota
Copy link
Member

Maybe this could be optimized using Char::Reader. But slice is not going to work. This just means specs need to be improved as well. There's only a single example with non-ASCII characters, and that seems to pass for some reason.

@asterite
Copy link
Member

However, it's true that if both strings are ascii_only?, a different, more efficient path could be written, and this using to_unsafe or to_slice.

@wooster0
Copy link
Contributor Author

Okay I think I figured it out now.

src/levenshtein.cr Outdated Show resolved Hide resolved
src/levenshtein.cr Outdated Show resolved Hide resolved
src/levenshtein.cr Outdated Show resolved Hide resolved
src/levenshtein.cr Outdated Show resolved Hide resolved
src/levenshtein.cr Outdated Show resolved Hide resolved
@asterite
Copy link
Member

asterite commented Jun 4, 2021

Can you try using Slice instead of Pointer and seeing if there's any performance drop?

@oprypin
Copy link
Member

oprypin commented Jun 4, 2021

@asterite The pointer stuff is pre-existing code, by the way -- just repeated twice in the new branches. That actually was going to be my 1 comment -- that line costs = Pointer(Int32).malloc(t_size + 1) { |i| i } could be written only once at the top, instead of twice.

@wooster0
Copy link
Contributor Author

wooster0 commented Jun 5, 2021

require "benchmark"

Benchmark.ips do |x|
  x.report("1") { distance1("漢字", "かんじ"); distance1("hello", "HELLO") }
  x.report("2") { distance2("漢字", "かんじ"); distance2("hello", "HELLO") }
end

def distance1(string1 : String, string2 : String) : Int32
  return 0 if string1 == string2

  s_size = string1.size
  t_size = string2.size

  return t_size if s_size == 0
  return s_size if t_size == 0

  # This is to allocate less memory
  if t_size > s_size
    string1, string2 = string2, string1
    t_size, s_size = s_size, t_size
  end

  if string1.single_byte_optimizable? && string2.single_byte_optimizable?
    s = string1.to_unsafe
    t = string2.to_unsafe

    costs = Slice(Int32).new(t_size + 1) { |i| i }

    last_cost = 0
    s_size.times do |i|
      last_cost = i + 1

      t_size.times do |j|
        sub_cost = s[i] == t[j] ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  else
    reader = Char::Reader.new(string1)

    # Use an array instead of a reader to decode the second string only once
    chars = string2.chars

    costs = Slice(Int32).new(t_size + 1) { |i| i }

    last_cost = 0
    reader.each_with_index do |char1, i|
      last_cost = i + 1

      chars.each_with_index do |char2, j|
        sub_cost = char1 == char2 ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  end
end

def distance2(string1 : String, string2 : String) : Int32
  return 0 if string1 == string2

  s_size = string1.size
  t_size = string2.size

  return t_size if s_size == 0
  return s_size if t_size == 0

  # This is to allocate less memory
  if t_size > s_size
    string1, string2 = string2, string1
    t_size, s_size = s_size, t_size
  end

  if string1.single_byte_optimizable? && string2.single_byte_optimizable?
    s = string1.to_unsafe
    t = string2.to_unsafe

    costs = Pointer(Int32).malloc(t_size + 1) { |i| i }

    last_cost = 0
    s_size.times do |i|
      last_cost = i + 1

      t_size.times do |j|
        sub_cost = s[i] == t[j] ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  else
    reader = Char::Reader.new(string1)

    # Use an array instead of a reader to decode the second string only once
    chars = string2.chars

    costs = Pointer(Int32).malloc(t_size + 1) { |i| i }

    last_cost = 0
    reader.each_with_index do |char1, i|
      last_cost = i + 1

      chars.each_with_index do |char2, j|
        sub_cost = char1 == char2 ? 0 : 1
        cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
        costs[j] = last_cost
        last_cost = cost
      end
      costs[t_size] = last_cost
    end

    last_cost
  end
end
1   5.82M (171.81ns) (± 9.03%)  96.0B/op        fastest
2   5.62M (178.05ns) (± 7.41%)  96.0B/op   1.04× slower

I've changed it.
to_unsafe can not be removed though.

@caspiano
Copy link
Contributor

caspiano commented Jun 5, 2021

I tried with to_slice and got the following, where old is to_unsafe and new is to_slice

old 459.71  (  2.18ms) (± 9.33%)  195kB/op        fastest
new 435.59  (  2.30ms) (± 9.91%)  195kB/op   1.06× slower

@wooster0
Copy link
Contributor Author

wooster0 commented Jun 5, 2021

It does not seem to be noise but rather a consistent slowdown though. 1 is always the slice version with to_slice & Slice here.

$ crystal run --release ../a.cr
1   5.24M (191.00ns) (± 8.74%)  96.0B/op   1.03× slower
2   5.37M (186.34ns) (± 9.72%)  96.0B/op        fastest
$ crystal run --release ../a.cr
1   5.27M (189.82ns) (± 7.37%)  96.0B/op   1.02× slower
2   5.38M (185.84ns) (± 9.13%)  96.0B/op        fastest
$ crystal run --release ../a.cr
1   5.19M (192.85ns) (± 7.41%)  96.0B/op   1.07× slower
2   5.55M (180.04ns) (± 8.09%)  96.0B/op        fastest
$ crystal run --release ../a.cr
2   5.53M (180.86ns) (± 7.69%)  96.0B/op        fastest
1   5.29M (188.96ns) (± 7.04%)  96.0B/op   1.04× slower
$ crystal run --release ../a.cr
2   5.34M (187.09ns) (±12.38%)  96.0B/op        fastest
1   4.73M (211.50ns) (±12.11%)  96.0B/op   1.13× slower
$ crystal run --release ../a.cr
2   5.55M (180.18ns) (± 7.90%)  96.0B/op        fastest
1   5.23M (191.14ns) (± 8.05%)  96.0B/op   1.06× slower

Is that worth it?

@asterite asterite added this to the 1.1.0 milestone Jun 5, 2021
@asterite asterite merged commit 35a36b1 into crystal-lang:master Jun 5, 2021
@asterite
Copy link
Member

asterite commented Jun 5, 2021

Thanks for the Slice stuff. I didn't realize it was using pointers before. I just wanted to make sure we don't end up with undefined behavior in case the code has a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants