-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Levenshtein.distance #8324
Conversation
This is wrong. to_unsafe will work at the byte level. chars work at the codepoint level. That specs pass is a coincidence. |
Maybe this could be optimized using |
However, it's true that if both strings are |
Okay I think I figured it out now. |
Can you try using Slice instead of Pointer and seeing if there's any performance drop? |
@asterite The pointer stuff is pre-existing code, by the way -- just repeated twice in the new branches. That actually was going to be my 1 comment -- that line |
require "benchmark"
Benchmark.ips do |x|
x.report("1") { distance1("漢字", "かんじ"); distance1("hello", "HELLO") }
x.report("2") { distance2("漢字", "かんじ"); distance2("hello", "HELLO") }
end
def distance1(string1 : String, string2 : String) : Int32
return 0 if string1 == string2
s_size = string1.size
t_size = string2.size
return t_size if s_size == 0
return s_size if t_size == 0
# This is to allocate less memory
if t_size > s_size
string1, string2 = string2, string1
t_size, s_size = s_size, t_size
end
if string1.single_byte_optimizable? && string2.single_byte_optimizable?
s = string1.to_unsafe
t = string2.to_unsafe
costs = Slice(Int32).new(t_size + 1) { |i| i }
last_cost = 0
s_size.times do |i|
last_cost = i + 1
t_size.times do |j|
sub_cost = s[i] == t[j] ? 0 : 1
cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
costs[j] = last_cost
last_cost = cost
end
costs[t_size] = last_cost
end
last_cost
else
reader = Char::Reader.new(string1)
# Use an array instead of a reader to decode the second string only once
chars = string2.chars
costs = Slice(Int32).new(t_size + 1) { |i| i }
last_cost = 0
reader.each_with_index do |char1, i|
last_cost = i + 1
chars.each_with_index do |char2, j|
sub_cost = char1 == char2 ? 0 : 1
cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
costs[j] = last_cost
last_cost = cost
end
costs[t_size] = last_cost
end
last_cost
end
end
def distance2(string1 : String, string2 : String) : Int32
return 0 if string1 == string2
s_size = string1.size
t_size = string2.size
return t_size if s_size == 0
return s_size if t_size == 0
# This is to allocate less memory
if t_size > s_size
string1, string2 = string2, string1
t_size, s_size = s_size, t_size
end
if string1.single_byte_optimizable? && string2.single_byte_optimizable?
s = string1.to_unsafe
t = string2.to_unsafe
costs = Pointer(Int32).malloc(t_size + 1) { |i| i }
last_cost = 0
s_size.times do |i|
last_cost = i + 1
t_size.times do |j|
sub_cost = s[i] == t[j] ? 0 : 1
cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
costs[j] = last_cost
last_cost = cost
end
costs[t_size] = last_cost
end
last_cost
else
reader = Char::Reader.new(string1)
# Use an array instead of a reader to decode the second string only once
chars = string2.chars
costs = Pointer(Int32).malloc(t_size + 1) { |i| i }
last_cost = 0
reader.each_with_index do |char1, i|
last_cost = i + 1
chars.each_with_index do |char2, j|
sub_cost = char1 == char2 ? 0 : 1
cost = Math.min(Math.min(last_cost + 1, costs[j + 1] + 1), costs[j] + sub_cost)
costs[j] = last_cost
last_cost = cost
end
costs[t_size] = last_cost
end
last_cost
end
end
I've changed it. |
I tried with
|
It does not seem to be noise but rather a consistent slowdown though. $ crystal run --release ../a.cr
1 5.24M (191.00ns) (± 8.74%) 96.0B/op 1.03× slower
2 5.37M (186.34ns) (± 9.72%) 96.0B/op fastest
$ crystal run --release ../a.cr
1 5.27M (189.82ns) (± 7.37%) 96.0B/op 1.02× slower
2 5.38M (185.84ns) (± 9.13%) 96.0B/op fastest
$ crystal run --release ../a.cr
1 5.19M (192.85ns) (± 7.41%) 96.0B/op 1.07× slower
2 5.55M (180.04ns) (± 8.09%) 96.0B/op fastest
$ crystal run --release ../a.cr
2 5.53M (180.86ns) (± 7.69%) 96.0B/op fastest
1 5.29M (188.96ns) (± 7.04%) 96.0B/op 1.04× slower
$ crystal run --release ../a.cr
2 5.34M (187.09ns) (±12.38%) 96.0B/op fastest
1 4.73M (211.50ns) (±12.11%) 96.0B/op 1.13× slower
$ crystal run --release ../a.cr
2 5.55M (180.18ns) (± 7.90%) 96.0B/op fastest
1 5.23M (191.14ns) (± 8.05%) 96.0B/op 1.06× slower Is that worth it? |
Thanks for the Slice stuff. I didn't realize it was using pointers before. I just wanted to make sure we don't end up with undefined behavior in case the code has a bug. |
This avoids two expensive array allocations (
.chars
) because the strings can already be accessed directly.This improves speed and vastly improves memory allocation.