-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Rabin-Karp algorithm for String#rindex #3876
Implement Rabin-Karp algorithm for String#rindex #3876
Conversation
Is it faster for short and long strings? |
Old implementation is to scan string from head to end, and short string benchmark has no match but long string benchmark has match in middle of string, so long string benchmark is better score. Generally, the longer string length is, the better score. |
Rabin-Karp is also faster for shorter strings. The old algorithm was very naive, for every char in the string check if there's a match from there on, so Note that I knew the old algorithm was slow, it just wasn't a priority to improve it. There are many parts in the std that could use an optimization review. @makenowjust By the way, there's also |
@asterite I have known it. I'll work tomorrow. Good night... |
Awesome! Crystal is already quite fast; but I won't complain if it gets even faster! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the spec that was mysteriously removed, if you can squash the format tool commit to the parent one we will be fine to merge this as is.
Thanks for the performance addition and sending clear benchmarks with them 👍
@@ -653,7 +653,8 @@ describe "String" do | |||
|
|||
describe "with offset" do | |||
assert { "foo baro baz".rindex("o b", 6).should eq(2) } | |||
assert { "foo baro baz".rindex("fg").should be_nil } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this spec was removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is this in "with offset"
section, but it has no offset option and it exists already in another line, so I thought it is just copy-and-paste mistaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, you are right.
I posted a possible improvement to the algorithm in the previous (for index) change #3873. I think it should be reviewed before merging this copy of it. |
52dbfe9
to
e460317
Compare
@MaxLap Your idea is awesome, but I think we cannot apply it to |
e460317
to
85a6044
Compare
@bcardiff Rebased |
Ah, i see what you mean. A check could still be made to do another step unless the current byte is a valid starting one, but there may be less to gain here. You could still try it if you want.
|
e61369e
to
991c6bd
Compare
Keep up the great work people 🎉 😍 |
991c6bd
to
dac6c94
Compare
@MaxLap Thanks. I applied some optimizations. |
Thanks @makenowjust and @MaxLap ! ⚡️ ⚡️ |
little changed behavior after this:
not sure what is really correct. |
Really? for me, on 0.20.5, I still get true O_o |
ops i miss versions, fixed |
True would be the same result as It may be interesting to add this edge case to the specs. |
Reverse version of this.
Benchmark is here.
My result is:
(I recommend you run this benchmark on your environment ;)