New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex-dna with &[u8] matching #31

Closed
llogiq opened this Issue Apr 22, 2016 · 4 comments

Comments

Projects
None yet
2 participants
@llogiq
Contributor

llogiq commented Apr 22, 2016

I have a gist where I use regex::bytes instead of plain regex. I'd have thought that it should be faster (because we can get rid of the UTF-8 check), but in fact it is slower. @BurntSushi there's probably some reason for that, just a heads-up; if the bytes variant should get faster we can retry the measurements.

On my system:

variant wall user sys
UTF-8 1.52 3.94 0.33
bytes 2.23 4.50 0.35
@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Apr 22, 2016

Contributor

Yeah, I noticed this a few weeks ago and though it was curious. I actually can't think of any obvious reason why it's slower. My (more extensive) benchmark suite says Regex::new and bytes::Regex::new have comparable performance. I'll look into it.

Another thing you could try is from_utf8_unchecked.

Contributor

BurntSushi commented Apr 22, 2016

Yeah, I noticed this a few weeks ago and though it was curious. I actually can't think of any obvious reason why it's slower. My (more extensive) benchmark suite says Regex::new and bytes::Regex::new have comparable performance. I'll look into it.

Another thing you could try is from_utf8_unchecked.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Apr 22, 2016

Contributor

Ahahahahaha. This is because replace_all in Regex uses push_str, which should be a memcpy. On the other hand, bytes::Regex uses extend, which seems to be slower, and therefore I'm guessing doesn't get optimized down to a memcpy. Nice.

Contributor

BurntSushi commented Apr 22, 2016

Ahahahahaha. This is because replace_all in Regex uses push_str, which should be a memcpy. On the other hand, bytes::Regex uses extend, which seems to be slower, and therefore I'm guessing doesn't get optimized down to a memcpy. Nice.

@BurntSushi

This comment has been minimized.

Show comment
Hide comment
@BurntSushi

BurntSushi Apr 22, 2016

Contributor

OK, once I fixed that in regex, the best of 3 for bytes::Regex was 0.59s and for Regex it was 0.61s. I figured it would be that small. It's barely perceptible and probably in the noise because I not-so-infrequently see both of them get up to 0.7s.

Contributor

BurntSushi commented Apr 22, 2016

OK, once I fixed that in regex, the best of 3 for bytes::Regex was 0.59s and for Regex it was 0.61s. I figured it would be that small. It's barely perceptible and probably in the noise because I not-so-infrequently see both of them get up to 0.7s.

BurntSushi added a commit to rust-lang/regex that referenced this issue Apr 22, 2016

Fixes a performance bug in bytes::Regex::replace.
This was using `Vec::extend` to accumulate bytes in a buffer, but this
compiles down to less efficient code than, say, `Vec::extend_from_slice`.
However, that method is newly available as of Rust 1.6, so we do a small
backport to regain performance.

This bug was noticed by @llogiq here: TeXitoi/benchmarksgame-rs#31
In particular, this increases the performance of bytes::Regex two-fold on
that benchmark.
@llogiq

This comment has been minimized.

Show comment
Hide comment
@llogiq

llogiq Apr 23, 2016

Contributor

Ok then I'm going to wait for a new Regex release so I can submit it without complicating our makefile. Thanks @BurntSushi!

Contributor

llogiq commented Apr 23, 2016

Ok then I'm going to wait for a new Regex release so I can submit it without complicating our makefile. Thanks @BurntSushi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment