-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi match with non const patterns #38485
Conversation
2e6c2fe
to
82442b4
Compare
@Mergifyio update |
✅ Branch has been successfully updated |
17fc728
to
c40a9fb
Compare
#38434 (comment) #38434 (comment) #38434 (comment) (the latter is no longer relevant as the affected places were removed in the meantime)
After making function multi[Fuzzy]Match(Any|AnyIndex|AllIndices)() work with non-const needles, 12 more functions started to fail in test "00233_position_function_family": multiSearchAny() multiSearchAnyCaseInsensitive() multiSearchAnyUTF8 multiSearchAnyCaseInsensitiveUTF8() multiSearchFirstPosition() multiSearchFirstPositionCaseInsensitive() multiSearchFirstPositionUTF8() multiSearchFirstPositionCaseInsensitiveUTF8() multiSearchFirstIndex() multiSearchFirstIndexCaseInsensitive() multiSearchFirstIndexUTF8() multiSearchFirstIndexCaseInsensitiveUTF8() Failing queries take the form select 0 = multiSearchAny('\0', CAST([], 'Array(String)'));
c40a9fb
to
1eed72b
Compare
Difficult to test because x86 and ARM behave differently due to vectorscan not available everyehwere ... The same stuff (multiFuzzyMatch) is tested elsewhere.
Dockerhub died before integration/stateless tests ran. Will need to fix some tests anyways, hopefully Dockerhub is up the next time. |
@Mergifyio update |
✅ Branch has been successfully updated |
The stress test errors are related to keeper and unrelated. |
I decided to experiment a bit and found logical error. It wasn't introduced in this PR, the same error occurs in master.
@rschu1ze As you are already familiar with this code, maybe it will be easy to fix it for you. But let's do it in a separate PR with bugfix changelog category |
What about functions |
Generally speaking, searching with non-const needles is more expensive than with const needles as the searcher needs to be re-initialized once per row (which is slow for fast Volnitsky search and fast for slow std lib search). So, non-const needles are nice-to-have from a functional POV but not recommended. This PR is about three functions for which a user wanted non-const needles. While doing so, I had to add support for non-const needles to more functions. multiSearchAllPositions() wasn't one of these. Looking at it's implementation, it should not be too difficult to also allow non-const needles but that would go into a separate PR then. |
} | ||
#else | ||
/// fallback if vectorscan is not compiled | ||
/// -- the code is copypasted from vectorVector() in MatchImpl.h and quite complex code ... all of it can be removed once vectorscan is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch is called in a quite obscure edge case ... for functions mutliMatchAny() / multiMatchAnyIndex() with non-const needles on platforms where vectorscan is disabled (all non-x86 at the moment). It is a bit fragile and I had to write it from scratch using some copypasta from elsewhere because the technique used in the fallback for const needles didn't work conceptually.
I did some local testing to make sure it's okay. However, if problems in CI tests pop up, I would likely remove it altogether and throw a "not implemented" exception instead. The same is done for function multiMatchAllIndices() which is sort of related. There is a good chance that vectorscan can be enabled on non-x86 so we could throw the fallback away anyways.
Interesting, thanks for finding this. Will take a look. |
192debd
to
1de5e9a
Compare
All tests are green except the ubsan stress test ("Backward compatibility check: Server failed to start"). But the failure is is caused by network issues and keeper. Other stress tests ran fine. Guess we are good to merge. |
Resolves #38046
Functions multiMatchAny(), multiMatchAnyIndex(), multiMatchAllIndices() and their fuzzy variants now accept non-const pattern array argument
While working on this PR, I noticed that functions
broke and needed the same treatment (non-const needles). So I fixed that too which eventually blew this PR a bit up.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Functions multiMatchAny(), multiMatchAnyIndex(), multiMatchAllIndices() and their fuzzy variants now accept non-const pattern array argument