multiFuzzyMatchAny function requirement #38046
Comments
The reason it requires a constant argument is that compiling a regular expression (especially for fuzzy matching) can be expensive. Doing it for every record is unreasonable. The only viable case is when there are only a few distinct regexes.
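To illustrate the "few distinct regexes" case mentioned above, here is a minimal Python sketch that compiles each distinct pattern once and reuses it across rows. It is not ClickHouse's implementation: `PatternCache` and `match_rows` are hypothetical names, and Python's `re` module stands in for the actual regex engine (it does no fuzzy matching).

```python
import re


class PatternCache:
    """Hypothetical helper: compile each distinct pattern only once."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0  # number of times re.compile was actually called

    def get(self, pattern: str) -> "re.Pattern":
        compiled = self._compiled.get(pattern)
        if compiled is None:
            # Expensive step: done once per distinct pattern, not once per row.
            compiled = re.compile(pattern)
            self._compiled[pattern] = compiled
            self.compile_count += 1
        return compiled


def match_rows(haystacks, patterns, cache):
    """Return 1/0 per row, where each row carries its own (non-constant) pattern."""
    return [
        int(cache.get(pattern).search(haystack) is not None)
        for haystack, pattern in zip(haystacks, patterns)
    ]


cache = PatternCache()
rows = ["clickhouse", "vectorscan", "hyperscan", "postgres"]
pats = ["scan$", "scan$", "scan$", "^click"]
result = match_rows(rows, pats, cache)
```

With four rows but only two distinct patterns, compilation happens twice. If every row carried a different pattern, the cache would not help and the per-record compilation cost would remain.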
I'm expecting it to be heavy.
Similar functionality (non-const pattern/regex arguments) was added very recently for functions
This commit migrates ClickHouse to Vectorscan. The first 10 min of [0] explain the reasons for it.

* Addresses (but does not resolve) #38046
* This change renames config parameters, I guess this makes it backward-incompatible?
* A note about where the new dependency is referenced from (.gitmodules): The ClickHouse GitHub organization already hosts a fork of hyperscan. Unfortunately, it's not possible to host within the same GitHub organization a fork of the parent (hyperscan) and a fork of the parent's fork (vectorscan). As a workaround, ClickHouse's fork of hyperscan now contains a new branch which was reset to vectorscan's master branch.

TODO:
- make fork of vectorscan in CH org and use that fork
- search for "hyperscan" in code, remove all references
- throw hyperscan out (unlink submodule)

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
This commit migrates ClickHouse to Vectorscan. The first 10 min of [0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046
(*) Names of hyperscan-related config parameters (e.g. "max_hyperscan_regexp_length") are preserved for compatibility. Likewise, error codes (e.g. "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g. "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in replacement.
(*) The ClickHouse GitHub organization already hosts a fork of hyperscan. Unfortunately, it's not possible to host within the same GitHub organization a fork of the parent (hyperscan) and a fork of the parent's fork (vectorscan). As a workaround, ClickHouse's fork of hyperscan now contains a new branch which was reset to vectorscan's master branch.
This commit migrates ClickHouse to Vectorscan. The first 10 min of [0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046
(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are preserved for compatibility. Likewise, error codes (e.g. "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g. "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in replacement.
(*) The ClickHouse GitHub organization already hosts a fork of hyperscan. Unfortunately, it's not possible to host within the same GitHub organization a fork of the parent (hyperscan) and a fork of the parent's fork (vectorscan). As a workaround, ClickHouse's fork of hyperscan now contains a new branch which was reset to vectorscan's master branch.

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
This commit migrates ClickHouse to Vectorscan. The first 10 min of [0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046
(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are preserved for compatibility. Likewise, error codes (e.g. "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g. "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in replacement.

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
In the multiFuzzyMatchAny function, please make it possible to use column values instead of constants for the third argument.
I really need this feature!
I need to extract names that match (like the LIKE operator, but approximately) the data in the array value category.names, and I plan to do this regularly.
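As a rough illustration of the semantics being requested, the Python sketch below takes a different list of patterns for every row and returns 1 if any pattern occurs in that row's text within a given edit distance. It rests on several assumptions: patterns are treated as literal strings rather than regular expressions, matching uses a standard approximate-substring dynamic program instead of Hyperscan/Vectorscan, and the names `fuzzy_contains` and `multi_fuzzy_match_any_rows` are hypothetical.

```python
def fuzzy_contains(haystack: str, pattern: str, max_distance: int) -> bool:
    """True if some substring of haystack is within max_distance edits of pattern.

    Assumption: pattern is a literal string, not a regular expression.
    """
    # prev[i] = minimal edit distance between pattern[:i] and a substring of
    # haystack ending at the current position (initially: the empty prefix).
    prev = list(range(len(pattern) + 1))
    if prev[-1] <= max_distance:
        return True
    for ch in haystack:
        cur = [0]  # a match may start at any position, so the empty prefix costs 0
        for i, pch in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pch != ch),  # substitute or keep
                prev[i] + 1,                # extra character in haystack
                cur[i - 1] + 1,             # missing character in haystack
            ))
        if cur[-1] <= max_distance:
            return True
        prev = cur
    return False


def multi_fuzzy_match_any_rows(haystacks, distance, patterns_per_row):
    """Row-wise variant: each row supplies its own list of patterns."""
    return [
        int(any(fuzzy_contains(h, p, distance) for p in patterns))
        for h, patterns in zip(haystacks, patterns_per_row)
    ]


names = ["hello wrold", "clickhouse", "postgres"]
category_names = [["world"], ["mouse", "klickhouse"], ["mysql"]]
matches = multi_fuzzy_match_any_rows(names, 2, category_names)
```

This sketch does no compilation step at all, so it sidesteps rather than solves the per-row compilation cost raised in the comments; it only shows what a column-valued third argument would compute.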