Use case
We need to search columns of Array type using regular expressions. The current implementation of the 'match' function (https://clickhouse.com/docs/en/sql-reference/functions/string-search-functions/#matchhaystack-pattern) works only with single-value columns and doesn't support columns of Array type. We know it's possible to use something like arrayExists(x -> match(x,'<regex_value>'),<column>) = 1, but lambdas are too slow on large amounts of data (1 billion values and more).
Describe the solution you'd like
It would be nice to add support for columns of Array type to the 'match' function. The regular expression pattern should be applied to every array item: the function should return 1 if at least one item matches the pattern and 0 if none do. A separate function would also be acceptable.
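To make the requested semantics concrete, here is a minimal reference sketch in Python. The name `array_match_any` is hypothetical and is not an existing ClickHouse function. It also assumes unanchored (search-style) matching, and Python's `re` syntax is not identical to re2, so treat it only as an illustration of the expected 0/1 result.

```python
import re
from typing import Iterable


def array_match_any(items: Iterable[str], pattern: str) -> int:
    """Return 1 if at least one item matches the pattern, otherwise 0.

    Hypothetical helper that only models the requested behaviour;
    it is not part of ClickHouse.
    """
    compiled = re.compile(pattern)  # compile once, reuse for every item
    # Assumption: matching is unanchored, so re.search is used
    # rather than re.fullmatch.
    return int(any(compiled.search(item) is not None for item in items))


print(array_match_any(["error: disk full", "ok"], r"^error"))
print(array_match_any(["ok", "fine"], r"^error"))
print(array_match_any([], r".*"))
```

Under these assumptions, an array with one matching item gives 1, an array with no matching items gives 0, and an empty array gives 0, which is the same result the arrayExists workaround produces.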
Describe alternatives you've considered
Array iteration with a lambda could be a solution, as shown above, but we need better lambda performance on large amounts of data (more than 1 billion values in a table).
I think the main issue here is not the lambda but the amount of data. Array columns are big and reading them is expensive (especially arrays of strings, as in your case). I think what you need is an inverted index, which isn't supported yet.