[Feature] Starrocks regexp function support CASE_INSENSITIVE #14439
Comments
Because regexp in StarRocks supports regular expressions, you can write something like |
also |
As jxzdoing mentioned, regexp would be better off with a new optional parameter:
that means:
|
Hi @wangsimo0 , please assign this to me. Thank you. |
Following what @jxzdoing said and the ilike function that @wangsimo0 referred to, should we add a new function, for example one named "regexp_ignore", to support the case-insensitive scenario? I feel this would look more intuitive and be easier for users to understand, for example: select regexp_ignore('mac','MAC');
+----------------------+
| 1 |
+----------------------+
select regexp_ignore('max','MAC');
+----------------------+
| 0 |
+----------------------+
select regexp('mac','MAC');
+----------------------+
| 0 |
+----------------------+
|
Then I will implement it according to the following method. As jxzdoing mentioned, regexp would be better off with a new optional parameter: c: Case-sensitive matching (the default). select regexp('mac','MAC','i'); select regexp('mac','MAC','c'); select regexp('mac','MAC'); |
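To make the proposed semantics concrete, here is a minimal sketch of the optional match-type parameter. It is not StarRocks code: `regexp_with_flag` is a hypothetical helper, and it uses `std::regex` as a stand-in for the RE2/Hyperscan engines mentioned later in this thread. It assumes that `'i'` means case-insensitive, `'c'` (or no argument) means case-sensitive, and that the pattern may match anywhere in the value.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper, not the StarRocks implementation.
// Returns true when `pattern` matches somewhere in `value`.
// match_type: "c" = case-sensitive (default), "i" = case-insensitive.
bool regexp_with_flag(const std::string& value, const std::string& pattern,
                      const std::string& match_type = "c") {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (match_type == "i") {
        flags |= std::regex::icase;  // ignore case while matching
    } else if (match_type != "c") {
        throw std::invalid_argument("match_type must be \"c\" or \"i\"");
    }
    const std::regex re(pattern, flags);
    return std::regex_search(value, re);
}
```

Under these assumptions, `regexp_with_flag("mac", "MAC", "i")` is true, while `regexp_with_flag("mac", "MAC", "c")` and `regexp_with_flag("mac", "MAC")` are false, which mirrors the three select statements above.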
@numbernumberone That's great! Personally, I prefer the parameter design. There are other examples in other databases: Does this make sense? Or do you have any other recommendations? |
Thank you very much for the clear description. I followed the parameter mode. |
@wangsimo0 Teacher Wang, I am currently facing some difficulties and need to ask for your help. This is the code I submitted for this function. Can you help me identify where the problem is? I have been trying to debug the program for a long time but couldn't find a solution. Do you have any good suggestions for me? Thank you very much. |
2023-04-24 10:22:50,312 WARN (starrocks-mysql-nio-pool-0|120) [ConnectProcessor.handleQuery():369] Process one query failed. SQL: select regexp('mac','MAC','i'), because. |
You have done a good job at the function execution level. |
Thank you very much, director. The error message I provided is from the FE node. Logically, if I modify functions.py, will the FE automatically generate the code for ANTLR syntax parsing?
[60021, 'REGEXP', 'BOOLEAN', ['VARCHAR', 'VARCHAR', 'VARCHAR'], 'LikePredicate::regex', 'LikePredicate::regex_prepare', 'LikePredicate::regex_close'], |
|
Thank you so much. Could I ask if there are any related reference materials or explanations? Because I didn't see any instructions for such as: https://github.com/StarRocks/starrocks/pull/1264/files and
|
For most common functions, we have defined a common rule to parse them, but |
I would like to ask for advice. Constant string patterns are not matched with regular expressions but with string matching algorithms. So, in a case-insensitive scenario, should string matching still be used instead of regular expression matching? |
Do you mean the second parameter is a constant string? |
Yeah.
// The following four conditionals check if the pattern is a constant string,
// starts with a constant string and is followed by any number of wildcard characters,
// ends with a constant string and is preceded by any number of wildcard characters or
// has a constant substring surrounded on both sides by any number of wildcard
// characters. In any of these conditions, we can search for the pattern more
// efficiently by using our own string match functions rather than regex matching.
if (RE2::FullMatch(pattern_str, EQUALS_RE, &search_string)) {
    state->set_search_string(search_string);
    state->function = &constant_equals_fn;
} else if (RE2::FullMatch(pattern_str, STARTS_WITH_RE, &search_string)) {
    state->set_search_string(search_string);
    state->function = &constant_starts_with_fn;
} else if (RE2::FullMatch(pattern_str, ENDS_WITH_RE, &search_string)) {
    state->set_search_string(search_string);
    state->function = &constant_ends_with_fn;
} else if (RE2::FullMatch(pattern_str, SUBSTRING_RE, &search_string)) {
    state->set_search_string(search_string);
    state->function = &constant_substring_fn;
} else {
    RETURN_IF_ERROR(compile_with_hyperscan_or_re2<false>(pattern_str, state, context, pattern));
} |
You can try it and test whether your case-insensitive string matching algorithm performs better than the common regular expression matching algorithm. |
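For anyone who wants to try that comparison, here is a rough sketch of what case-insensitive versions of the four constant-string fast paths could look like. These helpers (`iequals`, `istarts_with`, `iends_with`, `icontains`) are hypothetical names, not the StarRocks `constant_*_fn` functions, and they assume ASCII-only case folding, so multi-byte UTF-8 letters would need separate handling.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helpers; ASCII-only case folding is an assumption.
static bool ascii_iequal(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Whole value equals the constant, ignoring case.
bool iequals(const std::string& value, const std::string& needle) {
    return value.size() == needle.size() &&
           std::equal(value.begin(), value.end(), needle.begin(), ascii_iequal);
}

// Value starts with the constant, ignoring case.
bool istarts_with(const std::string& value, const std::string& needle) {
    return value.size() >= needle.size() &&
           std::equal(needle.begin(), needle.end(), value.begin(), ascii_iequal);
}

// Value ends with the constant, ignoring case.
bool iends_with(const std::string& value, const std::string& needle) {
    return value.size() >= needle.size() &&
           std::equal(needle.rbegin(), needle.rend(), value.rbegin(), ascii_iequal);
}

// Value contains the constant somewhere, ignoring case.
bool icontains(const std::string& value, const std::string& needle) {
    if (needle.empty()) return true;  // an empty constant matches any value
    return std::search(value.begin(), value.end(), needle.begin(), needle.end(),
                       ascii_iequal) != value.end();
}
```

Whether this beats a compiled case-insensitive regex would have to be measured. The sketch only shows that the fast-path idea carries over once the character comparison ignores case.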
Ok, thank you very much |
We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks! |
Sorry, I just took some time out recently to continue working on it. It will be done soon |
Question, is a non-regexp version of Edit: Answering this for others, |
It would be appreciated. Also, it seems regexp in StarRocks handles newlines differently when matching. In StarRocks, "abc\nd" matches "abc.*d":
Not in MariaDB:
|
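Whether `.` matches a newline depends on the regex engine and its options, which is probably what the difference above comes down to. The following sketch shows the two behaviours using `std::regex` with its default ECMAScript grammar. It is only an illustration and says nothing about how StarRocks or MariaDB configure their own engines; `matches` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere in `value`.
// Uses std::regex's default ECMAScript grammar, in which '.' does not
// match line terminators such as '\n'.
bool matches(const std::string& value, const std::string& pattern) {
    return std::regex_search(value, std::regex(pattern));
}
```

With this helper, `matches("abc\nd", "abc.*d")` is false because `.` stops at the newline, while `matches("abc\nd", "abc(.|\n)*d")` is true because the pattern allows a newline explicitly.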
Feature request
Currently, the StarRocks function "regexp" matches content case-sensitively, but we need to ignore case when matching in our scenario.
Should we add a new function, for example one named "regexp_ignore", or extend the function "regexp" with an optional parameter "case_sensitive"? If the parameter is not passed, matching stays case-sensitive; if a value is passed, matching is case-insensitive, such as:
select regexp("mac","MAC",1);
return 1
select regexp("mac","MAC");
return 0