[Enhancement] Update regexp_extract function for trino parser #34845

Merged

@zenoyang zenoyang commented Nov 13, 2023

Why I'm doing:
In StarRocks, the regexp_extract function returns an empty string when there is no matching content. Trino returns NULL. The two behave inconsistently.

What I'm doing:
Add a function conversion in the Trino compatibility layer: regexp_extract -> if(regexp_extract(xxx) = '', null, regexp_extract(xxx))
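
For illustration only, here is a minimal sketch of what the conversion does to results. It uses java.util.regex as a stand-in for the engine's regex implementation, and the class and method names (RegexpExtractSemantics, emptyOnNoMatch, nullOnNoMatch) are hypothetical and not part of StarRocks or Trino.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not StarRocks or Trino code.
final class RegexpExtractSemantics {
    // Models the behavior described for StarRocks: "" when nothing matches.
    static String emptyOnNoMatch(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group(0) : "";
    }

    // Models if(regexp_extract(x) = '', null, regexp_extract(x)):
    // an empty result is turned into null, anything else passes through.
    static String nullOnNoMatch(String input, String pattern) {
        String extracted = emptyOnNoMatch(input, pattern);
        return extracted.isEmpty() ? null : extracted;
    }

    public static void main(String[] args) {
        System.out.println(nullOnNoMatch("order-1234", "\\d+")); // prints 1234
        System.out.println(nullOnNoMatch("no digits", "\\d+"));  // prints null
    }
}
```

One consequence of wrapping the call this way is that a pattern which matches an empty substring also produces null, because the rewrite cannot tell an empty match apart from no match.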

Fixes #34026

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5


        // regexp_extract -> if(regexp_extract(xxx)='', null, regexp_extract(xxx))
        registerFunctionTransformer("regexp_extract", 2,
                new FunctionCallExpr("if", ImmutableList.of(predicate, new NullLiteral(), regexpExtractFunc))
        );
    }

    private static void registerJsonFunctionTransformer() {

The most risky bug in this code is:
Incorrect argument replacement in the placeholder transformation for regexp_extract function call.

You can modify the code like this:

@@ -222,13 @@ private static void registerRegexpFunctionTransformer() {
     // regexp_extract(string, pattern) -> regexp_extract(str, pattern, 0)
     FunctionCallExpr regexpExtractFunc = new FunctionCallExpr("regexp_extract",
             ImmutableList.of(new PlaceholderExpr(0, Expr.class), new PlaceholderExpr(1, Expr.class), new IntLiteral(0L)));
-    BinaryPredicate predicate = new BinaryPredicate(BinaryType.EQ, regexpExtractFunc, new StringLiteral(""));
+    BinaryPredicate predicate = new BinaryPredicate(BinaryType.EQ, new PlaceholderExpr(0, Expr.class), new StringLiteral(""));
     // regexp_extract -> if(regexp_extract(xxx)='', null, regexp_extract(xxx))
     registerFunctionTransformer("regexp_extract", 2,
             new FunctionCallExpr("if", ImmutableList.of(predicate, new NullLiteral(), regexpExtractFunc))
     );
 }

In the above modification, by changing regexpExtractFunc to new PlaceholderExpr(0, Expr.class) in the BinaryPredicate, we ensure that the first argument passed (xxx) is correctly compared with an empty string and then used in both places of the conditional expression within the if statement. Otherwise, regexpExtractFunc would be evaluated twice which is not necessary or intended.

Note: As this snippet does not contain full context or complete code, I am assuming that the indexing of placeholders follows zero-based indexing which is typical in Java (and many other programming languages). If, however, the original code indeed uses one-based indexing for placeholders (which would be atypical), the original placeholder indices should be preserved, and my correction would not be accurate.
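
To make the difference between the two predicates concrete, the following is a small sketch under stated assumptions: java.util.regex stands in for the engine's regex implementation, SQL NULL inputs are not modeled, and the names (PredicateComparison, predicateOnResult, predicateOnInput) are hypothetical. It compares testing the extraction result against '' with testing placeholder 0 (the input string) against ''.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not StarRocks code.
final class PredicateComparison {
    // Stand-in for a regexp_extract that yields "" when nothing matches.
    static String extract(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group(0) : "";
    }

    // if(regexp_extract(x, p, 0) = '', null, regexp_extract(x, p, 0))
    static String predicateOnResult(String input, String pattern) {
        String extracted = extract(input, pattern);
        return extracted.isEmpty() ? null : extracted;
    }

    // if(x = '', null, regexp_extract(x, p, 0))
    static String predicateOnInput(String input, String pattern) {
        return input.isEmpty() ? null : extract(input, pattern);
    }

    public static void main(String[] args) {
        String input = "no digits";
        // The result-based predicate maps the missing match to null.
        System.out.println(predicateOnResult(input, "\\d+") == null);  // prints true
        // The input-based predicate leaves the empty string in place.
        System.out.println(predicateOnInput(input, "\\d+").isEmpty()); // prints true
    }
}
```

In this sketch the two forms agree only when the input itself is empty or when the pattern matches a non-empty substring. For a non-empty input with no match, the input-based predicate still returns an empty string, which is the behavior the PR description sets out to change.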

@zenoyang zenoyang force-pushed the 231110_compatible_trino_regexp_extract branch from 8fa6e83 to ff163a7 Compare November 13, 2023 06:45
Signed-off-by: zenoyang <cookie.yz@qq.com>
@zenoyang zenoyang force-pushed the 231110_compatible_trino_regexp_extract branch from ff163a7 to 0eeb1dc Compare November 13, 2023 09:44
@zenoyang zenoyang marked this pull request as ready for review November 13, 2023 10:36
Signed-off-by: zenoyang <cookie.yz@qq.com>
@zenoyang zenoyang force-pushed the 231110_compatible_trino_regexp_extract branch from f0a5b6e to 6449423 Compare November 13, 2023 11:17

sonarcloud bot commented Nov 13, 2023

SonarCloud Quality Gate passed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: 0.0%
Duplication: 0.0%

Warning: The version of Java (11.0.21) used to run this analysis is deprecated and will soon stop being accepted. Please update to at least Java 17.

@Youngwb Youngwb enabled auto-merge (squash) November 15, 2023 02:43
@github-actions github-actions bot added the 3.0 label Nov 15, 2023
@Youngwb Youngwb merged commit c9cd165 into StarRocks:main Nov 15, 2023
49 of 50 checks passed

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Nov 15, 2023

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Nov 15, 2023

mergify bot commented Nov 15, 2023

backport branch-3.2

✅ Backports have been created

@Mergifyio backport branch-3.0

@github-actions github-actions bot removed the 3.0 label Nov 15, 2023

mergify bot commented Nov 15, 2023

backport branch-3.1

✅ Backports have been created

mergify bot commented Nov 15, 2023

backport branch-3.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Nov 15, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit c9cd165)
mergify bot pushed a commit that referenced this pull request Nov 15, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit c9cd165)

[FE Incremental Coverage Report]

pass : 5 / 5 (100.00%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 com/starrocks/connector/parser/trino/ComplexFunctionCallTransformer.java | 5 | 5 | 100.00% | []

[BE Incremental Coverage Report]

pass : 2 / 2 (100.00%)

file detail

path | covered_line | new_line | coverage | not_covered_line_detail
🔵 src/formats/orc/orc_chunk_writer.h | 1 | 1 | 100.00% | []
🔵 src/storage/row_store_encoder.h | 1 | 1 | 100.00% | []

wanpengfei-git pushed a commit that referenced this pull request Nov 15, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit c9cd165)
wanpengfei-git pushed a commit that referenced this pull request Nov 15, 2023
Signed-off-by: zenoyang <cookie.yz@qq.com>
(cherry picked from commit c9cd165)
Successfully merging this pull request may close these issues.

Some issues with StarRocks being compatible with Trino
4 participants