Skip to content

added regexpMatchScalar #15522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

Lyra126
Copy link

@Lyra126 Lyra126 commented Feb 6, 2024

Summary of the changes / Why this improves CrateDB

Added RegexpMatchScalar

Submitting this pull request as a solution to the following issue: #15117

  • Created a class, implementing it similar to the existing RegexpMatchOperator.
  • Instead inherited it from Scalar instead of Operator
  • Had it support the use of flags.
  • Updated ScalarFunctionModule.java to register it as a function

Checklist

  • Added an entry in the latest docs/appendices/release-notes/<x.y.0>.rst for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • [✓] CLA is signed
  • [✓] This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in the latest docs/appendices/release-notes/<x.y.0>.rst
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@crate-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@matriv matriv requested a review from seut February 8, 2024 11:47
@seut
Copy link
Member

seut commented Feb 8, 2024

@Lyra126
Thank you for this contribution.
Probably I confused you by my comment #15117 (comment) that the scalar is almost equal to the existing operator. But the scalar should return the matched values, so the return type should be a array of strings ->DataTypes.STRING_ARRAY.
See also the example in the original issue request, this example must work.

Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add related tests, best by extending from ScalarTestCase please? thanks!


public RegexpMatchScalar(Signature signature, BoundSignature boundSignature) {
super(signature, boundSignature);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a compile implementation? By this, any literal pattern will be compiled and flags will be parsed only once and not on each row, resulting in performance improvements.

Comment on lines 96 to 101
int flagSeparatorIndex = pattern.lastIndexOf('/');
if (flagSeparatorIndex != -1 && flagSeparatorIndex < pattern.length() - 1) {
pattern = pattern.substring(0, flagSeparatorIndex);
flagsStr = pattern.substring(flagSeparatorIndex + 1);
flags = parseFlags(flagsStr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any optional flag would be the 3rd argument and not part of the pattern, see https://www.postgresql.org/docs/current/functions-matching.html.
To support this, a custom toQuery(Function function, LuceneQueryBuilder.Context context) would be needed to take the 3rd argument into account.

@Lyra126
Copy link
Author

Lyra126 commented Feb 24, 2024

Thank you for your continuous review and help with this issue! This is my first time contributing, and I appreciate the help and clarification. Let me know if I'm missing anything else.


@Test
public void testNormalize() throws Exception {
assertNormalize("'foo bar' ~ '([A-Z][^ ]+ ?){2}'", isLiteral(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are testing the existing operator and not your new scalar.
Not exactly sure what you want to test here but this expression will fail in anyway as the regex won't match.
I highly recommend to run your tests yourself ;)

Suggested change
assertNormalize("'foo bar' ~ '([A-Z][^ ]+ ?){2}'", isLiteral(true));
assertNormalize("regexp_match('foo bar', '([A-Z][^ ]+ ?){2}')", isLiteral());

compilePattern(patternString, flags);
}

private void compilePattern(String patternString, int flags) throws PatternSyntaxException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not implementing the interface's compile method.. Please check existing implementations to understand how it is used.


List<String> matches = new ArrayList<>();
java.util.regex.Matcher matcher = java.util.regex.Pattern.compile(pattern).matcher(source);
while (matcher.find()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the PG documentation, it should return null if no match is found.
Additionally, the results of the all groups in the current match should be returned, not the first group only but for all global matches (the global flag 'g' must no be supported by this method).


// Check if the pattern contains flags
String flagsStr = null;
int flagSeparatorIndex = pattern.lastIndexOf('/');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder where the flags are added to the pattern by the / separator? I think the existing toQuery implementations won't work as they only support 2 arguments, a reference and a literal, but not 3.

@seut
Copy link
Member

seut commented Jul 5, 2024

@Lyra126 Any updates on this, do you need additional help?

@hlcianfagna hlcianfagna changed the title added regexpMathScalar added regexpMatchScalar Jul 5, 2024
@Lyra126
Copy link
Author

Lyra126 commented Jul 5, 2024

@Lyra126 Any updates on this, do you need additional help?

Yeah, I’ve realized that I’m a bit lost and may not have the necessary experience to resolve this. I’d like to step back so that someone more familiar with the project can take over. Thank you for checking in though.

@seut
Copy link
Member

seut commented Jul 8, 2024

@Lyra126 Ok thank you for trying, I'll close this then.

@seut seut closed this Jul 8, 2024
@bhaskar16
Copy link

Hello @seut , can I work on this please?

@seut
Copy link
Member

seut commented Nov 26, 2024

@bhaskar16 Sure, go ahead, thanks!
Btw. we stopped assigning tickets as this assignment got stale a lot of times as people did not progress on the development while also not inform us. So just go ahead on working on an issue and create a (Draft) PR asap so we can see your progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants