[CALCITE-6278] Add REGEXP, REGEXP_LIKE function (enabled in Spark library)#3712
macroguo-ghy merged 1 commit into apache:main
20f4d1e to b4fa622
| } | ||
| @Test void testRegexpFunc() { | ||
| final SqlOperatorFixture f = fixture().setFor(SqlLibraryOperators.REGEXP); |
Should this be fixture().setFor(SqlLibraryOperators.REGEXP).withLibrary(SqlLibrary.SPARK)?
withLibrary is optional.
f0.forEachLibrary(list(functionAlias.libraries), consumer) will automatically load the corresponding Library type.
| @@ -0,0 +1,243 @@ | |||
| # spark.iq - Babel test for Spark dialect of SQL | |||
I have a question: are you sure this test only exercises semantics that conform to Spark SQL?
This test loads the built-in functions and the Spark functions.
I am not against using this test, but I would like to understand why QuidemTest is needed for Spark, because your code may set a precedent for future contributions.
Let's discuss two questions.
- Why do we need `spark.iq`?
- Which features need to be tested, and which do not?
+1. Calcite has already adapted many Spark functions. Do we need to add all Spark functions? What are the benefits of adding them?
- Compared to `SqlOperatorTest`, I think the function tests in Babel are more accurate and complete. Therefore, following `big-query.iq` and `redshift.iq`, `spark.iq` was added to verify the correctness of function execution.
- I think tests can be added to this file when adding Spark functions, but it is not required.
| /** SQL {@code RLIKE} function. */ | ||
| public boolean rlike(String s, String pattern) { | ||
| return cache.getUnchecked(new Key(0, pattern)).matcher(s).find(); | ||
| s = StringEscapeUtils.unescapeJava(s); |
Do we need to add a little comment?
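For readers following the diff, the key behavior of `rlike` is that it calls `find()` rather than `matches()`, so the pattern may match any substring of the input. The sketch below illustrates only that behavior with the JDK regex API. `RlikeSketch` and its `ConcurrentHashMap` cache are hypothetical stand-ins for illustration, not Calcite's `SqlFunctions` code or its cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the rlike method shown in the diff; not Calcite's code. */
final class RlikeSketch {
  // Compiled patterns keyed by regex source. This plain map is an assumption
  // that substitutes for the cache used in the real implementation.
  private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

  /** Returns whether any substring of s matches the pattern (find semantics). */
  static boolean rlike(String s, String pattern) {
    return CACHE.computeIfAbsent(pattern, Pattern::compile).matcher(s).find();
  }

  public static void main(String[] args) {
    System.out.println(rlike("abc def", "de"));   // true: substring match
    System.out.println(rlike("abc def", "^de"));  // false: anchored at the start
    System.out.println(rlike("abc def", "^abc")); // true
  }
}
```

With `matches()` instead of `find()`, the first call would return false, because the whole input would have to match the pattern.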
d885c38 to d4bce2a
| # REGEXP(str, regexp) | ||
| # Returns true if str matches regexp, or false otherwise. | ||
| # | ||
| # Returns STRING |
Done, fixed to say it returns Boolean.
site/_docs/reference.md
Outdated
| | b | PARSE_TIMESTAMP(format, string[, timeZone]) | Uses format specified by *format* to convert *string* representation of timestamp to a TIMESTAMP WITH LOCAL TIME ZONE value in *timeZone* | ||
| | h s | PARSE_URL(urlString, partToExtract [, keyToExtract] ) | Returns the specified *partToExtract* from the *urlString*. Valid values for *partToExtract* include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. *keyToExtract* specifies which query to extract | ||
| | b s | POW(numeric1, numeric2) | Returns *numeric1* raised to the power *numeric2* | ||
| | s | REGEXP(string, regexp) | Returns true if *string* matches *regexp*, or false otherwise, string literals (including regex patterns) are unescaped |
can't you just say 'alias for RLIKE'?
Done, changed to "Equivalent to string1 RLIKE string2".
| f.checkNull(fn + "(cast(null as varchar), 'abc')"); | ||
| f.checkNull(fn + "(cast(null as varchar), cast(null as varchar))"); | ||
| }; | ||
| f0.forEachLibrary(list(functionAlias.libraries), consumer); |
if RLIKE, REGEXP_LIKE and REGEXP are identical, you should devise a way to test all three with a single test
Although RLIKE and REGEXP share the same implementation, their call syntax differs, so they are split into separate tests.
- RLIKE: 'str' RLIKE 'regex'
- REGEXP | REGEXP_LIKE: fn('str', 'regex')
We could also abstract a helper method to support the different call syntaxes, but I don't think it is necessary at present.
If you think it's better to combine them into a single test, I'll modify the test case.
It is necessary, and it's easy to combine all three: use a BinaryOperator<String>. For RLIKE, pass (a, b) -> a + " RLIKE " + b; for REGEXP, pass (a, b) -> "REGEXP(" + a + ", " + b + ")".
Done, combined the test cases for RLIKE, REGEXP, and REGEXP_LIKE.
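The `BinaryOperator<String>` suggestion in this thread can be sketched roughly as follows. This is a minimal illustration of the idea, not the test that was merged: the class name `RegexpCallSyntaxSketch` and the `callBuilders` helper are hypothetical, and a real test would pass each generated expression to the operator fixture rather than print it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Hypothetical sketch: one expression builder per call syntax. */
final class RegexpCallSyntaxSketch {
  /** Maps each function name to a builder that renders its SQL call syntax. */
  static Map<String, BinaryOperator<String>> callBuilders() {
    Map<String, BinaryOperator<String>> builders = new LinkedHashMap<>();
    // Infix form: 'str' RLIKE 'regex'
    builders.put("RLIKE", (a, b) -> a + " RLIKE " + b);
    // Function-call forms: fn('str', 'regex')
    builders.put("REGEXP", (a, b) -> "REGEXP(" + a + ", " + b + ")");
    builders.put("REGEXP_LIKE", (a, b) -> "REGEXP_LIKE(" + a + ", " + b + ")");
    return builders;
  }

  public static void main(String[] args) {
    // A single loop covers all three spellings with the same operands.
    callBuilders().forEach((name, build) ->
        System.out.println(name + ": " + build.apply("'abc'", "'a.c'")));
  }
}
```

Because the operands are shared, every assertion is written once and exercised against all three spellings, which is the point of the reviewer's request.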
1545b81 to 6161df5
| /** SQL {@code RLIKE} function. */ | ||
| public boolean rlike(String s, String pattern) { | ||
| return cache.getUnchecked(new Key(0, pattern)).matcher(s).find(); | ||
| // Since Spark 2.0, string literals (including regex patterns) are unescaped in SQL parser |
rlike is also enabled in the Hive library. Does the rlike function have the same semantics in Hive as in Spark SQL?
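To make the question about semantics concrete, the sketch below shows how unescaping changes what a pattern means. It is an illustration under stated assumptions: `unescape` is a simplified, hypothetical helper that handles only `\\`, `\n`, and `\t`, and it is not `StringEscapeUtils.unescapeJava`. Whether the real code unescapes the input, the pattern, or both is not visible in the snippet above, so the sketch unescapes only the pattern.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of how unescaping changes the meaning of a pattern. */
final class UnescapeSketch {
  /** Simplified unescape: handles only \\, \n and \t; other sequences are kept as-is. */
  static String unescape(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length()) {
        char next = s.charAt(i + 1);
        switch (next) {
          case '\\': out.append('\\'); i++; continue;
          case 'n': out.append('\n'); i++; continue;
          case 't': out.append('\t'); i++; continue;
          default: break; // unknown escape: fall through and keep the backslash
        }
      }
      out.append(c);
    }
    return out.toString();
  }

  /** Matches with find semantics, optionally unescaping the pattern first. */
  static boolean rlike(String s, String rawPattern, boolean unescapePattern) {
    String pattern = unescapePattern ? unescape(rawPattern) : rawPattern;
    return Pattern.compile(pattern).matcher(s).find();
  }

  public static void main(String[] args) {
    String raw = "\\\\d+"; // the four characters \\d+ as written in a SQL literal
    System.out.println(rlike("abc123", raw, true));  // true: pattern becomes \d+
    System.out.println(rlike("abc123", raw, false)); // false: needs a literal backslash
  }
}
```

The same literal therefore matches digits in one dialect and a literal backslash followed by `d` in another, which is why the Hive question matters for a function shared between libraries.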
ef42d21 to dc4119d
Would it be possible to merge these changes? I'm looking to implement CALCITE-6309 on top of this.
site/_docs/reference.md
Outdated
| | b m p s | RIGHT(string, length) | Returns the rightmost *length* characters from the *string* | ||
| | h s | string1 RLIKE string2 | Whether *string1* matches regex pattern *string2* (similar to `LIKE`, but uses Java regex) | ||
| | h s | string1 NOT RLIKE string2 | Whether *string1* does not match regex pattern *string2* (similar to `NOT LIKE`, but uses Java regex) | ||
| | h s | string1 RLIKE string2 | Whether *string1* matches regex pattern *string2* (similar to `LIKE`, but uses Java regex), string literals (including regex patterns) are unescaped |
Should we maintain alphabetical order? `re` comes before `ri`.
done, order adjusted.
@jduo I think this PR can be merged soon; the details of the relevant issues have been discussed thoroughly in Jira.
6a91261 to 78acd2e
If there are no other comments, I will merge it later.
Thanks, that'd be great. Once that's in I'll follow-up with the CALCITE-6309 work.




https://issues.apache.org/jira/browse/CALCITE-6278
Since this function has the same implementation as the Spark RLIKE function, the implementation can be reused.
Source Code

[undo] Discuss results in Jira: