[SPARK-30763][SQL] Fix java.lang.IndexOutOfBoundsException No group 1 for regexp_extract

### What changes were proposed in this pull request?
The current implementation of `regexp_extract` throws an unhandled exception, as shown below:

`SELECT regexp_extract('1a 2b 14m', 'd+')`
```
java.lang.IndexOutOfBoundsException: No group 1
[info] at java.util.regex.Matcher.group(Matcher.java:538)
[info] at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info] at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info] at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
```
I think this exception should be handled properly.

### Why are the changes needed?
Fix the bug `java.lang.IndexOutOfBoundsException No group 1`.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
New UT

Closes #27508 from beliefer/fix-regexp_extract-bug.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
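The stack trace above ends in `java.util.regex.Matcher.group`, so the failure can be reproduced outside Spark with plain `java.util.regex`. The following is a minimal sketch, not the code from this patch: the class `GroupIndexGuard` and the helpers `checkGroupIndex`, `unguarded`, and `guarded` are hypothetical names chosen for illustration, and the error text is copied from the expected test output in this commit. Returning an empty string when nothing matches is an assumption about `regexp_extract` semantics.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexGuard {
    // Hypothetical helper (not Spark's code): fail early with a descriptive
    // message when the pattern has fewer capture groups than requested.
    static void checkGroupIndex(int groupCount, int groupIndex) {
        if (groupCount < groupIndex) {
            throw new IllegalArgumentException(
                "Regex group count is " + groupCount
                    + ", but the specified group index is " + groupIndex);
        }
    }

    // Unguarded lookup: Matcher.group throws IndexOutOfBoundsException
    // when the pattern does not define the requested group.
    static String unguarded(String input, String regex, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(groupIndex) : "";
    }

    // Guarded lookup: validates the index before asking the matcher for it.
    static String guarded(String input, String regex, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return ""; // assumption: no match yields an empty string
        }
        checkGroupIndex(m.groupCount(), groupIndex);
        String group = m.group(groupIndex);
        return group == null ? "" : group; // optional group that did not participate
    }

    public static void main(String[] args) {
        try {
            unguarded("1a 2b 14m", "\\d+", 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        try {
            guarded("1a 2b 14m", "\\d+", 1);
        } catch (IllegalArgumentException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

The guard changes only the exception type and message; a query that asks for a group the pattern does not define still fails, but with an error that names both the group count and the requested index.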
Showing 4 changed files with 104 additions and 1 deletion.
9 changes: 9 additions & 0 deletions
sql/core/src/test/resources/sql-tests/inputs/regexp-functions.sql
@@ -0,0 +1,9 @@
-- regexp_extract
SELECT regexp_extract('1a 2b 14m', '\\d+');
SELECT regexp_extract('1a 2b 14m', '\\d+', 0);
SELECT regexp_extract('1a 2b 14m', '\\d+', 1);
SELECT regexp_extract('1a 2b 14m', '\\d+', 2);
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)');
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 0);
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 1);
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 2);
69 changes: 69 additions & 0 deletions
sql/core/src/test/resources/sql-tests/results/regexp-functions.sql.out
@@ -0,0 +1,69 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 8


-- !query
SELECT regexp_extract('1a 2b 14m', '\\d+')
-- !query schema
struct<>
-- !query output
java.lang.IllegalArgumentException
Regex group count is 0, but the specified group index is 1


-- !query
SELECT regexp_extract('1a 2b 14m', '\\d+', 0)
-- !query schema
struct<regexp_extract(1a 2b 14m, \d+, 0):string>
-- !query output
1


-- !query
SELECT regexp_extract('1a 2b 14m', '\\d+', 1)
-- !query schema
struct<>
-- !query output
java.lang.IllegalArgumentException
Regex group count is 0, but the specified group index is 1


-- !query
SELECT regexp_extract('1a 2b 14m', '\\d+', 2)
-- !query schema
struct<>
-- !query output
java.lang.IllegalArgumentException
Regex group count is 0, but the specified group index is 2


-- !query
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)')
-- !query schema
struct<regexp_extract(1a 2b 14m, (\d+)([a-z]+), 1):string>
-- !query output
1


-- !query
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 0)
-- !query schema
struct<regexp_extract(1a 2b 14m, (\d+)([a-z]+), 0):string>
-- !query output
1a


-- !query
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 1)
-- !query schema
struct<regexp_extract(1a 2b 14m, (\d+)([a-z]+), 1):string>
-- !query output
1


-- !query
SELECT regexp_extract('1a 2b 14m', '(\\d+)([a-z]+)', 2)
-- !query schema
struct<regexp_extract(1a 2b 14m, (\d+)([a-z]+), 2):string>
-- !query output
a
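
The eight expected outputs above follow from two rules: the two-argument form resolves to group index 1 (visible in the result schemas), and an index larger than the pattern's group count is rejected. As a rough cross-check, the cases can be replayed against plain `java.util.regex`. This is an illustrative sketch only; `RegexpExtractCases`, `extract`, and `run` are hypothetical names, not Spark APIs, and passing `null` to `run` stands in for omitting the index argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexpExtractCases {
    // Assumption: omitting the index means group 1, as the result schemas suggest.
    static String extract(String input, String regex) {
        return extract(input, regex, 1);
    }

    static String extract(String input, String regex, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return ""; // assumption: no match yields an empty string
        }
        if (m.groupCount() < groupIndex) {
            throw new IllegalArgumentException(
                "Regex group count is " + m.groupCount()
                    + ", but the specified group index is " + groupIndex);
        }
        String group = m.group(groupIndex);
        return group == null ? "" : group;
    }

    // Renders one test case as either the extracted value or the exception text.
    // A null groupIndex selects the two-argument form.
    static String run(String regex, Integer groupIndex) {
        String input = "1a 2b 14m";
        try {
            return groupIndex == null
                ? extract(input, regex)
                : extract(input, regex, groupIndex);
        } catch (IllegalArgumentException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] regexes = {"\\d+", "(\\d+)([a-z]+)"};
        Integer[] indexes = {null, 0, 1, 2};
        for (String regex : regexes) {
            for (Integer idx : indexes) {
                System.out.println(regex + " / " + idx + " -> " + run(regex, idx));
            }
        }
    }
}
```

Only the `'\\d+'` pattern with index 0 succeeds in the first group of queries, because a pattern without parentheses defines no capture groups beyond the whole match.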