-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41233][FOLLOWUP] Refactor array_prepend
with RuntimeReplaceable
#40563
[SPARK-41233][FOLLOWUP] Refactor array_prepend
with RuntimeReplaceable
#40563
Conversation
RuntimeReplaceable
array_append
and array_prepend
with RuntimeReplaceable
with ImplicitCastInputTypes | ||
with ComplexTypeMergingExpression | ||
with QueryErrorsBase { | ||
trait ArrayInsertEnd extends RuntimeReplaceable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we rename this trait? ArrayInsertEnd
cannot describe ArrayPrepend
well
nullSafeEval(value1, value2) | ||
} | ||
override protected def withNewChildrenInternal( | ||
newLeft: Expression, newRight: Expression): Expression = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type change to ArrayAppend
?
|
||
override protected def withNewChildrenInternal( | ||
newLeft: Expression, newRight: Expression): Expression = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type change to ArrayPrepend
?
@transient protected lazy val elementType: DataType = | ||
inputTypes.head.asInstanceOf[ArrayType].elementType | ||
override lazy val replacement: Expression = | ||
ArrayInsert(arr, Add(ArraySize(left), Literal(1)), ele) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use both arr
and left
?
@@ -1855,50 +1855,6 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper | |||
checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null) | |||
} | |||
|
|||
test("SPARK-41233: ArrayPrepend") { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... please ignore the comments above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to delete the test cases? Should be still relevant right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to delete the test cases? Should be still relevant right?
Runtimereplace
do not support execute and it will be replaced with other expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we have reviewed that we are having all types of test cases deleted here in DataFrameFunctionsSuite. If we are missing, please help add those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and added test cases in DataFrameFunctionsSuite
@@ -1855,50 +1855,6 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper | |||
checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null) | |||
} | |||
|
|||
test("SPARK-41233: ArrayPrepend") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to delete the test cases? Should be still relevant right?
with QueryErrorsBase { | ||
|
||
override def nullable: Boolean = left.nullable | ||
trait InsertArrayOneSide extends RuntimeReplaceable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should we just call this RuntimeReplaceableBinary
or something? Just a thought, since there doesn't seem to be anything specific about arrays in this trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InsertArrayOneSide
should be better name. Thank you.
@@ -1,2 +1,2 @@ | |||
Project [array_append(e#0, 1) AS array_append(e, 1)#0] | |||
Project [array_insert(e#0, (size(e#0, false) + 1), 1) AS array_append(e, 1)#0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can fix array_insert
to have getTagValue(FunctionRegistry.FUNC_ALIAS).getOrElse(name)
, and set the alias in FunctionRegistry
by setting setAlias
to true
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon I tested and found this can't given help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot golden files are the same as the case.
Refer: https://github.com/apache/spark/blob/f6d6ec138fe104bc7317993fe90bfdc672c11938/connector/connect/common/src/test/resources/query-tests/explain-results/function_map_contains_key.explain
|
||
override def nullable: Boolean = left.nullable | ||
trait InsertArrayOneSide extends RuntimeReplaceable | ||
with ImplicitCastInputTypes with BinaryLike[Expression] with QueryErrorsBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any diff between BinaryLike[Expression] and BinaryExpression? Also, any specific reason for removing ComplexTypeMergingExpression, I think this can help in assigning values of containsNull and nullable fields ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BinaryExpression
could be evaluated. In fact, the implementation of RuntimeReplaceable
could not be evaluated.
ArrayInsert
already implemented the ComplexTypeMergingExpression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense ! Thanks so much!
@transient protected lazy val elementType: DataType = | ||
inputTypes.head.asInstanceOf[ArrayType].elementType | ||
override lazy val replacement: Expression = | ||
ArrayInsert(left, Add(ArraySize(left), Literal(1)), right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inspite of ArraySize(left), I think ArrayInsert supports negative values, how about using Literal(-1), to select the position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal(-1)
let insert element before the last one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ! Thanks for confirming!
@@ -1855,50 +1855,6 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper | |||
checkEvaluation(ArrayRepeat(Literal("hi"), Literal(null, IntegerType)), null) | |||
} | |||
|
|||
test("SPARK-41233: ArrayPrepend") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we have reviewed that we are having all types of test cases deleted here in DataFrameFunctionsSuite. If we are missing, please help add those.
ping @cloud-fan |
finalData.update(arrayData.numElements(), elementData) | ||
new GenericArrayData(finalData) | ||
} | ||
override lazy val replacement: Expression = ArrayInsert(left, Literal(0), right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but isn't the implementation of array_insert
wrong? The doc says: Places val into index pos of array x. Array indices start at 1, or start from the end if index is negative.
According to the doc, array_insert(arr, 1, val)
should put val
as the first element in this array. array_insert(arr, 0, val)
should fail as 0 is not a valid index. However, the current implementation of array_insert
uses 0-based index. cc @zhengruifeng @HyukjinKwon @dtenedor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan it is still 1-based, but also treat 0 as the first item. see the discussion #38867 (comment)
+----------------------------------+----------------------------------+----------------------------------+
|array_insert(array(1, 2, 3), 1, 4)|array_insert(array(1, 2, 3), 0, 4)|array_insert(array(1, 2, 3), 2, 4)|
+----------------------------------+----------------------------------+----------------------------------+
| [4, 1, 2, 3]| [4, 1, 2, 3]| [1, 4, 2, 3]|
+----------------------------------+----------------------------------+----------------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for missing that discussion. I think failing is more consistent with other functions that use 1-based index.
And here we should use 1
as the index to be more future-proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, let me send a fix for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do want to reserve index 0
for some special behavior in array_insert
, I think appending the value to the end of the array seems more convenient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to remove support for index 0
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't have to reserve index 0
, make it fails seems good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, let's fix it soon before the 3.4 release. also cc @xinrong-meng
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
append value to the end of array for index 0 seems hard to understand.
@transient protected lazy val elementType: DataType = | ||
inputTypes.head.asInstanceOf[ArrayType].elementType | ||
override lazy val replacement: Expression = | ||
ArrayInsert(left, Add(ArraySize(left), Literal(1)), right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A risk is left
can be evaluated twice if common subexpression elimination is disabled, or the underlying execution engine (native engine) does not support it.
I'd prefer to keep ArrayAppend
as it is, or update ArrayInsert
to provide an easy way to do append.
### What changes were proposed in this pull request? Make `array_insert` fail when input index `pos` is zero. ### Why are the changes needed? see #40563 (comment) ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? updated UT Closes #40641 from zhengruifeng/sql_array_insert_fails_zero. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request? Make `array_insert` fail when input index `pos` is zero. ### Why are the changes needed? see #40563 (comment) ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? updated UT Closes #40641 from zhengruifeng/sql_array_insert_fails_zero. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 3e9574c) Signed-off-by: Max Gekk <max.gekk@gmail.com>
cc @MaxGekk |
@@ -1,2 +1,2 @@ | |||
Project [array_prepend(e#0, 1) AS array_prepend(e, 1)#0] | |||
Project [array_insert(e#0, 0, 1) AS array_prepend(e, 1)#0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix this?
|
||
override def nullable: Boolean = left.nullable | ||
override lazy val replacement: Expression = ArrayInsert(left, Literal(0), right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail now.
2196c2e
to
b04a175
Compare
array_append
and array_prepend
with RuntimeReplaceable
array_prepend
with RuntimeReplaceable
Can we update the PR description? |
After a second thought, this makes the performance worse. We should improve |
Updated. |
You means create another PR to simplify the code of |
…prepend with RuntimeReplaceable
b04a175
to
c939d6e
Compare
thanks, merging to master! |
@cloud-fan Thanks ! |
### What changes were proposed in this pull request? Make `array_insert` fail when input index `pos` is zero. ### Why are the changes needed? see apache#40563 (comment) ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? updated UT Closes apache#40641 from zhengruifeng/sql_array_insert_fails_zero. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 3e9574c) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Recently, Spark SQL supported
array_insert
andarray_prepend
. All implementations are individual.In fact,
array_prepend
is special case ofarray_insert
and we can reuse thearray_insert
by extendsRuntimeReplaceable
.Why are the changes needed?
Simplify the implementation of
array_prepend
.Does this PR introduce any user-facing change?
'No'.
Just update the inner implementation.
How was this patch tested?
Exists test case.