-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23940][SQL] Add transform_values SQL function #22045
Conversation
|
||
/** | ||
* Transform Values for every entry of the map by applying transform_values function. | ||
* Returns map wth transformed values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos: Transforms values; with
Maybe can you think of a better comment?
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", | ||
examples = """ | ||
Examples: | ||
> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:(k, v)
and maybe I would use v + 1
instead of k + 1
.
|
||
override def dataType: DataType = { | ||
val map = input.dataType.asInstanceOf[MapType] | ||
MapType(map.keyType, function.dataType, map.valueContainsNull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
map.valueContainsNull
-> function.nullable
?
MapType(map.keyType, function.dataType, map.valueContainsNull) | ||
} | ||
|
||
override def inputTypes: Seq[AbstractDataType] = Seq(MapType, expectingFunctionType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already specified by MapBasedSimpleHigherOrderFunction
.
@transient val (keyType, valueType, valueContainsNull) = | ||
HigherOrderFunction.mapKeyValueArgumentType(input.dataType) | ||
|
||
override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: formatting
ok to test. |
Test build #94783 has finished for PR 22045 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on both transform_keys
and transform_values
!
The comments here are almost the same as I commented for transform_keys
.
Maybe we can follow the changes there after it is merged.
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", | ||
examples = """ | ||
Examples: | ||
> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we need one more right parenthesis after the second array(1, 2, 3)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Thanks for the review! and yes I agree, I made the same mistakes in both the PR's. I was waiting for the transform_key to converge so that I can make the same changes here.
MapType(map.keyType, function.dataType, function.nullable) | ||
} | ||
|
||
@transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lazy val
?
Could you add a test when argument is not a map in invalid cases of DataFrameFunctionsSuite
?
|
||
override def dataType: DataType = { | ||
val map = argument.dataType.asInstanceOf[MapType] | ||
MapType(map.keyType, function.dataType, function.nullable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use keyType
from the following val?
val LambdaFunction( | ||
_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function | ||
(keyVar, valueVar) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about:
@transient lazy val LambdaFunction(_,
(keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = { | ||
val valueType = expr.dataType.asInstanceOf[MapType].valueType | ||
val keyType = expr.dataType.asInstanceOf[MapType].keyType | ||
TransformValues(expr, createLambda(keyType, false, valueType, true, f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use valueContainsNull
instead of true
?
test("TransformValues") { | ||
val ai0 = Literal.create( | ||
Map(1 -> 1, 2 -> 2, 3 -> 3), | ||
MapType(IntegerType, IntegerType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add valueContainsNull
explicitly?
MapType(IntegerType, IntegerType)) | ||
val ain = Literal.create( | ||
Map.empty[Int, Int], | ||
MapType(IntegerType, IntegerType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for Literal.create(null, MapType(IntegerType, IntegerType))
?
@@ -2302,6 +2302,210 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { | |||
assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map")) | |||
} | |||
|
|||
test("transform values function - test various primitive data types combinations") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need so many cases here. We only need to verify the api works end to end.
Evaluation checks of the function should be in HigherOrderFunctionsSuite
.
|
||
create or replace temporary view nested as values | ||
(1, map(1,1,2,2,3,3)), | ||
(2, map(4,4,5,5,6,6)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
(1, map(1, 1, 2, 2, 3, 3)),
(2, map(4, 4, 5, 5, 6, 6))
Test build #94827 has finished for PR 22045 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for some nits.
* Returns a map that applies the function to each value of the map. | ||
*/ | ||
@ExpressionDescription( | ||
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
*/ | ||
@ExpressionDescription( | ||
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", | ||
examples = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); | ||
map(array(1, 2, 3), array(2, 4, 6)) | ||
""", | ||
since = "2.4.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
} | ||
|
||
@transient lazy val LambdaFunction( | ||
_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
def testInvalidLambdaFunctions(): Unit = { | ||
|
||
val ex1 = intercept[AnalysisException] { | ||
dfExample1.selectExpr("transform_values(i, k -> k )") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove an extra space after k -> k
.
@codeatri Could you fix the conflicts please? Thanks! |
Test build #94843 has finished for PR 22045 at commit
|
|
||
@transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType | ||
|
||
override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the dataType
be defined as MapType(keyType, function.dataType, function.nullable)
?
Test build #94864 has finished for PR 22045 at commit
|
Thanks! merging to master. |
## What changes were proposed in this pull request? - Revert [SPARK-23935][SQL] Adding map_entries function: #21236 - Revert [SPARK-23937][SQL] Add map_filter SQL function: #21986 - Revert [SPARK-23940][SQL] Add transform_values SQL function: #22045 - Revert [SPARK-23939][SQL] Add transform_keys function: #22013 - Revert [SPARK-23938][SQL] Add map_zip_with function: #22017 - Revert the changes of map_entries in [SPARK-24331][SPARKR][SQL] Adding arrays_overlap, array_repeat, map_entries to SparkR: #21434 ## How was this patch tested? The existing tests. Closes #22827 from gatorsmile/revertMap2.4. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This pr adds
transform_values
function which applies the function to each entry of the map and transforms the values.How was this patch tested?
New Tests added to
DataFrameFunctionsSuite
HigherOrderFunctionsSuite
SQLQueryTestSuite