Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23940][SQL] Add transform_values SQL function #22045

Closed
wants to merge 9 commits into from

Conversation

codeatri
Copy link
Contributor

@codeatri codeatri commented Aug 8, 2018

What changes were proposed in this pull request?

This pr adds transform_values function which applies the function to each entry of the map and transforms the values.

> SELECT transform_values(map(array(1, 2, 3), array(1, 2, 3)), (k,v) -> v + 1);
       map(1->2, 2->3, 3->4)

> SELECT transform_values(map(array(1, 2, 3), array(1, 2, 3)), (k,v) -> k + v);
       map(1->2, 2->4, 3->6)

How was this patch tested?

New Tests added to
DataFrameFunctionsSuite
HigherOrderFunctionsSuite
SQLQueryTestSuite

@codeatri codeatri changed the title [SPARK-23939][SQL] Add transform_values SQL function [SPARK-23940][SQL] Add transform_values SQL function Aug 8, 2018

/**
* Transform Values for every entry of the map by applying transform_values function.
* Returns map wth transformed values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos: Transforms values; with
Maybe can you think of a better comment?

usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
examples = """
Examples:
> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:(k, v) and maybe I would use v + 1 instead of k + 1.


override def dataType: DataType = {
val map = input.dataType.asInstanceOf[MapType]
MapType(map.keyType, function.dataType, map.valueContainsNull)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map.valueContainsNull -> function.nullable?

MapType(map.keyType, function.dataType, map.valueContainsNull)
}

override def inputTypes: Seq[AbstractDataType] = Seq(MapType, expectingFunctionType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already specified by MapBasedSimpleHigherOrderFunction.

@transient val (keyType, valueType, valueContainsNull) =
HigherOrderFunction.mapKeyValueArgumentType(input.dataType)

override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting

@ueshin
Copy link
Member

ueshin commented Aug 15, 2018

ok to test.

@SparkQA
Copy link

SparkQA commented Aug 15, 2018

Test build #94783 has finished for PR 22045 at commit b73106d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on both transform_keys and transform_values!
The comments here are almost the same as I commented for transform_keys.
Maybe we can follow the changes there after it is merged.

usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
examples = """
Examples:
> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we need one more right parenthesis after the second array(1, 2, 3)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ueshin Thanks for the review! and yes I agree, I made the same mistakes in both the PR's. I was waiting for the transform_key to converge so that I can make the same changes here.

MapType(map.keyType, function.dataType, function.nullable)
}

@transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lazy val?
Could you add a test when argument is not a map in invalid cases of DataFrameFunctionsSuite?


override def dataType: DataType = {
val map = argument.dataType.asInstanceOf[MapType]
MapType(map.keyType, function.dataType, function.nullable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use keyType from the following val?

val LambdaFunction(
_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
(keyVar, valueVar)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about:

@transient lazy val LambdaFunction(_,
  (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function

def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = {
val valueType = expr.dataType.asInstanceOf[MapType].valueType
val keyType = expr.dataType.asInstanceOf[MapType].keyType
TransformValues(expr, createLambda(keyType, false, valueType, true, f))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use valueContainsNull instead of true?

test("TransformValues") {
val ai0 = Literal.create(
Map(1 -> 1, 2 -> 2, 3 -> 3),
MapType(IntegerType, IntegerType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add valueContainsNull explicitly?

MapType(IntegerType, IntegerType))
val ain = Literal.create(
Map.empty[Int, Int],
MapType(IntegerType, IntegerType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for Literal.create(null, MapType(IntegerType, IntegerType))?

@@ -2302,6 +2302,210 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
}

test("transform values function - test various primitive data types combinations") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need so many cases here. We only need to verify the api works end to end.
Evaluation checks of the function should be in HigherOrderFunctionsSuite.


create or replace temporary view nested as values
(1, map(1,1,2,2,3,3)),
(2, map(4,4,5,5,6,6))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  (1, map(1, 1, 2, 2, 3, 3)),
  (2, map(4, 4, 5, 5, 6, 6))

@SparkQA
Copy link

SparkQA commented Aug 15, 2018

Test build #94827 has finished for PR 22045 at commit daf7935.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some nits.

* Returns a map that applies the function to each value of the map.
*/
@ExpressionDescription(
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

*/
@ExpressionDescription(
usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
examples = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

> SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
map(array(1, 2, 3), array(2, 4, 6))
""",
since = "2.4.0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

}

@transient lazy val LambdaFunction(
_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

def testInvalidLambdaFunctions(): Unit = {

val ex1 = intercept[AnalysisException] {
dfExample1.selectExpr("transform_values(i, k -> k )")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove an extra space after k -> k.

@ueshin
Copy link
Member

ueshin commented Aug 16, 2018

@codeatri Could you fix the conflicts please? Thanks!

@SparkQA
Copy link

SparkQA commented Aug 16, 2018

Test build #94843 has finished for PR 22045 at commit 56d08ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType

override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the dataType be defined as MapType(keyType, function.dataType, function.nullable)?

@SparkQA
Copy link

SparkQA commented Aug 17, 2018

Test build #94864 has finished for PR 22045 at commit 3382e1a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Aug 17, 2018

Thanks! merging to master.

@asfgit asfgit closed this in f161409 Aug 17, 2018
asfgit pushed a commit that referenced this pull request Oct 25, 2018
## What changes were proposed in this pull request?

- Revert [SPARK-23935][SQL] Adding map_entries function: #21236
- Revert [SPARK-23937][SQL] Add map_filter SQL function: #21986
- Revert [SPARK-23940][SQL] Add transform_values SQL function: #22045
- Revert [SPARK-23939][SQL] Add transform_keys function: #22013
- Revert [SPARK-23938][SQL] Add map_zip_with function: #22017
- Revert the changes of map_entries in [SPARK-24331][SPARKR][SQL] Adding arrays_overlap, array_repeat, map_entries to SparkR: #21434

## How was this patch tested?
The existing tests.

Closes #22827 from gatorsmile/revertMap2.4.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants