Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23924][SQL] Add element_at function #21053

Closed
wants to merge 12 commits into from
Closed

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Apr 12, 2018

What changes were proposed in this pull request?

The PR adds the SQL function element_at. The behavior of the function is based on Presto's one.

This function returns element of array at given index in value if column is array, or returns value for the given key in value if column is map.

How was this patch tested?

Added UTs

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89266 has finished for PR 21053 at commit d7ec6cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil
  • abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes
  • case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil

@kiszk
Copy link
Member Author

kiszk commented Apr 12, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 12, 2018

Test build #89284 has finished for PR 21053 at commit d7ec6cc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil
  • abstract class GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes
  • case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil

@gatorsmile
Copy link
Member

cc @ueshin

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89313 has finished for PR 21053 at commit bb0ab45.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 13, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89321 has finished for PR 21053 at commit bb0ab45.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression)

override def prettyName: String = "array_contains"
}

/**
* Returns the value of index `right` in Array `left` or key right` in Map `left`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can improve this doc. There is also broken quote.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89336 has finished for PR 21053 at commit 0146945.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2018

Test build #89356 has finished for PR 21053 at commit 35844f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
)
}

test("element at function") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need so many test cases here? this is just to verify the api works end to end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the function is element_at, not "element at" ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you correct the description saying element_at instead of array_position?

* We need to do type checking here as `key` expression maybe unresolved.
*/
case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil
with ExtractValue with NullIntolerant {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe extends ... should be this line.

@@ -1846,6 +1846,27 @@ def array_contains(col, value):
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))


@since(2.4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to annotate as @ignore_unicode_prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks

| ${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)};
| }
|}
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripMargin is missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks

|if ($eval1.isNullAt($index)) {
| ${ev.isNull} = true;
|} else
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripMargin is missing?

)
}

override def nullable: Boolean = true
Copy link
Member

@viirya viirya Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? ```scala override def nullable: Boolean = left.dataType match { case a: ArrayType => a.containsNull case m: MapType => m.valueContainsNull } || left.nullable || right.nullable ```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it's wrong because this returns null when the given index is "out of bounds" (array.numElements() < math.abs(index)) for array type or the given key doesn't exist for map type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see. Invalid right can cause null result too.

Copy link
Member Author

@kiszk kiszk Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, may depend on right value, too.

@ExpressionDescription(
usage = """
_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
from the last to the first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Returns NULL if the index exceeds the length of the array.

} else {
array.numElements() + index
}
if (array.isNullAt(idx)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

if(left.dataType.asInstanceOf[ArrayType].containsNull && array.isNullAt(idx)) {
  ...
}

@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89383 has finished for PR 21053 at commit 013a53e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89403 has finished for PR 21053 at commit 5bdee7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class GetMapValue(child: Expression, key: Expression)

@kiszk
Copy link
Member Author

kiszk commented Apr 18, 2018

@ueshin would it be possible to review this again?

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89476 has finished for PR 21053 at commit 98465b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class GetMapValue(child: Expression, key: Expression)

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some nits.

returns value for the given key in value if col is map.

:param col: name of column containing array or map
:param value: value to check for in array or key to check for in map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note or something to notice the index is 1-based.

@ExpressionDescription(
usage = """
_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements
from the last to the first. Returns NULL if the index exceeds the length of the array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -1846,6 +1846,28 @@ def array_contains(col, value):
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))


@ignore_unicode_prefix
@since(2.4)
def element_at(col, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name value is confusing because this is for an index or a key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, the 2nd argument acts as index for an array or key for a map. This is why I used value.
Can I use index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about extraction which is from Column.apply(extraction)? cc @gatorsmile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

override def dataType: DataType = left.dataType match {
case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType
case _: MapType => left.dataType.asInstanceOf[MapType].valueType
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about:

override def dataType: DataType = left.dataType match {
  case ArrayType(elementType, _) => elementType
  case MapType(_, valueType, _) => valueType
}

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89502 has finished for PR 21053 at commit c734607.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89520 has finished for PR 21053 at commit 68e8907.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 18, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89528 has finished for PR 21053 at commit 68e8907.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89545 has finished for PR 21053 at commit 2fbb1e8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89553 has finished for PR 21053 at commit 96dd82b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89554 has finished for PR 21053 at commit 90e026e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89552 has finished for PR 21053 at commit 06fb27e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Apr 19, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89558 has finished for PR 21053 at commit 90e026e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Apr 19, 2018

Thanks! merging to master.

@asfgit asfgit closed this in 46bb2b5 Apr 19, 2018
@kiszk
Copy link
Member Author

kiszk commented Apr 19, 2018

thank you very much for your comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants