
[SPARK-32459][SQL] Support WrappedArray as customCollectionCls in MapObjects #29261

Closed
wants to merge 6 commits into from

Ngone51 (Member) commented Jul 27, 2020

What changes were proposed in this pull request?

This PR adds support for WrappedArray as the customCollectionCls in MapObjects.

Why are the changes needed?

This fixes a regression caused by SPARK-31826. The following test passes on branch-3.0 but fails on master:

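```scala
import scala.collection.mutable.{ArrayBuffer, WrappedArray}

import org.apache.spark.sql.{Column, Row}
import org.apache.spark.sql.functions.udf

// Inside UDFSuite: checkAnswer comes from QueryTest and toDF from testImplicits.
test("WrappedArray") {
  // A UDF that takes a WrappedArray[Int] and returns a new WrappedArray.
  val myUdf = udf((a: WrappedArray[Int]) =>
    WrappedArray.make[Int](Array(a.head + 99)))
  checkAnswer(
    Seq(Array(1)).toDF("col").select(myUdf(Column("col"))),
    Row(ArrayBuffer(100)))
}
```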

SPARK-31826 switched the Catalyst-to-Scala converter from CatalystTypeConverters to ExpressionEncoder.deserializer. CatalystTypeConverters supports WrappedArray, but ExpressionEncoder.deserializer does not.
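To make the gap concrete, here is a minimal plain-Scala sketch of the conversion that the generated deserializer code reviewed in this PR performs for a WrappedArray target: append each element to the builder from the WrappedArray companion object, copy the result into an Object[], and wrap it with WrappedArray.make. It assumes Scala 2.12 (the version Spark 3.0 builds against by default), and the helper name toWrappedArray is hypothetical, not part of Spark.

```scala
import scala.collection.mutable.WrappedArray

// Hypothetical helper (not Spark API) mirroring the generated code.
// Assumes Scala 2.12, where scala.collection.mutable.WrappedArray is a class.
def toWrappedArray(elements: Seq[AnyRef]): WrappedArray[AnyRef] = {
  // The companion object's builder is typed Builder[A, IndexedSeq[A]].
  val builder = WrappedArray.newBuilder[AnyRef]
  elements.foreach(e => builder += e)
  // Copy into an Object[] and let make() pick the WrappedArray subclass.
  WrappedArray.make[AnyRef](builder.result().toArray)
}

val wrapped = toWrappedArray(Seq[AnyRef](Integer.valueOf(1), Integer.valueOf(2)))
```

Because the elements arrive boxed and are copied into an Object[], make() wraps them in the ofRef subclass.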

Does this PR introduce any user-facing change?

No. SPARK-31826 was merged only into master and branch-3.1, neither of which has been released.

How was this patch tested?

Added a new WrappedArray test in UDFSuite and updated ObjectExpressionsSuite to cover MapObjects.

Ngone51 (Member Author) commented Jul 27, 2020

@cloud-fan @viirya @maropu Please take a look, thanks!

SparkQA commented Jul 27, 2020

Test build #126647 has finished for PR 29261 at commit 29c6483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu (Member) commented Jul 28, 2020

I checked it, and the fix looks okay apart from the existing review comments.

Ngone51 (Member Author) commented Jul 28, 2020

Thanks all! I've addressed your comments. Please take another look!

Review comments on these lines:

```scala
""",
(genValue: String) => s"$builder.$$plus$$eq($genValue);",
s"(${cls.getName}) ${classOf[WrappedArray[_]].getName}$$." +
s"MODULE$$.make(((${classOf[ArrayBuffer[_]].getName})$builder" +
```
Member:

Is $builder.result() an ArrayBuffer?

Member:

Isn't WrappedArray a HasNewBuilder[T, WrappedArray[T]]? If so, shouldn't result() return a WrappedArray?

Member Author:

> Is $builder.result() an ArrayBuffer?

Yes.

Member Author:

What do you mean by HasNewBuilder? I can't find it.

Member Author:

I see. I believe the difference is between WrappedArray instances and the WrappedArray companion object: the instances inherit from HasNewBuilder[T, WrappedArray[T]], but the companion object does not.

Member:

Oh, I see. Object WrappedArray's newBuilder is def newBuilder[A]: Builder[A, IndexedSeq[A]].

Member:

Then wouldn't casting the result to IndexedSeq be safer?

Member Author:

Sounds reasonable.

Member Author:

updated, thanks!
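The thread above turns on the difference between the declared and runtime types of the companion object's builder. A small sketch, assuming Scala 2.12: newBuilder only promises a Builder[A, IndexedSeq[A]], so casting result() to IndexedSeq relies on the signature, whereas casting it to ArrayBuffer relies on an implementation detail.

```scala
import scala.collection.mutable.{ArrayBuffer, Builder, WrappedArray}

// Declared type: the companion object's newBuilder only promises an IndexedSeq.
val builder: Builder[Int, scala.collection.IndexedSeq[Int]] =
  WrappedArray.newBuilder[Int]
builder += 1
builder += 2
val result: scala.collection.IndexedSeq[Int] = builder.result()

// Runtime type: on Scala 2.12 this is an ArrayBuffer (as confirmed in the
// thread), but that is not part of the declared signature.
val runtimeIsArrayBuffer: Boolean = result.isInstanceOf[ArrayBuffer[_]]
```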

viirya (Member) left a comment

It looks good to me. Just one suggestion about the builder's result type.

SparkQA commented Jul 28, 2020

Test build #126692 has finished for PR 29261 at commit 528745c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review comments on these lines:

```scala
(genValue: String) => s"$builder.$$plus$$eq($genValue);",
s"(${cls.getName}) ${classOf[WrappedArray[_]].getName}$$." +
s"MODULE$$.make(((${classOf[ArrayBuffer[_]].getName})$builder" +
s".result()).toArray(scala.reflect.ClassTag$$.MODULE$$.Object()));"
```
Member:

nit: remove the s prefix at the start of this string.

Member Author:

It's required because the string uses $$ to escape the $ character.

Member:

Ah, I see.
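The exchange above is about Scala string interpolation: inside an s"..." literal, $$ is an escape for a single literal $, which the generated Java needs for names such as ClassTag$.MODULE$. A minimal sketch of the difference (the val names are illustrative only):

```scala
// With the s interpolator, "$$" is an escape for a single literal '$'.
val withS = s"scala.reflect.ClassTag$$.MODULE$$.Object()"

// Without it, nothing is processed, so both '$' characters remain.
val withoutS = "scala.reflect.ClassTag$$.MODULE$$.Object()"

// Dropping the s prefix would therefore also mean rewriting "$$" as "$".
val plainEquivalent = "scala.reflect.ClassTag$.MODULE$.Object()"
```

So removing the s prefix is only safe if every $$ in that literal is changed to a single $ at the same time.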

@@ -755,6 +755,9 @@ case class MapObjects private(

```scala
}

private lazy val mapElements: Seq[_] => Any = customCollectionCls match {
  case Some(cls) if classOf[WrappedArray[_]].isAssignableFrom(cls) =>
```
Contributor:

Does this really work with any subclass of WrappedArray?

Member Author:

Yes. The code below uses WrappedArray.make() to create the WrappedArray subclass, and make() supports all of the subclasses.
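As a rough check of the reply above, assuming Scala 2.12: WrappedArray.make inspects the runtime type of the array it receives and returns the matching subclass, and the isAssignableFrom guard in mapElements accepts any of them. An Object[] input, such as the one the generated code produces with toArray(ClassTag.Object), yields ofRef. The helper isWrappedArrayClass is hypothetical and simply restates that guard.

```scala
import scala.collection.mutable.WrappedArray

// make() dispatches on the runtime array type.
val ints = WrappedArray.make[Int](Array(1, 2, 3))            // backed by int[]
val refs = WrappedArray.make[AnyRef](Array[AnyRef]("a", "b")) // backed by Object[]

// Hypothetical helper restating the guard used in MapObjects.mapElements.
def isWrappedArrayClass(cls: Class[_]): Boolean =
  classOf[WrappedArray[_]].isAssignableFrom(cls)
```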

SparkQA commented Jul 28, 2020

Test build #126704 has finished for PR 29261 at commit a6b5a20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:

thanks, merging to master!

cloud-fan closed this in ca1ecf7 on Jul 28, 2020
Ngone51 (Member Author) commented Jul 29, 2020

thanks all!

5 participants