Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18891][SQL] Support for specific Java List subtypes #18009

Closed
wants to merge 2 commits into from

Conversation

michalsenkyr
Copy link
Contributor

What changes were proposed in this pull request?

Add support for specific Java List subtypes in deserialization as well as a generic implicit encoder.

All List subtypes are supported by using either the size-specifying constructor (one int parameter) or the default constructor.

Interfaces/abstract classes use the following implementations:

  • java.util.List, java.util.AbstractList or java.util.AbstractSequentialList => java.util.ArrayList

How was this patch tested?

build/mvn -DskipTests clean package && dev/run-tests

Additionally in Spark shell:

scala> val jlist = new java.util.LinkedList[Int]; jlist.add(1)
jlist: java.util.LinkedList[Int] = [1]
res0: Boolean = true

scala> Seq(jlist).toDS().map(_.element()).collect()
res1: Array[Int] = Array(1)

@@ -28,6 +28,8 @@ case class SeqClass(s: Seq[Int])

case class ListClass(l: List[Int])

case class JListClass(l: java.util.List[Int])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not supported intentionally. If we want to support java classes and scala classes mixedly, we should unify the ScalaRelfection and JavaTypeInference, which is a lot of effort.

For now I think it's more important to completely support specific collections type in scala, and then try to unify the ScalaRelfection and JavaTypeInference.

what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively you can improve JavaTypeInference to support specific java list and test it with java bean in JavaDatasetSuite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not notice that there is separate inference code for Java classes. It would certainly be nice to unify the code for Java and Scala classes. Is this already being worked on/planned?

I moved the List support to JavaTypeInference and rewrote tests accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's planned but no one is working on it yet. You can start by creating a JIRA ticket if you are interested :)

@cloud-fan
Copy link
Contributor

ping @michalsenkyr

Remove specific Java List support from ScalaReflection
Remove implicit encoder for Java Lists
Add relevant tests to JavaDatasetSuite
Remove tests from ScalaReflectionSuite and DatasetPrimitiveSuite
@cloud-fan
Copy link
Contributor

ok to test

@cloud-fan
Copy link
Contributor

LGTM, pending test

@SparkQA
Copy link

SparkQA commented Jun 10, 2017

Test build #77879 has finished for PR 18009 at commit 881e636.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in f48273c Jun 12, 2017
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
## What changes were proposed in this pull request?

Add support for specific Java `List` subtypes in deserialization as well as a generic implicit encoder.

All `List` subtypes are supported by using either the size-specifying constructor (one `int` parameter) or the default constructor.

Interfaces/abstract classes use the following implementations:

* `java.util.List`, `java.util.AbstractList` or `java.util.AbstractSequentialList` => `java.util.ArrayList`

## How was this patch tested?

```bash
build/mvn -DskipTests clean package && dev/run-tests
```

Additionally in Spark shell:

```
scala> val jlist = new java.util.LinkedList[Int]; jlist.add(1)
jlist: java.util.LinkedList[Int] = [1]
res0: Boolean = true

scala> Seq(jlist).toDS().map(_.element()).collect()
res1: Array[Int] = Array(1)
```

Author: Michal Senkyr <mike.senkyr@gmail.com>

Closes apache#18009 from michalsenkyr/dataset-java-lists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants