Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13390][SQL]Fix the issue due to Iterator.map().toSeq is not Serializable #11313

Closed
wants to merge 1 commit into from
Closed

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Feb 22, 2016

What changes were proposed in this pull request?

scala.collection.Iterator's methods (e.g., map, filter) will return an AbstractIterator which is not Serializable. E.g.,

scala> val iter = Array(1, 2, 3).iterator.map(_ + 1)
iter: Iterator[Int] = non-empty iterator

scala> println(iter.isInstanceOf[Serializable])
false

If we call something like Iterator.map(...).toSeq, it will create a Stream that contains a non-serializable AbstractIterator field and make the Stream be non-serializable.

This PR uses toArray instead of toSeq to fix such issue in def createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame.

How was the this patch tested?

Jenkins tests.

@zsxwing zsxwing changed the title Fix the issue that Iterator.map().toSeq is not Serializable [SPARK-13390][SQL]Fix the issue that Iterator.map().toSeq is not Serializable Feb 22, 2016
@zsxwing zsxwing changed the title [SPARK-13390][SQL]Fix the issue that Iterator.map().toSeq is not Serializable [SPARK-13390][SQL]Fix the issue due to Iterator.map().toSeq is not Serializable Feb 22, 2016
@SparkQA
Copy link

SparkQA commented Feb 23, 2016

Test build #51694 has finished for PR 11313 at commit 07a88b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Feb 23, 2016

cc @holdenk @srowen

@holdenk
Copy link
Contributor

holdenk commented Feb 23, 2016

Maybe good to add a test which would have failed before this change?

@@ -579,7 +579,7 @@ class SQLContext private[sql](
val className = beanClass.getName
val beanInfo = Introspector.getBeanInfo(beanClass)
val rows = SQLContext.beansToRows(data.asScala.iterator, beanInfo, attrSeq)
DataFrame(self, LocalRelation(attrSeq, rows.toSeq))
DataFrame(self, LocalRelation(attrSeq, rows.toArray))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toArray could be kind of expensive for large local iterators - maybe we should only do this if necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to materialize it anyways in order to support multiple scans.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah I know we need to materialize anyways (toSeq forces that too on Iterators) - but I meant more the cost of copying all the objects to an array (see https://github.com/scala/scala/blob/v2.11.1/src/library/scala/collection/TraversableOnce.scala#L269 ) so maybe only calling toArray if necessary would be worth while (but perhaps not).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was the same before -- it was materialized into a Seq. I think Josh is suggesting this is inherently necessary before making a LocalRelation so there's no meaningful opportunity for lazy eval (?) If so then this seems fine.

@zsxwing
Copy link
Member Author

zsxwing commented Feb 23, 2016

Just found this happened to be fixed in #10511. rows will be materialized in LocalTableScan.

I'm going to resubmit this patch against branch-1.6 since this is not necessary for master.

@zsxwing zsxwing closed this Feb 23, 2016
@zsxwing
Copy link
Member Author

zsxwing commented Feb 23, 2016

See #11334 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants