
[SPARK-13898][SQL] Merge DatasetHolder and DataFrameHolder #11737

Closed
wants to merge 8 commits

Conversation

rxin
Contributor

@rxin rxin commented Mar 15, 2016

What changes were proposed in this pull request?

This patch merges DatasetHolder and DataFrameHolder into a single class. This makes sense now that DataFrame and Dataset are one class.

In addition, fixed some minor issues with pull request #11732.
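For illustration, the merged holder boils down to one implicit wrapper exposing both conversions. Below is a hedged, Spark-free sketch (the method names mirror the holder API, but the backing Dataset is faked with a List so the example is self-contained; the column modeling is purely illustrative):

```scala
// Sketch only: one holder class now serves both the old DatasetHolder.toDS
// and DataFrameHolder.toDF conversions, since DataFrame is just Dataset[Row].
// The "Dataset" here is a plain List and columns are (name, value) pairs,
// which are stand-ins, not Spark's actual types.
case class DatasetHolder[T](private val ds: List[T]) {
  // Typed view, as in the old DatasetHolder
  def toDS(): List[T] = ds
  // Untyped view with the default column name, as in the old DataFrameHolder
  def toDF(): List[(String, T)] = ds.map(("value", _))
  // Untyped view with an explicit column name
  def toDF(colName: String): List[(String, T)] = ds.map((colName, _))
}
```

For example, `DatasetHolder(List(1, 2)).toDF("id")` yields `List(("id", 1), ("id", 2))` under this toy model.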

How was this patch tested?

Updated existing unit tests that test these implicits.

@SparkQA

SparkQA commented Mar 15, 2016

Test build #53211 has finished for PR 11737 at commit b7c88cd.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 15, 2016

Test build #53210 has finished for PR 11737 at commit 837a2ba.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 15, 2016

Test build #53215 has finished for PR 11737 at commit 371f4e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 16, 2016

Test build #53272 has finished for PR 11737 at commit e422f52.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class LogEntry(filename: String, message: String)
    • case class LogFile(name: String)

@SparkQA

SparkQA commented Mar 17, 2016

Test build #53394 has finished for PR 11737 at commit 59cae95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Mar 17, 2016

@jodersky there are still two failures here. I probably won't have time to look at them until next week. If you have some time, please go for it! I think some of them are legitimate bugs in the Dataset encoders exposed by this change.

@rxin
Contributor Author

rxin commented Mar 17, 2016

cc @cloud-fan can you take a look at the serialization failure?

The exception is:

[info] - Read/write all types with non-primitive type *** FAILED *** (442 milliseconds)
[info]   org.apache.spark.SparkException: Task not serializable
[info]   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
[info]   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
[info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
[info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:1930)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:364)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:363)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:356)
[info]   at org.apache.spark.rdd.RDD.map(RDD.scala:363)
[info]   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:465)
[info]   at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:133)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:40)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:39)
[info]   at org.apache.spark.sql.test.SQLTestUtils$class.withTempPath(SQLTestUtils.scala:126)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withTempPath(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$class.withOrcFile(OrcTest.scala:39)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withOrcFile(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply$mcV$sp(OrcQuerySuite.scala:92)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:54)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:26)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:26)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]   at java.lang.Thread.run(Thread.java:745)
[info]   Cause: java.io.IOException: unexpected exception type
[info]   at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
[info]   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:994)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
[info]   at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
[info]   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:606)
[info]   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
[info]   at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
[info]   at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
[info]   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
[info]   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
[info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
[info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:1930)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:364)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:363)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:356)
[info]   at org.apache.spark.rdd.RDD.map(RDD.scala:363)
[info]   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:465)
[info]   at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:133)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:40)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:39)
[info]   at org.apache.spark.sql.test.SQLTestUtils$class.withTempPath(SQLTestUtils.scala:126)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withTempPath(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$class.withOrcFile(OrcTest.scala:39)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withOrcFile(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply$mcV$sp(OrcQuerySuite.scala:92)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:54)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:26)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:26)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]   at java.lang.Thread.run(Thread.java:745)
[info]   Cause: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'data
[info]   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:60)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:109)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:109)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:113)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField$$anonfun$toString$1.apply(complexTypeExtractors.scala:113)
[info]   at scala.Option.getOrElse(Option.scala:121)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.toString(complexTypeExtractors.scala:113)
[info]   at java.lang.String.valueOf(String.java:2849)
[info]   at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
[info]   at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:357)
[info]   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
[info]   at scala.collection.AbstractIterator.addString(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
[info]   at scala.collection.AbstractIterator.mkString(Iterator.scala:1194)
[info]   at org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:197)
[info]   at java.lang.String.valueOf(String.java:2849)
[info]   at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
[info]   at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
[info]   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
[info]   at scala.collection.AbstractIterator.addString(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
[info]   at scala.collection.AbstractIterator.mkString(Iterator.scala:1194)
[info]   at org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:197)
[info]   at java.lang.String.valueOf(String.java:2849)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:128)
[info]   at scala.StringContext.standardInterpolator(StringContext.scala:125)
[info]   at scala.StringContext.s(StringContext.scala:95)
[info]   at org.apache.spark.sql.catalyst.expressions.Invoke.toString(objects.scala:177)
[info]   at java.lang.String.valueOf(String.java:2849)
[info]   at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
[info]   at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:362)
[info]   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:355)
[info]   at scala.collection.AbstractIterator.addString(Iterator.scala:1194)
[info]   at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:321)
[info]   at scala.collection.AbstractIterator.mkString(Iterator.scala:1194)
[info]   at org.apache.spark.sql.catalyst.expressions.Expression.toString(Expression.scala:197)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1418)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
[info]   at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
[info]   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:606)
[info]   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
[info]   at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
[info]   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:606)
[info]   at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
[info]   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
[info]   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
[info]   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
[info]   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
[info]   at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
[info]   at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
[info]   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
[info]   at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
[info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
[info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:1930)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:364)
[info]   at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:363)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:356)
[info]   at org.apache.spark.rdd.RDD.map(RDD.scala:363)
[info]   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:465)
[info]   at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:133)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:40)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:39)
[info]   at org.apache.spark.sql.test.SQLTestUtils$class.withTempPath(SQLTestUtils.scala:126)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withTempPath(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$class.withOrcFile(OrcTest.scala:39)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withOrcFile(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply$mcV$sp(OrcQuerySuite.scala:92)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:54)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:26)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:26)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]   at java.lang.Thread.run(Thread.java:745)

After changing the toString of GetStructField to the following:

  override def toString: String = {
    if (resolved) {
      // Safe: the child is resolved, so we can look up the field name in its schema.
      s"$child.${name.getOrElse(childSchema(ordinal).name)}"
    } else {
      // Calling childSchema on an unresolved child would throw UnresolvedException,
      // so fall back to a placeholder name.
      s"$child.unknownName"
    }
  }

A different exception appears:

[info] - Read/write all types with non-primitive type *** FAILED *** (412 milliseconds)
[info]   org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'data
[info]   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:60)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:111)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:111)
[info]   at org.apache.spark.sql.catalyst.expressions.GetStructField.dataType(complexTypeExtractors.scala:113)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$validate$3.apply(ExpressionEncoder.scala:301)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$validate$3.apply(ExpressionEncoder.scala:299)
[info]   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
[info]   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
[info]   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
[info]   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
[info]   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.validate(ExpressionEncoder.scala:299)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:163)
[info]   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:134)
[info]   at org.apache.spark.sql.Dataset$.apply(Dataset.scala:53)
[info]   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:468)
[info]   at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:133)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:40)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$$anonfun$withOrcFile$1.apply(OrcTest.scala:39)
[info]   at org.apache.spark.sql.test.SQLTestUtils$class.withTempPath(SQLTestUtils.scala:126)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withTempPath(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcTest$class.withOrcFile(OrcTest.scala:39)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite.withOrcFile(OrcQuerySuite.scala:54)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply$mcV$sp(OrcQuerySuite.scala:92)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.apache.spark.sql.hive.orc.OrcQuerySuite$$anonfun$3.apply(OrcQuerySuite.scala:81)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:54)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:26)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:26)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)

@@ -729,7 +729,7 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}

test("SPARK-5203 union with different decimal precision") {
-    Seq.empty[(Decimal, Decimal)]
+    Seq.empty[(java.math.BigDecimal, java.math.BigDecimal)]
A reviewer asked:
Are all these decimal-related changes because merging DataFrameHolder and DatasetHolder breaks compilation, or just because Decimal is an internal type and shouldn't be exposed?

rxin (Author) replied:

Decimal is an internal type (although I think we should expose it in the future, we haven't done that yet).
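For context, a toy illustration of why the test code had to switch types (nothing below is Spark API; `Decimal` here is a local stand-in for the internal `org.apache.spark.sql.types.Decimal`): encoder derivation only knows how to map *external* types such as `java.math.BigDecimal` to a Catalyst schema, so a `Seq.empty[(Decimal, Decimal)]` has no valid encoder and fails in `ExpressionEncoder.validate`, as in the stack trace above.

```scala
object EncoderSketch {
  // Local stand-in for Spark's internal Decimal type (not the real class).
  final class Decimal

  // Toy version of the schema lookup that encoder derivation performs:
  // only recognised "external" types get a Catalyst schema; anything
  // else (e.g. the internal Decimal) has no mapping and fails validation.
  def schemaFor(c: Class[_]): Option[String] =
    if (c == classOf[java.math.BigDecimal]) Some("decimal(38,18)")
    else if (c == classOf[String]) Some("string")
    else if (c == classOf[java.lang.Integer]) Some("int")
    else None // e.g. the internal Decimal: no external mapping
}
```

The hypothetical `schemaFor` compresses what Spark's `ScalaReflection` does; the real logic walks the full type recursively, but the external-types-only restriction is the same.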

@liancheng (Contributor):

PR #11816, which fixes this exception you hit, has been merged.

SparkQA commented Mar 21, 2016

Test build #53687 has finished for PR 11737 at commit 776e1a1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 21, 2016

Test build #53705 has finished for PR 11737 at commit 501c9b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • `// INSTANCE() method to get the single instance of class $read. Then call $iw()`

@cloud-fan (Contributor):

LGTM


rxin commented Mar 22, 2016

Merging in master.

@asfgit closed this in b3e5af6 on Mar 22, 2016
SparkQA commented Mar 22, 2016

Test build #2655 has finished for PR 11737 at commit 501c9b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • `// INSTANCE() method to get the single instance of class $read. Then call $iw()`

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?
This patch merges DatasetHolder and DataFrameHolder. This makes more sense because DataFrame/Dataset are now one class.

In addition, fixed some minor issues with pull request apache#11732.

## How was this patch tested?
Updated existing unit tests that test these implicits.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#11737 from rxin/SPARK-13898.
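The shape of the merge can be sketched without Spark on the classpath. Everything below (the `Dataset` wrapper, `Row`, the holder, and the implicit) is a simplified stand-in for the real classes in `Dataset.scala` and `SQLImplicits.scala`, not their actual code; the point it illustrates is that once `DataFrame` is just an alias for `Dataset[Row]`, a single holder class can expose both `toDS()` and `toDF()`:

```scala
import scala.language.implicitConversions

object HolderSketch {
  // Stand-ins for the Spark types (Spark itself is not assumed here).
  final case class Dataset[T](data: Seq[T])
  type Row = Seq[Any]
  type DataFrame = Dataset[Row]

  // The merged holder: one class now serves both conversions, where the
  // old DataFrameHolder/DatasetHolder pair each served only one.
  final case class DatasetHolder[T <: Product](private val ds: Dataset[T]) {
    def toDS(): Dataset[T] = ds
    def toDF(): DataFrame = Dataset(ds.data.map(_.productIterator.toSeq))
  }

  // Rough analogue of the implicit in SQLImplicits that makes
  // Seq((1, "a")).toDS() / .toDF() resolve.
  implicit def localSeqToDatasetHolder[T <: Product](s: Seq[T]): DatasetHolder[T] =
    DatasetHolder(Dataset(s))
}
```

With the implicit in scope, `Seq((1, "a"), (2, "b")).toDS()` yields the typed wrapper and `.toDF()` the untyped one, which mirrors how the merged holder keeps both entry points on a single class.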