Test against Spark 1.5.0-rc2 in Travis #76

JoshRosen · 2015-08-24T18:15:52Z

This PR adds Spark 1.5.0-rc2 to our Travis build matrix. I removed the use of TestSQLContext since that class has been removed in Spark 1.5.

JoshRosen · 2015-08-24T18:45:21Z

It looks like this legitimately fails tests with 1.5.0-rc1: https://travis-ci.org/databricks/spark-avro/jobs/77030705

18:42:40.130 ERROR org.apache.spark.sql.execution.datasources.DefaultWriterContainer: Aborting task.
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toScala(CatalystTypeConverters.scala:332)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toScala(CatalystTypeConverters.scala:318)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter$$anonfun$toScala$1.apply(CatalystTypeConverters.scala:178)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter$$anonfun$toScala$1.apply(CatalystTypeConverters.scala:177)
    at org.apache.spark.sql.types.ArrayData.foreach(ArrayData.scala:127)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toScala(CatalystTypeConverters.scala:177)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toScalaImpl(CatalystTypeConverters.scala:185)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toScalaImpl(CatalystTypeConverters.scala:148)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toScala(CatalystTypeConverters.scala:110)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:278)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:245)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$2.apply(CatalystTypeConverters.scala:406)
    at org.apache.spark.sql.sources.OutputWriter.writeInternal(interfaces.scala:380)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:240)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Looks like there's a problem in the decimal converter.

JoshRosen · 2015-08-24T18:46:14Z

There's also a match error in CatalystTypeConverters:

[info]   Cause: scala.MatchError: 3.14 (of class java.lang.Double)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:321)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:318)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:245)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
[info]   at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:393)
[info]   at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439)
[info]   at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:439)
[info]   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
[info]   ...

JoshRosen · 2015-08-24T20:30:23Z

The first issue is a legitimate Spark bug: https://issues.apache.org/jira/browse/SPARK-10190

JoshRosen · 2015-08-24T22:21:25Z

NPE will be fixed by apache/spark#8401

JoshRosen · 2015-08-25T01:29:19Z

src/main/scala/com/databricks/spark/avro/AvroRelation.scala

+  private lazy val avroSchema = if (paths.isEmpty) {
+    throw NoFilesException
+  } else {
+    // As of Spark 1.5.0, it's possible to receive an array which contains a single non-existent


@liancheng, could you take a look at this change? It looks like the globPathIfNecessary change in Spark 1.5.0 means that non-existent paths may get passed down to data sources if those paths didn't contain any glob characters. In the cases where this happens, though, I think that we will only receive an array with a single path, so the third-party data source code should only need to check path existence when paths.size == 1. Would be good to confirm this intuition, though.

Yeah, according existing code paths calling globPathIfNecessary in Spark 1.5, this assumption is right. We should probably do the existence check within globPathIfNecessary when the path pattern doesn't contain any glob characters though.

codecov-io · 2015-08-25T01:48:58Z

Current coverage is `93.30%`

Merging #76 into master will increase coverage by +0.37% as of 01e5e1d

@@            master    #76   diff @@
=====================================
  Files            6      6       
  Stmts          269    269       
  Branches        45     45       
  Methods          0      0       
=====================================
+ Hit            250    251     +1
  Partial          0      0       
+ Missed          19     18     -1

Review entire Coverage Diff as of 01e5e1d

Powered by Codecov. Updated on successful CI builds.

JoshRosen · 2015-08-26T18:28:05Z

Going to update this to test with RC2, then will merge if it passes.

JoshRosen · 2015-08-26T18:48:37Z

I just realized that this is sort of testing the wrong thing: we should be compiling with a fixed version of Spark and running tests with different versions in order to better simulate how our released library will actually be used. I'm going to update the Travis build to do this.

JoshRosen · 2015-08-26T22:16:44Z

I have a partial fix for separate compile and test versions of the same dependency but I'm not convinced that it works correctly / isn't brittle, so let's hold off on merging for now.

JoshRosen · 2015-08-26T22:17:57Z

Yep, it didn't work: SBT recompiled the non-test sources with the newer dependency version:

[info] Compiling 6 Scala sources to /home/travis/build/databricks/spark-avro/target/scala-2.10/classes...
[info] [info] Cleaning datadir [/home/travis/build/databricks/spark-avro/target/scala-2.10/scoverage-data]
[info] [info] Beginning coverage instrumentation
[info] [info] Instrumentation completed [505 statements]
[info] [info] Wrote instrumentation file [/home/travis/build/databricks/spark-avro/target/scala-2.10/scoverage-data/scoverage.coverage.xml]
[info] [info] Will write measurement data to [/home/travis/build/databricks/spark-avro/target/scala-2.10/scoverage-data]
[info] Compiling 5 Scala sources to /home/travis/build/databricks/spark-avro/target/scala-2.10/test-classes...

JoshRosen · 2015-08-27T22:57:26Z

@marmbrus, it looks like this library is going to face the same multi-Hadoop-version-compatibility problems that spark-redshift has since it calls the same incompatible methods on TaskAttemptContext, so I think we need to block the release on performing a similar fix here. I'll bring this PR up-to-date now.

JoshRosen · 2015-08-28T00:01:49Z

Will address Hadoop compatibility in a separate PR.

findchris · 2015-10-22T06:27:32Z

@JoshRosen - You wrote "There's also a match error in CatalystTypeConverters" above. Should that have been resolved by your commits to this PR?

I'm seeing something similar:

scala.MatchError: 1.000000000000 (of class java.lang.String)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:326)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$DecimalConverter.toCatalystImpl(CatalystTypeConverters.scala:323)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:260)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
    at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)

JoshRosen · 2015-10-22T18:15:03Z

Hi @findchris,

For legacy reasons, the existing releases of spark-avro store decimals as strings instead of native Avro decimals, since decimals weren't supported by Avro prior to version 1.7.7. In a future release, I'd like to fix this (see #80).

How did you manage to trigger the error above? I could see how that might happen if you saved a decimal column to Avro and then read it back while manually specifying the schema as a decimal. If that wasn't what you were doing, could you share a small reproduction that triggers the issue?

JoshRosen added 3 commits August 24, 2015 11:11

Test against Spark 1.5.0-rc1 in Travis

33b277e

Update to remove dependency on TestSQLContext.

963230c

Cache Ivy directory in Travis.

29f7d76

JoshRosen changed the title ~~[WIP] Test against Spark 1.5.0-rc1 in Travis~~ Test against Spark 1.5.0-rc1 in Travis Aug 24, 2015

JoshRosen added 3 commits August 24, 2015 16:46

Fix decimal issue.

90ac45e

Another decimal precision change.

6e7a1b4

Handle non-existent paths passed by Spark.

2d06837

JoshRosen reviewed Aug 25, 2015
View reviewed changes

Only test 1.5.0 with Java 7.

a342ab6

JoshRosen modified the milestone: 2.0 Aug 25, 2015

JoshRosen changed the title ~~Test against Spark 1.5.0-rc1 in Travis~~ Test against Spark 1.5.0-rc2 in Travis Aug 26, 2015

Bump to RC2

5a67d58

Vary the test Spark version, not the compile version.

f168703

JoshRosen added 3 commits August 27, 2015 16:33

Merge remote-tracking branch 'origin/master' into test-against-1.5.0

2d26533

Update to match spark-redshift's configurations.

7615dde

Update test to work around SPARK-10325

abfca91

JoshRosen added the enhancement label Aug 28, 2015

JoshRosen added 2 commits August 27, 2015 17:09

bump sbt version

fb91362

Test Scala 2.11

1e0df11

JoshRosen closed this in 44a806e Aug 28, 2015

JoshRosen deleted the test-against-1.5.0 branch August 28, 2015 03:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test against Spark 1.5.0-rc2 in Travis #76

Test against Spark 1.5.0-rc2 in Travis #76

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen Aug 25, 2015

liancheng Aug 26, 2015

codecov-io commented Aug 25, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 27, 2015

JoshRosen commented Aug 28, 2015

findchris commented Oct 22, 2015

JoshRosen commented Oct 22, 2015

Test against Spark 1.5.0-rc2 in Travis #76

Test against Spark 1.5.0-rc2 in Travis #76

Conversation

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen commented Aug 24, 2015

JoshRosen Aug 25, 2015

Choose a reason for hiding this comment

liancheng Aug 26, 2015

Choose a reason for hiding this comment

codecov-io commented Aug 25, 2015

Current coverage is 93.30%

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 26, 2015

JoshRosen commented Aug 27, 2015

JoshRosen commented Aug 28, 2015

findchris commented Oct 22, 2015

JoshRosen commented Oct 22, 2015

Current coverage is `93.30%`