# [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3

## What changes were proposed in this pull request?

This PR updates the Apache ORC dependencies to 1.4.3, released on February 9th. The Apache ORC 1.4.2 release removes unnecessary dependencies, and 1.4.3 adds 5 more patches on top of that (https://s.apache.org/Fll8).

In particular, the following ORC-285 issue is fixed in 1.4.3.

```scala
scala> val df = Seq(Array.empty[Float]).toDF()

scala> df.write.format("orc").save("/tmp/floatarray")

scala> spark.read.orc("/tmp/floatarray")
res1: org.apache.spark.sql.DataFrame = [value: array<float>]

scala> spark.read.orc("/tmp/floatarray").show()
18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.io.IOException: Error reading file: file:/tmp/floatarray/part-00000-9c0b461b-4df1-4c23-aac1-3e4f349ac7d6-c000.snappy.orc
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78)
...
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
```
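
With ORC 1.4.3 the same read no longer hits the EOFException and returns the empty array row; a rough sketch of the expected REPL session after the upgrade (the exact `show()` formatting is approximate):

```scala
scala> spark.read.orc("/tmp/floatarray").show()
+-----+
|value|
+-----+
|   []|
+-----+
```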

## How was this patch tested?

Pass the Jenkins tests, including the two newly added test cases.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #21093 from dongjoon-hyun/SPARK-23340-2.
dongjoon-hyun authored and gatorsmile committed Apr 19, 2018
1 parent fb96821 commit be184d16e86f96a748d6bf1642c1c319d2a09f5c
@@ -156,8 +156,8 @@ objenesis-2.1.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.4.1-nohive.jar
-orc-mapreduce-1.4.1-nohive.jar
+orc-core-1.4.3-nohive.jar
+orc-mapreduce-1.4.3-nohive.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
@@ -157,8 +157,8 @@ objenesis-2.1.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.4.1-nohive.jar
-orc-mapreduce-1.4.1-nohive.jar
+orc-core-1.4.3-nohive.jar
+orc-mapreduce-1.4.3-nohive.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
@@ -130,7 +130,7 @@
     <hive.version.short>1.2.1</hive.version.short>
     <derby.version>10.12.1.1</derby.version>
     <parquet.version>1.8.2</parquet.version>
-    <orc.version>1.4.1</orc.version>
+    <orc.version>1.4.3</orc.version>
     <orc.classifier>nohive</orc.classifier>
     <hive.parquet.version>1.6.0</hive.parquet.version>
     <jetty.version>9.3.20.v20170531</jetty.version>
@@ -1739,10 +1739,6 @@
             <groupId>org.apache.hive</groupId>
             <artifactId>hive-storage-api</artifactId>
           </exclusion>
-          <exclusion>
-            <groupId>io.airlift</groupId>
-            <artifactId>slice</artifactId>
-          </exclusion>
         </exclusions>
       </dependency>
       <dependency>
@@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
       }
     }
   }
+
+  test("SPARK-23340 Empty float/double array columns raise EOFException") {
+    Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
+      withTempPath { path =>
+        df.write.format("orc").save(path.getCanonicalPath)
+        checkAnswer(spark.read.orc(path.getCanonicalPath), df)
+      }
+    }
+  }
 }

 class OrcSourceSuite extends OrcSuite with SharedSQLContext {
@@ -208,4 +208,14 @@ class HiveOrcQuerySuite extends OrcQueryTest with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-23340 Empty float/double array columns raise EOFException") {
+    withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "false") {
+      withTable("spark_23340") {
+        sql("CREATE TABLE spark_23340(a array<float>, b array<double>) STORED AS ORC")
+        sql("INSERT INTO spark_23340 VALUES (array(), array())")
+        checkAnswer(spark.table("spark_23340"), Seq(Row(Array.empty[Float], Array.empty[Double])))
+      }
+    }
+  }
 }
