Skip to content

Commit

Permalink
[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

Like Parquet, this PR aims to depend on the latest Apache ORC 1.4 for Apache Spark 2.3. There are key benefits for Apache ORC 1.4.

- Stability: Apache ORC 1.4.0 has many fixes and we can depend on ORC community more.
- Maintainability: Reduce the Hive dependency and can remove old legacy code later.

Later, we can get the following two key benefits by adding new ORCFileFormat in SPARK-20728 (apache#17980), too.
- Usability: User can use ORC data sources without hive module, i.e, -Phive.
- Speed: Use both Spark ColumnarBatch and ORC RowBatch together. This will be faster than the current implementation in Spark.

## How was this patch tested?

Pass the jenkins.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#18640 from dongjoon-hyun/SPARK-21422.
  • Loading branch information
dongjoon-hyun authored and gatorsmile committed Aug 16, 2017
1 parent 07549b2 commit 8c54f1e
Show file tree
Hide file tree
Showing 5 changed files with 66 additions and 0 deletions.
6 changes: 6 additions & 0 deletions assembly/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,12 @@
<hive.deps.scope>provided</hive.deps.scope>
</properties>
</profile>
<profile>
<id>orc-provided</id>
<properties>
<orc.deps.scope>provided</orc.deps.scope>
</properties>
</profile>
<profile>
<id>parquet-provided</id>
<properties>
Expand Down
3 changes: 3 additions & 0 deletions dev/deps/spark-deps-hadoop-2.6
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ JavaEWAH-0.3.2.jar
RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
aircompressor-0.3.jar
antlr-2.7.7.jar
antlr-runtime-3.4.jar
antlr4-runtime-4.5.3.jar
Expand Down Expand Up @@ -148,6 +149,8 @@ netty-3.9.9.Final.jar
netty-all-4.0.43.Final.jar
objenesis-2.1.jar
opencsv-2.3.jar
orc-core-1.4.0-nohive.jar
orc-mapreduce-1.4.0-nohive.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.1.jar
paranamer-2.6.jar
Expand Down
3 changes: 3 additions & 0 deletions dev/deps/spark-deps-hadoop-2.7
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ JavaEWAH-0.3.2.jar
RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
activation-1.1.1.jar
aircompressor-0.3.jar
antlr-2.7.7.jar
antlr-runtime-3.4.jar
antlr4-runtime-4.5.3.jar
Expand Down Expand Up @@ -149,6 +150,8 @@ netty-3.9.9.Final.jar
netty-all-4.0.43.Final.jar
objenesis-2.1.jar
opencsv-2.3.jar
orc-core-1.4.0-nohive.jar
orc-mapreduce-1.4.0-nohive.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.1.jar
paranamer-2.6.jar
Expand Down
44 changes: 44 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,8 @@
<hive.version.short>1.2.1</hive.version.short>
<derby.version>10.12.1.1</derby.version>
<parquet.version>1.8.2</parquet.version>
<orc.version>1.4.0</orc.version>
<orc.classifier>nohive</orc.classifier>
<hive.parquet.version>1.6.0</hive.parquet.version>
<jetty.version>9.3.20.v20170531</jetty.version>
<javaxservlet.version>3.1.0</javaxservlet.version>
Expand Down Expand Up @@ -208,6 +210,7 @@
<flume.deps.scope>compile</flume.deps.scope>
<hadoop.deps.scope>compile</hadoop.deps.scope>
<hive.deps.scope>compile</hive.deps.scope>
<orc.deps.scope>compile</orc.deps.scope>
<parquet.deps.scope>compile</parquet.deps.scope>
<parquet.test.deps.scope>test</parquet.test.deps.scope>

Expand Down Expand Up @@ -1695,6 +1698,44 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.orc</groupId>
<artifactId>orc-core</artifactId>
<version>${orc.version}</version>
<classifier>${orc.classifier}</classifier>
<scope>${orc.deps.scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.orc</groupId>
<artifactId>orc-mapreduce</artifactId>
<version>${orc.version}</version>
<classifier>${orc.classifier}</classifier>
<scope>${orc.deps.scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.orc</groupId>
<artifactId>orc-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
Expand Down Expand Up @@ -2727,6 +2768,9 @@
<profile>
<id>hive-provided</id>
</profile>
<profile>
<id>orc-provided</id>
</profile>
<profile>
<id>parquet-provided</id>
</profile>
Expand Down
10 changes: 10 additions & 0 deletions sql/core/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,16 @@
<scope>test</scope>
</dependency>

<dependency>
<groupId>org.apache.orc</groupId>
<artifactId>orc-core</artifactId>
<classifier>${orc.classifier}</classifier>
</dependency>
<dependency>
<groupId>org.apache.orc</groupId>
<artifactId>orc-mapreduce</artifactId>
<classifier>${orc.classifier}</classifier>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
Expand Down

0 comments on commit 8c54f1e

Please sign in to comment.