[SPARK-16060][SQL] Support Vectorized ORC Reader
## What changes were proposed in this pull request?

This PR adds an ORC columnar-batch reader to the native `OrcFileFormat`. Because the reader uses Spark `ColumnarBatch` and ORC `RowBatch` together, it is faster than the current Spark implementation. This PR replaces the earlier PR, #17924.

Also, this PR adds `OrcReadBenchmark` to show the performance improvement.

## How was this patch tested?

Pass the existing test cases.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #19943 from dongjoon-hyun/SPARK-16060.
dongjoon-hyun authored and cloud-fan committed Jan 9, 2018
1 parent 6a4206f commit f44ba91
Showing 5 changed files with 1,022 additions and 25 deletions.
@@ -386,6 +386,11 @@ object SQLConf {
.checkValues(Set("hive", "native"))
.createWithDefault("native")

val ORC_VECTORIZED_READER_ENABLED = buildConf("spark.sql.orc.enableVectorizedReader")
.doc("Enables vectorized orc decoding.")
.booleanConf
.createWithDefault(true)

val ORC_FILTER_PUSHDOWN_ENABLED = buildConf("spark.sql.orc.filterPushdown")
.doc("When true, enable filter pushdown for ORC files.")
.booleanConf
@@ -1183,6 +1188,8 @@ class SQLConf extends Serializable with Logging {

def orcCompressionCodec: String = getConf(ORC_COMPRESSION)

def orcVectorizedReaderEnabled: Boolean = getConf(ORC_VECTORIZED_READER_ENABLED)

def parquetCompressionCodec: String = getConf(PARQUET_COMPRESSION)

def parquetVectorizedReaderEnabled: Boolean = getConf(PARQUET_VECTORIZED_READER_ENABLED)
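
The following is a minimal usage sketch and is not part of the commit. It illustrates how the new `spark.sql.orc.enableVectorizedReader` flag might be toggled from a user session. The key `spark.sql.orc.impl` (the setting whose `hive`/`native` values appear in the first hunk) is an assumption, because its name is not shown in this excerpt. The object name, the output path, and the aggregation are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object OrcVectorizedReaderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-vectorized-reader-sketch")
      .master("local[*]")
      // Assumed key for selecting the native ORC implementation.
      .config("spark.sql.orc.impl", "native")
      // Flag added by this commit; it defaults to true.
      .config("spark.sql.orc.enableVectorizedReader", "true")
      .getOrCreate()

    // Hypothetical scratch location for a small ORC data set.
    val path = "/tmp/orc_vectorized_reader_sketch"
    spark.range(0, 1000).write.mode("overwrite").orc(path)

    // Read with the vectorized reader enabled.
    val withVectorized = spark.read.orc(path).selectExpr("sum(id)").head().getLong(0)

    // Disable the flag at runtime and read the same data again.
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "false")
    val withoutVectorized = spark.read.orc(path).selectExpr("sum(id)").head().getLong(0)

    // Both read paths should return the same result; only the speed may differ.
    require(withVectorized == withoutVectorized)

    spark.stop()
  }
}
```

Turning the flag off this way can help check whether a difference in behavior or timing comes from the vectorized read path rather than from the data itself.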