
[SPARK-12992][SQL]: Update parquet reader to support more types when decoding to ColumnarBatch. #10908

Closed · wants to merge 2 commits

Conversation

nongli
Contributor

@nongli nongli commented Jan 26, 2016

This patch implements support for more types when doing the vectorized decode. There are
a few more types remaining but they should be very straightforward after this. This code
has a few copy and paste pieces but they are difficult to eliminate due to performance
considerations.

Specifically, this patch adds support for:

  • String, Long, Byte types
  • Dictionary encoding for those types.
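The dictionary path described above can be sketched as follows. This is a simplified, hypothetical illustration (the class and method names are not Spark's actual API): dictionary-encoded pages store small integer ids, and the vectorized reader decodes a batch of ids and then materializes values in a tight per-type loop, which is why the per-type copies of that loop are hard to eliminate without hurting performance.

```java
import java.util.Arrays;

// Hypothetical sketch of dictionary-encoded vectorized decoding.
// Encoded pages hold int dictionary ids; values are materialized per
// type in a tight per-batch loop with one array lookup per row.
public class DictionaryDecodeSketch {
    public static long[] decodeLongs(int[] ids, long[] dictionary) {
        long[] out = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = dictionary[ids[i]];  // one lookup per row, no branching
        }
        return out;
    }

    public static String[] decodeStrings(int[] ids, String[] dictionary) {
        String[] out = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = dictionary[ids[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 0, 2};
        long[] longDict = {10L, 20L, 30L};
        System.out.println(Arrays.toString(decodeLongs(ids, longDict)));
        // [10, 20, 10, 30]
    }
}
```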

@SparkQA

SparkQA commented Jan 26, 2016

Test build #50047 has finished for PR 10908 at commit ea1f406.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 27, 2016

Test build #50174 has finished for PR 10908 at commit 2c8aaad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@Override
public byte readByte() {
  throw new UnsupportedOperationException("only readInts is valid.");
}
Contributor

readInts -> readBytes

Contributor Author

This should be readInts. The only valid read* API that doesn't also decode definition levels is the one used to decode dictionary ids, which are always ints. I updated the comment for readIntegers() to try to capture this.
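The distinction drawn here can be illustrated with a sketch (hypothetical names, not Spark's actual classes): on the dictionary-ids path, only the int bulk read is meaningful, because dictionary ids are plain ints regardless of the column's logical type, so every other typed read* method is left unsupported.

```java
// Hypothetical sketch: why a dictionary-id reader only needs readIntegers.
// Dictionary ids are always ints, whatever the column's value type is,
// so the ids reader leaves the other typed read* methods unsupported.
public class DictionaryIdsReaderSketch {
    private final int[] page;  // decoded id stream for one page
    private int offset = 0;

    public DictionaryIdsReaderSketch(int[] page) { this.page = page; }

    // The only supported bulk read on this path: copy out `total` ids.
    public void readIntegers(int total, int[] dst, int dstOffset) {
        System.arraycopy(page, offset, dst, dstOffset, total);
        offset += total;
    }

    // Typed reads are invalid when decoding dictionary ids.
    public byte readByte() {
        throw new UnsupportedOperationException("only readIntegers is valid.");
    }

    public long readLong() {
        throw new UnsupportedOperationException("only readIntegers is valid.");
    }
}
```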

@SparkQA

SparkQA commented Feb 1, 2016

Test build #50499 has finished for PR 10908 at commit 2ea2d54.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 3, 2016

Test build #2493 has finished for PR 10908 at commit 2ea2d54.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Feb 3, 2016

LGTM, merging this into master, thanks!

@asfgit asfgit closed this in 21112e8 Feb 3, 2016
@nongli nongli deleted the spark-12992 branch February 3, 2016 19:17
3 participants