
[SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms #12397

Closed
wants to merge 5 commits

Conversation

robbinspg (Member)

What changes were proposed in this pull request?

Parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support so that the little-endian byte arrays are correctly interpreted on a big-endian platform.

How was this patch tested?

Spark test builds were run on big-endian z/Linux, with a regression build on little-endian amd64.

@robbinspg (Member Author)

@nongli, could you please review this?

@@ -31,6 +33,8 @@
private byte[] buffer;
private int offset;
private int bitOffset; // Only used for booleans.

private final static boolean bigEndianPlatform = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
Contributor

How does this implementation compare to ByteBuffer.wrap(buffer).order(...)?

Member Author

In which method? Do you mean:

ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
vs
ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);

to save the allocation?
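
For illustration, here is a minimal, self-contained sketch of the two approaches being compared. The class and method names are hypothetical; the real reader operates on Parquet page buffers via Platform/Unsafe offsets rather than plain arrays:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderSketch {
  // Per-value swap: write the value in the JVM's default (big-endian) order,
  // then re-read it as little-endian, which reverses the bytes. This allocates
  // a scratch ByteBuffer for every value read.
  static double swapViaAllocate(double v) {
    return ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
  }

  // Wrap the page buffer as a little-endian view (in the real reader this could
  // be done once, e.g. in initFromPage) and read values at their offsets,
  // avoiding the per-value scratch allocation.
  static double readViaWrap(byte[] pageBuffer, int offset) {
    return ByteBuffer.wrap(pageBuffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);
  }

  public static void main(String[] args) {
    // Little-endian encoding of 1.5, as Parquet plain encoding stores doubles.
    byte[] page = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(1.5).array();
    System.out.println(readViaWrap(page, 0)); // prints 1.5 on any platform
  }
}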

Contributor

I'm wondering whether wrapping the buffer with a ByteBuffer once (in initFromPage) behaves any differently from reversing the bytes for each integer.

Member Author

So the first step is to get this functional on big-endian platforms, without affecting the little-endian implementation, as soon as possible for inclusion in 2.0.0.

In VectorizedPlainValuesReader we could wrap the buffer once (assuming it never gets re-allocated) and use the ByteBuffer for the big-endian access. We could always use the ByteBuffer rather than the byte[], even in the little-endian implementation, but I do not know what performance impact that would have, and as stated above I did not want to mess around with the little-endian implementation at this time.

Of course the methods in On/OffHeapColumnVector that take the byte[] as a parameter could also be changed to take the ByteBuffer instead.

There is also some unnecessary subtracting/adding of Platform.BYTE_ARRAY_OFFSET around a lot of the method calls, e.g.

public final void readIntegers(int total, ColumnVector c, int rowId) {
  c.putIntsLittleEndian(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET);
  offset += 4 * total;
}

where the first thing putIntsLittleEndian does is add that offset back on.

So for now I'm reasonably happy with the big-endian support in this patch, and we could review the whole code structure later. I've improved the patch and will maybe change VectorizedPlainValuesReader so the code in there only wraps the buffer once.
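
As background for the putIntsLittleEndian discussion, here is one portable way to assemble little-endian integers from a byte array regardless of the platform's native order. This is only an illustrative sketch; the actual On/OffHeapColumnVector implementations use Platform/Unsafe accesses and the real method signatures differ:

public class LittleEndianIntSketch {
  // Assemble a 32-bit int from 4 little-endian bytes; behaves identically on
  // big- and little-endian JVMs because it never relies on native byte order.
  static int getIntLittleEndian(byte[] b, int i) {
    return (b[i] & 0xFF)
        | (b[i + 1] & 0xFF) << 8
        | (b[i + 2] & 0xFF) << 16
        | (b[i + 3] & 0xFF) << 24;
  }

  // Hypothetical bulk variant mirroring the shape of putIntsLittleEndian:
  // copy 'total' ints starting at src[srcIndex] into dst starting at rowId.
  static void putIntsLittleEndian(int[] dst, int rowId, int total, byte[] src, int srcIndex) {
    for (int k = 0; k < total; k++) {
      dst[rowId + k] = getIntLittleEndian(src, srcIndex + 4 * k);
    }
  }

  public static void main(String[] args) {
    byte[] src = {0x78, 0x56, 0x34, 0x12}; // little-endian encoding of 0x12345678
    System.out.println(Integer.toHexString(getIntLittleEndian(src, 0))); // 12345678
  }
}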

@SparkQA

SparkQA commented Apr 14, 2016

Test build #2790 has finished for PR 12397 at commit 1fc0483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@robbinspg robbinspg changed the title [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms [WIP][SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms Apr 15, 2016
@robbinspg (Member Author)

@nongli

So I've changed the patch to wrap the buffer in initFromPage, but only for Big Endian. I'd like to see this patch get into 2.0.0 so Big Endian platforms are not broken. Can we discuss refactoring the code to always use a ByteBuffer rather than byte[] later?
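
A simplified, self-contained sketch of the "wrap the buffer once in initFromPage, only on Big Endian" approach follows. The real VectorizedPlainValuesReader has a different initFromPage signature and reads via Platform/Unsafe, both of which are glossed over here:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WrapOnceReaderSketch {
  private static final boolean bigEndianPlatform =
      ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

  private byte[] buffer;
  private int offset;
  private ByteBuffer littleEndianView; // only populated on big-endian JVMs

  void initFromPage(byte[] bytes, int start) {
    buffer = bytes;
    offset = start;
    if (bigEndianPlatform) {
      // Wrap the page once; every subsequent read interprets the little-endian
      // Parquet bytes correctly without a per-value byte swap or allocation.
      littleEndianView = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }
  }

  int readInteger() {
    // On little-endian platforms the original direct read path is unchanged;
    // a native-order ByteBuffer read stands in for the Unsafe access here.
    int v = bigEndianPlatform
        ? littleEndianView.getInt(offset)
        : ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).getInt(offset);
    offset += 4;
    return v;
  }

  public static void main(String[] args) {
    byte[] page = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putInt(7).putInt(42).array();
    WrapOnceReaderSketch reader = new WrapOnceReaderSketch();
    reader.initFromPage(page, 0);
    System.out.println(reader.readInteger() + " " + reader.readInteger()); // 7 42
  }
}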

@robbinspg robbinspg changed the title [WIP][SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms Apr 15, 2016
@nongli (Contributor)

nongli commented Apr 15, 2016

Agreed. LGTM.

@rxin (Contributor)

rxin commented Apr 18, 2016

Did you test the performance? IIUC, this relies on the JIT's dead-code elimination in order not to regress performance?

@robbinspg (Member Author)

I haven't run any explicit performance tests for this. Do we have any specific to this area?

Using the static final boolean allows the JIT to eliminate the dead code path. I discussed an alternative implementation with @nongli in #10628 (comment), and the if (...) implementation was deemed OK.
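
As a tiny illustration of why the static final flag matters (a generic sketch, not Spark code): because HotSpot treats the initialized static final field as a constant when compiling, C2 can prove one branch of the if is never taken on a given platform and eliminate it entirely after warm-up.

import java.nio.ByteOrder;

class DeadCodeEliminationSketch {
  private static final boolean bigEndianPlatform =
      ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

  // On a little-endian JVM the if-block below is dead code and is removed by
  // C2, so the hot path stays identical to the original little-endian code.
  static int toNativeFromLittleEndian(int v) {
    if (bigEndianPlatform) {
      return Integer.reverseBytes(v); // swap only needed on big-endian platforms
    }
    return v;
  }

  public static void main(String[] args) {
    System.out.println(Integer.toHexString(toNativeFromLittleEndian(0x12345678)));
  }
}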

@robbinspg (Member Author)

@rxin so what do we need to do to get this into 2.0.0? Although the JIRA is of type "improvement", I could argue that it is a blocking defect, as Spark has supported Big Endian platforms up to now. I'm happy to re-write the patch to load separate subclassed implementations of On/OffHeapColumnVector and VectorizedPlainValuesReader, but that is a far more complex fix, and given the code paths I doubt there would be any measurable performance change.

@rxin (Contributor)

rxin commented Apr 18, 2016

Can you try ParquetReadBenchmark to measure before/after this PR?

@robbinspg (Member Author)

will do

@robbinspg (Member Author)

So I have run the ParquetReadBenchmark several times before and after this PR. I'm not sure how to interpret the results, though, as there is quite a variation in results on the same code base. I've also implemented the fix via subclassing, which should not affect the little-endian code path at all, and will benchmark that and get back with the results.

@robbinspg (Member Author)

Alternative implementation in #12501

@hvanhovell (Contributor)

@robbinspg could you post the benchmark results anyway? It would be nice to have a fact-based discussion. I am particularly curious what the results of this approach are, because performance-wise I would have put my money on dead-code elimination rather than subclassing.

I am also wondering whether putting the platform-dependent code in static methods would yield better results (HotSpot should only have to perform dead-code elimination once).

@robbinspg (Member Author)

@hvanhovell Yes, I will. I'm trying to get a stable base benchmark first, as running the ParquetReadBenchmark repeatedly against the base code (before either PR) gives what looks like a 10% variation in results. The same is true with either of the PRs applied, so I will average out the runs.

As far as subclassing goes, I would expect performance on LE to remain the same, as the code path should be identical once the classes are instantiated. Performance on BE between the two approaches is another question, but not the major concern at the moment, as we are going from failing/exceptions thrown to "working".

@robbinspg (Member Author)

Averaged results for 5 runs of the first 3 benchmarks:

ParquetReadBenchmark.txt

@robbinspg (Member Author)

@hvanhovell Any thoughts/interpretations on those benchmark results? I think the differences are all within the bounds of randomness!

@hvanhovell (Contributor)

@robbinspg The tests look good. I do think the tests are in general quite short, with only 2479 ms for the longest one. Is it possible to increase the test sizes?

@robbinspg (Member Author)

@hvanhovell Here are the test results running 10x the size:
ParquetReadBenchmarks.txt

I'm not sure there is a lot in it, and if it were up to me I'd go with the implementation in this PR rather than the subclassing.

@robbinspg (Member Author)

@rxin Do you think we can merge in this PR?

@hvanhovell (Contributor)

hvanhovell commented Apr 22, 2016

@robbinspg I have been looking at the assembler this generates for VectorizedPlainValuesReader.readInteger. I can confirm that C2 compilation eliminates the entire if (bigEndian) { ... } block. Here is the output for further reference: SPARK-13745.txt

One remark regarding benchmarking. If my logs are correct and nothing funny happens to stdout, then we need to be a bit more careful about benchmarking: the C1/C2 phases only kicked in after the first two benchmarks, so some warm-up would be nice in the future.

Otherwise LGTM.

For future reference, this is what I did to print the compiled bytecode:

java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*VectorizedPlainValuesReader.readInteger -cp ./dist/jars/*:./sql/core/target/scala-2.11/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar org.apache.spark.sql.execution.datasources.parquet.ParquetReadBenchmark

See http://psy-lob-saw.blogspot.nl/2013/01/java-print-assembly.html for more details.
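
As a generic illustration of that warm-up point (this is not Spark's Benchmark utility, just the general JVM pattern): run the measured code path a few times untimed first, so that C1/C2 compilation has happened before the timed iteration.

class WarmupSketch {
  // Run the workload several times untimed so the JIT compiles the hot path,
  // then time a single measured iteration.
  static long timeAfterWarmup(Runnable work, int warmupIters) {
    for (int i = 0; i < warmupIters; i++) {
      work.run(); // warm-up iterations, results discarded
    }
    long start = System.nanoTime();
    work.run();   // measured iteration
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    long nanos = timeAfterWarmup(() -> {
      long sum = 0;
      for (int i = 0; i < 1_000_000; i++) sum += i;
      if (sum == 42) System.out.println(sum); // keep the loop from being optimized away
    }, 5);
    System.out.println(nanos + " ns");
  }
}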

@robbinspg (Member Author)

Can we re-test this, as I think there has been a minor change since the last test build?

@SparkQA

SparkQA commented Apr 24, 2016

Test build #2864 has finished for PR 12397 at commit a652865.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@robbinspg (Member Author)

@hvanhovell Can we merge this now?

I agree the benchmarks should run after a steady state is achieved. I'll probably also create a change to allow the Benchmark to output in CSV format!

@hvanhovell (Contributor)

@robbinspg is this the only thing keeping us from running Spark on BE platforms?

@a-roberts (Contributor)

@hvanhovell I'm working with Pete on this. With this patch, around 220 failing tests now pass on BE, leaving us with only a handful to investigate. This fix definitely takes away the majority of our problems on BE platforms.

@robbinspg (Member Author)

@hvanhovell Spark 1.6.1 is fine on BE. The issues have been with new functionality added for Spark 2.0. This PR fixes the major issue. There are a few other issues that need investigating, which may be flaky tests, but this is the most important one.

@robbinspg (Member Author)

@rxin @hvanhovell Is there anything preventing this being merged? IMHO the JIRA it fixes is a blocking defect.

@robbinspg (Member Author)

Sorry to keep bugging you on this, but I'd really like to fix this major issue and move on. If there are no objections to merging this into master, could a committer please do the honours? I don't want to have to create and maintain a Big Endian fork if possible.

Cheers

@davies (Contributor)

davies commented May 2, 2016

Merging this into master and 2.0 branch, thanks!

@asfgit asfgit closed this in 8a1ce48 May 2, 2016
asfgit pushed a commit that referenced this pull request May 2, 2016
[SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms

## What changes were proposed in this pull request?

Parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support so that the little-endian byte arrays are correctly interpreted on a big-endian platform.

## How was this patch tested?

Spark test builds were run on big-endian z/Linux, with a regression build on little-endian amd64.

Author: Pete Robbins <robbinspg@gmail.com>

Closes #12397 from robbinspg/master.

(cherry picked from commit 8a1ce48)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@robbinspg (Member Author)

Many thanks!
