
[SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms #12397

Closed
wants to merge 5 commits

Conversation

robbinspg (Member)

What changes were proposed in this pull request?

Parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support so that the little-endian byte arrays are correctly interpreted on a big-endian platform.

How was this patch tested?

Spark test builds were run on big-endian z/Linux, with a regression build on little-endian amd64.

@robbinspg (Member Author)

@nongli, could you please review this?

@@ -31,6 +33,8 @@
private byte[] buffer;
private int offset;
private int bitOffset; // Only used for booleans.

private final static boolean bigEndianPlatform = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
Contributor

How does this implementation compare to ByteBuffer.wrap(buffer).order(...)?

Member Author

In which method? Do you mean:

ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
vs
ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);

to save the allocation?
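
For illustration, here is a minimal, self-contained sketch of the two approaches being compared. The class and method names are hypothetical; the real reader operates on Parquet page buffers via Platform/Unsafe offsets rather than plain arrays:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderSketch {
  // Per-value swap: write the value in the JVM's default (big-endian) order,
  // then re-read it as little-endian, which reverses the bytes. This allocates
  // a scratch ByteBuffer for every value read.
  static double swapViaAllocate(double v) {
    return ByteBuffer.allocate(8).putDouble(v).order(ByteOrder.LITTLE_ENDIAN).getDouble(0);
  }

  // Wrap the page buffer as a little-endian view (in the real reader this could
  // be done once, e.g. in initFromPage) and read values at their offsets,
  // avoiding the per-value scratch allocation.
  static double readViaWrap(byte[] pageBuffer, int offset) {
    return ByteBuffer.wrap(pageBuffer).order(ByteOrder.LITTLE_ENDIAN).getDouble(offset);
  }

  public static void main(String[] args) {
    // Little-endian encoding of 1.5, as Parquet plain encoding stores doubles.
    byte[] page = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(1.5).array();
    System.out.println(readViaWrap(page, 0)); // prints 1.5 on any platform
  }
}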

Contributor

I'm wondering whether wrapping the buffer with a ByteBuffer once (in initFromPage) behaves any differently from reversing the bytes for each integer.

Member Author

So the first step is to get this functional on big-endian platforms, without affecting the little-endian implementation, as soon as possible for inclusion in 2.0.0.

In VectorizedPlainValuesReader we could wrap the buffer once (assuming it never gets re-allocated) and use the ByteBuffer for the big-endian access. We could always use the ByteBuffer rather than the byte[], even in the little-endian implementation, but I do not know what performance impact that would have, and as stated above I did not want to mess around with the little-endian implementation at this time.

Of course the methods in On/OffHeapColumnVector that take the byte[] as a parameter could also be changed to take the ByteBuffer instead.

There is also some unnecessary subtracting/adding of Platform.BYTE_ARRAY_OFFSET around a lot of the method calls, e.g.

public final void readIntegers(int total, ColumnVector c, int rowId) {
  c.putIntsLittleEndian(rowId, total, buffer, offset - Platform.BYTE_ARRAY_OFFSET);
  offset += 4 * total;
}

where the first thing putIntsLittleEndian does is add that offset back on.

So for now I'm reasonably happy with the big-endian support in this patch, and we could review the whole code structure later. I've improved the patch and will maybe change VectorizedPlainValuesReader so the code in there only wraps the buffer once.
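
As background for the putIntsLittleEndian discussion, here is one portable way to assemble little-endian integers from a byte array regardless of the platform's native order. This is only an illustrative sketch; the actual On/OffHeapColumnVector implementations use Platform/Unsafe accesses and the real method signatures differ:

public class LittleEndianIntSketch {
  // Assemble a 32-bit int from 4 little-endian bytes; behaves identically on
  // big- and little-endian JVMs because it never relies on native byte order.
  static int getIntLittleEndian(byte[] b, int i) {
    return (b[i] & 0xFF)
        | (b[i + 1] & 0xFF) << 8
        | (b[i + 2] & 0xFF) << 16
        | (b[i + 3] & 0xFF) << 24;
  }

  // Hypothetical bulk variant mirroring the shape of putIntsLittleEndian:
  // copy 'total' ints starting at src[srcIndex] into dst starting at rowId.
  static void putIntsLittleEndian(int[] dst, int rowId, int total, byte[] src, int srcIndex) {
    for (int k = 0; k < total; k++) {
      dst[rowId + k] = getIntLittleEndian(src, srcIndex + 4 * k);
    }
  }

  public static void main(String[] args) {
    byte[] src = {0x78, 0x56, 0x34, 0x12}; // little-endian encoding of 0x12345678
    System.out.println(Integer.toHexString(getIntLittleEndian(src, 0))); // 12345678
  }
}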

@SparkQA

SparkQA commented Apr 14, 2016

Test build #2790 has finished for PR 12397 at commit 1fc0483.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@robbinspg robbinspg changed the title [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms [WIP][SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms Apr 15, 2016
@robbinspg (Member Author)

@nongli

So I've changed the patch to wrap the buffer in initFromPage, but only for Big Endian. I'd like to see this patch get into 2.0.0 so Big Endian platforms are not broken. Can we discuss refactoring the code to always use a ByteBuffer rather than byte[] later?
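
A simplified, self-contained sketch of the "wrap the buffer once in initFromPage, only on Big Endian" approach follows. The real VectorizedPlainValuesReader has a different initFromPage signature and reads via Platform/Unsafe, both of which are glossed over here:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class WrapOnceReaderSketch {
  private static final boolean bigEndianPlatform =
      ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

  private byte[] buffer;
  private int offset;
  private ByteBuffer littleEndianView; // only populated on big-endian JVMs

  void initFromPage(byte[] bytes, int start) {
    buffer = bytes;
    offset = start;
    if (bigEndianPlatform) {
      // Wrap the page once; every subsequent read interprets the little-endian
      // Parquet bytes correctly without a per-value byte swap or allocation.
      littleEndianView = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }
  }

  int readInteger() {
    // On little-endian platforms the original direct read path is unchanged;
    // a native-order ByteBuffer read stands in for the Unsafe access here.
    int v = bigEndianPlatform
        ? littleEndianView.getInt(offset)
        : ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).getInt(offset);
    offset += 4;
    return v;
  }

  public static void main(String[] args) {
    byte[] page = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putInt(7).putInt(42).array();
    WrapOnceReaderSketch reader = new WrapOnceReaderSketch();
    reader.initFromPage(page, 0);
    System.out.println(reader.readInteger() + " " + reader.readInteger()); // 7 42
  }
}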

@robbinspg robbinspg changed the title [WIP][SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms [SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms Apr 15, 2016
@nongli (Contributor)

nongli commented Apr 15, 2016

Agreed. LGTM.

@rxin (Contributor)

rxin commented Apr 18, 2016

Did you test the performance? IIUC, this relies on the JIT's dead-code elimination in order not to regress performance?

@robbinspg (Member Author)

I haven't run any explicit performance tests for this. Do we have any specific to this area?

Using the static final boolean allows the JIT to eliminate the dead code path. I discussed an alternative implementation with @nongli in #10628 (comment), and the if (...) implementation was deemed OK.
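
As a tiny illustration of why the static final flag matters (a generic sketch, not Spark code): because HotSpot treats the initialized static final field as a constant when compiling, C2 can prove one branch of the if is never taken on a given platform and eliminate it entirely after warm-up.

import java.nio.ByteOrder;

class DeadCodeEliminationSketch {
  private static final boolean bigEndianPlatform =
      ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

  // On a little-endian JVM the if-block below is dead code and is removed by
  // C2, so the hot path stays identical to the original little-endian code.
  static int toNativeFromLittleEndian(int v) {
    if (bigEndianPlatform) {
      return Integer.reverseBytes(v); // swap only needed on big-endian platforms
    }
    return v;
  }

  public static void main(String[] args) {
    System.out.println(Integer.toHexString(toNativeFromLittleEndian(0x12345678)));
  }
}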

@robbinspg (Member Author)

@rxin so what do we need to do to get this into 2.0.0? Although the JIRA is of type "improvement", I could argue that it is a blocking defect, as Spark has supported Big Endian platforms up to now. I'm happy to re-write the patch to load separate subclassed implementations of On/OffHeapColumnVector and VectorizedPlainValuesReader, but that is a far more complex fix, and given the code paths I doubt there would be any measurable performance change.

@rxin (Contributor)

rxin commented Apr 18, 2016

Can you try ParquetReadBenchmark to measure before/after this PR?

@robbinspg (Member Author)

will do

@robbinspg (Member Author)

So I have run the ParquetReadBenchmark several times before and after this PR. I'm not sure how to interpret the results, though, as there is quite a variation in results on the same code base. I've also implemented the fix via subclassing, which should not affect the little-endian code path at all, and will benchmark that and get back with the results.

@robbinspg (Member Author)

Alternative implementation in #12501

@hvanhovell (Contributor)

@robbinspg could you post the benchmark results anyway? It would be nice to have a fact-based discussion. I am particularly curious what the results of this approach are, because performance-wise I would have put my money on dead-code elimination rather than subclassing.

I am also wondering whether putting the platform-dependent code in static methods would yield better results (HotSpot should only have to perform dead-code elimination once).

@robbinspg (Member Author)

@hvanhovell Yes, I will. I'm trying to get a stable base benchmark first, as running the ParquetReadBenchmark repeatedly against the base code (before either PR) gives what looks like a 10% variation in results. The same is true with either of the PRs applied, so I will average out the runs.

As far as subclassing goes, I would expect performance on LE to remain the same, as the code path should be identical once the classes are instantiated. Performance on BE between the two approaches is another question, but not the major concern at the moment, as we are going from failing/exceptions thrown to "working".

@robbinspg (Member Author)

Averaged results for 5 runs of the first 3 benchmarks:

ParquetReadBenchmark.txt

@robbinspg (Member Author)

@hvanhovell Any thoughts/interpretations on those benchmark results? I think the differences are all within the bounds of randomness!

@hvanhovell (Contributor)

@robbinspg The tests look good. I do think the tests are in general quite short, with only 2479 ms for the longest one. Is it possible to increase the test sizes?

@robbinspg (Member Author)

@hvanhovell Here are the test results running 10x the size:
ParquetReadBenchmarks.txt

I'm not sure there is a lot in it, and if it were up to me I'd go with the implementation in this PR rather than the subclassing.

@robbinspg (Member Author)

@rxin Do you think we can merge in this PR?

@hvanhovell (Contributor)

hvanhovell commented Apr 22, 2016

@robbinspg I have been looking at the assembler this generates for VectorizedPlainValuesReader.readInteger. I can confirm that C2 compilation eliminates the entire if (bigEndian) { ... } block. Here is the output for further reference: SPARK-13745.txt

One remark regarding benchmarking. If my logs are correct and nothing funny happens to stdout, then we need to be a bit more careful about benchmarking: the C1/C2 phases only kicked in after the first two benchmarks, so some warm-up would be nice in the future.

Otherwise LGTM.

For future reference, this is what I did to print the compiled bytecode:

java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*VectorizedPlainValuesReader.readInteger -cp ./dist/jars/*:./sql/core/target/scala-2.11/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar org.apache.spark.sql.execution.datasources.parquet.ParquetReadBenchmark

See http://psy-lob-saw.blogspot.nl/2013/01/java-print-assembly.html for more details.
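
As a generic illustration of that warm-up point (this is not Spark's Benchmark utility, just the general JVM pattern): run the measured code path a few times untimed first, so that C1/C2 compilation has happened before the timed iteration.

class WarmupSketch {
  // Run the workload several times untimed so the JIT compiles the hot path,
  // then time a single measured iteration.
  static long timeAfterWarmup(Runnable work, int warmupIters) {
    for (int i = 0; i < warmupIters; i++) {
      work.run(); // warm-up iterations, results discarded
    }
    long start = System.nanoTime();
    work.run();   // measured iteration
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    long nanos = timeAfterWarmup(() -> {
      long sum = 0;
      for (int i = 0; i < 1_000_000; i++) sum += i;
      if (sum == 42) System.out.println(sum); // keep the loop from being optimized away
    }, 5);
    System.out.println(nanos + " ns");
  }
}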

@robbinspg (Member Author)

Can we re-test this, as I think there has been a minor change since the last test build?

@SparkQA

SparkQA commented Apr 24, 2016

Test build #2864 has finished for PR 12397 at commit a652865.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@robbinspg (Member Author)

@hvanhovell Can we merge this now?

I agree the benchmarks should run after a steady state is achieved. I'll probably also create a change to allow the Benchmark to output in CSV format!

@hvanhovell (Contributor)

@robbinspg is this the only thing keeping us from running Spark on BE platforms?

@a-roberts (Contributor)

@hvanhovell I'm working with Pete on this. With this patch, around 220 failing tests now pass on BE, leaving us with only a handful to investigate. This fix definitely takes away the majority of our problems on BE platforms.

@robbinspg (Member Author)

@hvanhovell Spark 1.6.1 is fine on BE. The issues have been with new functionality added for Spark 2.0. This PR fixes the major issue. There are a few other issues that need investigating, which may be flaky tests, but this is the most important one.

@robbinspg (Member Author)

@rxin @hvanhovell Is there anything preventing this being merged? IMHO the JIRA it fixes is a blocking defect.

@robbinspg (Member Author)

Sorry to keep bugging you on this, but I'd really like to fix this major issue and move on. If there are no objections to merging this into master, could a committer please do the honours? I don't want to have to create and maintain a Big Endian fork if possible.

Cheers

@davies (Contributor)

davies commented May 2, 2016

Merging this into master and 2.0 branch, thanks!

@asfgit asfgit closed this in 8a1ce48 May 2, 2016
asfgit pushed a commit that referenced this pull request May 2, 2016
[SPARK-13745][SQL]Support columnar in memory representation on Big Endian platforms

## What changes were proposed in this pull request?

Parquet datasource and ColumnarBatch tests fail on big-endian platforms. This patch adds support so that the little-endian byte arrays are correctly interpreted on a big-endian platform.

## How was this patch tested?

Spark test builds were run on big-endian z/Linux, with a regression build on little-endian amd64.

Author: Pete Robbins <robbinspg@gmail.com>

Closes #12397 from robbinspg/master.

(cherry picked from commit 8a1ce48)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@robbinspg (Member Author)

Many thanks!
