Skip to content

[SPARK-16634][sql] Workaround JVM bug by moving some code out of ctor.#14271

Closed
vanzin wants to merge 1 commit intoapache:masterfrom
vanzin:SPARK-16634
Closed

[SPARK-16634][sql] Workaround JVM bug by moving some code out of ctor.#14271
vanzin wants to merge 1 commit intoapache:masterfrom
vanzin:SPARK-16634

Conversation

@vanzin
Copy link
Copy Markdown
Contributor

@vanzin vanzin commented Jul 19, 2016

Some 1.7 JVMs have a bug that is triggered by certain Scala-generated
bytecode. GenericArrayData suffers from that and fails to load in certain
JVMs.

Moving the offending code out of the constructor and into a helper method
avoids the issue.

Some 1.7 JVMs have a bug that is triggered by certain Scala-generated
bytecode. GenericArrayData suffers from that and fails to load in certain
JVMs.

Moving the offending code out of the constructor and into a helper method
avoids the issue.
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jul 20, 2016

Test build #62560 has finished for PR 14271 at commit b201790.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Copy Markdown
Member

srowen commented Jul 20, 2016

LGTM

@vanzin
Copy link
Copy Markdown
Contributor Author

vanzin commented Jul 20, 2016

Merging to master / 2.0.

@asfgit asfgit closed this in e3cd5b3 Jul 20, 2016
asfgit pushed a commit that referenced this pull request Jul 20, 2016
Some 1.7 JVMs have a bug that is triggered by certain Scala-generated
bytecode. GenericArrayData suffers from that and fails to load in certain
JVMs.

Moving the offending code out of the constructor and into a helper method
avoids the issue.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #14271 from vanzin/SPARK-16634.

(cherry picked from commit e3cd5b3)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin vanzin deleted the SPARK-16634 branch August 2, 2016 18:34
dongjoon-hyun pushed a commit that referenced this pull request Jan 20, 2020
…onstructor, plus related optimization in ParquetMapConverter

### What changes were proposed in this pull request?

This PR implements a tiny performance optimization for a `GenericArrayData` constructor, avoiding an unnecessary roundtrip through `WrappedArray` when the provided value is already an array of objects.

It also fixes a related performance problem in `ParquetRowConverter`.

### Why are the changes needed?

`GenericArrayData` has a `this(seqOrArray: Any)` constructor, which was originally added in #13138 for use in `RowEncoder` (where we may not know concrete types until runtime) but is also called (perhaps unintentionally) in a few other code paths.

In this constructor's existing implementation, a call to `new WrappedArray(Array[Object](""))` is dispatched to the `this(seqOrArray: Any)` constructor, where we then call `this(array.toSeq)`: this wraps the provided array into a `WrappedArray`, which is subsequently unwrapped in a `this(seq.toArray)` call. For an interactive example, see https://scastie.scala-lang.org/7jOHydbNTaGSU677FWA8nA

This PR changes the `this(seqOrArray: Any)` constructor so that it calls the primary `this(array: Array[Any])` constructor, allowing us to save a `.toSeq.toArray` call; this comes at the cost of one additional `case` in the `match` statement (but I believe this has a negligible performance impact relative to the other savings).

As code cleanup, I also reverted the JVM 1.7 workaround from #14271.

I also fixed a related performance problem in `ParquetRowConverter`: previously, this code called `ArrayBasedMapData.apply` which, in turn, called the `this(Any)` constructor for `GenericArrayData`: this PR's micro-benchmarks show that this is _significantly_ slower than calling the `this(Array[Any])` constructor (and I also observed time spent here during other Parquet scan benchmarking work). To fix this performance problem, I replaced the call to the  `ArrayBasedMapData.apply` method with direct calls to the `ArrayBasedMapData` and `GenericArrayData` constructors.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I tested this by running code in a debugger and by running microbenchmarks (which I've added to a new `GenericArrayDataBenchmark` in this PR):

- With JDK8 benchmarks: this PR's changes more than double the performance of calls to the `this(Any)` constructor. Even after improvements, however, calls to the `this(Array[Any])` constructor are still ~60x faster than calls to `this(Any)` when passing a non-primitive array (thereby motivating this patch's other change in `ParquetRowConverter`).
- With JDK11 benchmarks: the changes more-or-less completely eliminate the performance penalty associated with the `this(Any)` constructor.

Closes #27088 from JoshRosen/joshrosen/GenericArrayData-optimization.

Authored-by: Josh Rosen <rosenville@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants