[SPARK-16973][SQL] remove the buffer offsets in ImperativeAggregate#14562
cloud-fan wants to merge 5 commits into apache:master
Test build #63438 has finished for PR 14562 at commit
Does this have performance implications? We are adding a layer of indirection to a hot code path.
}

override def copy(): InternalRow = {
  throw new UnsupportedOperationException("Cannot copy a SlicedMutableRow")
SlicedMutableRow -> SlicedInternalRow?
@hvanhovell I'm not sure about the performance; I will benchmark it later. Hopefully the JVM can inline these calls.
  }
}

case class SlicedInternalRow(offset: Int, numFields: Int) extends BaseSlicedInternalRow {
does this need to be a case class?
Hmm, does a case class have a performance penalty? It doesn't need to be one, though.
It generates a lot of crap in the bytecode, so it would be good not to generate it unless it is useful.
Test build #63524 has finished for PR 14562 at commit
Test build #63552 has finished for PR 14562 at commit
 */
def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate
final def setMutableBufferOffset(offset: Int): Unit = {
  assert(mutableBufferRow == null)
Do you want to do a runtime check? Then how about using require?
assert may be removed by the compiler.
Require implies that the caller has passed a bad argument. Assert checks if invariants hold (the class is not in an unexpected state). I think this should be an assert.
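To make the require/assert distinction above concrete, here is a minimal sketch. `BufferHolder` and `RequireVsAssert` are hypothetical stand-ins written for this illustration, not classes from the patch. It relies only on the standard behaviour of Scala's `Predef`: `require` throws `IllegalArgumentException`, while `assert` throws `AssertionError` and is marked elidable, so the compiler can be configured to drop it.

```scala
// Hypothetical stand-in for the setter under review; not Spark's actual class.
final class BufferHolder {
  private var mutableBufferOffset: Option[Int] = None

  def setMutableBufferOffset(offset: Int): Unit = {
    // Argument validation: a bad value is the caller's fault.
    require(offset >= 0, s"offset must be non-negative, got $offset")
    // Invariant check: the object must not already hold an offset.
    assert(mutableBufferOffset.isEmpty, "offset was already set")
    mutableBufferOffset = Some(offset)
  }
}

object RequireVsAssert {
  // Runs `body` and reports which kind of failure (if any) it raised.
  def outcome(body: => Unit): String =
    try { body; "ok" } catch { case t: Throwable => t.getClass.getSimpleName }

  def main(args: Array[String]): Unit = {
    val holder = new BufferHolder
    println(outcome(holder.setMutableBufferOffset(-1))) // rejected by require
    println(outcome(holder.setMutableBufferOffset(2)))  // accepted
    println(outcome(holder.setMutableBufferOffset(3)))  // rejected by assert, if assertions are enabled
  }
}
```

If the second check must survive a build with assertions disabled, it would need to be written as an explicit `if` that throws.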
Instead of setting an attribute that is supposed to be immutable, can we use copy to copy the whole class?
I think my biggest concern is about the performance and abstraction.
Maybe there is another alternative: for example, we can define an InternalRowReader, which wraps the offset. Subclasses of ImperativeAggregate would then need to use the InternalRowReader to read fields from the InternalRow.
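A rough sketch of what such a reader could look like, under heavy assumptions: `SimpleRow`, `ArrayRow`, `RowReader`, and `RowReaderDemo` are simplified, hypothetical types invented for this illustration (long fields only), not Spark's `InternalRow` API and not code from this PR. The point is only that the reader owns the offset while the aggregate uses local ordinals.

```scala
// Simplified stand-in for InternalRow; only what the sketch needs.
trait SimpleRow {
  def getLong(ordinal: Int): Long
}

final class ArrayRow(values: Array[Long]) extends SimpleRow {
  def getLong(ordinal: Int): Long = values(ordinal)
}

// Hypothetical reader: it owns the offset, so callers pass local ordinals.
final class RowReader(offset: Int) {
  def getLong(row: SimpleRow, ordinal: Int): Long = row.getLong(offset + ordinal)
}

object RowReaderDemo {
  // Aggregate-side logic that sums its two fields without knowing the offset.
  def sumOfTwoFields(reader: RowReader, row: SimpleRow): Long =
    reader.getLong(row, 0) + reader.getLong(row, 1)

  def main(args: Array[String]): Unit = {
    // The shared row holds another aggregate's fields at positions 0 and 1.
    val shared = new ArrayRow(Array(10L, 20L, 3L, 4L))
    val reader = new RowReader(offset = 2)
    println(sumOfTwoFields(reader, shared))
  }
}
```

Compared with a sliced row, this design keeps the row untouched and passes it on every call, at the cost of requiring every subclass to go through the reader.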
}

// Note: although this simply copies aggBufferAttributes, this common code can not be placed
// in the superclass because that will lead to initialization ordering issues.
finally get rid of it!
Test build #63906 has finished for PR 14562 at commit
Test build #63908 has finished for PR 14562 at commit
What changes were proposed in this pull request?
The mutableAggBufferOffset and inputAggBufferOffset in ImperativeAggregate are really hard to understand and tightly coupled with the aggregation implementation. What's worse, all ImperativeAggregate implementations need to understand this concept and deal with it carefully.

This PR isolates the buffer offsets concept into the base class ImperativeAggregate by introducing a sliced row. It then moves the interface to ImperativeAggregateImpl, so that ImperativeAggregateImpl implementations don't need to care about the buffer offsets anymore.

How was this patch tested?
Existing tests.
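The sliced-row idea in the description can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `MutableLongRow`, `ArrayLongRow`, `SlicedLongRow`, its `target` method, and `SlicedRowDemo` are hypothetical, simplified types (long fields only) and are not the classes added by this PR. The sketch shows a view that translates local ordinals into positions in a shared buffer, so the aggregate logic never sees the offset.

```scala
// Simplified stand-in for a mutable InternalRow (long fields only).
trait MutableLongRow {
  def numFields: Int
  def getLong(ordinal: Int): Long
  def setLong(ordinal: Int, value: Long): Unit
}

final class ArrayLongRow(values: Array[Long]) extends MutableLongRow {
  def numFields: Int = values.length
  def getLong(ordinal: Int): Long = values(ordinal)
  def setLong(ordinal: Int, value: Long): Unit = { values(ordinal) = value }
}

// Hypothetical view over `numFields` fields of a target row, starting at `offset`.
final class SlicedLongRow(offset: Int, val numFields: Int) extends MutableLongRow {
  private var underlying: MutableLongRow = _

  // Points the view at a concrete row; the view itself stores no field data.
  def target(row: MutableLongRow): this.type = {
    require(offset + numFields <= row.numFields, "slice exceeds target row")
    underlying = row
    this
  }

  def getLong(ordinal: Int): Long = underlying.getLong(offset + ordinal)
  def setLong(ordinal: Int, value: Long): Unit =
    underlying.setLong(offset + ordinal, value)
}

object SlicedRowDemo {
  // Aggregate logic written against local ordinals only.
  def incrementCount(buffer: MutableLongRow): Unit =
    buffer.setLong(0, buffer.getLong(0) + 1L)

  def main(args: Array[String]): Unit = {
    // Field 0 of the shared buffer belongs to some other aggregate.
    val shared = new ArrayLongRow(Array(100L, 0L, 0L))
    val slice = new SlicedLongRow(offset = 1, numFields = 2).target(shared)
    incrementCount(slice)
    println(shared.getLong(1))
  }
}
```

Each field access goes through one extra call, which is the layer of indirection the reviewers ask to have benchmarked.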