-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11149] [SQL] Improve cache performance for primitive types #9145
Conversation
Test build #43824 has finished for PR 9145 at commit
|
Test build #43834 has finished for PR 9145 at commit
|
Test build #43836 has finished for PR 9145 at commit
|
@@ -44,11 +45,13 @@ private class CodeFormatter { | |||
} else { | |||
indentString | |||
} | |||
code.append(f"${currentLine}%03d ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline, we can add /* ... */
to still enable pasting this into an IDE.
|
||
protected def create(columnTypes: Seq[DataType]): ColumnarIterator = { | ||
val ctx = newCodeGenContext() | ||
val (creaters, accesses) = columnTypes.zipWithIndex.map { case (dt, index) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually probably more clear to say
initializeAccessors and extractors
Test build #43844 has finished for PR 9145 at commit
|
LGTM - although I didn't look super closely so might be good for an extra pair of eyes too. |
Test build #43936 has finished for PR 9145 at commit
|
I'm going to merge this first, unblock me to work on output UnsafeRow for columnar cache, because having Unsafe format inside a MutableRow could result unexpected behavior, any new comments will be address in follow up PR. |
} | ||
|
||
override def extract(buffer: ByteBuffer, row: MutableRow, ordinal: Int): Unit = { | ||
row.setDouble(ordinal, buffer.getDouble()) | ||
row.setDouble(ordinal, ByteBufferHelper.getDouble(buffer)) | ||
} | ||
|
||
override def setField(row: MutableRow, ordinal: Int, value: Double): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Around line 332, there is call to buffer.getShort()
Is it worth adding corresponding method to ByteBufferHelper ?
If so, I can send a PR.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does not worth it.
LGTM |
This PR improve the performance by:
Generate an Iterator that take Iterator[CachedBatch] as input, and call accessors (unroll the loop for columns), avoid the expensive Iterator.flatMap.
Use Unsafe.getInt/getLong/getFloat/getDouble instead of ByteBuffer.getInt/getLong/getFloat/getDouble, the later one actually read byte by byte.
Remove the unnecessary copy() in Coalesce(), which is not related to memory cache, found during benchmark.
The following benchmark showed that we can speedup the columnar cache of int by 2x.