[SPARK-10917] [SQL] improve performance of complex type in columnar cache#8971
davies wants to merge 10 commits into apache:master
Test build #43203 has finished for PR 8971 at commit
Test build #43241 has finished for PR 8971 at commit
Test build #1843 has finished for PR 8971 at commit
sizeInBytes is not word-aligned (it should be a multiple of 8 bytes).
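For reference, rounding a byte count up to the next 8-byte word is a one-line bit trick. The class below is a hypothetical, standalone stand-in (WordAlign and its methods are not names from this PR), assuming an 8-byte word and non-negative sizes.

```java
// Hypothetical helper: round a byte count up to the next multiple of 8 (one 64-bit word).
final class WordAlign {
    static final int WORD_SIZE = 8; // must be a power of two for the mask trick below

    // Adds WORD_SIZE - 1, then clears the low 3 bits. Assumes sizeInBytes >= 0.
    static long roundUpToWord(long sizeInBytes) {
        return (sizeInBytes + WORD_SIZE - 1) & ~(long) (WORD_SIZE - 1);
    }

    // True when the size is already a whole number of words.
    static boolean isWordAligned(long sizeInBytes) {
        return (sizeInBytes & (WORD_SIZE - 1)) == 0;
    }
}
```

For example, a 13-byte payload would be padded to 16 bytes, while an 8-byte payload stays at 8.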
Test build #43249 has finished for PR 8971 at commit
@liancheng Could you help review this?
Test build #43253 has finished for PR 8971 at commit
Test build #43257 has finished for PR 8971 at commit
Conflicts:
    sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
    sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala
    sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnBuilder.scala
    sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala
    sql/core/src/main/scala/org/apache/spark/sql/columnar/compression/CompressionScheme.scala
    sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnTypeSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnarTestUtils.scala
    sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnAccessorSuite.scala
    sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnBuilderSuite.scala
Test build #43276 has finished for PR 8971 at commit
Test build #43281 has finished for PR 8971 at commit
Can we drop the length header for decimals, i.e. always write 16 bytes?
The BigInteger constructor needs to know how many bytes to read, so that would make things even more complicated. Also, most decimals take fewer than 8 bytes, even with a precision of 38.
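To make the trade-off concrete, here is a minimal sketch of a length-prefixed encoding, assuming the unscaled value is written with BigInteger.toByteArray() and the scale comes from the column's decimal type rather than being stored per value. LargeDecimalCodec and its methods are hypothetical names, not the code in this PR.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical encoding for a large decimal in a column buffer:
// [int length][unscaled value as big-endian two's-complement bytes].
// Assumes every value already has the column's scale.
final class LargeDecimalCodec {
    static void write(ByteBuffer buf, BigDecimal value) {
        byte[] bytes = value.unscaledValue().toByteArray(); // minimal two's-complement form
        buf.putInt(bytes.length);                          // length header
        buf.put(bytes);
    }

    static BigDecimal read(ByteBuffer buf, int scale) {
        int length = buf.getInt();        // the header says how many bytes follow
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // Bytes used by the length-prefixed form (compare with a fixed 16-byte slot).
    static int encodedSize(BigDecimal value) {
        return 4 + value.unscaledValue().toByteArray().length;
    }
}
```

With this encoding, 12345.67 (unscaled value 1234567) takes 4 + 3 = 7 bytes instead of a fixed 16, which illustrates the point that typical decimals are much smaller than their maximum width.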
This seems to be a bug worth a separate JIRA ticket.
LGTM except for a few minor issues.
For non-compact decimals we use ByteArrayColumnType, so maybe use ObjectColumnStats here?
No, we may want to keep min/max statistics for Decimal.
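The difference is that an ordered type can carry lower/upper bounds, which a scan can use to skip batches that cannot match a filter. The class below is a hypothetical sketch of such per-batch statistics for decimals (DecimalStats, gather and mayContain are illustrative names, not Spark's ColumnStats API).

```java
import java.math.BigDecimal;

// Hypothetical per-batch statistics for a decimal column: min/max plus counts.
final class DecimalStats {
    private BigDecimal lower = null;
    private BigDecimal upper = null;
    private int nullCount = 0;
    private int count = 0;

    // Called once per value as the batch is built; null means a SQL NULL.
    void gather(BigDecimal value) {
        count++;
        if (value == null) {
            nullCount++;
            return;
        }
        // compareTo ignores scale, so 1.0 and 1.00 compare as equal.
        if (lower == null || value.compareTo(lower) < 0) lower = value;
        if (upper == null || value.compareTo(upper) > 0) upper = value;
    }

    BigDecimal lowerBound() { return lower; }
    BigDecimal upperBound() { return upper; }
    int nullCount() { return nullCount; }
    int count() { return count; }

    // True if a filter "col = v" could match some non-null row in this batch.
    boolean mayContain(BigDecimal v) {
        return lower != null && v.compareTo(lower) >= 0 && v.compareTo(upper) <= 0;
    }
}
```

A generic object-stats collector has no ordering for the values it sees, so it could only track counts and sizes, not bounds like these.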
Test build #43329 has finished for PR 8971 at commit
Maybe LargeDecimalColumnAccessor, to be consistent with the renaming change?
Test build #43341 has finished for PR 8971 at commit
Merged into master. The remaining nit comments will be addressed in a follow-up PR, thanks!
This PR improves the performance of complex types in the columnar cache by using UnsafeProjection instead of KryoSerializer.
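One way to build intuition for the speedup: a flat binary value can be read by offset directly from the buffer, whereas a Kryo-serialized value has to be deserialized back into Java objects first. The sketch below uses a deliberately simplified, hypothetical layout for an array of ints ([numElements][e0][e1]...); it is not Spark's actual UnsafeArrayData format, and FlatIntArray is an illustrative name only.

```java
import java.nio.ByteBuffer;

// Hypothetical flat layout for an array<int> value: [numElements:int][e0:int][e1:int]...
// Illustrates offset-based access only; NOT Spark's real UnsafeArrayData layout.
final class FlatIntArray {
    static ByteBuffer encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * values.length);
        buf.putInt(values.length);
        for (int v : values) buf.putInt(v);
        buf.flip(); // ready for reading from position 0
        return buf;
    }

    static int numElements(ByteBuffer buf) {
        return buf.getInt(0); // absolute read; buffer position is unchanged
    }

    // Random access to element i with one absolute read; no objects are allocated.
    static int get(ByteBuffer buf, int i) {
        if (i < 0 || i >= numElements(buf)) throw new IndexOutOfBoundsException("index " + i);
        return buf.getInt(4 + 4 * i);
    }
}
```

The same bytes can also be copied into and out of a column buffer as one block, which is the kind of work a generic object serializer cannot avoid.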
A simple benchmark shows that this PR can make scanning a cached table with complex columns 15x faster (compared to Spark 1.5).
Here is the code used to benchmark: