Skip to content

[SPARK-10917] [SQL] improve performance of complex type in columnar cache#8971

Closed
davies wants to merge 10 commits intoapache:masterfrom
davies:complex
Closed

[SPARK-10917] [SQL] improve performance of complex type in columnar cache#8971
davies wants to merge 10 commits intoapache:masterfrom
davies:complex

Conversation

@davies
Copy link
Contributor

@davies davies commented Oct 3, 2015

This PR improve the performance of complex types in columnar cache by using UnsafeProjection instead of KryoSerializer.

A simple benchmark show that this PR could improve the performance of scanning a cached table with complex columns by 15x (comparing to Spark 1.5).

Here is the code used to benchmark:

df = sc.range(1<<23).map(lambda i: Row(a=Row(b=i, c=str(i)), d=range(10), e=dict(zip(range(10), [str(i) for i in range(10)])))).toDF()
df.write.parquet("table")
df = sqlContext.read.parquet("table")
df.cache()
df.count()
t = time.time()
print df.select("*")._jdf.queryExecution().toRdd().count()
print time.time() - t

@SparkQA
Copy link

SparkQA commented Oct 3, 2015

Test build #43203 has finished for PR 8971 at commit 713c884.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2015

Test build #43241 has finished for PR 8971 at commit 59bb2f9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2015

Test build #1843 has finished for PR 8971 at commit 59bb2f9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sizeInBytes is not aligned to words.

@SparkQA
Copy link

SparkQA commented Oct 5, 2015

Test build #43249 has finished for PR 8971 at commit f9a3502.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class BufferHolder
    • public class UnsafeArrayWriter
    • public class UnsafeRowWriter

@davies
Copy link
Contributor Author

davies commented Oct 5, 2015

@liancheng Could you help to review this?

@SparkQA
Copy link

SparkQA commented Oct 5, 2015

Test build #43253 has finished for PR 8971 at commit 3ad1e9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class BufferHolder
    • public class UnsafeArrayWriter
    • public class UnsafeRowWriter

@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43257 has finished for PR 8971 at commit 3b9f59e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
	sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala
	sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnBuilder.scala
	sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala
	sql/core/src/main/scala/org/apache/spark/sql/columnar/compression/CompressionScheme.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnTypeSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnarTestUtils.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnAccessorSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnBuilderSuite.scala
@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43276 has finished for PR 8971 at commit 54479f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43281 has finished for PR 8971 at commit d0be9e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we save the length header for decimal? i.e. always write 16 bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor of BigInteger need to how the number of bytes, it will become even complicated. And most of Decimal will be smaller than 8 bytes, even with precision as 38.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a bug that worths a separate JIRA ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liancheng
Copy link
Contributor

LGTM except for a few minor issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-compact decimal, we use ByteArrayColumnType, so maybe use ObjectColumnStats here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we may want to have min/max of Decimal

@SparkQA
Copy link

SparkQA commented Oct 7, 2015

Test build #43329 has finished for PR 8971 at commit 9c5718d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe LargeDecimalColumnAccessor according to the renaming change?

@SparkQA
Copy link

SparkQA commented Oct 7, 2015

Test build #43341 has finished for PR 8971 at commit 4f0a94e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Oct 7, 2015

Merged into master. Other nit comments will be addressed by follow up PR, thanks!

@asfgit asfgit closed this in 075a0b6 Oct 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants