[SPARK-13057][SQL] Add benchmark codes and the performance results for implemented compression schemes for InMemoryRelation #10965

maropu · 2016-01-28T05:55:42Z

This PR adds benchmark code for the in-memory cache compression schemes, to make future development and discussion smoother.

SparkQA · 2016-01-28T07:21:45Z

Test build #50254 has finished for PR 10965 at commit b3bf70c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-02-02T08:11:50Z

Test build #50546 has finished for PR 10965 at commit cc58f20.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2016-02-05T06:52:07Z

The in-memory columnar cache is much larger than the same data stored as Parquet on disk, because the compression schemes Spark implements in compressionSchemes are simpler than the ones Parquet uses (see the discussion on the spark-user mailing list: https://www.mail-archive.com/user@spark.apache.org/msg45241.html).

Since spark-sql already depends on parquet-column, we could reuse Parquet's more efficient compression algorithms to reduce GC pressure.
Thoughts?

maropu · 2016-02-05T07:25:13Z

I tried DeltaBinaryPackingValuesReader and DeltaBinaryPackingValuesWriter from the parquet-column package.
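For readers unfamiliar with delta encoding, here is a minimal Python sketch of a byte-wide delta scheme with an escape marker. It is an illustration only, not the Spark or Parquet implementation: the function names `delta_encode` and `delta_decode` are hypothetical, and the layout (one signed byte per small delta, a -128 marker followed by the full 4-byte value otherwise) is an assumption chosen to show how roughly one byte per 4-byte integer could be reached on slowly changing data. Parquet's delta binary packing is block-based and bit-packed, which this sketch does not attempt to reproduce.

```python
import struct

ESCAPE = -128  # marker byte: the next 4 bytes hold a full 32-bit value


def delta_encode(values):
    """Encode 32-bit ints as 1-byte deltas, escaping deltas that do not fit."""
    out = bytearray()
    prev = 0
    for i, v in enumerate(values):
        delta = v - prev
        # The first value is always written in full; -128 is reserved as marker.
        if i > 0 and ESCAPE < delta <= 127:
            out += struct.pack("b", delta)
        else:
            out += struct.pack("b", ESCAPE)
            out += struct.pack(">i", v)
        prev = v
    return bytes(out)


def delta_decode(buf):
    """Inverse of delta_encode."""
    values = []
    prev = 0
    pos = 0
    while pos < len(buf):
        (b,) = struct.unpack_from("b", buf, pos)
        pos += 1
        if b == ESCAPE:
            (prev,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        else:
            prev += b
        values.append(prev)
    return values


# Usage: 1000 consecutive ints take 4000 bytes raw and 1004 bytes encoded.
vals = list(range(1000, 2000))
encoded = delta_encode(vals)
ratio = len(encoded) / (4 * len(vals))
print(f"{ratio:.3f}")  # 0.251
```

The escape marker keeps the scheme lossless when a delta falls outside the one-byte range, at the cost of 5 bytes for that value.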

Benchmark
Running benchmark: INT Decode(Lower Skew)
  Running case: PassThrough(1.000)
  Running case: RunLengthEncoding(1.005)
  Running case: DictionaryEncoding(0.500)
  Running case: IntDelta(0.250)
  Running case: ParquetIntDelta(0.072)

Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
INT Decode(Lower Skew):            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
PassThrough(1.000)                       278.04           241.36         1.00 X
RunLengthEncoding(1.005)                 741.29            90.53         0.38 X
DictionaryEncoding(0.500)                954.02            70.34         0.29 X
IntDelta(0.250)                          836.38            80.24         0.33 X
ParquetIntDelta(0.072)                   839.36            79.95         0.33 X

Running benchmark: INT Decode(Higher Skew)
  Running case: PassThrough(1.000)
  Running case: RunLengthEncoding(1.339)
  Running case: DictionaryEncoding(0.501)
  Running case: IntDelta(0.250)
  Running case: ParquetIntDelta(0.161)

Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
INT Decode(Higher Skew):           Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
PassThrough(1.000)                       899.44            74.61         1.00 X
RunLengthEncoding(1.339)                1192.31            56.28         0.75 X
DictionaryEncoding(0.501)                981.44            68.38         0.92 X
IntDelta(0.250)                          874.43            76.75         1.03 X
ParquetIntDelta(0.161)                   937.08            71.61         0.96 X
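As a reading aid for the tables above, the Relative Rate column appears to be the baseline (PassThrough) average time divided by each case's average time. The Python sketch below recomputes it from the reported Avg Time(ms) values under that assumption; the helper name `relative_rates` is hypothetical and is not part of the benchmark code. The number in parentheses after each case name is presumably the compression ratio (compressed size over raw size), which would mean ParquetIntDelta compresses much better than IntDelta (0.072 vs. 0.250 at lower skew) at a similar decode cost.

```python
def relative_rates(avg_times_ms):
    """Return baseline_time / case_time per case; the first entry is the baseline."""
    baseline = next(iter(avg_times_ms.values()))
    return {name: round(baseline / t, 2) for name, t in avg_times_ms.items()}


# Avg Time(ms) values copied from the two tables above.
lower_skew = {
    "PassThrough": 278.04,
    "RunLengthEncoding": 741.29,
    "DictionaryEncoding": 954.02,
    "IntDelta": 836.38,
    "ParquetIntDelta": 839.36,
}
higher_skew = {
    "PassThrough": 899.44,
    "RunLengthEncoding": 1192.31,
    "DictionaryEncoding": 981.44,
    "IntDelta": 874.43,
    "ParquetIntDelta": 937.08,
}

for label, table in (("lower", lower_skew), ("higher", higher_skew)):
    for name, rate in relative_rates(table).items():
        print(f"{label:6s} {name:20s} {rate:.2f} X")
```

The recomputed values match the reported column, for example 278.04 / 741.29 ≈ 0.38 for RunLengthEncoding at lower skew.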

maropu · 2016-02-05T07:29:00Z

@rxin Could you comment on this?

rxin · 2016-02-05T07:46:01Z

cc @nongli is this useful?

maropu · 2016-02-07T16:53:17Z

@nongli ping

cloud-fan · 2016-02-08T03:11:17Z

The benchmark infra has been updated; I think we need to rerun it and update the results.

nongli · 2016-02-08T19:00:26Z

The benchmark LGTM and I think this is useful.

@maropu Before you make significant changes to this, can you write up what you plan to do?

maropu · 2016-02-09T09:42:00Z

@nongli Okay, I'll share the plan first. Please give me some time to look at similar code in Parquet.

SparkQA · 2016-02-09T17:18:40Z

Test build #50979 has finished for PR 10965 at commit fab3fb2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-02-10T21:33:57Z

Thanks - I've merged this in master.

maropu added 2 commits February 10, 2016 00:41

Add benchmark codes for implemented compression schemes

63373e0

Remove dollar signs for benchmark names

fab3fb2

maropu force-pushed the ImproveColumnarCache branch from cc58f20 to fab3fb2 February 9, 2016 15:42

asfgit closed this in 5947fa8 Feb 10, 2016

maropu mentioned this pull request Feb 17, 2016

[SPARK-13361][SQL] Add benchmark codes for Encoder#compress() in CompressionSchemeBenchmark #11236

Closed

maropu deleted the ImproveColumnarCache branch July 5, 2017 11:47
