[SPARK-12907][SQL] Use bit vectors to represent null fields for reducing memory footprint #10833

kiszk · 2016-01-19T18:58:24Z

Use BitSet for OnHeapColumnVector and BitSetMethod for OffHeapColumnVector

AmplabJenkins · 2016-01-19T19:18:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49691/

nongli · 2016-01-19T19:26:56Z

This was explicitly not done. See this benchmark:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala#L314

Memory footprint is not the biggest concern for this component. It's not clear to me this is a better approach.

Why should we do this?

kiszk · 2016-01-19T19:39:16Z

I see the OnHeap case now; I had overlooked it.
May I add a similar benchmark for OffHeap for reference? Or do you already have performance data for off-heap as well?

I thought that reducing the memory footprint would be good in general. Do you want to focus on write performance by avoiding the overhead of a bit vector?

kiszk · 2016-02-24T05:23:50Z

According to line 38 of spark/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java (at 3708d13):

    * is designed to maximize CPU efficiency and not storage footprint. Since it is expected that

this structure is designed for performance rather than memory footprint.
Closing this PR for now.

Use bit vectors to represent null fields for reducing memory footprint

373073c

Use BitSet for OnHeapColumnVector Use BitSetMethod for OffHeapColumnVector

kiszk closed this Feb 24, 2016
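
For readers following the thread, the trade-off under discussion can be sketched roughly as below. This is only an illustration under stated assumptions: `NullTrackingSketch`, `ByteNulls`, and `BitNulls` are hypothetical names, not Spark classes, and the code does not reproduce OnHeapColumnVector or the linked benchmark. It only contrasts a one-byte-per-row null flag with a bit-packed one built on `java.util.BitSet`.

```java
import java.util.BitSet;

// Hypothetical illustration only; these are not Spark's column vector classes.
final class NullTrackingSketch {

    /** One byte per row: larger payload, but a null check is a plain array load. */
    static final class ByteNulls {
        private final byte[] nulls;

        ByteNulls(int capacity) { nulls = new byte[capacity]; }

        void putNull(int rowId) { nulls[rowId] = 1; }

        boolean isNullAt(int rowId) { return nulls[rowId] == 1; }

        /** Payload size in bytes, ignoring object headers. */
        long footprintBytes() { return nulls.length; }
    }

    /** One bit per row: about 8x smaller payload, but each access shifts and masks. */
    static final class BitNulls {
        private final BitSet nulls;
        private final int capacity;

        BitNulls(int capacity) {
            this.capacity = capacity;
            this.nulls = new BitSet(capacity);
        }

        void putNull(int rowId) { nulls.set(rowId); }

        boolean isNullAt(int rowId) { return nulls.get(rowId); }

        /** Approximate payload: one 8-byte word per 64 rows, ignoring object headers. */
        long footprintBytes() { return ((capacity + 63) / 64) * 8L; }
    }
}
```

For a 4096-row batch, the byte layout in this sketch holds 4096 bytes of flags and the bit layout about 512. Whether the extra shift and mask per access matters in practice is what the benchmark linked in the review is meant to measure; the sketch itself makes no performance claim.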
