@kiszk
Member

@kiszk kiszk commented Jan 19, 2016

Use BitSet for OnHeapColumnVector and BitSetMethod for OffHeapColumnVector

Use BitSet for OnHeapColumnVector

Use BitSetMethod for OffHeapColumnVector
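
For context, the change proposed here amounts to replacing a byte-per-row null marker with a bit-packed one. The following is only a minimal sketch of that idea, not the code in this PR: the class names `ByteNulls`, `BitNulls`, and `NullTrackingDemo` are hypothetical and are not Spark's actual classes.

```java
// Hypothetical illustration only; these are not Spark's actual classes.

/** One byte per row: marking a null is a single plain store. */
final class ByteNulls {
    private final byte[] nulls;

    ByteNulls(int capacity) {
        nulls = new byte[capacity];
    }

    void putNull(int rowId) {
        nulls[rowId] = 1;
    }

    boolean isNullAt(int rowId) {
        return nulls[rowId] == 1;
    }
}

/** One bit per row: marking a null needs a shift, a mask, and a read-modify-write. */
final class BitNulls {
    private final long[] words;

    BitNulls(int capacity) {
        words = new long[(capacity + 63) >>> 6]; // 64 rows per long word
    }

    void putNull(int rowId) {
        words[rowId >>> 6] |= 1L << (rowId & 63);
    }

    boolean isNullAt(int rowId) {
        return (words[rowId >>> 6] & (1L << (rowId & 63))) != 0;
    }
}

final class NullTrackingDemo {
    public static void main(String[] args) {
        ByteNulls byRow = new ByteNulls(128);
        BitNulls byBit = new BitNulls(128);
        byRow.putNull(70);
        byBit.putNull(70);
        // Both representations agree on which rows are null.
        System.out.println(byRow.isNullAt(70) + " " + byBit.isNullAt(70));
        System.out.println(byRow.isNullAt(71) + " " + byBit.isNullAt(71));
    }
}
```

The bit-packed variant uses roughly one eighth of the memory, but each write touches a shared word rather than an independent byte, which is where any extra per-row cost would come from.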
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49691/

@nongli
Contributor

nongli commented Jan 19, 2016

This was explicitly not done. See this benchmark:
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala#L314

Memory footprint is not the biggest concern for this component. It's not clear to me this is a better approach.

Why should we do this?

@kiszk
Member Author

kiszk commented Jan 19, 2016

I see the benchmark for the OnHeap case. I overlooked it.
May I add a similar benchmark for the OffHeap case, for reference? Or do you already have performance data for off-heap as well?

I thought that reducing the memory footprint would be good in general. Do you want to focus on write performance by eliminating the overhead of using a bit vector?
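
To put the footprint side of this trade-off in numbers, here is a rough sketch of the arithmetic. It assumes one byte per row for the byte-array approach and 64 rows per `long` word for the bit set, and it ignores array headers. The class name `NullFootprint`, its method names, and the batch size of 4096 rows are hypothetical choices for illustration.

```java
// Hypothetical helper for illustration; not part of Spark.
final class NullFootprint {
    /** Bytes needed when each row gets its own byte. */
    static long byteArrayBytes(long rows) {
        return rows;
    }

    /** Bytes needed when rows are packed 64 per long word (8 bytes each). */
    static long bitSetBytes(long rows) {
        return ((rows + 63) / 64) * 8;
    }

    public static void main(String[] args) {
        long rows = 4096; // assumed batch size, chosen only for the example
        System.out.println(byteArrayBytes(rows)); // prints 4096
        System.out.println(bitSetBytes(rows));    // prints 512
    }
}
```

The saving is a factor of eight on the null markers only, so it is small in absolute terms for a single batch of this size.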

@kiszk
Member Author

kiszk commented Feb 24, 2016

According to the code comment

 * ... is designed to maximize CPU efficiency and not storage footprint. Since it is expected that ...

this structure is designed for performance rather than memory footprint.
I am closing this PR for now.

@kiszk kiszk closed this Feb 24, 2016