Skip to content

[SPARK-2405][SQL] Reusue same byte buffers when creating new instance of InMemoryRelation#1332

Closed
marmbrus wants to merge 3 commits intoapache:masterfrom
marmbrus:doubleCache
Closed

[SPARK-2405][SQL] Reusue same byte buffers when creating new instance of InMemoryRelation#1332
marmbrus wants to merge 3 commits intoapache:masterfrom
marmbrus:doubleCache

Conversation

@marmbrus
Copy link
Contributor

@marmbrus marmbrus commented Jul 8, 2014

Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.

@aarondav
Copy link
Contributor

aarondav commented Jul 8, 2014

This loses the synchronization guarantees we had before -- without synchronized or volatile, there are no cross-thread guarantees. Is this OK?

(By the way, how is newInstance() used?)

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@marmbrus
Copy link
Contributor Author

marmbrus commented Jul 8, 2014

@aarondav, newInstance() is used before we perform resolution to ensure that that all expression ids in a plan are unique. Consider the case where you self join an InMemoryRelation with itself: we need to know which side of the join a given attribute is coming from, so we produce unique instances of the relation before resolving attributes.

I thought about the possible concurrency issues, but they will only arise in edge cases (simultaneous self-join queries on a table that is cached, but not yet materialized?), and will only result in double caching, not correctness issues... so this patch is strictly better than what we had before I think.

That said I guess we could fix it with a SyncVar probably... I'll have to think about it some more.

@aarondav
Copy link
Contributor

aarondav commented Jul 8, 2014

This type of concurrency problem can be a correctness issue, albeit with low probability. For instance, we could at any point read a not-fully-instantiated "_cachedColumnBuffers", which would not be null but still be in an invalid state, leading to a very tricky exception. This is a problem even without the byte buffer reuse -- use of the same InMemoryRelation in two different threads could now cause this issue.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16412/

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16420/

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16421/

@rxin
Copy link
Contributor

rxin commented Jul 12, 2014

Merging in master & branch-1.0.

asfgit pushed a commit that referenced this pull request Jul 12, 2014
… of InMemoryRelation

Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.

Author: Michael Armbrust <michael@databricks.com>

Closes #1332 from marmbrus/doubleCache and squashes the following commits:

4a19609 [Michael Armbrust] Clean up concurrency story by calculating buffersn the constructor.
b39c931 [Michael Armbrust] Allocations are kind of a side effect.
f67eff7 [Michael Armbrust] Reusue same byte buffers when creating new instance of InMemoryRelation

(cherry picked from commit 1a7d7cc)
Signed-off-by: Reynold Xin <rxin@apache.org>
@asfgit asfgit closed this in 1a7d7cc Jul 12, 2014
@marmbrus marmbrus deleted the doubleCache branch August 27, 2014 20:44
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
… of InMemoryRelation

Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.

Author: Michael Armbrust <michael@databricks.com>

Closes apache#1332 from marmbrus/doubleCache and squashes the following commits:

4a19609 [Michael Armbrust] Clean up concurrency story by calculating buffersn the constructor.
b39c931 [Michael Armbrust] Allocations are kind of a side effect.
f67eff7 [Michael Armbrust] Reusue same byte buffers when creating new instance of InMemoryRelation
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…pache#1332)

* rdar://88044325: [Boson] Expose Arrow Vector from ArrowColumnVector

* Remove unnecessary api

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants