[SPARK-2405][SQL] Reuse same byte buffers when creating new instance of InMemoryRelation #1332
marmbrus wants to merge 3 commits into apache:master
|
This loses the synchronization guarantees we had before -- without synchronized or volatile, there are no cross-thread guarantees. Is this OK? (By the way, how is newInstance() used?) |
|
Merged build triggered. |
|
Merged build started. |
|
@aarondav, I thought about the possible concurrency issues, but they will only arise in edge cases (simultaneous self-join queries on a table that is cached, but not yet materialized?), and will only result in double caching, not correctness issues... so I think this patch is strictly better than what we had before. That said, we could probably fix it with a SyncVar... I'll have to think about it some more. |
|
This type of concurrency problem can be a correctness issue, albeit with low probability. For instance, we could at any point read a not-fully-instantiated "_cachedColumnBuffers", which would not be null but still be in an invalid state, leading to a very tricky exception. This is a problem even without the byte buffer reuse -- use of the same InMemoryRelation in two different threads could now cause this issue. |
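
To make the concern concrete, here is a minimal sketch of the hazard and one possible remedy. It is not the code from this PR: `Buffers`, `RacyRelation`, `SafeRelation`, and `buildsUnderContention` are hypothetical names, and the `lazy val` shown is a swapped-in alternative (the commit list for this PR says the final version calculates the buffers in the constructor instead).

```scala
import java.util.concurrent.atomic.AtomicInteger

object CachedBuffersSketch {
  // Hypothetical stand-in for the cached column data; not Spark's actual type.
  final class Buffers(val columns: Vector[Array[Byte]])

  // Racy: a plain var filled on first use, with no synchronized or @volatile.
  // Two threads may both call build(), and a reader has no visibility guarantee.
  class RacyRelation(build: () => Buffers) {
    private var _cachedColumnBuffers: Buffers = null
    def cachedColumnBuffers: Buffers = {
      if (_cachedColumnBuffers == null) _cachedColumnBuffers = build()
      _cachedColumnBuffers
    }
  }

  // Safer: a lazy val is initialized at most once, under a lock, and the
  // result is safely published to every thread that reads it.
  class SafeRelation(build: () => Buffers) {
    lazy val cachedColumnBuffers: Buffers = build()
  }

  // Starts `threads` concurrent readers of one SafeRelation and returns
  // how many times the (hypothetical) build function actually ran.
  def buildsUnderContention(threads: Int): Int = {
    val builds = new AtomicInteger(0)
    val relation = new SafeRelation(() => {
      builds.incrementAndGet()
      new Buffers(Vector(Array[Byte](1, 2, 3)))
    })
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = { relation.cachedColumnBuffers; () }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    builds.get()
  }

  def main(args: Array[String]): Unit =
    println(s"builds = ${buildsUnderContention(8)}")
}
```

With `RacyRelation` the same experiment could build the buffers more than once, which is the "double caching" case mentioned above. The visibility problem is harder to demonstrate because it depends on the JVM and hardware.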
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16412/ |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16420/ |
|
Merged build finished. All automated tests passed. |
|
Merging in master & branch-1.0. |
… of InMemoryRelation

Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.

Author: Michael Armbrust <michael@databricks.com>

Closes #1332 from marmbrus/doubleCache and squashes the following commits:

4a19609 [Michael Armbrust] Clean up concurrency story by calculating buffers in the constructor.
b39c931 [Michael Armbrust] Allocations are kind of a side effect.
f67eff7 [Michael Armbrust] Reuse same byte buffers when creating new instance of InMemoryRelation

(cherry picked from commit 1a7d7cc)
Signed-off-by: Reynold Xin <rxin@apache.org>
Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.
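
As a rough illustration of the idea in the description, the sketch below models a relation whose `newInstance()` hands out fresh attribute ids while keeping a reference to the same cached buffers. It is a simplified, hypothetical model: `Attribute`, `Attribute.fresh`, and `InMemoryRelationSketch` are names made up for this example and do not reproduce Spark's actual classes or signatures.

```scala
import java.util.concurrent.atomic.AtomicLong

object BufferReuseSketch {
  // Hypothetical attribute: a column name plus a unique expression id.
  final case class Attribute(name: String, exprId: Long)

  object Attribute {
    private val nextId = new AtomicLong(0)
    // Returns an attribute with the given name and a never-before-used id.
    def fresh(name: String): Attribute = Attribute(name, nextId.getAndIncrement())
  }

  // Hypothetical relation: buffers are passed in, so every instance created
  // from this one can share them instead of caching the data again.
  final class InMemoryRelationSketch(
      val output: Seq[Attribute],
      val cachedColumnBuffers: Vector[Array[Byte]]) {

    // New attribute ids (needed when the relation appears twice in one plan,
    // e.g. a self-join), but the very same buffer reference: no copy is made.
    def newInstance(): InMemoryRelationSketch =
      new InMemoryRelationSketch(
        output.map(a => Attribute.fresh(a.name)),
        cachedColumnBuffers)
  }

  def main(args: Array[String]): Unit = {
    val original = new InMemoryRelationSketch(
      Seq(Attribute.fresh("key"), Attribute.fresh("value")),
      Vector(Array[Byte](1, 2), Array[Byte](3, 4)))
    val copy = original.newInstance()
    println(copy.cachedColumnBuffers eq original.cachedColumnBuffers)
  }
}
```

Because the buffers are a constructor argument rather than something each instance computes lazily, a self-join over a cached table in this model refers to one set of buffers from both sides of the join.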