[SPARK-20474] Fixing OnHeapColumnVector reallocation #17773
ok to test |
Test build #76184 has finished for PR 17773 at commit
|
@@ -410,53 +410,53 @@ protected void reserveInternal(int newCapacity) {
       int[] newLengths = new int[newCapacity];
       int[] newOffsets = new int[newCapacity];
       if (this.arrayLengths != null) {
-        System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended);
-        System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended);
+        System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
Nice catch. Do we also need to fix `reserveInternal` in `OffHeapColumnVector`? Additionally, after this change, do we even need `elementsAppended` anymore?
`elementsAppended` is still necessary: the `append<TYPE>()` methods use it to keep track of the tail position.
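To illustrate the reply above, here is a minimal sketch of how an append-style API relies on a tail counter while a put-style API does not. `TailSketch` and its methods are hypothetical stand-ins written for this example, not Spark's `ColumnVector` code, and the fixed capacity of 8 is an arbitrary assumption.

```java
// Hypothetical, simplified vector; NOT Spark's ColumnVector.
class TailSketch {
    private final int[] data = new int[8];   // fixed capacity, assumed for brevity
    private int elementsAppended = 0;        // tail position, advanced only by appendInt

    // Random-access write: the caller picks the row, the tail is untouched.
    void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    // Sequential write: stores at the current tail, then advances it.
    // Returns the row the value was written to.
    int appendInt(int value) {
        int rowId = elementsAppended++;
        data[rowId] = value;
        return rowId;
    }

    int tail() {
        return elementsAppended;
    }

    public static void main(String[] args) {
        TailSketch v = new TailSketch();
        v.putInt(5, 99);                      // does not move the tail
        System.out.println(v.tail());         // 0
        System.out.println(v.appendInt(7));   // 0
        System.out.println(v.appendInt(8));   // 1
        System.out.println(v.tail());         // 2
    }
}
```

Without the counter, `appendInt` would have no way to know where the next value belongs, which is why the field can stay even though reallocation no longer uses it as the copy length.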
add to whitelist |
Test build #76191 has finished for PR 17773 at commit
|
Test build #76192 has finished for PR 17773 at commit
|
Merging in master/branch-2.2. |
## What changes were proposed in this pull request?

OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used.

## How was this patch tested?

Tested using existing unit tests.

Author: Michal Szafranski <michal@databricks.com>

Closes #17773 from michal-databricks/spark-20474.

(cherry picked from commit a277ae8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Do we need similar changes for |
Actually yes, I missed it because |
Yes, I think it should see |
## What changes were proposed in this pull request?

As #17773 revealed, `OnHeapColumnVector` may copy only a part of the original storage. `OffHeapColumnVector` reallocation likewise copies data to the new storage only up to `elementsAppended`. This variable is only updated when using the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used. This PR copies data to the new storage up to the previously allocated size in `OffHeapColumnVector`.

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #17811 from kiszk/SPARK-20537.

(cherry picked from commit afb21bf)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
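The off-heap variant of the problem can be sketched in the same spirit. The example below is an analogy only: it uses a direct `java.nio.ByteBuffer` as a stand-in for off-heap memory, whereas the actual `OffHeapColumnVector` manages raw memory through Spark's own utilities. `OffHeapSketch`, `grow`, and `valueAfterGrow` are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical analogy using a direct ByteBuffer; NOT Spark's OffHeapColumnVector.
class OffHeapSketch {
    // Allocates a larger buffer and copies the first bytesToCopy bytes of the old one.
    static ByteBuffer grow(ByteBuffer old, int bytesToCopy, int newSizeBytes) {
        ByteBuffer fresh = ByteBuffer.allocateDirect(newSizeBytes); // zero-initialized
        ByteBuffer src = old.duplicate();    // independent position/limit, shared content
        src.position(0);
        src.limit(bytesToCopy);
        fresh.put(src);                      // copies exactly bytesToCopy bytes
        fresh.clear();                       // reset position/limit; content is kept
        return fresh;
    }

    // Writes 42 at row 2 with a put-style call, grows from 4 to 8 ints,
    // and returns what row 2 holds afterwards.
    static int valueAfterGrow(boolean copyWholeBuffer) {
        int oldCapacity = 4;
        int elementsAppended = 0;            // nothing was appended, only put
        ByteBuffer buf = ByteBuffer.allocateDirect(oldCapacity * 4);
        buf.putInt(2 * 4, 42);               // absolute write at byte offset 8
        int bytesToCopy = (copyWholeBuffer ? oldCapacity : elementsAppended) * 4;
        ByteBuffer grown = grow(buf, bytesToCopy, 8 * 4);
        return grown.getInt(2 * 4);
    }

    public static void main(String[] args) {
        System.out.println(valueAfterGrow(false)); // 0: value lost
        System.out.println(valueAfterGrow(true));  // 42: value preserved
    }
}
```

Copying by the old allocation size rather than by the append counter is the design choice the PR describes: the buffer cannot know which rows were filled through put-style calls, so the whole previously allocated region has to be carried over.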
What changes were proposed in this pull request?
OnHeapColumnVector reallocation copies data to the new storage only up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used, so values written with putX beyond that position are not carried over when the vector grows.
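The behaviour described above can be reproduced with a small, self-contained sketch. `SketchIntVector` and `ReallocationDemo` are hypothetical classes written for this illustration; they mimic only the copy-length decision and are not the real `OnHeapColumnVector`. The boolean flag that switches between the old and new copy length is likewise an assumption made so both behaviours can be compared side by side.

```java
// Hypothetical, simplified on-heap vector; NOT Spark's OnHeapColumnVector.
class SketchIntVector {
    private int[] data;
    private int capacity;
    private int elementsAppended = 0;        // advanced only by appendInt

    SketchIntVector(int capacity) {
        this.capacity = capacity;
        this.data = new int[capacity];
    }

    // Random-access write: leaves elementsAppended unchanged.
    void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    // Sequential write: grows when full, then stores at the tail.
    void appendInt(int value) {
        if (elementsAppended == capacity) {
            reserve(Math.max(1, capacity * 2), true);
        }
        data[elementsAppended++] = value;
    }

    int getInt(int rowId) {
        return data[rowId];
    }

    // copyWholeBuffer == false mimics the pre-fix copy length (elementsAppended);
    // copyWholeBuffer == true mimics the fixed copy length (old capacity).
    void reserve(int newCapacity, boolean copyWholeBuffer) {
        if (newCapacity <= capacity) {
            return;
        }
        int[] newData = new int[newCapacity];
        int toCopy = copyWholeBuffer ? capacity : elementsAppended;
        System.arraycopy(data, 0, newData, 0, toCopy);
        data = newData;
        capacity = newCapacity;
    }
}

class ReallocationDemo {
    // Puts 42 at row 2, grows the vector from 4 to 8 rows, and reads row 2 back.
    static int valueAfterGrow(boolean copyWholeBuffer) {
        SketchIntVector v = new SketchIntVector(4);
        v.putInt(2, 42);
        v.reserve(8, copyWholeBuffer);
        return v.getInt(2);
    }

    public static void main(String[] args) {
        System.out.println("copy up to elementsAppended: " + valueAfterGrow(false)); // 0
        System.out.println("copy up to capacity: " + valueAfterGrow(true));          // 42
    }
}
```

Because nothing was appended, the old copy length is zero and the value written with `putInt` is dropped; copying the full old capacity keeps it.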
How was this patch tested?
Tested using existing unit tests.