Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-672: FIX ORC type conversion within arrays where array.length > 1… #568

Merged
merged 5 commits into from
Nov 9, 2020

Conversation

pgaref
Copy link
Contributor

@pgaref pgaref commented Nov 6, 2020

What changes were proposed in this pull request?

Orc type conversion is throwing ArrayIndexOutOfBoundsException when using ColumnVectors larger than 1024.
The issue is originating from the ConvertTreeReaderFactory where we convert types always using the default CV length instead of the specific batch size.

Why are the changes needed?

This PR takes into account batchSize when converting types and adds tests for all possible type conversions (with large batchSize) as part of TestConvertTreeReaderFactory.

How was this patch tested?

TestConvertTreeReaderFactory

Panos Garefalakis added 2 commits November 6, 2020 16:22
…024 (the default CV size)

Change-Id: I26426e6146ec05cbc1f96d79ec7960494cbc3669
Change-Id: Id73642d6b398554091cf9230e2870feb1b11cb64
@dongjoon-hyun
Copy link
Member

cc @omalley , @wgtmac

@dongjoon-hyun
Copy link
Member

Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?

@pgaref
Copy link
Contributor Author

pgaref commented Nov 6, 2020

Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?

Hey @dongjoon-hyun -- thats correct.
This has to be backported along with ORC-598 (which for some reason never made it to 1.5 and 1.6 branches)

@dongjoon-hyun
Copy link
Member

Got it. Thanks, @pgaref .

@dongjoon-hyun
Copy link
Member

I backported ORC-598 to branch-1.6/1.5 for next releases.

Change-Id: Id739b2f01a465d641e72eafba4f04576cb7c44ca
Nit
Change-Id: If2db5f3cbe20c3886ad0e3962d11fcf6dc8c4a71
@@ -1396,7 +1396,7 @@ public void nextVector(ColumnVector previousVector,
FilterContext filterContext) throws IOException {
if (inBytesColVector == null) {
// Allocate column vector for file; cast column vector for reader.
inBytesColVector = new BytesColumnVector();
inBytesColVector = new BytesColumnVector(batchSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgaref . Could you add a test coverage for this change? It seems that we are missing Varchar to Binary conversion test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @dongjoon-hyun ! test added

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (except one test coverage).

Change-Id: I057613382422270e6dcd540ae07d89b0e4be6f30
@dongjoon-hyun dongjoon-hyun merged commit ec644f8 into apache:master Nov 9, 2020
dongjoon-hyun pushed a commit that referenced this pull request Nov 9, 2020
…024 (#568)

### What changes were proposed in this pull request?

Orc type conversion is throwing ArrayIndexOutOfBoundsException when using ColumnVectors larger than 1024.
The issue is originating from the ConvertTreeReaderFactory where we convert types always using the default CV length instead of the specific batch size.

### Why are the changes needed?

This PR takes into account batchSize when converting types and adds tests for all possible type conversions (with large batchSize) as part of TestConvertTreeReaderFactory.

### How was this patch tested?

TestConvertTreeReaderFactory

(cherry picked from commit ec644f8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

Merged to master/1.6.

Could you make a backporting PR to branch-1.5, too?

pgaref pushed a commit to pgaref/orc that referenced this pull request Nov 9, 2020
…024 (apache#568)

### What changes were proposed in this pull request?

Orc type conversion is throwing ArrayIndexOutOfBoundsException when using ColumnVectors larger than 1024.
The issue is originating from the ConvertTreeReaderFactory where we convert types always using the default CV length instead of the specific batch size.

### Why are the changes needed?

This PR takes into account batchSize when converting types and adds tests for all possible type conversions (with large batchSize) as part of TestConvertTreeReaderFactory.

### How was this patch tested?

TestConvertTreeReaderFactory

(cherry picked from commit ec644f8)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@pgaref
Copy link
Contributor Author

pgaref commented Nov 9, 2020

Merged to master/1.6.

Could you make a backporting PR to branch-1.5, too?

@dongjoon-hyun Thanks for taking care of this -- 1.5 backport available at #569

@dongjoon-hyun
Copy link
Member

Thank you always!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants