DRILL-6594: Data batches for Project operator are not being split properly and exceed the maximum specified #1375

Closed
wants to merge 1 commit into from


bitblender commented Jul 12, 2018

This change fixes incorrect accounting in the case where a column is projected more than once.

ProjectMemoryManager was recording the input column names of varlen columns instead of their output column names. Since input names are unique, each column was counted only once, irrespective of the number of times it was projected.

E.g., select some_varchar_column_a as some_varchar_column_b, some_varchar_column_a as some_varchar_column_c....

In this case, if the input column name is used, the outputColumnSizes map in ProjectMemoryManager will have only one entry, i.e., some_varchar_column_a. ProjectMemoryManager should instead record some_varchar_column_b and some_varchar_column_c.
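
To make the accounting difference concrete, here is a minimal sketch. It is not the actual ProjectMemoryManager code: the class `ProjectionSizeSketch`, the `Projection` type, the method names, and the 50-byte column width are all hypothetical and chosen only for illustration. It shows how keying a size map by input name collapses repeated projections into one entry, while keying by output name keeps one entry per projected column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual ProjectMemoryManager implementation.
class ProjectionSizeSketch {

    /** One projected varlen column: its input name and its output alias (hypothetical type). */
    static final class Projection {
        final String inputName;
        final String outputName;

        Projection(String inputName, String outputName) {
            this.inputName = inputName;
            this.outputName = outputName;
        }
    }

    /** Keyed by input name: repeated projections of one column overwrite each other. */
    static Map<String, Integer> sizesByInputName(List<Projection> projections,
                                                 Map<String, Integer> inputWidths) {
        Map<String, Integer> outputColumnSizes = new LinkedHashMap<>();
        for (Projection p : projections) {
            outputColumnSizes.put(p.inputName, inputWidths.get(p.inputName));
        }
        return outputColumnSizes;
    }

    /** Keyed by output name: every projected column gets its own entry. */
    static Map<String, Integer> sizesByOutputName(List<Projection> projections,
                                                  Map<String, Integer> inputWidths) {
        Map<String, Integer> outputColumnSizes = new LinkedHashMap<>();
        for (Projection p : projections) {
            outputColumnSizes.put(p.outputName, inputWidths.get(p.inputName));
        }
        return outputColumnSizes;
    }

    /** Sums the recorded per-column widths into an estimated output row width. */
    static int estimatedRowWidth(Map<String, Integer> outputColumnSizes) {
        int total = 0;
        for (int width : outputColumnSizes.values()) {
            total += width;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed average width in bytes, for illustration only.
        Map<String, Integer> inputWidths = Map.of("some_varchar_column_a", 50);
        List<Projection> projections = List.of(
            new Projection("some_varchar_column_a", "some_varchar_column_b"),
            new Projection("some_varchar_column_a", "some_varchar_column_c"));

        // One entry (some_varchar_column_a), so the row width is underestimated.
        System.out.println(estimatedRowWidth(sizesByInputName(projections, inputWidths)));
        // Two entries (some_varchar_column_b, some_varchar_column_c).
        System.out.println(estimatedRowWidth(sizesByOutputName(projections, inputWidths)));
    }
}
```

Under these assumptions the input-keyed map yields a row width of 50 and the output-keyed map yields 100. An underestimated row width would plausibly let more rows into a batch than the size limit allows, which is consistent with the symptom named in the title.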

bitblender commented Jul 12, 2018

@Ben-Zvi @ppadma Can one of you please take a look at this? This is an important fix that should be in 1.14.

Ben-Zvi left a comment

+1. LGTM ....

@sohami sohami closed this in c644367 Jul 13, 2018