DRILL-6594: Data batches for Project operator are not being split properly and exceed the maximum specified #1375
This change fixes the incorrect size accounting in the case where a column is projected more than once.
ProjectMemoryManager was recording the input column names of varlen columns instead of their output column names. Since input names are unique, each such column was counted only once, irrespective of the number of times it was projected.
E.g. select some_varchar_column_a as some_varchar_column_b, some_varchar_column_a as some_varchar_column_c....
In this case, if the input column name is used, the outputColumnSizes map in ProjectMemoryManager will have only one entry, i.e. some_varchar_column_a. ProjectMemoryManager should instead record some_varchar_column_b and some_varchar_column_c.
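
The effect of the map key on the accounting can be illustrated with a minimal sketch. This is not the actual ProjectMemoryManager code: the class ProjectSizeAccountingSketch, the Projection type, the estimateRowWidth method, and the 50-byte column width are all hypothetical and exist only to show why keying by input name undercounts.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Drill's ProjectMemoryManager implementation.
class ProjectSizeAccountingSketch {

    // One projected expression: source column, output alias, assumed width per row.
    static final class Projection {
        final String inputName;
        final String outputName;
        final int bytesPerRow;

        Projection(String inputName, String outputName, int bytesPerRow) {
            this.inputName = inputName;
            this.outputName = outputName;
            this.bytesPerRow = bytesPerRow;
        }
    }

    // Sums the per-row width of all entries in a size map keyed either by
    // input name (the behaviour described as buggy) or by output name (the fix).
    static int estimateRowWidth(List<Projection> projections, boolean keyByOutputName) {
        Map<String, Integer> outputColumnSizes = new LinkedHashMap<>();
        for (Projection p : projections) {
            String key = keyByOutputName ? p.outputName : p.inputName;
            // With input names as keys, the second projection of the same
            // column overwrites the first entry instead of adding a new one.
            outputColumnSizes.put(key, p.bytesPerRow);
        }
        int total = 0;
        for (int size : outputColumnSizes.values()) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed width of 50 bytes per row for the varchar column.
        List<Projection> projections = Arrays.asList(
            new Projection("some_varchar_column_a", "some_varchar_column_b", 50),
            new Projection("some_varchar_column_a", "some_varchar_column_c", 50));

        System.out.println(estimateRowWidth(projections, false)); // prints 50
        System.out.println(estimateRowWidth(projections, true));  // prints 100
    }
}
```

With input names as keys the estimated row width is half of what the two output columns actually occupy, so a batch sized from that estimate can exceed the configured maximum.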