The Gandiva filter module returns a selection vector representing the indices of records (in the batch) that matched the filter. We can connect this to other modules by passing the selection vector as an input argument to the downstream projector/filter.
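To make the proposed chaining concrete, here is a minimal sketch in plain Python. It does not use the Gandiva API: `filter_batch` and `project_batch` are hypothetical stand-ins for a filter and a projector, and a Python list stands in for a record batch column. The only idea it illustrates is that a downstream module visits just the rows named in the selection vector it receives.

```python
from typing import Callable, List, Optional, Sequence


def filter_batch(
    column: Sequence[int],
    predicate: Callable[[int], bool],
    selection: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return the indices of rows that satisfy the predicate.

    If an upstream selection vector is given, only those rows are tested,
    so the result is always a subset of the incoming selection.
    """
    candidates = range(len(column)) if selection is None else selection
    return [i for i in candidates if predicate(column[i])]


def project_batch(
    column: Sequence[int],
    expr: Callable[[int], int],
    selection: Optional[Sequence[int]] = None,
) -> List[int]:
    """Evaluate expr only on the selected rows (all rows if no selection)."""
    candidates = range(len(column)) if selection is None else selection
    return [expr(column[i]) for i in candidates]


# Hypothetical data: one integer column of a six-row batch.
prices = [5, 12, 7, 30, 18, 2]

# First filter: rows where the value is greater than 6.
selected = filter_batch(prices, lambda v: v > 6)

# Second filter consumes the first filter's selection vector.
narrowed = filter_batch(prices, lambda v: v < 20, selected)

# The projector only evaluates the rows that survived both filters.
doubled = project_batch(prices, lambda v: v * 2, narrowed)
```

With this data, `selected` is `[1, 2, 3, 4]`, `narrowed` is `[1, 2, 4]`, and `doubled` is `[24, 14, 36]`. The indices always refer to positions in the original batch, which is what lets the vector be handed from one module to the next without copying the data.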
Francois Saint-Jacques / @fsaintjacques:
I'm curious to know why Gandiva makes primary use of selection vectors as opposed to bitmaps, as are used in Arrow.
Wes McKinney / @wesm:
Selection integer vectors and boolean vectors are complementary; both are frequently used in pandas, and integer vectors are useful since you don't have to scan the boolean vector in order to determine the size of a filtered array. Others can comment further.
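A small plain-Python sketch of the two representations may help here. The helper names `mask_to_indices` and `indices_to_mask` are made up for illustration and are not pandas or Arrow functions. It shows that the two forms carry the same information, and that the filtered size is just the length of the index vector, whereas the boolean form has to be scanned to count matches.

```python
from typing import List, Sequence


def mask_to_indices(mask: Sequence[bool]) -> List[int]:
    """Convert a boolean selection into an integer selection vector."""
    return [i for i, keep in enumerate(mask) if keep]


def indices_to_mask(indices: Sequence[int], num_rows: int) -> List[bool]:
    """Convert an integer selection vector back into a boolean selection."""
    mask = [False] * num_rows
    for i in indices:
        mask[i] = True
    return mask


mask = [False, True, True, False, True, False]
indices = mask_to_indices(mask)

# Size of the filtered result: a stored length for the index vector ...
count_from_indices = len(indices)
# ... versus a full pass over every row for the boolean vector.
count_from_mask = sum(mask)

# The round trip loses nothing, so the forms are interchangeable.
round_trip = indices_to_mask(indices, len(mask))
```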
Pindikura Ravindra / @pravindra:
I picked up the idea of using selection vectors from Dremio. Iterating over a selection vector should be more efficient than iterating over the bits in a bitmap, especially when the selectivity is low. But I haven't benchmarked this.
Wes McKinney / @wesm:
Yes, for low selectivity selections it's also a lot more memory efficient.
Either way, I see this as part of the "algebra" of the compiler. So at some point we can probably augment the algebra and code generation with boolean selections.
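As a rough back-of-the-envelope check on the memory point, the sketch below compares the two encodings for one batch. It assumes 16-bit indices for the selection vector and one bit per row for the bitmap, and it ignores buffer padding and alignment; the batch size of 4096 rows and the helper names are illustrative choices, not values taken from this thread.

```python
def bitmap_bytes(num_rows: int) -> int:
    """Bytes needed for a bitmap with one bit per row in the batch."""
    return (num_rows + 7) // 8


def selection_vector_bytes(num_selected: int, index_width_bytes: int = 2) -> int:
    """Bytes needed to store one index per selected row."""
    return num_selected * index_width_bytes


def break_even_rows(num_rows: int, index_width_bytes: int = 2) -> int:
    """Number of selected rows at which both encodings cost the same."""
    return bitmap_bytes(num_rows) // index_width_bytes


num_rows = 4096  # assumed batch size
for selectivity in (0.01, 0.05, 0.25):
    num_selected = int(num_rows * selectivity)
    print(
        f"selectivity={selectivity:.2f} "
        f"bitmap={bitmap_bytes(num_rows)}B "
        f"selection_vector={selection_vector_bytes(num_selected)}B"
    )
```

Under these assumptions the bitmap always costs 512 bytes for 4096 rows, while the selection vector costs 2 bytes per match, so it is the smaller encoding as long as fewer than 256 rows (1 in 16) are selected. Wider indices move that break-even point lower.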
Reporter: Pindikura Ravindra / @pravindra
Assignee: Praveen Krishna / @Praveen2112
PRs and other links:
Note: This issue was originally created as ARROW-3511. Please see the migration documentation for further details.