Open
Description
The new filter API (FilterPredicate) seems to be much slower than the old UnboundRecordFilter one (or perhaps I'm using it wrong :)
Code using an UnboundRecordFilter:
    ColumnRecordFilter.column(column,
        ColumnPredicates.applyFunctionToBinary(
            input -> Binary.fromString(value).equals(input)));
vs. code using FilterPredicate:
    eq(binaryColumn(column), Binary.fromString(value));
The latter takes about twice as long on the same Parquet file (built using 1.6.0rc2).
Note: the reader is constructed using
    ParquetReader.builder(new ProtoReadSupport()).withFilter(filter).build()
The approach based on the new filter API seems to create a whole lot more garbage (perhaps due to reconstructing all the rows?).
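For anyone who wants to reproduce the timing comparison, here is a minimal sketch of a harness using only the JDK. It is not the code that produced the numbers above. The names `FilterTiming`, `measure` and `countMatches` are hypothetical, and `countMatches` over an in-memory list is only a stand-in for a loop that reads the file through a `ParquetReader` built with each filter variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FilterTiming {

    /** Outcome of one measured variant: rows matched and the fastest run in nanoseconds. */
    static final class Result {
        final long matches;
        final long bestNanos;

        Result(long matches, long bestNanos) {
            this.matches = matches;
            this.bestNanos = bestNanos;
        }
    }

    /** Runs the scan several times and keeps the fastest run, to reduce JIT warm-up noise. */
    static Result measure(Supplier<Long> scan, int runs) {
        long best = Long.MAX_VALUE;
        long matches = -1;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            matches = scan.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return new Result(matches, best);
    }

    /** Hypothetical stand-in for a filtered read: counts the rows the predicate accepts. */
    static long countMatches(List<String> rows, Predicate<String> filter) {
        long n = 0;
        for (String row : rows) {
            if (filter.test(row)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Synthetic data; in a real test each supplier would read the same Parquet file.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            rows.add("v" + (i % 100));
        }
        String value = "v42";

        // Placeholder for the UnboundRecordFilter read loop.
        Result oldApi = measure(() -> countMatches(rows, value::equals), 5);
        // Placeholder for the FilterPredicate read loop.
        Result newApi = measure(() -> countMatches(rows, r -> r.equals(value)), 5);

        System.out.printf("old: %d matches, %d ns%n", oldApi.matches, oldApi.bestNanos);
        System.out.printf("new: %d matches, %d ns%n", newApi.matches, newApi.bestNanos);
    }
}
```

Checking that both variants return the same match count before comparing times helps rule out the "using it wrong" case. Running with `-verbose:gc` would also show whether one variant triggers noticeably more collections.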
Reporter: Viktor Szathmáry / @phraktle
Note: This issue was originally created as PARQUET-98. Please see the migration documentation for further details.