filter2 API performance regression

The new filter API seems to be much slower (or perhaps I'm using it wrong :)

Code using an UnboundRecordFilter:

```java
ColumnRecordFilter.column(column,
    ColumnPredicates.applyFunctionToBinary(
        input -> Binary.fromString(value).equals(input)));
```

vs. code using FilterPredicate:

```java
eq(binaryColumn(column), Binary.fromString(value));
```

The latter is about twice as slow on the same Parquet file (written with 1.6.0rc2).

Note: the reader is constructed using

```java
// path is the Hadoop Path of the Parquet file being read
ParquetReader.builder(new ProtoReadSupport(), path).withFilter(filter).build()
```

The approach based on the new filter API seems to create far more garbage (perhaps because it reconstructs every row?).
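The guess above can be made concrete with a toy Java model. This is a hypothetical sketch, not Parquet's actual reader: `FilterStrategies`, `Row`, and both methods are made-up names, and real record assembly (repetition/definition levels, nested fields) is ignored. Strategy A checks the filtered column before building a row; strategy B builds every row and only then applies the predicate. If the filter2 path does reconstruct every row, it would behave more like B, allocating one object per scanned row rather than per matching row.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Parquet's implementation: two ways a reader could apply an
 * equality filter to one column of columnar data. All names are hypothetical.
 */
public class FilterStrategies {

    /** A hypothetical assembled row holding one value per column. */
    static final class Row {
        final String key;
        final long payload;

        Row(String key, long payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Strategy A: test the filtered column first and assemble only matching rows. */
    static int filterThenAssemble(String[] keys, long[] payloads, String wanted, List<Row> out) {
        int allocated = 0;
        for (int i = 0; i < keys.length; i++) {
            if (wanted.equals(keys[i])) {
                out.add(new Row(keys[i], payloads[i]));
                allocated++;
            }
        }
        return allocated; // number of Row objects created
    }

    /** Strategy B: assemble every row, then evaluate the predicate on the assembled row. */
    static int assembleThenFilter(String[] keys, long[] payloads, String wanted, List<Row> out) {
        int allocated = 0;
        for (int i = 0; i < keys.length; i++) {
            Row row = new Row(keys[i], payloads[i]);
            allocated++;
            if (wanted.equals(row.key)) {
                out.add(row);
            }
            // non-matching rows are unreachable after this iteration (garbage)
        }
        return allocated; // number of Row objects created
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "c", "d"};
        long[] payloads = {1, 2, 3, 4, 5};

        List<Row> a = new ArrayList<>();
        List<Row> b = new ArrayList<>();
        int allocA = filterThenAssemble(keys, payloads, "a", a);
        int allocB = assembleThenFilter(keys, payloads, "a", b);

        System.out.println("matches: " + a.size() + " vs " + b.size());
        System.out.println("rows allocated: " + allocA + " vs " + allocB);
    }
}
```

Running `main` prints `matches: 2 vs 2` and `rows allocated: 2 vs 5`: the results are identical, but B creates a short-lived object for each of the three non-matching rows. On a large file that overhead scales with the number of rows scanned, which is consistent with (though it does not prove) the extra garbage observed above.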


**Reporter**: [Viktor Szathmáry](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=phraktle) / @phraktle
#### Related issues:
- [FilteredRecordReader skips rows it shouldn't for schema with optional columns](https://github.com/apache/parquet-java/issues/1730) (is related to)

<sub>**Note**: *This issue was originally created as [PARQUET-98](https://issues.apache.org/jira/browse/PARQUET-98). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*</sub>
