Predicate pruning is broken for parquet #656

@alamb

Describe the bug
Predicate pruning no longer occurs for queries against parquet files.

To Reproduce
Run a query against a parquet file that has multiple row groups, using a predicate that could be used to prune some of them. No pruning occurs.

Expected behavior
The predicate should be able to eliminate some row groups.
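
To illustrate what "eliminating row groups" means here, the following is a minimal sketch of pruning with per-row-group min/max statistics for a predicate of the form bar > threshold. The type `RowGroupStats` and the functions `may_match_gt` and `prune_row_groups` are hypothetical names made up for this example; they are not DataFusion or parquet crate APIs.

```rust
/// Min/max statistics for one column in one row group.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Returns true if the row group might contain a row where `value > threshold`.
/// If the maximum is not above the threshold, no row can match.
fn may_match_gt(stats: &RowGroupStats, threshold: i64) -> bool {
    stats.max > threshold
}

/// Returns the indices of the row groups that still have to be read
/// for the predicate `bar > threshold`; all others can be skipped.
fn prune_row_groups(groups: &[RowGroupStats], threshold: i64) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match_gt(stats, threshold))
        .map(|(index, _)| index)
        .collect()
}
```

With the bug described in this issue, the equivalent of `prune_row_groups` would keep every row group, because the predicate column is never matched to the statistics in the first place.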

Additional context
While updating IOx to use the latest datafusion in https://github.com/influxdata/influxdb_iox/pull/1799, I discovered another place where #55 has caused some issues.

Basically, the predicates that get pushed down to the parquet exec scan are now fully qualified (for example, #foo.bar > 5). However, the parquet schema only has columns named bar, so the code cannot match them up.

The reason this was not caught in #55 is that there is no end-to-end test of parquet that exercises the entire path.

The fix for this issue is fairly straightforward (strip the qualifiers from the expressions), but the end-to-end test is quite involved. I plan to fix this in two PRs.
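
As a rough sketch of the mismatch and of the "strip the qualifiers" idea, the example below uses a tiny stand-in expression type. `Expr`, `unqualify`, and `resolves_against` are hypothetical and written only for this illustration; they are not DataFusion's actual types or the code of the eventual fix.

```rust
/// A tiny stand-in for a logical expression tree.
/// Hypothetical; this is not DataFusion's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference with an optional table qualifier, e.g. foo.bar.
    Column { relation: Option<String>, name: String },
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Rebuilds the expression with every column qualifier removed.
fn unqualify(expr: &Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column {
            relation: None,
            name: name.clone(),
        },
        Expr::Literal(value) => Expr::Literal(*value),
        Expr::Gt(left, right) => {
            Expr::Gt(Box::new(unqualify(left)), Box::new(unqualify(right)))
        }
    }
}

/// Returns true if every column in the expression can be found in the
/// file schema when looked up by its flat name ("foo.bar" or "bar").
fn resolves_against(expr: &Expr, file_columns: &[&str]) -> bool {
    match expr {
        Expr::Column { relation, name } => {
            let flat = match relation {
                Some(relation) => format!("{}.{}", relation, name),
                None => name.clone(),
            };
            file_columns.iter().any(|column| *column == flat)
        }
        Expr::Literal(_) => true,
        Expr::Gt(left, right) => {
            resolves_against(left, file_columns) && resolves_against(right, file_columns)
        }
    }
}
```

Under these assumptions, a predicate on foo.bar does not resolve against a file schema containing only bar, while the result of `unqualify` on the same predicate does.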

Metadata

Labels

bug (Something isn't working)