
ManifestGroup::FilterFiles silently ignores file filters #663

@fallintoplace

Description


Problem

ManifestGroup::FilterFiles() accepts a file-level expression and stores it in file_filter_, but ReadEntries() does not currently build or run an evaluator against each DataFile. As a result, any file filter other than the default always-true expression is accepted and then silently ignored.

This is primarily a correctness issue in the public API. Regular table scans still apply data and partition filters through the existing scan path, but code that calls ManifestGroup::FilterFiles() directly can receive entries that should have been filtered out.

Expected behavior

ManifestGroup::FilterFiles() should either evaluate supported predicates against each entry's DataFile metadata struct, including partition metadata interpreted according to that manifest's partition spec, or fail explicitly for unsupported predicates instead of acting as a silent no-op.
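A minimal sketch of the "evaluate or fail explicitly" contract, under heavily simplified assumptions: DataFileStats, Op, Predicate, Compare, and EvaluateFileFilter are hypothetical names, not iceberg-cpp APIs. A real fix would bind an Iceberg expression against the DataFile struct (and the manifest's partition spec) rather than match field names as strings; here partition fields are deliberately treated as unsupported, so they raise an error instead of letting every file through.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for per-file metadata; the real
// iceberg-cpp DataFile carries many more fields, including partition values.
struct DataFileStats {
  int64_t record_count;
  int64_t file_size_in_bytes;
};

// Hypothetical comparison operators for a single `field op literal` predicate.
enum class Op { kLt, kLtEq, kGt, kGtEq, kEq, kNotEq };

// Hypothetical predicate shape, e.g. {"record_count", Op::kGtEq, 10}.
struct Predicate {
  std::string field;
  Op op;
  int64_t literal;
};

// Compare one metadata value against the predicate's literal.
bool Compare(int64_t value, Op op, int64_t literal) {
  switch (op) {
    case Op::kLt:    return value < literal;
    case Op::kLtEq:  return value <= literal;
    case Op::kGt:    return value > literal;
    case Op::kGtEq:  return value >= literal;
    case Op::kEq:    return value == literal;
    case Op::kNotEq: return value != literal;
  }
  throw std::logic_error("unknown operator");
}

// Evaluate a predicate against one file's metadata. Fields this sketch does
// not understand (including any partition field) throw instead of being
// treated as "keep the file", which is the silent no-op the issue reports.
bool EvaluateFileFilter(const Predicate& pred, const DataFileStats& file) {
  if (pred.field == "record_count") {
    return Compare(file.record_count, pred.op, pred.literal);
  }
  if (pred.field == "file_size_in_bytes") {
    return Compare(file.file_size_in_bytes, pred.op, pred.literal);
  }
  throw std::invalid_argument("unsupported file filter field: " + pred.field);
}
```

Throwing is only one way to "fail explicitly"; a library that reports errors through result types would return an error value from FilterFiles() or ReadEntries() instead.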

Reproduction idea

Create a manifest with two entries that differ in record_count, call FilterFiles() with the predicate record_count >= 10, read the entries back, and observe that the entry below the threshold is still returned.
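The reproduction can be modeled outside the library to make the expected outcome concrete. Entry, FileFilter, ReadEntriesIgnoringFilter, ReadEntriesApplyingFilter, and CheckRecordCountFilter are hypothetical stand-ins, not the iceberg-cpp API; a real regression test would write a manifest, construct a ManifestGroup, call FilterFiles(), and inspect what ReadEntries() returns. The sketch contrasts the reported behavior (filter stored but never evaluated) with the expected one.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified manifest entry; not the iceberg-cpp type.
struct Entry {
  std::string file_path;
  int64_t record_count;
};

// Hypothetical file-level filter: returns true to keep the entry.
using FileFilter = std::function<bool(const Entry&)>;

// Models the reported behavior: the filter is accepted but never evaluated.
std::vector<Entry> ReadEntriesIgnoringFilter(const std::vector<Entry>& entries,
                                             const FileFilter& /*file_filter*/) {
  return entries;
}

// Models the expected behavior: keep only entries that satisfy the filter.
std::vector<Entry> ReadEntriesApplyingFilter(const std::vector<Entry>& entries,
                                             const FileFilter& file_filter) {
  std::vector<Entry> kept;
  for (const Entry& entry : entries) {
    if (file_filter(entry)) kept.push_back(entry);
  }
  return kept;
}

// The reproduction: two entries that differ in record_count, filtered by
// record_count >= 10.
void CheckRecordCountFilter() {
  const std::vector<Entry> entries = {{"small.parquet", 5}, {"large.parquet", 42}};
  const FileFilter at_least_ten = [](const Entry& e) { return e.record_count >= 10; };

  // With the bug, both entries come back even though one has only 5 records.
  assert(ReadEntriesIgnoringFilter(entries, at_least_ten).size() == 2);

  // With the filter applied, only the 42-record file survives.
  const std::vector<Entry> kept = ReadEntriesApplyingFilter(entries, at_least_ten);
  assert(kept.size() == 1);
  assert(kept[0].file_path == "large.parquet");
}
```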
