Problem
ManifestGroup::FilterFiles() accepts a file-level expression and stores it in file_filter_, but ReadEntries() does not currently build or run an evaluator against each DataFile. As a result, any file filter other than the always-true expression is accepted but silently ignored.
This is primarily a public API correctness issue. Normal table scans still apply data and partition filtering through the existing scan path, but direct use of ManifestGroup::FilterFiles() can return entries that should have been filtered out.
Expected behavior
ManifestGroup::FilterFiles() should either evaluate supported predicates against each DataFile metadata struct, including partition metadata for the manifest partition spec, or fail explicitly for unsupported predicates instead of behaving as a silent no-op.
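One way to picture the "evaluate or fail explicitly" contract is the minimal sketch below. Everything in it is a hypothetical stand-in written for illustration: DataFileStub, FilePredicate, CompareOp, and EvaluateFilePredicate are not the library's real types or functions, and the choice of std::optional to signal an unsupported predicate is an assumption rather than the project's error-handling convention.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the DataFile metadata struct (not the real type).
struct DataFileStub {
  std::string file_path;
  std::int64_t record_count;
  std::int64_t file_size_in_bytes;
};

// Hypothetical comparison operators for a simple file-level predicate.
enum class CompareOp { kLessThan, kGreaterThanOrEqual, kEqual };

// Hypothetical predicate of the form "<field> <op> <literal>".
struct FilePredicate {
  std::string field;
  CompareOp op;
  std::int64_t literal;
};

// Returns true or false when the predicate refers to a supported field.
// Returns std::nullopt when it does not, so the caller can surface an
// explicit error instead of treating the filter as a silent no-op.
std::optional<bool> EvaluateFilePredicate(const FilePredicate& pred,
                                          const DataFileStub& file) {
  std::int64_t value = 0;
  if (pred.field == "record_count") {
    value = file.record_count;
  } else if (pred.field == "file_size_in_bytes") {
    value = file.file_size_in_bytes;
  } else {
    return std::nullopt;  // Unsupported field: fail explicitly.
  }

  switch (pred.op) {
    case CompareOp::kLessThan:
      return value < pred.literal;
    case CompareOp::kGreaterThanOrEqual:
      return value >= pred.literal;
    case CompareOp::kEqual:
      return value == pred.literal;
  }
  return std::nullopt;  // Unknown operator: also reported as unsupported.
}
```

In this sketch the caller would drop an entry when the result is false and return an error when the result is empty. A real implementation would also need to cover partition metadata for the manifest partition spec, which the sketch leaves out.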
Reproduction idea
Create a manifest with two entries that differ by record_count, call FilterFiles(record_count >= 10), and observe that entries below the threshold are still returned.
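The shape of such a regression test can be sketched with a toy model. ManifestGroupStub, EntryStub, and both ReadEntries variants below are hypothetical names invented for this illustration; they only mimic the reported and the expected behavior and do not use the real manifest reader or expression API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a manifest entry and its DataFile metadata.
struct EntryStub {
  std::string file_path;
  std::int64_t record_count;
};

using FileFilter = std::function<bool(const EntryStub&)>;

// Toy model of a manifest group; not the real ManifestGroup class.
class ManifestGroupStub {
 public:
  explicit ManifestGroupStub(std::vector<EntryStub> entries)
      : entries_(std::move(entries)) {}

  ManifestGroupStub& FilterFiles(FileFilter filter) {
    file_filter_ = std::move(filter);
    return *this;
  }

  // Mimics the reported behavior: the filter is stored but never consulted.
  std::vector<EntryStub> ReadEntriesIgnoringFilter() const { return entries_; }

  // Mimics the expected behavior: each entry is checked against the filter.
  std::vector<EntryStub> ReadEntriesApplyingFilter() const {
    std::vector<EntryStub> kept;
    for (const auto& entry : entries_) {
      if (!file_filter_ || file_filter_(entry)) {
        kept.push_back(entry);
      }
    }
    return kept;
  }

 private:
  std::vector<EntryStub> entries_;
  FileFilter file_filter_;
};

// Builds two entries that differ by record_count and filters on >= 10.
ManifestGroupStub MakeTwoEntryGroup() {
  std::vector<EntryStub> entries = {{"small.parquet", 5}, {"large.parquet", 20}};
  ManifestGroupStub group(std::move(entries));
  group.FilterFiles([](const EntryStub& e) { return e.record_count >= 10; });
  return group;
}
```

A test against the real class would assert that only the entry with record_count 20 is returned, which is what the filter-applying variant produces here. The filter-ignoring variant returns both entries, matching the symptom described above.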