Feature: Add Bloom Filter Support for Iceberg read/write operations #6746
navneethk-xch
started this conversation in
Ideas
Replies: 0 comments
Summary
DAFT recently migrated to arrow-rs, which supports Parquet bloom filters. Iceberg already exposes bloom filter configuration via table properties. This proposal enables DAFT to pass through and honor these properties during write_iceberg() and to leverage them implicitly during read_iceberg(), without introducing new user-facing APIs.

Currently, the Iceberg spec supports the following configurations for adding bloom filters to table properties:
Motivation
user_id, email, trace_id).

Example Usage (Pseudo-code)
Table properties [configured with bloom filters]
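The original snippet for this step is not available here, so the following is only a minimal sketch of what such table properties could look like. The property names follow Iceberg's documented Parquet write properties as I understand them (`write.parquet.bloom-filter-enabled.column.<name>`, `write.parquet.bloom-filter-fpp.column.<name>`, `write.parquet.bloom-filter-max-bytes`); please verify them against the Iceberg version in use. The helper `bloom_filter_properties` and its default values are hypothetical and not part of DAFT or Iceberg.

```python
# Hypothetical helper (not a DAFT or Iceberg API): builds the table
# properties that ask Iceberg writers to add Parquet bloom filters.
def bloom_filter_properties(columns, fpp=0.01, max_bytes=1024 * 1024):
    # One table-wide cap on bloom filter size, in bytes (assumed value).
    props = {"write.parquet.bloom-filter-max-bytes": str(max_bytes)}
    for col in columns:
        # Per-column switch and per-column false-positive probability.
        props[f"write.parquet.bloom-filter-enabled.column.{col}"] = "true"
        props[f"write.parquet.bloom-filter-fpp.column.{col}"] = str(fpp)
    return props


# Properties for a table that should carry a bloom filter on user_id.
table_properties = bloom_filter_properties(["user_id"])
```

These would be set on the Iceberg table through whichever catalog client manages it. Under this proposal, write_iceberg() would then pick them up from the table rather than from new DAFT arguments.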
Filtering on the column when reading from Iceberg
df = df.filter(daft.col("user_id") == "abc123")

Corner Cases [Some cases to consider]
1. Table properties not picked up by writer
2. Partial configuration (enabled but missing params)
3. Schema Evolution
4. Missing bloom filters at read time
5. Unsupported column types at write time
6. Write modes (append / overwrite)
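To make a few of these cases concrete, here is a rough sketch of how cases 2 to 5 might be handled. It is not the proposed implementation: `resolve_bloom_columns`, `can_skip_row_group`, the `SUPPORTED_TYPES` set, and the 0.01 fallback are all assumptions made for illustration, and the schema is modelled as a plain dict of column name to type name.

```python
ENABLED_PREFIX = "write.parquet.bloom-filter-enabled.column."
FPP_PREFIX = "write.parquet.bloom-filter-fpp.column."
DEFAULT_FPP = 0.01  # assumed fallback when only "enabled" is set (case 2)
# Assumption: the set of types the writer can build bloom filters for.
SUPPORTED_TYPES = {"string", "int", "long", "binary", "uuid"}


def resolve_bloom_columns(properties, schema):
    """Split enabled columns into (usable {column: fpp}, skipped {column: reason})."""
    resolved, skipped = {}, {}
    for key, value in properties.items():
        if not key.startswith(ENABLED_PREFIX):
            continue
        if value.strip().lower() != "true":
            continue
        column = key[len(ENABLED_PREFIX):]
        if column not in schema:
            # Case 3: the property refers to a dropped or renamed column.
            skipped[column] = "not in current schema"
        elif schema[column] not in SUPPORTED_TYPES:
            # Case 5: skip the column instead of failing the whole write.
            skipped[column] = f"unsupported type {schema[column]}"
        else:
            # Case 2: a missing fpp falls back to the default.
            resolved[column] = float(properties.get(FPP_PREFIX + column, DEFAULT_FPP))
    return resolved, skipped


def can_skip_row_group(bloom_might_contain, value):
    """Case 4: with no bloom filter (None), never skip; just scan as usual."""
    if bloom_might_contain is None:
        return False
    return not bloom_might_contain(value)


props = {
    ENABLED_PREFIX + "user_id": "true",
    ENABLED_PREFIX + "email": "true",
    FPP_PREFIX + "email": "0.001",
    ENABLED_PREFIX + "old_col": "true",
    ENABLED_PREFIX + "payload": "true",
}
schema = {"user_id": "string", "email": "string", "payload": "map"}
resolved, skipped = resolve_bloom_columns(props, schema)
```

The design choice sketched here is to degrade gracefully: a misconfigured or unsupported column loses its bloom filter but the write still succeeds, and a file without bloom filters is simply read without pruning. Whether skipped columns should instead raise or log a warning is left open.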