PARQUET-1630: Update Bloom filter format (#146)
chenjunjiedada authored and rdblue committed Aug 26, 2019
1 parent cd08b7f commit 3fb10e0
Showing 3 changed files with 14 additions and 4 deletions.
18 changes: 14 additions & 4 deletions BloomFilter.md
@@ -264,10 +264,13 @@ false positive rates:
| 41 | 0.001 % |

#### File Format
The Bloom filter data of a column chunk, which contains the size of the filter in bytes, the
algorithm, the hash function and the Bloom filter bitset, is stored near the footer. The Bloom
filter data offset is stored in column chunk metadata. Here are Bloom filter definitions in
thrift:

Each multi-block Bloom filter is required to work for only one column chunk. The data of a multi-block
Bloom filter consists of the Bloom filter header followed by the Bloom filter bitset. The Bloom filter
header encodes the size of the bitset in bytes, which a reader uses to read the bitset.

Here are the Bloom filter definitions in thrift:


```
/** Block-based algorithm type annotation. **/
@@ -323,6 +326,13 @@ struct ColumnMetaData {
```
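
The header-then-bitset layout described above can be sketched roughly as follows. This is only an illustration, not the reference reader: the real header is a Thrift struct, so `decode_header` stands in for a Thrift decoder, and `toy_decode_header`, the `num_bytes` key, and the 4-byte length prefix are all hypothetical names and encodings chosen for the example.

```python
import io
import struct


def read_bloom_filter(stream, decode_header):
    """Read one Bloom filter: a header, then exactly as many bitset bytes as it declares.

    `decode_header` is a caller-supplied function (hypothetical here) that consumes
    the header from `stream` and returns a dict with a `num_bytes` entry.
    """
    header = decode_header(stream)
    num_bytes = header["num_bytes"]
    bitset = stream.read(num_bytes)
    if len(bitset) != num_bytes:
        raise EOFError(f"expected {num_bytes} bitset bytes, got {len(bitset)}")
    return header, bitset


def toy_decode_header(stream):
    """Stand-in for a Thrift decoder (assumption): a 4-byte little-endian length."""
    (num_bytes,) = struct.unpack("<i", stream.read(4))
    return {"num_bytes": num_bytes}


# Usage: a 32-byte bitset preceded by the toy header, followed by unrelated bytes.
data = struct.pack("<i", 32) + bytes(32) + b"tail"
stream = io.BytesIO(data)
header, bitset = read_bloom_filter(stream, toy_decode_header)
```

Because the header carries the bitset size, the reader never has to scan for an end marker; it stops exactly where the next structure in the file begins.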

The Bloom filters are grouped by row group, with the data for each column stored in the same order as the file schema.
The Bloom filter data can be stored after all row groups and before the page indexes. In that case the file layout looks like:
![File Layout - Bloom filter footer](doc/images/FileLayoutBloomFilter2.png)

Alternatively, it can be stored between row groups, in which case the file layout looks like:
![File Layout - Bloom filter footer](doc/images/FileLayoutBloomFilter1.png)
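
As a rough illustration of the ordering rule for the first layout (all filters written contiguously after the row groups), the sketch below assigns a start offset to each filter, iterating row group by row group and, within a row group, column by column in schema order. The function name, the `filter_sizes` structure, and the use of `None` for a column chunk without a filter are assumptions made for the example, not part of the format definition.

```python
def layout_bloom_filters(start_offset, filter_sizes):
    """Assign file offsets to Bloom filters written back to back.

    filter_sizes[rg][col] is the total size in bytes (header plus bitset) of the
    filter for column `col` of row group `rg`, or None if that chunk has no filter
    (an assumption of this sketch). Columns are listed in file-schema order.
    Returns (offsets, end_offset), where offsets mirrors filter_sizes.
    """
    offsets = []
    position = start_offset
    for row_group in filter_sizes:          # grouped by row group
        row_offsets = []
        for size in row_group:              # columns in schema order
            if size is None:
                row_offsets.append(None)
            else:
                row_offsets.append(position)
                position += size
        offsets.append(row_offsets)
    return offsets, position


# Usage: two row groups, three columns, filters starting at byte 1000.
offsets, end_offset = layout_bloom_filters(1000, [[40, None, 72], [40, 136, None]])
```

Each computed offset is the kind of value a writer would record in the corresponding column chunk metadata so that a reader can seek straight to the filter.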

#### Encryption
In the case of columns with sensitive data, the Bloom filter exposes a subset of sensitive
information such as the presence of a value. Therefore the Bloom filter of columns with sensitive
Binary file added doc/images/FileLayoutBloomFilter1.png
Binary file added doc/images/FileLayoutBloomFilter2.png
