
Faster Parquet Bloom Writer #3333

Merged 1 commit into apache:master on Dec 13, 2022

Conversation

tustvold (Contributor) commented Dec 12, 2022

Which issue does this PR close?

Part of #3320.

Rationale for this change

$ RUSTFLAGS="-C target-cpu=native" cargo bench --bench arrow_writer -- --baseline master "bloom filter"

write_batch primitive/4096 values primitive with bloom filter
                        time:   [16.561 ms 16.848 ms 17.135 ms]
                        thrpt:  [10.302 MiB/s 10.477 MiB/s 10.659 MiB/s]
                 change:
                        time:   [-11.831% -10.178% -8.6583%] (p = 0.00 < 0.05)
                        thrpt:  [+9.4790% +11.331% +13.419%]
                        Performance has improved.
write_batch primitive/4096 values primitive non-null with bloom filter
                        time:   [5.4278 ms 5.4894 ms 5.5546 ms]
                        thrpt:  [31.164 MiB/s 31.534 MiB/s 31.892 MiB/s]
                 change:
                        time:   [-70.261% -69.885% -69.508%] (p = 0.00 < 0.05)
                        thrpt:  [+227.96% +232.06% +236.26%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking write_batch primitive/4096 values string with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values string with bloom filter
                        time:   [1.4824 ms 1.4851 ms 1.4879 ms]
                        thrpt:  [53.521 MiB/s 53.623 MiB/s 53.722 MiB/s]
                 change:
                        time:   [-21.422% -21.052% -20.634%] (p = 0.00 < 0.05)
                        thrpt:  [+25.998% +26.666% +27.263%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [675.68 µs 677.72 µs 679.75 µs]
                        thrpt:  [70.823 MiB/s 71.035 MiB/s 71.249 MiB/s]
                 change:
                        time:   [-21.076% -20.658% -20.301%] (p = 0.00 < 0.05)
                        thrpt:  [+25.472% +26.037% +26.704%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  16 (16.00%) low mild
  2 (2.00%) high severe
Benchmarking write_batch primitive/4096 values string non-null with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [1.5209 ms 1.5253 ms 1.5297 ms]
                        thrpt:  [51.422 MiB/s 51.569 MiB/s 51.719 MiB/s]
                 change:
                        time:   [-29.848% -29.673% -29.495%] (p = 0.00 < 0.05)
                        thrpt:  [+41.835% +42.192% +42.548%]
                        Performance has improved.

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold changed the title Experiment Faster Parquet Bloom Writer Dec 12, 2022
@@ -194,14 +238,14 @@ impl Sbbf {
/// Write the bitset in serialized form to the writer.
fn write_bitset<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
for block in &self.0 {
tustvold (Contributor, author):

I also experimented with transmuting the entire Vec<Block>. This did yield a very slight additional speed boost of ~2%, but I didn't think it justified the unsafety.
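As a rough illustration of the transmute approach mentioned above (a sketch, not the PR's code; `Block` here is a hypothetical stand-in for the crate's type), the whole slice of blocks can be reinterpreted as bytes in one call. Note the result is only byte-identical to the per-word little-endian path on little-endian targets, which is part of why the safe path was preferred:

```rust
// Hypothetical stand-in for the crate's block type: eight 32-bit words.
type Block = [u32; 8];

/// Reinterpret a slice of blocks as raw bytes in one go.
fn as_bytes(blocks: &[Block]) -> &[u8] {
    // SAFETY: Block is a plain array of u32 with no padding, so the
    // allocation is valid for `len * size_of::<Block>()` bytes.
    unsafe {
        std::slice::from_raw_parts(
            blocks.as_ptr() as *const u8,
            blocks.len() * std::mem::size_of::<Block>(),
        )
    }
}

fn main() {
    let blocks = vec![[0u32; 8], [u32::MAX; 8]];
    let bytes = as_bytes(&blocks);
    assert_eq!(bytes.len(), 64); // 2 blocks * 32 bytes each
    // On little-endian targets this matches the per-word LE encoding.
    #[cfg(target_endian = "little")]
    assert_eq!(&bytes[32..36], &[0xff, 0xff, 0xff, 0xff]);
}
```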

Member:

What if it were one long slice for all blocks instead of a Vec<Block>? Do you think that would be better?

tustvold (author):

I found this only yielded a very slight speed boost of ~2%.

-        for word in block {
-            writer.write_all(&word.to_le_bytes()).map_err(|e| {
+        writer
+            .write_all(block.to_le_bytes().as_slice())
viirya (Member) commented Dec 12, 2022:

Hmm, I think this doesn't write the bytes in the same format as before. I guess the reason for writing each word in a block sequentially is to follow the spec? The written bloom filter must be readable by other Parquet libraries. Otherwise I would have thought to write all the words in bulk like this.

tustvold (author) commented Dec 12, 2022:

It should write in the same order as the spec? It writes the integers first to last, with each 32-bit integer written little-endian.

Member:

Ah, right, I had the order of the sequence wrong... 😅
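The equivalence discussed in this thread can be sketched as follows (`Block` is a hypothetical stand-in; the crate's actual type differs): flattening a block to one buffer produces byte-for-byte the same output as the old per-word loop, since the words are still emitted first to last, each little-endian:

```rust
// Hypothetical stand-in: a split-block bloom filter block is eight u32 words.
type Block = [u32; 8];

/// Old path: one write per 32-bit word, each little-endian.
fn write_per_word(block: &Block, out: &mut Vec<u8>) {
    for word in block {
        out.extend_from_slice(&word.to_le_bytes());
    }
}

/// New path: flatten the block to a single 32-byte buffer first,
/// mirroring the PR's `block.to_le_bytes()`.
fn write_bulk(block: &Block, out: &mut Vec<u8>) {
    let mut bytes = [0u8; 32];
    for (i, word) in block.iter().enumerate() {
        bytes[i * 4..i * 4 + 4].copy_from_slice(&word.to_le_bytes());
    }
    out.extend_from_slice(&bytes);
}

fn main() {
    let block: Block = [1, 2, 0xDEADBEEF, 4, 5, 6, 7, u32::MAX];
    let (mut a, mut b) = (Vec::new(), Vec::new());
    write_per_word(&block, &mut a);
    write_bulk(&block, &mut b);
    assert_eq!(a, b); // identical bytes: same word order, same endianness
}
```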

let y = x.wrapping_mul(SALT[i]);
let y = y >> 27;
result[i] = 1 << y;
#[derive(Debug, Copy, Clone)]
Member:

Have we used Copy and Clone?

tustvold (author):

Yes, we use Copy in methods that return Block.
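For context on the mask snippet in the diff above, here is a self-contained sketch of the split-block bloom filter mask step, using the salt constants from the parquet-format specification (the surrounding code is illustrative, not the PR's):

```rust
// Salt constants from the parquet-format split-block bloom filter spec.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// For a 32-bit hash `x`, set exactly one bit in each of the eight words:
/// the bit index is the top five bits of `x * SALT[i]`.
fn mask(x: u32) -> [u32; 8] {
    let mut result = [0u32; 8];
    for i in 0..8 {
        let y = x.wrapping_mul(SALT[i]) >> 27; // top 5 bits -> 0..=31
        result[i] = 1 << y;
    }
    result
}

fn main() {
    let m = mask(0xdeadbeef);
    // Each of the eight words has exactly one bit set.
    assert!(m.iter().all(|w| w.count_ones() == 1));
}
```

Because each word gets exactly one bit, checking membership later is a single AND per word, which is what makes the split-block design SIMD-friendly.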

@tustvold tustvold merged commit 46b2848 into apache:master Dec 13, 2022
ursabot commented Dec 13, 2022

Benchmark runs are scheduled for baseline = 31d5706 and contender = 46b2848. 46b2848 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@tustvold tustvold added the parquet Changes to the parquet crate label Jan 4, 2023