
Faster Parquet Bloom Writer #3333

Merged 1 commit into apache:master on Dec 13, 2022

Conversation

tustvold (Contributor) commented Dec 12, 2022

Which issue does this PR close?

Part of #3320.

Rationale for this change

$ RUSTFLAGS="-C target-cpu=native" cargo bench --bench arrow_writer -- --baseline master "bloom filter"

write_batch primitive/4096 values primitive with bloom filter
                        time:   [16.561 ms 16.848 ms 17.135 ms]
                        thrpt:  [10.302 MiB/s 10.477 MiB/s 10.659 MiB/s]
                 change:
                        time:   [-11.831% -10.178% -8.6583%] (p = 0.00 < 0.05)
                        thrpt:  [+9.4790% +11.331% +13.419%]
                        Performance has improved.
write_batch primitive/4096 values primitive non-null with bloom filter
                        time:   [5.4278 ms 5.4894 ms 5.5546 ms]
                        thrpt:  [31.164 MiB/s 31.534 MiB/s 31.892 MiB/s]
                 change:
                        time:   [-70.261% -69.885% -69.508%] (p = 0.00 < 0.05)
                        thrpt:  [+227.96% +232.06% +236.26%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking write_batch primitive/4096 values string with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values string with bloom filter
                        time:   [1.4824 ms 1.4851 ms 1.4879 ms]
                        thrpt:  [53.521 MiB/s 53.623 MiB/s 53.722 MiB/s]
                 change:
                        time:   [-21.422% -21.052% -20.634%] (p = 0.00 < 0.05)
                        thrpt:  [+25.998% +26.666% +27.263%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
write_batch primitive/4096 values string dictionary with bloom filter
                        time:   [675.68 µs 677.72 µs 679.75 µs]
                        thrpt:  [70.823 MiB/s 71.035 MiB/s 71.249 MiB/s]
                 change:
                        time:   [-21.076% -20.658% -20.301%] (p = 0.00 < 0.05)
                        thrpt:  [+25.472% +26.037% +26.704%]
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  16 (16.00%) low mild
  2 (2.00%) high severe
Benchmarking write_batch primitive/4096 values string non-null with bloom filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values string non-null with bloom filter
                        time:   [1.5209 ms 1.5253 ms 1.5297 ms]
                        thrpt:  [51.422 MiB/s 51.569 MiB/s 51.719 MiB/s]
                 change:
                        time:   [-29.848% -29.673% -29.495%] (p = 0.00 < 0.05)
                        thrpt:  [+41.835% +42.192% +42.548%]
                        Performance has improved.

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold changed the title Experiment Faster Parquet Bloom Writer Dec 12, 2022
@@ -194,14 +238,14 @@ impl Sbbf {
/// Write the bitset in serialized form to the writer.
fn write_bitset<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
for block in &self.0 {
tustvold (Contributor, author):

I also experimented with transmuting the entire Vec<Block>. This did yield a very slight additional speed boost of ~2%, but I didn't think it justified the unsafety.
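As a rough illustration of the transmute approach mentioned above (a sketch, not the PR's code; `Block` here is a hypothetical stand-in for the crate's type), the whole slice of blocks can be reinterpreted as bytes in one call. Note the result is only byte-identical to the per-word little-endian path on little-endian targets, which is part of why the safe path was preferred:

```rust
// Hypothetical stand-in for the crate's block type: eight 32-bit words.
type Block = [u32; 8];

/// Reinterpret a slice of blocks as raw bytes in one go.
fn as_bytes(blocks: &[Block]) -> &[u8] {
    // SAFETY: Block is a plain array of u32 with no padding, so the
    // allocation is valid for `len * size_of::<Block>()` bytes.
    unsafe {
        std::slice::from_raw_parts(
            blocks.as_ptr() as *const u8,
            blocks.len() * std::mem::size_of::<Block>(),
        )
    }
}

fn main() {
    let blocks = vec![[0u32; 8], [u32::MAX; 8]];
    let bytes = as_bytes(&blocks);
    assert_eq!(bytes.len(), 64); // 2 blocks * 32 bytes each
    // On little-endian targets this matches the per-word LE encoding.
    #[cfg(target_endian = "little")]
    assert_eq!(&bytes[32..36], &[0xff, 0xff, 0xff, 0xff]);
}
```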

Member:

What if it were one long slice for all blocks instead of a Vec<Block>? Do you think that would be better?

tustvold (author):

I found this only yielded a very slight speed boost of ~2%.

-        for word in block {
-            writer.write_all(&word.to_le_bytes()).map_err(|e| {
+        writer
+            .write_all(block.to_le_bytes().as_slice())
viirya (Member) commented Dec 12, 2022:

Hmm, I think this doesn't write the bytes in the same format as before. I guess the reason for writing each word in a block sequentially is to follow the spec? The written bloom filter must be readable by other Parquet libraries. Otherwise I would have thought to write all the words in bulk like this.

tustvold (author) commented Dec 12, 2022:

It should write in the same order as the spec? It writes the integers first to last, with each 32-bit integer written little-endian.

Member:

Ah, right, I had the order of the sequence wrong... 😅
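The equivalence discussed in this thread can be sketched as follows (`Block` is a hypothetical stand-in; the crate's actual type differs): flattening a block to one buffer produces byte-for-byte the same output as the old per-word loop, since the words are still emitted first to last, each little-endian:

```rust
// Hypothetical stand-in: a split-block bloom filter block is eight u32 words.
type Block = [u32; 8];

/// Old path: one write per 32-bit word, each little-endian.
fn write_per_word(block: &Block, out: &mut Vec<u8>) {
    for word in block {
        out.extend_from_slice(&word.to_le_bytes());
    }
}

/// New path: flatten the block to a single 32-byte buffer first,
/// mirroring the PR's `block.to_le_bytes()`.
fn write_bulk(block: &Block, out: &mut Vec<u8>) {
    let mut bytes = [0u8; 32];
    for (i, word) in block.iter().enumerate() {
        bytes[i * 4..i * 4 + 4].copy_from_slice(&word.to_le_bytes());
    }
    out.extend_from_slice(&bytes);
}

fn main() {
    let block: Block = [1, 2, 0xDEADBEEF, 4, 5, 6, 7, u32::MAX];
    let (mut a, mut b) = (Vec::new(), Vec::new());
    write_per_word(&block, &mut a);
    write_bulk(&block, &mut b);
    assert_eq!(a, b); // identical bytes: same word order, same endianness
}
```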

let y = x.wrapping_mul(SALT[i]);
let y = y >> 27;
result[i] = 1 << y;
#[derive(Debug, Copy, Clone)]
Member:

Have we used Copy and Clone?

tustvold (author):

Yes, we use Copy in methods that return Block.
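For context on the mask snippet in the diff above, here is a self-contained sketch of the split-block bloom filter mask step, using the salt constants from the parquet-format specification (the surrounding code is illustrative, not the PR's):

```rust
// Salt constants from the parquet-format split-block bloom filter spec.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// For a 32-bit hash `x`, set exactly one bit in each of the eight words:
/// the bit index is the top five bits of `x * SALT[i]`.
fn mask(x: u32) -> [u32; 8] {
    let mut result = [0u32; 8];
    for i in 0..8 {
        let y = x.wrapping_mul(SALT[i]) >> 27; // top 5 bits -> 0..=31
        result[i] = 1 << y;
    }
    result
}

fn main() {
    let m = mask(0xdeadbeef);
    // Each of the eight words has exactly one bit set.
    assert!(m.iter().all(|w| w.count_ones() == 1));
}
```

Because each word gets exactly one bit, checking membership later is a single AND per word, which is what makes the split-block design SIMD-friendly.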

@tustvold tustvold merged commit 46b2848 into apache:master Dec 13, 2022
ursabot commented Dec 13, 2022

Benchmark runs are scheduled for baseline = 31d5706 and contender = 46b2848. 46b2848 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@tustvold tustvold added the parquet Changes to the parquet crate label Jan 4, 2023