Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use BufWriter when writing bloom filters and limit tests (#3318) #3319

Merged
merged 1 commit into from
Dec 9, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Dec 9, 2022

Which issue does this PR close?

Closes #3318

Rationale for this change

The tests were taking an exceedingly long time to run, this disables them except for a few select tests that explicitly test the bloom filter.

This also adds a BufWriter when writing bloom filters, it is almost certain that this can be optimised further, but this was a quick win that took i32_column_bloom_filter from taking 17 seconds to 2. This is still terrible, but not catastrophic 馃槄

What changes are included in this PR?

Are there any user-facing changes?

Disable bloom filters for most tests
@@ -630,7 +630,7 @@ struct ColumnProperties {
statistics_enabled: Option<EnabledStatistics>,
max_statistics_size: Option<usize>,
/// bloom filter related properties
bloom_filter_properies: Option<BloomFilterProperties>,
bloom_filter_properties: Option<BloomFilterProperties>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a drive by fix of a typo

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 9, 2022
@@ -177,14 +177,17 @@ impl Sbbf {
}

/// Write the bloom filter data (header and then bitset) to the output
pub(crate) fn write<W: Write>(&self, mut writer: W) -> Result<(), ParquetError> {
pub(crate) fn write<W: Write>(&self, writer: W) -> Result<(), ParquetError> {
// Use a BufWriter to avoid costs of writing individual blocks
Copy link
Contributor Author

@tustvold tustvold Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be using BufWriter at a higher level 馃 I guess we mostly don't run into this issue as we write large page sized blobs

@tustvold tustvold changed the title Use BufWriter when writing bloom filters (#3318) Use BufWriter when writing bloom filters and limit tests (#3318) Dec 9, 2022
@alamb alamb merged commit c215f49 into apache:master Dec 9, 2022
@alamb
Copy link
Contributor

alamb commented Dec 9, 2022

Thank you for the quick turnaround

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet arrow-writer tests take very long to run (over 60 seconds)
3 participants