Remove unnecessary flush calls on TrackedWrite #3374

Merged
tustvold merged 1 commit into apache:master on Dec 20, 2022

Conversation

viirya
Member

@viirya viirya commented Dec 20, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

TrackedWrite wraps a BufWriter, which buffers output and flushes on its own whenever its buffer fills. We therefore don't need to call flush at intermediate points; flushing once when the writer is closed is enough.

It seems to be a simple win without regression.

write_batch primitive/4096 values primitive non-null
                        time:   [639.52 µs 642.46 µs 646.64 µs]
                        thrpt:  [267.70 MiB/s 269.44 MiB/s 270.68 MiB/s]
                 change:
                        time:   [-7.1374% -6.3056% -5.1252%] (p = 0.00 < 0.05)
                        thrpt:  [+5.4021% +6.7299% +7.6860%]
                        Performance has improved.
write_batch primitive/4096 values bool non-null
                        time:   [81.192 µs 83.273 µs 86.123 µs]
                        thrpt:  [7.7071 MiB/s 7.9708 MiB/s 8.1752 MiB/s]
                 change:
                        time:   [-6.9304% -4.3398% -1.8032%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8363% +4.5367% +7.4465%]
                        Performance has improved.
write_batch primitive/4096 values string non-null
                        time:   [447.68 µs 448.90 µs 450.81 µs]
                        thrpt:  [174.48 MiB/s 175.23 MiB/s 175.71 MiB/s]
                 change:
                        time:   [-9.6784% -9.1649% -8.6186%] (p = 0.00 < 0.05)
                        thrpt:  [+9.4314% +10.090% +10.715%]
                        Performance has improved.
write_batch nested/4096 values primitive list non-null
                        time:   [1.2610 ms 1.2647 ms 1.2708 ms]
                        thrpt:  [149.95 MiB/s 150.67 MiB/s 151.12 MiB/s]
                 change:
                        time:   [-4.5901% -3.9472% -3.2749%] (p = 0.00 < 0.05)
                        thrpt:  [+3.3858% +4.1094% +4.8109%]
                        Performance has improved.
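
The rationale above can be illustrated with a minimal sketch. TrackedWriteSketch and write_pages are hypothetical names invented for this example and are not the parquet crate's actual TrackedWrite; the only assumption taken from the description is that the writer wraps a std::io::BufWriter. The sketch writes several chunks with no intermediate flush and flushes once when the writer is consumed.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for TrackedWrite: a BufWriter plus a byte counter.
pub struct TrackedWriteSketch<W: Write> {
    inner: BufWriter<W>,
    bytes_written: usize,
}

impl<W: Write> TrackedWriteSketch<W> {
    pub fn new(sink: W) -> Self {
        Self { inner: BufWriter::new(sink), bytes_written: 0 }
    }

    /// Bytes accepted so far, whether or not they have reached the sink yet.
    pub fn bytes_written(&self) -> usize {
        self.bytes_written
    }

    /// Flush the buffer exactly once and hand back the underlying sink.
    pub fn into_inner(self) -> io::Result<W> {
        self.inner.into_inner().map_err(|e| e.into_error())
    }
}

impl<W: Write> Write for TrackedWriteSketch<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // BufWriter decides by itself when the buffer must be written out.
        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Write every chunk without an intermediate flush, then close the writer.
pub fn write_pages(pages: &[&str]) -> io::Result<(usize, Vec<u8>)> {
    let mut writer = TrackedWriteSketch::new(Vec::new());
    for page in pages {
        writer.write_all(page.as_bytes())?; // no flush between pages
    }
    let total = writer.bytes_written();
    let sink = writer.into_inner()?; // the single flush happens here
    Ok((total, sink))
}
```

Calling write_pages(&["abc", "defg"]) hands back the byte count and the sink with all bytes present, even though flush was never called between the writes.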

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 20, 2022
Contributor

@tustvold tustvold left a comment

Approved, provided that the removed flush isn't needed to flush Thrift encoder state.

@@ -326,7 +324,6 @@ impl<W: Write> SerializedFileWriter<W> {
{
let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
file_metadata.write_to_out_protocol(&mut protocol)?;
protocol.flush()?;
Contributor

Is there some need to flush the protocol before drop, or does this simply flush the inner writer?

Member Author

@viirya viirya Dec 20, 2022

It simply flushes the inner writer (a TWriteTransport: Write), which is TrackedWrite.
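
The point of this exchange can be sketched as follows, under stated assumptions. StatelessEncoder, FlushCounter and encode are hypothetical names made up for illustration; they are not the thrift crate's TCompactOutputProtocol, and the little-endian encoding is not Thrift's wire format. The sketch only models an encoder that holds no buffered state of its own, so its flush does nothing except forward to the transport, and skipping it leaves the written bytes unchanged.

```rust
use std::io::{self, Write};

/// Hypothetical sink that records its data and how often flush is called.
#[derive(Default)]
pub struct FlushCounter {
    pub data: Vec<u8>,
    pub flushes: usize,
}

impl Write for FlushCounter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

/// Hypothetical encoder that keeps no state beyond a borrowed transport.
pub struct StatelessEncoder<'a, W: Write> {
    transport: &'a mut W,
}

impl<'a, W: Write> StatelessEncoder<'a, W> {
    pub fn new(transport: &'a mut W) -> Self {
        Self { transport }
    }

    /// Encode one value straight into the transport (not Thrift's encoding).
    pub fn write_i32(&mut self, value: i32) -> io::Result<()> {
        self.transport.write_all(&value.to_le_bytes())
    }

    /// Nothing is buffered here, so flushing only forwards to the transport.
    pub fn flush(&mut self) -> io::Result<()> {
        self.transport.flush()
    }
}

/// Encode the values, optionally flushing the encoder before it is dropped.
pub fn encode(values: &[i32], flush_encoder: bool) -> io::Result<FlushCounter> {
    let mut sink = FlushCounter::default();
    {
        let mut encoder = StatelessEncoder::new(&mut sink);
        for value in values {
            encoder.write_i32(*value)?;
        }
        if flush_encoder {
            encoder.flush()?;
        }
    }
    Ok(sink)
}
```

With this model, encode(&[1, 2], true) and encode(&[1, 2], false) produce identical data; the only difference is whether the transport saw a flush call.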

@tustvold tustvold merged commit c1c97f1 into apache:master Dec 20, 2022
@ursabot

ursabot commented Dec 20, 2022

Benchmark runs are scheduled for baseline = 8b84d4d and contender = c1c97f1. c1c97f1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
parquet Changes to the parquet crate
3 participants