
GH-41159: [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance #41160

Merged (3 commits), Apr 12, 2024


@DuanWeiFan DuanWeiFan commented Apr 11, 2024

GH-41159

Rationale for this change

This change improves Parquet FileWriter performance when writing Parquet files from Arrow records.
After making this change, we saw write throughput improve from 320k rows/sec to 650k rows/sec.

What changes are included in this PR?

This PR makes the bitWriter reuse its `buf` variable when writing Parquet files, rather than creating a new buffer for each write.
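To illustrate the general idea of reusing a scratch buffer for varint (VLQ) encoding, here is a minimal sketch. It is not the Arrow code: `allocWriter`, `reuseWriter`, and `encodeBoth` are hypothetical names invented for this example, and it assumes the writer hands its bytes to an `io.Writer`. Only `encoding/binary.PutUvarint` and `binary.MaxVarintLen64` are real standard-library identifiers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// allocWriter (hypothetical) creates a fresh scratch slice on every call.
// Because the slice is passed to an interface method, it typically
// escapes to the heap, costing one allocation per call.
type allocWriter struct{ out io.Writer }

func (w *allocWriter) WriteVlqInt(v uint64) error {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	_, err := w.out.Write(buf[:n])
	return err
}

// reuseWriter (hypothetical) keeps one fixed-size scratch array in the
// struct and reuses it for every call.
type reuseWriter struct {
	out io.Writer
	buf [binary.MaxVarintLen64]byte
}

func (w *reuseWriter) WriteVlqInt(v uint64) error {
	n := binary.PutUvarint(w.buf[:], v)
	_, err := w.out.Write(w.buf[:n])
	return err
}

// encodeBoth encodes vals with both writers and returns the two outputs,
// which should be byte-for-byte identical.
func encodeBoth(vals []uint64) (allocated, reused []byte) {
	var a, r bytes.Buffer
	aw := &allocWriter{out: &a}
	rw := &reuseWriter{out: &r}
	for _, v := range vals {
		if err := aw.WriteVlqInt(v); err != nil {
			panic(err)
		}
		if err := rw.WriteVlqInt(v); err != nil {
			panic(err)
		}
	}
	return a.Bytes(), r.Bytes()
}

func main() {
	a, r := encodeBoth([]uint64{0, 1, 127, 128, 300, 1 << 40})
	fmt.Println("outputs identical:", bytes.Equal(a, r))
}
```

Both writers produce the same bytes; the difference is only in where the scratch space lives. One trade-off of the reusing variant is that the writer must not be used from multiple goroutines at once, since they would share the scratch array.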

Are these changes tested?

Yes

Are there any user-facing changes?

No

Authored-by: @hhoughgg


⚠️ GitHub issue #41159 has been automatically assigned in GitHub to PR creator.

@DuanWeiFan DuanWeiFan marked this pull request as ready for review April 11, 2024 22:58
@kou kou changed the title GH-41159: [Go] [Parquet] Improvement Parquet BitWriter WriteVlqInt Performance GH-41159: [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance Apr 12, 2024

@zeroshade zeroshade left a comment


Thanks for this! It's a great find. Do you think you could add a benchmark into the test file to track this so that we can make sure we don't introduce any performance regressions in the future?

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Apr 12, 2024
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Apr 12, 2024
@DuanWeiFan

Thanks for checking, @zeroshade.
Yes, I just added a benchmark for the method to the PR.

Here are the benchmark results before and after the code change:

--- Before
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/v15/parquet/internal/utils
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkBitWriter
BenchmarkBitWriter-12    	40391650	        28.91 ns/op	      17 B/op	       1 allocs/op
PASS
ok  	github.com/apache/arrow/go/v15/parquet/internal/utils	3.107s

--- After
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/v15/parquet/internal/utils
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkBitWriter
BenchmarkBitWriter-12    	107185990	         9.702 ns/op	       1 B/op	       0 allocs/op
PASS
ok  	github.com/apache/arrow/go/v15/parquet/internal/utils	3.111s
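For readers unfamiliar with Go benchmarks, the sketch below shows the general shape of a benchmark that reports `ns/op`, `B/op`, and `allocs/op` like the output above. It is not the benchmark added in this PR: `vlqWriter`, `BenchmarkWriteVlqInt`, and `runVlqBenchmark` are hypothetical names, and the writer is a simplified stand-in that discards its output. Placed in a `_test.go` file, the benchmark function would run under `go test -bench`; here it is driven with `testing.Benchmark` so the example runs as a standalone program.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"testing"
)

// vlqWriter is a hypothetical stand-in for a bit writer; it reuses one
// scratch array for every varint it encodes.
type vlqWriter struct {
	out io.Writer
	buf [binary.MaxVarintLen64]byte
}

func (w *vlqWriter) WriteVlqInt(v uint64) bool {
	n := binary.PutUvarint(w.buf[:], v)
	_, err := w.out.Write(w.buf[:n])
	return err == nil
}

// BenchmarkWriteVlqInt measures one WriteVlqInt call per iteration.
func BenchmarkWriteVlqInt(b *testing.B) {
	w := &vlqWriter{out: io.Discard}
	b.ReportAllocs() // include B/op and allocs/op in the report
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		w.WriteVlqInt(uint64(i))
	}
}

// runVlqBenchmark runs the benchmark without the go test command.
func runVlqBenchmark() testing.BenchmarkResult {
	return testing.Benchmark(BenchmarkWriteVlqInt)
}

func main() {
	res := runVlqBenchmark()
	fmt.Println(res.String(), res.MemString())
}
```

The `allocs/op` column is the one to watch for a regression of this kind, since timing figures vary from machine to machine while allocation counts are largely stable.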


@zeroshade zeroshade left a comment


Fantastic, thanks! I'll merge once CI completes

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label Apr 12, 2024
@zeroshade zeroshade merged commit ec2d7cb into apache:main Apr 12, 2024
22 of 25 checks passed
@zeroshade zeroshade removed the "awaiting merge" label Apr 12, 2024

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit ec2d7cb.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Apr 15, 2024
…nt Performance (apache#41160)


Lead-authored-by: Andy Fan <duan-wei@cloudflare.com>
Co-authored-by: andyfan <52736754+DuanWeiFan@users.noreply.github.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…nt Performance (apache#41160)

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…nt Performance (apache#41160)
