feat: enable concurrent write #3214

Merged
3 commits merged into GreptimeTeam:main from the feat/enable-concurrent-write branch, Jan 23, 2024

@WenyXu WenyXu commented Jan 22, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

  1. Set DEFAULT_WRITE_CONCURRENT to 8.
  2. Enable concurrent writes for COPY TO and ParquetWriter.
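
As an illustration of what a write-concurrency limit means, here is a minimal sketch using only the Rust standard library. It is not the code in this PR: `PartStore`, `concurrent_write`, and `WRITE_CONCURRENCY` are hypothetical names, the "object store" is an in-memory map, and threads stand in for whatever asynchronous upload mechanism the real writers use. The only detail taken from the PR is the default value of 8.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the default described in this PR (8).
const WRITE_CONCURRENCY: usize = 8;

/// Hypothetical in-memory "object store" that records uploaded parts by index.
#[derive(Default)]
struct PartStore {
    parts: Mutex<BTreeMap<usize, Vec<u8>>>,
}

impl PartStore {
    /// Store one part; a real backend would perform a network upload here.
    fn put_part(&self, index: usize, data: Vec<u8>) {
        self.parts.lock().unwrap().insert(index, data);
    }

    /// Reassemble the parts in index order.
    fn assemble(&self) -> Vec<u8> {
        let parts = self.parts.lock().unwrap();
        let bytes: Vec<u8> = parts.values().flatten().copied().collect();
        bytes
    }
}

/// Write `data` in `chunk_size` parts with at most `concurrency` parts in flight.
fn concurrent_write(store: &PartStore, data: &[u8], chunk_size: usize, concurrency: usize) {
    assert!(chunk_size > 0 && concurrency > 0);
    // Shared queue of (index, chunk) work items.
    let queue = Mutex::new(data.chunks(chunk_size).enumerate());
    thread::scope(|s| {
        for _ in 0..concurrency {
            s.spawn(|| loop {
                // Take the next chunk; the lock is released before the upload starts.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((index, chunk)) => store.put_part(index, chunk.to_vec()),
                    None => break,
                }
            });
        }
    });
}

fn main() {
    let data: Vec<u8> = (0..=255u8).cycle().take(10_000).collect();
    let store = PartStore::default();
    concurrent_write(&store, &data, 1024, WRITE_CONCURRENCY);
    // Parts may complete out of order, but reassembly by index restores the input.
    assert_eq!(store.assemble(), data);
    println!("wrote {} bytes", data.len());
}
```

The number of workers bounds how many parts are in flight at once, which is the knob the default of 8 controls in spirit. A higher value can hide per-request latency at the cost of more buffered memory and more simultaneous requests.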

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

@github-actions github-actions bot added the docs-not-required and Size: S labels Jan 22, 2024
@WenyXu WenyXu force-pushed the feat/enable-concurrent-write branch from 18578c0 to f97c0eb January 22, 2024 14:16

codecov bot commented Jan 22, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (2bf4b08) 85.82% compared to head (9519deb) 85.44%.
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3214      +/-   ##
==========================================
- Coverage   85.82%   85.44%   -0.38%     
==========================================
  Files         840      844       +4     
  Lines      137704   138639     +935     
==========================================
+ Hits       118183   118463     +280     
- Misses      19521    20176     +655     

@killme2008 killme2008 left a comment

Good job! Could we provide some benchmark results for this feature? And I wonder if the default value of 8 is the best practice for most common usage.

src/operator/src/statement/copy_table_to.rs (outdated, resolved)
src/mito2/src/sst.rs (outdated, resolved)

WenyXu commented Jan 22, 2024

> Good job! Could we provide some benchmark results for this feature? And I wonder if the default value of 8 is the best practice for most common usage.

I will create a benchmark that measures throughput and latency under different write concurrency settings. BTW, based on some simple experiments I ran earlier on an EC2 instance with 10 Gbps of bandwidth, setting the write concurrency to 25 let the program reach a throughput of 500 MiB/s with a p50 latency of ~250 ms.
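
To make the figures above easier to compare across concurrency settings, here is a small sketch of the summary arithmetic such a benchmark might report. The helper names (`p50`, `throughput_mib_per_s`, `line_rate_mib_per_s`) are hypothetical and not part of this PR, and the nearest-rank median is just one reasonable choice of percentile definition.

```rust
use std::time::Duration;

/// Median (p50) of per-request latencies, or None if there are no samples.
fn p50(latencies: &[Duration]) -> Option<Duration> {
    if latencies.is_empty() {
        return None;
    }
    let mut sorted = latencies.to_vec();
    sorted.sort();
    // Nearest-rank median: the lower middle element for even-length inputs.
    Some(sorted[(sorted.len() - 1) / 2])
}

/// Throughput in MiB/s for `bytes` written over `elapsed` wall-clock time.
fn throughput_mib_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}

/// Line rate of a link in MiB/s, given its bandwidth in Gbps (10^9 bits per second).
fn line_rate_mib_per_s(gbps: f64) -> f64 {
    gbps * 1e9 / 8.0 / (1024.0 * 1024.0)
}

fn main() {
    // Figures quoted in the comment: 10 Gbps link, 500 MiB/s observed.
    let line_rate = line_rate_mib_per_s(10.0);
    let observed = 500.0;
    println!(
        "line rate: {:.0} MiB/s, utilization: {:.0}%",
        line_rate,
        observed / line_rate * 100.0
    );
}
```

By this arithmetic a 10 Gbps link tops out at roughly 1192 MiB/s, so 500 MiB/s is about 42% of line rate, which suggests there may be headroom to explore when picking a default.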

src/common/datasource/src/file_format.rs (outdated, resolved)
src/common/datasource/src/file_format/csv.rs (outdated, resolved)

@killme2008 killme2008 left a comment

LGTM

@killme2008 killme2008 added this pull request to the merge queue Jan 23, 2024
Merged via the queue into GreptimeTeam:main with commit 26535f5 Jan 23, 2024
15 checks passed
@killme2008 killme2008 added this to the v0.7 milestone Mar 5, 2024