[MINOR] docs: Add benchmark results #904
Conversation
Codecov Report
@@ Coverage Diff @@
## master #904 +/- ##
============================================
+ Coverage 55.24% 57.08% +1.84%
Complexity 2201 2201
============================================
Files 333 313 -20
Lines 16449 14089 -2360
Branches 1307 1307
============================================
- Hits 9087 8043 -1044
+ Misses 6851 5606 -1245
+ Partials 511 440 -71
Great improvement. How about adding
We should update the benchmark after finishing the Netty feature.
Thanks @jerqi for the benchmark results. Maybe we should add some charts along with the table to show the results more intuitively.
docs/benchmark.md
Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6
Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD
Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD
please fix the format here.
What format do you prefer?
Either add a line break after each line, or add `*` before each line.
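For illustration, here is a minimal sketch of how either fix could look in docs/benchmark.md, reusing the lines quoted above (this assumes GitHub-flavored Markdown; `<br/>` is one common way to force a line break and is shown only as an assumption about the intended fix):

```markdown
<!-- Option 1: prefix each line with "*" to render a bullet list -->
* Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6
* Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
* Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD
* Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD

<!-- Option 2: append an explicit line break after each line -->
Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6<br/>
Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s<br/>
```

Either form renders each item on its own line in the GitHub preview.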
What format do you prefer?
You are missing line breaks here, see preview.
done.
Overall Time:
![Overall Time](asset/rss_benchmark3.png)
Write Time:
![Write Time](asset/rss_benchmark2.png)
Read Time:
![Read Time](asset/rss_benchmark1.png)
#### vanilla Spark performance
Overall Time:
![Overall Time](asset/vanilla_benchmark1.png)
Write Time:
![Write Time](asset/vanilla_benchmark2.png)
Read Time:
![Read Time](asset/vanilla_benchmark3.png)
These screenshots look blurry; is it possible to put the raw data?
I don't have the raw data now. Maybe we can update the data after finishing the Netty feature.
Co-authored-by: Kaijie Chen <ckj@apache.org>
I tried, but there are too many queries; charts won't convey more useful information.
|query99|13|12|
|total|5821|6494|
Uniffle is slightly (about 9%) slower than vanilla Spark, because the amount of shuffle data is tiny.
The perf is lower than vanilla Spark's. Would it be better to remove the TPC-DS bench for now? We can add it back when we have better perf with large shuffle data.
The TPC-DS case reflects performance with tiny shuffle data. The performance isn't too poor; it's acceptable.
It would be good to include the Uniffle coordinator and shuffle-server configuration used when performing these benchmarks, e.g. heap size, buffer sizes, etc.
These are our previous benchmark results, and I can't find the complete configuration. But we can run a new benchmark after finishing the Netty feature.
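For context, a configuration section of the kind requested above could be sketched in docs/benchmark.md roughly as follows. The heap sizes and buffer values are purely illustrative assumptions (not the settings actually used in this run), and the `rss.*` key names should be verified against the Uniffle configuration documentation:

```markdown
#### Benchmark configuration (illustrative placeholder only)
* Coordinator JVM heap: e.g. -Xmx8g             <!-- assumed value -->
* Shuffle Server JVM heap: e.g. -Xmx64g         <!-- assumed value -->
* rss.server.buffer.capacity: e.g. 40g          <!-- assumed value -->
* rss.server.read.buffer.capacity: e.g. 20g     <!-- assumed value -->
* rss.storage.type: e.g. MEMORY_LOCALFILE_HDFS  <!-- assumed value -->
* rss.storage.basePath: local data directories used by each shuffle server
```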
1TB is quite a small dataset; has anyone benchmarked more than this, and if so, what did the results/config look like?
What dataset do you hope to add?
Let's merge this first.
Thanks all, merged.
What changes were proposed in this pull request?
Add the benchmark data.
Why are the changes needed?
Attract more users. If we provide benchmark data, users don't need to run tests by themselves, which lowers the barrier to adoption.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Doc change only.