Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR] docs: Add benchmark results #904

Merged
merged 4 commits into from
May 28, 2023
Merged

[MINOR] docs: Add benchmark results #904

merged 4 commits into from
May 28, 2023

Conversation

jerqi
Copy link
Contributor

@jerqi jerqi commented May 25, 2023

What changes were proposed in this pull request?

Add the benchmark data.

Why are the changes needed?

Attract more users. If we have benchmark data, users don't need to test by themselves. It will lower the barrier to use.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Just doc

@jerqi jerqi requested a review from zuston May 25, 2023 06:43
@codecov-commenter
Copy link

codecov-commenter commented May 25, 2023

Codecov Report

Merging #904 (ac4c82a) into master (3e58805) will increase coverage by 1.84%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master     #904      +/-   ##
============================================
+ Coverage     55.24%   57.08%   +1.84%     
  Complexity     2201     2201              
============================================
  Files           333      313      -20     
  Lines         16449    14089    -2360     
  Branches       1307     1307              
============================================
- Hits           9087     8043    -1044     
+ Misses         6851     5606    -1245     
+ Partials        511      440      -71     

see 20 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@zuston
Copy link
Member

zuston commented May 25, 2023

Great improvement.

How about adding Netty ?

@jerqi
Copy link
Contributor Author

jerqi commented May 25, 2023

Great improvement.

How about adding Netty

We should update the bencmark after finishing Netty feature.

@jerqi jerqi changed the title [MINOR] Add benchmark test data [MINOR] docs: Add benchmark test data May 25, 2023
@jerqi jerqi requested a review from kaijchen May 25, 2023 10:48
@kaijchen kaijchen changed the title [MINOR] docs: Add benchmark test data [MINOR] docs: Add benchmark results May 25, 2023
Copy link
Contributor

@kaijchen kaijchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jerqi for the benchmark result. Maybe we should add some charts along with the table to show the result intuitively.

Comment on lines 18 to 21
Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6
Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD
Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the format here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What format do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either add after each line, or add * before each line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What format do you prefer?

You are missing line breaks here, see preview.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

docs/benchmark.md Show resolved Hide resolved
Comment on lines +154 to +166
Overall Time:
![Overall Time](asset/rss_benchmark3.png)
Write Time:
![Write Time](asset/rss_benchmark2.png)
Read Time:
![Read Time](asset/rss_benchmark1.png)
#### vanilla Spark performance
Overall Time:
![Overall Time](asset/vanilla_benchmark1.png)
Write Time:
![Write Time](asset/vanilla_benchmark2.png)
Read Time:
![Read Time](asset/vanilla_benchmark3.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These screenshots looks blurry, is it possible to put raw data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don‘t have raw data now. Maybe we can update the data after finishing Netty.

docs/benchmark.md Outdated Show resolved Hide resolved
jerqi and others added 2 commits May 25, 2023 20:03
Co-authored-by: Kaijie Chen <ckj@apache.org>
@jerqi
Copy link
Contributor Author

jerqi commented May 25, 2023

Thanks @jerqi for the benchmark result. Maybe we should add some charts along with the table to show the result intuitively.

I tried. But there are too many queries. Charts won't give more effective information.

|query99|13|12|
|total|5821|6494|

Uniffle is a little 9% slower than vanilla Spark. Because the amount of shuffle is tiny.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perf is lower than vanilla Spark. Is it better to remove TPC-DS bench for now? We can add it back when we have better perf with large shuffle data.

Copy link
Contributor Author

@jerqi jerqi May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPC-DS situation can reflect the performance of tiny shuffle. We don't have too poor performance. It's acceptable.

@awdavidson
Copy link
Contributor

awdavidson commented May 25, 2023

It would be good to include the uniffle coordinator and shuffle-server configuration used when performing these benchmark e.g. heap, buffer sizes etc

@jerqi
Copy link
Contributor Author

jerqi commented May 25, 2023

It would be good to include the uniffle coordinator and shuffle-server configuration used when performing these benchmark e.g. heap, buffer sizes etc

This is our previous benchmark results. I can't find the complete configuration. But we can do new benchmark after finishing Netty feature.

@jerqi jerqi requested a review from kaijchen May 26, 2023 06:33
@jerqi
Copy link
Contributor Author

jerqi commented May 26, 2023

@zuston @kaijchen @jiafuzha I think we should approve this pr first and merge it. It will be better than nothing. When Netty feature is finished, we can update benchmark results.

@connorlwilkes
Copy link
Contributor

1TB is quite a small dataset, has anyone benchmarked more than this and if so what did the results/config look like?

@jerqi
Copy link
Contributor Author

jerqi commented May 26, 2023

1TB is quite a small dataset, has anyone benchmarked more than this and if so what did the results/config look like?

What dataset do you hope to add?

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this firstly

@jerqi jerqi merged commit 7ba062d into apache:master May 28, 2023
25 checks passed
@jerqi
Copy link
Contributor Author

jerqi commented May 28, 2023

Thanks all, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants