[MINOR] docs: Add benchmark results #904
Conversation
Codecov Report
@@ Coverage Diff @@
## master #904 +/- ##
============================================
+ Coverage 55.24% 57.08% +1.84%
Complexity 2201 2201
============================================
Files 333 313 -20
Lines 16449 14089 -2360
Branches 1307 1307
============================================
- Hits 9087 8043 -1044
+ Misses 6851 5606 -1245
+ Partials 511 440 -71
Great improvement. How about adding
We should update the benchmark after finishing the Netty feature.
Thanks @jerqi for the benchmark results. Maybe we should add some charts along with the table to show the results more intuitively.
docs/benchmark.md
Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6
Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD
Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD
please fix the format here.
What format do you prefer?
Either add a line break after each line, or add `*` before each line.
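For illustration, here is a minimal sketch of how either fix could look in docs/benchmark.md, reusing the lines quoted above (this assumes GitHub-flavored Markdown; `<br/>` is one common way to force a line break and is shown only as an assumption about the intended fix):

```markdown
<!-- Option 1: prefix each line with "*" to render a bullet list -->
* Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6
* Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
* Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD
* Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD

<!-- Option 2: append an explicit line break after each line -->
Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6<br/>
Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s<br/>
```

Either form renders each item on its own line in the GitHub preview.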
What format do you prefer?
You are missing line breaks here, see preview.
done.
Overall Time:
![Overall Time](asset/rss_benchmark3.png)
Write Time:
![Write Time](asset/rss_benchmark2.png)
Read Time:
![Read Time](asset/rss_benchmark1.png)
#### vanilla Spark performance
Overall Time:
![Overall Time](asset/vanilla_benchmark1.png)
Write Time:
![Write Time](asset/vanilla_benchmark2.png)
Read Time:
![Read Time](asset/vanilla_benchmark3.png)
These screenshots look blurry; is it possible to put the raw data?
I don't have the raw data now. Maybe we can update the data after finishing the Netty feature.
Co-authored-by: Kaijie Chen <ckj@apache.org>
I tried, but there are too many queries; charts won't convey more useful information.
|query99|13|12|
|total|5821|6494|
Uniffle is slightly (about 9%) slower than vanilla Spark, because the amount of shuffle data is tiny.
The perf is lower than vanilla Spark's. Would it be better to remove the TPC-DS bench for now? We can add it back when we have better perf with large shuffle data.
The TPC-DS case reflects performance with tiny shuffle data. The performance isn't too poor; it's acceptable.
It would be good to include the Uniffle coordinator and shuffle-server configuration used when performing these benchmarks, e.g. heap size, buffer sizes, etc.
These are our previous benchmark results, and I can't find the complete configuration. But we can run a new benchmark after finishing the Netty feature.
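For context, a configuration section of the kind requested above could be sketched in docs/benchmark.md roughly as follows. The heap sizes and buffer values are purely illustrative assumptions (not the settings actually used in this run), and the `rss.*` key names should be verified against the Uniffle configuration documentation:

```markdown
#### Benchmark configuration (illustrative placeholder only)
* Coordinator JVM heap: e.g. -Xmx8g             <!-- assumed value -->
* Shuffle Server JVM heap: e.g. -Xmx64g         <!-- assumed value -->
* rss.server.buffer.capacity: e.g. 40g          <!-- assumed value -->
* rss.server.read.buffer.capacity: e.g. 20g     <!-- assumed value -->
* rss.storage.type: e.g. MEMORY_LOCALFILE_HDFS  <!-- assumed value -->
* rss.storage.basePath: local data directories used by each shuffle server
```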
1TB is quite a small dataset; has anyone benchmarked more than this, and if so, what did the results/config look like?
What dataset do you hope to add?
Let's merge this first.
Thanks all, merged.
What changes were proposed in this pull request?
Add the benchmark data.
Why are the changes needed?
Attract more users. If we provide benchmark data, users don't need to run tests by themselves, which lowers the barrier to adoption.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Doc change only.