[BUG] Fix incorrect spark metrics #324

Merged
merged 6 commits into from
Nov 18, 2022

Conversation

Member

@zuston zuston commented Nov 15, 2022

What changes were proposed in this pull request?

Fix incorrect spark metrics

Why are the changes needed?

  1. The shuffle-read record count and the corresponding shuffle-write record count are inconsistent in our internal cluster.
  2. The log does not show the correct number of fetched bytes; it always reports 0, for example:

22/11/15 13:54:53 INFO RssShuffleDataIterator: Fetch 0 bytes cost 30791 ms and 53 ms to serialize, 347 ms to decompress with unCompressionLength[274815736]
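
To make the two symptoms concrete, here is a minimal sketch of the bookkeeping such a log line and record count depend on. It is not the patch itself and does not claim to show the actual root cause. Every name in it (`FetchMetricsSketch`, `Block`, `ReadMetrics`, `readAll`, `summary`) is hypothetical and is not part of Uniffle's or Spark's API; the block sizes and the writer-side record count are made-up toy values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these classes are illustrative and are not Uniffle's API.
final class FetchMetricsSketch {

  /** One fetched block: bytes on the wire, bytes after decompression, records inside. */
  static final class Block {
    final long compressedBytes;
    final long uncompressedBytes;
    final long records;

    Block(long compressedBytes, long uncompressedBytes, long records) {
      this.compressedBytes = compressedBytes;
      this.uncompressedBytes = uncompressedBytes;
      this.records = records;
    }
  }

  /** Running totals that a reader would report to the engine and to its log. */
  static final class ReadMetrics {
    long fetchedBytes;
    long uncompressedBytes;
    long recordsRead;
  }

  /** Consumes every block and updates all three counters together. */
  static ReadMetrics readAll(List<Block> blocks) {
    ReadMetrics metrics = new ReadMetrics();
    for (Block block : blocks) {
      // If this update is skipped, a summary like the one above keeps saying "Fetch 0 bytes".
      metrics.fetchedBytes += block.compressedBytes;
      metrics.uncompressedBytes += block.uncompressedBytes;
      metrics.recordsRead += block.records;
    }
    return metrics;
  }

  /** Builds a shortened, assumed variant of the log message quoted in this PR. */
  static String summary(ReadMetrics metrics) {
    return "Fetch " + metrics.fetchedBytes + " bytes with unCompressionLength["
        + metrics.uncompressedBytes + "]";
  }

  public static void main(String[] args) {
    List<Block> blocks = Arrays.asList(new Block(100, 400, 3), new Block(50, 200, 2));
    ReadMetrics metrics = readAll(blocks);
    long recordsWritten = 5; // assumed writer-side count for this toy input
    System.out.println(summary(metrics));
    System.out.println("records consistent: " + (metrics.recordsRead == recordsWritten));
  }
}
```

A test in the spirit of `WriteAndReadMetricsTest` would presumably compare the writer-side and reader-side record counts in the same way as the last line of `main`.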

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. UTs
  2. Online spark3 jobs test

Member Author

zuston commented Nov 15, 2022

PTAL @jerqi @kaijchen @leixm


codecov-commenter commented Nov 15, 2022

Codecov Report

Merging #324 (664b308) into master (eae2621) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #324      +/-   ##
============================================
+ Coverage     61.21%   61.32%   +0.11%     
- Complexity     1506     1526      +20     
============================================
  Files           185      186       +1     
  Lines          9360     9441      +81     
  Branches        908      924      +16     
============================================
+ Hits           5730     5790      +60     
- Misses         3325     3341      +16     
- Partials        305      310       +5     
| Impacted Files | Coverage Δ |
|---|---|
| ...e/spark/shuffle/reader/RssShuffleDataIterator.java | 90.90% <100.00%> (+0.36%) ⬆️ |
| .../apache/uniffle/common/metrics/MetricsManager.java | 68.42% <0.00%> (-17.30%) ⬇️ |
| ...org/apache/uniffle/common/metrics/GRPCMetrics.java | 40.00% <0.00%> (-6.52%) ⬇️ |
| ...pache/uniffle/server/ShuffleServerGrpcService.java | 0.83% <0.00%> (-0.04%) ⬇️ |
| ...pache/uniffle/server/ShuffleServerGrpcMetrics.java | 100.00% <0.00%> (ø) |
| ...apache/uniffle/coordinator/ApplicationManager.java | 83.80% <0.00%> (ø) |
| ...fle/coordinator/AbstractSelectStorageStrategy.java | 20.00% <0.00%> (ø) |
| ...e/coordinator/AppBalanceSelectStorageStrategy.java | 72.00% <0.00%> (ø) |
| ...java/org/apache/uniffle/coordinator/RankValue.java | 70.00% <0.00%> (ø) |
| .../org/apache/uniffle/server/ShuffleTaskManager.java | 77.23% <0.00%> (+0.17%) ⬆️ |

... and 3 more


Member Author

zuston commented Nov 17, 2022

I'm confused about why WriteAndReadMetricsTest fails on spark3.2.x. Could you help check? @jerqi

Contributor

jerqi commented Nov 17, 2022

Could you debug it on your local machine?

Member Author

zuston commented Nov 18, 2022

Could you debug it on your local machine?

Fixed @jerqi PTAL

Contributor

@jerqi jerqi left a comment

LGTM, thanks @zuston

@jerqi jerqi merged commit 79d2f54 into apache:master Nov 18, 2022
kaijchen added a commit to kaijchen/incubator-uniffle that referenced this pull request Nov 18, 2022
Co-authored-by: Kaijie Chen <ckj@apache.org>
kaijchen pushed a commit that referenced this pull request Nov 18, 2022
import org.junit.jupiter.api.Test;
import scala.collection.Seq;

public class WriteAndReadMetricsTest extends SimpleTestBase {
3 participants