Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#691] test :fix flaky test CoordinatorMetricsTest#testCoordinatorMetrics #694

Merged
merged 2 commits into from
Mar 9, 2023

Conversation

smallzhongfeng
Copy link
Contributor

What changes were proposed in this pull request?

Do not start quotaManager when performing ut test of indicators.

Why are the changes needed?

Fix: #691

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Fix ut.

@smallzhongfeng
Copy link
Contributor Author

Please don't merge first. I will trigger GA more times. Thanks !

@jerqi jerqi changed the title [#691] test :fix flaky test CoordinatorMetricsTest#testCoordinatorMetrics [DON'T MERGE][#691] test :fix flaky test CoordinatorMetricsTest#testCoordinatorMetrics Mar 8, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 8, 2023

Codecov Report

Merging #694 (c3f2d97) into master (00454c1) will increase coverage by 2.23%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master     #694      +/-   ##
============================================
+ Coverage     60.80%   63.03%   +2.23%     
  Complexity     1840     1840              
============================================
  Files           221      207      -14     
  Lines         12648    10684    -1964     
  Branches       1062     1062              
============================================
- Hits           7690     6735     -955     
+ Misses         4552     3603     -949     
+ Partials        406      346      -60     
Impacted Files Coverage Δ
...pkg/controller/sync/shuffleserver/shuffleserver.go
...y/kubernetes/operator/pkg/webhook/inspector/pod.go
deploy/kubernetes/operator/pkg/utils/certs.go
deploy/kubernetes/operator/pkg/utils/util.go
...y/kubernetes/operator/pkg/webhook/inspector/rss.go
...oy/kubernetes/operator/pkg/controller/util/util.go
...tor/pkg/controller/sync/coordinator/coordinator.go
...eploy/kubernetes/operator/pkg/utils/coordinator.go
deploy/kubernetes/operator/pkg/webhook/manager.go
...bernetes/operator/pkg/controller/controller/rss.go
... and 4 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@smallzhongfeng smallzhongfeng force-pushed the ISSUE-691 branch 2 times, most recently from bbb5b3d to 3712096 Compare March 8, 2023 14:38
@smallzhongfeng
Copy link
Contributor Author

I tried to find out why testCoordinatorMetrics in CoordinatorMetricsTest collected app_num in QuotaManagerTest # testCheckQuotaMetrics, but there are not many clues. At present, the simplest way is to filter out app_num in the indicators, do you have any good suggestions ? @jerqi @advancedxy @xianjingfeng

The result as follow:

Test metrics => [
{"name":"running_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_quota_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_load_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"remote_storage_in_used_path1","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_candidates_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user3"],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user4"],"value":0.0,"timestampMs":null},
{"name":"unhealthy_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_access_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"exclude_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null}]

@xianjingfeng
Copy link
Member

I tried to find out why testCoordinatorMetrics in CoordinatorMetricsTest collected app_num in QuotaManagerTest # testCheckQuotaMetrics, but there are not many clues. At present, the simplest way is to filter out app_num in the indicators, do you have any good suggestions ? @jerqi @advancedxy @xianjingfeng

The result as follow:

Test metrics => [
{"name":"running_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_quota_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_load_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"remote_storage_in_used_path1","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_candidates_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user3"],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user4"],"value":0.0,"timestampMs":null},
{"name":"unhealthy_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_access_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"exclude_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null}]

I think we can remove this test case. it doesn't make sense.

@jerqi
Copy link
Contributor

jerqi commented Mar 9, 2023

I tried to find out why testCoordinatorMetrics in CoordinatorMetricsTest collected app_num in QuotaManagerTest # testCheckQuotaMetrics, but there are not many clues. At present, the simplest way is to filter out app_num in the indicators, do you have any good suggestions ? @jerqi @advancedxy @xianjingfeng

The result as follow:

Test metrics => [
{"name":"running_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_quota_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_load_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"remote_storage_in_used_path1","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_candidates_denied_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_app_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user3"],"value":0.0,"timestampMs":null},
{"name":"app_num","labelNames":["user_name"],"labelValues":["user4"],"value":0.0,"timestampMs":null},
{"name":"unhealthy_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"total_access_request","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null},
{"name":"exclude_server_num","labelNames":[],"labelValues":[],"value":0.0,"timestampMs":null}]

We can filter them.

@smallzhongfeng smallzhongfeng changed the title [DON'T MERGE][#691] test :fix flaky test CoordinatorMetricsTest#testCoordinatorMetrics [#691] test :fix flaky test CoordinatorMetricsTest#testCoordinatorMetrics Mar 9, 2023
@smallzhongfeng
Copy link
Contributor Author

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, merged. thanks @smallzhongfeng

@jerqi jerqi merged commit 272c5c6 into apache:master Mar 9, 2023
assertEquals(10, actualObj.get("metrics").size());
int actualMetrics = 0;
for (JsonNode metrics : actualObj.get("metrics")) {
if (CoordinatorMetrics.APP_NUM_TO_USER.equals(metrics.get("name").textValue())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this change.

The better way is to call org.apache.uniffle.coordinator.metric.CoordinatorMetrics#clear in @beforeAll of this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smallzhongfeng tried, but it doesn't matter. This test is so flaky that we should fix this issue asap. So we use the simplest to fix first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advice, I have tried this method, but it still doesn't work. I don't think it's the reason why the monitoring didn't clean up, but when I ran the GA, it seems like reading the latest indicators of other uts. I tried to modify the default JETTY_ HTTP_ PORT and RPC_ SERVER_ PORT, but still not effective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flaky Test] CoordinatorMetricsTest#testCoordinatorMetrics
5 participants