-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky Test: GrpcServerTest#testGrpcExecutorPool #388
Comments
Is |
Emm... Good question. |
I feel it's difficult to fix flaky test. WDYT? |
If it's easy to fix, we should fix the flaky test immediately. |
Yes. Maybe if we have the clear fix solution or small easy features, it could be mark it as good first issue. |
This problem still exists (https://github.com/apache/incubator-uniffle/actions/runs/5518411958/jobs/10062325314?pr=982). The testGrpcExecutorPool has two state, then verify the metrics. I think the problem is that we can't make sure to enter first state. Because the three thread only sleep 200ms, in this patch sleep 120ms to enter the first state, the mistake is only 80ms, it is too short. So I think two way to solve this problem: |
I prefer solution a. |
@zhengchenyu There is a fix. #1023 |
### What changes were proposed in this pull request? I observed a previous fix injected some delay to fix the flaky test, but I believe that fix might be unstable in a CI environment or when run on different machines, given the dependency on some constant wait time. I suggest a new way to fix the test by adding some synchronization for the test execution only. I at first identify the source code location whose slow execution leads to the flaky test failure, where if common/src/main/java/org/apache/uniffle/common/rpc/GrpcServer.java#157 slows the grpcMetrics.incGauge() and grpcMetrics.setGauge(), then the test fails (https://github.com/apache/incubator-uniffle/blob/master/common/src/main/java/org/apache/uniffle/common/rpc/GrpcServer.java#L157 and https://github.com/apache/incubator-uniffle/blob/master/common/src/main/java/org/apache/uniffle/common/rpc/GrpcServer.java#L158). Hence, I introduce one variable in this class [GrpcServer.java](https://github.com/apache/incubator-uniffle/blob/master/common/src/main/java/org/apache/uniffle/common/rpc/GrpcServer.java#L158) that is only there to provide some synchronization. Basically, until these two statements are executed, I force the thread that the test runs on to wait before it accesses the value of grpcMetrics.getGaugeMap(). The waiting location is at https://github.com/apache/incubator-uniffle/blob/master/common/src/test/java/org/apache/uniffle/common/rpc/GrpcServerTest.java#L72 ### Why are the changes needed? This test is flakily fails. I run this test many times and it makes assertion fails. The failure message is as follows. Failure: [INFO] Running org.apache.uniffle.common.rpc.GrpcServerTest [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.185 s <<< FAILURE! - in org.apache.uniffle.common.rpc.GrpcServerTest [ERROR] testGrpcExecutorPool Time elapsed: 0.142 s <<< FAILURE! org.opentest4j.AssertionFailedError: expected: <2.0> but was: <0.0> at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:70) at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:65) at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:885) at org.apache.uniffle.common.rpc.GrpcServerTest.testGrpcExecutorPool(GrpcServerTest.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) [ERROR] Failures: [ERROR] GrpcServerTest.testGrpcExecutorPool:76 expected: <2.0> but was: <0.0> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0 ### How was this patch tested? I run this test more than 1000 times and the test always passes. Co-authored-by: other <other@ECE-A55006.austin.utexas.edu>
Code of Conduct
Search before asking
Describe the flaky test
Actions URL
https://github.com/apache/incubator-uniffle/actions/runs/3625931180/jobs/6114463345
Parent issue
#1733
Are you willing to submit PR?
The text was updated successfully, but these errors were encountered: