New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-3121: Fix flaky metrics tests in storm-core #2735
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks great and I ran Travis build 5 times and all succeed, though the test before the patch is also failing rarely so it may not be right approach to verify.
Just would like to confirm my understanding of the issue before giving +1.
(try | ||
(do | ||
(wait-for-atleast-N-buckets! min-buckets comp-id metric-name cluster) | ||
(.until |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as far as I understand the description of issue, the metric value may not be added to exactly N bucket though we simulate the time, so we need to wait more to check if condition is met, and though we are now having another waiting logic, we still have to call wait-for-atleast-N-buckets!
because of advancing simulated time, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about the metric value not being added to exactly bucket N is right. The reason we call wait-for-atleast-N-buckets
is to preserve the existing test logic as much as possible. For example, one of the replaced assert-buckets
were asserting that a metric had buckets [1 0 0 0 0 0]
. The new assert would just assert that the running sum is 1. By also waiting for 6 buckets, we are able to assert that the running sum is 1 within the first 6 buckets, rather than potentially stopping the check as soon as the first bucket is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the details. Now I can understand what the patch is addressing clearer.
I saw failures in the metrics tests very often when adding Java 10 to the testing matrix, e.g. https://travis-ci.org/srdo/storm/builds/395663084. Maybe Travis runs the tests on the same machine, and the extra load happens to trigger the failure? I've also been unable to trigger the error locally or on master. |
@srdo Yeah... strange that it fails on JDK 8 instead of JDK 10, which happened rarely. We would be better to fix it anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
storm-core/pom.xml
Outdated
<dependency> | ||
<groupId>org.awaitility</groupId> | ||
<artifactId>awaitility</artifactId> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: like in the other pull request, it would be nice to have this marked as test here too, just so it is obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope is inherited from dependencyManagement. Do you want me to add it anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we are consistent regarding this, but IMHO explicitly marking it looks better for me, though we know the scope is inherited. I'm OK to not adding here if we are already not consistent on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 again
https://issues.apache.org/jira/browse/STORM-3121