[FLINK-29198][test] Fail after maximum RetryOnException #20805

RyanSkraba · 2022-09-09T17:56:07Z

What is the purpose of the change

This pull request causes tests annotated with @RetryOnException to fail after the given number of retries when used with the JUnit5 RetryExtension. This is the old behaviour we saw with the JUnit4 RetryRule.

Currently, the test is retried for the maximum number of executions, but the "expected" exception is never rethrown, even on the last attempt. As a result, the test is reported as skipped instead of failed.

Brief change log

If the test doesn't succeed on the last possible retry, the exception is propagated as a test failure.
I reverted some overeager JUnit5 migration on test cases for the JUnit4 RetryRule. This code is appropriately duplicated for RetryExtension.
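The first item of the change log can be sketched without any JUnit dependency. This is a minimal illustration, not the code from this PR: `RetrySketch`, `RetryRequested`, `TestBody`, and `runWithRetries` are hypothetical names, `attemptIndex` is assumed to be 1-based, and `totalTimes` is assumed to be the total number of attempts allowed.

```java
public class RetrySketch {

    /** Hypothetical marker meaning "abort this attempt and run the test again". */
    static final class RetryRequested extends RuntimeException {
        RetryRequested(Throwable cause) {
            super(cause);
        }
    }

    /** A test body that may throw anything. */
    interface TestBody {
        void run() throws Throwable;
    }

    /** Decides what happens after a failed attempt; it always throws something. */
    static void handleException(
            int attemptIndex,
            int totalTimes,
            Class<? extends Throwable> expected,
            Throwable thrown)
            throws Throwable {
        if (!expected.isInstance(thrown)) {
            // Not the exception we agreed to retry on: fail right away.
            throw thrown;
        }
        if (attemptIndex >= totalTimes) {
            // Last allowed attempt: propagate the exception as a real failure
            // instead of silently aborting (which would look like a skip).
            throw thrown;
        }
        // Attempts remain: signal that the test should be run again.
        throw new RetryRequested(thrown);
    }

    /** Runs the body until it passes; returns the attempt number that passed. */
    static int runWithRetries(
            int totalTimes, Class<? extends Throwable> expected, TestBody body)
            throws Throwable {
        for (int attempt = 1; ; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (Throwable t) {
                try {
                    handleException(attempt, totalTimes, expected, t);
                } catch (RetryRequested retry) {
                    // Swallow the marker and loop into the next attempt.
                }
            }
        }
    }
}
```

With this shape, a body that always throws the expected exception fails with that exception on the final attempt, while a body that passes on an earlier attempt ends the loop normally.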

Verifying this change

This change is already covered by existing tests, such as RetryOnExceptionExtensionTest.

I've manually tested that the following test now fails after the expected number of retries (instead of being skipped):

    @TestTemplate
    @RetryOnException(times = NUMBER_OF_RUNS, exception = IllegalArgumentException.class)
    void testThrowExceptionForever() {
        throw new IllegalArgumentException();
    }

In JUnit5, it's very tricky to specify that a test is supposed to fail, but it's possible. I started down this route with a technique similar to ExpectToFail but it ends up needing to be very aware of the internals of our RetryExtension. I'm not sure it's worth adding test extension code to test extension code, but I can continue down that route!

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? no
If yes, how is the feature documented? not applicable

flinkbot · 2022-09-09T17:59:52Z

CI report:

fa21097 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure: re-run the last Azure build

...link-test-utils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnFailureTest.java

snuyanzin · 2022-09-21T14:52:00Z

Thanks for the contribution.
I think it would be great to have a test here confirming that the change fixes the issue

RyanSkraba · 2022-09-22T06:50:11Z

> Thanks for the contribution. I think it would be great to have a test here confirming that the change fixes the issue

Thanks Sergey, especially for the collaboration! This was a tricky thing to test; it gets a bit unclear about how far to go when testing test code :D But since this is such a widely used TestExtension, it's good to have extra coverage -- especially when the symptom is otherwise silently skipping tests that should have failed.

snuyanzin

I have some tiny comments: add Error and Throwable to the tests

...tils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnExceptionExtensionTest.java

snuyanzin · 2022-09-26T07:20:32Z

@flinkbot run azure

snuyanzin · 2022-09-28T09:07:41Z

lgtm from my side

//cc @XComp

XComp

Thanks for creating this PR, @RyanSkraba, and @snuyanzin for the initial review. I looked into it and added a few comments. Please find them below.

...tils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnExceptionExtensionTest.java

...ava/org/apache/flink/testutils/junit/extensions/retry/strategy/RetryOnExceptionStrategy.java

XComp · 2022-09-28T13:11:19Z

...ava/org/apache/flink/testutils/junit/extensions/retry/strategy/RetryOnExceptionStrategy.java

+        // Failed when reach the total retry times
+        if (attemptIndex >= totalTimes) {
+            LOG.error("Test Failed at the last retry.", throwable);
+            throw throwable;
+        }


I'm wondering whether we should move this code into AbstractRetryStrategy instead of using the same code snippet in both implementing classes.

RyanSkraba · 2022-09-30

Hello! I thought about this too -- the way the code is expressed, there could be other implementations of AbstractRetryStrategy that apply different strategies (for example, to continue retrying for a maximum period of time regardless of the number of attempts).

My opinion isn't strong on this, but the logic isn't really complicated enough to need to be abstracted away.

XComp · 2022-10-05

Valid point - but your example would probably get its own implementation of RetryStrategy instead of relying on AbstractRetryStrategy as the parent. A counter-argument would be to not optimize the code for future features, so that it stays as small as possible. If someone decides to come up with a feature that doesn't fit the current implementation, it's time to refactor. Duplicated code usually doesn't help and introduces potential problems, e.g. bugs being fixed in one copy but not the other.

Anyway, I still see your point...

RyanSkraba · 2022-10-17

I gave your suggestion a try, but creating and delegating to a new method in the abstract strategy takes more code and makes the final implementation non-obvious.

It might have been a bit premature to have introduced the strategy pattern with only two alternatives, but I'm not too worried about explicitly repeating this pretty simple `if`! Do you have a strong opinion about this?

XComp · 2022-10-18

I leave this up to you. No strong opinion here since, as you said, it's a small code change. So, I consider it an edge case. But in general, my comment from above still applies: duplicate code creates the risk of fixing bugs in only one location but not the other.
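For readers weighing the two options in this thread, here is a rough sketch of what hoisting the "last attempt" check into a shared base class could look like. It is an illustration only, under the assumption of a 1-based `attemptIndex` and a `totalTimes` that counts all attempts; `SharedCheckSketch`, `AbstractStrategy`, `OnFailure`, `OnException`, `shouldFail`, and `isRetryable` are hypothetical names, not the classes or methods in this PR.

```java
public class SharedCheckSketch {

    /** Hypothetical base class that owns the "out of attempts" check. */
    abstract static class AbstractStrategy {
        private final int totalTimes;

        AbstractStrategy(int totalTimes) {
            this.totalTimes = totalTimes;
        }

        /** Template method: the shared check lives here, exactly once. */
        final boolean shouldFail(int attemptIndex, Throwable thrown) {
            return attemptIndex >= totalTimes || !isRetryable(thrown);
        }

        /** The only part that differs between the strategies. */
        abstract boolean isRetryable(Throwable thrown);
    }

    /** Retries on any failure. */
    static final class OnFailure extends AbstractStrategy {
        OnFailure(int totalTimes) {
            super(totalTimes);
        }

        @Override
        boolean isRetryable(Throwable thrown) {
            return true;
        }
    }

    /** Retries only when the failure is an instance of the expected type. */
    static final class OnException extends AbstractStrategy {
        private final Class<? extends Throwable> expected;

        OnException(int totalTimes, Class<? extends Throwable> expected) {
            super(totalTimes);
            this.expected = expected;
        }

        @Override
        boolean isRetryable(Throwable thrown) {
            return expected.isInstance(thrown);
        }
    }
}
```

The shared check then exists in one place, at the cost of an extra abstract method and one more level of indirection when reading either strategy, which is the trade-off discussed above.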

XComp

Btw. nice catch with the accidental JUnit4->5 migration. Could you reorganize/squash the commits? Ideally, reverting the JUnit4->5 migration should be in a separate commit pointing to the relevant ticket.

RyanSkraba · 2022-10-17T14:04:29Z

Sorry for the delay! I've merged/squashed all of the commits (with @snuyanzin now as co-author, thanks!). The only changes were one javadoc clarification and the small requested simplification in the unit test. Can you take a look?

XComp

Thanks @RyanSkraba for addressing my comments. The change looks good now. Could you rebase the PR so that we can do a final PR run? It would also help to update the commit message and move the JavaDoc update into its own commit.

.../src/main/java/org/apache/flink/testutils/junit/extensions/retry/strategy/RetryStrategy.java

RyanSkraba · 2022-10-20T07:42:55Z

Rebased again and edited the commit messages!

...link-test-utils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnFailureTest.java

RyanSkraba · 2022-10-21T15:39:52Z

Rebased again and moved some modifications to their own separate commit

Co-authored-by: Sergey Nuyanzin <sergey.nuyanzin@aiven.io>

It's actually meant to test the JUnit4 RetryRule feature. The corresponding JUnit5 Extension is tested in `RetryOnFailureExtensionTest`.

RyanSkraba · 2022-11-03T08:53:47Z

Rebased and changed the commit messages to the suggested text. Thanks for your attention!

In fact, for testing correctness, I would suggest that all three commits be cherry-picked back into branch-1.16

XComp · 2022-11-03T15:16:26Z

@flinkbot run azure

XComp · 2022-11-04T08:00:19Z

Failure caused by FLINK-24119

XComp · 2022-11-04T08:00:26Z

@flinkbot run azure

XComp · 2022-11-04T10:01:17Z

@flinkbot run azure

flinkbot added the component=Tests label Sep 9, 2022

snuyanzin reviewed Sep 19, 2022

...link-test-utils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnFailureTest.java (outdated)

RyanSkraba requested a review from snuyanzin September 20, 2022 11:14

snuyanzin reviewed Sep 22, 2022

XComp requested changes Sep 28, 2022

RyanSkraba force-pushed the rskraba/FLINK-29198-retry-exception branch from c29037a to 7fa5d53 October 17, 2022 13:56

RyanSkraba requested a review from XComp October 17, 2022 14:03

XComp approved these changes Oct 18, 2022

snuyanzin reviewed Oct 18, 2022

.../src/main/java/org/apache/flink/testutils/junit/extensions/retry/strategy/RetryStrategy.java

RyanSkraba force-pushed the rskraba/FLINK-29198-retry-exception branch from 7fa5d53 to bde4719 October 20, 2022 07:42

XComp reviewed Oct 21, 2022

...link-test-utils-junit/src/test/java/org/apache/flink/testutils/junit/RetryOnFailureTest.java

RyanSkraba force-pushed the rskraba/FLINK-29198-retry-exception branch from bde4719 to 4235eb4 October 21, 2022 15:38

RyanSkraba requested review from XComp and snuyanzin and removed request for XComp October 24, 2022 09:34

RyanSkraba and others added 3 commits November 3, 2022 09:50

[FLINK-29198][test] Fail after maximum RetryOnException

8cd7bac

Co-authored-by: Sergey Nuyanzin <sergey.nuyanzin@aiven.io>

[FLINK-27740][test] Revert JUnit5 migration for RetryOnFailureTest

f87a492

It's actually meant to test the JUnit4 RetryRule feature. The corresponding JUnit5 Extension is tested in `RetryOnFailureExtensionTest`.

[hotfix][test] Clarify javadoc exceptions

fa21097

RyanSkraba force-pushed the rskraba/FLINK-29198-retry-exception branch from 4235eb4 to fa21097 November 3, 2022 08:51

This was referenced Nov 3, 2022

[BP-1.16][FLINK-29198][test] Fail after maximum RetryOnException #21234

Merged

[BP-1.15][FLINK-29198][test] Fail after maximum RetryOnException #21235

Merged

[BP-1.16][FLINK-27740][test] Revert JUnit5 migration for RetryOnFailureTest #21236

Merged

XComp merged commit b9c5799 into apache:master Nov 4, 2022
