[FLINK-30202][tests] Do not assert on checkpointId #21416
Capturing the checkpointId for a generated record is impossible since the notifyCheckpointComplete notification may arrive at any time (or not at all). Instead, just assert that each subtask got exactly as many records as expected, which can only happen (reliably) if the rate-limiting works as expected.
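A minimal plain-Java model of this assertion strategy, assuming a hypothetical `RateLimitedEmitter` (not Flink's `RateLimitedSourceReader`) that allows a fixed number of records per checkpoint cycle. Because each subtask is offered far more emit attempts than its budget, its total depends only on the per-cycle limit and the number of cycles, not on any checkpoint ID, so an exact per-subtask count only holds if the limiting is enforced.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-subtask emitter: emits at most `limit` records between two checkpoints. */
final class RateLimitedEmitter {
    private final int limit;
    private int emittedSinceCheckpoint = 0;
    private long totalEmitted = 0;

    RateLimitedEmitter(int limit) {
        this.limit = limit;
    }

    /** Returns true if a record was emitted, false if this cycle's budget is used up. */
    boolean tryEmit() {
        if (emittedSinceCheckpoint >= limit) {
            return false;
        }
        emittedSinceCheckpoint++;
        totalEmitted++;
        return true;
    }

    /** A new checkpoint cycle starts: the budget is restored. */
    void onCheckpoint() {
        emittedSinceCheckpoint = 0;
    }

    long totalEmitted() {
        return totalEmitted;
    }
}

final class PerSubtaskCountCheck {
    /**
     * Simulates `parallelism` subtasks; each gets `attemptsPerCycle` emit attempts per cycle.
     * With attemptsPerCycle > limitPerCycle, every subtask should end at limitPerCycle * cycles.
     */
    static Map<Integer, Long> run(int parallelism, int limitPerCycle, int cycles, int attemptsPerCycle) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            RateLimitedEmitter emitter = new RateLimitedEmitter(limitPerCycle);
            for (int c = 0; c < cycles; c++) {
                for (int i = 0; i < attemptsPerCycle; i++) {
                    emitter.tryEmit();
                }
                emitter.onCheckpoint();
            }
            counts.put(subtask, emitter.totalEmitted());
        }
        return counts;
    }
}
```

If the limiter leaked (for example, if `tryEmit()` ignored the budget), the totals would track `attemptsPerCycle` instead and an exact-count assertion would fail.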
which can only happen (reliably) if the rate-limiting works as expected.
You're saying that we would introduce a test instability here if the RateLimitedStrategy wouldn't perform as expected?
Yes. At least that was the idea. Now I'm not so sure anymore whether this makes sense. Given that we limit the count, we invariably end up with …
yeah, that's something I was wondering as well. But the behavior of the …
There is a …
I'll try finding another way to test this; my current thinking goes towards using a FlatMapFunction that stops emitting values after the first call to …
I think we actually found a bug. If a split was already assigned to a reader, then the first call to …
Additionally, the RateLimitedSourceReader may reset the checkpoint limit at the wrong time. We don't really want that to happen when the checkpoint is complete, but rather when the next checkpoint starts (== when snapshotState is called).
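A minimal sketch of the proposed timing, assuming a hypothetical `CheckpointBudget` class (not Flink's actual rate-limiter code): the per-checkpoint budget is refilled in `snapshotState(...)`, i.e. when the next checkpoint starts, while `notifyCheckpointComplete(...)` deliberately does nothing because that notification may arrive late or never.

```java
/**
 * Hypothetical per-checkpoint record budget. Refilled when a new checkpoint starts
 * (snapshotState), never on completion notifications.
 */
final class CheckpointBudget {
    private final int recordsPerCheckpoint;
    private int remaining;

    CheckpointBudget(int recordsPerCheckpoint) {
        this.recordsPerCheckpoint = recordsPerCheckpoint;
        this.remaining = recordsPerCheckpoint;
    }

    /** Takes one unit of budget; returns false once this checkpoint's budget is exhausted. */
    boolean tryAcquire() {
        if (remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }

    /** A new checkpoint starts: records emitted from now on belong to it, so refill. */
    void snapshotState(long checkpointId) {
        remaining = recordsPerCheckpoint;
    }

    /** Intentionally no refill: this notification is not guaranteed to arrive. */
    void notifyCheckpointComplete(long checkpointId) {
    }

    int remaining() {
        return remaining;
    }
}
```

Tying the refill to `snapshotState` keeps the budget aligned with checkpoint boundaries even if completion notifications are delayed or skipped.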
I really should push reading up on FLIP-27 higher on my todo list. 8) Anyway, after some code reading, the change in pollNext() makes sense. Initially, I thought of initializing availabilityFuture in pollNext() instead of returning NOTHING_AVAILABLE, but that was the wrong train of thought. I still don't get your 2nd comment, though. Please find my remarks below.
...tagen/src/test/java/org/apache/flink/connector/datagen/source/DataGeneratorSourceITCase.java
The issue where we complete the gatingFuture when receiving the completed-checkpoint notification instead of when a new checkpoint is triggered sounds like a separate issue. I think it would make sense to create a new Jira ticket for that. WDYT?
Yes, it's a separate issue (and for the one in this ticket we at least already have a test that shows the issue).
- The test was never calling isAvailable(), relying on the previous (bugged) behavior of rate-limiting not being enforced.
- The loop was difficult to understand in terms of how many records are actually being processed, and was refactored accordingly.
- There was a series of math errors in here: 563-177=386, but 128 (elementsPerCycle) * 3 = 384. This was hidden by the final call to pollNext() in the while loop (emitting 1 additional record), and by the final range assertion also being incremented by 1.
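A simplified sketch of a test loop that respects availability, assuming a hypothetical `GatedReader` (not Flink's `SourceReader`; output goes to a `List<Long>` instead of a `ReaderOutput`, and `PollStatus` is a local stand-in mirroring the constants of Flink's `InputStatus`). Each cycle polls only while `isAvailable()` is complete, so it processes exactly `recordsPerCycle` records, and the gate reopens only on `snapshotState()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Local stand-in mirroring the constants of Flink's InputStatus. */
enum PollStatus { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

/** Hypothetical reader emitting at most recordsPerCycle records until the next snapshotState(). */
final class GatedReader {
    private final Deque<Long> pending;
    private final int recordsPerCycle;
    private int emittedInCycle = 0;
    private CompletableFuture<Void> available = CompletableFuture.completedFuture(null);

    GatedReader(List<Long> records, int recordsPerCycle) {
        this.pending = new ArrayDeque<>(records);
        this.recordsPerCycle = recordsPerCycle;
    }

    CompletableFuture<Void> isAvailable() {
        return available;
    }

    PollStatus pollNext(List<Long> output) {
        if (pending.isEmpty()) {
            return PollStatus.END_OF_INPUT;
        }
        if (!available.isDone()) {
            // This cycle's budget is used up: emit nothing until the gate reopens.
            return PollStatus.NOTHING_AVAILABLE;
        }
        output.add(pending.poll());
        if (++emittedInCycle == recordsPerCycle) {
            available = new CompletableFuture<>(); // close the gate
        }
        return available.isDone() ? PollStatus.MORE_AVAILABLE : PollStatus.NOTHING_AVAILABLE;
    }

    /** A new checkpoint starts: reset the budget and reopen the gate. */
    void snapshotState() {
        emittedInCycle = 0;
        available.complete(null);
    }
}

final class DrainLoop {
    /** Polls one cycle, calling pollNext() only while isAvailable() is complete; returns records emitted. */
    static int drainOneCycle(GatedReader reader, List<Long> output) {
        int before = output.size();
        while (reader.isAvailable().isDone()
                && reader.pollNext(output) == PollStatus.MORE_AVAILABLE) {
            // keep polling until the gate closes or input ends
        }
        return output.size() - before;
    }
}
```

With 384 records and `recordsPerCycle = 128`, each of the three cycles should yield exactly 128 records, and a further `pollNext()` should report `END_OF_INPUT`.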
Another test relied on the previous (bugged) behavior :(
- Use 0-383 to make the off-by-one error obvious (the splits included 385 values, not 384).
- Assert that we reach END_OF_INPUT.
- Correctly assert all 384 elements.
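A small sketch of the arithmetic behind the 0-383 choice, assuming a hypothetical `ExpectedRange` helper with constants taken from the numbers above (128 elements per cycle, 3 cycles). A closed range `[from, to]` holds `to - from + 1` values, so three full cycles correspond exactly to `[0, 383]`; for instance, a closed range ending at 384 would already hold 385 values.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

/** Hypothetical helper computing the values expected after CYCLES full cycles. */
final class ExpectedRange {
    static final int ELEMENTS_PER_CYCLE = 128;
    static final int CYCLES = 3;

    /** Expected output: the closed range [0, CYCLES * ELEMENTS_PER_CYCLE - 1], i.e. [0, 383]. */
    static List<Long> expected() {
        long last = (long) CYCLES * ELEMENTS_PER_CYCLE - 1;
        return LongStream.rangeClosed(0, last).boxed().collect(Collectors.toList());
    }

    /** Number of values in a closed range [from, to]: note the +1 compared with to - from. */
    static long closedRangeSize(long from, long to) {
        return to - from + 1;
    }
}
```

Starting the range at 0 makes the expected count readable directly from the upper bound (383 + 1 = 384), which is what makes an off-by-one stand out.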
...datagen/src/test/java/org/apache/flink/connector/datagen/source/DataGeneratorSourceTest.java
LGTM 👍 ...just a few minor things
Capturing the checkpointId for a generated record in a subsequent map function is impossible since the notifyCheckpointComplete notification may arrive at any time (or not at all). Instead, just assert that each subtask got exactly as many records as expected, which can only happen (reliably) if the rate-limiting works as expected.