@1996fanrui
Member

What is the purpose of the change

Report the correct unaligned checkpoint type after an aligned barrier times out and is converted to an unaligned barrier on PipelinedSubpartition.

Brief change log

Report the correct unaligned checkpoint type after an aligned barrier times out and is converted to an unaligned barrier on PipelinedSubpartition.

Verifying this change

This change improved old tests and can be verified as follows:

  • PipelinedSubpartitionTest#testConsumeTimeoutableCheckpointBarrierQuickly
  • PipelinedSubpartitionTest#testTimeoutAlignedToUnalignedBarrier

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not documented

@flinkbot
Collaborator

flinkbot commented Apr 13, 2023

Contributor

@pnowojski pnowojski left a comment

Thanks for the fix proposal @1996fanrui , I've left a couple of comments.

  () -> {
      try {
-         operatorChain.alignedBarrierTimeout(checkpointId);
+         operatorChain.alignedBarrierTimeout(checkpointId, metrics);
Contributor

Maybe instead of adding a dependency to CheckpointMetrics to all of the call stack down to the subpartition, can alignedBarrierTimeout return true or false depending if the barrier has timed out or not? 🤔
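
A minimal sketch of what this suggestion could look like, assuming heavily simplified stand-ins for the operator chain and subpartition. The class names `SketchSubpartition` and `SketchOperatorChain` and the `alignedBarrierPending` field are hypothetical and only illustrate the idea of returning a boolean instead of threading the metrics object through the call stack; they are not Flink's actual classes.

```java
import java.util.List;

/** Hypothetical, simplified stand-in for a subpartition; not Flink's actual class. */
final class SketchSubpartition {
    // Assumption: true while an aligned barrier is still queued in this subpartition.
    private boolean alignedBarrierPending;

    SketchSubpartition(boolean alignedBarrierPending) {
        this.alignedBarrierPending = alignedBarrierPending;
    }

    /** Returns true only if a queued aligned barrier was converted to an unaligned one. */
    boolean alignedBarrierTimeout(long checkpointId) {
        if (!alignedBarrierPending) {
            return false; // barrier was already consumed, nothing timed out here
        }
        alignedBarrierPending = false; // from now on it is treated as unaligned
        return true;
    }
}

/** Hypothetical, simplified stand-in for the operator chain. */
final class SketchOperatorChain {
    private final List<SketchSubpartition> subpartitions;

    SketchOperatorChain(List<SketchSubpartition> subpartitions) {
        this.subpartitions = subpartitions;
    }

    /** Returns true if the barrier timed out in at least one subpartition. */
    boolean alignedBarrierTimeout(long checkpointId) {
        boolean anyTimedOut = false;
        for (SketchSubpartition subpartition : subpartitions) {
            // Non-short-circuit OR: every subpartition is notified.
            anyTimedOut |= subpartition.alignedBarrierTimeout(checkpointId);
        }
        return anyTimedOut;
    }
}
```

With this shape the caller could decide what to do with the returned flag, and the lower layers would not need to know about CheckpointMetrics at all.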

Member Author

Hi @pnowojski , thanks for your review.

It's a great suggestion, updated.

  () -> {
      try {
-         operatorChain.alignedBarrierTimeout(checkpointId);
+         operatorChain.alignedBarrierTimeout(checkpointId, metrics);
Contributor

Can we safely pass the CheckpointMetricsBuilder to the timer thread? Maybe I'm misremembering something, but the ownership of and full responsibility for CheckpointMetricsBuilder seem to be passed from the SubtaskCheckpointCoordinatorImpl to the AsyncCheckpointRunnable, which uses it to build the metrics. AsyncCheckpointRunnable and the alignment timer are running in different threads, creating problems with both memory visibility AND race conditions?

Shouldn't this be set in the AsyncCheckpointRunnable thread via a code path similar to org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.SnapshotsFinalizeResult#bytesPersistedDuringAlignment?

Member Author

> AsyncCheckpointRunnable and the alignment timer are running in different threads, creating problems with both memory visibility AND race conditions?

Before this PR, CheckpointMetricsBuilder#setUnalignedCheckpoint is only called on the task thread [1]. And IIUC, registerTimer should be executed by the task thread as well. AsyncCheckpointRunnable won't call CheckpointMetricsBuilder#setUnalignedCheckpoint; it just uses the builder to build the metrics. So the flag cannot be modified concurrently.

After detailed analysis, I think the change made for the first comment [2] should be reverted. It may lead to the wrong unaligned type depending on the order of execution, for example:

  1. registerTimer thread: aligned barrier timeout to unaligned
  2. registerTimer thread: channelStateFuture.complete(inflightBuffers)
  3. Channel state writer thread: write these buffers and complete the resultSubpartitionStateFuture
  4. AsyncCheckpointRunnable thread: all states are written, and build metrics
  5. registerTimer thread: call CheckpointMetricsBuilder#setUnalignedCheckpoint(true)

If inflightBuffers is empty or very small, steps 3 and 4 can finish before step 5, and then the unaligned type will be wrong.

Based on this case, I think the solution is:

  1. CheckpointMetricsBuilder#setUnalignedCheckpoint(true) should be executed before channelStateFuture.complete(inflightBuffers); that is, CheckpointMetricsBuilder should be passed to PipelinedSubpartition#alignedBarrierTimeout.
  2. Make CheckpointMetricsBuilder#unalignedCheckpoint volatile to ensure AsyncCheckpointRunnable can read it correctly.
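
The ordering in these two points can be sketched as follows. This is only an illustration under stated assumptions: `SketchMetricsBuilder` and `TimeoutOrderingSketch` are hypothetical stand-ins, an `Integer` stands in for the in-flight buffers, and a `thenApplyAsync` callback stands in for the AsyncCheckpointRunnable thread reading the flag once the channel state is complete.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the metrics builder; only the flag under discussion. */
final class SketchMetricsBuilder {
    // volatile so that a write on the timer thread is visible to the reading thread
    private volatile boolean unalignedCheckpoint;

    void setUnalignedCheckpoint(boolean unalignedCheckpoint) {
        this.unalignedCheckpoint = unalignedCheckpoint;
    }

    boolean isUnalignedCheckpoint() {
        return unalignedCheckpoint;
    }
}

final class TimeoutOrderingSketch {
    /** Returns the flag value seen by the thread that builds the metrics. */
    static boolean timeoutThenBuildMetrics() {
        SketchMetricsBuilder metrics = new SketchMetricsBuilder();
        // Stand-in for channelStateFuture; the Integer plays the role of inflightBuffers.
        CompletableFuture<Integer> channelStateFuture = new CompletableFuture<>();

        // Stand-in for the async thread: it reads the flag only after the
        // channel state future has completed.
        CompletableFuture<Boolean> builtMetrics =
                channelStateFuture.thenApplyAsync(buffers -> metrics.isUnalignedCheckpoint());

        // Timer thread: set the flag FIRST, then complete the future.
        // Swapping these two lines reintroduces the race from the steps above.
        metrics.setUnalignedCheckpoint(true);
        channelStateFuture.complete(0);

        return builtMetrics.join();
    }
}
```

Because the flag is written before the future is completed, any code that runs only after the future completes observes `true`. The contract still depends on callers keeping that order.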

I updated the solution here[3].

> Shouldn't this be set in the AsyncCheckpointRunnable thread via a code path similar to org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.SnapshotsFinalizeResult#bytesPersistedDuringAlignment?

This solution can work. Actually, I have tried it, and I found this code path is too complex: it includes too many exception cases (in ChannelStateCheckpointWriter) and completed futures (both dataFuture and resultFuture), and these futures are completed after merging channel state.

However, I implemented a POC version [4] using this solution. Core process:

  • Add a CompletableFuture<Boolean> timeoutToUnaligned inside PipelinedSubpartition, and complete it when channelStateFuture is completed.
  • ChannelStateWriteResult (it's at the subtask level) gets a new CompletableFuture<Boolean> resultSubpartitionTimeoutToUnaligned, which is completed in the following cases:
      1. true: any subpartition is switched from an aligned to an unaligned checkpoint.
      2. false: this result is completed and no subpartition has switched to an unaligned checkpoint.
      3. false: this result fails before any subpartition has switched to an unaligned checkpoint.
  • ChannelStateWriteResult passes the result to OperatorSnapshotFutures, which then passes it to AsyncCheckpointRunnable.
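
The three completion cases above can be sketched roughly like this. It is not the POC code from [4]; `TimeoutToUnalignedResultSketch`, `onSubpartitionDone` and `fail` are hypothetical names, and the sketch relies only on the fact that `CompletableFuture#complete` is a no-op once the future already has a value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified stand-in for the subtask-level result; not Flink's class. */
final class TimeoutToUnalignedResultSketch {
    private final CompletableFuture<Boolean> resultSubpartitionTimeoutToUnaligned =
            new CompletableFuture<>();
    private final AtomicInteger pendingSubpartitions;

    TimeoutToUnalignedResultSketch(int subpartitionCount) {
        this.pendingSubpartitions = new AtomicInteger(subpartitionCount);
        if (subpartitionCount == 0) {
            resultSubpartitionTimeoutToUnaligned.complete(false);
        }
    }

    /** Called once per subpartition when its own future completes. */
    void onSubpartitionDone(boolean timeoutToUnaligned) {
        if (timeoutToUnaligned) {
            // Case 1: any subpartition switched to unaligned.
            resultSubpartitionTimeoutToUnaligned.complete(true);
        }
        if (pendingSubpartitions.decrementAndGet() == 0) {
            // Case 2: all subpartitions are done; no-op if already completed with true.
            resultSubpartitionTimeoutToUnaligned.complete(false);
        }
    }

    /** Case 3: the result fails; no-op if a subpartition already reported true. */
    void fail() {
        resultSubpartitionTimeoutToUnaligned.complete(false);
    }

    CompletableFuture<Boolean> timeoutToUnaligned() {
        return resultSubpartitionTimeoutToUnaligned;
    }
}
```

Even in this reduced form, the failure and completion paths have to be handled explicitly, which hints at why the real code path becomes complicated.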

Solution 2 is more complex than solution 1; however, it's more reasonable. Which one do you prefer?

[1] checkpointMetrics.setUnalignedCheckpoint(checkpointOptions.isUnalignedCheckpoint());
[2] #22392 (comment)
[3] 1996fanrui@c23c5db
[4] 1996fanrui@d5f2537

Contributor

@pnowojski pnowojski Apr 17, 2023

Ehhh. [4] is indeed a bit complicated. I think I misspoke, I actually meant something a bit different ([5] below).
[3] Technically it works, but I don't like how fragile the contract there actually is, where the value of the volatile boolean unalignedCheckpoint is only valid if other methods/things happen in the correct order. To fix it, we would need something like:
[5] Change CheckpointMetricsBuilder#unalignedCheckpoint into some kind of CompletableFuture<Boolean>. Using that, AsyncCheckpointRunnable could just call CheckpointMetricsBuilder#unalignedCheckpoint.get(), without having to consider whether that's safe or not. I had hoped that it could be set exactly as in [3], but there is a problem: we know when to set it to true, but deciding when to set it to false would require quite a bit of logic :/
[6] Another potential solution would be to move the completion of the AsyncCheckpointRunnable out of the async thread and into the mailbox thread, which would also remove some race conditions and simplify the logic. But that's probably not worth doing for the sake of this single flag...

All in all, I'm starting to think that maybe your original idea, to approximate the true/false flag based on bytesPersistedDuringAlignment > 0, might be the lesser evil. The case when bytesPersistedDuringAlignment == 0 but the checkpoint barrier actually timed out in the output buffers is quite extreme/rare; it shouldn't be that significant to the end user and is probably not worth making the code so much more complicated.
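
A minimal sketch of this approximation and its blind spot. The helper `UnalignedTypeApproximationSketch#isUnalignedCheckpoint` is hypothetical, and combining the persisted-bytes check with the flag the checkpoint started with is an assumption made for illustration, not necessarily how the merged change is written.

```java
/** Hypothetical helper illustrating the approximation; not Flink's actual code. */
final class UnalignedTypeApproximationSketch {

    /**
     * @param startedUnaligned assumption: the checkpoint was triggered as unaligned
     * @param bytesPersistedDuringAlignment in-flight bytes persisted for this checkpoint
     */
    static boolean isUnalignedCheckpoint(
            boolean startedUnaligned, long bytesPersistedDuringAlignment) {
        // Treat the checkpoint as unaligned if it started that way or if any
        // in-flight data was persisted, which implies a barrier overtook data.
        return startedUnaligned || bytesPersistedDuringAlignment > 0;
    }
}
```

The known gap is the rare case described above: a barrier that timed out in the output buffers with nothing to persist yields `isUnalignedCheckpoint(false, 0) == false`, so the checkpoint is reported as aligned.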

Member Author

> All in all, I'm starting to think that maybe your original idea, to approximate the true/false flag based on bytesPersistedDuringAlignment > 0, might be the lesser evil. The case when bytesPersistedDuringAlignment == 0 but the checkpoint barrier actually timed out in the output buffers is quite extreme/rare; it shouldn't be that significant to the end user and is probably not worth making the code so much more complicated.

I agree with you: these solutions are complicated, and it's probably not worth making the code so much more complicated, so I prefer to derive the unaligned checkpoint type from the persisted data.

Do you think it's ok? If yes, I can go ahead.

Contributor

Yes, let's do that. Apart from that, let's create a ticket to explain the problem with the approximation, linking to this conversation, and set its priority to "not a priority".

Member Author

Thanks for your quick response, I created FLINK-31864 to briefly explain it.

@1996fanrui 1996fanrui force-pushed the 31588/unaligned_type branch from c8d6b94 to 8963055 Compare April 14, 2023 05:24
@1996fanrui 1996fanrui force-pushed the 31588/unaligned_type branch 2 times, most recently from 4f830a1 to a6b1c49 Compare April 21, 2023 10:03
Contributor

@pnowojski pnowojski left a comment

Thanks for the fix!

Can you add a small unit test? Apart from that, LGTM. Feel free to merge after adding the unit test and with a green azure :)

@1996fanrui 1996fanrui force-pushed the 31588/unaligned_type branch from a6b1c49 to 9391716 Compare April 21, 2023 11:10
@1996fanrui 1996fanrui force-pushed the 31588/unaligned_type branch from 9391716 to 8bebc45 Compare April 21, 2023 11:16
@1996fanrui
Member Author

> Thanks for the fix!
>
> Can you add a small unit test? Apart from that, LGTM. Feel free to merge after adding the unit test and with a green azure :)

Thanks for the quick feedback, updated.

@1996fanrui 1996fanrui merged commit d46d8d0 into apache:master Apr 23, 2023
@1996fanrui 1996fanrui deleted the 31588/unaligned_type branch August 4, 2023 06:24