[CDAP-20913] Fix bug where runs fail when appfabric is restarted #15517
Merged
albertshau
approved these changes
Jan 11, 2024
  runId = RunIds.fromString(((ExtendedTwillApplication) application).getRunId());
  appVersion = ((ExtendedTwillApplication) application).getApplicationVersion();
} else {
  appVersion = null;
Contributor
Can you add a comment about when we would expect this to happen (if at all)?
Depends on cdapio/twill#46.
Bug description
If appfabric is restarted while a run is in progress, we fail to correctly process the program completion message because we can't find a run record with the given ProgramRunId in the AppMetadataStore.
Eventually the run record corrector transitions the run to the failed state, even though in this case the program had actually run to completion successfully.
Root cause
After a restart we construct the ProgramId from the app name. The app version is not available at that point, so it is set to the default value, SNAPSHOT.
However, the ProgramRunId in the run record has a non-default version, so the lookup fails because the versions don't match.
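The mismatch can be illustrated with a minimal, self-contained sketch. This is not the CDAP code: the class `RunRecordLookupSketch`, the map-backed store, the `key` helper, and the sample app name, version, and run id are all hypothetical stand-ins for a store that keys run records by application name, version, and run id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class RunRecordLookupSketch {
  // Assumed default; the description names SNAPSHOT as the default version.
  static final String DEFAULT_VERSION = "SNAPSHOT";

  // Hypothetical stand-in for the run record store, keyed by app, version and run id.
  static final Map<String, String> STORE = new HashMap<>();

  // Builds the lookup key; the version is part of the key, so it must match exactly.
  static String key(String app, String version, String runId) {
    return app + "/" + version + "/" + runId;
  }

  static Optional<String> findStatus(String app, String version, String runId) {
    return Optional.ofNullable(STORE.get(key(app, version, runId)));
  }

  public static void main(String[] args) {
    // The run was recorded with a non-default version before the restart.
    STORE.put(key("myapp", "1.0.1", "run-1"), "RUNNING");

    // After the restart the id is rebuilt with the default version: lookup misses.
    System.out.println(findStatus("myapp", DEFAULT_VERSION, "run-1").isPresent()); // prints false

    // With the real version the same run is found.
    System.out.println(findStatus("myapp", "1.0.1", "run-1").isPresent()); // prints true
  }
}
```

Because the version is part of the key, a completion message that carries the rebuilt id never reaches the existing record, which is consistent with the run later being marked failed by the corrector.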
Fix
Added application version to the LiveInfo so we have access to it when constructing the ProgramId from the application name.
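A rough sketch of the idea, under stated assumptions: the `LiveInfo` class below is a simplified, hypothetical stand-in (not the actual CDAP or Twill type), and falling back to the default version when no version is reported is an assumption made for illustration, since the diff only shows `appVersion` being set to `null` in that branch.

```java
public class LiveInfoVersionSketch {
  // Assumed default; the description names SNAPSHOT as the default version.
  static final String DEFAULT_VERSION = "SNAPSHOT";

  // Hypothetical, simplified live info: application name plus an optional version.
  static final class LiveInfo {
    final String appName;
    final String appVersion; // null when the running application did not report one

    LiveInfo(String appName, String appVersion) {
      this.appName = appName;
      this.appVersion = appVersion;
    }
  }

  // Builds a program identifier string, preferring the version carried in the live info.
  // The fallback to DEFAULT_VERSION is an assumption, not confirmed by the PR.
  static String programKey(LiveInfo info, String program) {
    String version = info.appVersion != null ? info.appVersion : DEFAULT_VERSION;
    return info.appName + "/" + version + "/" + program;
  }

  public static void main(String[] args) {
    System.out.println(programKey(new LiveInfo("myapp", "1.0.1"), "workflow")); // prints myapp/1.0.1/workflow
    System.out.println(programKey(new LiveInfo("myapp", null), "workflow"));    // prints myapp/SNAPSHOT/workflow
  }
}
```

With the version carried alongside the name, the identifier rebuilt after a restart can match the one stored in the run record.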