-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ETag retrieval for orchestrations with multiple executions #363
Conversation
src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs
Outdated
Show resolved
Hide resolved
eTagValue = entity.ETag; | ||
if (i != (tableEntities.Count - 1)) | ||
{ | ||
AnalyticsEventSource.Log.GeneralWarning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debatably this should be an error and an exception should be thrown here. However, I am a bit hesitant throwing an exception, as I'm not sure how it is handled here.
I see what you're trying to do but this last refactor seems unnecessarily complicated. Here is my suggestion, which I think is simpler and does not strictly require the sentinel to be the last row: IList<HistoryEvent> historyEvents;
string executionId;
DynamicTableEntity sentinel = null;
if (tableEntities.Count > 0)
{
// The most recent generation will always be in the first history event.
executionId = tableEntities[0].Properties["ExecutionId"].StringValue;
// Convert the table entities into history events.
var events = new List<HistoryEvent>(tableEntities.Count);
foreach (DynamicTableEntity entity in tableEntities)
{
if (entity.Properties["ExecutionId"].StringValue != executionId)
{
// The remaining entities are from a previous generation and can be discarded.
break;
}
if (entity.RowKey == SentinelRowKey)
{
sentinel = entity;
continue;
}
// Some entity properties may be stored in blob storage.
await this.DecompressLargeEntityProperties(entity);
events.Add((HistoryEvent)this.tableEntityConverter.ConvertFromTableEntity(entity, GetTypeForTableEntity));
}
historyEvents = events;
}
else
{
historyEvents = EmptyHistoryEventList;
executionId = expectedExecutionId;
}
// Read the checkpoint completion time from the sentinel row, which should always be the last row.
// The only time a sentinel won't exist is if no instance of this ID has ever existed.
// The IsCheckpointCompleteProperty was newly added _after_ v1.6.4.
DateTime checkpointCompletionTime = DateTime.MinValue;
if (sentinel == null)
{
sentinel = tableEntities.LastOrDefault(e => e.RowKey == SentinelRowKey);
}
string eTagValue = sentinel?.ETag;
if (sentinel != null &&
sentinel.Properties.TryGetValue(CheckpointCompletedTimestampProperty, out EntityProperty timestampProperty))
{
checkpointCompletionTime = timestampProperty.DateTime ?? DateTime.MinValue;
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Found one typo in comment, and have a slight suggestion on wording.
@@ -184,9 +184,11 @@ public override async Task<OrchestrationHistory> GetHistoryEventsAsync(string in | |||
break; | |||
} | |||
|
|||
// The sentinal row does not contain any history events, so save it for later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sentinal => sentinel
@@ -208,9 +210,10 @@ public override async Task<OrchestrationHistory> GetHistoryEventsAsync(string in | |||
// The only time a sentinel won't exist is if no instance of this ID has ever existed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or it was removed via a history purge management operation?
It is possible for the history table to get into a state where the latest execution history is first, followed by the previous execution history, with the last entry for an instance id being the sentinal row. The previous history parsing logic stopped processing as soon as it saw a new execution id, meaning it failed to grab the ETag from the sentinal row. This, in combination with changes made when the ETag were not present, caused UpdateState calls to fail, meaning queue messages for Activities were being scheduled without recording that we did schedule them. This leads to duplicate Activity executions until we retrieve the history with an execution id (i.e. after an activity function successfully executes). This fix makes our ETag retrieval logic more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- LGTM!
Though it looks like the test run timed out. Let's make sure the CI is healthy before merging. |
704bc51
to
a580dae
Compare
I rebased onto master just to retrigger the build, and it appears to have worked this time. The timeout last time was in a Service Bus test, which I didn't touch here, so I am going to merge now. |
…#363) It is possible for the history table to get into a state where the latest execution history is first, followed by the previous execution history, with the last entry for an instance id being the sentinal row. The previous history parsing logic stopped processing as soon as it saw a new execution id, meaning it failed to grab the ETag from the sentinal row. This, in combination with changes made when the ETag were not present, caused UpdateState calls to fail, meaning queue messages for Activities were being scheduled without recording that we did schedule them. This leads to duplicate Activity executions until we retrieve the history with an execution id (i.e. after an activity function successfully executes).
It is possible for the history table to get into a state where the
latest execution history is first, followed by the previous execution
history, with the last entry for an instance id being the sentinal row.
The previous history parsing logic stopped processing as soon as it saw
a new execution id, meaning it failed to grab the ETag from the sentinal
row.
This, in combination with changes made when the ETag were not present,
caused UpdateState calls to fail, meaning queue messages for Activities
were being scheduled without recording that we did schedule them. This
leads to duplicate Activity executions until we retrieve the history
with an execution id (i.e. after an activity function successfully
executes).
This fix makes our ETag retrieval logic more robust.