Fix ETag retrieval for orchestrations with multiple executions #363

ConnorMcMahon · 2020-01-29T19:35:43Z

It is possible for the history table to get into a state where the
latest execution history is first, followed by the previous execution
history, with the last entry for an instance id being the sentinel row.
The previous history parsing logic stopped processing as soon as it saw
a new execution id, so it never reached the sentinel row and failed to
read its ETag.
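To make the failure concrete, here is a minimal C# sketch of the old loop's behavior under simplifying assumptions: rows are plain tuples rather than `DynamicTableEntity`, the execution ids, row keys, and ETag strings are invented for illustration, and the literal `"sentinel"` stands in for the real `SentinelRowKey` constant. With the latest execution's rows first and the sentinel last, the loop breaks at the first row of the older execution and never sees the sentinel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row layout for one instance id, in table order (illustrative values only):
// latest execution's history first, then the previous execution's, then the sentinel row.
var rows = new List<(string ExecutionId, string RowKey, string ETag)>
{
    ("exec-2", "0000000000000000", "etag-a"),
    ("exec-2", "0000000000000001", "etag-b"),
    ("exec-1", "0000000000000002", "etag-c"),
    ("exec-2", "sentinel",         "etag-sentinel"),
};

// Simplified model of the old logic: stop at the first row whose execution id differs
// from the first row's, so the sentinel's ETag is only found if it appears before that.
string OldFindSentinelETag(List<(string ExecutionId, string RowKey, string ETag)> entities)
{
    if (entities.Count == 0) return null;
    string executionId = entities[0].ExecutionId;
    foreach (var entity in entities)
    {
        if (entity.ExecutionId != executionId)
        {
            break; // the trailing sentinel is never reached
        }
        if (entity.RowKey == "sentinel")
        {
            return entity.ETag;
        }
    }
    return null;
}

string eTag = OldFindSentinelETag(rows);
Console.WriteLine(eTag ?? "<no ETag>"); // prints "<no ETag>"
```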

This, in combination with changes made for the case where the ETag is
not present, caused UpdateState calls to fail. As a result, queue messages
for Activities were being sent without the history recording that they
had been scheduled. This leads to duplicate Activity executions until we
retrieve the history with an execution id (i.e., after an activity function
successfully executes).
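Why a missing ETag breaks the write can be illustrated with a toy in-memory model of Azure Table semantics. The write path below is an assumption about the failure mode, not the actual `UpdateState` code: `UpdateSentinel`, `TryInsert`, and `TryReplace` are hypothetical helpers. It mirrors the table behavior that an insert fails when the row already exists (409 Conflict) and a replace succeeds only when the supplied ETag matches (otherwise 412 Precondition Failed). If a null ETag is treated as "brand-new instance, insert", the write collides with the existing sentinel row.

```csharp
using System;
using System.Collections.Generic;

// Toy model (assumption): rows keyed by row key, each holding its current ETag.
var table = new Dictionary<string, string> { ["sentinel"] = "etag-1" };
int version = 1;

// Insert semantics: fail if the row already exists (Azure Table returns 409 Conflict).
bool TryInsert(string rowKey)
{
    if (table.ContainsKey(rowKey)) return false;
    table[rowKey] = $"etag-{++version}";
    return true;
}

// Replace semantics: succeed only if the caller's ETag matches (otherwise 412 Precondition Failed).
bool TryReplace(string rowKey, string eTag)
{
    if (!table.TryGetValue(rowKey, out string current) || current != eTag) return false;
    table[rowKey] = $"etag-{++version}";
    return true;
}

// Hypothetical write path: no ETag means "treat as a new instance and insert",
// otherwise do a conditional replace.
bool UpdateSentinel(string eTag) => eTag == null ? TryInsert("sentinel") : TryReplace("sentinel", eTag);

Console.WriteLine(UpdateSentinel(null));     // False: the sentinel row already exists
Console.WriteLine(UpdateSentinel("etag-1")); // True: the ETag matches, so the row is replaced
```

In this model, losing the ETag turns every update into a conflicting insert, which matches the symptom described above: the history write fails even though the Activity message has already been sent.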

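This fix makes our ETag retrieval logic more robust: the sentinel row, and
therefore its ETag, is found even when rows from a previous execution
precede it.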

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

ConnorMcMahon · 2020-01-30T18:41:17Z

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

-                        eTagValue = entity.ETag;
+                        if (i != (tableEntities.Count - 1))
+                        {
+                            AnalyticsEventSource.Log.GeneralWarning(


Arguably this should be an error, and an exception should be thrown here. However, I am hesitant to throw, since I'm not sure how an exception would be handled at this point.

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

cgillum · 2020-01-30T19:10:06Z

I see what you're trying to do, but this last refactor seems unnecessarily complicated. Here is my suggestion, which I think is simpler and does not strictly require the sentinel to be the last row:

IList<HistoryEvent> historyEvents;
string executionId;
DynamicTableEntity sentinel = null;
if (tableEntities.Count > 0)
{
    // The most recent generation will always be in the first history event.
    executionId = tableEntities[0].Properties["ExecutionId"].StringValue;

    // Convert the table entities into history events.
    var events = new List<HistoryEvent>(tableEntities.Count);

    foreach (DynamicTableEntity entity in tableEntities)
    {
        if (entity.Properties["ExecutionId"].StringValue != executionId)
        {
            // The remaining entities are from a previous generation and can be discarded.
            break;
        }

        if (entity.RowKey == SentinelRowKey)
        {
            sentinel = entity;
            continue;
        }

        // Some entity properties may be stored in blob storage.
        await this.DecompressLargeEntityProperties(entity);

        events.Add((HistoryEvent)this.tableEntityConverter.ConvertFromTableEntity(entity, GetTypeForTableEntity));
    }

    historyEvents = events;
}
else
{
    historyEvents = EmptyHistoryEventList;
    executionId = expectedExecutionId;
}

// Read the checkpoint completion time from the sentinel row, which should always be the last row.
// The only time a sentinel won't exist is if no instance of this ID has ever existed.
// The IsCheckpointCompleteProperty was newly added _after_ v1.6.4.
DateTime checkpointCompletionTime = DateTime.MinValue;
if (sentinel == null)
{
    sentinel = tableEntities.LastOrDefault(e => e.RowKey == SentinelRowKey);
}

string eTagValue = sentinel?.ETag;
if (sentinel != null &&
    sentinel.Properties.TryGetValue(CheckpointCompletedTimestampProperty, out EntityProperty timestampProperty))
{
    checkpointCompletionTime = timestampProperty.DateTime ?? DateTime.MinValue;
}
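The suggested logic can be exercised outside the tracking store with a simplified model, under these assumptions: rows are tuples instead of `DynamicTableEntity`, a local `SentinelRowKey` constant stands in for the real one, history events are represented only by their row keys, and blob decompression and `ConvertFromTableEntity` are omitted. `ReadHistory` is a hypothetical name. The sketch collects the latest execution's rows, keeps the sentinel if it appears among them, and otherwise falls back to searching all rows, so the ETag survives the "older rows before the sentinel" layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const string SentinelRowKey = "sentinel"; // stand-in for the real constant

// Same illustrative layout as the bug report: latest execution, older execution, sentinel last.
var rows = new List<(string ExecutionId, string RowKey, string ETag)>
{
    ("exec-2", "0000000000000000", "etag-a"),
    ("exec-2", "0000000000000001", "etag-b"),
    ("exec-1", "0000000000000002", "etag-c"),
    ("exec-2", SentinelRowKey,     "etag-sentinel"),
};

(List<string> HistoryRowKeys, string ETag) ReadHistory(List<(string ExecutionId, string RowKey, string ETag)> entities)
{
    var history = new List<string>();
    (string ExecutionId, string RowKey, string ETag)? sentinel = null;

    if (entities.Count > 0)
    {
        // The most recent generation is in the first row.
        string executionId = entities[0].ExecutionId;
        foreach (var entity in entities)
        {
            if (entity.ExecutionId != executionId)
            {
                break; // remaining rows belong to an older generation
            }
            if (entity.RowKey == SentinelRowKey)
            {
                sentinel = entity; // not a history event; keep it for its ETag
                continue;
            }
            history.Add(entity.RowKey);
        }
    }

    // Fallback: the sentinel may follow older-generation rows, so search every row.
    if (sentinel == null)
    {
        var match = entities.LastOrDefault(e => e.RowKey == SentinelRowKey);
        if (match.RowKey != null) sentinel = match;
    }

    return (history, sentinel?.ETag);
}

var result = ReadHistory(rows);
Console.WriteLine(string.Join(",", result.HistoryRowKeys)); // prints "0000000000000000,0000000000000001"
Console.WriteLine(result.ETag);                             // prints "etag-sentinel"
```

Keeping the early `break` preserves the cheap path for the common layout, while the `LastOrDefault` fallback only scans the full list when the sentinel was not among the latest execution's rows.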

sebastianburckhardt

Looks good to me. I found one typo in a comment and have a slight suggestion on wording.

sebastianburckhardt · 2020-01-30T21:27:29Z

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

@@ -184,9 +184,11 @@ public override async Task<OrchestrationHistory> GetHistoryEventsAsync(string in
                        break;
                    }

+                    // The sentinal row does not contain any history events, so save it for later


sentinal => sentinel

sebastianburckhardt · 2020-01-30T21:28:30Z

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

@@ -208,9 +210,10 @@ public override async Task<OrchestrationHistory> GetHistoryEventsAsync(string in
            // The only time a sentinel won't exist is if no instance of this ID has ever existed.


... or it was removed via a history purge management operation?


cgillum

- LGTM!

cgillum · 2020-01-31T01:27:12Z

Though it looks like the test run timed out. Let's make sure the CI is healthy before merging.

ConnorMcMahon · 2020-01-31T18:23:06Z

I rebased onto master just to retrigger the build, and it appears to have worked this time. The timeout last time was in a Service Bus test, which I didn't touch here, so I am going to merge now.


ConnorMcMahon requested review from cgillum and sebastianburckhardt January 29, 2020 19:35

ConnorMcMahon self-assigned this Jan 29, 2020

ConnorMcMahon mentioned this pull request Jan 29, 2020: Duplicate activity calls and entity operations in v2.1.x Azure/azure-functions-durable-extension#1179 (Closed)

cgillum reviewed Jan 30, 2020

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs (outdated)

ConnorMcMahon commented Jan 30, 2020

src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs

sebastianburckhardt approved these changes Jan 30, 2020

Connor McMahon added 4 commits January 30, 2020 17:22

Refactor GetHistoryEventsAsync

8ba3ea7

Take PR feedback for simplification

e73aaee

PR feedback

a580dae

cgillum approved these changes Jan 31, 2020

ConnorMcMahon force-pushed the FixDuplicateActivities branch from 704bc51 to a580dae Compare January 31, 2020 17:14

ConnorMcMahon merged commit e2d319c into master Jan 31, 2020

ConnorMcMahon deleted the FixDuplicateActivities branch June 4, 2020 00:11