Fix case-sensitive session ID handling in ServiceBusOrchestrationService #1334
Service Bus can change the casing of session IDs during upgrades or failovers. The DurableTask framework used ordinal (case-sensitive) ConcurrentDictionary keys for orchestrationSessions and orchestrationMessages, causing a lowercased session ID to create a ghost session with empty state instead of finding the existing session.

This broke eternal orchestrations (ContinueAsNew timer bridge) because:
1. Timer message sent to PascalCase session ID
2. Service Bus delivered to lowercased session ID after upgrade
3. Framework created new empty session (ghost) instead of finding existing
4. Real session orphaned permanently with no pending messages

Fix: Use StringComparer.OrdinalIgnoreCase for both ConcurrentDictionary instances so session lookups are resilient to casing changes.

Incident: IcM 771856247 (Service Bus scheduled message loss)
Impact: 15+ APIM tenants, billing orchestrations stuck 43+ hours

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added 3 new log points to ease investigation of session-related failures:
1. TrySetSessionState-DeletingState (Warning): Logs the reason when session state is set to null (runtime state null, missing ExecutionStartedEvent, or non-Running status). Previously silent.
2. GetSessionState-EmptyState (Warning): Warns when a session has null or empty state, which may indicate a ghost session from a casing change.
3. SentMessageLog enhancement: Now includes ScheduledEnqueueTimeUtc and target SessionId for timer messages, enabling end-to-end timer lifecycle tracing without cross-event correlation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
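The three deletion reasons named in this commit message can be sketched as a small decision helper. This is only an illustration: `SketchRuntimeState`, `SketchStatus`, and `GetDeletionReason` are hypothetical stand-ins, not the framework's actual types or the code added in this PR.

```csharp
using System;

// Hypothetical stand-ins for illustration; the real runtime state type is richer.
public enum SketchStatus { Running, Completed, Failed }

public sealed class SketchRuntimeState
{
    public bool HasExecutionStartedEvent { get; set; }
    public SketchStatus Status { get; set; }
}

public static class SessionStateDiagnostics
{
    // Returns a human-readable reason when the session state would be set to null,
    // or null when the state should be kept.
    public static string GetDeletionReason(SketchRuntimeState state)
    {
        if (state == null)
        {
            return "runtime state is null";
        }

        if (!state.HasExecutionStartedEvent)
        {
            return "ExecutionStartedEvent is missing";
        }

        if (state.Status != SketchStatus.Running)
        {
            return "status is " + state.Status + ", not Running";
        }

        return null;
    }
}
```

A caller would emit the log entry only when the returned reason is non-null, so the previously silent deletion path leaves a trace of why it ran.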
Pull request overview
Fixes a reliability issue in the Service Bus backend where session IDs can be delivered with different casing (e.g., after upgrades/failovers), which previously caused “ghost sessions” and could stall long-running/eternal orchestrations.
Changes:
- Initialize `orchestrationSessions` (and currently also `orchestrationMessages`) with a case-insensitive `ConcurrentDictionary` comparer.
- Add diagnostic logging around empty session state reads and state deletion.
- Add new unit tests intended to validate case-insensitive behavior and reproduce the incident scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/DurableTask.ServiceBus/ServiceBusOrchestrationService.cs` | Switches dictionaries to `OrdinalIgnoreCase` and adds new warning logs + enhanced sent-message logging. |
| `Test/DurableTask.ServiceBus.Tests/SessionIdCaseInsensitiveTests.cs` | Adds new MSTest tests for case-insensitive dictionary behavior and incident reproduction (but currently placed outside the solution’s active `test/` test project tree). |
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Commenter does not have sufficient privileges for PR 1334 in repo Azure/durabletask
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/DurableTask.ServiceBus/ServiceBusOrchestrationService.cs:1612
`GetSessionStateAsync` logs when the session state is null/empty, but it still constructs a `MemoryStream` when `state.Length == 0`. An empty stream will deserialize as an empty string and will throw in `RuntimeStateStreamConverter.DeserializeToRuntimeStateWithFallback`. Consider treating empty state the same as null when creating `rawSessionStream` (e.g., only create the stream when `state?.Length > 0`).
byte[] state = await session.GetStateAsync();
if (state == null || state.Length == 0)
{
    TraceHelper.TraceSession(
        TraceEventType.Information,
        "ServiceBusOrchestrationService-GetSessionState-EmptyState",
        session.SessionId,
        $"Session '{session.SessionId}' has null or empty state ({state?.Length ?? 0} bytes).");
}
using (Stream rawSessionStream = state != null ? new MemoryStream(state) : null)
{
    this.ServiceStats.OrchestrationDispatcherStats.SessionGets.Increment();
    return await RuntimeStateStreamConverter.RawStreamToRuntimeState(rawSessionStream, session.SessionId, orchestrationServiceBlobStore, JsonDataConverter.Default);
}
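The guard suggested in the review comment could look roughly like the sketch below. `SessionStateStreams.OpenOrNull` is a hypothetical helper name. The sketch also assumes, based only on the existing `state != null ? ... : null` expression, that downstream code already treats a null stream as "no state".

```csharp
using System.IO;

public static class SessionStateStreams
{
    // Collapses null and zero-length state into a single "no state" case,
    // so an empty byte array never reaches the deserializer as an empty stream.
    public static Stream OpenOrNull(byte[] state)
    {
        return state != null && state.Length > 0 ? new MemoryStream(state) : null;
    }
}
```

A `using` statement accepts a null resource, so `using (Stream rawSessionStream = SessionStateStreams.OpenOrNull(state))` would slot into the snippet above without further changes.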
change warning to informational
05189a0 to a59d98c
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Summary
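Service Bus can change the casing of session IDs during upgrades or failovers. The DurableTask framework used ordinal (case-sensitive) `ConcurrentDictionary` keys for `orchestrationSessions` and `orchestrationMessages`, causing a lowercased session ID to create a ghost session with empty state instead of finding the existing session.
Problem
This broke eternal orchestrations (`ContinueAsNew` timer bridge) because:
1. The timer message was sent to the PascalCase session ID (`System_MoveBillingEvents_a3c79b00`)
2. Service Bus delivered it to the lowercased session ID after the upgrade (`system_movebillingevents_a3c79b00`)
3. The framework created a new empty (ghost) session instead of finding the existing one
4. The real session was orphaned permanently with no pending messages
Impact
15+ APIM tenants; billing orchestrations stuck 43+ hours (incident IcM 771856247).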
Changes
1. Case-insensitive dictionary (core fix)
Changed both `ConcurrentDictionary` instances in `ServiceBusOrchestrationService.StartAsync()` to use `StringComparer.OrdinalIgnoreCase`:
```csharp
// Before
this.orchestrationSessions = new ConcurrentDictionary<string, ServiceBusOrchestrationSession>();
// After
this.orchestrationSessions = new ConcurrentDictionary<string, ServiceBusOrchestrationSession>(StringComparer.OrdinalIgnoreCase);
```
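The effect of the comparer can be seen in isolation with a minimal sketch. `SessionLookupSketch` and `CountSessions` are illustrative names, string values stand in for `ServiceBusOrchestrationSession`, and the use of `GetOrAdd` to model "find or create a session" is an assumption rather than the service's actual code path.

```csharp
using System;
using System.Collections.Concurrent;

public static class SessionLookupSketch
{
    // Simulates deliveries for one logical session arriving with different casing
    // and returns how many dictionary entries (sessions) end up being tracked.
    public static int CountSessions(StringComparer comparer, params string[] deliveredSessionIds)
    {
        var sessions = new ConcurrentDictionary<string, string>(comparer);
        foreach (string id in deliveredSessionIds)
        {
            // Stand-in for "find the existing session or create an empty one".
            sessions.GetOrAdd(id, key => "state created for " + key);
        }

        return sessions.Count;
    }
}
```

With the two session IDs from the incident, `CountSessions(StringComparer.Ordinal, "System_MoveBillingEvents_a3c79b00", "system_movebillingevents_a3c79b00")` returns 2 (the second entry is the ghost), while the same call with `StringComparer.OrdinalIgnoreCase` returns 1. In the case-insensitive dictionary the entry keeps the casing of whichever ID was added first.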
2. Diagnostic logging (3 new log points)
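- `TrySetSessionState-DeletingState`: logs the reason when session state is set to null
- `GetSessionState-EmptyState`: flags a session with null or empty state, which may indicate a ghost session from a casing change
- `SentMessageLog`: now includes `ScheduledEnqueueTimeUtc` and target `SessionId` for timer messages
3. Unit tests (5 new tests)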
Testing
All 5 new unit tests pass. Existing build succeeds with 0 warnings, 0 errors.