
v10.2.0

Released by @ReubenBond on 12 Jun, 23:44 (commit 94443a5)

Orleans v10.2.0 focuses on runtime reliability, grain directory correctness, observability, Durable Jobs and journaling improvements, provider fixes, and a broad test/CI stabilization push.

Highlights

  • Runtime and grain directory reliability: fixes for activation/deactivation races (#10016, #10046), shutdown cleanup (#10206), membership reconciliation (#10086), directory transfer batching (#10047, #10053), stale cache invalidation (#10078, #10105), and rolling-upgrade compatibility (#10050).
  • Journaling and Durable Jobs: JSON Lines is now the default Orleans.Journaling format (#9973), Durable Jobs use the journaling catalog (#10148), and new tracing/metrics make job scheduling and execution observable (#10151).
  • Observability: expanded activation and grain directory metrics (#10124, #10125, #10147), grain_type tags on request timeout/cancel counters (#10178), a Dashboard lifecycle dependency graph (#10145), and migration of Orleans metrics to IMeterFactory (#10201).
  • Provider stability: reminder delivery and shutdown behavior were hardened (#10154, #10155), Cosmos fixes landed for reminder deletes, ETags, and indexing (#10037, #10041, #10084), Redis reminder serialization and multiplexer ownership were fixed (#10099, #10146, #10182), NATS storage configuration was added (#10028, #10176), Azure retry behavior improved (#10192), and streaming checkpoint persistence was fixed (#10096).
  • Serialization: added ArcBuffer support (#10066), built-in codecs for collection interface types (#10104, #10106), serializer conformance baselines (#10034), and safer request-context deserialization (#10017).
  • Testing and CI: many flaky tests were made deterministic (#10101, #10186, #10216), provider Docker pulls became retryable (#10190, #10191, #10198), artifact uploads are non-blocking (#10158), workflows are more secure (#10027, #10102), and SDK/frontend dependencies were updated (#10205, #9998).

Breaking changes and notable behavior changes

  • Redis multiplexer ownership: Redis providers using a DI-provided IConnectionMultiplexer via ServiceKey no longer dispose that shared multiplexer on shutdown. Providers which create their own multiplexer still own and dispose it. (#10146, #10182)
  • Metrics meter access: the static Orleans meter was removed. Code which referenced Orleans.Runtime.Instruments.Meter should resolve OrleansInstruments from DI and use its Meter instead. (#10201)
  • Journaling default format: JSON Lines is now the default Orleans.Journaling storage format. Existing journals with stored format metadata continue to be read correctly; legacy entries without metadata are treated as OrleansBinary and migrate on snapshot write. Set JournaledStateManagerOptions.JournalFormatKey = "orleans-binary" to retain the old format for new writes. (#9973)
  • Silo startup lifecycle: a new ValidateInitialConnectivity stage runs before BecomeActive, keeping silos in Joining until initial peer connectivity is validated. (#10153)
  • Reminder lifecycle: reminder delivery is gated on the silo becoming active, while reminder register/update/unregister operations remain available later into shutdown. (#10154, #10155)
  • Client connection retries: the default client retry filter now retries additional transient gateway/startup failures, including OrleansMessageRejectionException and ConnectionFailedException, with a longer default retry window. (#10140)
  • Directory activation-failure retries: retry loops after directory activation failures are now bounded by the normal message forward-count budget. (#10094)
  • Diagnostics listener name: GrainLifecycleEvents.ListenerName is now correctly "Orleans.GrainLifecycle"; code subscribing by the previous string literal should update. (#10121)
  • Durable Jobs storage providers: custom Durable Jobs journaling providers need catalog support via IJournalStorageCatalog. (#10112, #10148)
  • NATS options validation: invalid StorageType enum values are rejected at startup. (#10176)

Runtime, activation, placement, and lifecycle

  • Fixed a stateless-worker reactivation race and a crash when cancelling indefinite keep-alive tickets. (#10016, #10014)
  • Fixed stuck deactivation recovery by removing stranded activations from the catalog, unregistering directory entries, and avoiding stale forwarding loops. (#10046)
  • Fixed shutdown activations leaving stale grain directory entries. (#10206)
  • Made the activation collector safer under memory pressure and when there are no valid candidates. (#10113)
  • Stopped and awaited PlacementService workers during silo shutdown. (#9993)
  • Prevented persistent stream pulling agents from accepting work or processing queue reads after shutdown starts. (#10036)
  • Stabilized late lifecycle registration scheduling to avoid lock/scheduler races. (#10135)
  • Added explicit initial connectivity validation before a silo becomes active. (#10153)
  • Avoided tracking deactivated grains in activation repartitioner state and fixed repartitioning waits for inactive migrations. (#10061, #10130)
  • Reduced runtime hot-path overhead by avoiding per-activation scheduler logger fields, reducing request monitoring memory use, avoiding context capture in outgoing calls, optimizing response completion sources, simplifying InsideRuntimeClient response handling, and removing locks from activation response processing. (#10118, #10119, #10129, #10127, #10128, #10139, #10141)
  • Broadened the default client connection retry filter to cover more transient startup and gateway failures. (#10140)

Grain directory, routing, and cache correctness

  • Added TTL cleanup to the grain directory cache and diagnostic hooks for cache state changes. (#10055)
  • Added a fast-path message destination cache for grain calls, with invalidation on activation, connection, gateway, and client state changes. (#10064)
  • Capped cache invalidation header growth during message send and deserialization. (#10078, #10105)
  • Set target silo metadata correctly on cached silo connections. (#10080)
  • Capped distributed directory ownership transfer batch sizes and split large transfer payloads into multiple messages. (#10047, #10053)
  • Fixed distributed directory recovery handoff for grains activating during ownership transfer. (#10082)
  • Added distributed remote grain directory compatibility for rolling upgrades and a regression test for directory migration joins. (#10050, #10049)
  • Fixed LocalGrainDirectory membership reconciliation using snapshot-based processing, membership-version-aware stale cleanup, and membership refresh before directory RPC routing. (#10086, #10087, #10088)
  • Simplified LocalGrainDirectory membership processing after the reconciliation changes. (#10089)
  • Refined directory forwarding retry checks, removed a problematic directory failure forwarding optimization, and bounded retries after directory activation failures. (#10092, #10095, #10094)
  • Restored the local grain directory lookup fast path for locally-owned partitions. (#10126)
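
The transfer batching in #10047 and #10053 follows a common pattern: split a large set of entries into chunks bounded by both entry count and approximate payload size, then send each chunk as its own message. The Python sketch below shows that general pattern only. It is not Orleans code; `split_into_batches`, `max_entries` and `max_bytes` are hypothetical names, and entry sizes are assumed to be known up front.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[str, int]  # (key, approximate serialized size in bytes)

def split_into_batches(entries: Sequence[Entry], max_entries: int, max_bytes: int) -> List[List[Entry]]:
    """Group entries into batches bounded by entry count and total bytes.

    An entry larger than max_bytes on its own still gets its own batch, so nothing is dropped.
    """
    if max_entries < 1 or max_bytes < 1:
        raise ValueError("caps must be positive")
    batches: List[List[Entry]] = []
    current: List[Entry] = []
    current_bytes = 0
    for key, size in entries:
        # Close the current batch if adding this entry would break either cap.
        if current and (len(current) >= max_entries or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append((key, size))
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be sent as a separate transfer message, so no single message grows with the size of the partition being handed off.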

Reminders, timers, providers, storage, and streaming

Reminders and timers

  • Added reminder lifecycle diagnostics and deterministic reminder testing infrastructure. (#10033)
  • Fixed immediate local reminder scheduling when the first tick is already due. (#10040)
  • Gated reminder tick delivery on active silos and allowed reminder updates during shutdown cleanup. (#10154, #10155)
  • Auto-registered ITimerRegistry on SystemTarget instances, enabling grain services/system targets to use features such as async enumerable cleanup timers without manual registration. (#10038)

Cosmos DB

  • Treated missing Cosmos reminder deletes as successful idempotent unregisters. (#10037)
  • Preserved Cosmos reminder ETags so stale conditional deletes can be detected correctly. (#10041)
  • Stabilized Cosmos reminder/container startup behavior and removed Cosmos DB CI exclusions. (#10059, #10042)
  • Fixed Cosmos indexing paths for newer emulator/service versions. (#10084)

Redis

  • Isolated Redis reminder serialization from application-level JsonConvert.DefaultSettings by moving reminder row serialization to System.Text.Json. (#10099)
  • Fixed Redis multiplexer ownership so shared DI-provided multiplexers are not disposed by Orleans providers, and added disposal/ownership tests for the Redis grain directory. (#10146, #10182)
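
The ownership rule behind #10146 and #10182 is the usual "dispose only what you create" pattern. The Python sketch below illustrates it generically; `Connection`, `Provider` and `_owns_connection` are hypothetical stand-ins, not the Orleans Redis provider types or the StackExchange.Redis API.

```python
from typing import Callable, Optional

class Connection:
    """Hypothetical stand-in for a shared client such as a connection multiplexer."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

class Provider:
    """Hypothetical provider that closes only connections it created itself."""

    def __init__(self, shared: Optional[Connection] = None,
                 factory: Callable[[], Connection] = Connection) -> None:
        if shared is not None:
            self._conn = shared
            self._owns_connection = False  # supplied by the container: other users may still need it
        else:
            self._conn = factory()
            self._owns_connection = True   # created here: this provider is responsible for closing it

    @property
    def connection(self) -> Connection:
        return self._conn

    def shutdown(self) -> None:
        if self._owns_connection:
            self._conn.close()
```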

NATS, SQS, memory storage, Azure Storage, transactions, and streaming

  • Added configurable NATS JetStream storage type (File or Memory) and validation for invalid storage-type values. (#10028, #10176)
  • Added ElasticMQ-backed SQS provider coverage in CI. (#10003)
  • Enabled nullable annotations in memory storage provider builder extensions. (#10110)
  • Fixed an Azure Table transaction atomicity issue during Confirm when storage failures occur, and improved classification of conflict/precondition responses. (#10123)
  • Reduced Azure Table retry pressure with bounded jittered backoff and less duplicate retrying under storage outages. (#10192)
  • Fixed streaming checkpoint gaps which could cause replay after restart. (#10096)
  • Fixed TransactionResponse diagnostic formatting. (#10083)
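
#10192 describes bounded, jittered backoff for Azure Table retries. One common form of that technique is capped exponential backoff with "full jitter" plus a fixed attempt budget, sketched generically in Python below. The names and defaults (`retry_with_jitter`, `base`, `cap`, `max_attempts`) are illustrative assumptions, not the values or APIs Orleans uses.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry_with_jitter(
    op: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    base: float = 0.1,
    cap: float = 5.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Run op, retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            # Stop on non-transient errors and once the attempt budget is spent.
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)],
            # which spreads out retries from many callers hitting the same outage.
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Randomizing the whole delay, rather than adding a small jitter to a fixed delay, is what keeps many clients from retrying in lockstep during a storage outage.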

Serialization, analyzers, journaling, and Durable Jobs

Serialization and analyzers

  • Capped request-context dictionary preallocation during message deserialization to protect against oversized encoded counts. (#10017)
  • Fixed TypeConverter.ParseInternal alias resolution, enabled nullable annotations, and added support for JSON framework types in type conversion. (#10032)
  • Added serializer conformance tests and baselines for codec authors and regression detection. (#10034)
  • Added ArcBuffer support across serialization reader/writer/deserializer paths for lower-copy high-throughput scenarios. (#10066)
  • Added built-in codecs and snapshots for BCL collection interface types, including IEnumerable<T>, IReadOnlyList<T>, ISet<T>, and dictionary interfaces. (#10104, #10106)
  • Fixed Orleans analyzers to match framework types by fully-qualified metadata name instead of short name, avoiding false positives/negatives when user code has similarly named types. (#10107)
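
#10017 caps how much the deserializer preallocates from an encoded count. The underlying rule is general: a length or count read off the wire is untrusted, so memory should grow with the bytes actually received instead of being reserved from the claim. The Python sketch below applies that rule to a hypothetical length-prefixed payload (little-endian uint32 length, then bytes); `read_length_prefixed` and `CHUNK_SIZE` are illustrative names, not Orleans APIs.

```python
import io
import struct
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical upper bound on any single read/allocation step

def read_length_prefixed(stream: BinaryIO) -> bytes:
    """Read a little-endian uint32 length prefix followed by that many payload bytes."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated length prefix")
    (declared,) = struct.unpack("<I", header)
    buf = bytearray()
    remaining = declared
    # Do not allocate `declared` bytes up front: a forged prefix such as 0xFFFFFFFF
    # would otherwise reserve about 4 GiB before any payload byte has arrived.
    while remaining > 0:
        chunk = stream.read(min(remaining, CHUNK_SIZE))
        if not chunk:
            raise ValueError(f"payload truncated: expected {declared} bytes, got {len(buf)}")
        buf += chunk
        remaining -= len(chunk)
    return bytes(buf)

if __name__ == "__main__":
    print(read_length_prefixed(io.BytesIO(struct.pack("<I", 5) + b"hello")))
```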

Orleans.Journaling

  • Added the JSON Lines journal format and made it the default for new journal writes, with opt-in APIs for configuring System.Text.Json source-generated metadata. (#9973)
  • Added IJournalStorageCatalog and optional journal metadata operations to support storage discovery and metadata updates. (#10112)
  • Made Azure Blob journal compaction checkpoint publication safer using immutable generation-scoped checkpoints. (#10103)
  • Added Azure Blob journal storage metrics and retry tuning for metadata-only ETag conflicts. (#10149)
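
JSON Lines stores one JSON value per line, which suits an append-only journal: each write is a single appended line, and a reader can process entries one at a time. The Python sketch below shows the general encoding only; the entry shape (`op`, `key`, `value`) is a hypothetical example, not the Orleans.Journaling schema or its format metadata.

```python
import json
from typing import Any, Dict, Iterable, Iterator

def write_jsonl(entries: Iterable[Dict[str, Any]]) -> str:
    # One compact JSON document per line. json.dumps escapes newlines inside strings,
    # and ensure_ascii (the default) escapes other Unicode line separators.
    return "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in entries)

def read_jsonl(text: str) -> Iterator[Dict[str, Any]]:
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # tolerate blank lines, including the one after the final newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid journal entry on line {line_no}") from exc
```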

Durable Jobs

  • Ported Durable Jobs to the journaled storage catalog, so shard discovery is backed by Orleans.Journaling. (#10148)
  • Improved journaled shard throughput by batching writes with a configurable linger delay. (#10150)
  • Added Durable Jobs tracing, trace propagation fields on DurableJob, and metrics for scheduling, dispatch lag, attempts, handler execution, shard processing, storage batches, and ownership checks. (#10151)
  • Added a Durable Jobs journaling playground sample. (#10152)
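
#10150 batches journaled shard writes with a configurable linger delay. The general technique trades a small, bounded amount of latency for fewer storage round trips: hold writes until either a size limit is reached or the first buffered write has waited for the linger delay. A single-threaded Python sketch under those assumptions follows; `LingerBatcher`, `max_batch`, `linger_s` and `poll` are hypothetical names, not the Durable Jobs API.

```python
import time
from typing import Callable, List, Optional

class LingerBatcher:
    """Buffers writes; flushes when max_batch items are pending or the oldest
    pending item has waited at least linger_s seconds (checked by poll())."""

    def __init__(self, flush: Callable[[List[str]], None], max_batch: int,
                 linger_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self._flush = flush
        self._max_batch = max_batch
        self._linger_s = linger_s
        self._clock = clock
        self._pending: List[str] = []
        self._first_at: Optional[float] = None

    def add(self, item: str) -> None:
        if not self._pending:
            self._first_at = self._clock()  # the linger window starts at the first buffered item
        self._pending.append(item)
        if len(self._pending) >= self._max_batch:
            self.flush()  # size limit reached: write immediately instead of lingering

    def poll(self) -> None:
        """Call periodically, e.g. from a timer, to flush batches whose linger delay has expired."""
        if self._first_at is not None and self._clock() - self._first_at >= self._linger_s:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            batch, self._pending, self._first_at = self._pending, [], None
            self._flush(batch)  # one storage write for the whole batch
```

Injecting the clock keeps the linger logic deterministic under test, and a larger linger delay yields bigger batches at the cost of higher per-write latency.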

API surface and diagnostics

  • Updated generated public API surface baselines. (#9986, #10171, #10204)
  • Fixed the API-diff validation action to avoid disallowed operations. (#10120)
  • Fixed the grain lifecycle diagnostic listener name. (#10121)

Observability, metrics, logging, and Dashboard

  • Migrated logging in transactions test kit, testing host, event sourcing, reminders, runtime, Dashboard, Durable Jobs, TLS middleware, and async enumerable streaming to source-generated [LoggerMessage] methods, reducing disabled-log overhead and allocations. (#10018, #10019, #10020, #10021, #10022, #10023, #10024, #10025, #10026)
  • Changed runtime memory metrics to ObservableGauge, matching point-in-time memory semantics for exporters such as Prometheus. (#10029)
  • Implemented remaining grain directory metrics for snapshot transfers, range recovery, range-lock hold time, and registrations. (#10124)
  • Added activation lifecycle latency histograms and activation population/lifecycle counters. (#10125, #10147)
  • Added grain_type tags to application request timeout and cancellation counters. (#10178)
  • Added a Dashboard silo lifecycle dependency graph view for inspecting startup/shutdown stage ordering and registered lifecycle observers. (#10145)
  • Migrated Orleans metrics to IMeterFactory for correct DI scoping, listener registration, test isolation, and multi-host scenarios. This covered networking, watchdog, gateway, scheduler, consistent ring, storage, catalog, client, reminder, directory, grain, stream, messaging, messaging processing, Durable Jobs, Azure Blob journal storage, journaling, environment statistics, stream cache, Event Hubs receiver/cache, and stream receiver metrics. (#10137, #10138, #10142, #10143, #10144, #10162, #10168, #10169, #10172, #10173, #10175, #10177, #10179, #10180, #10181, #10183, #10184, #10193, #10194, #10195, #10199, #10200)
  • Removed the static Orleans meter after the IMeterFactory migration. (#10201)

Testing, benchmarks, CI, docs, and dependencies

Test reliability and speed

  • Replaced timing-sensitive waits with deterministic synchronization for stateless-worker late delivery, async enumerable slow-consumer tests, activation rebalancing, deactivation placement, broadcast channels, idle/deactivate-on-idle behavior, reminder scheduling, activation collection, placement migration, journaling recovery retry, transaction fault injection, stream generator reset, rolling-upgrade directory behavior, streaming client drops, runtime timers, membership health monitoring, queue balancer changes, liveness stabilization, upgrade readiness, timer callbacks, and consistent ring provider tests. (#10077, #10101, #10131, #10132, #10134, #10156, #10157, #10159, #10163, #10166, #10170, #10185, #10186, #10187, #10188, #10189, #10207, #10208, #10209, #10212, #10213, #10215, #10216, #10220)
  • Removed slow or redundant sleeps in ring cleanup and sped up membership health monitor, runtime timer, and collector tests. (#10210, #10207, #10209)

Benchmarks

  • Added an AdaptivePing benchmark with hill-climbing concurrency tuning and improved convergence rules for more reproducible throughput measurements. (#10069, #10076)

CI and workflow reliability

  • Added explicit least-privilege GitHub Actions permissions and restricted maintenance workflows to the upstream repository. (#10027, #10102)
  • Made artifact uploads non-blocking so upload failures do not fail otherwise successful jobs. (#10158)
  • Made MariaDB, Azure emulator, and provider Docker pulls more resilient to registry/network timeouts. (#10190, #10191, #10198)
  • Skipped external Dashboard asset builds in provider test jobs to reduce unrelated npm/network failures. (#10196)
  • Repaired the API diff workflow. (#10164)
  • Updated GitHub Actions to Node.js 24. (#10203)

Documentation and contributor workflow

  • Documented the upstream PR workflow and required Conventional Commits for future contribution history. (#10056, #10133)
  • Corrected cluster membership heartbeat/defaults documentation. (#10197)

SDK and dependency updates

  • Updated the pinned .NET SDK from 10.0.201 to 10.0.203, then 10.0.300, then 10.0.301. (#10015, #10097, #10205)
  • Updated Dashboard frontend dependencies: minimatch, picomatch, flatted, postcss, rollup, and vite. (#9956, #9974, #9967, #10043, #9957, #9998)


Full Changelog: v10.1.0...v10.2.0