diff --git a/docs/core/diagnostics/compare-metric-apis.md b/docs/core/diagnostics/compare-metric-apis.md index e8f0bc0ac8c07..38443f7d5f14b 100644 --- a/docs/core/diagnostics/compare-metric-apis.md +++ b/docs/core/diagnostics/compare-metric-apis.md @@ -41,7 +41,7 @@ aiming at broad compatibility, this API adds support for many things that were l - Multiple simultaneous listeners - Listener access to unaggregated measurements -Although this API was designed to work well with OpenTelemetry and its growing ecosystem of pluggable vendor integration libraries, applications also have the option to use the .NET built-in listener APIs directly. With this option, you can create custom metric tooling without taking any external library dependencies. At the time of writing, the [System.Diagnostics.Metrics](xref:System.Diagnostics.Metrics) support is limited to [dotnet-counters](dotnet-counters.md) and [OpenTelemetry.NET](https://opentelemetry.io/docs/net/). However, we expect support for these APIs will grow given the active nature of the OpenTelemetry project. +Although this API was designed to work well with OpenTelemetry and its growing ecosystem of pluggable vendor integration libraries, applications also have the option to use the .NET built-in listener APIs directly. With this option, you can create custom metric tooling without taking any external library dependencies. At the time of writing, the [System.Diagnostics.Metrics](xref:System.Diagnostics.Metrics) support is limited to [dotnet-counters](dotnet-counters.md) and [OpenTelemetry.NET](https://opentelemetry.io/docs/net/). However, we expect support for these APIs will grow given the active nature of the OpenTelemetry project. 
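As a rough illustration of the built-in listener option described above, the following sketch uses `MeterListener` from `System.Diagnostics.Metrics` to receive unaggregated measurements without any external library. The meter name `Contoso.Demo` and the instrument name `demo-requests` are hypothetical placeholders, not names used by .NET or Orleans, and the sketch assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical meter and counter, standing in for metrics published by a library.
using var meter = new Meter("Contoso.Demo");
Counter<long> requests = meter.CreateCounter<long>("demo-requests");

using var listener = new MeterListener();

// Subscribe only to instruments that belong to the meter of interest.
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Contoso.Demo")
    {
        l.EnableMeasurementEvents(instrument);
    }
};

// Receive each raw measurement as it's recorded, before any aggregation.
listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
    Console.WriteLine($"{instrument.Name}: {measurement}"));

// Start the listener; already-created instruments are reported to InstrumentPublished.
listener.Start();

requests.Add(1);
requests.Add(5);
```

Because the callback sees every individual measurement, any aggregation (sums, rates, percentiles) is left to the custom tooling. Listening to a real library's metrics would only require matching on that library's meter name instead of the placeholder.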
### PerformanceCounter diff --git a/docs/orleans/host/monitoring/index.md b/docs/orleans/host/monitoring/index.md index 77694efa1338c..5f0117c0c2138 100644 --- a/docs/orleans/host/monitoring/index.md +++ b/docs/orleans/host/monitoring/index.md @@ -1,7 +1,7 @@ --- title: Orleans observability description: Explore the various runtime monitoring, logging, distributed tracing, and metrics options available in .NET Orleans. -ms.date: 02/22/2023 +ms.date: 08/31/2023 zone_pivot_groups: orleans-version --- @@ -11,7 +11,7 @@ One of the most important aspects of a distributed system is observability. Obse ## Logging -Orleans leverages [Microsoft.Extensions.Logging](https://www.nuget.org/packages/Microsoft.Extensions.Logging) for all silo and client logs. This means that you can use any logging provider that is compatible with `Microsoft.Extensions.Logging`. Your app code would rely on [dependency injection](../../../core/extensions/dependency-injection.md) to get an instance of and use it to log messages. For more information, see [Logging in .NET](../../../core/extensions/logging.md). +Orleans uses [Microsoft.Extensions.Logging](https://www.nuget.org/packages/Microsoft.Extensions.Logging) for all silo and client logs. You can use any logging provider that is compatible with `Microsoft.Extensions.Logging`. Your app code would rely on [dependency injection](../../../core/extensions/dependency-injection.md) to get an instance of and use it to log messages. For more information, see [Logging in .NET](../../../core/extensions/logging.md). :::zone target="docs" pivot="orleans-7-0" @@ -79,9 +79,245 @@ Press p to pause, r to resume, q to quit. For more information, see [Investigate performance counters (dotnet-counters)](../../../core/diagnostics/dotnet-counters.md). +### Orleans meters + +Orleans uses the [System.Diagnostics.Metrics](../../../core/diagnostics/compare-metric-apis.md#systemdiagnosticsmetrics) APIs to collect metrics. 
Orleans categorizes each meter into domain-centric concerns, such as networking, messaging, gateway, and so on. The following subsections describe the meters that Orleans uses. + +#### Networking + +The following table represents a collection of networking meters that are used to monitor the Orleans networking layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-networking-sockets-closed` | | A count of sockets that have closed. | +| `orleans-networking-sockets-opened` | | A count of sockets that have opened. | + +#### Messaging + +The following table represents a collection of messaging meters that are used to monitor the Orleans messaging layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-messaging-sent-messages-size` | | A histogram representing the size of messages in bytes that have been sent. | +| `orleans-messaging-received-messages-size` | | A histogram representing the size of messages in bytes that have been received. | +| `orleans-messaging-sent-header-size` | | An observable counter representing the number of header bytes sent. | +| `orleans-messaging-received-header-size` | | An observable counter representing the number of header bytes received. | +| `orleans-messaging-sent-failed` | | A count of failed sent messages. | +| `orleans-messaging-sent-dropped` | | A count of dropped sent messages. | +| `orleans-messaging-processing-dispatcher-received` | | An observable counter representing the number of dispatcher received messages. | +| `orleans-messaging-processing-dispatcher-processed` | | An observable counter representing the number of dispatcher processed messages. | +| `orleans-messaging-processing-dispatcher-forwarded` | | An observable counter representing the number of dispatcher forwarded messages. | +| `orleans-messaging-processing-ima-received` | | An observable counter representing the number of incoming messages received. 
| +| `orleans-messaging-processing-ima-enqueued` | | An observable counter representing the number of incoming messages enqueued. | +| `orleans-messaging-processing-activation-data` | | An observable gauge representing all of the processing activation data. | +| `orleans-messaging-pings-sent` | | A count of pings sent. | +| `orleans-messaging-pings-received` | | A count of pings received. | +| `orleans-messaging-pings-reply-received` | | A count of ping replies received. | +| `orleans-messaging-pings-reply-missed` | | A count of ping replies missed. | +| `orleans-messaging-expired` | | A count of messages that have expired. | +| `orleans-messaging-rejected` | | A count of messages that have been rejected. | +| `orleans-messaging-rerouted` | | A count of messages that have been rerouted. | +| `orleans-messaging-sent-local` | | An observable counter representing the number of local messages sent. | + +#### Gateway + +The following table represents a collection of gateway meters that are used to monitor the Orleans gateway layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-gateway-connected-clients` | | An up/down counter representing the number of connected clients. | +| `orleans-gateway-sent` | | A count of gateway messages sent. | +| `orleans-gateway-received` | | A count of gateway messages received. | +| `orleans-gateway-load-shedding` | | A count of gateway (load shedding) messages that have been rejected due to the gateway being overloaded. | + +#### Runtime + +The following table represents a collection of runtime meters that are used to monitor the Orleans runtime layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-scheduler-long-running-turns` | | A count of long running turns within the scheduler. | +| `orleans-runtime-total-physical-memory` | | An observable counter representing the total amount of physical memory (in MB) of the Orleans runtime. 
| +| `orleans-runtime-available-memory` | | An observable counter representing the available memory (in MB) for the Orleans runtime. | + +#### Catalog + +The following table represents a collection of catalog meters that are used to monitor the Orleans catalog layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-catalog-activations` | | An observable gauge representing the number of catalog activations. | +| `orleans-catalog-activation-working-set` | | An observable gauge representing the number of activations within the working set. | +| `orleans-catalog-activation-created` | | A count of created activations. | +| `orleans-catalog-activation-destroyed` | | A count of destroyed activations. | +| `orleans-catalog-activation-failed-to-activate` | | A count of activations that failed to activate. | +| `orleans-catalog-activation-collections` | | A count of idle activation collections. | +| `orleans-catalog-activation-shutdown` | | A count of shutdown activations. | +| `orleans-catalog-activation-non-existent` | | A count of non-existent activations. | +| `orleans-catalog-activation-concurrent-registration-attempts` | | A count of concurrent activation registration attempts. | + +#### Directory + +The following table represents a collection of directory meters that are used to monitor the Orleans directory layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-directory-lookups-local-issued` | | A count of local lookups issued. | +| `orleans-directory-lookups-local-successes` | | A count of local successful lookups. | +| `orleans-directory-lookups-full-issued` | | A count of full directory lookups issued. | +| `orleans-directory-lookups-remote-sent` | | A count of remote directory lookups sent. | +| `orleans-directory-lookups-remote-received` | | A count of remote directory lookups received. | +| `orleans-directory-lookups-local-directory-issued` | | A count of local directory lookups issued. 
| +| `orleans-directory-lookups-local-directory-successes` | | A count of local directory successful lookups. | +| `orleans-directory-lookups-cache-issued` | | A count of cached lookups issued. | +| `orleans-directory-lookups-cache-successes` | | A count of cached successful lookups. | +| `orleans-directory-validations-cache-sent` | | A count of directory cache validations sent. | +| `orleans-directory-validations-cache-received` | | A count of directory cache validations received. | +| `orleans-directory-partition-size` | | An observable gauge representing the directory partition size. | +| `orleans-directory-cache-size` | | An observable gauge representing the directory cache size. | +| `orleans-directory-ring-size` | | An observable gauge representing the directory ring size. | +| `orleans-directory-ring-local-portion-distance` | | An observable gauge representing the ring range owned by the local directory partition. | +| `orleans-directory-ring-local-portion-percentage` | | An observable gauge representing the ring range owned by the local directory, represented as a percentage of the total range. | +| `orleans-directory-ring-local-portion-average-percentage` | | An observable gauge representing the average percentage of the directory ring range owned by each silo, giving a representation of how balanced directory ownership is. | +| `orleans-directory-registrations-single-act-issued` | | A count of directory single activation registrations issued. | +| `orleans-directory-registrations-single-act-local` | | A count of directory single activation registrations handled by the local directory partition. | +| `orleans-directory-registrations-single-act-remote-sent` | | A count of directory single activation registrations sent to a remote directory partition. | +| `orleans-directory-registrations-single-act-remote-received` | | A count of directory single activation registrations received from remote hosts. 
| +| `orleans-directory-unregistrations-issued` | | A count of directory deregistrations issued. | +| `orleans-directory-unregistrations-local` | | A count of directory deregistrations handled by the local directory partition. | +| `orleans-directory-unregistrations-remote-sent` | | A count of directory deregistrations sent to remote directory partitions. | +| `orleans-directory-unregistrations-remote-received` | | A count of directory deregistrations received from remote hosts. | +| `orleans-directory-unregistrations-many-issued` | | A count of directory multi-activation deregistrations issued. | +| `orleans-directory-unregistrations-many-remote-sent` | | A count of directory multi-activation deregistrations sent to remote directory partitions. | +| `orleans-directory-unregistrations-many-remote-received` | | A count of directory multi-activation deregistrations received from remote hosts. | + +#### Consistent ring + +The following table represents a collection of consistent ring meters that are used to monitor the Orleans consistent ring layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-consistent-ring-size` | | An observable gauge representing the consistent ring size. | +| `orleans-consistent-ring-range-percentage-local` | | An observable gauge representing the consistent ring local percentage. | +| `orleans-consistent-ring-range-percentage-average` | | An observable gauge representing the consistent ring average percentage. | + +#### Watchdog + +The following table represents a collection of watchdog meters that are used to monitor the Orleans watchdog layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-watchdog-health-checks` | | A count of watchdog health checks. | +| `orleans-watchdog-health-checks-failed` | | A count of failed watchdog health checks. | + +#### Client + +The following table represents a collection of client meters that are used to monitor the Orleans client layer. 
+ +| Meter name | Type | Description | +|--|--|--| +| `orleans-client-connected-gateways` | | An observable gauge representing the number of connected gateway clients. | + +#### Miscellaneous + +The following table represents a collection of miscellaneous meters that are used to monitor various layers. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-grains` | | A count representing the number of grains. | +| `orleans-system-targets` | | A count representing the number of system targets. | + +#### App requests + +The following table represents a collection of app request meters that are used to monitor the Orleans app request layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-app-requests-latency` | | An observable counter representing app request latency. | +| `orleans-app-requests-timedout` | | An observable counter representing app requests that have timed out. | + +#### Reminders + +The following table represents a collection of reminder meters that are used to monitor the Orleans reminder layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-reminders-tardiness` | | A histogram representing the number of seconds a reminder is tardy. | +| `orleans-reminders-active` | | An observable gauge representing the number of active reminders. | +| `orleans-reminders-ticks-delivered` | | A count representing the number of reminder ticks that have been delivered. | + +#### Storage + +The following table represents a collection of storage meters that are used to monitor the Orleans storage layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-storage-read-errors` | | A count representing the number of storage read errors. | +| `orleans-storage-write-errors` | | A count representing the number of storage write errors. | +| `orleans-storage-clear-errors` | | A count representing the number of storage clear errors. 
| +| `orleans-storage-read-latency` | | A histogram representing the storage read latency in milliseconds. | +| `orleans-storage-write-latency` | | A histogram representing the storage write latency in milliseconds. | +| `orleans-storage-clear-latency` | | A histogram representing the storage clear latency in milliseconds. | + +#### Streams + +The following table represents a collection of stream meters that are used to monitor the Orleans stream layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-streams-pubsub-producers-added` | | A count of streaming pubsub producers added. | +| `orleans-streams-pubsub-producers-removed` | | A count of streaming pubsub producers removed. | +| `orleans-streams-pubsub-producers` | | A count of streaming pubsub producers. | +| `orleans-streams-pubsub-consumers-added` | | A count of streaming pubsub consumers added. | +| `orleans-streams-pubsub-consumers-removed` | | A count of streaming pubsub consumers removed. | +| `orleans-streams-pubsub-consumers` | | A count of streaming pubsub consumers. | +| `orleans-streams-persistent-stream-pulling-agents` | | An observable gauge representing the number of persistent stream pulling agents. | +| `orleans-streams-persistent-stream-messages-read` | | A count of persistent stream messages read. | +| `orleans-streams-persistent-stream-messages-sent` | | A count of persistent stream messages sent. | +| `orleans-streams-persistent-stream-pubsub-cache-size` | | An observable gauge representing the persistent stream pubsub cache size. | +| `orleans-streams-queue-initialization-failures` | | A count of stream queue initialization failures. | +| `orleans-streams-queue-initialization-duration` | | A count of stream queue initialization occurrences. | +| `orleans-streams-queue-initialization-exceptions` | | A count of stream queue initialization exceptions. | +| `orleans-streams-queue-read-failures` | | A count of stream queue read failures. 
| +| `orleans-streams-queue-read-duration` | | A count of stream queue read occurrences. | +| `orleans-streams-queue-read-exceptions` | | A count of stream queue read exceptions. | +| `orleans-streams-queue-shutdown-failures` | | A count of stream queue shutdown failures. | +| `orleans-streams-queue-shutdown-duration` | | A count of stream queue shutdown occurrences. | +| `orleans-streams-queue-shutdown-exceptions` | | A count of stream queue shutdown exceptions. | +| `orleans-streams-queue-messages-received` | | An observable counter representing the number of stream queue messages received. | +| `orleans-streams-queue-oldest-message-enqueue-age` | | An observable gauge representing the age of the oldest enqueued message. | +| `orleans-streams-queue-newest-message-enqueue-age` | | An observable gauge representing the age of the newest enqueued message. | +| `orleans-streams-block-pool-total-memory` | | An observable counter representing the stream block pool total memory in bytes. | +| `orleans-streams-block-pool-available-memory` | | An observable counter representing the stream block pool available memory in bytes. | +| `orleans-streams-block-pool-claimed-memory` | | An observable counter representing the stream block pool claimed memory in bytes. | +| `orleans-streams-block-pool-released-memory` | | An observable counter representing the stream block pool released memory in bytes. | +| `orleans-streams-block-pool-allocated-memory` | | An observable counter representing the stream block pool allocated memory in bytes. | +| `orleans-streams-queue-cache-size` | | An observable counter representing the stream queue cache size in bytes. | +| `orleans-streams-queue-cache-length` | | An observable counter representing the stream queue length. | +| `orleans-streams-queue-cache-messages-added` | | An observable counter representing the stream queue messages added. 
| +| `orleans-streams-queue-cache-messages-purged` | | An observable counter representing the stream queue messages purged. | +| `orleans-streams-queue-cache-memory-allocated` | | An observable counter representing the stream queue memory allocated. | +| `orleans-streams-queue-cache-memory-released` | | An observable counter representing the stream queue memory released. | +| `orleans-streams-queue-cache-oldest-to-newest-duration` | | An observable gauge representing the duration from the oldest to the newest stream queue cache. | +| `orleans-streams-queue-cache-oldest-age` | | An observable gauge representing the age of the oldest cached message. | +| `orleans-streams-queue-cache-pressure` | | An observable gauge representing the pressure on the stream queue cache. | +| `orleans-streams-queue-cache-under-pressure` | | An observable gauge representing whether the stream queue cache is under pressure. | +| `orleans-streams-queue-cache-pressure-contribution-count` | | An observable counter representing the stream queue cache pressure contributions. | + +#### Transactions + +The following table represents a collection of transaction meters that are used to monitor the Orleans transaction layer. + +| Meter name | Type | Description | +|--|--|--| +| `orleans-transactions-started` | | An observable counter representing the number of started transactions. | +| `orleans-transactions-successful` | | An observable counter representing the number of successful transactions. | +| `orleans-transactions-failed` | | An observable counter representing the number of failed transactions. | +| `orleans-transactions-throttled` | | An observable counter representing the number of throttled transactions. | + ### Prometheus -There are various third-party metrics providers that you can use with Orleans. One popular example is [Prometheus](https://prometheus.io), which can be used to collect metrics from your app in conjunction with OpenTelemetry. 
+There are various third-party metrics providers that you can use with Orleans. One popular example is [Prometheus](https://prometheus.io), which can be used to collect metrics from your app with OpenTelemetry. To use OpenTelemetry and Prometheus with Orleans, call the following `IServiceCollection` extension method: @@ -113,10 +349,10 @@ app.Run(); Distributed tracing is a set of tools and practices to monitor and troubleshoot distributed applications. Distributed tracing is a key component of observability, and it's a critical tool for developers to understand the behavior of their apps. Orleans also supports distributed tracing with [OpenTelemetry](https://opentelemetry.io). -Regardless of the distributed tracing exporter you choose, you'll call: +Regardless of the distributed tracing exporter you choose, you call: -- : To enable distributed tracing for the silo. -- : To enable distributed tracing for the client. +- : which enables distributed tracing for the silo. +- : which enables distributed tracing for the client. Referring back to the [Orleans GPS Tracker sample app](/samples/dotnet/samples/orleans-gps-device-tracker-sample), you can use the [Zipkin](https://zipkin.io) distributed tracing system to monitor the app by updating the _Program.cs_. To use OpenTelemetry and Zipkin with Orleans, call the following `IServiceCollection` extension method: @@ -156,11 +392,11 @@ For more information, see [Distributed tracing](../../../core/diagnostics/distri Orleans outputs its runtime statistics and metrics through the interface. The application can register one or more telemetry consumers for their silos and clients, to receive statistics and metrics that the Orleans runtime periodically publishes. These can be consumers for popular telemetry analytics solutions or custom ones for any other destination and purpose. Three telemetry consumers are currently included in the Orleans codebase. 
-They are released as separate NuGet packages: +They're released as separate NuGet packages: - `Microsoft.Orleans.OrleansTelemetryConsumers.AI` for publishing to [Azure Application Insights](/azure/azure-monitor/app/app-insights-overview). -- `Microsoft.Orleans.OrleansTelemetryConsumers.Counters` for publishing to Windows performance counters. The Orleans runtime continually updates a number of them. The _CounterControl.exe_ tool, included in the [`Microsoft.Orleans.CounterControl`](https://www.nuget.org/packages/Microsoft.Orleans.CounterControl/) NuGet package, helps register necessary performance counter categories. It has to run with elevated privileges. The performance counters can be monitored using any of the standard monitoring tools. +- `Microsoft.Orleans.OrleansTelemetryConsumers.Counters` for publishing to Windows performance counters. The Orleans runtime continually updates many of them. The _CounterControl.exe_ tool, included in the [`Microsoft.Orleans.CounterControl`](https://www.nuget.org/packages/Microsoft.Orleans.CounterControl/) NuGet package, helps register necessary performance counter categories. It has to run with elevated privileges. The performance counters can be monitored using any of the standard monitoring tools. - `Microsoft.Orleans.OrleansTelemetryConsumers.NewRelic`, for publishing to [New Relic](https://newrelic.com/).