Metrics to Monitor Microservices with OpenTelemetry

Ebubekir Dinç edited this page Jan 1, 2024 · 1 revision

Metrics are essential for monitoring, controlling, and optimizing the scalability, performance,
and reliability of a system built on a microservices architecture. They show how well the system as a whole and each
individual microservice are performing, covering resource usage, throughput, and response times. By examining these
indicators, you can find bottlenecks and improve the efficiency of particular services. Metrics also describe the availability
and health of every microservice: tracking error rates and service uptime helps detect problems early and
confirm that the system stays responsive and available.

Metrics can also tell us how many calls an endpoint has received, how many messages are waiting in a queue,
what the current stock level is and how it has changed over time, how long an endpoint takes to execute, or which
methods run for more than 500 ms. These are all data that we might want to track in a distributed system.

In a microservices design, metrics work in tandem with logging and tracing. While logs provide detailed information
about specific events, metrics offer aggregated and summarized data that can be used for trend analysis and
high-level monitoring.


In our project, SuuCat, metrics are implemented using OpenTelemetry together with Prometheus. OpenTelemetry Metrics
promotes consistency and interoperability in the observability space by offering an extensible
and standardized method for instrumenting, gathering, and exporting metric data from applications.
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability
in modern, dynamic microservices architectures. Both have NuGet packages for .NET and are easy to set up.

Prometheus can be installed using the following Docker Compose files.
More information about the installation is here: https://github.com/ebubekirdinc/SuuCat/wiki/GettingStarted

metrics_docker.png
https://github.com/ebubekirdinc/SuuCat/blob/master/docker-compose.yml

metrics_docker_override.png
https://github.com/ebubekirdinc/SuuCat/blob/master/docker-compose.override.yml

In the prometheus.yml file above, we define the set of instructions that Prometheus follows to collect metrics
data from certain microservices. The scrape_configs section contains a list of jobs, each with a unique job_name.
A job represents a collection of similar instances of a service that Prometheus scrapes. The scrape_interval
parameter defines how often Prometheus should scrape metrics from the targets. Here, the scrape_interval is set to 2s,
meaning Prometheus will scrape metrics from these targets every 2 seconds.
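
As a rough illustration of the structure described above, a prometheus.yml might look like the sketch below. The job names and target addresses are hypothetical placeholders, not the values used in SuuCat; only scrape_configs, job_name, and the 2s scrape_interval come from the description above.

```yaml
# Sketch of a prometheus.yml; job names and targets are hypothetical.
global:
  scrape_interval: 2s          # scrape every target every 2 seconds

scrape_configs:
  - job_name: "identity"       # one job per microservice (assumed layout)
    static_configs:
      - targets: ["identity-host:80"]   # hypothetical host:port of the service
  - job_name: "order"
    static_configs:
      - targets: ["order-host:80"]      # hypothetical host:port of the service
```

Prometheus requests the /metrics path on each target by default, so no metrics_path is needed when the service exposes its metrics there.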

For metrics, we will use the same common project that we use for tracing. We need to add the
“OpenTelemetry.Exporter.Prometheus.AspNetCore” package to that project.

metrics_addopentelemetrymetrics.png
https://github.com/ebubekirdinc/SuuCat/blob/master/src/BuildingBlocks/Tracing/OpenTelemetryExtensions.cs

As the image above shows, the extension method AddOpenTelemetryMetrics() configures
OpenTelemetry metrics for our application. It adds OpenTelemetry to the service collection, configures it
with the metrics options, and registers a Prometheus exporter, which exposes the metrics data for Prometheus to scrape.
The resource is set with the service name and version taken from the OpenTelemetry parameters, which are defined in
the appsettings.json file of each microservice.
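
For readers who cannot view the image, the following is a minimal sketch of what such an extension method could look like. It is not the SuuCat implementation. It assumes the OpenTelemetry.Extensions.Hosting and OpenTelemetry.Exporter.Prometheus.AspNetCore packages are referenced, and the OpenTelemetryParametersSketch record and the method name AddOpenTelemetryMetricsSketch are hypothetical stand-ins.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

// Hypothetical stand-in for the values read from appsettings.json.
public sealed record OpenTelemetryParametersSketch(string ServiceName, string ServiceVersion);

public static class OpenTelemetryMetricsExtensionsSketch
{
    // Registers OpenTelemetry metrics with a Prometheus exporter (sketch, not the project's code).
    public static IServiceCollection AddOpenTelemetryMetricsSketch(
        this IServiceCollection services,
        OpenTelemetryParametersSketch parameters,
        params string[] meterNames)
    {
        services.AddOpenTelemetry().WithMetrics(metrics =>
        {
            // Attach the service name and version to every exported metric.
            metrics.SetResourceBuilder(ResourceBuilder.CreateDefault()
                .AddService(serviceName: parameters.ServiceName,
                            serviceVersion: parameters.ServiceVersion));

            // Only meters registered by name are collected.
            metrics.AddMeter(meterNames);

            // Make the collected metrics available in the Prometheus text format.
            metrics.AddPrometheusExporter();
        });

        return services;
    }
}
```

With this exporter package, the application also has to map the scraping endpoint in its request pipeline, for example with app.UseOpenTelemetryPrometheusScrapingEndpoint(), so that Prometheus has a URL to pull from.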

When we run the project, the microservices to which we have added metrics should appear in green (up) on the Prometheus Targets screen, as shown below.

metrics_prometheus_main.png
Prometheus Targets screen

Now let’s look at where the metric instruments are defined. To define and manage metrics across the microservices,
the OpenTelemetryMetric class is created. This class contains several static Meter objects, each representing a different
microservice in the system: IdentityMeter, OrderMeter, and StockMeter. These Meter objects are used to create
different types of metrics.

metrics_opentelemetrymetric.png
https://github.com/ebubekirdinc/SuuCat/blob/master/src/BuildingBlocks/Tracing/OpenTelemetryMetric.cs
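
A class of this shape can be sketched with the System.Diagnostics.Metrics types that ship with .NET. The meter names, versions, and instrument name below are assumptions (the instrument name is inferred from the series name shown later in Prometheus), and the class is deliberately named OpenTelemetryMetricSketch to mark it as an illustration rather than the project's code.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Sketch only: meter names, versions, and instrument names are assumptions.
public static class OpenTelemetryMetricSketch
{
    // One Meter per microservice; a Meter is the factory for that service's instruments.
    public static readonly Meter IdentityMeter = new("Sketch.Identity", "1.0.0");
    public static readonly Meter OrderMeter = new("Sketch.Order", "1.0.0");
    public static readonly Meter StockMeter = new("Sketch.Stock", "1.0.0");

    // Counter: a value that only ever increases.
    public static readonly Counter<int> UserCreatedEventCounter =
        IdentityMeter.CreateCounter<int>(
            "user.created.event.count",
            description: "Number of user-created events");

    // Records one user-created event with an "event.name" tag.
    public static void RecordUserCreated() =>
        UserCreatedEventCounter.Add(
            1, new KeyValuePair<string, object?>("event.name", "UserCreatedEvent"));
}
```

A meter defined this way is only exported if its name is also registered with the metrics pipeline (AddMeter in OpenTelemetry for .NET); otherwise its measurements are silently dropped.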

Next, we will see how each type is defined and used, with example screenshots from Prometheus.

For instance, IdentityMeter is the meter of the Identity microservice, and UserCreatedEventCounter is
a Counter metric that tracks the number of user-created events in that microservice. It is incremented
in the AuthController like this:

 OpenTelemetryMetric.UserCreatedEventCounter.Add(1, new KeyValuePair<string, object>("event.name", "UserCreatedEvent"));

To see this in Prometheus, go to the Prometheus home page, enter “user” in the search box, and
“user_created_event_count_total” will appear among the options. Select it and press Execute, and you will see a
screen similar to the one below.

metrics_Prometheus_Up_Counter.png
Prometheus Up Counter

Of course, you will need to make some requests to the SignUp endpoint in Swagger to generate enough data.

metrics_swagger.png
Identity microservice Swagger

With this type of counter, created by CreateCounter(), the value only increases. If we need a counter that can both
increase and decrease, we can use CreateUpDownCounter(), as in StockMeter.CreateUpDownCounter().

After defining the UpDownCounter in OpenTelemetryMetric, we add a line like the following to AddStockCommandHandler in
the Subscription microservice to increase the stock.

createOrderCommandHandler_up_down_counter.png
https://github.com/ebubekirdinc/SuuCat/blob/master/src/Services/Subscription/src/Application/Stock/Commands/AddStock/AddStockCommandHandler.cs

We also add a line to OrderCreatedEventConsumer to reduce the stock each time the event is consumed.

orderCreatedEventConsumer_down_counter.png
https://github.com/ebubekirdinc/SuuCat/blob/master/src/Services/Subscription/src/Infrastructure/Consumers/Events/OrderCreatedEventConsumer.cs
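
The two calls can be sketched as follows: one adds a positive delta when stock is added, the other adds a negative delta when an order consumes stock. This is an illustration under assumptions, not the code in the linked files. The meter name, instrument name, and method names are hypothetical, and UpDownCounter requires .NET 7 or later. A MeterListener is used only to make the effect visible without running Prometheus.

```csharp
using System;
using System.Diagnostics.Metrics;

// Sketch only: meter, instrument, and method names are assumptions.
public static class StockMetricsSketch
{
    public static readonly Meter StockMeter = new("Sketch.Stock", "1.0.0");

    // UpDownCounter: accepts both positive and negative deltas.
    public static readonly UpDownCounter<int> StockCounter =
        StockMeter.CreateUpDownCounter<int>(
            "subscription.stock.count",
            description: "Units currently in stock");

    // Called where stock is added (for example, in an add-stock handler).
    public static void OnStockAdded(int quantity) => StockCounter.Add(quantity);

    // Called where an order consumes stock (for example, in an event consumer).
    public static void OnOrderCreated(int quantity) => StockCounter.Add(-quantity);

    // Replays a few changes and sums the raw measurements with a MeterListener.
    public static int ReplayAndSum()
    {
        var total = 0;
        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            // Subscribe only to instruments created by this sketch's meter.
            if (ReferenceEquals(instrument.Meter, StockMeter))
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<int>(
            (instrument, value, tags, state) => total += value);
        listener.Start();

        OnStockAdded(10);   // +10
        OnOrderCreated(3);  // -3
        OnOrderCreated(2);  // -2
        return total;       // 10 - 3 - 2
    }
}
```

Because the instrument reports deltas, the series seen by the monitoring backend is the running sum of all additions and subtractions, which is why the graph can move in both directions.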

To see the generated data, type "stock" in the search box in Prometheus, select "subscription_stock_count"
from the list that appears, and click Execute. You will see a screen similar to the one below. Again, remember to
generate some data first. Here the value not only increases but also decreases.

metrics_Prometheus_UpDown_Counter.png
Prometheus UpDown Counter

Now let’s look at a third type, the Histogram. In OpenTelemetry, a histogram is an instrument that records the
statistical distribution of a set of values over a period of time. Unlike a counter, which only accumulates a
running total, a histogram sorts each recorded value into a range (a bucket) and tracks how many values fell into
each one. It provides insight into how values are distributed across a given range.

In our case, it is used to measure the duration of a method, stored in milliseconds.
As the code below shows, the histogram data is collected with Record(). The same code also increments a regular
counter (OrderLongRunningRequestCounter) for long-running methods.

performanceBehaviour_histogram.png
https://github.com/ebubekirdinc/SuuCat/blob/master/src/Services/Order/src/Application/Common/Behaviours/PerformanceBehaviour.cs
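
The idea can be sketched as a small timing wrapper: measure the elapsed time, record it in the histogram, and increment a counter when the call is slow. This is not the PerformanceBehaviour class from the project. The meter and instrument names, the "request.name" tag, and the 500 ms threshold (borrowed from the example at the start of this page) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Sketch only: names, tag key, and threshold are assumptions.
public static class OrderTimingSketch
{
    public static readonly Meter OrderMeter = new("Sketch.Order", "1.0.0");

    // Histogram of method durations, in milliseconds.
    public static readonly Histogram<long> MethodDuration =
        OrderMeter.CreateHistogram<long>("order.method.duration", unit: "milliseconds");

    // Plain counter for calls that exceed the threshold.
    public static readonly Counter<int> OrderLongRunningRequestCounter =
        OrderMeter.CreateCounter<int>("order.long.running.request.count");

    public const long LongRunningThresholdMs = 500;

    // Runs the handler, then records how long it took (even if it throws).
    public static T Measure<T>(string requestName, Func<T> handler)
    {
        var timer = Stopwatch.StartNew();
        try
        {
            return handler();
        }
        finally
        {
            timer.Stop();
            Record(requestName, timer.ElapsedMilliseconds);
        }
    }

    // Records one duration and flags it as long-running when above the threshold.
    public static void Record(string requestName, long elapsedMs)
    {
        var tag = new KeyValuePair<string, object?>("request.name", requestName);
        MethodDuration.Record(elapsedMs, tag);

        if (elapsedMs > LongRunningThresholdMs)
            OrderLongRunningRequestCounter.Add(1, tag);
    }
}
```

A call such as OrderTimingSketch.Measure("CreateOrder", () => DoWork()) would then contribute one sample to the histogram, where DoWork is any hypothetical method being timed.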

To see the generated data, type “order” in the search box in Prometheus, select “order_method_duration_milliseconds_bucket”
from the list that appears, and click Execute. The data appears divided into buckets, as shown below.

metrics_Prometheus_Histogram_screen.png
Prometheus Histogram screen

To see individual metrics in each bucket you can click the corresponding bucket below the chart.

metrics_Prometheus_Histogram_buckets.png
Prometheus Histogram buckets
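
When reading these series, it helps to know that Prometheus histogram buckets are cumulative: each bucket, identified by its le (less than or equal) label, counts every value up to that bound, and the final +Inf bucket counts all values. The sketch below reproduces that counting in plain C#. The bucket boundaries and sample durations are made up for illustration and are not the boundaries configured in SuuCat.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: illustrates cumulative ("le") bucket counting with made-up bounds.
public static class BucketSketch
{
    // Returns one count per upper bound, plus a final entry for the +Inf bucket.
    public static long[] CumulativeCounts(IEnumerable<double> values, double[] upperBounds)
    {
        var counts = new long[upperBounds.Length + 1];
        foreach (var value in values)
        {
            // A value is counted in every bucket whose bound it does not exceed.
            for (var i = 0; i < upperBounds.Length; i++)
            {
                if (value <= upperBounds[i])
                    counts[i]++;
            }
            counts[^1]++; // the +Inf bucket counts every value
        }
        return counts;
    }
}
```

For example, with bounds 5, 100, 500, and 1000 and durations of 3, 120, and 600 ms, the counts are 1, 1, 2, 3, and 3 for the +Inf bucket. The number of values in a single range is the difference between two adjacent buckets.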


We have seen how to implement metrics in a microservices architecture using OpenTelemetry and Prometheus.
If you want richer visualizations, you can use Grafana, but Prometheus will be enough to get you started.

Related to this topic, see also Distributed Tracing with Jaeger and OpenTelemetry.

More information can be found in the Prometheus docs listed under References.


References

https://prometheus.io/docs/prometheus/latest/getting_started/

https://github.com/prometheus/prometheus

https://opentelemetry.io/docs/instrumentation/net/getting-started/