## Chapter 25: Monitoring and Observability

Once your application is deployed to production, you need to understand what it’s doing. Is it healthy? Are there errors? Is performance degrading? These questions are answered by **observability**: the ability to infer the internal state of a system by examining its outputs. In modern cloud‑native applications, observability is built on three pillars: **logs**, **metrics**, and **traces**. ASP.NET Core provides robust foundations for all three, and with tools like Serilog, OpenTelemetry, and health checks, you can gain deep insight into your application’s behavior.

In this chapter, you’ll learn how to implement structured logging, collect and export metrics, instrument distributed tracing, and set up health checks for production readiness. By the end, you’ll be able to build observable applications that can be effectively monitored and debugged.

### 25.1 The Three Pillars of Observability

- **Logs**: Discrete events recorded by the application. They are often the first place developers look when something goes wrong. Structuring them as key‑value pairs makes them searchable and analyzable.
- **Metrics**: Aggregated numerical measurements over time (e.g., request rate, error rate, latency). Metrics help you spot trends and set alerts.
- **Traces**: Represent the path of a single request as it travels through distributed services. Traces show you where time is spent and where errors occur.

Together, these three pillars give you a complete picture of your system’s health and performance.

### 25.2 Application Performance Monitoring (APM) Concepts

**APM** tools (like Application Insights, New Relic, Datadog) combine logs, metrics, and traces into a unified view. They often provide automatic instrumentation for frameworks like ASP.NET Core, so you can get started with minimal code changes.

In this chapter, we’ll focus on building our own observability stack using open standards (OpenTelemetry) and popular open‑source tools (Prometheus, Jaeger, Grafana). However, the concepts apply equally to commercial APM solutions.

### 25.3 Structured Logging with Serilog

We introduced logging in Chapter 11, but let’s revisit it in the context of observability. **Structured logging** means that log events are not just text; they carry named properties that can be queried. Serilog is one of the most widely used structured logging libraries for .NET.

#### Setting Up Serilog

Add the packages:

```bash
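dotnet add package Serilog.AspNetCore
dotnet add package Serilog.Sinks.Console
dotnet add package Serilog.Sinks.File
dotnet add package Serilog.Sinks.Seq                # optional, for a structured log server
dotnet add package Serilog.Enrichers.Environment   # provides Enrich.WithMachineName()
dotnet add package Serilog.Enrichers.Thread        # provides Enrich.WithThreadId()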
```

In `Program.cs`, configure Serilog before building the host:

```csharp
using Serilog;

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .WriteTo.Console(outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] {Message:lj} {Properties:j}{NewLine}{Exception}")
    .WriteTo.File("logs/myapp.txt", rollingInterval: RollingInterval.Day)
    .WriteTo.Seq("http://localhost:5341")
    .Enrich.WithMachineName()
    .Enrich.WithThreadId()
    .CreateLogger();

try
{
    Log.Information("Starting web application");
    var builder = WebApplication.CreateBuilder(args);
    builder.Host.UseSerilog(); // Replace default ILoggerFactory with Serilog
    // ... rest of builder
    var app = builder.Build();
    app.Run();
}
catch (Exception ex)
{
    Log.Fatal(ex, "Application start-up failed");
}
finally
{
    Log.CloseAndFlush();
}
```

Now, any `ILogger<T>` injected into your classes will write through Serilog, capturing structured properties.

#### Using Structured Logging

```csharp
public class ProductsController : ControllerBase
{
    private readonly ILogger<ProductsController> _logger;

    public ProductsController(ILogger<ProductsController> logger)
    {
        _logger = logger;
    }

    [HttpGet("{id}")]
    public async Task<ActionResult<Product>> GetProduct(int id)
    {
        _logger.LogInformation("Fetching product {ProductId}", id);
        // ...
    }
}
```

The `{ProductId}` placeholder is captured as a property named `ProductId`. In Seq or Elasticsearch, you can search for `ProductId = 123`.
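To see what "captured as a property" means in practice, the following sketch uses a hypothetical `CapturingLogger` (an illustration only, not part of Serilog or ASP.NET Core) that records the key‑value state attached to each event. It assumes the `Microsoft.Extensions.Logging` abstractions that ship with ASP.NET Core, and it contrasts a message template with string interpolation, which loses the property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.Logging;

// Hypothetical helper: records the key-value state attached to each log event.
public sealed class CapturingLogger : ILogger
{
    public List<Dictionary<string, object?>> Events { get; } = new();

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        // Message-template logging passes its properties as a sequence of key-value pairs.
        if (state is IEnumerable<KeyValuePair<string, object?>> properties)
        {
            Events.Add(properties.ToDictionary(p => p.Key, p => p.Value));
        }
    }
}

public static class StructuredLoggingDemo
{
    public static CapturingLogger Run()
    {
        var logger = new CapturingLogger();
        int id = 123;

        // Message template: ProductId is captured as a separate, queryable property.
        logger.LogInformation("Fetching product {ProductId}", id);

        // String interpolation: the value is baked into the text, so no property is captured.
        logger.LogInformation($"Fetching product {id}");

        return logger;
    }
}
```

After `StructuredLoggingDemo.Run()`, the first captured event contains a `ProductId` entry alongside the original template, while the second contains only the already formatted text. This is the reason to pass message templates, rather than interpolated strings, to `ILogger`.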

#### Benefits

- **Searchability**: You can filter logs by any property.
- **Context**: You can enrich logs with global properties (machine name, version, etc.).
- **Integration**: Structured logs can be consumed by log aggregators and correlated with traces.

### 25.4 Introduction to OpenTelemetry (Traces, Metrics, Logs)

**OpenTelemetry** (OTel) is a vendor‑neutral standard for collecting telemetry data. It provides APIs and SDKs to generate traces, metrics, and logs, and then export them to various backends. Microsoft has adopted OpenTelemetry as the recommended approach for .NET observability.

#### Key Concepts

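- **Tracer**: Creates spans representing units of work. In .NET, this role is played by `ActivitySource`, and each span is an `Activity`.
- **Meter**: Creates instruments for recording measurements (counters, histograms, etc.). In .NET, this is `System.Diagnostics.Metrics.Meter`.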
- **Exporter**: Sends telemetry to a backend (e.g., Jaeger, Prometheus, Azure Monitor).

#### Setting Up OpenTelemetry in ASP.NET Core

Add the packages:

```bash
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.Console                 # for debugging
dotnet add package OpenTelemetry.Exporter.Jaeger                  # for traces
dotnet add package OpenTelemetry.Exporter.Prometheus.AspNetCore   # for metrics (add --prerelease if no stable version is listed)
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol   # for generic OTLP export
```

In `Program.cs`:

```csharp
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Configure OpenTelemetry
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(
        serviceName: builder.Environment.ApplicationName,
        serviceVersion: "1.0.0"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddConsoleExporter() // for development; remove in production
        .AddJaegerExporter(options =>
        {
            options.AgentHost = "localhost"; // or your Jaeger agent
            options.AgentPort = 6831;
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddPrometheusExporter());

builder.Services.AddControllers();
// ...
```

Now, your application automatically generates spans for incoming HTTP requests and outgoing HTTP calls, as well as metrics like request duration and count.
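Beyond these built‑in metrics, application‑specific measurements use the Meter concept from the list above. The sketch below relies only on `System.Diagnostics.Metrics` from the base class library; the meter name `MyApp.Orders`, the instrument names, and the `OrderMetrics` class are hypothetical. A `MeterListener` stands in for a real exporter so the example can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class OrderMetrics
{
    // Hypothetical meter and instrument names, chosen for illustration only.
    public const string MeterName = "MyApp.Orders";

    private static readonly Meter Meter = new(MeterName, "1.0.0");

    private static readonly Counter<long> OrdersPlaced =
        Meter.CreateCounter<long>("orders_placed", unit: "{order}", description: "Number of orders placed");

    private static readonly Histogram<double> OrderValue =
        Meter.CreateHistogram<double>("order_value", unit: "USD", description: "Value of each order");

    public static void RecordOrder(double value, string paymentMethod)
    {
        // Tags become dimensions (labels) on the exported metric.
        var tag = new KeyValuePair<string, object?>("payment.method", paymentMethod);
        OrdersPlaced.Add(1, tag);
        OrderValue.Record(value, tag);
    }
}

public static class OrderMetricsDemo
{
    // Counts orders in-process with a MeterListener, standing in for a real exporter.
    public static long Run()
    {
        long total = 0;

        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == OrderMetrics.MeterName)
            {
                l.EnableMeasurementEvents(instrument);
            }
        };
        listener.SetMeasurementEventCallback<long>((instrument, measurement, tags, state) =>
        {
            if (instrument.Name == "orders_placed")
            {
                total += measurement;
            }
        });
        listener.Start();

        OrderMetrics.RecordOrder(19.99, "card");
        OrderMetrics.RecordOrder(5.00, "voucher");

        return total;
    }
}
```

In the application itself, the listener is not needed. Registering the meter with `.AddMeter(OrderMetrics.MeterName)` inside `WithMetrics` should be enough for the OpenTelemetry SDK to collect and export these instruments.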

#### Adding Manual Instrumentation

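You can create custom spans to measure specific operations. Register the source name with the tracer by adding `.AddSource("MyApp.ProductService")` inside `WithTracing`; otherwise `StartActivity` returns `null` and the span is never exported.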

```csharp
using System.Diagnostics;

public class ProductService
{
    private static readonly ActivitySource ActivitySource = new("MyApp.ProductService");

    public async Task<Product> GetProductAsync(int id)
    {
        using var activity = ActivitySource.StartActivity("GetProduct");
        activity?.SetTag("product.id", id);

        // ... your logic

        return product;
    }
}
```
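The null‑conditional calls (`activity?.SetTag`) in the example above matter because an `ActivitySource` only creates an `Activity` when something is listening to it. The following sketch, which uses only `System.Diagnostics` and a hypothetical source name, shows this behavior with a hand‑written `ActivityListener`. It assumes no other listener in the process subscribes to the same source.

```csharp
using System.Diagnostics;

public static class TracingListenerDemo
{
    // Hypothetical source name, used only in this sketch.
    private static readonly ActivitySource Source = new("MyApp.ListenerDemo");

    public static (bool NullWithoutListener, string? ObservedName) Run()
    {
        // Nothing is listening yet, so no Activity is created.
        bool nullWithoutListener;
        using (var unobserved = Source.StartActivity("Unobserved"))
        {
            nullWithoutListener = unobserved is null;
        }

        // Subscribing to the source by name is roughly what the SDK does for AddSource.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "MyApp.ListenerDemo",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded
        };
        ActivitySource.AddActivityListener(listener);

        // With a listener attached, StartActivity returns a real Activity.
        using var observed = Source.StartActivity("Observed");
        observed?.SetTag("product.id", 42);

        return (nullWithoutListener, observed?.OperationName);
    }
}
```

In an ASP.NET Core application you would not write the listener yourself. Passing the source name to `AddSource` in the tracing configuration plays the same role.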

#### Viewing Traces

If you run Jaeger locally (using Docker: `docker run -d --name jaeger -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one`), you can view traces at `http://localhost:16686`.

### 25.5 Health Checks Middleware (`/health` endpoints)

We touched on health checks in Chapter 11. Now let’s dive deeper. Health checks are crucial for orchestration systems like Kubernetes, which use them to decide when to restart a container or route traffic.

#### Basic Health Check Setup

```csharp
builder.Services.AddHealthChecks();

app.MapHealthChecks("/health"); // simple health endpoint
```

By default, this returns the overall status as plain text: `Healthy` or `Degraded` with 200 OK, or `Unhealthy` with 503 Service Unavailable.
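Plain text is enough for a load balancer, but a dashboard or an on‑call engineer often wants to know which check failed. One possible approach is a custom response writer. The sketch below uses a hypothetical `HealthReportJson` helper and an ad‑hoc JSON shape; neither is a built‑in format.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical helper: the JSON shape below is an assumption, not a standard format.
public static class HealthReportJson
{
    // Builds a compact JSON document from the aggregated report.
    public static string Serialize(HealthReport report) =>
        JsonSerializer.Serialize(new
        {
            status = report.Status.ToString(),
            checks = report.Entries.Select(entry => new
            {
                name = entry.Key,
                status = entry.Value.Status.ToString(),
                description = entry.Value.Description,
                durationMs = entry.Value.Duration.TotalMilliseconds
            })
        });

    // Matches the delegate signature expected by HealthCheckOptions.ResponseWriter.
    public static Task WriteAsync(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";
        return context.Response.WriteAsync(Serialize(report));
    }
}
```

To use it, pass the method when mapping the endpoint: `app.MapHealthChecks("/health", new HealthCheckOptions { ResponseWriter = HealthReportJson.WriteAsync });`. Because the output names individual dependencies, consider exposing a detailed endpoint like this only on an internal network.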

#### Adding Database and Dependent Service Checks

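You can add checks for specific dependencies. `AddDbContextCheck` comes from the `Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore` package, and `AddUrlGroup` from the community package `AspNetCore.HealthChecks.Uris`: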

```csharp
builder.Services.AddHealthChecks()
    .AddDbContextCheck<AppDbContext>() // checks database connectivity
    .AddUrlGroup(new Uri("https://api.example.com"), "External API");
```

#### Custom Health Checks

Create a class implementing `IHealthCheck`.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class MemoryHealthCheck : IHealthCheck
{
    private readonly long _threshold = 1024L * 1024 * 1024; // 1 GB

    public Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Managed heap size only; native allocations are not included.
        var memory = GC.GetTotalMemory(forceFullCollection: false);
        if (memory < _threshold)
        {
            return Task.FromResult(HealthCheckResult.Healthy($"Memory usage is okay: {memory / 1024 / 1024} MB"));
        }
        return Task.FromResult(HealthCheckResult.Unhealthy($"Memory usage exceeds threshold: {memory / 1024 / 1024} MB"));
    }
}
```

Register it:

```csharp
builder.Services.AddHealthChecks()
    .AddCheck<MemoryHealthCheck>("memory");
```

#### Readiness and Liveness Probes in Kubernetes

Kubernetes distinguishes between **liveness** (is the app alive? if not, restart) and **readiness** (is the app ready to serve traffic?). You can expose different health endpoints for each.

```csharp
// Requires: using Microsoft.AspNetCore.Diagnostics.HealthChecks;
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false // exclude all checks; just returns 200 if process is alive
});
```

Tag your checks accordingly:

```csharp
.AddDbContextCheck<AppDbContext>(tags: new[] { "ready" })
.AddCheck<MemoryHealthCheck>("memory", tags: new[] { "ready" });
```
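Readiness often depends on more than external dependencies: an application may need to warm caches or finish start‑up work before it should receive traffic. A minimal sketch of such a gate follows. The `StartupHealthCheck` class and its `MarkReady` method are hypothetical names introduced for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical readiness gate: unhealthy until the application signals that warm-up has finished.
public sealed class StartupHealthCheck : IHealthCheck
{
    private volatile bool _isReady;

    // Call once start-up work (cache warm-up, migrations, and so on) has completed.
    public void MarkReady() => _isReady = true;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        var result = _isReady
            ? HealthCheckResult.Healthy("Startup tasks completed.")
            : HealthCheckResult.Unhealthy("Startup tasks are still running.");
        return Task.FromResult(result);
    }
}
```

For the flag to be shared, the check has to be a single instance: register it with `builder.Services.AddSingleton<StartupHealthCheck>()`, add it with `.AddCheck<StartupHealthCheck>("startup", tags: new[] { "ready" })`, and call `MarkReady()` from whichever component finishes the start‑up work. Until then, `/health/ready` reports 503 and Kubernetes keeps the pod out of rotation without restarting it.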

### 25.6 Exporting Telemetry to Backends

#### Prometheus for Metrics

We already added the Prometheus exporter. To expose metrics for Prometheus scraping, add:

```csharp
app.UseOpenTelemetryPrometheusScrapingEndpoint();
```

This creates an endpoint `/metrics` (configurable) that Prometheus can scrape.

#### Jaeger for Traces

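We configured the Jaeger exporter above. Ensure your Jaeger agent is reachable from the application on UDP port 6831. Note that the `OpenTelemetry.Exporter.Jaeger` package has been deprecated upstream: recent Jaeger versions can ingest OTLP directly, so new projects can send traces with the OTLP exporter described later in this section.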

#### Azure Monitor / Application Insights

You can use the OpenTelemetry exporter for Azure Monitor, or use the Application Insights SDK directly. For OpenTelemetry:

```bash
dotnet add package Azure.Monitor.OpenTelemetry.Exporter
```

Then configure:

```csharp
.AddAzureMonitorTraceExporter(options => options.ConnectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"]);
```

#### OTLP Exporter

If you have an OTLP‑compatible endpoint (e.g., the OpenTelemetry Collector, Grafana Tempo, or Honeycomb), use the OTLP exporter:

```bash
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
```

```csharp
.AddOtlpExporter(options => options.Endpoint = new Uri("http://otel-collector:4317"));
```

### 25.7 Building a Simple Observability Dashboard with Grafana

Once you have Prometheus collecting metrics and Jaeger collecting traces, you can visualize them in Grafana.

1. Run Grafana locally with Docker:

   ```bash
   docker run -d --name grafana -p 3000:3000 grafana/grafana
   ```

2. Add Prometheus as a data source (URL: `http://host.docker.internal:9090` if Prometheus runs on the host machine).
3. Import a dashboard for ASP.NET Core (e.g., from Grafana.com) or create your own to display request rates, error rates, and latency.

### 25.8 Best Practices

- **Use consistent naming** for services and spans.
- **Add meaningful attributes** to spans (e.g., user ID, product ID) for correlation.
- **Sample traces** in production to reduce overhead (OpenTelemetry supports head‑based sampling).
- **Set up alerts** on key metrics (e.g., error rate > 1%, p95 latency > 500ms).
- **Correlate logs, metrics, and traces** using common identifiers like `traceId`. In .NET, the current trace ID is available from `Activity.Current`, so logging frameworks can attach it to every log event.
- **Store logs and traces with retention policies** appropriate for your needs.
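The correlation practice above can be sketched with the base class library alone. The example assumes .NET 5 or later, where activities use W3C trace IDs by default; `TraceCorrelation` is a hypothetical helper, and in a real application a logging enricher or logger scope would normally do this work for you.

```csharp
using System.Diagnostics;

public static class TraceCorrelation
{
    // Appends the current trace and span IDs, if any, to a log message.
    public static string WithTraceContext(string message)
    {
        var activity = Activity.Current;
        return activity is null
            ? message
            : $"{message} trace_id={activity.TraceId} span_id={activity.SpanId}";
    }
}

public static class TraceCorrelationDemo
{
    public static (string TraceId, string Line) Run()
    {
        // Starting an Activity by hand stands in for the span ASP.NET Core creates per request.
        using var activity = new Activity("Checkout").Start();

        var line = TraceCorrelation.WithTraceContext("Payment authorized");
        return (activity.TraceId.ToString(), line);
    }
}
```

With the same `trace_id` present in both the log event and the exported span, you can jump from a log line in Seq to the matching trace in Jaeger, and back.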

### 25.9 Putting It All Together: An Observable ASP.NET Core App

Let’s create a minimal observable application that combines structured logging, distributed tracing, metrics, and health checks.

**Program.cs** (excerpt):

```csharp
using Serilog;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Serilog setup
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Debug()
    .WriteTo.Console()
    .WriteTo.Seq("http://localhost:5341")
    .Enrich.WithProperty("Application", "MyApp")
    .CreateLogger();

try
{
    var builder = WebApplication.CreateBuilder(args);
    builder.Host.UseSerilog();

    // OpenTelemetry
    builder.Services.AddOpenTelemetry()
        .ConfigureResource(resource => resource.AddService(
            serviceName: "MyApp",
            serviceVersion: "1.0.0"))
        .WithTracing(tracing => tracing
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddJaegerExporter())
        .WithMetrics(metrics => metrics
            .AddAspNetCoreInstrumentation()
            .AddPrometheusExporter());

    // Health checks
    builder.Services.AddHealthChecks()
        .AddDbContextCheck<AppDbContext>();

    builder.Services.AddControllers();

    var app = builder.Build();

    // WebApplication sets up routing itself, so endpoints can be mapped directly.
    app.MapControllers();
    app.MapHealthChecks("/health");
    app.MapPrometheusScrapingEndpoint(); // /metrics

    app.Run();
}
catch (Exception ex)
{
    Log.Fatal(ex, "Application failed to start");
}
finally
{
    Log.CloseAndFlush();
}
```

Now, when you run the app:
- Logs go to the console and Seq (`http://localhost:5341`).
- Traces go to Jaeger (UI at `http://localhost:16686`).
- Metrics are exposed at `/metrics` for Prometheus.
- Health checks are served at `/health`.

### Summary

In this chapter, you’ve learned the essentials of making your ASP.NET Core applications observable:

- The **three pillars** of observability: logs, metrics, and traces.
- **Structured logging** with Serilog for searchable, contextual logs.
- **OpenTelemetry** for automatic and manual instrumentation of traces and metrics.
- **Health checks** for Kubernetes readiness and liveness probes.
- Exporting telemetry to backends like Jaeger, Prometheus, and Azure Monitor.

With these tools, you can gain deep insight into your application’s behavior, detect issues early, and maintain high reliability in production.

**Exercise:**

1. Add Serilog to your existing project and configure it to write to the console and a file. Include additional enrichment (e.g., machine name, process ID).
2. Set up OpenTelemetry with tracing and metrics. Run Jaeger locally and verify that traces appear after making requests.
3. Add a custom health check that verifies the availability of an external API you depend on.
4. (Optional) Install Prometheus and Grafana locally, configure Prometheus to scrape your app’s `/metrics` endpoint, and create a simple dashboard showing request rate and error rate.

In the next and final chapter, **"Conclusion and Next Steps,"** we’ll recap the journey, discuss how to stay up‑to‑date with the .NET ecosystem, and suggest further learning paths, including microservices, Blazor, and MAUI. You’ll also find a glossary and appendices with common interview questions and CLI cheat sheets.