## Chapter 10: Advanced Service Discovery and Resiliency

In the previous chapters, you’ve seen how Aspire simplifies service discovery and adds resilience to HTTP calls with just a few lines of code. But as your application grows, you may need to go beyond the defaults. Perhaps you need to:

- Expose multiple endpoints for a service (e.g., HTTP and gRPC on different ports).
- Implement a custom load‑balancing strategy.
- Route traffic to different versions of a service for testing.
- Fine‑tune retry policies for specific downstream services.

In this chapter, we’ll explore the advanced capabilities of Aspire’s service discovery and resilience layers. You’ll learn how to configure multiple binding schemes, customize the discovery process, and implement sophisticated resilience patterns with Polly. We’ll also look at traffic splitting techniques for local development. By the end, you’ll be able to build highly resilient and flexible microservices.

---

### 10.1 Manual vs. Automatic Service Discovery

Aspire’s default service discovery is **automatic**: when you add a project resource with a logical name and another resource references it via `WithReference`, the AppHost injects environment variables of the form `services__{name}__{endpoint}__{index}` containing the service’s endpoints. The `AddServiceDiscovery` calls in the service defaults then register the resolver and configure every `HttpClient` created through `IHttpClientFactory` to resolve `http://servicename` URIs from those variables.
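As a rough sketch of that wiring, the service defaults project generated by the Aspire templates typically contains something like the following. It is trimmed to the two calls that matter here and assumes the `Microsoft.Extensions.ServiceDiscovery` and `Microsoft.Extensions.Http.Resilience` packages are referenced; the generated file in your solution may differ by version.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public static class Extensions
{
    // Trimmed sketch: the real template also sets up telemetry and health checks.
    public static IHostApplicationBuilder AddServiceDefaults(this IHostApplicationBuilder builder)
    {
        // Registers the resolver and the default endpoint providers.
        builder.Services.AddServiceDiscovery();

        // Applies to every HttpClient created through IHttpClientFactory.
        builder.Services.ConfigureHttpClientDefaults(http =>
        {
            http.AddStandardResilienceHandler(); // added first, so it is the outer handler
            http.AddServiceDiscovery();          // resolves logical names such as http://apiservice
        });

        return builder;
    }
}
```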

But sometimes you need **manual** control. For example:

- You have an external service not managed by Aspire.
- You want to override the resolved address for testing.
- You need to integrate with an existing service discovery system like Consul or Kubernetes DNS.

#### 10.1.1 Manual Endpoint Specification

You can model a service that Aspire does not manage as a resource with a fixed address. For instance, suppose you have an external API running at `https://api.example.com`. Aspire 9.4 and later provide `AddExternalService` for this:

```csharp
// Models an API that Aspire neither starts nor manages.
// AddExternalService requires Aspire 9.4 or later.
var externalApi = builder.AddExternalService("externalapi", "https://api.example.com");
```

Then you can reference it like any other resource:

```csharp
var apiService = builder.AddProject<Projects.MyAspireApp_ApiService>("apiservice")
    .WithReference(externalApi);
```

Now, inside your service, you can use `https://externalapi` and it will resolve to `https://api.example.com`. The scheme in the URI you use must match a scheme the endpoint was registered with. If a service may offer both, write `https+http://externalapi` to prefer HTTPS and fall back to HTTP.

#### 10.1.2 When to Use Manual Discovery

Manual discovery is useful for:

- Integrating with legacy systems.
- Pointing to cloud services that have fixed endpoints.
- Testing with different environments by changing the endpoint in one place.
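To illustrate the "one place" idea, here is a minimal sketch that overrides an address through configuration in the consuming service. It assumes the default configuration-based provider is active and uses its `Services:{serviceName}:{endpointName}:{index}` key layout; `externalapi` is the resource name from the earlier example.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

// In the consuming service's Program.cs, before builder.Build().
// Sources added later take precedence over earlier ones (including
// environment variables), so this entry overrides any injected value.
builder.Configuration.AddInMemoryCollection(new Dictionary<string, string?>
{
    ["Services:externalapi:https:0"] = "https://api.example.com",
});
```

The same key can live in `appsettings.Development.json` or an environment variable (`Services__externalapi__https__0`) when you prefer not to change code.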

---

### 10.2 Configuring Service Binding Schemes

A single service might need to expose multiple endpoints with different protocols. For example, a gRPC service might listen on a different port than its HTTP endpoints. Aspire allows you to define multiple endpoints per resource.

#### 10.2.1 Adding Multiple Endpoints to a Project

When you add a project with `AddProject`, Aspire automatically creates `http` and `https` endpoints from the project’s `launchSettings.json`. You can add further named endpoints explicitly. Endpoint names must be unique per resource, and because gRPC travels over HTTP/2, a gRPC endpoint uses the `http` or `https` scheme rather than a scheme of its own:

```csharp
var apiService = builder.AddProject<Projects.MyAspireApp_ApiService>("apiservice")
    // "http" and "https" already exist (from launchSettings.json);
    // add a separately named HTTPS endpoint for gRPC traffic.
    .WithHttpsEndpoint(port: 5003, name: "grpc");
```

The `name` parameter is important: it appears in the injected environment variable (`services__apiservice__grpc__0`) and is how client code addresses that specific endpoint.

#### 10.2.2 Using Specific Endpoints in Service Discovery

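When your code uses `http://apiservice`, the resolver picks the endpoints registered under that scheme (the `http` entries). To target a named endpoint instead, prefix the endpoint name with an underscore and place it before the service name in the host:

```csharp
// Named endpoint syntax: scheme://_endpointName.serviceName
// gRPC runs over HTTP/2, so the scheme stays https (or http without TLS).
// In a real service, create the client through IHttpClientFactory so that
// the service discovery handler is attached.
var client = new HttpClient { BaseAddress = new Uri("https://_grpc.apiservice") };
```

The resolver looks up the endpoint named `grpc` for `apiservice`. To express a scheme preference rather than a named endpoint, list the schemes in order, as in `https+http://apiservice`, which uses HTTPS when available and otherwise falls back to HTTP.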

#### 10.2.3 TCP Endpoints

For non‑HTTP protocols (like raw TCP), you can define a `tcp` endpoint. However, service discovery for HTTP clients won’t work with TCP; you’d need to use a different client. Aspire’s discovery is primarily for HTTP, but you can still inject the endpoint via environment variables and use it manually.
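A minimal sketch of that manual approach follows. It assumes the AppHost declared an endpoint named `tcp` on `apiservice`, so the consuming service receives a variable such as `services__apiservice__tcp__0` with a value like `tcp://localhost:6000`; the helper name and the port are hypothetical.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class TcpEndpointHelper
{
    // Splits a value such as "tcp://localhost:6000" into host and port.
    public static (string Host, int Port) Parse(string value)
    {
        var uri = new Uri(value);
        return (uri.Host, uri.Port);
    }

    // Opens a raw TCP connection to the injected endpoint.
    public static async Task<TcpClient> ConnectAsync(string value)
    {
        var (host, port) = Parse(value);
        var client = new TcpClient();
        await client.ConnectAsync(host, port);
        return client;
    }
}
```

A caller would read the value with `Environment.GetEnvironmentVariable("services__apiservice__tcp__0")` and pass it to `ConnectAsync`.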

---

### 10.3 Customizing Service Discovery

Aspire’s service discovery is built on the `Microsoft.Extensions.ServiceDiscovery` library. You can customize how logical names are resolved by adding your own endpoint providers.

#### 10.3.1 The Service Discovery Architecture

The core abstractions are `IServiceEndpointProviderFactory` and `IServiceEndpointProvider`. Factories are registered in DI and are asked, in registration order, to create a provider for the service name being resolved; each provider then adds the endpoints it knows about. The default provider reads from configuration (including the environment variables injected by the AppHost). You can add providers for Consul, Kubernetes, or a custom source.

#### 10.3.2 Adding a Custom Endpoint Provider

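Suppose you want to resolve a service named `legacy-inventory` from a static configuration file. Create a factory implementing `IServiceEndpointProviderFactory` and a provider implementing `IServiceEndpointProvider`:

```csharp
using System.Diagnostics.CodeAnalysis;
using System.Net;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.ServiceDiscovery;

// The factory decides whether this provider applies to the requested service.
public sealed class StaticConfigEndpointProviderFactory(IConfiguration configuration)
    : IServiceEndpointProviderFactory
{
    public bool TryCreateProvider(
        ServiceEndpointQuery query,
        [NotNullWhen(true)] out IServiceEndpointProvider? provider)
    {
        var address = configuration["LegacyInventory:Address"];
        if (query.ServiceName == "legacy-inventory"
            && Uri.TryCreate(address, UriKind.Absolute, out var uri))
        {
            provider = new StaticConfigEndpointProvider(uri);
            return true;
        }

        provider = null;
        return false;
    }
}

// The provider contributes the endpoints for that one service.
public sealed class StaticConfigEndpointProvider(Uri address) : IServiceEndpointProvider
{
    public ValueTask PopulateAsync(
        IServiceEndpointBuilder endpoints,
        CancellationToken cancellationToken)
    {
        // Only host and port are carried over; the scheme comes from the request URI.
        endpoints.Endpoints.Add(
            ServiceEndpoint.Create(new DnsEndPoint(address.Host, address.Port)));
        return ValueTask.CompletedTask;
    }

    public ValueTask DisposeAsync() => ValueTask.CompletedTask;
}
```

Then register it in the service’s `Program.cs`:

```csharp
builder.Services.AddServiceDiscoveryCore();
// Registered first so that it is consulted before the configuration provider.
builder.Services.AddSingleton<IServiceEndpointProviderFactory, StaticConfigEndpointProviderFactory>();
// Keep the default provider that reads the AppHost-injected configuration.
builder.Services.AddConfigurationServiceEndpointProvider();
```

You also need to keep the default configuration provider, as shown. You can add multiple providers, and they are called in registration order. The built‑in providers skip a service once an earlier provider has supplied endpoints for it, so in practice the first one that returns endpoints wins. If your service defaults already call `AddServiceDiscovery()`, register the custom factory before that call.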

#### 10.3.3 Client‑Side Load Balancing

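When a service has multiple instances, configuration can list several endpoints for the same service under consecutive indexes. (With replicas in a local run, Aspire normally fronts the instances with a single proxied endpoint, so you will usually see several entries only when instances are registered individually.)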

```
services__apiservice__http__0=http://localhost:5001
services__apiservice__http__1=http://localhost:5002
```

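The default resolver returns all endpoints, and the `IHttpClientFactory` integration picks one per request using a **round‑robin** selector by default. Conceptually, you change the strategy by supplying a different endpoint selector. The selector abstraction has changed between releases of `Microsoft.Extensions.ServiceDiscovery` and is not public in every version, so check the package you reference before relying on the `IServiceEndpointSelector` shape used in this chapter.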

For example, to use a random endpoint, you could implement a custom selector:

```csharp
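public class RandomEndpointSelector : IServiceEndpointSelector
{
    public Endpoint? SelectEndpoint(IEnumerable<Endpoint> endpoints)
    {
        // Materialize once so the sequence is not enumerated twice.
        var list = endpoints.ToList();
        if (list.Count == 0) return null;

        // Random.Shared is thread-safe; a private Random instance is not,
        // and this selector is registered as a singleton.
        return list[Random.Shared.Next(list.Count)];
    }
}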
```

Register it in DI:

```csharp
builder.Services.AddSingleton<IServiceEndpointSelector, RandomEndpointSelector>();
```

---

### 10.4 Advanced Resiliency with Polly

The default `AddStandardResilienceHandler` adds a robust set of policies. But you may need to tailor these policies for specific clients or scenarios.

#### 10.4.1 The Default Resilience Pipeline

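The standard pipeline chains five strategies, from outermost to innermost:

- **Rate limiter**: caps the number of concurrent requests (1,000 by default).
- **Total request timeout**: 30 seconds for the whole operation, including retries.
- **Retry**: up to 3 retries with exponential backoff and jitter for transient failures (HTTP 5XX, 408, 429, and network errors).
- **Circuit breaker**: opens when at least 10% of requests fail within a 30‑second sampling window (given a minimum throughput of 100 requests), stays open for 5 seconds, then half‑opens.
- **Attempt timeout**: 10 seconds for each individual attempt.

You can see the exact defaults in the [source code](https://github.com/dotnet/extensions/tree/main/src/Libraries/Microsoft.Extensions.Http.Resilience).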

#### 10.4.2 Customizing Policies per Client

Use the `AddResilienceHandler` method on an `IHttpClientBuilder` to define a custom pipeline. For example, to change the retry count to 5 and use a fixed delay:

```csharp
builder.Services.AddHttpClient<WeatherApiClient>(client =>
{
    client.BaseAddress = new Uri("http://apiservice");
})
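.AddResilienceHandler("custom-pipeline", pipeline =>
{
    // Strategies wrap each other in the order added: retry is the outer layer...
    pipeline.AddRetry(new HttpRetryStrategyOptions
    {
        MaxRetryAttempts = 5,
        Delay = TimeSpan.FromSeconds(1),
        BackoffType = DelayBackoffType.Constant, // fixed delay, no exponential growth
        UseJitter = true                         // adds a small random offset to each delay
    });
    // ...so this timeout applies to each individual attempt.
    pipeline.AddTimeout(TimeSpan.FromSeconds(10));
    // Circuit breaker is optional
});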
```

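You cannot append individual strategies to the pipeline that `AddStandardResilienceHandler` builds, but you can adjust its settings: the method accepts a callback that configures `HttpStandardResilienceOptions`, with one property per strategy (`Retry`, `CircuitBreaker`, `AttemptTimeout`, `TotalRequestTimeout`, and `RateLimiter`). If you need a different set or order of strategies, build the pipeline yourself with `AddResilienceHandler` and replicate the parts you want.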
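A minimal sketch of tuning the standard handler for one client, assuming the `builder` and `WeatherApiClient` from the surrounding examples. The specific values are illustrative, not recommendations.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

builder.Services.AddHttpClient<WeatherApiClient>(client =>
{
    client.BaseAddress = new Uri("http://apiservice");
})
.AddStandardResilienceHandler(options =>
{
    // Keep the standard strategies, but change a few of their settings.
    options.Retry.MaxRetryAttempts = 5;
    options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(5);
    options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(45);
});
```

Keep in mind that if your service defaults already apply a standard handler to every client, a handler added here is stacked on top of it rather than replacing it.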

#### 10.4.3 Circuit Breaker Patterns

A circuit breaker prevents calls to a failing service, giving it time to recover. Polly’s circuit breaker has three states: Closed (normal), Open (failing fast), and Half‑Open (testing recovery). You can configure thresholds:

```csharp
.AddResilienceHandler("cb-pipeline", pipeline =>
{
    pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
    {
        FailureRatio = 0.2,                          // break when 20% of requests fail...
        SamplingDuration = TimeSpan.FromSeconds(10), // ...measured over a 10-second window...
        MinimumThroughput = 5,                       // ...with at least 5 requests in that window
        BreakDuration = TimeSpan.FromSeconds(30)     // stay open for 30 seconds before half-opening
    });
});
```
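To make the interaction between `FailureRatio` and `MinimumThroughput` concrete, the decision can be sketched as a small function. This mirrors the documented meaning of the options rather than Polly's internal code, and `CircuitBreakerMath` is a hypothetical helper.

```csharp
public static class CircuitBreakerMath
{
    // Returns true when the counts observed in one sampling window
    // would be enough to open the circuit.
    public static bool WouldBreak(int failures, int total, double failureRatio, int minimumThroughput)
    {
        // Too little traffic in the window: the breaker does not judge at all.
        if (total < minimumThroughput) return false;

        return (double)failures / total >= failureRatio;
    }
}
```

With the settings above, 3 failures out of 4 requests do not open the circuit because fewer than 5 requests were sampled, 2 failures out of 5 do (40% is above the 20% threshold), and 1 failure out of 10 does not (10%).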

#### 10.4.4 Timeout and Cancellation

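Always set a timeout. The standard handler applies a 30‑second total timeout and a 10‑second per‑attempt timeout, but some operations warrant shorter limits. Where you call `AddTimeout` determines what it covers: added after the retry strategy (inside it), it limits each attempt; added before the retry strategy (outside it), it limits the whole operation, retries included. When a timeout fires, the cancellation token for the attempt is cancelled, so pass that token through to any downstream work.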
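A short sketch of layering both kinds of timeout in a custom pipeline, again assuming the `builder` and `WeatherApiClient` from earlier examples. The pipeline name and durations are arbitrary.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;

builder.Services.AddHttpClient<WeatherApiClient>(client =>
{
    client.BaseAddress = new Uri("http://apiservice");
})
.AddResilienceHandler("layered-timeouts", pipeline =>
{
    // Outermost: caps the whole operation, retries included.
    pipeline.AddTimeout(TimeSpan.FromSeconds(15));

    // Middle: retries transient failures, including attempts that timed out.
    pipeline.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 2 });

    // Innermost: caps each individual attempt.
    pipeline.AddTimeout(TimeSpan.FromSeconds(3));
});
```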

#### 10.4.5 Combining with Service Discovery

When you use `AddHttpClient` with a logical address, the resilience handler works in tandem with service discovery. Handlers run in the order they are added, and the service defaults add the resilience handler before service discovery, so a request goes through the following steps:

1. The resilience pipeline receives the request, still addressed to the logical name, and starts an attempt.
2. The service discovery handler resolves the logical name to a physical address (using the selector) and forwards the request.
3. If the attempt fails with a transient error, the resilience pipeline retries, and the retried attempt passes through the service discovery handler again.

Because resolution happens inside each attempt, a retry can land on a different instance, which allows failover. If you register the handlers in the opposite order, the address is resolved once and every retry targets the same instance.
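The effect of handler ordering on retries can be demonstrated with plain `DelegatingHandler` classes. The sketch below uses deliberately simplified stand-ins (all type names are hypothetical, and neither handler is the real Aspire or Polly implementation): an outer retry handler, an inner resolver that rotates between two fixed addresses, and a stub backend in which port 5001 always fails.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Outer handler: a simplified stand-in for the resilience pipeline.
public sealed class SimpleRetryHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var logicalUri = request.RequestUri;
        for (var attempt = 1; ; attempt++)
        {
            // Restore the logical address so the inner handler resolves it again.
            request.RequestUri = logicalUri;
            var response = await base.SendAsync(request, cancellationToken);
            if ((int)response.StatusCode < 500 || attempt == 3) return response;
            response.Dispose();
        }
    }
}

// Inner handler: a stand-in for service discovery that rotates through instances.
// Not thread-safe; it is only meant for this single-request demonstration.
public sealed class RotatingResolverHandler : DelegatingHandler
{
    private readonly Uri[] _instances;
    private int _next = -1;

    public RotatingResolverHandler(params Uri[] instances) => _instances = instances;

    public List<string> Resolved { get; } = new();

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var instance = _instances[++_next % _instances.Length];
        Resolved.Add(instance.Authority);
        request.RequestUri = new Uri(instance, request.RequestUri!.PathAndQuery);
        return base.SendAsync(request, cancellationToken);
    }
}

// Terminal handler: pretends the instance on port 5001 is down.
public sealed class StubBackend : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var status = request.RequestUri!.Port == 5001
            ? HttpStatusCode.ServiceUnavailable
            : HttpStatusCode.OK;
        return Task.FromResult(new HttpResponseMessage(status));
    }
}

public static class HandlerOrderDemo
{
    public static async Task<(HttpStatusCode Status, IReadOnlyList<string> Resolved)> RunAsync()
    {
        var resolver = new RotatingResolverHandler(
            new Uri("http://localhost:5001"), new Uri("http://localhost:5002"))
        {
            InnerHandler = new StubBackend()
        };
        var retry = new SimpleRetryHandler { InnerHandler = resolver };

        using var client = new HttpClient(retry);
        using var response = await client.GetAsync("http://apiservice/weatherforecast");
        return (response.StatusCode, resolver.Resolved);
    }
}
```

Calling `HandlerOrderDemo.RunAsync()` resolves the name twice: the first attempt goes to `localhost:5001` and fails, and the retry is resolved to `localhost:5002` and succeeds.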

---

### 10.5 Traffic Splitting and Routing for Local Testing

When developing microservices, you may want to test a new version of a service without affecting the whole system. For example, you could route 10% of traffic to a “canary” instance. In a local development environment, you can achieve this with a reverse proxy like **YARP** (Yet Another Reverse Proxy) integrated into Aspire.

#### 10.5.1 Adding a YARP Proxy Resource

You can run YARP as a container or as a project in the solution and configure it to split traffic. Depending on your Aspire version, a YARP hosting integration for the AppHost (the `Aspire.Hosting.Yarp` package) may also be available. The approach below configures YARP manually in its own project and does not depend on that integration.

For instance, create a new project called `MyAspireApp.Gateway` that uses YARP. In its `Program.cs`, configure routing to split traffic between two versions of the API service:

```csharp
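var builder = WebApplication.CreateBuilder(args);
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();
app.MapReverseProxy(); // without this call, no proxy routes are mapped
app.Run();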
```

In `appsettings.json`:

```json
{
  "ReverseProxy": {
    "Routes": {
      "api-route": {
        "ClusterId": "api-cluster",
        "Match": {
          "Path": "/api/{**catch-all}"
        }
      }
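    },
    "Clusters": {
      "api-cluster": {
        "Destinations": {
          "api-v1": {
            "Address": "http://apiservice-v1/",
            "Metadata": { "Weight": "90" }
          },
          "api-v2": {
            "Address": "http://apiservice-v2/",
            "Metadata": { "Weight": "10" }
          }
        },
        "LoadBalancingPolicy": "Weighted"
      }
    }
  }
}
```

YARP’s built‑in load‑balancing policies (`PowerOfTwoChoices`, `RoundRobin`, `Random`, `LeastRequests`, and `FirstAlphabetical`) do not take weights, so `Weighted` here names a custom `ILoadBalancingPolicy` that you register in the gateway and that reads each destination’s `Weight` metadata.

Then, in the AppHost, you would add two API service resources named `apiservice-v1` and `apiservice-v2` and have the gateway project reference both. For the gateway to resolve those logical names, chain `AddServiceDiscoveryDestinationResolver()` (from the `Microsoft.Extensions.ServiceDiscovery.Yarp` package) onto `AddReverseProxy()` and make sure service discovery is registered, for example through `AddServiceDefaults()`.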

This is an advanced pattern, but it demonstrates the flexibility of Aspire’s model.
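YARP has no weighted policy built in, so a 90/10 split needs a custom `ILoadBalancingPolicy`. The following is one possible sketch, assuming each destination carries a `Weight` entry in its `Metadata` and that the cluster's `LoadBalancingPolicy` is set to the name `Weighted`; both names are choices made for this example, not YARP conventions.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.LoadBalancing;
using Yarp.ReverseProxy.Model;

public sealed class WeightedLoadBalancingPolicy : ILoadBalancingPolicy
{
    // Must match the LoadBalancingPolicy value in the cluster configuration.
    public string Name => "Weighted";

    public DestinationState? PickDestination(
        HttpContext context,
        ClusterState cluster,
        IReadOnlyList<DestinationState> availableDestinations)
    {
        if (availableDestinations.Count == 0) return null;

        // Sum the weights of the destinations that are currently available.
        var weights = new int[availableDestinations.Count];
        var total = 0;
        for (var i = 0; i < weights.Length; i++)
        {
            weights[i] = ReadWeight(availableDestinations[i]);
            total += weights[i];
        }

        // Pick a point in [0, total) and walk the list until it falls in a bucket.
        var roll = Random.Shared.Next(total);
        for (var i = 0; i < weights.Length; i++)
        {
            if (roll < weights[i]) return availableDestinations[i];
            roll -= weights[i];
        }

        return availableDestinations[availableDestinations.Count - 1];
    }

    // Missing or invalid weights fall back to 1 so the destination stays reachable.
    private static int ReadWeight(DestinationState destination)
    {
        var metadata = destination.Model.Config.Metadata;
        return metadata is not null
            && metadata.TryGetValue("Weight", out var raw)
            && int.TryParse(raw, out var weight)
            && weight > 0
                ? weight
                : 1;
    }
}
```

Register it in the gateway with `builder.Services.AddSingleton<ILoadBalancingPolicy, WeightedLoadBalancingPolicy>();`. The split is probabilistic, so expect roughly, not exactly, 10% of requests on the canary.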

#### 10.5.2 Traffic Splitting via Environment Variables

Another approach is to use environment variables to control which instance a client calls. For example, you could set a feature flag that directs traffic to a different version. This is simpler but less dynamic.
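A minimal sketch of the flag-based approach follows. `API_VERSION` is a hypothetical variable name, and `apiservice-v1` and `apiservice-v2` are assumed to be the logical names of the two versions.

```csharp
using System;

public static class ApiVersionRouting
{
    // Maps the flag value to the logical address of the matching API version.
    // Anything other than "v2" (including a missing flag) selects v1.
    public static Uri SelectBaseAddress(string? flag) =>
        string.Equals(flag, "v2", StringComparison.OrdinalIgnoreCase)
            ? new Uri("http://apiservice-v2")
            : new Uri("http://apiservice-v1");
}
```

In the AppHost you could set the flag with `.WithEnvironment("API_VERSION", "v2")` on the client project, and in the client use `ApiVersionRouting.SelectBaseAddress(Environment.GetEnvironmentVariable("API_VERSION"))` as the `HttpClient` base address. Changing the split then requires a restart, which is why this approach is less dynamic than a proxy.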

---

### 10.6 Hands‑on: Scaling and Load Balancing the API

Let’s put these concepts into practice. We’ll scale the API service to two instances and observe how the web frontend’s HTTP client load‑balances across them. We’ll also add a custom endpoint selector to use a random instance.

#### Step 1: Scale the API Service in the AppHost

In `Program.cs`, modify the API service resource to have two replicas:

```csharp
var apiService = builder.AddProject<Projects.MyAspireApp_ApiService>("apiservice")
    .WithReplicas(2)   // This tells Aspire to start two instances
    .WithReference(productsDb)
    .WithReference(messaging)
    .WithReference(blobs);
```

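When you run the AppHost, you’ll see two instances of `apiservice` in the dashboard, each listening on a different port. In a local run, Aspire typically places a proxy in front of the replicas, so the web frontend may be given a single proxied endpoint for `apiservice` rather than one entry per replica; requests are still spread across both instances.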

#### Step 2: Verify Client‑Side Load Balancing

The web frontend uses an `HttpClient` typed to `WeatherApiClient` with base address `http://apiservice`. Because `AddServiceDefaults` configures service discovery and the standard resilience handler, requests to that address are distributed across the available instances: round‑robin on the client when several endpoints are resolved, or by Aspire’s proxy when the replicas share one endpoint.

To verify, you can modify the API endpoint to log the instance ID (e.g., the port it’s running on). In the API’s `Program.cs`, add a logging statement:

```csharp
app.MapGet("/weatherforecast", (ILogger<Program> logger) =>
{
    // ASPNETCORE_URLS holds the full listen addresses, so name the placeholder accordingly.
    logger.LogInformation("Handling request on {Urls}", Environment.GetEnvironmentVariable("ASPNETCORE_URLS"));
    // ... rest
});
```

Run the application and refresh the web frontend’s weather page multiple times. Check the logs of each API instance in the dashboard; you should see requests distributed between them.

#### Step 3: Change Load Balancing Strategy to Random

We want to replace the default round‑robin with a random selector. First, create a custom selector class in the web frontend project (or a shared location):

```csharp
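public class RandomEndpointSelector : IServiceEndpointSelector
{
    public Endpoint? SelectEndpoint(IEnumerable<Endpoint> endpoints)
    {
        // Materialize once so the sequence is not enumerated twice.
        var list = endpoints.ToList();
        if (list.Count == 0) return null;

        // Random.Shared is thread-safe; a private Random instance is not,
        // and this selector is registered as a singleton.
        return list[Random.Shared.Next(list.Count)];
    }
}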
```

Now, in the web frontend’s `Program.cs`, we register this selector after `AddServiceDefaults()`. The service defaults have already registered service discovery, including the default round‑robin selector and the endpoint provider that reads from configuration. Calling `AddServiceDiscoveryCore()` again would only duplicate registrations, and bypassing `AddServiceDefaults` for service discovery is cumbersome. Instead, keep the defaults and swap just the selector: remove the existing `ServiceDescriptor` and add ours. This assumes your version of the service discovery package exposes `IServiceEndpointSelector` as a replaceable service.

We can do:

```csharp
// After AddServiceDefaults, remove the default selector
var selectorDescriptor = builder.Services.FirstOrDefault(sd => sd.ServiceType == typeof(IServiceEndpointSelector));
if (selectorDescriptor != null)
{
    builder.Services.Remove(selectorDescriptor);
}
builder.Services.AddSingleton<IServiceEndpointSelector, RandomEndpointSelector>();
```

This is a bit hacky, but it works.

Now run again and observe distribution. It should be roughly even over many requests.

#### Step 4: (Optional) Add a Custom Health Check for Each Instance

Each API instance has its own health endpoint. The dashboard polls them separately, and you can see each instance’s health status.
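If you do want a custom check, a minimal sketch could look like the following. It assumes the API project uses the service defaults, which in the standard templates map the health endpoints in development; the check name and message are arbitrary.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// In the API service's Program.cs, before builder.Build().
builder.Services.AddHealthChecks()
    // Reports the process ID so each replica can be told apart in the health output.
    .AddCheck("instance-identity", () =>
        HealthCheckResult.Healthy($"Process {Environment.ProcessId} is serving requests"));
```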

#### Step 5: Simulate Failure and Observe Retries

Stop one of the API instances (in the dashboard, click the stop button), then make a few requests. Because the stopped instance may still be in the selector’s list, some attempts will be sent to it and fail. The resilience handler’s retry strategy then retries, and the request eventually succeeds on the healthy instance. Check the logs to see the retries.

---

### 10.7 Summary

In this chapter, you’ve taken service discovery and resilience to the next level. Key takeaways:

- You can manually define endpoints for external services or override default behavior.
- A resource can expose multiple named endpoints (HTTP, HTTPS, gRPC, TCP), and clients choose among them through the URI they request.
- Service discovery is extensible via custom `IServiceEndpointProvider` and `IServiceEndpointSelector`.
- Polly resilience policies can be customized per client, allowing fine‑grained control over retries, circuit breakers, and timeouts.
- Traffic splitting can be achieved with a reverse proxy like YARP for advanced testing scenarios.
- Through hands‑on, you scaled a service to multiple instances and observed client‑side load balancing.

These advanced techniques prepare your application for real‑world demands. In the next chapter, we’ll explore **Testing Aspire Applications**, covering integration tests that leverage the AppHost and how to write reliable tests for distributed systems.

---

**Exercises**

1. Modify the custom random selector to respect endpoint health (e.g., if an endpoint fails a health check, deprioritize it). This requires integrating health check data.
2. Implement a custom `IServiceEndpointProvider` that reads from a Consul agent. Use the Consul .NET client to query for service instances.
3. Create a YARP gateway project and configure it to route to the two API instances with weighted load balancing. Test by sending requests through the gateway.
4. Experiment with Polly’s circuit breaker by intentionally failing one instance and observing how the circuit opens and eventually half‑opens.

