Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SRV lookups only use first response #4174

Closed
sorenisanerd opened this issue May 20, 2021 · 12 comments · Fixed by #4470
Closed

SRV lookups only use first response #4174

sorenisanerd opened this issue May 20, 2021 · 12 comments · Fixed by #4470
Labels
feature ⚙️ New feature or request
Milestone

Comments

@sorenisanerd
Copy link

addr.Host = records[0].Target

Even if the the DNS randomizes the order of entries (while still observing priorities), it's still perfectly possible for all the top-priority servers to be at capacity, in which case the we should move on to entries with lower priority.

@francislavoie francislavoie added the needs info 📭 Requires more information label May 20, 2021
@francislavoie
Copy link
Member

Please elaborate, I don't think there's really enough detail to make this actionable.

PRs welcome, if you know the fix that needs to be made.

@sorenisanerd
Copy link
Author

An SRV lookup returns all SRV entries. Each entry has a priority and a weight.

A correct use of that response should group the entries by priority and attempt to use the top (numerically lowest) priority entries first. Their respective weights should define their odds of getting picked. Once the top priority group is exhausted with no succesful reply, only then should we move on to the next group.

However, Caddy will only use a single entry, namely the first one. Even with many SRV entries, an incoming request will fail to be served correctly if the first entry happens to be one for a server that is unhealthy.

@francislavoie
Copy link
Member

francislavoie commented May 20, 2021

I think you need to enable lb_try_duration for the retry logic to kick in.

https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#load-balancing

@sorenisanerd
Copy link
Author

The gold plated way to address this would be:

  1. The addition of a selection policy that (independent of SRV lookups) allows specifying both priority and weight to backends.
  2. An incoming request triggers an SRV lookup which populates the backend data for the above mentioned selection policy which then proceeds as usual.

@sorenisanerd
Copy link
Author

lb_try_duration will do me little good if it'll only ever try the first entry. There's no guarantee that the DNS will randomize responses and even if it did, I don't see how Caddy would ever fall back to an entry that does not have top priority.

@sorenisanerd
Copy link
Author

For clarity, I understand that most people probably have Consul or something similar maintaining their SRV records, so you should never get SRV entries for unhealthy backends. In that case, everything is fine. My use case is somewhat different.

@francislavoie francislavoie added feature ⚙️ New feature or request help wanted 🆘 Extra attention is needed and removed needs info 📭 Requires more information labels May 20, 2021
@francislavoie francislavoie added this to the 2.x milestone May 20, 2021
@francislavoie
Copy link
Member

This kinda goes hand-in-hand with #1545

@mohammed90
Copy link
Member

Jotting down thoughts as I took glances at the code to understand what it could touch:

  • The module implementing this will most likely be in the http.reverse_proxy.selection_policies namespace, and the body of the logic is an implementation of the Selector interface.

    // Selector selects an available upstream from the pool.
    type Selector interface {
    Select(UpstreamPool, *http.Request, http.ResponseWriter) *Upstream
    }

  • The Upstream struct needs to be extended with members representing weight and priority as received via the SRV lookup. It already has commented fields for HeaderAffinity and IPAffinity, which have similar concept.

    type Upstream struct {
    Host `json:"-"`
    // The [network address](/docs/conventions#network-addresses)
    // to dial to connect to the upstream. Must represent precisely
    // one socket (i.e. no port ranges). A valid network address
    // either has a host and port or is a unix socket address.
    //
    // Placeholders may be used to make the upstream dynamic, but be
    // aware of the health check implications of this: a single
    // upstream that represents numerous (perhaps arbitrary) backends
    // can be considered down if one or enough of the arbitrary
    // backends is down. Also be aware of open proxy vulnerabilities.
    Dial string `json:"dial,omitempty"`
    // If DNS SRV records are used for service discovery with this
    // upstream, specify the DNS name for which to look up SRV
    // records here, instead of specifying a dial address.
    LookupSRV string `json:"lookup_srv,omitempty"`
    // The maximum number of simultaneous requests to allow to
    // this upstream. If set, overrides the global passive health
    // check UnhealthyRequestCount value.
    MaxRequests int `json:"max_requests,omitempty"`
    // TODO: This could be really useful, to bind requests
    // with certain properties to specific backends
    // HeaderAffinity string
    // IPAffinity string
    activeHealthCheckPort int
    healthCheckPolicy *PassiveHealthChecks
    cb CircuitBreaker
    }

@mholt
Copy link
Member

mholt commented Dec 8, 2021

I'm working on this.

@mholt
Copy link
Member

mholt commented Jan 18, 2022

@sorenisanerd Would you like to give my PR, #4470, a try? It only supports JSON config for now, but you can just use caddy adapt and then tweak your config to use the new dynamic upstreams feature.

@sorenisanerd
Copy link
Author

I would love to, but I discarded the experiment I was doing and canned the project, so I don't have backend servers, name servers, or anything else that I'd need to try it out. Sorry :(

@mholt mholt removed help wanted 🆘 Extra attention is needed in progress 🏃‍♂️ Being actively worked on labels Mar 7, 2022
@mholt
Copy link
Member

mholt commented Mar 7, 2022

Done in #4470.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants