Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/fallback: proxy request in case of error returned by downstream plugins #1382

Closed
fturib opened this issue Jan 11, 2018 · 18 comments
Closed

Comments

@fturib
Copy link
Contributor

fturib commented Jan 11, 2018

The idea is to check the return of regular downstream plugins, and be able to redirect the query to a 'fallback' proxy base on the code error returned.
NOTE: we would like use similar if the whole CoreDNS service is unhealthy.

proposition for config:

fallback [ZONES…] {
  on RESPONSE-CODE [ IP:PORT or filename … ]
}

Example:

fallback example.com {
  on NXDOMAIN /etc/resolv.conf
  on SERVFAIL 10.10.10.10:100 8.8.8.8:53
}

So, you can have as many ”on” directives as you want. It could potentially re-use the proxy code so it could also accept any proxy options.

The usage right now would be to fallback in following cases:
1 CoreDNS is unhealthy (the whole stack is bypassed in this case)
2. Grpc communication is broken - one of the plugin cannot do its job.
3. REFUSED response.

NOTE: that could be en enhancement of proxy plugin, but we could make more generic as a separated plugin.
However, would need this plugin to be placed on top of order of the list of plugins.

NOTE: Right now the needs is for a real use case at Infoblox, proposition of plugin is from @johnbelamaric .

@fturib fturib changed the title plugin/fallback - proxy request in case of error returned by usual process WIP - plugin/fallback - proxy request in case of error returned by usual process Jan 11, 2018
@fturib
Copy link
Contributor Author

fturib commented Jan 11, 2018

Work in progress - by @yongtang

@fturib fturib changed the title WIP - plugin/fallback - proxy request in case of error returned by usual process WIP - plugin/fallback - proxy request in case of error returned by downstream plugins Jan 11, 2018
@miekg
Copy link
Member

miekg commented Jan 11, 2018

icky, but interesting :)

about the proposed syntax: you probably want to prefix it (optionally) with a scheme: grpc://, tls://
and NXDOMAIN should maybe be DENIAL (to catch nodata)?

I think it would be good to keep this outside proxy, as that one is (too) complex already.

@johnbelamaric
Copy link
Member

wrt to the "NOTE" on "the whole CoreDNS being unhealthy" - I think you mean if all the proxy endpoints are unhealthy we would use this fallback list?

@miekg
Copy link
Member

miekg commented Jan 11, 2018 via email

@johnbelamaric
Copy link
Member

You mean when they are all unhealthy? I think it could except in our case it's different protocols, and we don't have the forward plugin style upstream specification in proxy yet. So, the proxy will be using gRPC, but if all of our upstream gRPC endpoints are dead, we would fallback to ordinary UDP.

For the routing piece (on RCODE), we didn't want to add more stuff to the proxy plugin. So that's why we're thinking a separate plugin.

@miekg
Copy link
Member

miekg commented Jan 11, 2018 via email

@yongtang
Copy link
Member

Let's rephrase the problem here.

I think the issue is the definition of health for the current proxy plugin: if an endpoint of the proxy plugin returns SERVFAIL/REFUSED, then we consider it as unhealthy. (NXDOMAIN is irrelevant).

In case an endpoint is declared as unhealthy, the proxy plugin will switch to another endpoint anyway.

So the issue is really about adding an unhealthy definition to the proxy plugin.

@miekg
Copy link
Member

miekg commented Jan 18, 2018

Meta note: I think this plugin should live in it's own repo, just like forward? We can still add them by default by pointing plugin.cfg to the remote repo, something I also want to do for forward.

@johnbelamaric
Copy link
Member

@yongtang the issue is that we need to fallback to regular DNS if gRPC proxy is failing. So, it's changing the proxy protocol in addition to changing the endpoint IP/port.

proxy already has a lot of stuff in it, so that's why a separate plugin.

If we change proxy to use the URI-like format for endpoints (udp://, tcp://, grpc://, etc.), then your suggestion of creating unhealthy for proxy could work. There was some urgency from our ATC team to get this done though, which is another reason to stick it in an external plugin and then we can take our time to see if we want to do it another way.

@johnbelamaric
Copy link
Member

johnbelamaric commented Jan 18, 2018

Here was the original description (internal email):

I am thinking something that uses a similar mechanism as the policy plugin. In there, it uses a ”fake response writer” to capture the result from the downstream plugins, and then react to that. We could do the same thing. 
 
fallback [ZONES…] {
  on RESPONSE-CODE [ IP:PORT or filename … ]
}
 
Example:
 
fallback example.com {
  on NXDOMAIN /etc/resolv.conf
  on SERVFAIL 10.10.10.10:100 8.8.8.8:53
}
 
So, you can have as many ”on” directives as you want. It could potentially re-use the proxy code so it could also accept any proxy options.
 
You *could* add this as an option to proxy, too. But then it would have to be upstream and not external. And proxy has a lot of options already.

So, it's actually pretty simple:

  1. You create the special fake writer - actually looks like this is now in pkg as "non writer".
  2. send the query down the chain
  3. look at the RCODE of the response; if it matches an ON statement do that, otherwise write it to your original writer

@yongtang
Copy link
Member

In case of SERVERFAIL returned by previous proxy plugin, the message is forwarded to 10.10.10.10:100 but 10.10.10.10:100 could return NXDOMAIN or SERVFAIL as well.

What is the behavior of SERVFAIL returned by 10.10.10.10:100? Should SERVFAIL returned by 10.10.10.10:100 to try 8.8.8.8:53?

If SERVFAIL returned by 10.10.10.10:100 is always stopped and no further propagate, then multiple endpoints (8.8.8.8:53) do not make sense. That means there should only be one endpoint associate with one specified action.

Any this has not take into consideration of NXDOMAIN by 10.10.10.10:100 (as NXDOMAIN was specified earlier).
Should 10.10.10.10:100 be associated with an "annotation" to specify the logic to follow up? I think this is not a well defined design.

@yongtang
Copy link
Member

#612 might be related.

@johnbelamaric
Copy link
Member

johnbelamaric commented Jan 18, 2018

Here is the original requirement (for DFP):

If the ActiveTrust DNS server replies to a DNS query with the REFUSED error code, the DNS Proxy doesn’t forward this reply to the caller. Instead it sends again the DNS query to the system default resolver currently configured. Then the reply, whatever it is, it sent to the caller.

The system default resolver means /etc/resolv.conf as more than one may be defined. Thus, ordinary proxy behavior for going to the next endpoint in a multi-endpoint list (the RCODE in the on clause does not change this behavior).

@yongtang
Copy link
Member

yongtang commented Jan 18, 2018

The specification should really be:

fallback [ZONES…] {
  <ACTION> <IP:PORT or filename * 1>
}

As each action could only be forwarded to one endpoint so on SERVFAIL 10.10.10.10:100 8.8.8.8:53 is not appropriate here.

@johnbelamaric
Copy link
Member

Why? There is some default logic in the proxy as to when to go to the next endpoint. That same logic applies here if multiple endpoints are listed. The "special" logic only applies to the original proxy rcode. Not to the subsequent calls. So, if you had multiple actions, they would apply ONLY based on the RCODE from the original request, not subsequent requests made by this plugin. This way there is no chance of loops.

@yongtang
Copy link
Member

This is more like a "forked" version of proxy plugin with very customized hacks. Not sure if it is appropriate for generic usage to be honest. Also, "SERVFAIL", "REFUSED", "FORMERR", "NOTIMPL" are actually handled by DefaultErrorFunc, a ResponseWriter may not work here.

@yongtang
Copy link
Member

Let's take offline. I think more discussion about the design might be needed.

@johnbelamaric johnbelamaric reopened this Jan 23, 2018
@johnbelamaric johnbelamaric changed the title WIP - plugin/fallback - proxy request in case of error returned by downstream plugins plugin/fallback: proxy request in case of error returned by downstream plugins Jan 23, 2018
@miekg
Copy link
Member

miekg commented Feb 6, 2018

@miekg miekg closed this as completed Feb 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants