Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnsdist: Send better HTTP status codes, handle ACL drops earlier #7917

Merged
merged 2 commits into from Jul 8, 2019

Conversation

@rgacogne
Copy link
Member

commented Jun 12, 2019

Short description

We used to send a 500 status code when a query was dropped, regardless of the reason. This PR tries to return a more appropriate status code instead.
It also moves the ACL check right after the connection has been accepted, before the HTTP/2 and DNS parsing, to avoid wasting cycles on queries that we will end up dropping anyway. It's a bit more mean to HTTP clients that will get their TCP connection closed right away instead of a 403 status code, but I think it's worth it. Comments welcome :)

Checklist

I have:

  • read the CONTRIBUTING.md document
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@rgacogne rgacogne added this to the dnsdist-1.4.0 milestone Jun 12, 2019

@appliedprivacy

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2019

here is our line of reasoning why we consider it discouraged to close a TCP connection directly instead of providing a proper HTTP response code:
jedisct1/rust-doh#23 (comment)

If the TCP connection comes from a client that is not allowed as per ACL why not answer with
403 Forbidden https://tools.ietf.org/html/rfc7231#section-6.5.3
In that case it was a policy decision to not answer the query. 500 is indicating an error condition and reverse proxy will retry with the next dnsdist instance instead of giving up on the first one.

And if dnsdist on a plain (non-DoH) frontend were to answer with RCODE 5 REFUSED, the proper way would be to answer with just that in an HTTP 200 OK response containing the usual REFUSED DNS reply.

  • we encountered an error when sending the query to the selected backend
    (downstream-send-errors should increase, as well the 'sendErrors'
    counter of the corresponding backend).

this sound more like an issue with the backend? and
504 Gateway Timeout could be more suitable? (this error code is already returned by dnsdist in some cases)
https://tools.ietf.org/html/rfc7231#section-6.6.5

@rgacogne

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

Right, I understand that closing the TCP connection right away will not play well with some HTTP clients or proxies. I will make sure we send a 403 code instead when the source IP is not allowed by the ACL.

this sound more like an issue with the backend? and 504 Gateway Timeout could be more suitable? (this error code is already returned by dnsdist in some cases)
https://tools.ietf.org/html/rfc7231#section-6.6.5

This PR currently turns that into a 502 instead. I don't really know if 502 or 504 is more appropriate, since it's not really a timeout at that point. I'm not sure it matters much for this case, since it's very unlikely to happen.

Thanks a lot for the feedback, this is very useful!

@rgacogne

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

I will make sure we send a 403 code instead when the source IP is not allowed by the ACL.

Done!

@rgacogne rgacogne referenced this pull request Jun 13, 2019

Merged

dnsdist: Proper HTTP response for timeouts over DoH #7927

3 of 7 tasks complete
@appliedprivacy

This comment has been minimized.

Copy link
Contributor

commented Jun 15, 2019

Right, I understand that closing the TCP connection right away will not play well with some HTTP clients or proxies. I will make sure we send a 403 code instead when the source IP is not allowed by the ACL.

Thanks a lot for considering our concerns, this is very encouraging when feedback is actually taken into account!

this sound more like an issue with the backend? and 504 Gateway Timeout could be more suitable? (this error code is already returned by dnsdist in some cases)
https://tools.ietf.org/html/rfc7231#section-6.6.5

This PR currently turns that into a 502 instead. I don't really know if 502 or 504 is more appropriate, since it's not really a timeout at that point. I'm not sure it matters much for this case, since it's very unlikely to happen.

we would suggest to aim for the following principles in this area of "what response code to return?". We understand that this is not trivial since not every possible error condition has a perfectly matching and specified HTTP error code representation.

a) try to match the HTTP error code description as specified in RFC as good as possible
b) avoid mapping multiple distinct error condition to the same HTTP error code
c) aim to make the error codes useful for operators trying to understand and solve problems in their deployments
d) make sure recoverable vs. non-recoverable error codes are separated as clearly as possible

Some more explaining about (d)
HTTP reverse proxies that are in place to distribute (load balance) queries to multiple dnsdist DoH instances use HTTP response codes provided by dnsdist to determine whether it is a transient/temporary error by the specific dnsdist instance or a non-recoverable error (ie. 400/403/404/..) to decide whether the request should be forwarded to another backend after the first contacted backend returned a non-200 OK status code. These mechanics shoud be kept in mind when deciding on what HTTP response codes to use since incorrectly returned response codes can significantly (and unnecessarily) increase the processing resources required for a single incoming HTTP request at the load balancer/reverse proxy.

ref: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream

rgacogne added some commits Jun 12, 2019

@rgacogne rgacogne force-pushed the rgacogne:dnsdist-http-codes branch from 3420f82 to 3b6a689 Jun 24, 2019

@rgacogne rgacogne requested a review from omoerbeek Jun 27, 2019

@omoerbeek
Copy link
Member

left a comment

A little late... Somehow I missed this review request. Sorry!

@rgacogne rgacogne merged commit b915d62 into PowerDNS:master Jul 8, 2019

23 checks passed

ci/circleci: build-auth Your tests passed on CircleCI!
Details
ci/circleci: build-auth-docs Your tests passed on CircleCI!
Details
ci/circleci: build-dnsdist Your tests passed on CircleCI!
Details
ci/circleci: build-dnsdist-docs Your tests passed on CircleCI!
Details
ci/circleci: build-recursor Your tests passed on CircleCI!
Details
ci/circleci: build-recursor-docs Your tests passed on CircleCI!
Details
ci/circleci: test-auth-algorithms Your tests passed on CircleCI!
Details
ci/circleci: test-auth-api Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-bind Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-gmysql Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-gpgsql Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-gsqlite3 Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-ldap Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-lmdb Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-mydns Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-odbc-mssql Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-odbc-sqlite3 Your tests passed on CircleCI!
Details
ci/circleci: test-auth-regress-tinydns Your tests passed on CircleCI!
Details
ci/circleci: test-dnsdist-regression Your tests passed on CircleCI!
Details
ci/circleci: test-recursor-api Your tests passed on CircleCI!
Details
ci/circleci: test-recursor-bulk Your tests passed on CircleCI!
Details
ci/circleci: test-recursor-regression Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@rgacogne rgacogne deleted the rgacogne:dnsdist-http-codes branch Jul 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.