-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(router) preserve_host always fetches the request's Host #2832
Conversation
spec/01-unit/010-router_spec.lua
Outdated
@@ -1344,7 +1304,7 @@ describe("Router", function() | |||
|
|||
local router = assert(Router.new(use_case)) | |||
|
|||
local _ngx = mock_ngx("GET", "/users/123/profile/hello/world", {}) | |||
local _ngx = mock_ngx("GET", "/users/123/profile/hello/world", { ["host"] = "domain.org" }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly nudge to point out you're breaking the 80-char limit all over the place. ;)
(And that is to say that I'd rather see the rule relaxed for sensible cases like this than to have the PR changed for this trivial reason.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I tend to not follow the 80 chars rule (enough) in tests file. But you are right! And I do think this warrants a little cleanup! It's a quick job :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done :) At least for the tests impacted by this changeset!
Summary ------- It was reported in #2825 that `preserve_host` seemed to behave wrongly upon subsequent call of a same API if that API had no `hosts`. Since matches APIs are cached, and so are their `upstream_host` values, I investigated around the LRU cache of our matches in the router. What I discovered is, considering the following API: { "name": "example", "upstream_url": "http://httpbin.org", "uris": ["/example"] } The router would try to save cycles by *not* reading the `ngx.var.http_host` variable, since no API is configured to match on a Host header. However, the Host is also part of the cache key since the introduction of the LRU cache in the router. But in the above case, considering the following two requests: GET /example HTTP/1.1 Host: domain1.com GET /example HTTP/1.1 Host: domain2.com Both matches would be cached under the **same** key (since the `req_host` argument of `find_api` is `nil` because the router does not read the Host header): GET:/example: Notice the lack of Host part after the last colon. Hence, once this API would be matched after the first request, its cached `upstream_host` would be `domain1.com`, and subsequent calls with `Host: domain2.com` would hit the same key, and yield the same `upstream_host` as the first request. The approach taken by this fix is to **always** retrieve the Host header. This choice is driven by the fact that cache keys should have as high a cardinality as possible, and to ensure we avoid such duplicate keys in the future. Changes ------- * remove `grab_header` optimization from `router.lua` * remove `grab_header` tests from unit test suite * make `find_api()`'s 3rd argument (Host header) mandatory * update related unit tests to reflect the above change * new regression test for this issue --- Fix #2824
019ee93
to
cce41ed
Compare
@@ -95,14 +95,14 @@ describe("Router", function() | |||
|
|||
it("[uri]", function() | |||
-- uri | |||
local match_t = router.select("GET", "/my-api") | |||
local match_t = router.select("GET", "/my-api", "domain.org") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
traditionally (and per IANA recommendations) we should probably use "example.com" or "example.org". it's not a huge difference, but historically usage of things like "domain.com" has been problematic (that domain was used to serve malware at one point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dully noted - my intent is to replace those with example.org
in a later contribution, as many tests (outside of the scope of this PR) already use domain.com
:( (I put them there in the first place 😅)
Summary
It was reported in #2825 that
preserve_host
seemed to behave wronglyupon subsequent call of a same API if that API had no
hosts
.Since matches APIs are cached, and so are their
upstream_host
values,I investigated around the LRU cache of our matches in the router. What I
discovered is, considering the following API:
The router would try to save cycles by not reading the
ngx.var.http_host
variable, since no API is configured to match on aHost header.
However, the Host is also part of the cache key since the introduction
of the LRU cache in the router. But in the above case, considering the
following two requests:
Both matches would be cached under the same key (since the
req_host
argument offind_api
isnil
because the router does notread the Host header):
Notice the lack of Host part after the last colon.
Hence, once this API would be matched after the first request, its
cached
upstream_host
would bedomain1.com
, and subsequent calls withHost: domain2.com
would hit the same key, and yield the sameupstream_host
as the first request.The approach taken by this fix is to always retrieve the Host
header. This choice is driven by the fact that cache keys should have as
high a cardinality as possible, and to ensure we avoid such duplicate
keys in the future.
Changes
grab_header
optimization fromrouter.lua
grab_header
tests from unit test suitefind_api()
's 3rd argument (Host header) mandatoryFix #2824