Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(router) preserve_host always fetches the request's Host #2832

Merged
merged 1 commit into from
Aug 23, 2017

Conversation

thibaultcha
Copy link
Member

@thibaultcha thibaultcha commented Aug 22, 2017

Summary

It was reported in #2825 that preserve_host seemed to behave wrongly
upon subsequent call of a same API if that API had no hosts.

Since matches APIs are cached, and so are their upstream_host values,
I investigated around the LRU cache of our matches in the router. What I
discovered is, considering the following API:

{
    "name": "example",
    "upstream_url": "http://httpbin.org",
    "uris": ["/example"]
}

The router would try to save cycles by not reading the
ngx.var.http_host variable, since no API is configured to match on a
Host header.

However, the Host is also part of the cache key since the introduction
of the LRU cache in the router. But in the above case, considering the
following two requests:

GET /example HTTP/1.1
Host: domain1.com

GET /example HTTP/1.1
Host: domain2.com

Both matches would be cached under the same key (since the
req_host argument of find_api is nil because the router does not
read the Host header):

GET:/example:

Notice the lack of Host part after the last colon.

Hence, once this API would be matched after the first request, its
cached upstream_host would be domain1.com, and subsequent calls with
Host: domain2.com would hit the same key, and yield the same
upstream_host as the first request.

The approach taken by this fix is to always retrieve the Host
header. This choice is driven by the fact that cache keys should have as
high a cardinality as possible, and to ensure we avoid such duplicate
keys in the future.

Changes

  • remove grab_header optimization from router.lua
  • remove grab_header tests from unit test suite
  • make find_api()'s 3rd argument (Host header) mandatory
  • update related unit tests to reflect the above change
  • new regression test for this issue

Fix #2824

@@ -1344,7 +1304,7 @@ describe("Router", function()

local router = assert(Router.new(use_case))

local _ngx = mock_ngx("GET", "/users/123/profile/hello/world", {})
local _ngx = mock_ngx("GET", "/users/123/profile/hello/world", { ["host"] = "domain.org" })
Copy link
Contributor

@hishamhm hishamhm Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly nudge to point out you're breaking the 80-char limit all over the place. ;)

(And that is to say that I'd rather see the rule relaxed for sensible cases like this than to have the PR changed for this trivial reason.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I tend to not follow the 80 chars rule (enough) in tests file. But you are right! And I do think this warrants a little cleanup! It's a quick job :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :) At least for the tests impacted by this changeset!

Summary
-------

It was reported in #2825 that `preserve_host` seemed to behave wrongly
upon subsequent call of a same API if that API had no `hosts`.

Since matches APIs are cached, and so are their `upstream_host` values,
I investigated around the LRU cache of our matches in the router. What I
discovered is, considering the following API:

    {
        "name": "example",
        "upstream_url": "http://httpbin.org",
        "uris": ["/example"]
    }

The router would try to save cycles by *not* reading the
`ngx.var.http_host` variable, since no API is configured to match on a
Host header.

However, the Host is also part of the cache key since the introduction
of the LRU cache in the router. But in the above case, considering the
following two requests:

    GET /example HTTP/1.1
    Host: domain1.com

    GET /example HTTP/1.1
    Host: domain2.com

Both matches would be cached under the **same** key (since the
`req_host` argument of `find_api` is `nil` because the router does not
read the Host header):

    GET:/example:

Notice the lack of Host part after the last colon.

Hence, once this API would be matched after the first request, its
cached `upstream_host` would be `domain1.com`, and subsequent calls with
`Host: domain2.com` would hit the same key, and yield the same
`upstream_host` as the first request.

The approach taken by this fix is to **always** retrieve the Host
header. This choice is driven by the fact that cache keys should have as
high a cardinality as possible, and to ensure we avoid such duplicate
keys in the future.

Changes
-------

* remove `grab_header` optimization from `router.lua`
* remove `grab_header` tests from unit test suite
* make `find_api()`'s 3rd argument (Host header) mandatory
* update related unit tests to reflect the above change
* new regression test for this issue

---

Fix #2824
@@ -95,14 +95,14 @@ describe("Router", function()

it("[uri]", function()
-- uri
local match_t = router.select("GET", "/my-api")
local match_t = router.select("GET", "/my-api", "domain.org")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traditionally (and per IANA recommendations) we should probably use "example.com" or "example.org". it's not a huge difference, but historically usage of things like "domain.com" has been problematic (that domain was used to serve malware at one point).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dully noted - my intent is to replace those with example.org in a later contribution, as many tests (outside of the scope of this PR) already use domain.com :( (I put them there in the first place 😅)

@thibaultcha thibaultcha merged commit 83bcda0 into master Aug 23, 2017
@thibaultcha thibaultcha deleted the fix/preserve-host branch August 23, 2017 04:21
@thibaultcha
Copy link
Member Author

Thanks @hishamhm @p0pr0ck5 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Open API3.0 Spec
3 participants