Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(proxy): host header including port #2045

Merged
merged 4 commits into from
Feb 7, 2017
Merged

Conversation

Tieske
Copy link
Member

@Tieske Tieske commented Feb 3, 2017

If the host header is updated to include the port, then what is the exact desired behaviour?

So if preserve_host is set on the api definition, we do not touch the header at all? or do we still inject a default port if it was not in the received host header?

If we don't preserve the header, we could simply insert the upstream host and port.

@Tieske Tieske added the pr/wip A work in progress PR opened to receive feedback label Feb 3, 2017
@Tieske Tieske self-assigned this Feb 3, 2017
@subnetmarco
Copy link
Member

So if preserve_host is set on the api definition, we do not touch the header at all?

We shouldn't touch it, and preserve the original Host sent by the client.

@@ -103,7 +103,7 @@ server {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $upstream_host;
proxy_set_header Host $upstream_host:$upstream_port;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the HTTP specs, the port really is part of the Host value if it is not the default port being requested for a service. This port could very well be included in the $upstream_host variable on the Lua land, to avoid dispersing that logic. Besides and to highlight that fact, this new variable is currently undeclared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially since it is not required for the port to be present if it is the default one for the service being requested, and we would have more control over that from the Lua land if we wish to perform such changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@thibaultcha
Copy link
Member

So if preserve_host is set on the api definition, we do not touch the header at all?

We shouldn't touch it, and preserve the original Host sent by the client.

Yes.

@Tieske
Copy link
Member Author

Tieske commented Feb 6, 2017

There seems to be a bit more detail to it;

The router matches on 'host-header', not on hostname. So to the router myhost.com and myhost.com:80 are two distinct entries. But the api schema does not allow a port number in its hosts field.

Copy link
Member

@thibaultcha thibaultcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issues I have with this is how it makes the proxy retrieve the request 's headers on every call instead of doing it only when necessary, and the weirdness of the router's API now that it lost some responsibilities (see comment).

@@ -79,6 +79,8 @@ return {

ctx.KONG_ACCESS_START = get_now()

local original_host_header = ngx.req.get_headers().host
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to avoid retrieving the headers if not necessary. This change will cancel that behavior.

Copy link
Member Author

@Tieske Tieske Feb 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the logic requires the hostname only, so an extra field would have to be returned, the original host-header (incl. any port info). I see what you mean, but don't see a clean solution.

@@ -117,14 +120,13 @@ return {
"' with: "..tostring(err))
end

if balancer_address.hostname and not api.preserve_host then
var.upstream_host = balancer_address.hostname

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line-jump was intended.

end

upstream_host = headers["host"]
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test in the unit tests that currently ensures this behavior is taken care of by exec. It it now failing.

@@ -309,6 +310,7 @@ local function execute(target)

target.ip = ip
target.port = port
target.hostname = target.host
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using host or rename host to hostname? They seem to ultimately be identical even though the comments in the balancer_address structure seems to say otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host is the hostname extracted from the upstream url (and might be an upstream-name)
hostname is the name that matches the final ip (never an upstream-name)
The original host field must be retained in case of a retry, whereas hostname might get a different value for each try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereas hostname might get a different value for each try.

Where is that bit happening then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha

@@ -493,6 +493,9 @@ function _M.new(apis)


local host = headers["Host"] or headers["host"]
if host then
host = host:match("^([^:]+)") -- strip port number if given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better using match(host, pattern) in such a hot code path. Even better is to use ngx_pcre, to allow this bit to be JIT-compiled.

if ngx.var.http_kong_debug then
ngx.header["Kong-Api-Name"] = api_t.api.name
end


return api_t.api, api_t.upstream_scheme, upstream_host, api_t.upstream_port
return api_t.api, api_t.upstream_scheme, api_t.upstream_host, api_t.upstream_port
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time of writing, the goal of this API was to return everything needed in order for Kong to forward upstream. Now with the balancing bit in between and the removal of the preserve_host logic from this module, it seems like this is un-necessarily bloated.

We should either continue encapsulating such logic in this module (might not be possible since we now need to the port from the DNS resolution to build the definitive upstream Host header), or, simply return the Kong API (api_t.api) and api_t itself (which has all the necessary properties, but exposes some underlying router implementation which we don't really want).

We should return a table containing the final api + eventually another table containing scheme, host and port, or include those properties in the 1st table being returned. Those are minor modifications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering this is all internal implementation I'd opt for returning api_t

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least over returning yet another table

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I kept the list and added the host_header, seemed cleanest to me. All header fetching, including checks for preserve_host are back in the router.

@Tieske Tieske added pr/status/needs review and removed pr/wip A work in progress PR opened to receive feedback labels Feb 7, 2017
@Tieske
Copy link
Member Author

Tieske commented Feb 7, 2017

@thibaultcha imo good to go now.

@Tieske Tieske added this to the 0.10 RC milestone Feb 7, 2017
@Tieske
Copy link
Member Author

Tieske commented Feb 7, 2017

fixes #1347

if balancer_address.hostname and not api.preserve_host then
var.upstream_host = balancer_address.hostname
-- if set `host_header` is the original header to be preserved
var.upstream_host = host_header or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more and more find this syntax to be harder to read. In the future we should stick to:

if host_header then
  var.upstream_host = host_header

else
  var.upstream_host = balancer_address.hostname..":"..balancer_address.port
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick not using else is a more defensive programming practice

var.upstream_host = balancer_address.hostname..":"..balancer_address.port
if host_header then
    var.upstream_host = host_header
end

reference: https://github.com/TheLadders/object-calisthenics#rule-2-dont-use-the-else-keyword

assert.equal("preserved.com:80", json.headers.Host)
end)

pending("preserves downstream host+non_default_port if enabled", function()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those pending?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per comment: httpbin does not return the host header, but a parsed version with the port stripped, hence cannot test it...

@thibaultcha thibaultcha merged commit 37e8ed3 into release/0.10.0 Feb 7, 2017
@thibaultcha thibaultcha deleted the fix/proxy-port branch February 7, 2017 19:43
@thibaultcha
Copy link
Member

thibaultcha commented Feb 8, 2017 via email

@thibaultcha
Copy link
Member

thibaultcha commented Feb 8, 2017 via email

@thibaultcha
Copy link
Member

thibaultcha commented Feb 8, 2017 via email

@hutchic
Copy link
Contributor

hutchic commented Feb 8, 2017

yeah I wasn't sure how applicable OOP calisthenics were. I figure lua has polymorphism so majority of them should still work.

but like I mentioned it's a nitpick (or bike shed if you prefer) :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants