Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(router) ensure transparent proxying of URIs #2519

Merged
merged 2 commits into from
Jun 9, 2017

Conversation

thibaultcha
Copy link
Member

@thibaultcha thibaultcha commented May 13, 2017

Summary

Since ngx.req.set_uri will make Nginx's proxying escape the upstream URI, we do not use this API anymore. As it was the case prior to Kong 0.10, we go back to inlining an Nginx variable containing the desired, plain URI, and thus ensure Nginx will not unconditionally percent-encode before proxying.

Full Changelog

  • The router now returns a 4th value: the URI to use for the upstream
    call.
  • The proxy_pass directive now inlines a new $upstream_uri variable
    which contains a raw string with both the Nginx $request_uri and
    $args variables.
  • Update the router value return test
  • New test to ensure we proxy the request's querystring
  • New regression test to ensure we do not double percent-encode request
    URIs.
  • simplify URI stripping logic

Issues resolved

Fix #2512

@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch from 54000c0 to abbb128 Compare May 13, 2017 02:05
@thibaultcha thibaultcha self-assigned this May 30, 2017
@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch 2 times, most recently from 7754e78 to cf9156a Compare June 6, 2017 21:05
@thibaultcha thibaultcha changed the title tests(router) new regression test for double percent-encoding of upstream URLs fix(router) ensure transparent proxying of URIs Jun 6, 2017
@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch 3 times, most recently from a980e6f to 3b6c05f Compare June 6, 2017 21:37
p0pr0ck5
p0pr0ck5 previously approved these changes Jun 7, 2017

if uri_args then
uri = uri .. "?" .. uri_args
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than the string concatentation here, how do we feel about defining the proxy_pass directive as

proxy_pass $upstream_scheme://kong_upstream$upstream_url$is_args$args;

i dont have strong feels one way or another, just wondering about idiomatic approach and performance (ngx.var access)

@@ -659,16 +660,12 @@ function _M.new(apis)
return nil
end

local new_uri
local uri_root = uri == "/"
local uri_root = request_uri == "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of this logic feels ever-so-slightly hand wavey. some comments here about what's going on would be useful, particularly given the subtlety and complexity of the bug resolved here.

Copy link
Member

@bungle bungle Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these seem just to be variable renames, and that's it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i spose this comment was in reference to the desire to have some more commentary on this module in general. certainly not a merge blocker though :)

@p0pr0ck5 p0pr0ck5 dismissed their stale review June 7, 2017 01:16

whups, wrong button, didnt mean to approve. LGTM overall though

@bungle
Copy link
Member

bungle commented Jun 7, 2017

This looks good to me (except that one $is_args change that I would like to see). I know this idea pretty well as I proposed it as a second approach to fix this. It was originally shot down because of worries if the plugins modify the uri arguments (and those get screwed, or overwritten, or encodings get bad). It looks like it is taken care in setting the final variable before proxying AND after the plugins have done their access handlers.

@bungle
Copy link
Member

bungle commented Jun 7, 2017

What this will break are the plugins that call ngx.req.set_uri in their access handlers (or does it?). But it looks like they have ngx.ctx.uri now.

@thibaultcha
Copy link
Member Author

thibaultcha commented Jun 7, 2017

Well, this is just a revert back to the implementation we used all along 0.1 to 0.9, and that was only changed in 0.10, nothing groundbreaking. It just handles the URI arguments slightly better, which was one of the original concerns, but there was another big one. The reason I changed it in 0.10 was to avoid the Nginx variable, and benefit from a somewhat nicer encapsulation from ngx.req.set_uri(), which feels nicer to use from a plugin than setting a ctx variable. This is mitigated since as soon as we have the plugins API, we will bring back this encapsulation by creating our own kong.set_uri() API.

I am not sure what the benefit of using multiple Nginx variables could be. Performance because of ngx.var access: we would actually trigger 3 such ngx.var accesses instead of 1. GC seems to be about the same, or slightly better in the current approach even. Nginx variables also have their own cost on the Nginx side, and creating those variables would triple such overhead.

@bungle
Copy link
Member

bungle commented Jun 8, 2017

The $is_args in Nginx is mainly because if there wasn't such you had to use e.g. ?$args and that means that in some cases (empty querystring) you would still append ? even if the original request didn't have it. Which reminds me that do you have a test for that?

  1. when original query had ? at the end without actual querystring, are we still proxying with ?
  2. when original query didn't have ? are we only adding it in case where Kong introduces new querystring parameters

I'm pretty sure 2. is handled, but whatabout that 1.?

Other than that, I agree, we don't need to have $is_args here even if it looks familiar to Nginx users.

@thibaultcha
Copy link
Member Author

@bungle Good point, I will add tests for 1. and 2. The later is handled, the former, I am less sure.

@p0pr0ck5 p0pr0ck5 added pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. and removed pr/status/please review labels Jun 8, 2017
@p0pr0ck5
Copy link
Contributor

p0pr0ck5 commented Jun 8, 2017

(relabelling given the notes above ^)

@thibaultcha
Copy link
Member Author

Yep, thanks! Actually implementing those tests.

@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch 2 times, most recently from 3dcb7a6 to d85843c Compare June 8, 2017 22:01
@thibaultcha
Copy link
Member Author

Pushed a new version with a test for 1. Tests for 2. already exist in the form of the request-transformer plugin.

@thibaultcha thibaultcha added pr/status/please review and removed pr/changes requested Changes were requested to this PR by a maintainer. Please address them and ping back once done. labels Jun 8, 2017
assert.equal("world", json.args.hello)
end)

it("does not proxy '$is_args' if URI does not contain arguments", function()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my only comment here is that stripping the ? char from the query string when there are no args does not align with our desire to be as transparent as possible in proxying the request. we could build our own version of is_args that examines the state of args and the downstream query, and act appropriate. or we could not because i think we all just want this to go away ;)

Copy link
Member Author

@thibaultcha thibaultcha Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I did think about it too, and wanted to explain why not, but I forgot.

In my thinking, as much as we want transparent proxying to happen (and you know how much I pushed for this), I just cannot, for the life of me, think of a use case for why someone would want to send a request with an empty querystring /foo?. Hence, I did not take care of this and simply judged the Nginx behavior to be acceptable in this case. If we ever receive such a requirement from a use-case, beer is on me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, i did find one case where someone noted that the presence of ? was semantically significant (granted, not our project or even our lang): psf/requests#2912

In addition, RFC 3986 1.2.3 points out:

The generic syntax uses the slash ("/"), question mark ("?"), and number sign ("#")
characters to delimit components that are significant to the generic parser's hierarchical
interpretation of an identifier. 

And from section 6.2.3:

Normalization should not remove delimiters when their associated
component is empty unless licensed to do so by the scheme
specification... the presence or absence of delimiters within a userinfo
subcomponent is usually significant to its interpretation.

If we ever receive such a requirement from a use-case, beer is on me!

http://beerme.pointlessmicroservic.es

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That 6.2.3 section is quite relevant indeed. Now I kind of want to take care of this use-case actually... Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This originally started because Nginx $uri is a bad normalization semantics-wise. We changes to $request_uri only to find out that decoding and encoding doesn't work 1:1. Thats why we will skip that with this patch, but I do think we can do better here. Following what Nginx does here doesn't feel right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bungle under "Normalizations that change semantics"

Removing the "?" when the query is empty. When the query is empty, there may be no need for the "?". 
Example:
http://www.example.com/display? → http://www.example.com/display

The point here is that we want to avoid changing semantic.

Copy link
Contributor

@p0pr0ck5 p0pr0ck5 Jun 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this patch resolves the issue from what I can tell (hastily whipped together): https://gist.github.com/p0pr0ck5/e742a478a8bfcaeabeefddafd54af947

However, the test in question fails, assumedly because of upstream normalization (both httpbin and mockbin strip ? when no query args are present). tshark proof: https://gist.github.com/p0pr0ck5/d19fb99c88ed5e2b1efdc53707912bb1#file-gistfile1-txt-L375

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I do think we can do better here.

Yep, let's do it then. Updating the patch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@p0pr0ck5 The patch would work, but would also add extra calls to ngx.var, so at this point, I think we should group the querystring in upstream_uri as the original implementation of this patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New patch is up!

@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch 2 times, most recently from d48fb66 to 7192697 Compare June 9, 2017 00:54

local body = assert.res_status(200, res)
local json = cjson.decode(body)
assert.matches("/get%?$", json.vars.request_uri)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test this behavior with our custom template since both httpbin and mockbin strip the querystring delimiter if it is empty.

Copy link
Contributor

@p0pr0ck5 p0pr0ck5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, wouldnt mind seeing a few comments clarified but not a blocker for merge at this point

-- `/foo?` is to keep `$is_args` an empty string, hence effectively
-- stripping the empty querystring.
-- We overcome this behavior with our own logic, to prevent user
-- desired semantics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to prevent user desired semantics" doesnt quite make sense to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant "preserve"

-- `uri` is the URI with which to call upstream, as returned by the
-- router, which might have truncated it (`strip_uri`).
--
-- If set `host_header` is the original header to be preserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alignment of the comments can be a bit confusing; might me sense to have

-- comment upstream_uri
var.upstream_uri = uri

-- comment about host_headr
var.upstream_host = ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like grouping multiple var or ctx assignments together, as is done multiple times in that function

* The router now returns a 4th value: the URI to use for the upstream
  call.
* The `proxy_pass` directive now inlines a new `$upstream_uri` variable
  which contains a raw string with both the Nginx `$request_uri` and
  `$args` variables.
* Update the router value return test
* New test to ensure we proxy the request's querystring
* New regression test to ensure we do not double percent-encode request
  URIs.

Fix #2512
@thibaultcha thibaultcha force-pushed the fix/router-double-encoded-uris branch from 7192697 to 14f6297 Compare June 9, 2017 17:49
@thibaultcha
Copy link
Member Author

Rebuilt the history since the refactor commit accidentally got some diff from the fix one. + clarified comments

@thibaultcha thibaultcha merged commit 8ca38dc into master Jun 9, 2017
@thibaultcha thibaultcha deleted the fix/router-double-encoded-uris branch June 9, 2017 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kong is adding URL encoding before sending upstream
3 participants