Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query parameters didn't get fully encoded #245

Open
rominf opened this issue Oct 2, 2018 · 6 comments
Open

Query parameters didn't get fully encoded #245

rominf opened this issue Oct 2, 2018 · 6 comments

Comments

@rominf
Copy link
Contributor

rominf commented Oct 2, 2018

When I build an URL with yarl.URL.build I get the URL with partly encoded query:

from yarl import URL
scheme = 'http'
host = 'profile-srv.kama.gs'
path = '/oauth/authorize'
query = {'response_type': 'code', 'client_id': 'bEjgr4rUka8Oswy', 'redirect_uri': 'http://172.25.210.184/authorized?state=e8be712d-3858-4549-86b2-60cea46c7396'}
URL.build(scheme=scheme, host=host, path=path, query=query)
# URL('http://profile-srv.kama.gs/oauth/authorize?response_type=code&client_id=bEjgr4rUka8Oswy&redirect_uri=http://172.25.210.184/authorized?state%3De8be712d-3858-4549-86b2-60cea46c7396')

:, /, ? are not encoded, as you see.

@asvetlov

This comment was marked as off-topic.

@asvetlov
Copy link
Member

asvetlov commented Oct 2, 2018

GitMate.io thinks possibly related issues are #210 (Comma not being encoded in GET request params), #158 (Support encoded parameter in URL.build), #102 (Document encoded parameter), #45 (params do not encode right), and #91 (Encode semicolon in query string).

@bsolomon1124
Copy link

bsolomon1124 commented Apr 11, 2019

@rominf I found your issue here after filing a very similar one at #301.

My hunch about this behavior is that it is actually yarl.URL that is more closely following RFC 3986. If you look at the ABNF for RFC 3986:

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
   query         = *( pchar / "/" / "?" )

This means that / and ? are okay to be put in unencoded (literal) form in the URL query string. The reason for that, I would assume, is that the ? is unambiguous: you scan the URL for the first ?, and everything after that is the query string; once you hit any other occurrences, you already know those are literal parts of the query string parameters. Similarly, literal / characters should be okay in the query string because, since the parser has already seen the first ?, it should know it's no longer in the path component of the URL.

My own confusion from this arose from the fact that urllib.parse.urlencode() (which is used by requests, among other libraries) does not treat ? as safe; it encodes it:

>>> import urllib.parse
>>> query = {'response_type': 'code', 'client_id': 'bEjgr4rUka8Oswy', 'redirect_uri': 'http://172.25.210.184/authorized?state=e8be712d-3858-4549-86b2-60cea46c7396'}
>>> urllib.parse.urlencode(query)
'response_type=code&client_id=bEjgr4rUka8Oswy&redirect_uri=http%3A%2F%2F172.25.210.184%2Fauthorized%3Fstate%3De8be712d-3858-4549-86b2-60cea46c7396'

What it all comes down to is that urllib.parse.urlencode eventually calls quote() from that same module. And here is the description of quote():

Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. The quote function offers a cautious (not minimal) way to quote a string for most of these parts.

@bsolomon1124
Copy link

bsolomon1124 commented Apr 11, 2019

For a different discussion see: psf/requests#794

@dmoklaf
Copy link

dmoklaf commented Dec 7, 2020

Not escaping the ? in query arguments is causing many web servers to reject the connections (I crawl >100.000 pages on more than >10.000 web servers through aiohttp which uses yarl intensively).

The standard does not force "?" from being encoded but does not forbid it either. Consequently, there are 2 alternatives which are standard-compliant and not one. Of these 2, the only alternative that seems webservers-compatible is to always encode it.

@dralley
Copy link

dralley commented Jul 8, 2022

Another concrete case of this documented here: aio-libs/aiohttp#5319 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants