Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL.build doesn't url encode credentials #156

Closed
hynek opened this issue Jan 12, 2018 · 15 comments
Closed

URL.build doesn't url encode credentials #156

hynek opened this issue Jan 12, 2018 · 15 comments

Comments

@hynek
Copy link
Contributor

hynek commented Jan 12, 2018

EDIT the bug described here is just a consequence of the bug described below.

Currently square brackets in the username or the password lead to the URL being interpreted as an IPv6 address:

>>> yarl.URL("https://x:[@localhost")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hynek/.virtualenvs/ansible-tools/lib/python3.6/site-packages/yarl/__init__.py", line 149, in __init__
    val = urlsplit(val)
  File "/Users/hynek/.pyenv/versions/3.6.3/lib/python3.6/urllib/parse.py", line 439, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
Invalid IPv6 URL

>>> yarl.URL("https://[:x@localhost")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hynek/.virtualenvs/ansible-tools/lib/python3.6/site-packages/yarl/__init__.py", line 149, in __init__
    val = urlsplit(val)
  File "/Users/hynek/.pyenv/versions/3.6.3/lib/python3.6/urllib/parse.py", line 439, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
Invalid IPv6 URL

Sadly, the error seems to come from the stdlib tho:

>>> p.urlsplit("https://x:[@localhost")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hynek/.pyenv/versions/3.6.3/lib/python3.6/urllib/parse.py", line 439, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
Invalid IPv6 URL
@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

(FTR: hyperlink only forbids /?#@)

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

OK so I believe the bug is rather that yarl doesn’t url encode credentials:

>>> str(yarl.URL.build(
...     scheme="http", host="localhost",
...     user="u?er",   # won't be encoded
...     password="pass[word",  # won't be encoded
...     path="[?"   # gets encoded
... ))
'http://u?er:pass[word@localhost/%5B%3F'

@hynek hynek changed the title Authentication cannot contain square brackets URL.build doesn't url encode credentials Jan 12, 2018
@webknjaz
Copy link
Member

Some clue from RFC:

https://tools.ietf.org/html/rfc3986#section-3.2.1

      userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

https://tools.ietf.org/html/rfc3986#section-2.3

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

https://tools.ietf.org/html/rfc3986#section-2.1

      pct-encoded = "%" HEXDIG HEXDIG

https://tools.ietf.org/html/rfc3986#section-2.2

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

@socketpair
Copy link

So. [ must be percent-encoded. i.e. https://x:%5B@localhost

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

Yeah current workaround is simply password=urllib.parse.quote("pass[word") and yarl seems to parse it correctly.

@webknjaz
Copy link
Member

@socketpair yeah, as per https://twitter.com/webKnjaZ/status/951789259631087616 yarl.URL() works as expected, but yarl.URL.build() is broken.

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

(just to be clear: I ran into my original bug because I tried to parse an URL that I've created using URL.build() :))

@webknjaz
Copy link
Member

Still, there's an ambiguity in how to detect whether it needs quoting.
If build() will always quote inputs, one might end up with double-quoted chars. I think it's pretty standard that user (especially with stdlib) should urllib.parse.quote('[') before passing strings into url building functions. So in this case we get %255B instead of %5B in URL.
We could detect denied chars and % in strings, but it's not really correct solution talking into account possibility of mixed strings.

The only way to solve this is to introduce additional method like yarl.URL.build_from_str(), which would do quoting unconditionally and will force users use it consciously and explicitly.

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

The same argument could be made about the path argument which seems to be…smart?

>>> yarl.URL.build(host="localhost", path="%", scheme="http")
URL('http://localhost/%25')

>>> yarl.URL.build(host="localhost", path="%5B", scheme="http")
URL('http://localhost/%5B')

@webknjaz
Copy link
Member

@hynek and this is also a bug. how would it know whether I want it double encoded?

@webknjaz
Copy link
Member

Magic is harmful: explicit is better, than implicit.

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

I’m not arguing for either; I’m just pointing out current behaviour.

@webknjaz
Copy link
Member

Just sharing my opinion explicitly :)

@asvetlov
Copy link
Member

asvetlov commented Jan 12, 2018

URL.build was fixed on master yesterday:

In [1]: from yarl import URL

In [2]: u = URL.build(scheme="http", host="localhost",  user="u?er", password="pass[word", path="[?")

In [3]: str(u)
Out[3]: 'http://u%3Fer:pass%5Bword@localhost/%5B%3F'

URL.build does encoding the input.
We can add an encoded=False option to skip this step (URL ctor already have it).

@hynek
Copy link
Contributor Author

hynek commented Jan 12, 2018

Yay :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants