Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spaces at the beginning are being trimmed now (change with python 3.10.12) #706

Closed
wants to merge 1 commit into from

Conversation

MeggyCal
Copy link

@MeggyCal MeggyCal commented Jun 29, 2023

Frow Python 310 WhatsNew:

gh-102153: urllib.parse.urlsplit() now strips leading C0 control and space characters following the specification for URLs defined by WHATWG in response to CVE-2023-24329. Patch by Illia Volochii.

therefore some tests fail:

[   15s] =================================== FAILURES ===================================
[   15s] _________________ test_urlparse[\t   :foo.com   \n-expected8] __________________
[   15s] 
[   15s] uri = '\t   :foo.com   \n'
[   15s] expected = ParseResult(scheme='', netloc='', path='   :foo.com   ', params='', query='', fragment='')
[   15s] 
[   15s]     @pytest.mark.parametrize(
[   15s]         "uri, expected",
[   15s]         [
[   15s]             ("", ParseResult()),
[   15s]             ("http://example\t.\norg", ParseResult(scheme="http", netloc="example.org")),
[   15s]             (
[   15s]                 "http://user:pass@foo:21/bar;par?b#c",
[   15s]                 ParseResult(
[   15s]                     scheme="http",
[   15s]                     netloc="user:pass@foo:21",
[   15s]                     path="/bar",
[   15s]                     params="par",
[   15s]                     query="b",
[   15s]                     fragment="c",
[   15s]                 ),
[   15s]             ),
[   15s]             ("https://test:@test", ParseResult(scheme="https", netloc="test:@test")),
[   15s]             ("https://:@test", ParseResult(scheme="https", netloc=":@test")),
[   15s]             (
[   15s]                 "non-special://test:@test/x",
[   15s]                 ParseResult(scheme="non-special", netloc="test:@test", path="/x"),
[   15s]             ),
[   15s]             (
[   15s]                 "non-special://:@test/x",
[   15s]                 ParseResult(scheme="non-special", netloc=":@test", path="/x"),
[   15s]             ),
[   15s]             ("http:foo.com", ParseResult(scheme="http", path="foo.com")),
[   15s]             # NOTE(willkg): The wpt tests set the scheme to http becaue that's what
[   15s]             # the base url is. Since our parser is not using a baseurl, it sets the
[   15s]             # scheme to "". Further, our parser includes spaces at the beginning,
[   15s]             # but I don't see that as being problematic.
[   15s]             ("\t   :foo.com   \n", ParseResult(path="   :foo.com   ")),
[   15s]             # NOTE(willkg): The wpt tests set the path to "/foo/foo.com" because
[   15s]             # the base url is at "/foo"
[   15s]             (" foo.com  ", ParseResult(path=" foo.com  ")),
[   15s]             ("a:\t foo.com", ParseResult(scheme="a", path=" foo.com")),
[   15s]             (
[   15s]                 "http://f:21/ b ? d # e ",
[   15s]                 ParseResult(
[   15s]                     scheme="http", netloc="f:21", path="/ b ", query=" d ", fragment=" e "
[   15s]                 ),
[   15s]             ),
[   15s]             (
[   15s]                 "lolscheme:x x#x x",
[   15s]                 ParseResult(scheme="lolscheme", path="x x", fragment="x x"),
[   15s]             ),
[   15s]             ("http://f:/c", ParseResult(scheme="http", netloc="f:", path="/c")),
[   15s]             ("http://f:0/c", ParseResult(scheme="http", netloc="f:0", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests normalize the 0000000000000 to 0 so the
[   15s]             # netloc should be "f:0".
[   15s]             (
[   15s]                 "http://f:00000000000000/c",
[   15s]                 ParseResult(scheme="http", netloc="f:00000000000000", path="/c"),
[   15s]             ),
[   15s]             # NOTE(willkg): The wpt tests drop the 0000000000000000000 altogether
[   15s]             # so the netloc should be "f".
[   15s]             (
[   15s]                 "http://f:00000000000000000000080/c",
[   15s]                 ParseResult(scheme="http", netloc="f:00000000000000000000080", path="/c"),
[   15s]             ),
[   15s]             # This is an invalid ipv6 url
[   15s]             ("http://2001::1]", ValueError),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f:b/c", ParseResult(scheme="http", netloc="f:b", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f: /c", ParseResult(scheme="http", netloc="f: ", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f:999999/c", ParseResult(scheme="http", netloc="f:999999", path="/c")),
[   15s]         ],
[   15s]     )
[   15s]     def test_urlparse(uri, expected):
[   15s]     
[   15s]         if inspect.isclass(expected) and issubclass(expected, BaseException):
[   15s]             with pytest.raises(expected):
[   15s]                 urlparse(uri)
[   15s]     
[   15s]         else:
[   15s]             parsed = urlparse(uri)
[   15s]             print(parsed)
[   15s]             assert parsed.scheme == expected.scheme
[   15s]             assert parsed.netloc == expected.netloc
[   15s] >           assert parsed.path == expected.path
[   15s] E           AssertionError: assert ':foo.com   ' == '   :foo.com   '
[   15s] E             -    :foo.com   
[   15s] E             ? ---
[   15s] E             + :foo.com
[   15s] 
[   15s] tests/test_parse_shim.py:108: AssertionError
[   15s] ----------------------------- Captured stdout call -----------------------------
[   15s] ParseResult(scheme='', netloc='', path=':foo.com   ', params='', query='', fragment='')
[   15s] _____________________ test_urlparse[ foo.com  -expected9] ______________________
[   15s] 
[   15s] uri = ' foo.com  '
[   15s] expected = ParseResult(scheme='', netloc='', path=' foo.com  ', params='', query='', fragment='')
[   15s] 
[   15s]     @pytest.mark.parametrize(
[   15s]         "uri, expected",
[   15s]         [
[   15s]             ("", ParseResult()),
[   15s]             ("http://example\t.\norg", ParseResult(scheme="http", netloc="example.org")),
[   15s]             (
[   15s]                 "http://user:pass@foo:21/bar;par?b#c",
[   15s]                 ParseResult(
[   15s]                     scheme="http",
[   15s]                     netloc="user:pass@foo:21",
[   15s]                     path="/bar",
[   15s]                     params="par",
[   15s]                     query="b",
[   15s]                     fragment="c",
[   15s]                 ),
[   15s]             ),
[   15s]             ("https://test:@test", ParseResult(scheme="https", netloc="test:@test")),
[   15s]             ("https://:@test", ParseResult(scheme="https", netloc=":@test")),
[   15s]             (
[   15s]                 "non-special://test:@test/x",
[   15s]                 ParseResult(scheme="non-special", netloc="test:@test", path="/x"),
[   15s]             ),
[   15s]             (
[   15s]                 "non-special://:@test/x",
[   15s]                 ParseResult(scheme="non-special", netloc=":@test", path="/x"),
[   15s]             ),
[   15s]             ("http:foo.com", ParseResult(scheme="http", path="foo.com")),
[   15s]             # NOTE(willkg): The wpt tests set the scheme to http becaue that's what
[   15s]             # the base url is. Since our parser is not using a baseurl, it sets the
[   15s]             # scheme to "". Further, our parser includes spaces at the beginning,
[   15s]             # but I don't see that as being problematic.
[   15s]             ("\t   :foo.com   \n", ParseResult(path="   :foo.com   ")),
[   15s]             # NOTE(willkg): The wpt tests set the path to "/foo/foo.com" because
[   15s]             # the base url is at "/foo"
[   15s]             (" foo.com  ", ParseResult(path=" foo.com  ")),
[   15s]             ("a:\t foo.com", ParseResult(scheme="a", path=" foo.com")),
[   15s]             (
[   15s]                 "http://f:21/ b ? d # e ",
[   15s]                 ParseResult(
[   15s]                     scheme="http", netloc="f:21", path="/ b ", query=" d ", fragment=" e "
[   15s]                 ),
[   15s]             ),
[   15s]             (
[   15s]                 "lolscheme:x x#x x",
[   15s]                 ParseResult(scheme="lolscheme", path="x x", fragment="x x"),
[   15s]             ),
[   15s]             ("http://f:/c", ParseResult(scheme="http", netloc="f:", path="/c")),
[   15s]             ("http://f:0/c", ParseResult(scheme="http", netloc="f:0", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests normalize the 0000000000000 to 0 so the
[   15s]             # netloc should be "f:0".
[   15s]             (
[   15s]                 "http://f:00000000000000/c",
[   15s]                 ParseResult(scheme="http", netloc="f:00000000000000", path="/c"),
[   15s]             ),
[   15s]             # NOTE(willkg): The wpt tests drop the 0000000000000000000 altogether
[   15s]             # so the netloc should be "f".
[   15s]             (
[   15s]                 "http://f:00000000000000000000080/c",
[   15s]                 ParseResult(scheme="http", netloc="f:00000000000000000000080", path="/c"),
[   15s]             ),
[   15s]             # This is an invalid ipv6 url
[   15s]             ("http://2001::1]", ValueError),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f:b/c", ParseResult(scheme="http", netloc="f:b", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f: /c", ParseResult(scheme="http", netloc="f: ", path="/c")),
[   15s]             # NOTE(willkg): The wpt tests show this as a parse error, but our
[   15s]             # parser "parses" it.
[   15s]             ("http://f:999999/c", ParseResult(scheme="http", netloc="f:999999", path="/c")),
[   15s]         ],
[   15s]     )
[   15s]     def test_urlparse(uri, expected):
[   15s]     
[   15s]         if inspect.isclass(expected) and issubclass(expected, BaseException):
[   15s]             with pytest.raises(expected):
[   15s]                 urlparse(uri)
[   15s]     
[   15s]         else:
[   15s]             parsed = urlparse(uri)
[   15s]             print(parsed)
[   15s]             assert parsed.scheme == expected.scheme
[   15s]             assert parsed.netloc == expected.netloc
[   15s] >           assert parsed.path == expected.path
[   15s] E           AssertionError: assert 'foo.com  ' == ' foo.com  '
[   15s] E             -  foo.com  
[   15s] E             ? -
[   15s] E             + foo.com
[   15s] 
[   15s] tests/test_parse_shim.py:108: AssertionError
[   15s] ----------------------------- Captured stdout call -----------------------------
[   15s] ParseResult(scheme='', netloc='', path='foo.com  ', params='', query='', fragment='')

If I change them like in this PR, they succeed again. But they can be changed in many ways, so please choose the way which looks best to you.

@willkg
Copy link
Member

willkg commented Jun 29, 2023

It'd be more helpful to write this up as an issue. Then we can discuss it there. Thank you!

@MeggyCal
Copy link
Author

OK, I was just trying to provide possible solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants