urllib.parse space handling CVE-2023-24329 appears unfixed #102153

AdrianBunk · 2023-02-22T21:17:17Z

Everyone (including the submitter of the now public exploit, who submitted the issue half a year ago to security@python.org and the NVD) seems to think that #99421 "accidentally fixed" CVE-2023-24329.

Did the Python Security Response Team verify that this vulnerability that was reported to them and that is now public was fixed by #99421?

The PoC from the submitter still works for me with the Debian package 3.11.2-4, which surprised me and makes me wonder whether the fix had any effect at all on the stripping of leading blanks issue in the CVE.


ned-deily · 2023-02-23T20:22:02Z

@pablogsal

pablogsal · 2023-02-24T12:57:49Z

The backport was merged here #99446 no?

AdrianBunk · 2023-02-24T14:21:36Z

@pablogsal #99446 is a backport of #99421 that does not seem to fix CVE-2023-24329:

$ cat test.py 
import urllib.request
from urllib.parse import urlparse
def safeURLOpener(inputLink):
    block_host = ["instagram.com", "youtube.com", "tiktok.com", "example.com"]
    input_hostname = urlparse(inputLink).hostname
    if input_hostname in block_host:
        print("input hostname is forbidden")
        return
    target = urllib.request.urlopen(inputLink)
    content = target.read()
    print(content)

safeURLOpener("https://example.com")
safeURLOpener(" https://example.com")  # CVE-2023-24329
safeURLOpener("+https://example.com")  # 99421
$ python3.10 test.py 
input hostname is forbidden
b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
input hostname is forbidden
$ python3.11 test.py 
input hostname is forbidden
b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
Traceback (most recent call last):
  File "/tmp/test.py", line 15, in <module>
    safeURLOpener("+https://example.com")  # 99421
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/test.py", line 9, in safeURLOpener
    target = urllib.request.urlopen(inputLink)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 541, in _open
    return self._call_chain(self.handle_open, 'unknown',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 1419, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: +https>
$

pablogsal · 2023-02-24T16:34:24Z

CC: @gpshead

arhadthedev · 2023-02-27T09:21:03Z

Backport CVE-2023-24329 to all in-service releases: urlparse does not correctly handle schemes that begin with ASCII digits, '+', '-', and '.' characters #102293

CharlieZhao95 · 2023-02-28T10:12:30Z

BTW, does this patch (CVE-2023-24329) require a backport to 3.10, 3.9 and older branches?

I noticed that the fix is currently only backported to the 3.11 branch, but the bug actually affects all versions prior to 3.11.

RSAlderman · 2023-02-28T10:19:51Z

@CharlieZhao95 that's what I was asking for in #102293 - backport of the security vulnerability fix for CVE-2023-24329 to all in-service releases (3.7-3.10).

The request for the backports has been closed as a duplicate of this issue by @gpshead

CharlesBryant-G · 2023-02-28T13:33:45Z

Maybe it's worth taking a step back and looking at the problem in a wider context.

In the PoC, the vulnerability arises not because parse() returns the wrong answer, but because it interprets the url differently from urlopen(). If they were both wrong in the same way it would be harmless. Why is there more than one piece of code which parses URLs? The DRY principle should apply.

Closely related, note that urlparse() does not have a vulnerability at all - any vulnerability is in code which relies on it and does so in a way in which it creates a vulnerability. In the PoC, the vulnerability is in the code for safeURLOpener().
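A minimal sketch of one way to avoid the mismatch described above: parse the input once, reject anything without an explicitly allowed scheme and a hostname, and return the re-serialized result instead of the raw input. The names `checked_url`, `BLOCKED_HOSTS`, and `ALLOWED_SCHEMES` are hypothetical, and the blocklist is copied from the PoC. This is an illustration under those assumptions, not a complete defense (it does not handle subdomains or trailing dots, for example).

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical names; the host list is taken from the PoC in this thread.
BLOCKED_HOSTS = {"instagram.com", "youtube.com", "tiktok.com", "example.com"}
ALLOWED_SCHEMES = {"http", "https"}


def checked_url(input_link):
    """Return a re-serialized URL, or raise ValueError if it must not be opened."""
    parts = urlsplit(input_link)
    # Fail closed: an input that did not parse as http(s) is rejected outright,
    # so a leading space can never turn a blocked URL into an "unknown" one.
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported or missing scheme: {parts.scheme!r}")
    host = parts.hostname  # already lower-cased by urllib.parse
    if not host:
        raise ValueError("URL has no hostname")
    if host in BLOCKED_HOSTS:
        raise ValueError("input hostname is forbidden")
    # Return what was validated, not what was received.
    return urlunsplit(parts)
```

The caller would then pass the returned string, rather than the original input, to `urllib.request.urlopen()`, so the check and the fetch see the same text.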

The way in which urlparse() is implemented is fragile and bug prone. As a general principle, parsing code should not look ahead for known delimiters, it should systematically work from the start, advancing over characters tested to be legitimate. So
urlparse('example.com@!$%^&*()_+-={}[]:;"\\|?query#frag')
should stop parsing at the '%' as that is not a legal character when not followed by two hex digits. It may return that "example.com@!$" is the path and there are extra characters after the URL (this style of parsing is often convenient when parsing items which may contain things to be parsed), or report failure due to an invalid URL. Instead, an early stage of processing skips ahead to the '?' and '#', so it claims there is a path, query, and fragment. While it could then validate these pieces and realise that the path is invalid, this can be forgotten and makes it unnecessarily difficult and dangerous to make the parser accept a valid URL followed by other characters (because it would need to reliably undo any parsing of anything past the valid part).
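The forward-only style described above can be sketched as follows. This is an illustration only: `scan_path` is a hypothetical helper, and the set of legal characters is an assumption based on the RFC 3986 `pchar` rule plus "/". It advances over characters it has checked and stops at the first one that is not legal, returning the accepted prefix and the unparsed remainder.

```python
import string

# Assumed character classes, following RFC 3986 (unreserved, sub-delims, pchar).
_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_SUB_DELIMS = set("!$&'()*+,;=")
_PATH_CHARS = _UNRESERVED | _SUB_DELIMS | set(":@/")
_HEX = set(string.hexdigits)


def scan_path(text):
    """Return (path, rest), consuming only characters that are legal in a path."""
    i = 0
    n = len(text)
    while i < n:
        c = text[i]
        if c in _PATH_CHARS:
            i += 1
        elif c == "%" and i + 2 < n and text[i + 1] in _HEX and text[i + 2] in _HEX:
            i += 3  # a well-formed percent-encoded octet
        else:
            break  # first character that cannot be part of the path
    return text[:i], text[i:]


path, rest = scan_path('example.com@!$%^&*()_+-={}[]:;"\\|?query#frag')
print(path)  # example.com@!$
```

Because nothing past the stopping point has been interpreted, the caller can decide whether the remainder is an error or simply trailing text.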

gpshead · 2023-03-01T00:47:26Z

We will backport something that makes sense if we determine this is a security issue; that's why I duped the other issue here. Backporting the existing commit further does not make sense to me until the leading space issue, if present as reported here, is resolved. (I haven't taken the time to look. This is not an emergency.)

xiaoge1001 · 2023-03-06T08:40:27Z

>>> from urllib.parse import urlparse
>>> urlparse(" https://example.com")
ParseResult(scheme='', netloc='', path=' https://example.com', params='', query='', fragment='')

I tested it and the problem doesn't seem to be fixed. When I execute urlparse(" https://example.com"), the output is the same before and after merging #99421.

xiaoge1001 · 2023-03-06T11:57:11Z

CVE-2023-24329 says that supplying a URL that starts with blank characters is bad.

If a URL-scheme is " https", it will jump out of the loop in the following code:

cpython/Lib/urllib/parse.py

Line 465 in 50b0415

if c not in scheme_chars:

After #99421 is merged, it will exit early:

cpython/Lib/urllib/parse.py

Line 463 in 2e279e8

if i > 0 and url[0].isascii() and url[0].isalpha():

The code at line 468 is executed neither before nor after the modification, so the subsequent code execution does not change.

When the input is a URL that starts with blank characters, #99421 doesn't seem to have any effect.
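To make the point concrete, here is a simplified paraphrase of the scheme check being discussed. It is an assumption-laden sketch, not the CPython source: `split_scheme` and `SCHEME_CHARS` are names chosen for illustration. It shows that a leading space prevents a scheme from being recognized whether the rejection happens in the first-character test or inside the loop.

```python
import string

# Assumed equivalent of the scheme_chars set referenced above.
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme(url):
    """Return (scheme, rest); scheme is '' when no valid scheme prefix is found."""
    i = url.find(":")
    # First-character test in the style of the line added by #99421.
    if i > 0 and url[0].isascii() and url[0].isalpha():
        for c in url[:i]:
            if c not in SCHEME_CHARS:
                break  # invalid character: fall through without a scheme
        else:
            return url[:i].lower(), url[i + 1:]
    return "", url


print(split_scheme("https://example.com"))   # ('https', '//example.com')
print(split_scheme(" https://example.com"))  # ('', ' https://example.com')
```

In both the old and the new form, the space-prefixed input is treated as having no scheme, so the rest of the parse sees it as a relative path.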

xiaoge1001 · 2023-03-07T04:03:56Z

@gpshead Hello, can you review this pull #102470 ?

xiaoge1001 · 2023-03-07T06:17:45Z

https://nvd.nist.gov/vuln/detail/CVE-2023-24329

Base Score: [7.5 HIGH]
vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N

This is a high-score vulnerability. Can we fix it as soon as possible? If we don't think it's a vulnerability, can we reject it?

illia-v · 2023-03-07T11:43:12Z

BTW, in JS, URL parsers remove both leading and trailing spaces and C0 control characters.

https://url.spec.whatwg.org/#url-parsing
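A minimal sketch of that WHATWG pre-processing step, assuming "C0 control or space" means the code points U+0000 through U+0020. `strip_c0_and_space` is a hypothetical helper for illustration and does not cover the rest of the WHATWG algorithm (for example, removing tabs and newlines from the middle of the input).

```python
# Code points U+0000..U+001F are the C0 controls; U+0020 is the space.
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x00, 0x21))


def strip_c0_and_space(url):
    """Remove leading and trailing C0 control or space characters."""
    return url.strip(C0_CONTROL_OR_SPACE)


print(repr(strip_c0_and_space(" https://example.com\n")))  # 'https://example.com'
```

Applying a helper like this before both the check and the fetch would make the two see the same string, independent of which Python version does the parsing.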

xiaoge1001 · 2023-03-07T12:00:27Z

For URLs with leading whitespace, Ruby raises InvalidURIError.


gpshead · 2023-03-09T09:25:05Z

Adding some historical context - caution, this is long, but worth understanding:

In August 2022 a discussion about this issue was spawned in private on the Python security response team mailing list after the initial report from Yebo Cao came in. Our intent was to file a public issue about it and continue discussion from there, but that never happened. So let's consider this issue to be that one... thanks for filing it!

The private discussions are longer than I want to paste, so I'll summarize some points from them and paste a few excerpts:

For security fixes we value stability very highly. We are very cautious about changing the behavior of an API in ways that may break existing code using it on already deployed Python versions that receive security patch backports. https://www.hyrumslaw.com/ very much applies.
We agree that this urlparse behavior is unexpected vs modern sensibilities.
The standard library urllib functions are inconsistent in their behavior. For example urlopen and urlparse do not always behave the same / urlopen does not always call urlparse. (uhoh)
urllib and related modules are VERY old crufty code. If you dig, expect that much of it comes from the mid 1990s, before RFCs on the subject were well defined and long before WHATWG was even a thing let alone widely accepted as the real standard borne of practical experience.
From looking over old Python issues it's been observed that: "users do not (and never really) expect RFC behavior here. What they expect is an implementation of the WHATWG url parsing standard (which is to say, they expect urlparse to behave like a browser does): https://url.spec.whatwg.org/"
It is fair to say that this part of the standard library is unowned and undermaintained. Nobody wants to own it and this post should make it apparent as to why: Compatibility constraints mean it remains a legacy behavior mess.

cc: @PaulMcMillan who did a lot of the above and below analysis last year.

Paul came up with a nice looking list of urlparse potential test cases and demonstrated their current behavior as of August 2022 here https://gist.github.com/PaulMcMillan/70618ca857a0519379af704d88a1c9af as part of the analysis. (even if some of those have changed with other fixes since, the ones with the spaces in what should've been the scheme do not appear to have - I haven't checked all of those behaviors across time and versions).

URL Schemes:

RFC 1808 2.1 specifies that schemes are named as per RFC1738 section 2.1 which says:

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed.

This seems to reasonably imply that "scheme" should raise a ValueError if other characters (e.g. spaces) are included.

BUT... The tricky question for the Python stdlib is "do we have a scheme with invalid characters?" vs. our fallback of "Otherwise the input is presumed to be a relative URL and thus to start with a path component." per our long standing documented and implemented urllib.parse API. (pause for audience groans...) But the documented API is talking about the netloc there, not the scheme.

What users think they want is: "A string containing :// will try to parse everything to the left of : as the scheme."

However, this doesn't work. If you look at actual user behavior, they have a tendency to do things like put unencoded urls in the query, or possibly even the path, and it doesn't seem to make sense to break schema-less parsing in that case.

Existing user code depends on that API behavior.

Another past issue demonstrating similar problems with changing the behavior of the urlparse() APIs is the past joy of netloc's containing port numbers. See #661 (and its linked to issues)

Leaving it inconclusive whether there's much we can do about this.

@PaulMcMillan had one suggestion, perhaps a more heuristic approach: If a URL contains a ://, split on that, and if the left-hand side does not have a / in it, assume it was supposed to be a scheme and raise a ValueError if it contains any invalid scheme characters. Essentially adding a "what you have looks enough like a url that we're pretty sure your scheme is invalid" check here:

cpython/Lib/urllib/parse.py

Line 465 in 2e279e8

if c not in scheme_chars:
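The heuristic suggested above could look roughly like this. It is a sketch of the idea only, not a proposed patch: `check_scheme_heuristic` is a hypothetical standalone function, and the allowed character set is taken from the RFC 1738 text quoted earlier (with upper-case letters also accepted, as an assumption).

```python
import string

# Assumed set of valid scheme characters (letters, digits, "+", "-", ".").
_SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def check_scheme_heuristic(url):
    """Raise ValueError if url looks like it has a scheme with invalid characters."""
    left, sep, _ = url.partition("://")
    if not sep or "/" in left:
        # No "://" at all, or the left side looks like a path: leave the
        # input to the regular relative-URL parsing.
        return
    bad = sorted({c for c in left if c not in _SCHEME_CHARS})
    if bad:
        raise ValueError(f"invalid characters in URL scheme: {bad!r}")


check_scheme_heuristic("https://example.com")       # passes silently
check_scheme_heuristic("/redirect?to=http://x.io")  # passes: "/" on the left
```

Note that an input such as `page?next=http://x.io` has no "/" to the left of "://", so this sketch would reject it. That is the kind of backwards-compatibility cost weighed in the surrounding discussion.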

We're not sure if this edge case is worth the backwards compatibility change since it doesn't cover a myriad of other ways urllib.parse.urlparse() differs from browsers.

Further discussion covered questions such as "can't we just rstrip() or lstrip() or strip() always?", noting other use patterns and reasons why change here is complicated in practice:

rstrip vs trailing spaces isn't so easy as those can be meaningful in path names.
current behavior in path-less urls such as "scheme://foo.example " where urlopen() works today but parsing that with urlparse() the trailing space winds up in the netloc which will defeat similar blocklist style string matches.
people are likely to use urlparse(), but then use requests or other libraries instead of urlopen(). requests accepts a prepended space, but raises an error in the prior trailing space in netloc example...
urlopen()'s stripping behavior is possibly unintentional, a side effect of its own very messy internals.
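Since stripping is contentious, an alternative for blocklist-style callers is to reject rather than repair. The sketch below is an assumption-heavy illustration: `is_plain_hostname` and `hostname_or_error` are hypothetical helpers, and the allowed set covers only lower-case ASCII letters, digits, "-" and ".", so IPv6 literals and non-ASCII hostnames are rejected too. A hostname carrying a trailing space, as in the netloc example above, fails the check instead of silently missing the blocklist.

```python
import string
from urllib.parse import urlsplit

# Assumed "plain" hostname alphabet; urllib.parse lower-cases .hostname.
_HOST_CHARS = set(string.ascii_lowercase + string.digits + "-.")


def is_plain_hostname(host):
    """True only for a non-empty hostname made of plain ASCII host characters."""
    return bool(host) and all(c in _HOST_CHARS for c in host)


def hostname_or_error(url):
    """Return the parsed hostname, or raise ValueError if it looks suspicious."""
    host = urlsplit(url).hostname
    if not is_plain_hostname(host):
        raise ValueError(f"suspicious hostname: {host!r}")
    return host


print(is_plain_hostname("foo.example"))   # True
print(is_plain_hostname("foo.example "))  # False
```

A caller would compare the value returned by `hostname_or_error()` against its blocklist, treating the ValueError as a refusal.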

At a minimum we should document what happens with invalid schemes. Perhaps we should be recommending better fully WHATWG compliant PyPI maintained libraries. Ideally we'd have shipped one. But we don't today and changing our existing urlparse API to behave that way will break existing users so it is not a security fix... it'd need to be a thoughtful breaking change API transition with a behavior deprecation period and recommendations for code needing any of the old behaviors. Not a security fix.

(The bulk of the above analysis and a bunch of the words come from Paul and some from Guido. I'm opening them up here for a wider audience, I added or rephrased or editorialized and emphasized a few things along the way.)

xiaoge1001 · 2023-03-09T12:26:17Z

@gpshead Thank you for the reply.

So is there a fix plan at the moment for the main branch? Because this is a high-score vulnerability, I hope it can be fixed as soon as possible.

AdrianBunk · 2023-03-09T14:36:05Z

@gpshead Thanks for the explanations. In my opinion Python does in recent years already break/remove far too much existing functionality, and I am pleasantly surprised by your awareness for maintaining backwards compatibility.

In addition to the technical side you explained, there is also a process problem you should discuss (perhaps not in public) in case you aren't already doing this:

Right now there is a CVE with a high score that links to a description of a vulnerability with a PoC and a merge request with a one-line fix, but the PoC still works with the fix. Something went wrong that resulted in people wrongly thinking that #99421 would fix CVE-2023-24329, and for that it is not even relevant whether the final resolution will be a code fix or a documentation update stating that this is not considered a bug.

> Our intent was to file a public issue about it and continue discussion from there, but that never happened.

Apparently:

someone (or no one?) was supposed to do this, but
this never happened, and
there might be a lack of internal tracking that would detect such overdue tasks?

pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508) `urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595. This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329). --------- (cherry picked from commit 2f630e1) (cherry picked from commit 610cc0a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Illia Volochii <illia.volochii@gmail.com> Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>


The latest release of Python 3.8 and 3.9 have been just released that contain the fix to a security vulnerability backported to those versions: python/cpython#102153 Release notes: * https://www.python.org/downloads/release/python-3817/ * https://www.python.org/downloads/release/python-3917/ The fix improved sanitizing of the URLs and until Python 3.10 and 3.11 get released, we need to add the sanitization ourselves to pass tests on all versions. In order to improve security of airflow users and make the tests work regardless whether the users have latest Python versions released, we add extra sanitisation step to the URL to apply the standard WHATWG specification.



…rlsplit https://docs.python.org/release/3.9.17/whatsnew/changelog.html#changelog > gh-102153: urllib.parse.urlsplit() now strips leading C0 control and space characters following the specification for URLs defined by WHATWG in response to CVE-2023-24329. Patch by Illia Volochii. python/cpython#102153


AdrianBunk added the type-bug An unexpected behavior, bug, or error label Feb 22, 2023

hugovk added the type-security A security issue label Feb 22, 2023

gpshead self-assigned this Feb 24, 2023

gpshead mentioned this issue Feb 27, 2023

Backport CVE-2023-24329 to all in-service releases: urlparse does not correctly handle schemes that begin with ASCII digits, '+', '-', and '.' characters #102293

Closed

gpshead changed the title ~~Is CVE-2023-24329 still unfixed in 3.11.2?~~ urllib.parse CVE-2023-24329 appears unfixed Mar 1, 2023

mcepl mentioned this issue Mar 2, 2023

test_no_scheme2 fails with Python 3.11 aio-libs/yarl#803

Closed

1 task

bedevere-bot mentioned this issue Mar 6, 2023

gh-102153: fix CVE-2023-24329 #102470

Closed

illia-v added a commit to illia-v/cpython that referenced this issue Mar 7, 2023

pythongh-102153: Start stripping C0 control and space chars in `urlsplit`

5e67815

bedevere-bot mentioned this issue Mar 7, 2023

gh-102153: Start stripping C0 control and space chars in urlsplit #102508

Merged

illia-v added a commit to illia-v/cpython that referenced this issue Mar 8, 2023

Merge remote-tracking branch 'python/main' into pythongh-102153

d81766d

gpshead added the stdlib Python modules in the Lib dir label Mar 9, 2023

gpshead changed the title ~~urllib.parse CVE-2023-24329 appears unfixed~~ urllib.parse space handling CVE-2023-24329 appears unfixed Mar 9, 2023

potiuk mentioned this issue Jun 7, 2023

Fix failing get_safe_url tests for latest Python 3.8 and 3.9 apache/airflow#31766

Merged

MeggyCal added a commit to MeggyCal/bleach that referenced this issue Jun 29, 2023

spaces at the beginning are being trimmed now (python/cpython#102153)

749c169

This was referenced Jun 29, 2023

spaces at the beginning are being trimmed now (change with python 3.10.12) mozilla/bleach#706

Closed

bug: using OpenSUSE and Fedora packages which change the Bleach code, parse_shim tests fail with Python 3.10.12 mozilla/bleach#707

Closed
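
Downstream test suites that broke on the behavior change (as in the bleach reports above) may want to detect the behavior rather than compare version numbers. A minimal probe, assuming only that a patched `urlsplit` ignores a leading space, could look like this. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def urlsplit_strips_leading_space() -> bool:
    """Return True if this interpreter's urlsplit ignores a leading space."""
    # With stripping, the scheme is recognized. Without it, the leading
    # space prevents scheme detection and the scheme comes back empty.
    return urlsplit(" https://example.com/").scheme == "https"
```

Tests can then branch on the probe's result, which also covers distribution packages that carry the patch under an older version number.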

jwhitlock mentioned this issue Oct 16, 2023

Update to Python 3.10.13 mozilla/fx-private-relay#4010

Merged


AdrianBunk commented Feb 22, 2023 • edited by bedevere-bot

ned-deily commented Feb 23, 2023

pablogsal commented Feb 24, 2023

AdrianBunk commented Feb 24, 2023

pablogsal commented Feb 24, 2023

arhadthedev commented Feb 27, 2023 • edited by AlexWaygood

CharlieZhao95 commented Feb 28, 2023

RSAlderman commented Feb 28, 2023

CharlesBryant-G commented Feb 28, 2023

gpshead commented Mar 1, 2023

xiaoge1001 commented Mar 6, 2023 • edited

xiaoge1001 commented Mar 6, 2023 • edited

xiaoge1001 commented Mar 7, 2023

xiaoge1001 commented Mar 7, 2023 • edited

illia-v commented Mar 7, 2023

xiaoge1001 commented Mar 7, 2023 • edited

gpshead commented Mar 9, 2023

xiaoge1001 commented Mar 9, 2023 • edited

AdrianBunk commented Mar 9, 2023
