Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. #99421

Merged
merged 2 commits into from Nov 13, 2022

Conversation

kenballus
Copy link
Contributor

@kenballus kenballus commented Nov 12, 2022

urllib.parse.urlparse does not enforce that a scheme must begin with a character from [A-Za-z]. This patch adds a check to enforce that rule.

…that don't begin with an alphabetical ASCII character.
…that don't begin with an alphabetical ASCII character.
@gpshead gpshead added the needs backport to 3.11 only security fixes label Nov 13, 2022
@gpshead gpshead merged commit 439b9cf into python:main Nov 13, 2022
@miss-islington
Copy link
Contributor

Thanks @kenballus for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@bedevere-bot
Copy link

GH-99446 is a backport of this pull request to the 3.11 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 13, 2022
… begin with an alphabetical ASCII character. (pythonGH-99421)

Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.

RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`

The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
(cherry picked from commit 439b9cf)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Nov 13, 2022
miss-islington added a commit that referenced this pull request Nov 13, 2022
… with an alphabetical ASCII character. (GH-99421)

Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.

RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`

The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
(cherry picked from commit 439b9cf)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
@vstinner
Copy link
Member

vstinner commented Apr 5, 2023

CVE-2023-24329 was assigned to this issue.

@vstinner vstinner changed the title gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. [CVE-2023-24329] gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. Apr 5, 2023
@gpshead
Copy link
Member

gpshead commented Apr 27, 2023

CVE-2023-24329 was assigned to this issue.

That this PR does not fix that CVE. The CVE is inaccurate. See #102153.

@gpshead gpshead changed the title [CVE-2023-24329] gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants