bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler #18284

Merged
merged 2 commits into from Apr 2, 2020

vstinner
Member

@vstinner vstinner commented Jan 30, 2020

The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking.

Vulnerability reported by Matt Schwager.

https://bugs.python.org/issue39503

@vstinner
Member Author

cc @serhiy-storchaka

@mschwager

This fix looks good to me!

@@ -937,7 +937,7 @@ class AbstractBasicAuthHandler:
     # allow for double- and single-quoted realm values
     # (single quotes are a violation of the RFC, but appear in the wild)
-    rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
+    rx = re.compile('(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+'
Member

(?:.*,)* is equivalent to (?:.*,)?.

But since this regular expression is only used with search(), (?:.*,)*[ \t]* can be removed entirely.

I'll analyze whether it is correct or there is an error in the regular expression.
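A quick way to check the equivalence claim is to compare the two prefixes with re.fullmatch() on every short string over a small alphabet. This is a minimal sketch: the alphabet and the length bound are arbitrary choices, and an exhaustive check of short strings is evidence, not a proof.

```python
import itertools
import re

# The two prefixes compared in the review comment above.
STAR = re.compile(r'(?:.*,)*')
OPT = re.compile(r'(?:.*,)?')


def same_language(max_len: int = 6, alphabet: str = 'a, ') -> bool:
    """Return True if STAR and OPT fully match exactly the same strings.

    Every string over `alphabet` with at most `max_len` characters is
    tried, so this is an exhaustive check of short inputs, not a proof.
    """
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = ''.join(chars)
            if bool(STAR.fullmatch(text)) != bool(OPT.fullmatch(text)):
                return False
    return True


print(same_language())  # True
```

Both prefixes accept the empty string and any newline-free string ending in a comma. The * form can reach the same split in many more ways, and that redundancy is what the engine explores when a later part of the pattern fails.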

Member

Well, I cannot say that I completely understand the code, but to make it sensible we can either

  1. Replace rx.search() with rx.match() and replace (?:.*,)* with (?:.*,)?.

or

  2. Keep rx.search() and replace (?:.*,)* with (?:^|,).

Do not keep (?:[^,]*,)*. It is a waste of resources.

Member

Hmm, options 1 and 2 are not equivalent if the field value contains more than one challenge. Option 2 is closer to the current behavior. But correctly supporting more than one challenge requires rewriting the code.

https://tools.ietf.org/html/rfc7235#section-4.1

The current patch seems to give O(n^3) evaluation time: much better than O(2^n), but still very slow. With 2000 commas it takes about a minute to evaluate, and with 65000 it takes much, much longer. Testing with the linked code gave the following (commas, seconds) values:
[(100, 0.124), (250, 0.261), (500, 0.923), (750, 2.85), (1000, 6.433), (1250, 12.608), (1500, 21.576), (2000, 50.751)]

Member

I was wrong, the first option is equivalent to the current behavior (returns the last realm).
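The two options can be compared on a header carrying two challenges, 'basic realm="1", x, other realm="2"'. This sketch assumes the realm=... tail of the pattern (the diff excerpt cuts it off); CURRENT, OPTION1 and OPTION2 are illustrative names, not names from the patch.

```python
import re

# Assumed tail of the pattern: realm=xxx, realm='xxx' or realm="xxx".
TAIL = r'realm=(["\']?)([^"\']*)\2'

CURRENT = re.compile(r'(?:.*,)*[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I)  # used with search()
OPTION1 = re.compile(r'(?:.*,)?[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I)  # used with match()
OPTION2 = re.compile(r'(?:^|,)[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I)   # used with search()

header = 'basic realm="1", x, other realm="2"'

print(CURRENT.search(header).groups())  # ('other', '"', '2'): last realm
print(OPTION1.match(header).groups())   # ('other', '"', '2'): last realm
print(OPTION2.search(header).groups())  # ('basic', '"', '1'): first realm
```

On this input, option 1 keeps the current "last realm wins" result, while option 2 returns the first challenge, which is closer to the "first Basic challenge" rule the merged commit describes.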

@serhiy-storchaka
Member

Would be nice to add a test.

@mschwager

Just a heads up, CVE-2020-8492 has been created. I'm not sure how Python CVEs are generally tracked, but it may be useful to include the information on the bug tracker issue 👍

@vstinner
Member Author

vstinner commented Feb 3, 2020

Not only is (?:.*,)* inefficient, it is also useless: it can be removed. Simplified example:

# reference
>>> all(re.search("(?:a,)*b", text) for text in ("a,b", "a,a,b", "b"))
True
# only match last ","
>>> all(re.search("(?:,)?b", text) for text in ("a,b", "a,a,b", "b"))
True
# don't match the prefix
>>> all(re.search("b", text) for text in ("a,b", "a,a,b", "b"))
True

We can either simplify the regex to prevent the "catastrophic backtracking" or even remove the prefix.

UPDATE: Oops, my example was wrong, I fixed it :-)

@bcaller
Contributor

bcaller commented Feb 4, 2020

Does this also fix https://bugs.python.org/issue38826 ?

@encukou
Member

encukou commented Mar 24, 2020

> Not only (?:.*,)* is inefficient, but it's also useless. It can be removed.

It seems to me that the (?:.*,)* is there so that the last realm is selected, as mentioned in the comment above the regex. See:

>>> header = 'basic realm="1", x, other realm="2"'
>>> re.search("(?:.*,)*[ \t]*([^ \t]+)[ \t]+", header).group(1)
'other'
>>> re.search("[ \t]*([^ \t]+)[ \t]+", header).group(1)
'basic'
>>> 

I don't see a way to fix this by just changing the regex while preserving the previous behavior. Then again, corner cases of the previous behavior might be wrong.

@vstinner vstinner changed the title bpo-39503: Fix urllib basic auth regex bpo-39503: CVE-2020-8492: Fix urllib basic auth regex Mar 25, 2020
@vstinner
Member Author

I rebased my PR and added more tests.

@vstinner
Member Author

@serhiy-storchaka: I don't understand: do you consider the fix wrong, or not enough (it remains possible to create a denial of service)?

@serhiy-storchaka
Member

@vstinner your fix helps, but we can do better. It has cubic complexity; my suggestion has quadratic complexity. It is possible to implement an algorithm with linear complexity, but not with such small changes.
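The complexity claims can be probed with a small timing harness. This is a sketch: the hostile input (a run of commas with no realm=) is a hypothetical choice, not necessarily the input behind the measurements earlier in the thread, and the realm=... tail of each pattern is assumed. Keep n small for the original pattern, whose running time grows exponentially.

```python
import re
import time

# Assumed tail: realm=xxx, realm='xxx' or realm="xxx".
TAIL = r'realm=(["\']?)([^"\']*)\2'

PATTERNS = {
    'original (?:.*,)*': re.compile(r'(?:.*,)*[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I),
    'patched (?:[^,]*,)*': re.compile(r'(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I),
    'suggested (?:^|,)': re.compile(r'(?:^|,)[ \t]*([^ \t]+)[ \t]+' + TAIL, re.I),
}


def time_search(rx, n):
    """Time rx.search() on a hypothetical hostile value made of n commas.

    The value never contains "realm=", so the search must fail, and the
    engine tries every backtracking path before giving up.
    Returns (match_or_None, elapsed_seconds).
    """
    text = ',' * n
    start = time.perf_counter()
    mo = rx.search(text)
    return mo, time.perf_counter() - start


if __name__ == '__main__':
    # Keep n small: the original pattern's cost grows exponentially with n.
    for n in (8, 10, 12, 14):
        for name, rx in PATTERNS.items():
            _, elapsed = time_search(rx, n)
            print(f'{n:3d} commas  {name:20s} {elapsed:.5f} s')
```

Absolute timings depend on the machine and the Python version; only how each column grows as n increases is meaningful.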

@davidfraser

I also added some tests and implemented a regex with lower complexity - see master...davidfraser:urllib_basic_auth_regex

@davidfraser

It's worth looking at how the results of this regex are actually used.

Note my comment in f79379c:

Note that the original regex was roughly O(2**n)
The search for commas and spaces is unnecessary
(and insufficient to ensure that this starts a new scheme).
Replace with a simpler search for an initial scheme, since
we already check that the text starts with 'basic'.

@vstinner
Member Author

WWW-Authenticate is badly specified. The RFC doesn't specify if a single HTTP header can contain multiple challenges.

I found these resources:

A variant is to have multiple WWW-Authenticate: one challenge per WWW-Authenticate header.

By the way, AbstractBasicAuthHandler code contains this interesting comment:

        # XXX could be multiple headers
        authreq = headers.get(authreq, None)
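The XXX comment points at a real limitation: headers.get() returns a single value even when the server sends several WWW-Authenticate headers. A minimal sketch with email.message.Message (http.client.HTTPMessage, the header type of urllib responses, is a subclass of it); the header values are made up for illustration.

```python
from email.message import Message

headers = Message()
# Assigning the same name twice appends a second header; it does not replace.
headers['WWW-Authenticate'] = 'Digest realm="other realm", nonce="abc"'
headers['WWW-Authenticate'] = 'Basic realm="ACME Widget Store"'

# get() returns only one value (the first one, in CPython), so the Basic
# challenge in the second header is never seen.
print(headers.get('WWW-Authenticate'))
# get_all() returns every value, in order.
print(headers.get_all('WWW-Authenticate'))
```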

Current behavior:

  • Even if there are multiple WWW-Authenticate headers, only the first header is parsed. That's a bug: the Basic challenge may be in a following WWW-Authenticate header. Moreover, there may be two Basic challenges with two different realms.

  • scheme = str.split()[0] parses the scheme; if scheme.lower() != "basic", a ValueError is raised.

  • Use the regex to parse the realm.

  • If the header contains multiple realm=xxx, the last realm is used, even if it belongs to another challenge using a different scheme. IMO it's a bug: we should not check the scheme at the beginning of the header and then use the realm found at the end of the string.

For example, the header WWW-Authenticate: Basic realm="ACME Widget Store", Digest realm="other realm" is accepted since it starts with Basic, but the extracted realm is "other realm": the wrong realm is used.
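This example can be reproduced with the pre-fix pattern. It is a sketch: the prefix comes from the diff at the top of this thread, while the realm=... tail is an assumption, because the diff excerpt cuts it off.

```python
import re

# Pre-fix pattern; the realm=... tail is assumed for this sketch.
OLD_RX = re.compile(r'(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                    r'realm=(["\']?)([^"\']*)\2', re.I)

header = 'Basic realm="ACME Widget Store", Digest realm="other realm"'

# The scheme check only looks at the first word of the header...
scheme = header.split()[0]
# ...but the greedy (?:.*,)* prefix skips to the last comma, so the
# regex captures the Digest challenge instead.
mo = OLD_RX.search(header)
print(scheme, mo.groups())  # Basic ('Digest', '"', 'other realm')
```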

@vstinner
Member Author

@serhiy-storchaka:

> 2. Keep rx.search() and replace (?:.*,)* with (?:^|,).

Sorry, I misunderstood this proposal. In fact, I proposed something similar, except that I missed the "start of the string" (regex ^) case. I modified my PR to use this approach, and I also added comments to the regex to explain it.

I decided to write a much more complex change that not only fixes the vulnerability but also fixes the parser, since it didn't look possible to fix the regex without changing the behavior. Currently, the code uses the last realm if there are multiple challenges per header. I changed this behavior to use the realm of the first Basic challenge.

I also modified the code to support multiple headers, instead of only parsing the first one.

@vstinner vstinner changed the title bpo-39503: CVE-2020-8492: Fix urllib basic auth regex bpo-39503: CVE-2020-8492: Fix urllib AbstractBasicAuthHandler Mar 25, 2020
@vstinner vstinner changed the title bpo-39503: CVE-2020-8492: Fix urllib AbstractBasicAuthHandler bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler Mar 25, 2020
@vstinner
Member Author

Ok, the PR is now ready for a new round of reviews. I fixed the vulnerability, but I also changed the code to parse all WWW-Authenticate HTTP headers and to accept multiple challenges per header.

The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking. Vulnerability reported by Ben Caller
and Matt Schwager.

AbstractBasicAuthHandler of urllib.request now parses all
WWW-Authenticate HTTP headers and accepts multiple challenges per
header: use the realm of the first Basic challenge.

Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
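The new behavior described in this commit message (scan every WWW-Authenticate header, allow several challenges per header, use the realm of the first Basic challenge) can be sketched with re.finditer() and the (?:^|,) prefix from the review. first_basic_realm() and the realm=... tail are hypothetical; this is not the code that was merged.

```python
import re
from typing import Iterable, Optional

# A challenge starts at the beginning of the header or right after a comma.
# The realm=... tail is an assumption made for this sketch.
CHALLENGE_RX = re.compile(r'(?:^|,)[ \t]*([^ \t]+)[ \t]+'
                          r'realm=(["\']?)([^"\']*)\2', re.I)


def first_basic_realm(headers: Iterable[str]) -> Optional[str]:
    """Return the realm of the first Basic challenge, or None.

    Every header value is scanned, each may carry several challenges,
    and challenges using other schemes (e.g. Digest) are skipped.
    """
    for header in headers:
        for mo in CHALLENGE_RX.finditer(header):
            scheme, _quote, realm = mo.groups()
            if scheme.lower() == 'basic':
                return realm
    return None


print(first_basic_realm([
    'Digest realm="other realm", nonce="abc"',
    'Basic realm="ACME Widget Store"',
]))  # ACME Widget Store
```

Returning at the first Basic match mirrors the "first Basic challenge" rule; later Basic challenges, even in other headers, are ignored.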
@vstinner
Member Author

Oh, Ben Caller reported the issue at 2019-11-17: https://bugs.python.org/issue38826

@vstinner
Member Author

cc @orsenthil

@orsenthil
Member

> I also modified the code to support multiple headers, instead of only parsing the first one.

Thanks, @vstinner - This was a good detection and change.

I am reviewing the patch further.

basic = f'Basic realm="{realm}"'
basic2 = f'Basic realm="{realm2}"'
other_no_realm = 'Otherscheme xxx'
digest = (f'Digest realm="{realm2}", '
Member

What was the motivation of adding a digest test case here?

Member Author

I tried to write a more realistic test than "Otherscheme xxx". The Digest challenge uses "realm", and the test ensures that it's skipped. It uses multiple fields separated by commas, some of them quoted: this tests that the parser handles them properly.

I picked the example from Wikipedia :-) https://fr.wikipedia.org/wiki/Authentification_HTTP#Demande_d'identification

Member

perfect. sounds good to me. :)

Member

@orsenthil orsenthil left a comment

Hi @vstinner

Thanks for the patch and the change. The entire addition is meaningful and looks good to me.

  • The regex change, which addresses only the security issue, keeps the patch simple.

I only had a question about the tests, but it is not a critical one. I thought I was missing something about adding the "digest" test case at line 1469, since the multiple-headers challenge is covered again in the same test case at line 1499.

LGTM.

@bedevere-bot bedevere-bot removed the needs backport to 3.8 only security fixes label Apr 2, 2020
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 2, 2020
(cherry picked from commit 0b297d4)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot

GH-19292 is a backport of this pull request to the 3.7 branch.

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x SLES 3.x has failed when building commit 0b297d4.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/6/builds/675) and take a look at the build logs.
  4. Check if the failure is related to this commit (0b297d4) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/6/builds/675

Failed tests:

  • test_imaplib

Failed subtests:

  • test_logout - test.test_imaplib.RemoteIMAP_STARTTLSTest

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

404 tests OK.

10 slowest tests:

  • test_concurrent_futures: 3 min 8 sec
  • test_multiprocessing_spawn: 2 min 44 sec
  • test_tokenize: 1 min 47 sec
  • test_multiprocessing_forkserver: 1 min 39 sec
  • test_unparse: 1 min 26 sec
  • test_multiprocessing_fork: 1 min 24 sec
  • test_capi: 1 min 21 sec
  • test_asyncio: 1 min 1 sec
  • test_lib2to3: 56.4 sec
  • test_signal: 51.2 sec

1 test failed:
test_imaplib

15 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_ossaudiodev
test_readline test_sqlite test_startfile test_tix test_tk
test_ttk_guionly test_winconsoleio test_winreg test_winsound
test_zipfile64

1 re-run test:
test_imaplib

Total duration: 7 min 14 sec

Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 989, in _command
    self.send(data + CRLF)
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 331, in send
    self.sock.sendall(data)
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/ssl.py", line 1173, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe


Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/test/test_imaplib.py", line 951, in tearDown
    self.server.logout()
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 641, in logout
    typ, dat = self._simple_command('LOGOUT')
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 1213, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/imaplib.py", line 991, in _command
    raise self.abort('socket error: %s' % val)
imaplib.IMAP4.abort: socket error: [Errno 32] Broken pipe

@miss-islington
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 2, 2020
(cherry picked from commit 0b297d4)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot

GH-19296 is a backport of this pull request to the 3.8 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.8 only security fixes label Apr 2, 2020
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 2, 2020
(cherry picked from commit 0b297d4)

Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-bot

GH-19297 is a backport of this pull request to the 3.7 branch.

vstinner pushed a commit that referenced this pull request Apr 2, 2020
…-19296)

Co-authored-by: Victor Stinner <vstinner@python.org>

(cherry picked from commit 0b297d4)
vstinner pushed a commit that referenced this pull request Apr 2, 2020
…-19297)

Co-authored-by: Victor Stinner <vstinner@python.org>

(cherry picked from commit 0b297d4)
ned-deily pushed a commit that referenced this pull request Apr 3, 2020
…-19304)

(cherry picked from commit 0b297d4)
larryhastings pushed a commit that referenced this pull request Jun 20, 2020
…9305)
