Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some false positives from basic auth #123

Merged
merged 2 commits into from
Feb 7, 2019

Conversation

killuazhu
Copy link
Contributor

Exclude single and double quotes in matching. See added test case for details.

@@ -5,7 +5,7 @@
from .base import RegexBasedDetector


SPECIAL_URL_CHARACTERS = ':/?#[]@'
SPECIAL_URL_CHARACTERS = ':/?#[]@"\''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These SPECIAL_URL_CHARACTERS are actually more than a blacklist of characters: they are based off RFC 3986 (Section 2.2 Reserved Characters) and inspired by "The Tangled Web".

Cross-referencing other sources of information, this nice overview, seems to suggest that the semi-colon has been left out from this set of reserved characters, perhaps due to compatibility reasons.

Perhaps a better way is to reclassify this is renaming this RESERVED_CHARACTERS, and adding a new set of SUB_DELIMITER_CHARACTERS, where we can add a variety of other known delimiters (including quote marks)? After all, Basic Auth credentials should not exist in the query string, and limiting this would both help with performance and lowering false positives.

And for bonus points, please add a comment to why these URL characters were chosen, for posterity and better maintenance down the road!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very informative background. Did a quick glance of The Tangled Web, looks like a really good book.

The false positive we found in some code and want to avoid is "https://url:8000";@something else'. We've further expanded the format to replace double quote with the single quote. As you pointed out, if we add the semicolon to the special URL character list, it would also serve the purpose. My original implementation of adding quotes is not ideal.

I did read the compatibility reason link. If I understand correctly, it says for some CGI implementor, instead of writing http://host/?x=1&y=2, you can write http://host/?x=1;y=2. It's mainly for separating the URL parameters. For our case, we are trying to capture the use of base authentication. Is there any legit use case that a semicolon would appear in the username and password field before @ sign? If not, I think we can add the semicolon back to the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that I don't think there's a legitimate use case that a semi-colon would appear in the username/password field, as it's uncertain how it will be parsed as a URL.

What I'm suggesting is something like:

RESERVED_CHARACTERS = ':/?#[]@'
SUB_DELIMITER_CHARACTERS = '!$&\';'  # and anything else we might need

Then, we can modify the blacklist to be:

blacklist = [
    re.compile(
        r'://[^{}\s]+:([^{}\s]+)@'.format(
            re.escape(RESERVED_CHARACTERS + SUB_DELIMITER_CHARACTERS),
            re.escape(RESERVED_CHARACTERS + SUB_DELIMITER_CHARACTERS),
    )
]

If this fits your purpose, I think it could both reduce false positives and improve regex performance seeing that there's a tighter set of constraints, allowing the regex engine to fail quicker and require less backtracking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for our case well. I've rebased master and made the change you suggested. @domanchi

XIANJUN ZHU added 2 commits February 7, 2019 17:42
@killuazhu killuazhu force-pushed the contribute-more-basic-auth-cases branch from e3441a7 to 3e179cd Compare February 7, 2019 22:44
@domanchi domanchi merged commit 8a14e43 into Yelp:master Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants