New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CherryPy will allways decode basic authentication information with ISO-8859-1 #1680

Closed
PJaros opened this Issue Dec 29, 2017 · 17 comments

Comments

Projects
None yet
3 participants
@PJaros

PJaros commented Dec 29, 2017

I have submitted the observed behavior on stackoverflow.

import cherrypy

class SimpleWebpage(object):
    @cherrypy.expose
    def index(self):
        return "<html><head></head><body>Authenticated</body></html>"

def hexcode(s):
    return ' '.join(hex(ord(x))[2:] for x in s)

def test_hexcode(realm, username, password):
    username_hexcode = hexcode(username)
    password_hexcode = hexcode(password)
    print(f"realm: {realm!r}, username: {username!r}-{username_hexcode!r}, "
          f"password: {password!r}-{password_hexcode!r}")
    return False

cherrypy.tree.mount(SimpleWebpage(), '/',
                    {'/': {'tools.auth_basic.checkpassword': test_hexcode,
                           'tools.auth_basic.on': True,
                           'tools.auth_basic.realm': 'MY_REALM',}})

cherrypy.config.update({'tools.sessions.on': True,})
cherrypy.server.socket_host = '0.0.0.0'
cherrypy.engine.autoreload.unsubscribe()
cherrypy.engine.start()
cherrypy.engine.block()

Issuing
curl -u 'curl:€öäü' -i -X GET http://10.25.5.17:8080/
will print
realm: 'MY_REALM', username: 'curl'-'63 75 72 6c', password: 'â\x82¬Ã¶Ã¤Ã¼'-'e2 82 ac c3 b6 c3 a4 c3 bc'
on the python console.

RFC-7617 explains how encoding should be handled concerning basic authentication.

Currently CherryPy doesn't offer a way to indicate what encoding scheme it wants and how it decodes it. Right now browsers will use different charsets:

On Redhat 6

  • curl 7.19.7 (x86_64-redhat-linux-gnu) sends UTF-8

On Windows 10 Version 1709

  • Firefox 57.0.2 (32-Bit) sends (probably) ISO-8859-1
  • curl 7.56.1 (i686-pc-cygwin) sends UTF-8
  • Google-Chrome 63.0.3239.108 (64-Bit) sends UTF-8
  • Internet Explorer 11.786.15063.0 sends (probably) ISO-8859-1
  • Edge 40.15063.674.0 sends (probably) ISO-8859-1

A possible solution would be if CherryPy would send WWW-Authenticate: Basic realm="foo", charset="UTF-8" and decode the authentication string as utf-8 instead of ISO-8851-1. But I'm sure that some other things might need to be considered. And I haven't tested if Firefox will switch to UTF-8 with this header.

@webknjaz webknjaz added the bug label Dec 29, 2017

@webknjaz

This comment has been minimized.

Member

webknjaz commented Dec 29, 2017

@PJaros thanks for the detailed analysis. This needs some investigation, but I'd guess that the solution might need to be submitted to cheroot (just a note for now).

@PJaros

This comment has been minimized.

PJaros commented Jan 3, 2018

I've been trying to create a PR. I've was able to pin-point where in cherrypy to make that happen but can't anticipate where this would fit in cheroot.

I'm halting my work on the PR because:

  • I can't bring Firefox to send UTF-8 no matter what www-authenticate I send (I.E .... , charset="UTF-8" is being ignored)
  • I receive slightly (apparently) bogus from IE 11 and Edge which thus won't decode easily. Only google-chrome seems to work fine right out of the box.

If you wish so I can go into more details about the PR and the findings with the above mentioned problems. But as of right now i'll just pause my efforts right here.

@webknjaz

This comment has been minimized.

Member

webknjaz commented Jan 3, 2018

It would be useful if you could point certain places in code, because I haven't found them yet.

@PJaros

This comment has been minimized.

PJaros commented Jan 3, 2018

Okay. I'll wrap-up my current findings tomorrow.

webknjaz added a commit that referenced this issue Jan 3, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680
@webknjaz

This comment has been minimized.

Member

webknjaz commented Jan 3, 2018

@PJaros I've crafted a silly hotfix in #1683, but it should be improved to support reading the charset param and I need to check digest auth behavior as well. So it's WIP now. Feel free to comment/add suggestions right there.

webknjaz added a commit that referenced this issue Jan 3, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680
@PJaros

This comment has been minimized.

PJaros commented Jan 4, 2018

I've wrapped it up with this Pull-Request here: #1684 which isn't complete either (it doesn't work with Python2.7).

I'll invest some time later to understand your #1683 and try improving my PR. This is my very first attempt dealing with all this so .. please bear with me :)

As a personal note to myself l'll try to move away from Basic Auth: The specs are pretty much broken by design. :P

@webknjaz

This comment has been minimized.

Member

webknjaz commented Jan 4, 2018

@PJaros it looks you're right: Firefox sends garbage in case of unicode creds :(
But at least, fallback in my PR would allow supporting working and semi-working clients.

@PJaros

This comment has been minimized.

PJaros commented Jan 5, 2018

@webknjaz

  1. You've never managed to bring Firefox to deliver UTF-8 creds? -> Please tell me how did that. I've never managed to bring any browser to switch charset.
  2. I've had problems in the past where some decoding methods will have trouble with the ':' in place. But I've run your code and this seems to be fine. I've create gist from that: https://gist.github.com/PJaros/b624e1b600d197c227b132f3c34ba3db
  3. The € (Euro) Sign isn't part of "ISO-8859-1" it was introduced with "ISO-8859-15".
@webknjaz

This comment has been minimized.

Member

webknjaz commented Jan 5, 2018

  1. I did not force Firefox to convert anything, just opened a URL and entered creds. And it did not switch charset from what I can see: I've inspected network traffic using Wireshark.
  2. Regarding : - AFAIR it's encoded the same way in UTF-8 and ASCII

webknjaz added a commit that referenced this issue Jan 13, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680
@jaraco

This comment has been minimized.

Member

jaraco commented Jan 13, 2018

I've only quickly skimmed this issue, but I saw the references to Firefox and password encoding. In my past experience, I've discovered that Firefox does not support non-ascii characters in passwords for basic auth (underlying issue here). We ended up not supporting non-ascii passwords at our organization for this reason.

@PJaros

This comment has been minimized.

PJaros commented Feb 9, 2018

Is there something I can do to help to close this issue?

@webknjaz

This comment has been minimized.

Member

webknjaz commented Feb 9, 2018

@PJaros yea, enforce browsers to comply standards :trollface:

While working on it I've noticed tool duplication (#1688), so currently we're finally deprecating outdated one (#1689).
On the other hand this means that we need this patch in other tool + need to merge functionality, which became diverse + migrate tests to cover that newer tool and cover this unicode case. So its kinda WIP for me. I've got it in some incomplete state on my laptop, if you'd like to help with this I can push latest changes to remote.

@webknjaz

This comment has been minimized.

Member

webknjaz commented Feb 9, 2018

(okay, it's probably all on github already)

@PJaros

This comment has been minimized.

PJaros commented Apr 16, 2018

I'm getting pressure from my supriors to fix the issue... please. I understand that you wish to tiedy up your code first. But come on. I'm going to be forced to switch to a different framework and rewrite my code because of this issue :-(

@webknjaz

This comment has been minimized.

Member

webknjaz commented Apr 21, 2018

webknjaz added a commit that referenced this issue Apr 21, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680

webknjaz added a commit that referenced this issue Apr 22, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680

webknjaz added a commit that referenced this issue Apr 22, 2018

Decode user/pass using UTF-8 in HTTP Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680

webknjaz added a commit that referenced this issue Apr 22, 2018

Decode user/pass using UTF-8 in HTTP Basic Auth
This is a silly fix, which should evolve into one taking into
account RFC7617.

Ref #1680
@webknjaz

This comment has been minimized.

Member

webknjaz commented Apr 22, 2018

@PJaros

This comment has been minimized.

PJaros commented May 1, 2018

I was able to test 14.2.0 7 days ago and everything works. Thanks for release this version. 👍

@webknjaz webknjaz referenced this issue Sep 4, 2018

Merged

Enhanced basic auth decode #3239

1 of 5 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment