Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CookieMiddleware should decode() using latin-1. #1450

Closed
vaisaghvt opened this issue May 12, 2020 · 8 comments
Closed

CookieMiddleware should decode() using latin-1. #1450

vaisaghvt opened this issue May 12, 2020 · 8 comments

Comments

@vaisaghvt
Copy link

vaisaghvt commented May 12, 2020

We have been periodically getting this error stack trace in our logs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 745: ordinal not in range(128)
  File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
  File "/usr/local/lib/python3.7/site-packages/daphne/server.py", line 201, in create_application
  File "/usr/local/lib/python3.7/site-packages/channels/routing.py", line 54, in __call__
  File "/usr/local/lib/python3.7/site-packages/channels/security/websocket.py", line 37, in __call__
  File "/usr/local/lib/python3.7/site-packages/channels/sessions.py", line 41, in __call__

When investigating, I found out that the issue is at this line:

cookies = parse_cookie(value.decode("ascii"))

when there is a cookie that has a non-ascii value.

Is there a reason that it is assumed that the cookie value will not have non-ascii values? If so, what would be the work around?

If not, is there an issue with changing this?

For example, would it be safe to do this:
cookies = parse_cookie(value.decode("ascii", errors='ignore'))

@hwooson
Copy link

hwooson commented Jun 8, 2020

I have the same issue using the version of 2.3.0.

@carltongibson
Copy link
Member

As far as I can tell cookies are meant to only contain US-ASCII characters...

e.g. https://tools.ietf.org/id/draft-ietf-httpstate-cookie-23.html#sane-set-cookie-syntax

cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                      ; US-ASCII characters excluding CTLs,
                      ; whitespace DQUOTE, comma, semicolon,
                      ; and backslash

Searching doesn't point to anything particularly definitive.

Will close: happy to re-open if more info comes up.

@carltongibson
Copy link
Member

Some historical discussion on Gunicorn here: benoitc/gunicorn#1778

Eventually looks like they eased to latin-1. https://github.com/benoitc/gunicorn/pull/1914/files

Maybe we should do that.

@carltongibson carltongibson reopened this Jul 13, 2020
@carltongibson
Copy link
Member

PEP 333:

Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.

So it looks like latin-1 is correct, even if folks SHOULD use ascii.

@carltongibson carltongibson changed the title CookieMiddleware causes websockets crash when cookie value contains a non-ascii value CookieMiddleware should decode() using latin-1. Jul 13, 2020
@carltongibson
Copy link
Member

Todo:

@Dhruv-Sachdev1313
Copy link

@carltongibson Hey is this issue still pending ?

@carltongibson
Copy link
Member

Hi @Dhruv-Sachdev1313 — looks like it's handled in #1494. If you want to review that it would be great.

@Dhruv-Sachdev1313
Copy link

Okay @carltongibson . Thanks !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants