wsgi.py > send_headers: encoding problem. #1353

Closed
rsm-gh opened this issue Sep 21, 2016 · 6 comments
@rsm-gh

rsm-gh commented Sep 21, 2016

While serving a file with Django and nginx, I got an encoding error coming from gunicorn. The problem is that the send_headers method tries to encode header_str as ascii, while it is sometimes necessary to use utf-8 for serving paths with special characters.

    def send_headers(self):
        if self.headers_sent:
            return
        tosend = self.default_headers()
        tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])

        header_str = "%s\r\n" % "".join(tosend)
        util.write(self.sock, util.to_bytestring(header_str, "ascii"))
        self.headers_sent = True

I just changed ascii to utf-8 to fix my problem, but maybe there should be a setting to change this.
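The failure can be reproduced without gunicorn. The following is only a minimal sketch: `encode_headers` is a hypothetical helper that mimics the join-and-encode step of `send_headers` quoted above (it is not gunicorn's API), and the file name is an illustrative assumption.

```python
def encode_headers(headers, encoding="ascii"):
    """Join (name, value) pairs into a header block and encode it.

    Hypothetical helper mirroring the join-and-encode step of send_headers.
    """
    lines = ["%s: %s\r\n" % (k, v) for k, v in headers]
    header_str = "%s\r\n" % "".join(lines)
    return header_str.encode(encoding)


# Illustrative header value containing a non-ASCII character (e-acute).
headers = [("Content-Disposition", 'attachment; filename="r\u00e9sum\u00e9.pdf"')]

try:
    encode_headers(headers, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # This is the error reported in this issue.
    ascii_ok = False

# Encoding as utf-8 succeeds, which is why the one-word change hides the error.
utf8_bytes = encode_headers(headers, "utf-8")
```

Encoding as utf-8 only avoids the exception on the sending side; whether clients and proxies interpret those bytes as intended is a separate question.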

In case you are curious how I was using the Response object, I got it from an HttpResponse from Django:


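import os

from django.http import HttpResponse

# SERVER_PATHS and default_404 are defined elsewhere in the project.
def return_file_nginx(harddrive_path):
    filename = os.path.basename(harddrive_path)

    if os.path.exists(harddrive_path):
        # Path as seen by nginx: everything after the main server path.
        nginx_path = harddrive_path.split(SERVER_PATHS.main, 1)[1]

        response = HttpResponse()
        response['Content-Length'] = os.path.getsize(harddrive_path)
        response['Content-Disposition'] = 'attachment; filename="{}"'.format(filename)
        # nginx serves the file itself when it sees this header.
        response['X-Accel-Redirect'] = nginx_path
        return response

    return default_404()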
@jamadden
Collaborator

This is a bit of a grey area, but one thing is clear: send_headers could be using latin-1, instead of ascii, but never utf-8. Latin-1 encoding is specified by all the relevant standards (and ASCII is a subset of latin-1). There was a change from utf-8 to latin-1 in #1102.

Now, 5f4ebd2 changed latin-1 to ascii. The referenced standards do suggest that headers should be in ASCII. gunicorn is being particularly strict here, and it could be argued that it should go back to latin-1 encoding, but that raises other potential interoperability concerns. (Do your applications and clients work successfully with latin-1?)

This could be a Python 2/Python 3 thing, since the version isn't specified. If this code is running on Python 2, the bug is in the user code that sets the header, because the header should already be a bytes/str object, not unicode.
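To make the difference between the three encodings concrete, here is a small sketch. The sample strings are illustrative assumptions, not taken from the report. It checks which characters each codec accepts and shows what a receiver that decodes header bytes as latin-1 would see if the sender had used utf-8.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "plain": "report.pdf",
    "accented": "r\u00e9sum\u00e9.pdf",  # e-acute exists in latin-1
    "euro": "\u20ac-rates.pdf",          # the euro sign does not exist in latin-1
}

support = {
    name: {enc: can_encode(text, enc) for enc in ("ascii", "latin-1", "utf-8")}
    for name, text in samples.items()
}

# Bytes written as utf-8 but read back as latin-1 turn each
# non-ASCII character into two unrelated characters.
garbled = samples["accented"].encode("utf-8").decode("latin-1")
```

Every ASCII string is also valid latin-1, so moving from ascii to latin-1 only widens what is accepted. Characters outside latin-1 still fail either way.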

@rsm-gh
Author

rsm-gh commented Sep 21, 2016

I'm using python 3.

All this seems kind of problematic, and I guess the proper solution would be for me to use only ascii characters in the file paths. Sadly, I cannot ensure that.

I'll keep using utf-8 until I get an error.

        try:
            header_bytes = util.to_bytestring(header_str, "ascii")
        except UnicodeEncodeError:
            # Fall back to utf-8 only when the headers contain non-ASCII characters.
            header_bytes = util.to_bytestring(header_str, "utf-8")

        util.write(self.sock, header_bytes)

For the moment everything seems to work well, and it fixes a critical bug, since most of the filenames have special characters!

(I may switch to latin-1, but since everything is in utf-8 it may break again.)

@tilgovi
Collaborator

tilgovi commented Dec 21, 2016

I'm going to close this as it's well documented around the web that HTTP headers should be ASCII, unfortunately. @rsm-gh you may wish to change your application to add the information you need to the response body, rather than the headers, if possible.

@thaxy

thaxy commented Nov 29, 2017

Because this is Django code I am linking a good solution to address this issue:
http://source.mihelac.org/2011/02/6/rename-uploaded-files-ascii-character-set-django/

@Friday21

Friday21 commented Jan 11, 2018

If your problem is downloading a file with a Unicode filename, like mine, you can try this:

import urllib.parse
filename = urllib.parse.quote(filename)    # in python2, it is urllib.quote
response['Content-Disposition'] = 'attachment; filename="{}"'.format(filename)

@gdub

gdub commented Feb 5, 2018

Seems the proper way to do this would be using filename* (and optionally still a filename for older browsers). See RFC 6266 examples. See also some test cases to check browser support.
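A rough sketch of that approach in Python 3 follows. The helper name `content_disposition` and the choice of `?` as the replacement character in the ASCII fallback are assumptions made for illustration; neither comes from Django or gunicorn. The resulting header value is pure ASCII, so it passes gunicorn's check.

```python
from urllib.parse import quote


def content_disposition(filename):
    """Build an ASCII-only Content-Disposition value in the RFC 6266 style."""
    # Fallback for older clients: non-ASCII characters become "?",
    # and backslashes and double quotes are escaped for the quoted string.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')

    # filename* carries the real name as percent-encoded UTF-8.
    encoded = quote(filename, safe="")

    return 'attachment; filename="{}"; filename*=UTF-8\'\'{}'.format(
        fallback, encoded
    )


# Example with a euro sign in the name.
header = content_disposition("\u20ac rates.pdf")
```

In the Django view from the original report, the value would be assigned as `response['Content-Disposition'] = content_disposition(filename)`.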
