CSV_EXPORT.encoding config result in gunicorn exception #5377

Closed
georgexsh opened this issue Jul 11, 2018 · 3 comments
Labels
inactive Inactive for >= 30 days

georgexsh commented Jul 11, 2018

Superset version

Superset 0.26.3
Python 3.5.3 (via docker)

Steps to reproduce

Set this config:

CSV_EXPORT = {
    'encoding': 'utf-8-sig',
}

Click "export csv" on a chart in the dashboard page. The request to /superset/explore_json/?form_data=%7B%22slice_id%22%3A16%7D&csv=true fails with this error message:

[ERROR] Error handling request /superset/explore_json/?form_data=%7B%22slice_id%22%3A18%7D&csv=true
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/workers/sync.py", line 182, in handle_request
    resp.write(item)
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/usr/local/lib/python3.5/dist-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 253-255: ordinal not in range(128)

Root Cause:

Dumping the response headers shows the data that causes the exception:

In [7]: dumped.splitlines()
Out[7]:
['HTTP/1.0 200 OK',
...
 'Set-Cookie: session="\\357\\273\\277.eJyNkc1OxSAQRt-FdVOwf9AmxgcxhgwwSCO9VKC3uRrfXXpdmLixy5l853yTzCeRNmJyZLLgE1ZEzoZMBDU3nJtxsGMnmqYb9NiPQgndCeiUaklFdIpW5vCGl5IX0A-NGkXPdCvQ9NhyzsxgkSnomdEWHyyzRhfOBw0eC_PhyrTCK0o3pxzijUzPxOW8ponStK0YE-Z6vuTaQdi3Rdc6LNRABgUJrzPuNLmw0-OYf7EtlT31pYieiSe4onnfMN7uPXfuSVqfJZOH6pGfsvhZ4xIM-l_LGc5AcipANH_Zl4oc7T8_4uTrGwlkmwA.DideRw.-PntEeJCZLJ5k6Y4toho_K6l9ss"; HttpOnly; Path=%EF%BB%BF/?%EF%BB%BF#%EF%BB%BF',
]

Notice the strange characters in the value of the Set-Cookie header (\357\273\277 at the start of the session value, %EF%BB%BF in Path). They are b'\xef\xbb\xbf' in hex, i.e. the UTF-8 BOM, and they are the culprit of this exception.

Flask/Werkzeug encodes header values with Response.charset. Superset has overridden charset with the value of CSV_EXPORT.encoding (in PR #3484), but a CSV encoding is not always safe for HTTP headers: utf-8-sig, for example, adds extra BOM bytes that gunicorn cannot encode as ASCII.
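
To make the failure mode concrete, here is a minimal standalone sketch (not Superset or gunicorn code). It shows that utf-8-sig prepends the BOM on encoding, and that a header string carrying those bytes cannot be encoded as ASCII, which is the step that fails in the traceback. The helper name to_ascii_or_error is hypothetical, and the latin-1 decoding step is an assumption about how the bytes end up in the header string.

```python
import codecs

# 'utf-8-sig' prepends the UTF-8 BOM (b'\xef\xbb\xbf') when encoding text.
encoded = "path".encode("utf-8-sig")
has_bom = encoded.startswith(codecs.BOM_UTF8)

# Assumption: the BOM bytes reach the WSGI server as the code points
# U+00EF, U+00BB, U+00BF inside a str header value (latin-1 style).
header_value = encoded.decode("latin-1")


def to_ascii_or_error(value):
    """Hypothetical helper mimicking the failing step: value.encode('ascii')."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


# The BOM-carrying value fails; a plain ASCII value passes through.
result = to_ascii_or_error(header_value)
clean = to_ascii_or_error("path")
```

With plain utf-8 as the encoding there is no BOM, so the same header value would stay within ASCII.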

Related to #5372.

IMHO Superset should encode the CSV data itself and simply return an encoded binary response, rather than relying on Response.charset, which has the side effect of corrupting 8-bit HTTP header values.

--- a/superset/viz.py
+++ b/superset/viz.py
@@ -467,7 +467,9 @@ class BaseViz(object):
     def get_csv(self):
         df = self.get_df()
         include_index = not isinstance(df.index, pd.RangeIndex)
+        r = df.to_csv(index=include_index, **config.get('CSV_EXPORT'))
+        return r.encode(config.get('CSV_EXPORT')['encoding'])

     def get_data(self, df):
         return []
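
The idea behind the patch can be sketched outside Superset as follows. This is only an illustration under stated assumptions: it uses the standard csv module instead of pandas, and the function name encode_csv and the headers dict are hypothetical. The point is that the body is encoded explicitly while the headers are built separately and stay ASCII.

```python
import csv
import io


def encode_csv(rows, encoding="utf-8-sig"):
    """Render rows as CSV text, then encode the body explicitly to bytes."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)


# Only the body carries the BOM requested by the CSV encoding.
body = encode_csv([["name", "value"], ["中文", 1]])

# Hypothetical headers: built independently of the CSV encoding,
# so every value can still be encoded as ASCII by the server.
headers = {
    "Content-Type": "text/csv; charset=utf-8",
    "Content-Length": str(len(body)),
}
```

Because the response body is already bytes, the framework has no reason to re-encode it, and the response charset no longer needs to be overridden.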

@georgexsh (Author)

Given #5377 and #5372, I think Superset is still at an early stage in supporting Python 3 and non-English language environments, especially for CSV export. This feature needs a comprehensive refactor.
As a Python 3 and Chinese-language user, I would like to contribute some time to it, but I want to hear thoughts from the maintainers and the community first.


stale bot commented Apr 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 10, 2019
@stale stale bot closed this as completed Apr 17, 2019

Yiutto commented Oct 23, 2019

This is mainly caused by Chinese characters. If you hit these errors when downloading a CSV file, you can modify the Superset source code in "site-packages/superset/views/core.py":
change f"attachment; filename={query.name}.csv" to "attachment; filename=%s.csv" % (quote(query.name))
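
A minimal sketch of what that change does, assuming quote is urllib.parse.quote. The function name content_disposition and the sample file name are hypothetical, and the header uses the standard "attachment; filename=" form. Percent-encoding the name keeps the header value within ASCII even when the query name contains Chinese characters.

```python
from urllib.parse import quote, unquote


def content_disposition(name):
    """Build an ASCII-only Content-Disposition value (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
    return "attachment; filename=%s.csv" % quote(name)


header = content_disposition("报表")

# The header now survives the ASCII encoding step that the server applies.
header_bytes = header.encode("ascii")

# The original name can be recovered from the percent-encoded form.
recovered = unquote(header.split("filename=", 1)[1])
```

Whether a given browser decodes the percent-encoded name when saving the file is a separate question that this sketch does not address.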
