csvw.dsv fails on Python 2 with encodings that are not 8bit-clean #5
Comments
Limiting the whole library to utf-8 would probably not be a good idea, I guess. Although recoding of files could easily be done elsewhere ...
+1, AFAIR some versions of Excel will produce/want
xflr6 added a commit that referenced this issue on Jan 18, 2018
xflr6 added a commit that referenced this issue on Jan 18, 2018
xflr6 added a commit that referenced this issue on Jan 18, 2018
Test case is here: Lines 93 to 107 in 84b4628
xflr6 added a commit that referenced this issue on Jan 25, 2018
xflr6 added a commit that referenced this issue on Jan 26, 2018
xflr6 added a commit that referenced this issue on Jan 26, 2018
To allow arbitrary encodings with Python 2 `csv`, cells must be encoded into `utf-8` before writing, so that the output of the `csv.writer` can then be recoded into the wanted target encoding. On reading, first recode into `utf-8` and then decode the cells from `csv.reader` (currently, only `csvw.dsv.UnicodeReader` but not `UnicodeWriter` does this re-encoding suggested in the csv docs).

As an optimization, the recoding can be skipped for 8bit-clean encodings, cf. csvkit/agate: https://github.com/wireservice/agate/blob/233afefbc7c0b25084666a2dd2b315b6359a128a/agate/csv_py2.py#L14-L17
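A minimal sketch of the two recoding steps and the skip check, for illustration only. It is not the `csvw.dsv` or agate code. The names `utf8_lines`, `recode_output`, `is_8bit_clean` and the contents of `EIGHT_BIT_CLEAN` are assumptions made up for this example. It leaves out the `csv.reader`/`csv.writer` calls themselves and only shows the byte-level recoding around them:

```python
import codecs
import io

# Hypothetical set of canonical codec names treated as "8bit-clean" here,
# i.e. ASCII-compatible encodings whose bytes the Python 2 csv module can
# split on delimiters directly. This is an assumption, not agate's list.
EIGHT_BIT_CLEAN = {'ascii', 'utf-8', 'iso8859-1', 'cp1252'}


def is_8bit_clean(encoding):
    """Return True if recoding via UTF-8 could be skipped for ``encoding``."""
    # codecs.lookup() normalizes aliases, e.g. 'latin-1' -> 'iso8859-1'.
    return codecs.lookup(encoding).name in EIGHT_BIT_CLEAN


def utf8_lines(stream, encoding):
    """Reading side: yield UTF-8 encoded lines from a byte stream.

    The yielded lines are what would be passed to the Python 2 csv.reader;
    the resulting cells would then be decoded from UTF-8.
    """
    # newline='' keeps the original line endings untranslated.
    # Note that the wrapper takes ownership of ``stream``.
    text = io.TextIOWrapper(stream, encoding=encoding, newline='')
    for line in text:
        yield line.encode('utf-8')


def recode_output(utf8_data, encoding):
    """Writing side: recode UTF-8 csv.writer output into ``encoding``."""
    return utf8_data.decode('utf-8').encode(encoding)
```

For a target such as `utf-16`, a writer would collect the UTF-8 output of `csv.writer` in a buffer, pass it through `recode_output`, and write the result to the file. For an encoding where `is_8bit_clean` returns true, the cells could instead be encoded directly in the target encoding and written without the extra pass.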