No unicode values when an encoding is given. #18
Comments
Ah, interesting. As a sanity check, I assume you're on Python 2.6 or 2.7? This is definitely the kind of thing that can get lost when supporting 2 and 3 in the same codebase, and it wouldn't have had an obvious test before the conversion. Thanks for the report. I'd imagine this is an important enough incompatibility to warrant a 5.0.1 release, so I'll work on that today.
Yes, I can confirm the issue with Python 2.7.5 and 2.7.6. Thanks for the very fast response to the issue. Brian
Alright, I've taken a few initial cracks at this. I think it's clear that we really didn't do a great job of handling unicode conversion, but I think this is a clear upgrade (at least in terms of test coverage).
This can be closed pending #19.
There was some discussion in #19 that centered on confirming what the docs say happens when you don't pass an encoding parameter. We've confirmed that on 4.7.2, not passing an encoding kept things as bytestrings. Neither of us could quite see how the code path let that happen, and there wasn't an explicit test, but we've added one as part of #19. Example on 4.7.2:
>>> c = configobj.ConfigObj('thing.cfg')
>>> c
ConfigObj({'butts': '1', 'herp': {'derp': 'burp'}})
>>> c = configobj.ConfigObj('thing.cfg', encoding='utf-8')
>>> c
ConfigObj({u'butts': u'1', u'herp': {u'derp': u'burp'}})
fixes #18 - correctly handling unicode conversion where appropriate
Hi, I think I'm hitting this issue again with 5.0.5 on Python 2.7, trying to get a unicode representation of an ini file passed in via StringIO or a file descriptor:
$ /usr/bin/python2 repr.py
$ /usr/bin/python2 -V
Other possible ways of passing
Ack, good catch! I've confirmed this on my box and am re-opening this issue.
The code in configobj that handles BOM and encoding is very confusing and error-prone. Why not refactor it entirely to decode earlier, and rely on the caller to have handled decoding in the list case? By the way, Python has a decoder that handles optional BOMs ("utf-8-sig"). Maybe I am naive, but I don't see much need beyond an early decode(self.encoding) call on the file data when a filename was provided. Backwards compatibility be damned :)
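The early-decode idea above could look roughly like the sketch below. It is not configobj's code: `read_config_lines` is a hypothetical helper name, and swapping `utf-8` for `utf-8-sig` is an assumption about how BOM tolerance might be opted into. The only library behaviour relied on is that Python's `utf-8-sig` codec strips a leading UTF-8 BOM when one is present and otherwise decodes like plain UTF-8.

```python
import codecs


def read_config_lines(raw, encoding="utf-8"):
    """Decode raw file bytes once, up front, and return text lines.

    Hypothetical helper for illustration; not part of configobj.
    """
    # Normalise the codec name so 'utf8', 'UTF-8', etc. compare equal.
    if codecs.lookup(encoding).name == "utf-8":
        # 'utf-8-sig' drops an optional leading BOM and otherwise
        # decodes exactly like 'utf-8'.
        encoding = "utf-8-sig"
    text = raw.decode(encoding)
    return text.splitlines()


with_bom = codecs.BOM_UTF8 + b"title = Planl\xc3\xa6gning\n"
without_bom = b"title = Planl\xc3\xa6gning\n"

lines_with_bom = read_config_lines(with_bom)
lines_without_bom = read_config_lines(without_bom)
```

Both inputs come out as the same list of text lines, so everything downstream of the read would only ever see decoded text.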
As reported in GitHub issue DiffSK#18, when a file-like object is given and the encoding param is passed, configobj fails to decode the data on Python 2. This is a regression from the 4.7 releases. The fix is simply to check first for binary types in the _decode() helper, as six.string_types includes the bytes str type on Python 2.
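A minimal sketch of the ordering problem that commit message describes, written with plain built-ins rather than configobj's real `_decode()` or `six`. `decode_value` is a hypothetical name. On Python 2, `bytes` is an alias for `str`, so a check against the string types matches undecoded data and returns it untouched; testing for the binary type first avoids that.

```python
def decode_value(value, encoding):
    """Return text for byte input; pass everything else through.

    Hypothetical stand-in for a decode helper; not configobj's code.
    """
    # Check the binary type FIRST. On Python 2 `bytes is str`, so an
    # "is this already a string?" test placed before this one would
    # match raw bytes and skip the decode.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"caf\xc3\xa9"
decoded = decode_value(raw, "utf-8")
already_text = decode_value(decoded, "utf-8")
```

Already-decoded text passes through unchanged, so the helper is safe to call on values whose type is not known in advance.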
Version 5.0.0.
If one opens a UTF-8 encoded file (with its filename in the variable fn below) and passes an encoding parameter when creating the ConfigObj, the returned object does not convert the keys and values to unicode. Version 4.7.2 did.
In [6]: o = ConfigObj( infile = fn , encoding = "utf8" )
In [7]: t = o["root"]["title"]
In [8]: t
Out[8]: 'Planl\xc3\xa6gning og data'
In [9]: isinstance( t , str )
Out[9]: True
In [10]: isinstance( t , unicode )
Out[10]: False
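For reference, the value at Out[8] is the raw UTF-8 byte sequence of the title. Decoding it by hand, as in the sketch below, gives the unicode text the report expects back. The snippet uses only built-ins and a `b''` literal, so it should behave the same on Python 2.7 and 3.

```python
raw = b'Planl\xc3\xa6gning og data'   # the bytes shown at Out[8]
text = raw.decode('utf8')             # the unicode value the report expects
print(repr(text))
```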