No unicode values when an encoding is given. #18

Closed
brianbirke opened this issue Feb 17, 2014 · 8 comments

@brianbirke

Version 5.0.0.

If one opens a UTF-8 encoded file (with its filename in the variable fn below) and passes an encoding parameter when constructing the ConfigObj, the returned object does not convert the keys and values to unicode. Version 4.7.2 did.

In [6]: o = ConfigObj( infile = fn , encoding = "utf8" )

In [7]: t = o["root"]["title"]

In [8]: t
Out[8]: 'Planl\xc3\xa6gning og data'

In [9]: isinstance( t , str )
Out[9]: True

In [10]: isinstance( t , unicode )
Out[10]: False
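
For reference, the value in Out[8] is the raw UTF-8 byte sequence for the title. A minimal sketch of the decoding step that is not happening, written so it runs on Python 2.7 and 3 alike (the variable names are illustrative and not part of ConfigObj):

```python
# Raw bytes, exactly as shown in Out[8] above.
raw_title = b'Planl\xc3\xa6gning og data'

# Decoding with the encoding that was passed to ConfigObj yields text
# (unicode on Python 2, str on Python 3).
title = raw_title.decode('utf-8')

# The two-byte sequence \xc3\xa6 collapses into a single character,
# so the text is one unit shorter than the byte string.
print(repr(title))
print(len(raw_title), len(title))
```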

@robdennis
Member

Ah, interesting. As a sanity check, I assume you're on Python 2.6 or 2.7?

This is definitely the kind of thing that can get lost when supporting Python 2 and 3 in the same codebase, and it wouldn't have had an obvious test before the conversion.

Thanks for the report. I'd imagine this is an important enough incompatibility to warrant a 5.0.1 release, so I'll work on that today.

@brianbirke
Author

Yes, I can confirm the issue with Python 2.7.5 and 2.7.6.

Thanks for the very fast response to the issue.

Brian

@robdennis robdennis added this to the 5.0.1 milestone Feb 17, 2014
@robdennis robdennis added the bug label Feb 17, 2014
@robdennis robdennis self-assigned this Feb 17, 2014
@robdennis
Member

Alright, I've taken a few initial cracks at this. It's clear that we really didn't do a great job of handling unicode conversion, but I think this is a clear upgrade (at least in terms of test coverage).

@robdennis
Member

This can be closed pending #19.

@robdennis
Member

There was some discussion in #19 about confirming what the docs say happens when you don't pass an encoding parameter. We've confirmed that on 4.7.2, not passing an encoding kept values as bytestrings. Neither of us could see exactly how the code path let that happen, and there wasn't an explicit test, but we've added one as part of #19.

Example on 4.7.2:

>>> c = configobj.ConfigObj('thing.cfg')
>>> c
ConfigObj({'butts': '1', 'herp': {'derp': 'burp'}})
>>> c = configobj.ConfigObj('thing.cfg', encoding='utf-8')
>>> c
ConfigObj({u'butts': u'1', u'herp': {u'derp': u'burp'}})

robdennis added a commit that referenced this issue Feb 20, 2014
fixes #18 - correctly handling unicode conversion where appropriate
@bkabrda

bkabrda commented Feb 27, 2015

Hi, I think I'm hitting this issue again with 5.0.5 on Python 2.7 when trying to get a unicode representation of an ini file passed in as a StringIO or a file descriptor:

import configobj
import StringIO

inp = StringIO.StringIO('[foo]\na=a')
o = configobj.ConfigObj(inp, encoding='utf-8')
print(o.dict())

$ /usr/bin/python2 repr.py
{'foo': {'a': 'a'}}

$ /usr/bin/python2 -V
Python 2.7.8

The other possible ways of passing infile (a filename or a list of strings) work correctly and return a dict with unicode keys and values.

@EliAndrewC
Member

Ack, good catch! I've confirmed this on my box and am re-opening this issue.

@EliAndrewC EliAndrewC reopened this Apr 27, 2015
@lambdafu

The code in configobj that handles BOM and encoding is very confusing and error-prone. Why not refactor it entirely to decode earlier, and rely on the caller to have handled decoding in the list case?

By the way, there is a decoder in Python that handles optional BOMs ("utf-8-sig").

Maybe I am naive, but I don't see much need beyond an early decode(self.encoding) call on file data when a filename was provided. Backwards compatibility be damned :)
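
To make the suggestion concrete, here is a rough sketch of an early decode using the "utf-8-sig" codec, which accepts input with or without a BOM. The helper name decode_early and its signature are hypothetical and are not part of configobj; this only illustrates the idea, not how the library is or should be implemented:

```python
import io


def decode_early(stream, encoding='utf-8-sig'):
    """Hypothetical helper: read a file-like object fully, decode once,
    and return a list of text lines with their line endings kept."""
    data = stream.read()
    if isinstance(data, bytes):
        # 'utf-8-sig' strips a leading BOM if one is present and
        # behaves like plain UTF-8 otherwise.
        data = data.decode(encoding)
    return data.splitlines(True)


# The same ini content, once with a UTF-8 BOM and once without.
with_bom = io.BytesIO(b'\xef\xbb\xbf[foo]\na=a')
without_bom = io.BytesIO(b'[foo]\na=a')

lines_with_bom = decode_early(with_bom)
lines_without_bom = decode_early(without_bom)
print(lines_with_bom == lines_without_bom)
```

Everything after this point would only ever see text, so the BOM and encoding handling would live in one place.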

bz2 added a commit to bz2/configobj that referenced this issue Feb 19, 2017
As reported in github issue DiffSK#18 when a file-like object is
given and the encoding param is passed configobj fails to
decode the data on Python 2. This is a regression from
the 4.7 releases.

Fix is simply to check first for binary types in _decode()
helper as six.string_types includes the bytes str type
on Python 2.
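
The ordering problem described in that commit message can be sketched as follows. This is not configobj's actual _decode(); the function names are made up, and PY2_LIKE_STRING_TYPES is a stand-in that emulates what six.string_types matches on Python 2 so that the sketch also runs on Python 3:

```python
# Assumption: emulate Python 2, where the generic "string" check also
# matches byte strings. On real Python 2 this would be six.string_types.
PY2_LIKE_STRING_TYPES = (str, bytes)


def decode_strings_first(value, encoding):
    """Generic string check first: bytes are treated as already decoded."""
    if isinstance(value, PY2_LIKE_STRING_TYPES):
        return value  # byte strings slip through untouched
    if isinstance(value, bytes):
        return value.decode(encoding)  # never reached for bytes
    return value


def decode_binary_first(value, encoding):
    """Binary check first: bytes are decoded, text passes through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b'a=a'
print(repr(decode_strings_first(raw, 'utf-8')))
print(repr(decode_binary_first(raw, 'utf-8')))
```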