No unicode values when an encoding is given. #18

brianbirke · 2014-02-17T15:48:57Z

Version 5.0.0.

If one opens a UTF-8 encoded file (with the filename in the variable fn below) and passes an encoding parameter when creating the ConfigObj, the returned object does not convert the keys and values to unicode. Version 4.7.2 did.

In [6]: o = ConfigObj( infile = fn , encoding = "utf8" )

In [7]: t = o["root"]["title"]

In [8]: t
Out[8]: 'Planl\xc3\xa6gning og data'

In [9]: isinstance( t , str )
Out[9]: True

In [10]: isinstance( t , unicode )
Out[10]: False
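The value returned in the session above is the raw UTF-8 byte string; decoding it with the encoding passed to ConfigObj gives the expected text. A minimal Python 3 sketch, where bytes plays the role of Python 2's str and str plays the role of Python 2's unicode:

```python
# The raw value from the report, written as a Python 3 bytes literal
# (Python 2's str type corresponds to Python 3's bytes).
raw = b'Planl\xc3\xa6gning og data'

# Decoding with the encoding that was passed to ConfigObj yields text
# (Python 2's unicode type corresponds to Python 3's str).
text = raw.decode('utf8')

print(text == 'Planl\u00e6gning og data')  # True
print(isinstance(text, str))              # True
```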

robdennis · 2014-02-17T15:56:43Z

Ah, interesting. As a sanity check, I assume you're on Python 2.6 or 2.7?

This is definitely the kind of thing that can get lost when supporting Python 2 and 3 in the same codebase, and it wouldn't have had an obvious test before the conversion.

Thanks for the report. I'd imagine this is an important enough incompatibility to warrant a 5.0.1 release, so I'll work on that today.

brianbirke · 2014-02-17T19:32:59Z

Yes, I can confirm the issue with Python 2.7.5 and 2.7.6.

Thanks for the very fast response to the issue.

Brian

robdennis · 2014-02-18T07:55:32Z

Alright, I've taken a few initial cracks at this. It's clear that we really didn't do a great job of handling unicode conversion, but I think this is a clear upgrade (at least in terms of test coverage).

robdennis · 2014-02-18T07:57:17Z

This can be closed pending #19.

robdennis · 2014-02-20T02:20:29Z

There was some discussion in #19 that centered on confirming what the docs say happens when you don't pass an encoding parameter. We've confirmed that on 4.7.2, not passing an encoding kept things as bytestrings. Neither of us could quite see how the code path made that happen, and there wasn't an explicit test, but we've added one as part of #19.

Example on 4.7.2:

>>> c = configobj.ConfigObj('thing.cfg')
>>> c
ConfigObj({'butts': '1', 'herp': {'derp': 'burp'}})
>>> c = configobj.ConfigObj('thing.cfg', encoding='utf-8')
>>> c
ConfigObj({u'butts': u'1', u'herp': {u'derp': u'burp'}})
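The 4.7.2 behavior shown above amounts to a simple contract: with no encoding, byte strings are left alone; with an encoding, every key and value is decoded, recursively through subsections. A minimal Python 3 sketch of that contract on plain nested dicts, where decode_section is a hypothetical helper (not part of configobj's API) and bytes stands in for Python 2's str:

```python
def decode_section(section, encoding=None):
    """Decode the keys and values of a nested dict (hypothetical helper).

    With encoding=None the section is returned unchanged, so byte strings
    stay byte strings. With an encoding, every bytes key and value,
    including those in nested sections and list values, is decoded.
    """
    if encoding is None:
        return section

    def decode(item):
        # Only byte strings are decoded; anything else passes through.
        return item.decode(encoding) if isinstance(item, bytes) else item

    result = {}
    for key, value in section.items():
        if isinstance(value, dict):
            value = decode_section(value, encoding)
        elif isinstance(value, list):
            value = [decode(v) for v in value]
        else:
            value = decode(value)
        result[decode(key)] = value
    return result


raw = {b'butts': b'1', b'herp': {b'derp': b'burp'}}
print(decode_section(raw))                    # {b'butts': b'1', b'herp': {b'derp': b'burp'}}
print(decode_section(raw, encoding='utf-8'))  # {'butts': '1', 'herp': {'derp': 'burp'}}
```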

fixes #18 - correctly handling unicode conversion where appropriate

bkabrda · 2015-02-27T13:08:51Z

Hi, I think I'm hitting this issue again with 5.0.5 on Python 2.7 when trying to get a unicode representation of an INI file passed as a StringIO or a file object:

import configobj
import StringIO

inp = StringIO.StringIO('[foo]\na=a')
o = configobj.ConfigObj(inp, encoding='utf-8')
print(o.dict())

$ /usr/bin/python2 repr.py
{'foo': {'a': 'a'}}

$ /usr/bin/python2 -V
Python 2.7.8

The other ways of passing infile (a filename or a list of strings) work correctly and return a dict with unicode keys and values.
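The report above boils down to one input path skipping the decode step that the other paths get. A minimal Python 3 sketch, assuming a design where every accepted infile form (filename, list of lines, file-like object) is first normalized to raw lines and then passed through a single decode step. read_raw_lines and decode_lines are hypothetical helpers, not configobj functions, and io.BytesIO stands in for Python 2's StringIO.StringIO holding a byte str:

```python
import io
import os
import tempfile


def read_raw_lines(infile):
    """Hypothetical helper: turn any accepted infile form into raw lines.

    Accepts a filename, a list of lines, or a file-like object, so that a
    single decode step afterwards covers every input type.
    """
    if isinstance(infile, (str, os.PathLike)):
        with open(infile, 'rb') as f:
            return f.read().splitlines()
    if isinstance(infile, (list, tuple)):
        return list(infile)
    if hasattr(infile, 'read'):
        return infile.read().splitlines()
    raise TypeError('infile must be a filename, a list of lines or a file-like object')


def decode_lines(lines, encoding):
    # Decode only byte strings; lines that are already text pass through.
    return [line.decode(encoding) if isinstance(line, bytes) else line
            for line in lines]


data = b'[foo]\na=a'
with tempfile.NamedTemporaryFile(delete=False, suffix='.cfg') as tmp:
    tmp.write(data)

from_file = decode_lines(read_raw_lines(tmp.name), 'utf-8')
from_list = decode_lines(read_raw_lines([b'[foo]', b'a=a']), 'utf-8')
from_stream = decode_lines(read_raw_lines(io.BytesIO(data)), 'utf-8')
os.remove(tmp.name)

print(from_file == from_list == from_stream)  # True
print(from_stream)                            # ['[foo]', 'a=a']
```

Because the file-like object goes through the same decode_lines call as the other two forms, it cannot silently come back as bytes.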

EliAndrewC · 2015-04-27T23:42:35Z

Ack, good catch! I've confirmed this on my box and am re-opening this issue.

lambdafu · 2016-06-27T14:21:58Z

The code in configobj that handles the BOM and encoding is very confusing and error-prone. Why not refactor it entirely to decode earlier, and rely on the caller to have handled decoding in the list case?

By the way, there is a decoder in Python that handles optional BOMs ("utf-8-sig").
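A short Python 3 illustration of the "utf-8-sig" codec: it strips a leading UTF-8 BOM if one is present and otherwise behaves like plain "utf-8", so a single decode call handles both kinds of input:

```python
with_bom = b'\xef\xbb\xbf[foo]\na=a'
without_bom = b'[foo]\na=a'

# Both inputs decode to the same text with 'utf-8-sig'.
print(with_bom.decode('utf-8-sig') == without_bom.decode('utf-8-sig'))  # True

# Plain 'utf-8' keeps the BOM as U+FEFF at the start of the text.
print(with_bom.decode('utf-8').startswith('\ufeff'))  # True
```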

Maybe I am naive, but I don't see much need for anything beyond an early decode(self.encoding) call on the file data when a filename is provided. Backwards compatibility be damned :)

As reported in GitHub issue DiffSK#18, when a file-like object is given and the encoding parameter is passed, configobj fails to decode the data on Python 2. This is a regression from the 4.7 releases. The fix is simply to check for binary types first in the _decode() helper, since on Python 2 six.string_types is (basestring,), which also matches the byte str type.
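The ordering problem described in that fix can be reproduced on Python 3 by simulating Python 2's six.string_types, which also matches byte strings there. PY2_STRING_TYPES, decode_buggy and decode_fixed are hypothetical stand-ins written for illustration, not configobj's actual _decode() code:

```python
# Stand-in for six.string_types on Python 2: (basestring,) matches both
# the byte str type and unicode, modeled here as (bytes, str).
PY2_STRING_TYPES = (bytes, str)


def decode_buggy(infile, encoding):
    if isinstance(infile, PY2_STRING_TYPES):
        # This branch also catches byte strings, so they are split
        # into lines but never decoded.
        return infile.splitlines(True)
    if isinstance(infile, bytes):
        # Unreachable for bytes: the check above already matched.
        return infile.decode(encoding).splitlines(True)
    return infile


def decode_fixed(infile, encoding):
    if isinstance(infile, bytes):
        # Binary data is checked first, so it always gets decoded.
        return infile.decode(encoding).splitlines(True)
    if isinstance(infile, PY2_STRING_TYPES):
        return infile.splitlines(True)
    return infile


data = b'[foo]\na=a'
print(decode_buggy(data, 'utf-8'))  # [b'[foo]\n', b'a=a']
print(decode_fixed(data, 'utf-8'))  # ['[foo]\n', 'a=a']
```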

robdennis added this to the 5.0.1 milestone Feb 17, 2014

robdennis added the bug label Feb 17, 2014

robdennis self-assigned this Feb 17, 2014

robdennis mentioned this issue Feb 20, 2014

fixes #18 - correctly handling unicode conversion where appropriate #19

Merged

robdennis closed this as completed in 23c1d70 Feb 20, 2014

robdennis added a commit that referenced this issue Feb 20, 2014

Merge pull request #19 from robdennis/master

10db58d

fixes #18 - correctly handling unicode conversion where appropriate

robdennis mentioned this issue Apr 26, 2014

encoding issue writing #55

Closed

EliAndrewC reopened this Apr 27, 2015

lambdafu mentioned this issue Jun 27, 2016

Fix decoding for python 2 file data. #113

Closed

bz2 mentioned this issue Feb 19, 2017

Always return unicode when encoding is given #139

Merged

jhermann closed this as completed in efde657 Feb 28, 2017
