JSON python library does know only Unicode (accents issue) #846

Closed
wants to merge 10 commits into from

Conversation

tobes
Contributor

@tobes tobes commented Apr 30, 2013

All CKAN data is stored as Unicode. When we use the CKAN API, every accented character comes back as a Unicode escape sequence.
The issue comes from the display, not the storage. When the data is plain text, accents are handled well, but when it is an array, the Python JSON library emits only Unicode escapes and the interface does not display them properly.

@tobes
Contributor

tobes commented Apr 30, 2013

@aViandon Thanks for reporting this issue. Which version of CKAN are you running, and could you provide an example of the bad behaviour so we can see the problem and fix it?

@aViandon
Author

@tobes

Version of CKAN: 1.7.1

Example of the bad behaviour: the API returns {..., "type": "Jeu de donn\u00e9es (S\u00e9rie de donn\u00e9es)", ...} instead of {..., "type": "Jeu de données (Série de données)", ...}

Thanks for your help.

@tobes
Contributor

tobes commented Apr 30, 2013

I've made a fix for this.

@amercader is this worth adding for 2.0? It would need testing on the release branch; I can do that if you want.

@ghost ghost assigned tobes Jun 14, 2013
@ghost ghost assigned tobes Jul 23, 2013
@ghost ghost assigned joetsoi Oct 29, 2013
@joetsoi
Contributor

joetsoi commented Nov 28, 2013

Summary from the CKAN dev call: undecided on including pretty printing in the API by default. It could either be a parameter option or a whole separate API for pretty printing.

@nickstenning
Contributor

It seems to me that the original problem here was that the CKAN API emits ASCII-encoded JSON. I can't really see why this is a major issue. JSON strings are sequences of Unicode characters (JSON text is normally encoded as UTF-8), but they also support escape sequences (such as \u2603) for representing non-ASCII characters in pure ASCII.

Either way, {"name": "\u2603"} and {"name": "☃"} are the same document:

>>> json.loads('{"name": "\\u2603"}') == json.loads('{"name": "☃"}')
True

Moreover, most of the code in this PR actually has nothing to do with resolving this issue, which would simply be a matter of including ensure_ascii=False in the kwargs to json.dumps. Instead, it's mostly about pretty-printing the JSON.
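
To make the ensure_ascii point concrete, here is a minimal sketch using only the standard library json module, written for Python 3 (this thread dates from the Python 2 era, where string handling differs). The `record` dict is an assumption built from the example string reported above; it is not taken from the CKAN code or from this PR.

```python
import json

# Hypothetical record, reusing the string from the report above.
record = {"type": "Jeu de données (Série de données)"}

# Default behaviour (ensure_ascii=True): non-ASCII characters
# are written as \uXXXX escape sequences.
escaped = json.dumps(record)

# With ensure_ascii=False the accented characters are kept as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both serializations decode to the same object.
assert json.loads(escaped) == json.loads(readable) == record
```

Either form is valid JSON, so a conforming client should see no difference; the choice only affects how the raw response reads to a human.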

I propose this is closed.

(Oh, and in case you were wondering, ☃ is a Unicode snowman.)

@joetsoi joetsoi closed this Sep 25, 2014
@smotornyuk smotornyuk deleted the 846-unicode-api-json branch December 19, 2018 15:35