This PR addresses two issues, #787 and #628 (rebased version of #794).
The plan to fix them, and possibly other XML issues with multibyte (non-ASCII)
characters, is as follows:
- `riak_cs_xml:format_value(Val)` handles conversion between lists/binaries and
  Unicode strings (xmerl accepts only Unicode strings, not binaries or latin-1-ish strings).
  - When `is_list(Val)`, use `list_to_binary` and call `format_value` again with the converted binary.
  - When `is_binary(Val)`, use `unicode:characters_to_list` to produce a Unicode string.
- `unicode:characters_to_list` around XML output creation.

Some misc notes:
- The result of `binary_to_list` should not be considered a "string". This conversion is NOT Unicode-aware, so the list and the binary correspond byte for byte.
- `unicode:characters_to_list` is Unicode-aware, so the elements of a converted list are not necessarily in `[0..255]`. Each element of a converted list corresponds to a Unicode codepoint (not quite precise, but almost a character).
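
To make the difference concrete, here is a minimal sketch of the conversion plan and of the two list conversions. The module name `xml_value_sketch` and the `demo/0` function are hypothetical, and this `format_value/1` is only a stand-in covering the list and binary clauses described above, not the actual code in `riak_cs_xml`. It assumes that list input holds raw bytes (each element in 0..255) and that binaries are UTF-8 encoded.

```erlang
-module(xml_value_sketch).
-export([format_value/1, demo/0]).

%% Hypothetical stand-in for the list and binary clauses of
%% riak_cs_xml:format_value/1 as described in the plan.
format_value(Val) when is_list(Val) ->
    %% Assumption: the list holds raw bytes. list_to_binary/1 raises
    %% badarg if any element is greater than 255.
    format_value(list_to_binary(Val));
format_value(Val) when is_binary(Val) ->
    %% Decode the UTF-8 bytes into a list of Unicode codepoints,
    %% which is the form xmerl expects.
    unicode:characters_to_list(Val).

demo() ->
    %% U+00E9 (e with acute accent) is two bytes in UTF-8: 16#C3, 16#A9.
    Bin = <<"caf", 16#C3, 16#A9>>,
    %% Byte-wise conversion: one element per byte.
    Bytes = binary_to_list(Bin),              % [99,97,102,195,169]
    %% Unicode-aware conversion: one element per codepoint.
    Chars = unicode:characters_to_list(Bin),  % [99,97,102,233]
    {Bytes, Chars, format_value(Bytes)}.
```

Calling `xml_value_sketch:demo()` returns the five-element byte list, the four-element codepoint list, and the same codepoint list again from `format_value/1`, since the list clause first normalizes to a binary. Note that `unicode:characters_to_list/1` returns an `{error, ...}` or `{incomplete, ...}` tuple on invalid UTF-8 input, which this sketch does not handle.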
Codepoints, encodings, string literals, binaries and more.
[1] http://www.erlang-factory.com/conference/ErlangUserConference2013/speakers/PatrikNyblom