New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_utf8: Fixed wchar_t handling on Windows #413

Merged
merged 1 commit into from Oct 30, 2018

Conversation

Projects
None yet
2 participants
@Kojoley
Collaborator

Kojoley commented Oct 28, 2018

Spirit were assuming that wchar_t is 32-bit and the content is UCS-4.
It is wrong, the actual representation is implementation defined [lex.ccon]/6.
However, on most Unix platforms this assumption is valid and gives the
expected outcome, but on Windows wchar_t is 16-bit and the content is UTF-16.

Fixes #395.

to_utf8: Fixed wchar_t handling on Windows
Spirit were assuming that wchar_t is 32-bit and the content is UCS-4.
It is wrong, the actual representation is implementation defined [lex.ccon]/6.
However, on most Unix platforms this assumption is valid and gives the
expected outcome, but on Windows wchar_t is 16-bit and the content is UTF-16.
@djowel

This comment has been minimized.

Member

djowel commented Oct 28, 2018

wchar_t is implementation defined. If the code somehow assumes that, then it is a bug.

@Kojoley

This comment has been minimized.

Collaborator

Kojoley commented Oct 28, 2018

What are alternatives? Remove to_utf8 entirely? It is been here for 10 years and assumes that wchar_t is 32-bit UCS-4. I would be glad if you can fix this somehow without assuming any implementation details of wchar_t. Note that std::codecvt is deprecated in C++17 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html.

@djowel

This comment has been minimized.

Member

djowel commented Oct 28, 2018

How about we support only a certain subset of types: those that we know work and is generally useful for typical platforms such as windows and unix? Assert the size required perhaps.

@Kojoley

This comment has been minimized.

Collaborator

Kojoley commented Oct 28, 2018

How about we support only a certain subset of types: those that we know work and is generally useful for typical platforms such as windows and unix?

The problem is we cannot know what can be passed here, especially in C++98 because there is no char16_t and char32_t types.

Assert the size required perhaps.

This function is only used in parser.what() (I have posted about this in #395 (comment), please read it), no one noticed before that it does something strange with wchar_t on Windows because it is does not affect parsing in any way and its result is only seen in parser debugging with BOOST_SPIRIT_QI_DEBUG. I think it is not a good way to throw asserts from things that are cosmetic and a size assert will not help, possibly constexpr function could check what the actual wchar_t representation is, but again I do not think it is needed. Also I have added a test and when it will fail somewhere then things might be reconsidered.

@djowel

This comment has been minimized.

Member

djowel commented Oct 28, 2018

I see. Well, I think we should simplify and just require exact representations to begin with. All platforms has some conversion support for it anyway. I think your last paragraph in your comment is the way to go.

@Kojoley

This comment has been minimized.

Collaborator

Kojoley commented Oct 28, 2018

I am sorry I do not get you. If would be glad if you fix this if you know how it should be done the right way.

@djowel

This comment has been minimized.

Member

djowel commented Oct 29, 2018

Sorry for the misunderstanding. To be clear, I think your solution here is OK.

@Kojoley

This comment has been minimized.

Collaborator

Kojoley commented Oct 29, 2018

I merge it then?

@Kojoley Kojoley merged commit cf631bf into boostorg:develop Oct 30, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Kojoley Kojoley deleted the Kojoley:fix-to_utf8-wchar_t-handling-on-windows branch Oct 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment