Encoding with utf_traits can be wrong with C++11 string types #47

Closed · Flamefire opened this issue Dec 13, 2019 · 0 comments

Flamefire (Collaborator) commented:

The utf_traits class uses the size of the character type to switch between UTF-8, UTF-16 and UTF-32, e.g. for sizeof(CHAR)==2 it uses UTF-16: https://github.com/boostorg/locale/blob/develop/include/boost/locale/utf.hpp#L312

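For illustration, here is a minimal sketch of that size-based selection (simplified and not the actual Boost.Locale definition; `utf_traits_sketch` and its `encoding()` member are made-up names, and the real traits contain encode/decode logic rather than a string):

```cpp
#include <iostream>

// Simplified sketch of the size-based dispatch: the second template
// parameter defaults to sizeof(CharType), and the specialization chosen
// by that size alone decides which encoding is assumed.
template<typename CharType, int size = sizeof(CharType)>
struct utf_traits_sketch;

template<typename CharType>
struct utf_traits_sketch<CharType, 1> { static const char* encoding() { return "UTF-8"; } };

template<typename CharType>
struct utf_traits_sketch<CharType, 2> { static const char* encoding() { return "UTF-16"; } };

template<typename CharType>
struct utf_traits_sketch<CharType, 4> { static const char* encoding() { return "UTF-32"; } };

int main() {
    std::cout << utf_traits_sketch<char>::encoding() << '\n'     // UTF-8
              << utf_traits_sketch<wchar_t>::encoding() << '\n'; // UTF-16 or UTF-32,
                                                                 // depending on sizeof(wchar_t)
}
```
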
C++11 introduced string types with a standardized encoding via char16_t and char32_t (and later char8_t). These types are only guaranteed to be "at least" 2 or 4 bytes wide, respectively. So utf_traits<char16_t> might select the wrong encoding if char16_t happens to be 4 bytes wide on some platform.

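As a concrete check of that guarantee (the standard only ties charN_t to the width of uint_leastN_t, which is a minimum, not an exact size), a short sketch:

```cpp
#include <cstdint>

// char16_t/char32_t share their underlying type with uint_least16_t/uint_least32_t,
// which only guarantees a minimum width. On a hypothetical platform where
// sizeof(char16_t) == 4, the size-based dispatch above would hand char16_t
// data to the UTF-32 specialization even though the data is UTF-16 encoded.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
              "char16_t has the width of uint_least16_t");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t),
              "char32_t has the width of uint_least32_t");
```
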
Proposed solution: Templatize utf_traits over an encoding instead. This can be a plain old enum whose values UTF8, UTF16, UTF32 are set to 1, 2, 4 so the change stays backwards compatible. Then dispatch (via an indirection if required) the charN_t types to their known encoding. It is not clear whether the current implementation allows making utf_traits<char16_t, N> directly equal to utf_traits<char16_t, 2>, so an indirection might be required.

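A hypothetical sketch of that direction (the names `utf_encoding`, `default_encoding` and `utf_traits_sketch2` are illustrative, not an existing Boost.Locale API; keeping the second parameter an int is just one possible way to stay compatible with existing explicit uses):

```cpp
// Plain enum whose values mirror the old size parameter (1, 2, 4) so that
// existing explicit uses such as utf_traits<char, 1> could keep working.
enum utf_encoding { UTF8 = 1, UTF16 = 2, UTF32 = 4 };

// Indirection mapping each character type to its known encoding; the
// charN_t types are pinned to their encoding regardless of byte width,
// everything else falls back to the old sizeof-based behavior.
template<typename CharType>
struct default_encoding { static const int value = sizeof(CharType); };
template<> struct default_encoding<char16_t> { static const int value = UTF16; };
template<> struct default_encoding<char32_t> { static const int value = UTF32; };

// The traits are now selected by encoding, not by sizeof.
template<typename CharType, int Encoding = default_encoding<CharType>::value>
struct utf_traits_sketch2;

template<typename CharType>
struct utf_traits_sketch2<CharType, UTF8>  { /* UTF-8 encode/decode  */ };
template<typename CharType>
struct utf_traits_sketch2<CharType, UTF16> { /* UTF-16 encode/decode */ };
template<typename CharType>
struct utf_traits_sketch2<CharType, UTF32> { /* UTF-32 encode/decode */ };

// utf_traits_sketch2<char16_t> now always resolves to the UTF-16
// specialization, even if sizeof(char16_t) == 4 on some platform.
```

With the charN_t types routed through the indirection, their encoding no longer depends on the platform's byte width, while char and wchar_t keep the old size-based behavior.
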
Flamefire changed the title from "Encoding with utf_traits can be wrong on C++11" to "Encoding with utf_traits can be wrong with C++11 string types" on Dec 13, 2019.