Uniform char8_t and char basic_string_view punning #22
EmJayGee announced in Announcements
This release gives the m/strings/punning.h header uniform type punning between char8_t-based and char-based basic_string_view<> objects.
This is mainly to support consumption of OSS packages that tacitly assume that char encodes UTF-8, which is a poor assumption.
With the m::as_u8string_view() metaphor, one can easily take a std::string, treat it as a std::u8string_view, and then work with it as UTF-8 data in a type-safe fashion.
Then, when you have to hand the data back to a library that demands std::string_view for UTF-8 data, you can apply m::as_string_view() to your std::u8string data and pass the result into the OSS library.
Caveat Programmer! These functions only pun the types; they do nothing to extend lifetimes. If you want data with a safe lifetime, copy the punned view into an owning std::u8string.
For example, in a call such as m::as_u8string_view(some_oss_function()), where some_oss_function() returns a std::string, the compiler obtains a std::string_view of the returned string to pass to m::as_u8string_view(), which simply remaps the pointer and length into a std::u8string_view instance and returns that. The returned std::string is a temporary that is destroyed at the end of the full expression, so the view must be copied into an owning string before then.

It is tempting to imagine that the entire basic_string<> object could be punned, but standard library implementations can and do perform major specialization for the 'common' character types char and wchar_t, which may or may not extend to the less common char8_t, char16_t, and char32_t. There are also people who hold that basic_string<mytype> is a logical notion, so standard library maintainers must cater to that usage as well.

Therefore punning of the entire string object is not provided: while it may or may not work on any given standard library implementation, it is almost certainly not portable and is also a proverbial ticking time bomb.
If you want to fix this problem at the source, get the open source code you depend on to provide proper UTF-8 support through std::u8string, but this is probably a decade-long 'windmill tilt', if not a multi-decade one. char8_t isn't brand new.

This discussion was created from the release Uniform char8_t and char basic_string_view punning.