code: conform to c++23 #184

lazan · 2023-04-25T09:44:41Z

This patch still builds with c++17.

OlofGullnas · 2023-04-25T10:49:00Z

middleware/common/unittest/source/test_transcode.cpp

@@ -100,7 +100,7 @@ namespace casual
         if( transcode::utf8::exist( "ISO-8859-15"))
         {
            const std::string source = { static_cast<std::string::value_type>(0xA4)};
-            const std::string expect( u8"€");
+            const std::string expect( "€");
            const std::string result = transcode::utf8::encode( source, "ISO-8859-15");


Is this OK? It relies on the euro sign in expect happening to be a UTF-8 encoded variant of the symbol, i.e. that the source code was edited in an environment with a UTF-8 locale, and that the compiler accepts this byte stream in the locale in effect at build time. Should we use "hex notation" in expect? I haven't read up on the new UTF-8 support in C++20/23 yet, but I assume that is what made the code need changing in order to work with both C++17 and C++23... I have bought Josuttis's "C++20 The Complete Guide" (700 pages on C++20, version dated 2022-11-14) and see that it has a section on the changes to the UTF-8 support.
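The hex-notation idea raised in this comment could look roughly like the sketch below. It spells the UTF-8 encoding of U+20AC (EURO SIGN) as explicit bytes, so the expected value no longer depends on how the source file is encoded or which charset the compiler assumes. The helper name `euro_utf8` is hypothetical and not part of the casual code base.

```cpp
#include <string>

// Hypothetical helper (not from the casual code base):
// the UTF-8 encoding of U+20AC (EURO SIGN), written as explicit bytes
// so the value is independent of the source file encoding.
inline std::string euro_utf8()
{
   return "\xE2\x82\xAC";
}
```

In the test, the expectation would then read `const std::string expect = euro_utf8();`, or simply use the escaped literal inline.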

lazan · 2023-04-25

This is not that great... But since C++20 the u8 prefix creates a char8_t string (the element type of std::u8string), which we don't have any knowledge of in our code base, yet. The source code is encoded in UTF-8, hence the euro sign will be a UTF-8 encoded string, as far as I can understand. The whole u8 prefix is rather confusing, at least to me: https://stackoverflow.com/questions/23471935/how-are-u8-literals-supposed-to-work

It makes more sense now that it creates a char8_t string.
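For context on why the original line stopped compiling: in C++17 a u8 literal has element type char, while since C++20 it has element type char8_t, which cannot initialize a std::string directly. One way to keep the u8 prefix and still build under both standards is to copy the literal byte for byte. This is only a sketch of an alternative, not what the patch does, and the helper name `from_u8` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the casual code base):
// copies a u8 literal into a std::string byte for byte.
// Char deduces to char in C++17 and to char8_t since C++20.
template< typename Char, std::size_t N>
std::string from_u8( const Char (&literal)[ N])
{
   // N counts the terminating null character, which is not copied
   return std::string( reinterpret_cast< const char*>( literal), N - 1);
}
```

With such a helper the expectation could be written as `const std::string expect = from_u8( u8"€");`. The patch instead drops the prefix, which is simpler but relies on the execution character set being UTF-8.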

OlofGullnas · 2023-04-28

http://wg21.link/p1423 may be of interest.
According to the man page for c++, the g++ compiler takes the default input character set (-finput-charset=charset) from the locale. If it is not available there, UTF-8 is assumed. It can also be specified/overridden on the command line.
For the execution character set (-fexec-charset=charset) the default is UTF-8.
What happens if the input charset is UTF-8 but the actual input is "illegal" UTF-8 (e.g. it really is 8859-1 with non-ASCII characters) is probably implementation-dependent. According to the Stack Overflow discussion, Clang gives a warning, but g++ just "preserves" the input bytes in this case.
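Because the unprefixed literal depends on these charset settings, a build could guard against a mismatch at compile time. The sketch below checks that "€" ended up as the three UTF-8 bytes E2 82 AC. It assumes the file containing it is saved as UTF-8, and the function name `euro_literal_is_utf8` is hypothetical.

```cpp
// Hypothetical guard (not from the casual code base):
// true only if the plain literal "€" was compiled to the UTF-8 bytes E2 82 AC.
constexpr bool euro_literal_is_utf8()
{
   constexpr char euro[] = "€";

   // three encoded bytes plus the terminating null character
   return sizeof( euro) == 4
      && static_cast< unsigned char>( euro[ 0]) == 0xE2
      && static_cast< unsigned char>( euro[ 1]) == 0x82
      && static_cast< unsigned char>( euro[ 2]) == 0xAC;
}

static_assert( euro_literal_is_utf8(), "source or execution charset is not UTF-8");
```

If the source file were really ISO-8859-15, or -fexec-charset were set to something other than UTF-8, the literal would have a different size or different bytes and the static_assert would stop the build instead of letting the unit test fail later.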

lazan requested a review from ahhud April 25, 2023 09:45

OlofGullnas reviewed Apr 25, 2023

lazan merged commit d68d3b7 into feature/1.7/main Jul 2, 2023

lazan deleted the feature/1.7/c++23-conformance branch July 24, 2023 09:20
