-
Notifications
You must be signed in to change notification settings - Fork 14k
Implement P1885R12: <text_encoding>
header
#141312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement P1885R12: <text_encoding>
header
#141312
Conversation
@llvm/pr-subscribers-libcxx Author: William Tran-Viet (smallp-o-p) ChangesResolve #105373 and consequently #118371 First crack at Patch is 118.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/141312.diff 32 Files Affected:
diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index 9b57b7c8eeb52..93308e4078075 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -500,7 +500,7 @@ Status
---------------------------------------------------------- -----------------
``__cpp_lib_submdspan`` *unimplemented*
---------------------------------------------------------- -----------------
- ``__cpp_lib_text_encoding`` *unimplemented*
+ ``__cpp_lib_text_encoding`` ``202306L``
---------------------------------------------------------- -----------------
``__cpp_lib_to_chars`` *unimplemented*
---------------------------------------------------------- -----------------
diff --git a/libcxx/docs/Status/Cxx2cPapers.csv b/libcxx/docs/Status/Cxx2cPapers.csv
index 3809446a57896..a7dfa75df7c87 100644
--- a/libcxx/docs/Status/Cxx2cPapers.csv
+++ b/libcxx/docs/Status/Cxx2cPapers.csv
@@ -13,7 +13,7 @@
"`P2013R5 <https://wg21.link/P2013R5>`__","Freestanding Language: Optional ``::operator new``","2023-06 (Varna)","","",""
"`P2363R5 <https://wg21.link/P2363R5>`__","Extending associative containers with the remaining heterogeneous overloads","2023-06 (Varna)","","",""
"`P1901R2 <https://wg21.link/P1901R2>`__","Enabling the Use of ``weak_ptr`` as Keys in Unordered Associative Containers","2023-06 (Varna)","","",""
-"`P1885R12 <https://wg21.link/P1885R12>`__","Naming Text Encodings to Demystify Them","2023-06 (Varna)","","",""
+"`P1885R12 <https://wg21.link/P1885R12>`__","Naming Text Encodings to Demystify Them","2023-06 (Varna)","|Complete|","21",""
"`P0792R14 <https://wg21.link/P0792R14>`__","``function_ref``: a type-erased callable reference","2023-06 (Varna)","","",""
"`P2874R2 <https://wg21.link/P2874R2>`__","P2874R2: Mandating Annex D Require No More","2023-06 (Varna)","|Complete|","12",""
"`P2757R3 <https://wg21.link/P2757R3>`__","Type-checking format args","2023-06 (Varna)","","",""
@@ -79,7 +79,7 @@
"`P3136R1 <https://wg21.link/P3136R1>`__","Retiring niebloids","2024-11 (Wrocław)","|Complete|","14",""
"`P3138R5 <https://wg21.link/P3138R5>`__","``views::cache_latest``","2024-11 (Wrocław)","","",""
"`P3379R0 <https://wg21.link/P3379R0>`__","Constrain ``std::expected`` equality operators","2024-11 (Wrocław)","|Complete|","21",""
-"`P2862R1 <https://wg21.link/P2862R1>`__","``text_encoding::name()`` should never return null values","2024-11 (Wrocław)","","",""
+"`P2862R1 <https://wg21.link/P2862R1>`__","``text_encoding::name()`` should never return null values","2024-11 (Wrocław)","|Complete|","21",""
"`P2897R7 <https://wg21.link/P2897R7>`__","``aligned_accessor``: An ``mdspan`` accessor expressing pointer over-alignment","2024-11 (Wrocław)","|Complete|","21",""
"`P3355R1 <https://wg21.link/P3355R1>`__","Fix ``submdspan`` for C++26","2024-11 (Wrocław)","","",""
"`P3222R0 <https://wg21.link/P3222R0>`__","Fix C++26 by adding transposed special cases for P2642 layouts","2024-11 (Wrocław)","","",""
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 43cefd5600646..ba61ee7c11e35 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -751,6 +751,7 @@ set(files
__system_error/error_condition.h
__system_error/system_error.h
__system_error/throw_system_error.h
+ __text_encoding/text_encoding.h
__thread/formatter.h
__thread/id.h
__thread/jthread.h
@@ -1062,6 +1063,7 @@ set(files
strstream
syncstream
system_error
+ text_encoding
tgmath.h
thread
tuple
diff --git a/libcxx/include/__locale b/libcxx/include/__locale
index d6c6ef19627ff..4da3f38ac408f 100644
--- a/libcxx/include/__locale
+++ b/libcxx/include/__locale
@@ -31,6 +31,10 @@
# include <cstddef>
# include <cstring>
+# if _LIBCPP_STD_VER >= 26
+# include <__text_encoding/text_encoding.h>
+# endif
+
# if _LIBCPP_HAS_WIDE_CHARACTERS
# include <cwchar>
# else
@@ -99,6 +103,11 @@ public:
// locale operations:
string name() const;
+
+# if _LIBCPP_STD_VER >= 26 && __CHAR_BIT__ == 8
+ text_encoding encoding() const;
+# endif // _LIBCPP_STD_VER >= 26
+
bool operator==(const locale&) const;
# if _LIBCPP_STD_VER <= 17
_LIBCPP_HIDE_FROM_ABI bool operator!=(const locale& __y) const { return !(*this == __y); }
diff --git a/libcxx/include/__text_encoding/text_encoding.h b/libcxx/include/__text_encoding/text_encoding.h
new file mode 100644
index 0000000000000..93d0ae2ab6b89
--- /dev/null
+++ b/libcxx/include/__text_encoding/text_encoding.h
@@ -0,0 +1,1483 @@
+// -*- C++ -*-
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___TEXT_ENCODING_TEXT_ENCODING_H
+#define _LIBCPP___TEXT_ENCODING_TEXT_ENCODING_H
+
+#include <__config>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+# pragma GCC system_header
+#endif
+
+#if _LIBCPP_HAS_LOCALIZATION
+
+#include <__algorithm/copy_n.h>
+#include <__algorithm/lower_bound.h>
+#include <__algorithm/min.h>
+#include <__functional/hash.h>
+#include <__iterator/iterator_traits.h>
+#include <__locale_dir/locale_base_api.h>
+#include <__ranges/view_interface.h>
+#include <__string/char_traits.h>
+#include <__utility/unreachable.h>
+#include <cstdint>
+#include <string_view>
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if _LIBCPP_STD_VER >= 26
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+struct _LIBCPP_EXPORTED_FROM_ABI text_encoding {
+ static constexpr size_t max_name_length = 63;
+
+private:
+ struct __encoding_data {
+ using __id_rep _LIBCPP_NODEBUG = int_least32_t;
+ __id_rep __mib_rep;
+ const char* __name;
+
+ friend constexpr bool operator==(const __encoding_data& __e, const __encoding_data& __other) _NOEXCEPT {
+ return __e.__mib_rep == __other.__mib_rep || __comp_name(__e.__name, __other.__name);
+ }
+
+ friend constexpr bool operator<(const __encoding_data& __e, const __id_rep __i) _NOEXCEPT {
+ return __e.__mib_rep < __i;
+ }
+ };
+
+public:
+ enum class id : __encoding_data::__id_rep {
+ other = 1,
+ unknown = 2,
+ ASCII = 3,
+ ISOLatin1 = 4,
+ ISOLatin2 = 5,
+ ISOLatin3 = 6,
+ ISOLatin4 = 7,
+ ISOLatinCyrillic = 8,
+ ISOLatinArabic = 9,
+ ISOLatinGreek = 10,
+ ISOLatinHebrew = 11,
+ ISOLatin5 = 12,
+ ISOLatin6 = 13,
+ ISOTextComm = 14,
+ HalfWidthKatakana = 15,
+ JISEncoding = 16,
+ ShiftJIS = 17,
+ EUCPkdFmtJapanese = 18,
+ EUCFixWidJapanese = 19,
+ ISO4UnitedKingdom = 20,
+ ISO11SwedishForNames = 21,
+ ISO15Italian = 22,
+ ISO17Spanish = 23,
+ ISO21German = 24,
+ ISO60DanishNorwegian = 25,
+ ISO69French = 26,
+ ISO10646UTF1 = 27,
+ ISO646basic1983 = 28,
+ INVARIANT = 29,
+ ISO2IntlRefVersion = 30,
+ NATSSEFI = 31,
+ NATSSEFIADD = 32,
+ NATSDANO = 33,
+ NATSDANOADD = 34,
+ ISO10Swedish = 35,
+ KSC56011987 = 36,
+ ISO2022KR = 37,
+ EUCKR = 38,
+ ISO2022JP = 39,
+ ISO2022JP2 = 40,
+ ISO13JISC6220jp = 41,
+ ISO14JISC6220ro = 42,
+ ISO16Portuguese = 43,
+ ISO18Greek7Old = 44,
+ ISO19LatinGreek = 45,
+ ISO25French = 46,
+ ISO27LatinGreek1 = 47,
+ ISO5427Cyrillic = 48,
+ ISO42JISC62261978 = 49,
+ ISO47BSViewdata = 50,
+ ISO49INIS = 51,
+ ISO50INIS8 = 52,
+ ISO51INISCyrillic = 53,
+ ISO54271981 = 54,
+ ISO5428Greek = 55,
+ ISO57GB1988 = 56,
+ ISO58GB231280 = 57,
+ ISO61Norwegian2 = 58,
+ ISO70VideotexSupp1 = 59,
+ ISO84Portuguese2 = 60,
+ ISO85Spanish2 = 61,
+ ISO86Hungarian = 62,
+ ISO87JISX0208 = 63,
+ ISO88Greek7 = 64,
+ ISO89ASMO449 = 65,
+ ISO90 = 66,
+ ISO91JISC62291984a = 67,
+ ISO92JISC62991984b = 68,
+ ISO93JIS62291984badd = 69,
+ ISO94JIS62291984hand = 70,
+ ISO95JIS62291984handadd = 71,
+ ISO96JISC62291984kana = 72,
+ ISO2033 = 73,
+ ISO99NAPLPS = 74,
+ ISO102T617bit = 75,
+ ISO103T618bit = 76,
+ ISO111ECMACyrillic = 77,
+ ISO121Canadian1 = 78,
+ ISO122Canadian2 = 79,
+ ISO123CSAZ24341985gr = 80,
+ ISO88596E = 81,
+ ISO88596I = 82,
+ ISO128T101G2 = 83,
+ ISO88598E = 84,
+ ISO88598I = 85,
+ ISO139CSN369103 = 86,
+ ISO141JUSIB1002 = 87,
+ ISO143IECP271 = 88,
+ ISO146Serbian = 89,
+ ISO147Macedonian = 90,
+ ISO150 = 91,
+ ISO151Cuba = 92,
+ ISO6937Add = 93,
+ ISO153GOST1976874 = 94,
+ ISO8859Supp = 95,
+ ISO10367Box = 96,
+ ISO158Lap = 97,
+ ISO159JISX02121990 = 98,
+ ISO646Danish = 99,
+ USDK = 100,
+ DKUS = 101,
+ KSC5636 = 102,
+ Unicode11UTF7 = 103,
+ ISO2022CN = 104,
+ ISO2022CNEXT = 105,
+ UTF8 = 106,
+ ISO885913 = 109,
+ ISO885914 = 110,
+ ISO885915 = 111,
+ ISO885916 = 112,
+ GBK = 113,
+ GB18030 = 114,
+ OSDEBCDICDF0415 = 115,
+ OSDEBCDICDF03IRV = 116,
+ OSDEBCDICDF041 = 117,
+ ISO115481 = 118,
+ KZ1048 = 119,
+ UCS2 = 1000,
+ UCS4 = 1001,
+ UnicodeASCII = 1002,
+ UnicodeLatin1 = 1003,
+ UnicodeJapanese = 1004,
+ UnicodeIBM1261 = 1005,
+ UnicodeIBM1268 = 1006,
+ UnicodeIBM1276 = 1007,
+ UnicodeIBM1264 = 1008,
+ UnicodeIBM1265 = 1009,
+ Unicode11 = 1010,
+ SCSU = 1011,
+ UTF7 = 1012,
+ UTF16BE = 1013,
+ UTF16LE = 1014,
+ UTF16 = 1015,
+ CESU8 = 1016,
+ UTF32 = 1017,
+ UTF32BE = 1018,
+ UTF32LE = 1019,
+ BOCU1 = 1020,
+ UTF7IMAP = 1021,
+ Windows30Latin1 = 2000,
+ Windows31Latin1 = 2001,
+ Windows31Latin2 = 2002,
+ Windows31Latin5 = 2003,
+ HPRoman8 = 2004,
+ AdobeStandardEncoding = 2005,
+ VenturaUS = 2006,
+ VenturaInternational = 2007,
+ DECMCS = 2008,
+ PC850Multilingual = 2009,
+ PC8DanishNorwegian = 2012,
+ PC862LatinHebrew = 2013,
+ PC8Turkish = 2014,
+ IBMSymbols = 2015,
+ IBMThai = 2016,
+ HPLegal = 2017,
+ HPPiFont = 2018,
+ HPMath8 = 2019,
+ HPPSMath = 2020,
+ HPDesktop = 2021,
+ VenturaMath = 2022,
+ MicrosoftPublishing = 2023,
+ Windows31J = 2024,
+ GB2312 = 2025,
+ Big5 = 2026,
+ Macintosh = 2027,
+ IBM037 = 2028,
+ IBM038 = 2029,
+ IBM273 = 2030,
+ IBM274 = 2031,
+ IBM275 = 2032,
+ IBM277 = 2033,
+ IBM278 = 2034,
+ IBM280 = 2035,
+ IBM281 = 2036,
+ IBM284 = 2037,
+ IBM285 = 2038,
+ IBM290 = 2039,
+ IBM297 = 2040,
+ IBM420 = 2041,
+ IBM423 = 2042,
+ IBM424 = 2043,
+ PC8CodePage437 = 2011,
+ IBM500 = 2044,
+ IBM851 = 2045,
+ PCp852 = 2010,
+ IBM855 = 2046,
+ IBM857 = 2047,
+ IBM860 = 2048,
+ IBM861 = 2049,
+ IBM863 = 2050,
+ IBM864 = 2051,
+ IBM865 = 2052,
+ IBM868 = 2053,
+ IBM869 = 2054,
+ IBM870 = 2055,
+ IBM871 = 2056,
+ IBM880 = 2057,
+ IBM891 = 2058,
+ IBM903 = 2059,
+ IBBM904 = 2060,
+ IBM905 = 2061,
+ IBM918 = 2062,
+ IBM1026 = 2063,
+ IBMEBCDICATDE = 2064,
+ EBCDICATDEA = 2065,
+ EBCDICCAFR = 2066,
+ EBCDICDKNO = 2067,
+ EBCDICDKNOA = 2068,
+ EBCDICFISE = 2069,
+ EBCDICFISEA = 2070,
+ EBCDICFR = 2071,
+ EBCDICIT = 2072,
+ EBCDICPT = 2073,
+ EBCDICES = 2074,
+ EBCDICESA = 2075,
+ EBCDICESS = 2076,
+ EBCDICUK = 2077,
+ EBCDICUS = 2078,
+ Unknown8BiT = 2079,
+ Mnemonic = 2080,
+ Mnem = 2081,
+ VISCII = 2082,
+ VIQR = 2083,
+ KOI8R = 2084,
+ HZGB2312 = 2085,
+ IBM866 = 2086,
+ PC775Baltic = 2087,
+ KOI8U = 2088,
+ IBM00858 = 2089,
+ IBM00924 = 2090,
+ IBM01140 = 2091,
+ IBM01141 = 2092,
+ IBM01142 = 2093,
+ IBM01143 = 2094,
+ IBM01144 = 2095,
+ IBM01145 = 2096,
+ IBM01146 = 2097,
+ IBM01147 = 2098,
+ IBM01148 = 2099,
+ IBM01149 = 2100,
+ Big5HKSCS = 2101,
+ IBM1047 = 2102,
+ PTCP154 = 2103,
+ Amiga1251 = 2104,
+ KOI7switched = 2105,
+ BRF = 2106,
+ TSCII = 2107,
+ CP51932 = 2108,
+ windows874 = 2109,
+ windows1250 = 2250,
+ windows1251 = 2251,
+ windows1252 = 2252,
+ windows1253 = 2253,
+ windows1254 = 2254,
+ windows1255 = 2255,
+ windows1256 = 2256,
+ windows1257 = 2257,
+ windows1258 = 2258,
+ TIS620 = 2259,
+ CP50220 = 2260,
+ reserved = 3000
+ };
+
+ using enum id;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr text_encoding() = default;
+ _LIBCPP_HIDE_FROM_ABI constexpr explicit text_encoding(string_view __enc) _NOEXCEPT
+ : __encoding_rep_(__find_encoding_data(__enc)) {
+ __enc.copy(__name_, max_name_length, 0);
+ }
+ _LIBCPP_HIDE_FROM_ABI constexpr text_encoding(id __i) _NOEXCEPT : __encoding_rep_(__find_encoding_data_by_id(__i)) {
+ if (__encoding_rep_->__name[0] != '\0')
+ std::copy_n(__encoding_rep_->__name, std::char_traits<char>::length(__encoding_rep_->__name), __name_);
+ }
+
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr id mib() const _NOEXCEPT { return id(__encoding_rep_->__mib_rep); }
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr const char* name() const _NOEXCEPT { return __name_; }
+
+ // [text.encoding.aliases], class text_encoding::aliases_view
+ struct aliases_view : ranges::view_interface<aliases_view> {
+ constexpr aliases_view() = default;
+ constexpr aliases_view(const __encoding_data* __d) : __view_data_(__d) {}
+ struct __end_sentinel {};
+ struct __iterator {
+ using value_type = const char*;
+ using reference = const char*;
+ using difference_type = ptrdiff_t;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator() noexcept = default;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr value_type operator*() const {
+ if (__can_dereference())
+ return __data_->__name;
+ std::unreachable();
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr value_type operator[](difference_type __n) const {
+ auto __it = *this;
+ return *(__it + __n);
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator+(__iterator __it, difference_type __n) {
+ __it += __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator+(difference_type __n, __iterator __it) {
+ __it += __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator-(__iterator __it, difference_type __n) {
+ __it -= __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr difference_type operator-(const __iterator& __other) const
+ {
+ if(__other.__mib_rep_ == __mib_rep_)
+ return __mib_rep_ - __other.__mib_rep_;
+ std::unreachable();
+ }
+
+ _LIBCPP_HIDE_FROM_ABI friend constexpr __iterator operator-(difference_type __n, __iterator& __it) {
+ __it -= __n;
+ return __it;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator++() {
+ __data_++;
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator operator++(int) {
+ auto __old = *this;
+ __data_++;
+ return __old;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator--() {
+ __data_--;
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator operator--(int) {
+ auto __old = *this;
+ __data_--;
+ return __old;
+ }
+
+ // Check if going past the encoding data list array and if the new index has the same id, if not then
+ // replace it with a sentinel "out-of-bounds" iterator.
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator+=(difference_type __n) {
+ if (__data_) [[__likely__]] {
+ if (__n > 0) {
+ if ((__data_ + __n) < std::end(__text_encoding_data) && __data_[__n - 1].__mib_rep == __mib_rep_)
+ __data_ += __n;
+ else
+ *this = __iterator{};
+ } else if (__n < 0) {
+ if ((__data_ + __n) > __text_encoding_data && __data_[__n].__mib_rep == __mib_rep_)
+ __data_ += __n;
+ else
+ *this = __iterator{};
+ }
+ }
+ return *this;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator& operator-=(difference_type __n) { return operator+=(-__n); }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr bool operator==(const __iterator& __it) const {
+ return __data_ == __it.__data_ && __it.__mib_rep_ == __mib_rep_;
+ }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr bool operator==(__end_sentinel) const { return !__can_dereference(); }
+
+ _LIBCPP_HIDE_FROM_ABI constexpr auto operator<=>(__iterator __it) const { return __data_ <=> __it.__data_; }
+
+ private:
+ friend struct text_encoding;
+
+ _LIBCPP_HIDE_FROM_ABI constexpr __iterator(const __encoding_data* __enc_d) noexcept
+ ...
[truncated]
|
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions h,,inc,cpp -- libcxx/include/__text_encoding/get_locale_encoding.h libcxx/include/__text_encoding/text_encoding.h libcxx/include/text_encoding libcxx/src/text_encoding.cpp libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp libcxx/test/std/localization/locales/locale/locale.members/encoding.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.ctor/default.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.ctor/id.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.ctor/string_view.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.eq/equal.id.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.eq/equal.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/aliases.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/environment.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/literal.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/nodiscard.verify.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/begin.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/empty.pass.cpp libcxx/test/std/utilities/text_encoding/text_encoding.members/text_encoding.aliases_view/front.pass.cpp libcxx/test/support/test_text_encoding.h libcxx/include/__locale libcxx/include/__locale_dir/locale_base_api.h libcxx/include/__locale_dir/locale_base_api/ibm.h libcxx/include/__locale_dir/support/bsd_like.h libcxx/include/__locale_dir/support/fuchsia.h libcxx/include/__locale_dir/support/linux.h libcxx/include/version libcxx/modules/std/text_encoding.inc libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp View the diff from clang-format here.diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp
index 817b0f0d6..1678e8840 100644
--- a/libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp
+++ b/libcxx/test/std/language.support/support.limits/support.limits.general/text_encoding.version.compile.pass.cpp
@@ -60,4 +60,3 @@
#endif // TEST_STD_VER > 23
// clang-format on
-
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index a87fd19c1..341f05943 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -543,7 +543,8 @@ int main(int, char**)
std::noshowbase(ios);
}
{ // negative, showbase
- std::wstring v = convert_thousands_sep(L"-1" THOUSANDS_SEP_ "234" THOUSANDS_SEP_ "567,89 \u20ac"); // EURO SIGN
+ std::wstring v =
+ convert_thousands_sep(L"-1" THOUSANDS_SEP_ "234" THOUSANDS_SEP_ "567,89 \u20ac"); // EURO SIGN
std::showbase(ios);
typedef cpp17_input_iterator<const wchar_t*> I;
long double ex;
|
Thanks! I've edited the PR description to associate this PR with both issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting to see progress on this
I'm not a library maintainer, so take my comments for what they are worth :)
What's the Windows support for libc++ these days? environment is Posix specific at the moment
libcxx/src/text_encoding.cpp
Outdated
auto __make_locale = [](const char* __name) { | ||
text_encoding __enc{}; | ||
if (auto __loc = __locale::__newlocale(LC_CTYPE_MASK, __name, static_cast<locale_t>(0))) { | ||
if (const char* __codeset = nl_langinfo_l(CODESET, __loc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to check that you are in a POSIX environment here. nl_langinfo_l
is not going to be available on windows, for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be in the locale base API, since this is platform-specific and locale related.
} | ||
_LIBCPP_HIDE_FROM_ABI constexpr text_encoding(id __i) _NOEXCEPT : __encoding_rep_(__find_encoding_data_by_id(__i)) { | ||
if (__encoding_rep_->__name[0] != '\0') | ||
std::copy_n(__encoding_rep_->__name, std::char_traits<char>::length(__encoding_rep_->__name), __name_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use strncpy here - but I don;t know what libc++ folks prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather save the size and avoid this call entirely. I'm pretty sure we can get away with not even increasing the size of the struct, since it's at most 63.
template <id __i> | ||
[[nodiscard]] _LIBCPP_HIDE_FROM_ABI static bool environment_is() { | ||
return environment() == __i; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function intended that it could be optimized - eg for utf8 - such that it would not access / odr-use the data table. But that implementation is fine, especially as a first pass
{1, ""}, | ||
{2, ""}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these useful?
const __encoding_data* __encoding_rep_ = __text_encoding_data + 1; | ||
char __name_[max_name_length + 1] = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To decrease the size at least somewhat, we could instead have a union of these two and set the last byte to a non-zero value if we store a pointer. The __name_
would be the same as __encoding_rep_
in that case IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is feasible due to 2.2 in the case of a match found for the name passed in enc
, we'd have to copy the name into the buffer and somehow be able to retrieve the id
without the pointer.
Edit: We could still avoid the call to copy_n
though and change name()
to check if the first character is null terminator.
__id_rep __mib_rep; | ||
const char* __name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we store the size of the name as well? We already have space between __mib_rep
and __name
anyways. Also, they should be __mib_rep_
and __name_
.
# pragma GCC system_header | ||
#endif | ||
|
||
#if _LIBCPP_HAS_LOCALIZATION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to guard this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think a user who is building libc++ without localization wouldn't have much use for a text encoding database, hence guarding it behind _LIBCPP_HAS_LOCALIZATION
, however I'm not an expert. In the standard, localization and text_encoding are different subcategories of the text processing library, so it's more likely that the guard should be removed. Perhaps @cor3ntin could weigh in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locale and encoding are unrelated, for the most part. so I would not disable the feature if locale is not available.
maybe environment/environment_is would be considered to require locale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's only one or two functions that actually require locale we should guard only those. It may not make a ton of sense to use this without localization, but _LIBCPP_HAS_LOCALIZATION
really means "has libc locale support". Unless there is actually a platform requirement for a feature, we shouldn't guard that feature.
libcxx/src/text_encoding.cpp
Outdated
auto __make_locale = [](const char* __name) { | ||
text_encoding __enc{}; | ||
if (auto __loc = __locale::__newlocale(LC_CTYPE_MASK, __name, static_cast<locale_t>(0))) { | ||
if (const char* __codeset = nl_langinfo_l(CODESET, __loc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be in the locale base API, since this is platform-specific and locale related.
@@ -0,0 +1,56 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libcxx/src/text_encoding.cpp
Outdated
# include <langinfo.h> | ||
#endif | ||
|
||
#if _LIBCPP_STD_VER >= 26 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never have version checks in src/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pardon my unfamiliarity with the codebase, but wouldn't this prevent the library from being built, since it's built with -std=c++23
? In that case this PR would probably have to be parked (once all the requested changes are made and approvals are given) until the library is built with -std=c++26
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library needs to be built with -std=c++26
(or in an even later mode) as long as text_encoding::environment
is separately compiled. So it doesn't make sense to guard this with _LIBCPP_STD_VER >= 26
.
Alternatively, it's possible to provide an internal function achieving this functionality that's separately compiled and available in old modes. But it still makes no sense to guard the function definition in src/
.
libcxx/test/std/localization/locales/locale/locale.members/encoding.pass.cpp
Outdated
Show resolved
Hide resolved
#if _LIBCPP_STD_VER >= 26 | ||
# include <__text_encoding/text_encoding.h> | ||
#endif // _LIBCPP_STD_VER >= 26 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you put the implementation in an internal header? I don't see a reason for that. IMO you should move the implementation here instead. The granularization of headers has a special purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is fine. We've granularized most headers by now, so this keeps the pattern, and this avoids having to move code around in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not arguing. IMO Without anything granularized it's just unnecessary overhead. I wondered why you didn't comment on it.
libcxx/test/std/utilities/text_encoding/text_encoding.ctor/id.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/utilities/text_encoding/text_encoding.ctor/string_view.pass.cpp
Outdated
Show resolved
Hide resolved
#include "test_macros.h" | ||
#include <cstdint> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "test_macros.h" | |
#include <cstdint> | |
#include <cstdint> | |
#include "test_macros.h" |
Do you really need this header in the support directory? Perhaps it can be put in the test directory?
@smallp-o-p BTW You should use proper GitHub syntax to link the PR to the issues: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue |
|
||
// [text.encoding.aliases], class text_encoding::aliases_view | ||
struct aliases_view : ranges::view_interface<aliases_view> { | ||
constexpr aliases_view() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this default constructor? It's clarified in https://eel.is/c++draft/text.encoding.aliases#note-1 that aliases_view
can be non-default_initializable
.
// [text.encoding.aliases], class text_encoding::aliases_view | ||
struct aliases_view : ranges::view_interface<aliases_view> { | ||
constexpr aliases_view() = default; | ||
constexpr aliases_view(const __encoding_data* __d) : __view_data_(__d) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor seems too permissive (ditto __iterator
's), e.g. one can write aliases_view(nullptr)
. I guess it's better to make it hard or even impossible to call constructor without mentioning any reserved identifier.
I think there's another approach. Note that
The helper class and function can be available in old modes without exposing |
…odule visibility issues
…already be on the primary encoding name
…cale instead of assuming the "C" locale
I have no idea what libc++ policies are in terms of backporting, but given that this feature is most useful for legacy systems, it might be reasonable to make it available in C++20 (we need consteval) - assuming there are appropriate warnings (note that gcc does not do that though). |
A couple notes based on the recent build failures:
It may just be better to use the draft implementation for |
I belive the latest released NDK. |
Resolve #105373 and consequently resolve #118371
First crack at
<text_encoding>
. Implementation is pretty similar tolibstdc++
.