P2729 Unicode in the Library, Part 2: Normalization #1423

Open
wg21bot opened this issue Jan 16, 2023 · 4 comments
Labels
B3 - addition Bucket 3 as described by P0592: material that is not mentioned in P0592 C++26 Targeted at C++26 IS Ship vehicle: IS ranges std::ranges SG9 Ranges SG SG16 Text processing size - large paper size estimate unicode
Comments

@wg21bot
Collaborator

wg21bot commented Jan 16, 2023

P2729R0 Unicode in the Library, Part 2: Normalization (Zach Laine)

@wg21bot wg21bot added LEWG Library Evolution LEWGI Library Evolution Incubator labels Jan 16, 2023
@wg21bot wg21bot added this to the 2022-telecon milestone Jan 16, 2023
@tahonermann
Collaborator

This needs SG16 review.

@tahonermann tahonermann added the SG16 Text processing label Jan 20, 2023
@brycelelbach brycelelbach added B3 - addition Bucket 3 as described by P0592: material that is not mentioned in P0592 IS Ship vehicle: IS C++26 Targeted at C++26 size - large paper size estimate ready-for-library-evolution-meeting-review This paper needs to be discussed at a Library Evolution meeting and removed LEWGI Library Evolution Incubator labels Jan 23, 2023
@jensmaurer jensmaurer modified the milestones: 2022-telecon, 2023-02 Jan 25, 2023
@brycelelbach

brycelelbach commented Feb 9, 2023

2023-02-07 19:30 to 22:00 Issaquah Library Evolution Meeting

P2728R0: Unicode in the Library, Part 1: UTF Transcoding

P2729R0: Unicode in the Library, Part 2: Normalization

2023-02-07 19:30 to 22:00 UTC-8 Issaquah Library Evolution Minutes

Champion: Zach Laine (IP)

Chair: Bryce Adelstein Lelbach (IP) & Ben Craig (IP)

Minute Taker: Robert Leahy (IP)

Start: 2023-02-07 19:41 UTC-8

Does this paper have:

  • Examples?
    • Yes
  • Field experience?
    • Based on Boost Text. There is no clean-room implementation written from the specification.
  • Performance considerations?
    • Yes.
  • Discussion of prior art?
    • Yes.
  • Changes Library Evolution previously requested?
    • N/A - new paper.
  • Wording?
    • No.
  • Breaking changes?
    • No.
  • Feature test macro?
    • Yes.
  • Freestanding considered?
    • Yes.

Open Questions:

  • Should text facilities support null-terminated strings as input?
  • What should happen when ill-formed Unicode is encountered? Produce the
    replacement character (U+FFFD), throw an exception, or terminate?
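
To make the second open question concrete, here is a minimal sketch of two of the candidate behaviors (substitute U+FFFD, or throw). The names `on_error`, `decode_one`, and `decode_utf8` are hypothetical and are not taken from P2728 or P2729. The sketch assumes a simplified recovery rule of one replacement character per rejected byte, which is not necessarily the substitution practice the papers would specify.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy names, not taken from P2728 or P2729.
enum class on_error { replace, throw_exception };

constexpr char32_t replacement_character = 0xFFFD;

// Tries to decode one code point starting at s[i]. On success, stores it in
// `out`, advances `i` past the sequence, and returns true. On failure,
// leaves `i` unchanged and returns false.
inline bool decode_one(const std::string& s, std::size_t& i, char32_t& out) {
    const auto byte = [&s](std::size_t k) {
        return static_cast<unsigned char>(s[k]);
    };
    const unsigned char lead = byte(i);
    std::size_t length = 0;
    char32_t cp = 0;
    char32_t smallest = 0;  // smallest code point allowed for this length
    if (lead < 0x80)                { length = 1; cp = lead;        smallest = 0; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; cp = lead & 0x1F; smallest = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; smallest = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; smallest = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte
    if (i + length > s.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k < length; ++k) {
        const unsigned char b = byte(i + k);
        if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false;
    }
    i += length;
    out = cp;
    return true;
}

// Decodes UTF-8 to UTF-32, handling ill-formed input according to `policy`.
inline std::u32string decode_utf8(const std::string& s, on_error policy) {
    std::u32string result;
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp = 0;
        if (decode_one(s, i, cp)) {
            result.push_back(cp);
        } else if (policy == on_error::throw_exception) {
            throw std::runtime_error("ill-formed UTF-8");
        } else {
            result.push_back(replacement_character);
            ++i;  // simplification: skip one byte, then resynchronize
        }
    }
    return result;
}
```

With `on_error::replace`, the input bytes `'a'`, `0xFF`, `'b'` decode to `U+0061`, `U+FFFD`, `U+0062`; with `on_error::throw_exception`, the same input throws `std::runtime_error`. The third option in the question, terminating, would replace the `throw` with a call such as `std::terminate()`.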

Typo in P2728 section 2: "3 UTF-8 code units in sequence may encode a particular code unit" -> the second "code unit" should be "code point".
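
The distinction behind this correction can be illustrated with a small sketch: several UTF-8 code units (bytes) together encode a single code point. The function name `encode_utf8` is hypothetical, and the sketch assumes its argument is already a valid Unicode scalar value.

```cpp
#include <string>

// Encodes one code point as UTF-8 code units.
// Assumption: `cp` is a valid Unicode scalar value (not a surrogate, and
// not above U+10FFFF); no validation is performed here.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, the single code point U+20AC (EURO SIGN) is encoded as the three code units `0xE2 0x82 0xAC`.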

Typo in P2729 section 4.2: is_normalized calls in the examples should take the format.

Typo in P2729 section 5.2: Unicode versions should have types.

Why utf_8_to_16_iterator instead of utf8_to_16_iterator? Why not use a template parameter for the sizes?
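
As a rough illustration of the second question, the code unit sizes could be template parameters, with the named iterators provided as aliases. This is an interface-shape sketch only: `code_unit` and `utf_iterator` are hypothetical names, increment and dereference are omitted, and `char` stands in for 8-bit code units so that the sketch does not require C++20.

```cpp
#include <type_traits>

// Hypothetical mapping from a code unit width in bits to a character type.
// The primary template is left undefined, so other widths fail to compile.
template <int Bits> struct code_unit;
template <> struct code_unit<8>  { using type = char; };
template <> struct code_unit<16> { using type = char16_t; };
template <> struct code_unit<32> { using type = char32_t; };

// Hypothetical iterator shell parameterized on source and target widths.
// Only the member types are shown; the transcoding logic is omitted.
template <int FromBits, int ToBits, typename Iterator>
struct utf_iterator {
    using input_type = typename code_unit<FromBits>::type;
    using value_type = typename code_unit<ToBits>::type;
    Iterator current;
};

// A named iterator then becomes an alias for one instantiation.
template <typename Iterator>
using utf_8_to_16_iterator = utf_iterator<8, 16, Iterator>;

static_assert(
    std::is_same<utf_8_to_16_iterator<const char*>::value_type, char16_t>::value,
    "transcoding from 8-bit to 16-bit code units yields char16_t");
```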

Should formats be enumerators, or should each be its own trivial type?
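
The two alternatives in this question can be sketched side by side. All names below (`format`, `utf8_format`, `code_unit_bits`, and so on) are hypothetical and chosen for illustration, not quoted from the papers.

```cpp
// Option A: each format is an enumerator of one enumeration.
enum class format { utf8, utf16, utf32 };

constexpr int code_unit_bits(format f) {
    return f == format::utf8 ? 8 : f == format::utf16 ? 16 : 32;
}

// Option B: each format is its own trivial (empty) type.
struct utf8_format {};
struct utf16_format {};
struct utf32_format {};

constexpr int code_unit_bits(utf8_format)  { return 8; }
constexpr int code_unit_bits(utf16_format) { return 16; }
constexpr int code_unit_bits(utf32_format) { return 32; }

static_assert(code_unit_bits(format::utf16) == 16, "enumerator form");
static_assert(code_unit_bits(utf16_format{}) == 16, "tag type form");
```

An enumerator can be stored in a variable and chosen at run time, while distinct types select an overload at compile time and let new formats be added without editing a shared enumeration.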

Maybe the fast but verbose code example shouldn't be the first one in the paper.

Transcoding iterators should model the iterator category of the underlying iterator.
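
One way to read this comment is as a trait that derives the transcoding iterator's category from the underlying iterator's category. The sketch below is an assumption, not the papers' design: the alias names are hypothetical, and it caps the result at bidirectional because variable-length encodings do not allow constant-time jumps between code points.

```cpp
#include <forward_list>
#include <iterator>
#include <type_traits>

// Category of the underlying iterator.
template <typename I>
using underlying_category =
    typename std::iterator_traits<I>::iterator_category;

// Hypothetical trait: follow the underlying category, but never claim more
// than bidirectional (an assumption of this sketch).
template <typename I>
using transcoding_category = typename std::conditional<
    std::is_base_of<std::bidirectional_iterator_tag,
                    underlying_category<I>>::value,
    std::bidirectional_iterator_tag,
    underlying_category<I>>::type;

// A pointer is random access, so the transcoding iterator is bidirectional.
static_assert(
    std::is_same<transcoding_category<const char*>,
                 std::bidirectional_iterator_tag>::value,
    "capped at bidirectional");

// A forward-only underlying iterator yields a forward transcoding iterator.
static_assert(
    std::is_same<transcoding_category<std::forward_list<char>::iterator>,
                 std::forward_iterator_tag>::value,
    "forward stays forward");
```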

Unicode version should be queried with runtime functions, not constexpr variables.
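
The difference between the two approaches can be sketched as follows. All names are hypothetical and the version numbers are placeholders. One plausible motivation for the function form, which the minutes do not state, is that a constant is fixed when the calling code is compiled, whereas a function defined inside the library can report the Unicode data the linked library actually ships.

```cpp
// Hypothetical version type.
struct unicode_version_t {
    int major_version;
    int minor_version;
    int patch_version;
};

// Option A: a constexpr variable. Its value is baked into every translation
// unit that uses it, at the time that translation unit is compiled.
constexpr unicode_version_t compiled_unicode_version = {15, 0, 0};  // placeholder

// Option B: a function. In a real library this would be defined out of line
// in the library binary; it is defined inline here only to keep the sketch
// self-contained.
inline unicode_version_t runtime_unicode_version() noexcept {
    return {15, 0, 0};  // placeholder
}
```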

Why use template parameters for normalization forms but not UTFs? I'd prefer consistency.

End: 21:56

Summary

We took an early look at P2728 and P2729, which propose Unicode facilities for the C++ Standard Library. The proposals include both low-level facilities, which should have speed-of-light performance, and higher-level facilities that are composable and easy to use (such as views and ranges).

Next Steps

Proceed with review and incubation in the Text and Unicode study group.

@brycelelbach

@tahonermann please send this to Library Evolution when it's ready.

@brycelelbach brycelelbach removed LEWG Library Evolution ready-for-library-evolution-meeting-review This paper needs to be discussed at a Library Evolution meeting labels Feb 9, 2023
@jensmaurer jensmaurer modified the milestones: 2023-02, 2023-telecon Mar 31, 2023
@cor3ntin cor3ntin added the SG9 Ranges SG label Aug 22, 2023
@tahonermann
Collaborator

SG16 review of this paper remains pending while SG16 iterates on P2728 (Unicode in the Library, Part 1: UTF Transcoding).

@inbal2l inbal2l added the ranges std::ranges label Oct 23, 2023
@jensmaurer jensmaurer modified the milestones: 2023-telecon, 2024-telecon Mar 19, 2024

6 participants