Update .NET 7 Unicode data to version 14.0.0 #44423

Closed
GrabYourPitchforks opened this issue Nov 9, 2020 · 5 comments · Fixed by #66362
Labels
area-System.Globalization enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@GrabYourPitchforks
Member

GrabYourPitchforks commented Nov 9, 2020

The Unicode Standard version 14.0.0 is tentatively scheduled (https://home.unicode.org/unicode-14-0-delayed-for-6-months/) for September 2021. As per usual, since the .NET runtime carries a copy of Unicode-derived data, we should update our data files to match version 14.0.0 when it's released.

This will affect the following APIs:

  • System.Globalization.StringInfo
  • System.Globalization.CharUnicodeInfo
  • System.Text.Encodings.Web.*
  • System.Text.Json.* (since it depends on System.Text.Encodings.Web)
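
To make the impact concrete, here is a minimal sketch (an illustration, not code from the runtime repo) that queries the runtime-carried data through CharUnicodeInfo. It assumes .NET Core 3.0 or later, where the GetUnicodeCategory(int) overload is available. U+0870 is a code point first assigned in Unicode 14.0, so a runtime carrying older data would be expected to report it as unassigned; the class name is hypothetical.

```csharp
using System;
using System.Globalization;

class UnicodeCategoryDemo
{
    static void Main()
    {
        // U+0041 is assigned in every Unicode version.
        // U+0870 is first assigned in Unicode 14.0 (Arabic Extended-B block).
        int[] codePoints = { 0x0041, 0x0870 };

        foreach (int cp in codePoints)
        {
            // The answer comes from the Unicode data compiled into the runtime,
            // so it depends on which Unicode version that data was generated from.
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(cp);
            Console.WriteLine($"U+{cp:X4}: {category}");
        }
    }
}
```

Running the same program on runtimes with different Unicode data versions is a quick way to see whether the update has landed.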

For instructions on how to update the runtime-carried Unicode data files, consult the GenUnicodeProp docs and the STEW docs. Also update the UnicodeUcdVersion data throughout our .csproj files (see samples).

See #2378 for the changes we made for Unicode 13.0.0 in .NET 5.

We should also keep an eye out for any changes to UAX#29 that might be part of the Unicode 14.0.0 wave. Our tools will automatically pick up any changes to a code point's Grapheme_Cluster_Break property, but if the algorithm in Sec. 3.1.1 changes as part of Unicode 14.0.0 then we may need to update the logic in TextSegmentationUtility.cs.
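
As a rough illustration of what the segmentation logic drives at the public API level, the sketch below enumerates text elements with StringInfo. It assumes .NET 5 or later, where StringInfo follows the UAX#29 extended grapheme cluster rules; the sample string and class name are made up for the example, and the internal TextSegmentationUtility.cs is not called directly.

```csharp
using System;
using System.Globalization;

class GraphemeDemo
{
    static void Main()
    {
        // "e" followed by COMBINING ACUTE ACCENT, then three emoji joined by
        // ZERO WIDTH JOINER (U+200D) into a single family sequence.
        string text = "e\u0301" + "\U0001F468\u200D\U0001F469\u200D\U0001F467";

        // Each text element corresponds to one grapheme cluster as determined by
        // the Grapheme_Cluster_Break data and the UAX#29 rules in the runtime.
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
        int count = 0;
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            Console.WriteLine($"element {count}: {element.Length} UTF-16 code units");
            count++;
        }

        Console.WriteLine($"{text.Length} UTF-16 code units, {count} text elements");
    }
}
```

If the property data or the rules change in a new Unicode version, the element boundaries reported here can change with them, which is why both need to be checked during the update.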

@GrabYourPitchforks GrabYourPitchforks added this to the 6.0.0 milestone Nov 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 9, 2020
@ghost

ghost commented Nov 9, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.



@tarekgh tarekgh added enhancement Product code improvement that does NOT require public API changes/additions and removed untriaged New issue has not been triaged by the area owner labels Nov 9, 2020
@tarekgh
Member

tarekgh commented Jul 19, 2021

@GrabYourPitchforks just checking in: are you planning to do this soon?

@GrabYourPitchforks
Member Author

Moving this to 7.0 so that the dates line up correctly.

@GrabYourPitchforks GrabYourPitchforks modified the milestones: 6.0.0, 7.0.0 Jul 21, 2021
@ghost ghost moved this from 6.0.0 to Untriaged in ML, Extensions, Globalization, etc, POD. Jul 21, 2021
@maryamariyan maryamariyan moved this from Untriaged to 7.0.0 in ML, Extensions, Globalization, etc, POD. Aug 4, 2021
@GrabYourPitchforks GrabYourPitchforks changed the title from "Update .NET 6 Unicode data to version 14.0.0" to "Update .NET 7 Unicode data to version 14.0.0" Aug 17, 2021
@GrabYourPitchforks
Member Author

Now that we're within a month of Unicode 14.0's release, I gave https://unicode.org/versions/Unicode14.0.0/ another look. There's a new block Arabic Extended-B being added to the BMP. Our ingestion tools will automatically create a new API to support this block, so I opened #57609 to track the API review process for it.

We're still waiting for the PDFs to be published in case there were any changes to Sec. 5.8 (which controls string.ReplaceLineEndings). So far we're still good on UAX#29 (which controls StringInfo).
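
For readers unfamiliar with the API mentioned above, a small sketch of string.ReplaceLineEndings follows. It assumes .NET 6 or later, where the method exists; the input string and class name are invented for the example. The documented behavior is to treat CR, LF, CRLF, NEL (U+0085), LS (U+2028), FF (U+000C), and PS (U+2029) as newline sequences, so the input below should split into four segments after normalization.

```csharp
using System;

class LineEndingDemo
{
    static void Main()
    {
        // Mix of CRLF, LINE SEPARATOR (U+2028), and NEXT LINE (U+0085).
        string input = "one\r\ntwo\u2028three\u0085four";

        // Normalize every recognized newline sequence to a single LF.
        string normalized = input.ReplaceLineEndings("\n");

        foreach (string line in normalized.Split('\n'))
        {
            Console.WriteLine(line);
        }
    }
}
```

If the newline guidance in the standard were revised, the set of sequences recognized here is what would need to be revisited.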

@tarekgh
Member

tarekgh commented Jan 12, 2022

@GrabYourPitchforks what is remaining to do here?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 8, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 10, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 10, 2022

3 participants