Codepage GB18030 does not implement the latest version of the standard GB18030-2022 #91068

Open
hpetith opened this issue Aug 24, 2023 · 2 comments

Comments

@hpetith

hpetith commented Aug 24, 2023

The latest version of the standard, GB18030-2022, specifies three implementation levels that build on each other. When testing my application on .NET 6, 7 and 8 for compliance with implementation levels 1 and 2, I found that the current implementation in GB18030Encoding.cs misses only one requirement. All other extensions mandated by GB18030-2022 implementation levels 1 and 2 work out of the box.

In detail, GB18030-2022 changes a set of code mappings so that they no longer point to the Private Use Area (PUA) but to code points that Unicode has standardized in the meantime. The changed mappings are described in this blog post, section "No PUA Requirement". The Unicode Consortium has a pragmatic proposal to implement the changed mappings in one direction only, to ease transcoding into the standard.

With this missing piece implemented, the .NET codepage would be fully compliant with GB18030-2022 implementation levels 1 and 2.
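
One way to spot mappings that still land in the PUA is to scan decoded text for private-use code points. The following is a minimal sketch in Python, chosen for brevity; it is not the .NET implementation and makes no claim about what any particular decoder returns. The helper names `is_bmp_private_use`, `find_private_use` and `probe` are hypothetical, and the only fixed fact used is that the BMP Private Use Area spans U+E000 to U+F8FF.

```python
from __future__ import annotations

# Hypothetical helper names; not part of .NET or of any GB18030 library.

# The BMP Private Use Area spans U+E000..U+F8FF (fixed by the Unicode Standard).
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def is_bmp_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in the BMP Private Use Area."""
    return PUA_FIRST <= ord(ch) <= PUA_LAST


def find_private_use(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every BMP private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_bmp_private_use(ch)
    ]


def probe(data: bytes, codec: str = "gb18030") -> list[tuple[int, str]]:
    """Decode data with the given codec and report private-use results.

    Undecodable bytes become U+FFFD instead of raising, so a probe never aborts.
    """
    return find_private_use(data.decode(codec, errors="replace"))
```

Calling `probe` on a byte sequence from the remapped set shows whether a given decoder still produces a private-use code point for it. Which byte sequences GB18030-2022 actually remaps should be taken from the standard or the blog post mentioned above, not from this sketch.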

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 24, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 24, 2023
@vcsjones vcsjones added area-System.Globalization and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 24, 2023
@ghost

ghost commented Aug 24, 2023

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Author: hpetith
Assignees: -
Labels:

area-System.Globalization, untriaged

Milestone: -

@ghost

ghost commented Aug 24, 2023

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Author: hpetith
Assignees: -
Labels:

area-System.Text.Encoding, untriaged

Milestone: -

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 24, 2023