Add unicodedata.east_asian_width #4523
Merged
youknowone merged 2 commits into RustPython:main on Feb 19, 2023
d5857f0 to 38eaa75
youknowone (Member) approved these changes on Feb 19, 2023 and left a comment:
Looks great. Thank you. Do you have more concerns not to make this ready?
38eaa75 to 7661fab
Contributor
Author
I've updated the PR (as in commit 7661fab) and made it ready. Please help to merge it.
Closes #4522
This PR is a draft for further discussion.
Update: In this PR, we've adopted the ucd crate to implement unicodedata.east_asian_width using database v9.0.0, while leaving unicodedata.ucd_3_2_0.unidata_version referring to unicodedata.east_asian_width as is for now.
Python Document
https://docs.python.org/3/library/unicodedata.html#unicodedata.east_asian_width
Expectation (as in CPython 3.11)
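A minimal sketch of the behavior this PR targets, assuming a CPython 3.11 interpreter with its bundled Unicode database. The sample code points are illustrative choices, not taken from the PR, except U+231A, which is the character used in the test quoted in this description.

```python
import unicodedata

# Illustrative sample characters (an assumption, not from the PR) covering
# several East Asian Width classes as reported by CPython.
samples = {
    "\x20": "Na",    # SPACE: narrow
    "\uff1f": "F",   # FULLWIDTH QUESTION MARK: fullwidth
    "\uff66": "H",   # HALFWIDTH KATAKANA LETTER WO: halfwidth
    "\uc894": "W",   # a Hangul syllable: wide
    "\u2010": "A",   # HYPHEN: ambiguous
}

for ch, expected in samples.items():
    width = unicodedata.east_asian_width(ch)
    print(f"U+{ord(ch):04X} -> {width} (expected {expected})")

# U+231A (WATCH) changed class in Unicode 9.0: the current database reports
# 'W', while the frozen 3.2.0 database still reports 'N'.
current = unicodedata.east_asian_width("\u231a")
legacy = unicodedata.ucd_3_2_0.east_asian_width("\u231a")
print(current, legacy)
```

On CPython the last line should show that the two databases disagree for U+231A, which is the distinction the `ucd_3_2_0` test depends on.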
Status
unicodedata.east_asian_width is supported by using the ucd crate
unicodedata.ucd_3_2_0.east_asian_width is not working due to the 9.0 change, as in test https://github.com/RustPython/RustPython/blob/main/Lib/test/test_unicodedata.py#L264
Reason
ucd crate uses Unicode Character Database (v9.0.0)
self.assertEqual(self.db.ucd_3_2_0.east_asian_width('\u231a'), 'N') (actual result: W)
Extra Information
unic crate uses v10.0.0 https://github.com/open-i18n/rust-unic/blob/master/unic/ucd/age/tables/unicode_version.rsv#L3
unicodedata.ucd_3_2_0.* seems to be using v10.0.0 instead of v3.2.0 https://github.com/RustPython/RustPython/blob/main/stdlib/src/unicodedata.rs#L68
Possible Solution
unidata_version, including v3.2.0 and another major version (say, v14.0.0).
icu crate may be a good option to load dynamic database https://github.com/unicode-org/icu4x/blob/main/components/properties/src/maps.rs#L376
unicodedata module to use database determined by unidata_version.
Example Commits
ucd crate: 420240c
icu crate: 38eaa75
Related Tests
cargo run --release -- -m test test_unicodedata -v
cargo run --release -- extra_tests/snippets/builtin_str_unicode.py
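
The "Possible Solution" idea of letting unidata_version decide which database a lookup uses can be sketched in Python as follows. This is only an illustration of the dispatch, not RustPython's implementation: the `Ucd` class, the `_TABLES` mapping, and the fallback to 'N' for unlisted code points are all hypothetical, and the tables hold just the one entry quoted in this PR.

```python
from dataclasses import dataclass

# Hypothetical per-version tables. Only U+231A is listed, using the values
# quoted in this PR ('N' under 3.2.0, 'W' from 9.0.0 onward).
_TABLES = {
    "3.2.0": {0x231A: "N"},
    "9.0.0": {0x231A: "W"},
}


@dataclass(frozen=True)
class Ucd:
    """Hypothetical database handle that is bound to one Unicode version."""

    unidata_version: str

    def east_asian_width(self, ch: str) -> str:
        if len(ch) != 1:
            raise TypeError("need a single Unicode character")
        table = _TABLES[self.unidata_version]
        # Assumption for this sketch: unlisted code points are neutral ('N').
        return table.get(ord(ch), "N")


ucd_3_2_0 = Ucd("3.2.0")
ucd_current = Ucd("9.0.0")
print(ucd_3_2_0.east_asian_width("\u231a"))    # N
print(ucd_current.east_asian_width("\u231a"))  # W
```

Binding the version to the object keeps each lookup method free of version checks; a real implementation would load full property tables per version instead of these single-entry dictionaries.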