Perl needs to normalize its identifiers #11573
Comments
From tchrist@perl.com
Python runs its Unicode identifiers through NFD transforms, although Perl,
*You* cannot tell which one got entered, and *you* cannot see which is
How can this possibly not be a bug?
I get figure out a tie map for hashes to make this work right, so that your
Since this is something each user must take special care to do "right"
--tom
Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
Characteristics of this binary (from libperl):
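The "tie map for hashes" idea can be sketched outside Perl. What follows is a rough Python analogue, not Perl's tie interface and not code from this ticket: `NFCDict` is a hypothetical name, and the choice of NFC as the folding form is an assumption made for the example.

```python
import unicodedata
from collections.abc import MutableMapping


class NFCDict(MutableMapping):
    """Hypothetical mapping that folds every string key to NFC before use."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _norm(key):
        # Only strings are normalized; other key types pass through untouched.
        return unicodedata.normalize("NFC", key) if isinstance(key, str) else key

    def __getitem__(self, key):
        return self._data[self._norm(key)]

    def __setitem__(self, key, value):
        self._data[self._norm(key)] = value

    def __delitem__(self, key):
        del self._data[self._norm(key)]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


table = NFCDict()
table["\u00e9cran"] = "precomposed"   # é as a single code point
table["e\u0301cran"] = "decomposed"   # e followed by U+0301
```

Both spellings land in the same slot here, so the second assignment overwrites the first. A plain dict (or an untied Perl hash) would keep two entries that look identical on screen.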
From @Hugmeir
On Thu, Aug 11, 2011 at 4:39 PM, tchrist1 <perlbug-followup@perl.org> wrote:
Does Python use NFD? PEP 3131 recommends either NFC or NFKC, but I haven't In any case, I agree that this needs to change, but I have doubts on how it
Tying stashes is broken, so that won't do for the moment. Without giving it Unrelated to the bug report, what does Python do with bidi control
The RT System itself - Status changed from 'new' to 'open'
From tchrist@perl.com
"Brian Fraser via RT" <perlbug-followup@perl.org> wrote
Sorry, you're right, it's NFC: #!/usr/bin/env python3.2 print out First screen is NFC screen I was worried about how this plays with Apple's HFS+, given
I agree that it has to be just for identifiers, not string literals, $nfd = "écran"; Those need to be distinct. I think the solution for hashes should probably be a tie layer
I was kinda just kidding, because I did remember this.
Haven't looked at that. Bidi is ugly, since Perl stuff goes left to Interesting:
And this one http://mail.python.org/pipermail/python-3000/2007-May/007725.html
points out that Java can get away with this because they have all these
This http://mail.python.org/pipermail/python-3000/2007-May/007833.html
seems as far as they got. I don't see any resolution. Too tired to
Hm, I wonder whether this has anything useful to say about the matter,
http://www.w3.org/International/iri-edit/draft-duerst-iri-05.txt
--tom
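The split between identifiers and string literals argued for above can be observed in Python 3. This is a minimal sketch, not the script from the earlier message. It relies on the Python language reference's statement that identifiers are converted to NFKC while parsing; for this particular word NFKC and NFC give the same result.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "\u00e9cran")   # é as one code point
nfd = unicodedata.normalize("NFD", "\u00e9cran")   # e followed by U+0301

# As string data, the two spellings stay distinct.
strings_differ = nfc != nfd
lengths = (len(nfc), len(nfd))

# As identifiers, the parser folds both spellings to a single name.
namespace = {}
exec(nfd + " = 'assigned via NFD spelling'", namespace)
exec(nfc + " = 'assigned via NFC spelling'", namespace)
names = [key for key in namespace if key != "__builtins__"]

print(strings_differ, lengths, names)
```

The namespace ends up with one variable, stored under the precomposed spelling, while the two string values still compare unequal.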
From @nwc10
On Fri, Aug 12, 2011 at 02:10:55AM -0600, Tom Christiansen wrote:
Strictly it doesn't: http://developer.apple.com/library/mac/technotes/tn/tn1150.html#UnicodeSubtleties IMPORTANT: An implementation must not use the Unicode utilities implemented It's a snapshot of NFD - I think even a snapshot of a late NFD *draft*. Which I think was an issue Father C raised - Unicode evolves, therefore This doesn't seem to be addressed at all in PEP 3131, so I'm assuming that
Shame. Does any language have a working implementation of normalised Unicode
Nicholas Clark
From tchrist@perl.com
Nicholas Clark <nick@ccl4.org> wrote
...
I usually hedge that by saying that it's quasi-NFD. I don't know any
Is the fear that an unassigned code point would later get assigned something
http://unicode.org/policies/stability_policy.html
Unlike many other standards, the Unicode Standard is continually
In each new version of the Unicode Standard, the Unicode Consortium may add
...
Normalization Stability
Strong Normalization Stability
If a string contains only characters from a given version of Unicode, and it
More formally, given versions V and U of Unicode, and any string S
toNFC_V(S) = toNFC_U(S)
In particular, once a character is encoded, its canonical combining
Now, HFS+ came out in 1998, but the stability guarantee only applies to
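The shape of the quoted identity, toNFC_V(S) = toNFC_U(S), can be spot-checked for a single string. This is only a sketch: it uses Python's `unicodedata.ucd_3_2_0`, a frozen Unicode 3.2.0 database that ships with the standard library, as the older version V, and whatever database the running interpreter carries as U. One matching string illustrates the guarantee and says nothing about which versions the policy formally covers.

```python
import unicodedata

old = unicodedata.ucd_3_2_0          # frozen Unicode 3.2.0 tables
sample = "e\u0301cran"               # decomposed spelling of "écran"

nfc_old = old.normalize("NFC", sample)
nfc_new = unicodedata.normalize("NFC", sample)

print(old.unidata_version, unicodedata.unidata_version, nfc_old == nfc_new)
```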
I can't see that they've done anything about bidis.
What exactly do you mean by this? As I said, Python runs them
http://bugs.python.org/issue11230
And in particular http://bugs.python.org/msg128724 which reads:
Short answer: In Python 3.2, « import héhé » doesn't work on Windows, but you can have non-ASCII paths in sys.path.
Longer answer: I fixed the import machinery to handle correctly non-ASCII characters
I plan to fix all these issues in Python 3.3: see #3080.
--
> Could you please make it clear in documentation and web pages,
What's New in Python 3.2 documentation has this sentence: "Python’s
Which web page should updated/fixed?
So I don't think they have it working in module names either. Besides
* Python does seem to do the IDS/IDC thing, so you might see idents
* Java I know to have filesystem issues, but Java also allows for
* In contrast Go does not seem to use IDS/IDC, because you get compiler
% 6g idents.go
% uniquote -x < idents.go
So it doesn't mind E9, but dislikes 301.
(BTW, I keep making errors in Python because of there being no strict
* I haven't poked at Ruby hard enough to know what it does here
% ruby ident.ruby
% uniquote -x < ident.ruby
--tom
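For comparison with the Go result (E9 accepted, 301 rejected), Python's own identifier check can be probed with `str.isidentifier()`. This is a small sketch and the three sample strings are invented for it. It shows the start/continue distinction: a combining mark may follow a letter but may not begin a name.

```python
import unicodedata

samples = {
    "precomposed": "\u00e9cran",    # U+00E9 then "cran"
    "decomposed": "e\u0301cran",    # e, U+0301, then "cran"
    "mark_first": "\u0301cran",     # starts with the combining mark
}

# General categories of the two code points under discussion.
categories = {
    "U+00E9": unicodedata.category("\u00e9"),
    "U+0301": unicodedata.category("\u0301"),
}

checks = {label: text.isidentifier() for label, text in samples.items()}
print(categories, checks)
```

Unlike the Go compiler run shown above, Python accepts the decomposed spelling; only the string that begins with the bare combining mark is rejected.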
Migrated from rt.perl.org#96814 (status was 'open')
Searchable as RT96814$