Perl needs to normalize its identifiers #11573
Python runs its Unicode identifiers through NFD transforms, but Perl does not.
*You* cannot tell which one got entered, and *you* cannot see which is
How can this possibly not be a bug?
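Here is a small illustration of the "you cannot tell which one got entered" problem. It is a Python sketch, chosen only because `unicodedata` ships with the standard library; the helper name `codepoints` is made up. It builds the same visible word in NFC and in NFD and shows that the two strings differ until both are normalized to the same form.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Return the code points of s as space-separated U+XXXX (hypothetical helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


nfc = "\u00e9cran"   # é as a single precomposed code point
nfd = "e\u0301cran"  # e followed by COMBINING ACUTE ACCENT

# Both lines render as "écran"; only the code points reveal the difference.
print(nfc, codepoints(nfc))
print(nfd, codepoints(nfd))

# A naive comparison treats them as different strings...
print(nfc == nfd)
# ...but canonically equivalent strings compare equal once normalized.
print(unicodedata.normalize("NFC", nfc) == unicodedata.normalize("NFC", nfd))
```

Either form would work as the canonical one. What matters is that every comparison goes through the same normalization.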
I could figure out a tie map for hashes to make this work right, so that your
Since this is something each user must take special care to do "right"
Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
Characteristics of this binary (from libperl):
On Thu, Aug 11, 2011 at 4:39 PM, tchrist1 <email@example.com> wrote:
Does Python use NFD? PEP 3131 recommends either NFC or NFKC, but I haven't
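PEP 3131 as accepted has CPython convert identifiers to NFKC while parsing. The sketch below uses only the standard `unicodedata` module and `exec`, and assumes nothing about other implementations. It shows where NFC and NFKC part ways: compatibility characters such as the `ﬁ` ligature survive NFC but are folded by NFKC. It also shows that CPython applies that folding to names.

```python
import unicodedata

lig = "\ufb01"  # LATIN SMALL LIGATURE FI

# NFC only composes canonical equivalents, so the ligature is left alone.
print(unicodedata.normalize("NFC", lig) == lig)
# NFKC also applies compatibility decompositions, folding it to plain "fi".
print(unicodedata.normalize("NFKC", lig) == "fi")

# CPython normalizes identifiers to NFKC while parsing (PEP 3131),
# so a name spelled with the ligature ends up bound under "fi".
ns = {}
exec("\ufb01 = 42", ns)
print(ns["fi"])
```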
In any case, I agree that this needs to change, but I have doubts on how it
Tying stashes is broken, so that won't do for the moment. Without giving it
Unrelated to the bug report, what does Python do with bidi control
"Brian Fraser via RT" <firstname.lastname@example.org> wrote
Sorry, you're right, it's NFC:
First screen is NFC screen
I was worried about how this plays with Apple's HFS+, given
I agree that it has to be just for identifiers, not string literals,
$nfd = "écran";
Those need to be distinct.
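Python already draws this line between identifiers and literals, which makes it a convenient reference point. The following sketch uses only `exec`, and the variable names are illustrative. The NFD and NFC spellings of the same name bind a single variable, while the two string literals keep their own code points and stay distinct.

```python
nfc_name = "\u00e9cran"   # precomposed é
nfd_name = "e\u0301cran"  # e + COMBINING ACUTE ACCENT

src = (
    f"{nfd_name} = 1\n"           # assign through the NFD spelling
    f"same = {nfc_name}\n"        # read back through the NFC spelling
    f"lit_nfc = '{nfc_name}'\n"   # string literals are not normalized
    f"lit_nfd = '{nfd_name}'\n"
)
ns = {}
exec(src, ns)

print(ns["same"])                      # both spellings name one variable
print(ns["lit_nfc"] == ns["lit_nfd"])  # the literals remain different strings
```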
I think the solution for hashes should probably be a tie layer
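A Perl tie layer intercepts FETCH, STORE, EXISTS, DELETE and friends. Roughly the same idea in Python is a mapping that pushes every key through one normalization step. `NormalizedDict` below is a hypothetical sketch, not an existing library class. Normalizing to NFC is an assumption; NFD would work equally well as long as it is applied consistently.

```python
import unicodedata
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator


class NormalizedDict(MutableMapping):
    """Dict-like container that stores every str key in a single normal form.

    Hypothetical illustration of a "tie layer" for hashes.
    """

    def __init__(self, form: str = "NFC") -> None:
        self._form = form
        self._data: Dict[Any, Any] = {}

    def _norm(self, key: Any) -> Any:
        # Only strings are normalized; other key types pass through unchanged.
        if isinstance(key, str):
            return unicodedata.normalize(self._form, key)
        return key

    def __getitem__(self, key: Any) -> Any:   # FETCH (and EXISTS via `in`)
        return self._data[self._norm(key)]

    def __setitem__(self, key: Any, value: Any) -> None:  # STORE
        self._data[self._norm(key)] = value

    def __delitem__(self, key: Any) -> None:  # DELETE
        del self._data[self._norm(key)]

    def __iter__(self) -> Iterator[Any]:      # FIRSTKEY/NEXTKEY
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


h = NormalizedDict()
h["e\u0301cran"] = "screen"   # stored with the NFD spelling...
print(h["\u00e9cran"])        # ...fetched with the NFC spelling
```

The trade-off is that the caller's original spelling is lost on the way in: iteration yields keys only in the chosen normal form.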
I was kinda just kidding, because I did remember this.
Haven't looked at that. Bidi is ugly, since Perl stuff goes left to
And this one
points out that Java can get away with this because they have all these
seems as far as they got. I don't see any resolution. Too tired to
Hm, I wonder whether this has anything useful to say about the matter,
On Fri, Aug 12, 2011 at 02:10:55AM -0600, Tom Christiansen wrote:
Strictly it doesn't:
An implementation must not use the Unicode utilities implemented
It's a snapshot of NFD - I think even a snapshot of a late NFD *draft*.
Which I think was an issue Father C raised - Unicode evolves, therefore
This doesn't seem to be addressed at all in PEP 3131, so I'm assuming that
Does any language have a working implementation of normalised Unicode
Nicholas Clark <email@example.com> wrote
I usually hedge that by saying that it's quasi-NFD. I don't know any
Is the fear that an unassigned code point would later get assigned something
Unlike many other standards, the Unicode Standard is continually
In each new version of the Unicode Standard, the Unicode Consortium may add
Strong Normalization Stability
If a string contains only characters from a given version of Unicode, and it
More formally, given versions V and U of Unicode, and any string S
$$\mathrm{toNFC}_V(S) = \mathrm{toNFC}_U(S)$$
In particular, once a character is encoded, its canonical combining
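The guarantee only covers strings made entirely of characters that are assigned in the version doing the normalizing. Here is a sketch of that precondition check using Python's `unicodedata`; the function name is hypothetical. It flags any code point whose general category is `Cn` in the Unicode version bundled with the interpreter, which `unicodedata.unidata_version` reports.

```python
import unicodedata


def normalization_is_stable(s: str) -> bool:
    """True if every code point in s is assigned in this Python's Unicode data.

    Hypothetical helper: it tests only the policy's precondition (no Cn code
    points); it does not compare behavior across Unicode versions.
    """
    return all(unicodedata.category(ch) != "Cn" for ch in s)


print(unicodedata.unidata_version)            # version of the bundled Unicode data
print(normalization_is_stable("\u00e9cran"))  # only assigned characters

# U+FFFF is a noncharacter (category Cn). Normalization passes it through
# untouched, but the check is deliberately conservative and rejects it.
print(normalization_is_stable("a\uffff"))
print(unicodedata.normalize("NFC", "a\uffff") == "a\uffff")
```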
Now, HFS+ came out in 1998, but the stability guarantee only applies to
I can't see that they've done anything about bidis.
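Python's identifier rules do keep bidi formatting characters out of names. They are general category `Cf`, which is not in XID_Continue. Nothing stops them from appearing in comments or string literals, though. The sketch below scans text for the explicit bidi controls using `unicodedata.bidirectional`. The function name and the exact set of bidi classes it flags are assumptions.

```python
import unicodedata
from typing import List, Tuple

# Bidi classes of the explicit embedding, override, and isolate controls.
EXPLICIT_BIDI = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text: str) -> List[Tuple[int, str]]:
    """Return (index, character name) for each explicit bidi control in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI
    ]


src = "x = 1  # \u202eevil\u202c"
print(find_bidi_controls(src))
# The same character is rejected inside an identifier.
print("a\u202eb".isidentifier())
```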
What exactly do you mean by this? As I said, Python runs them
And in particular
In Python 3.2, « import héhé » doesn't work on Windows, but you can have non-ASCII paths in sys.path.
I fixed the import machinery to handle non-ASCII characters correctly
I plan to fix all these issues in Python 3.3: see #3080.
> Could you please make it clear in documentation and web pages,
What's New in Python 3.2 documentation has this sentence: "Python’s
Which web page should be updated/fixed?
So I don't think they have it working in module names either. Besides
* Python does seem to do the IDS/IDC thing, so you might see idents
* Java I know to have filesystem issues, but Java also allows for
* In contrast Go does not seem to use IDS/IDC, because you get compiler
% 6g idents.go
% uniquote -x < idents.go
So it doesn't mind U+00E9 (precomposed é), but dislikes U+0301 (COMBINING ACUTE ACCENT).
(BTW, I keep making errors in Python because of there being no strict
* I haven't poked at Ruby hard enough to know what it does here
% ruby ident.ruby
% uniquote -x < ident.ruby
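For comparison with the Go result above: the Go spec builds identifiers only from letters (category L, plus underscore) and Nd digits, which is why a combining mark is refused. Python exposes its own identifier rule (XID_Start, then XID_Continue, per PEP 3131) through `str.isidentifier()`. The sketch reuses the same spellings and makes no claim about Ruby.

```python
# str.isidentifier() checks XID_Start for the first character and
# XID_Continue for the rest; it does not normalize.
samples = {
    "precomposed": "\u00e9cran",   # U+00E9 is a letter, so it may start a name
    "decomposed": "e\u0301cran",   # U+0301 is a combining mark (Mn), allowed after a start char
    "leading mark": "\u0301cran",  # but a combining mark cannot start a name
}
for label, s in samples.items():
    print(f"{label:12} {s.isidentifier()}")
```

So Python accepts the decomposed spelling that Go rejects, and then folds it to the precomposed form when it normalizes names.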