Code and documentation is inconsistent about naming UTF-8 #15428

p5pRT opened this issue Jul 6, 2016 · 5 comments




p5pRT commented Jul 6, 2016

Migrated from (status was 'stalled')

Searchable as RT128559$


p5pRT commented Jul 6, 2016


Created by

When discussing Perl internals, there are valid reasons for talking about
`utf8`, but it is inexcusable when the meaning is `UTF-8`.

Any other variations, such as `utf-8` or `UTF8`, should be meaningless,
although of course `Encode` must retain the aliases.
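
For illustration, here is a minimal sketch of how `Encode` currently resolves the four spellings. It relies only on `find_encoding` and the `name` method of the returned object; the `%canonical` hash is just a name chosen for this example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Ask Encode which canonical encoding each spelling resolves to.
# %canonical is a hypothetical name used only in this sketch.
my %canonical;
for my $name ('utf8', 'UTF8', 'utf-8', 'UTF-8') {
    my $enc = find_encoding($name);
    $canonical{$name} = $enc ? $enc->name : '(unknown)';
    printf "%-5s => %s\n", $name, $canonical{$name};
}
```

According to the `Encode` documentation, `utf8` and `UTF8` should both resolve to the lax `utf8` encoding, while `utf-8` and `UTF-8` should resolve to `utf-8-strict`, so the spelling changes behaviour rather than being a cosmetic choice.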

I see no reason for the documentation to mention `utf8` anywhere, except as
the name of the pragma or the PerlIO layer.

In theory, Perl 5 should be able to change the internal representation of
its strings just as it does the `sort` algorithm, but we are way past that
possibility now.

The docs should at least discourage discussion of the specific encoding
that is used, and talk instead about strings as containing "Unicode
characters" without any mention of encoding.

Perl Info


Site configuration information for perl 5.24.0:

Configured by strawberry-perl at Tue May 10 21:33:22 2016.

Summary of my perl5 (revision 5 version 24 subversion 0) configuration:

    osname=MSWin32, osvers=6.3, archname=MSWin32-x64-multi-thread
    uname='Win32 strawberry-perl #1 Tue May 10 21:30:49 2016 x64'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE
-fwrapv -fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    ccversion='', gccversion='4.9.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=8, doublesize=8, byteorder=12345678,
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16,
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='long
long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE"
    libpth=C:\strawberry\c\lib C:\strawberry\c\x86_64-w64-mingw32\lib
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl524.a
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=xs.dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE"

@INC for perl 5.24.0:

Environment for perl 5.24.0:
    HOME (unset)
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\PHP\;C:\ProgramData\Oracle\Java\javapath;C:\Program Files
    PERL_BADLANG (unset)
    SHELL (unset)



p5pRT commented Dec 14, 2016

From @khwilliamson

On 07/06/2016 08:05 AM, Rob Dixon (via RT) wrote:

# New Ticket Created by Rob Dixon
# Please include the string: [perl #128559]
# in the subject line of all future correspondence about this issue.
# <URL: >

This is a bug report for perl from the.rob.dixon@,
generated with the help of perlbug 1.40 running under perl 5.24.0.

So, we should search the pods, and replace all occurrences of 'utf8'
with UTF-8, at a minimum?

It's long been confusing as well that Perl has an extended form of
UTF-8, the most limited version of which allows one to encode just the
code points 0..0x10FFFF, excluding surrogates.
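
As a rough sketch of that difference, the following encodes a lone surrogate with both of `Encode`'s names. It assumes a reasonably recent `Encode`; the variable names are illustrative, and the exact wording of the error message may vary between versions.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# A lone surrogate: a legal Perl character, but outside standard UTF-8.
my $surrogate = "\x{D800}";

# Lax "utf8" (Perl's extended form) encodes it without complaint.
my $lax = encode('utf8', $surrogate);
printf "utf8  : %s\n", join ' ', map { sprintf '%02X', ord } split //, $lax;

# Strict "UTF-8" refuses it; FB_CROAK turns the refusal into an exception,
# and LEAVE_SRC keeps encode() from modifying $surrogate.
my $strict = eval { encode('UTF-8', $surrogate, FB_CROAK | LEAVE_SRC) };
print "UTF-8 : ", (defined $strict ? "encoded\n" : "rejected: $@");
```

The lax name yields the three bytes `ED A0 80`, whereas the strict name is expected to reject the same code point.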

Some people use the term UTF-X for perl's, but that is confusing to me,
as there is UTF-16 (of various endianness) and UTF-32. I've lately been
thinking we should standardize on "UTF-8X" when we mean perl's extension
to UTF-8.

There are places in the documentation where we could get away from
naming the internal format, but there are places where it is essential,
and places that are gray areas. I'm too close to the implementation to
be a good judge. I think if you want anything to actually get done
along these lines, you'll have to submit a patch for discussion.


p5pRT commented Dec 14, 2016

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented Apr 8, 2017

From @khwilliamson

In the absence of a patch from the OP, I'm marking this as stalled.
Karl Williamson


p5pRT commented Apr 8, 2017

@khwilliamson - Status changed from 'open' to 'stalled'
