can't read pinyin characters from terminal #13668

Open
p5pRT opened this issue Mar 16, 2014 · 20 comments

@p5pRT p5pRT commented Mar 16, 2014

Migrated from rt.perl.org#121450 (status was 'open')

Searchable as RT121450$

@p5pRT p5pRT commented Mar 16, 2014

From ntysdd@gmail.com

Created by ntysdd@gmail.com

Using Strawberry Perl portable in a Simplified Chinese environment (CP936), I found that perl can't read pinyin characters properly from a terminal.

Example:

perl -ne "print"
nǐtàiyánsù
n t iy ns

Chinese characters are OK.
Reading from a file using redirection is also OK.
Only pinyin typed at the terminal comes out wrong.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.18.2:

Configured by strawberry-perl at Tue Jan  7 16:32:09 2014.

Summary of my perl5 (revision 5 version 18 subversion 2) configuration:

  Platform:
    osname=MSWin32, osvers=6.2, archname=MSWin32-x86-multi-thread-64int
    uname='Win32 strawberry-perl 5.18.2.1 #1 Tue Jan  7 16:30:36 2014 i386'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32  -DPERL_TEXTMODE_SCRIPTS
-DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO
-fno-strict-aliasing -mms-bitfields',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.7.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='long long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++.exe', ldflags ='-s -L"F:\mono\perl\perl\lib\CORE"
-L"F:\mono\perl\c\lib"'
    libpth=F:\mono\perl\c\lib F:\mono\perl\c\i686-w64-mingw32\lib
F:\mono\perl\c\lib\gcc\i686-w64-mingw32\4.7.3
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32
-lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool
-lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid
-lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl518.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"F:\mono\perl\perl\lib\CORE"
-L"F:\mono\perl\c\lib"'

Locally applied patches:



@INC for perl 5.18.2:
    F:/mono/perl/perl/site/lib
    F:/mono/perl/perl/vendor/lib
    F:/mono/perl/perl/lib
    .


Environment for perl 5.18.2:
    HOME (unset)
    LANG=zh_CN
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=F:\mono\perl\perl\site\bin;F:\mono\perl\perl\bin;F:\mono\perl\c\bin;C:\Program
Files\Broadcom\Broadcom 802.11 Network
Adapter;;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
Files\Windows Kits\8.1\Windows Performance Toolkit\;C:\Program
Files\Microsoft SQL Server\110\Tools\Binn\;C:\Program
Files\GNU\GnuPG\pub
    PERL_BADLANG (unset)
    SHELL (unset)

@p5pRT p5pRT commented Jul 4, 2014

From @jkeenan

Can anyone familiar with CP936 reproduce this?

@p5pRT p5pRT commented Jul 4, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Jul 4, 2014

From @khwilliamson

I'm trying to understand this report. I am not familiar with CP936, but
I looked it up, and it is a one- and two-byte encoding. Internally, Perl
supports only single-byte encodings plus, starting in 5.20, UTF-8.
So this encoding shouldn't be expected to work in Perl directly. What one
is supposed to do is use the Encode module to translate the encoding
into Perl's internal form on input, and translate back on output. An
example I found is http://www.perlmonks.org/?node_id=537416
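
A minimal sketch of the decode-on-input, encode-on-output pattern described above, using the core Encode module. The sample bytes are the CP936 encoding of "nǐtàiyánsù" as quoted in a hex dump elsewhere in this thread; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# CP936 bytes for "nǐtàiyánsù" (taken from the hex dump quoted in this thread).
my $bytes = "\x6E\xA8\xAB\x74\xA8\xA4\x69\x79\xA8\xA2\x6E\x73\xA8\xB4";

# Input side: turn encoded bytes into a Perl character string.
my $text = decode('cp936', $bytes);

# Work on characters, not bytes: 14 bytes become 10 characters.
printf "%d bytes, %d characters\n", length $bytes, length $text;

# Output side: turn characters back into CP936 bytes.
my $out = encode('cp936', $text);
print $out eq $bytes ? "round trip ok\n" : "round trip differs\n";
```

This only helps when the bytes arriving from the handle are intact; it cannot recover characters that were already replaced before Perl saw them.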

@p5pRT p5pRT commented Jul 4, 2014

From @ikegami

I'll see what I can find out tonight. Can you please provide the output of
the following in the meantime?

chcp & perl -MWin32 -MWin32::Console -E"say for Win32::GetACP(), Win32::GetOEMCP(), Win32::Console->new(STD_INPUT_HANDLE)->InputCP(), Win32::Console->new(STD_OUTPUT_HANDLE)->OutputCP();"

@p5pRT p5pRT commented Jul 5, 2014

From @ikegami

I haven't found anything that helps you. Still waiting on your feedback.
Would also like to see the output of perl -ne"printf qq{%v02X\n}, $_"
for that same input.

@p5pRT p5pRT commented Jul 6, 2014

From ntysdd@gmail.com

Active code page: 936
936
936
936
936

@p5pRT p5pRT commented Jul 7, 2014

From @tonycoz

On Sun Mar 16 00:41:07 2014, ntysdd@gmail.com wrote:

> Using strawberryperl portable under a simplified Chinese env.(CP936)
> Found perl can't read pinyin chars properly from a terminal.
>
> Example:
>
> perl -ne "print"
> nǐtàiyánsù
> n t iy ns
>
> Chinese characters are OK.
> Reading from a file using redirection is also OK.
> Only terminal plus pinyin will get wrong.

I wonder if this is related to #13794

Tony

@p5pRT p5pRT commented Jul 7, 2014

From @ikegami

On Mon, Jul 7, 2014 at 5:14 AM, Tony Cook via RT <perlbug-followup@perl.org> wrote:

> I wonder if this is related to
> https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121783

No. The non-ASCII chars are filtered out on or before input. It's not an
output issue.

The program is getting a NUL where the non-ASCII chars are supposed to be
(6E.00.74.00.69.79.00.6E.73.00.0A). I have no idea why.
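
For readers unfamiliar with the hex notation used here, the following sketch shows how sprintf's vector flag produces such a dump. The input string is a hand-written stand-in for the line perl received, with a NUL in place of each pinyin letter; it is not captured from a real console.

```perl
use strict;
use warnings;

# Stand-in for the line perl received: NULs where the pinyin letters were.
my $line = "n\0t\0iy\0ns\0\n";

# The "v" flag formats the ordinal of each character, joined by ".".
my $hex = sprintf '%v02X', $line;
print "$hex\n";

# Count the NUL bytes explicitly.
my $nuls = () = $line =~ /\0/g;
print "NUL bytes: $nuls\n";
```

Each of the four non-ASCII characters in the input corresponds to exactly one NUL, so a two-byte CP936 sequence is being replaced by a single byte rather than passed through.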

@p5pRT p5pRT commented Jul 8, 2014

From @khwilliamson

On 07/07/2014 09:25 AM, Eric Brine wrote:

> No. The non-ASCII chars are filtered out on or before input. It's not an
> output issue.
>
> The program is getting a NUL where the non-ASCII chars are supposed to be
> (6E.00.74.00.69.79.00.6E.73.00.0A). I have no idea why.

I'm still having trouble grokking this issue. According to
http://msdn.microsoft.com/en-US/goglobal/cc305153
CP936 is ASCII, plus 0x80 for the EURO SIGN; 0xFF is undefined, and
0x81 - 0xFE start a two-byte sequence that gives various ideographs.

I don't understand what it might mean to input an accented Latin
character when it appears to me that the terminal is not set up to
understand them.
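
The byte layout described above (single bytes up to 0x80, with 0x81 - 0xFE introducing a two-byte sequence) can be sketched as a small splitter. This only illustrates the lead-byte rule as stated in this comment; it does not validate trail bytes, and `split_cp936_units` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split a byte string into single-byte and double-byte units, treating
# 0x81 - 0xFE as a lead byte that consumes the following byte.
sub split_cp936_units {
    my ($bytes) = @_;
    my @units;
    my $i = 0;
    while ($i < length $bytes) {
        my $b   = ord(substr($bytes, $i, 1));
        my $len = 1;
        # A lead byte at the very end has nothing to pair with; keep it single.
        $len = 2 if $b >= 0x81 && $b <= 0xFE && $i + 1 < length $bytes;
        push @units, sprintf('%v02X', substr($bytes, $i, $len));
        $i += $len;
    }
    return @units;
}

# "a", two double-byte sequences (C4 E3 and BA C3, i.e. "你好"), then "b".
print join(' ', split_cp936_units("a\xC4\xE3\xBA\xC3b")), "\n";
```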

@p5pRT p5pRT commented Jul 8, 2014

From @ikegami

On Tue, Jul 8, 2014 at 2:57 PM, Karl Williamson <public@khwilliamson.com> wrote:

> I'm still having trouble grokking this issue.

If I enter "nitàiyánsù" into my cp850 terminal, I expect to get the cp850
encoding of those characters from STDIN, and I do.

perl -MEncode -Mcharnames=:full -nlE"say sprintf '%v02X', $_; say charnames::viacode(ord) for split //, decode('cp850', $_);"
nitàiyánsù
6E.69.74.85.69.79.A0.6E.73.97
LATIN SMALL LETTER N
LATIN SMALL LETTER I
LATIN SMALL LETTER T
LATIN SMALL LETTER A WITH GRAVE
LATIN SMALL LETTER I
LATIN SMALL LETTER Y
LATIN SMALL LETTER A WITH ACUTE
LATIN SMALL LETTER N
LATIN SMALL LETTER S
LATIN SMALL LETTER U WITH GRAVE
^Z

He enters "nǐtàiyánsù" into his cp936 terminal. He expects to get the cp936
encoding of those characters from STDIN. He doesn't.

6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4 is what he expects to get
6E.00. 74.00. 69.79.00. 6E.73.00 is what he gets

@p5pRT p5pRT commented Jul 8, 2014

From @khwilliamson

On 07/08/2014 02:26 PM, Eric Brine wrote:

> He enters "nǐtàiyánsù" into his cp936 terminal. He expects to get the
> cp936 encoding of those characters from STDIN. He doesn't.
>
> 6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4 is what he expects to get
> 6E.00. 74.00. 69.79.00. 6E.73.00 is what he gets

What I'm saying is there is no encoding in cp936 for those characters.

@p5pRT p5pRT commented Jul 8, 2014

From @ikegami

On Tue, Jul 8, 2014 at 4:48 PM, Karl Williamson <public@khwilliamson.com> wrote:

> What I'm saying is there is no encoding in cp936 for those characters.

$ perl -MEncode -E'use utf8; $_="nǐtàiyánsù"; say sprintf "%v02X", encode "cp936", $_;'
6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4

Encode seems to think so?

@p5pRT p5pRT commented Jul 8, 2014

From @ikegami

On Tue, Jul 8, 2014 at 5:58 PM, Eric Brine <ikegami@adaelis.com> wrote:

> $ perl -MEncode -E'use utf8; $_="nǐtàiyánsù"; say sprintf "%v02X", encode "cp936", $_;'
> 6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4
>
> Encode seems to think so?

And so does the page you linked earlier. Lead byte A8:
http://msdn.microsoft.com/en-US/goglobal/gg675289

@p5pRT p5pRT commented Dec 16, 2017

From zefram@fysh.org

The encoding should be pretty irrelevant for the test program given.
If this were Unix I'd ask to compare perl's behaviour against cat for
the same input, using strace to see what the programs actually get.
But being Windows, that kind of debugging isn't available. I think the
weird behaviour seen must be specific to Windows; it doesn't look like
Perl behaviour at all.

-zefram
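
One way to approximate the cat comparison suggested above from within Perl is to read the handle with sysread, which skips PerlIO's buffered reads, and dump whatever arrives. This is a sketch under assumptions: `dump_raw` is a hypothetical helper, a pipe stands in for the console so the example is self-contained, and this thread does not establish whether the NULs also appear on this path on a Windows console.

```perl
use strict;
use warnings;

# Read a handle with sysread (no PerlIO buffering) and return one hex dump per chunk.
sub dump_raw {
    my ($fh) = @_;
    binmode $fh;
    my @chunks;
    while (1) {
        my $n = sysread($fh, my $buf, 4096);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # end of input
        push @chunks, sprintf('%v02X', $buf);
    }
    return @chunks;
}

# Self-contained demonstration: a pipe stands in for the console.
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "n\xA8\xABt\n");
close $writer;
print "$_\n" for dump_raw($reader);
```

To try it against a real terminal, call `dump_raw(\*STDIN)` instead of using the pipe and paste the test string.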

@toddr toddr commented Feb 13, 2020

> From @tonycoz
> I wonder if this is related to #13794
>
> Tony

Which was just closed.

@ikegami ikegami commented Feb 14, 2020

> > From @tonycoz
> > I wonder if this is related to #13794
> > Tony
>
> Which was just closed.

As previously stated, it's not related to #13794.

#13794 was fixed in Win10.

This problem still happens.

C:\Users\ikegami>chcp 936
Active code page: 936

C:\Users\ikegami>echo nǐtàiyánsù
nǐtàiyánsù

C:\Users\ikegami>echo nǐtàiyánsù | perl -ne"print"
nǐtàiyánsù

C:\Users\ikegami>perl -ne"print"
nǐtàiyánsù     <- pasted in
n t iy ns
^Z

C:\Users\ikegami>echo nǐtàiyánsù | perl -ne"printf qq{%v02X\n}, $_"
6E.C7.90.74.C3.A0.69.79.C3.A1.6E.73.C3.B9.20.0A

C:\Users\ikegami>perl -ne"printf qq{%v02X\n}, $_"
nǐtàiyánsù
6E.00.74.00.69.79.00.6E.73.00.0A
^Z
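
A note on reading the two hex dumps above: the bytes from the piped `echo` run are valid UTF-8 for "nǐtàiyánsù" followed by a space and a newline, not the CP936 bytes (A8.AB and so on) discussed earlier in the thread. The sketch below checks this with Encode; it says nothing about why the pasted run yields NULs.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes printed by the piped `echo` run above.
my $piped = "\x6E\xC7\x90\x74\xC3\xA0\x69\x79\xC3\xA1\x6E\x73\xC3\xB9\x20\x0A";

# FB_CROAK makes decode die on malformed input instead of substituting.
# decode may modify its source when a check value is given, so use a copy.
my $copy = $piped;
my $text = decode('UTF-8', $copy, Encode::FB_CROAK);

# Show the decoded code points.
print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";

# For comparison: the same characters encoded as CP936.
my $cp936 = encode('cp936', $text);
printf "cp936: %v02X\n", $cp936;
```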

@khwilliamson khwilliamson commented Feb 14, 2020

Thanks for this example.

What happens if in your paste example, you instead set a $scalar to it, and Devel::Peek Dump that scalar?

@ikegami ikegami commented Feb 16, 2020

> Thanks for this example.
>
> What happens if in your paste example, you instead set a $scalar to it, and Devel::Peek Dump that scalar?

As you would expect based on the printf %vX:

C:\Users\ikegami>perl -MDevel::Peek -wne"Dump($_)"
nǐtàiyánsù       <-- pasted in
SV = PV(0x114b8d8) at 0x27bbab0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x27b55a8 "n\0t\0iy\0ns\0\n"\0
  CUR = 11
  LEN = 81
^Z