Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The :crlf PerlIO layer doesn't like the :encoding layer. #8325

Closed
p5pRT opened this issue Feb 7, 2006 · 11 comments

Comments

@p5pRT
Copy link
Collaborator

@p5pRT p5pRT commented Feb 7, 2006

Migrated from rt.perl.org#38456 (status was 'resolved')

Searchable as RT38456$

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Feb 7, 2006

From ciaran@tnauk.org.uk

Created by ciaran@tnauk.org.uk

(note​: This report was compiled with perlbug, then sent manually with
Thunderbird as perlbug had trouble. However, I've made sure that any
long lines that matter are NOT wrapped.)

There seems to be a problem in the way that the "​:crlf" PerlIO layer is
implemented. I'm not entirely sure where the problem lies in :crlf, but
it doesn't play nicely with "​:encoding". I have observed this behaviour in​:

* Perl v5.8.7, Cygwin (prepackaged)
* Perl v5.8.7, Gentoo Linux (dev-lang/perl-5.8.7-r3)
* Perl v5.9.1, Cygwin (compiled)

Here is a test case​:

===CUT HERE===
#!/usr/bin/perl -w

open(FILE, ">​:encoding(UTF-16)​:crlf", "test-file");
print FILE "Test \x{A3}45!\n";
print FILE "Test!\n";
close(FILE);
===CUT HERE===

This should generate a valid UTF-16 file , containing "Test
£45!\r\nTest!\r\n", where the \r and \n characters represent \015 and
\012 respectively, and £ represents the character U+00A3, the British
pound symbol.

On a standard version of Perl, what actually happens is that I get a
message​:

Malformed UTF-8 character (unexpected continuation byte 0xa3, with no
preceding start byte) in null operation at ./utf16-test.pl line 6.

and the pound sign is replaced with a null character. Using a literal
(not interpolated) UTF-8 character sequence for U+00A3 (ie. C2 A3) in
the file seems to work fine. As noted above, while I am submitting this
report from Perl v5.8.7, I have compiled Perl v5.9.1 on Cygwin and
observed the same behaviour. The output of "perl5.9.1 -V" is included at
the end of the user-modifiable section.

One workaround seems to be to install PerlIO​::eol from CPAN and replace
the "​:crlf" layer with "​:eol(CRLF)". This works correctly, and test-file
contains what it should.

I have not mentioned ActiveState Perl in this report as I assume it is
not your responsibility. (For the record, ActiveState Perl v5.8.7 seems
to treat the layers as if they were reversed, so the inserted CR doesn't
get encoded, and thus corrupts the UTF-16 file by inserting a
single-byte character instead of a double-byte character.)

Thank you. Following is the output of "perl5.9.1 -V"​:

===CUT HERE===
ciaran@​IT-16​:~/utf8$ perl5.9.1 -V
Summary of my perl5 (revision 5 version 9 subversion 1) configuration​:
  Platform​:
  osname=cygwin, osvers=1.5.19(0.15042), archname=cygwin
  uname='cygwin_nt-5.1 it-16 1.5.19(0.15042) 2006-01-20 13​:28 i686
cygwin '
  config_args='-Dmksymlinks'
  hint=recommended, useposix=true, d_sigaction=define
  usethreads=undef useithreads=undef usemultiplicity=undef
  useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
  use64bitint=undef use64bitall=undef uselongdouble=undef
  usemymalloc=y, bincompat5005=undef
  Compiler​:
  cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing',
  optimize='-O2',
  cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing'
  ccversion='', gccversion='3.4.4 (cygming special) (gdc 0.12, using
dmd 0.125)', gccosandvers=''
  intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
  d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
  ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
  alignbytes=8, prototype=define
  Linker and Libraries​:
  ld='ld2', ldflags =' -L/usr/local/lib'
  libpth=/usr/local/lib /usr/lib /lib
  libs=-lgdbm -lcrypt -lgdbm_compat
  perllibs=-lcrypt -lgdbm_compat
  libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
  gnulibc_version=''
  Dynamic Linking​:
  dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
  cccdlflags=' ', lddlflags=' -L/usr/local/lib'

Characteristics of this binary (from libperl)​:
  Compile-time options​: USE_LARGE_FILES
  Built under cygwin
  Compiled at Feb 7 2006 09​:58​:18
  %ENV​:
  PERL5LIB="C​:/HATS/scripts/lib"
  CYGWIN=""
  @​INC​:
  C
  /HATS/scripts/lib
  /usr/local/lib/perl5/5.9.1/cygwin
  /usr/local/lib/perl5/5.9.1
  /usr/local/lib/perl5/site_perl/5.9.1/cygwin
  /usr/local/lib/perl5/site_perl/5.9.1
  /usr/local/lib/perl5/site_perl
  .
===CUT HERE===

Perl Info

Flags:
     category=core
     severity=medium

Site configuration information for perl v5.8.7:

Configured by gerrit at Fri Dec 30 02:40:15     2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
   Platform:
     osname=cygwin, osvers=1.5.18(0.13242), 
archname=cygwin-thread-multi-64int
     uname='cygwin_nt-5.1 inspiron 1.5.18(0.13242) 2005-07-02 20:30 i686 
unknown unknown cygwin '
     config_args='-de -Dmksymlinks -Duse64bitint -Dusethreads 
-Uusemymalloc -Doptimize=-O3 -Dman3ext=3pm -Dusesitecustomize'
     hint=recommended, useposix=true, d_sigaction=define
     usethreads=define use5005threads=undef useithreads=define 
usemultiplicity=define
     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
     use64bitint=define use64bitall=undef uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing 
-pipe -I/usr/local/include',
     optimize='-O3',
     cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe 
-I/usr/local/include'
     ccversion='', gccversion='3.4.4 (cygming special) (gdc 0.12, using 
dmd 0.125)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
     ivtype='long long', ivsize=8, nvtype='double', nvsize=8, 
Off_t='off_t', lseeksize=8
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='ld2', ldflags =' -s -L/usr/local/lib'
     libpth=/usr/local/lib /lib /usr/lib
     libs=-lgdbm -ldb -lcrypt -lgdbm_compat
     perllibs=-lcrypt -lgdbm_compat
     libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
     gnulibc_version=''
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' -s'
     cccdlflags=' ', lddlflags=' -s -L/usr/local/lib'

Locally applied patches:
     SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962


@INC for perl v5.8.7:
     C
     /HATS/scripts/lib
     /usr/lib/perl5/5.8/cygwin
     /usr/lib/perl5/5.8
     /usr/lib/perl5/site_perl/5.8/cygwin
     /usr/lib/perl5/site_perl/5.8
     /usr/lib/perl5/site_perl/5.8/cygwin
     /usr/lib/perl5/site_perl/5.8
     /usr/lib/perl5/vendor_perl/5.8/cygwin
     /usr/lib/perl5/vendor_perl/5.8
     /usr/lib/perl5/vendor_perl/5.8/cygwin
     /usr/lib/perl5/vendor_perl/5.8
     .


Environment for perl v5.8.7:
     HOME=/home/ciaran
     LANG=en_GB
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
 
PATH=/home/ciaran/bin:/usr/bin:/usr/local/bin:/bin:/usr/X11R6/bin:/cygdrive/c/Perl/bin/:/cygdrive/c/PHP/:/cygdrive/c/swig:/cygdrive/c/gnuwin/gnuwin32/bin:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/PROGRA~1/ULTRAE~1:/cygdrive/c/Program 
Files/MySQL/MySQL Server 5.0/bin:/cygdrive/c/Program 
Files/Subversion/bin:/cygdrive/c/Program Files/Common 
Files/GTK/2.0/bin:/cygdrive/c/Program Files/ActiveState Komodo 
3.1/:/cygdrive/c/binpath:/usr/bin
     PERL5LIB=C:/HATS/scripts/lib
     PERL_BADLANG (unset)
     SHELL (unset)

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 20, 2011

From @Leont

I've attached a possible fix for this bug. The patch is relative to my
patches for #82484 but it doesn't depend on it.

Currently, if a «​:crlf» layer is given it will first try to (re)enable
any crlf layer it can find or else push itself on the stack. This can
lead to data corruption. In this patch I've changed it to only check the
topmost layer.

Testing it would be very welcome, in particular on Windows (not my habitat).

Leon

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 20, 2011

From @Leont

0001-binmode-FH-crlf-only-modifies-top-crlf-layer.patch
From d65510200bc7d39254ff62c8d4240280dd988276 Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Thu, 20 Jan 2011 23:32:28 +0100
Subject: [PATCH] binmode FH, ":crlf" only modifies top crlf layer

When pushed on top of the stack, crlf will no longer enable crlf layers
lower in the stack. This will prevent unexpected results.
---
 perlio.c      |    5 ++---
 t/io/layers.t |   18 +++++++++++++++---
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/perlio.c b/perlio.c
index 3ce31d1..6ce142e 100644
--- a/perlio.c
+++ b/perlio.c
@@ -4513,7 +4513,7 @@ PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab)
        * any given moment at most one CRLF-capable layer being enabled
        * in the whole layer stack. */
 	 PerlIO *g = PerlIONext(f);
-	 while (PerlIOValid(g)) {
+	 if (PerlIOValid(g)) {
 	      PerlIOl *b = PerlIOBase(g);
 	      if (b && b->tab == &PerlIO_crlf) {
 		   if (!(b->flags & PERLIO_F_CRLF))
@@ -4521,8 +4521,7 @@ PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab)
 		   S_inherit_utf8_flag(g);
 		   PerlIO_pop(aTHX_ f);
 		   return code;
-	      }		  
-	      g = PerlIONext(g);
+	      }
 	 }
     }
     S_inherit_utf8_flag(f);
diff --git a/t/io/layers.t b/t/io/layers.t
index dea3d09..b0bcf1e 100644
--- a/t/io/layers.t
+++ b/t/io/layers.t
@@ -43,7 +43,7 @@ if (${^UNICODE} & 1) {
 } else {
     $UTF8_STDIN = 0;
 }
-my $NTEST = 45 - (($DOSISH || !$FASTSTDIO) ? 7 : 0) - ($DOSISH ? 5 : 0)
+my $NTEST = 55 - (($DOSISH || !$FASTSTDIO) ? 7 : 0) - ($DOSISH ? 7 : 0)
     + $UTF8_STDIN;
 
 sub PerlIO::F_UTF8 () { 0x00008000 } # from perliol.h
@@ -105,7 +105,7 @@ SKIP: {
 	    # 5 tests potentially skipped because
 	    # DOSISH systems already have a CRLF layer
 	    # which will make new ones not stick.
-	    @$expected = grep { $_ ne 'crlf' } @$expected;
+	    splice @$expected, 1, 1 if $expected->[1] eq 'crlf';
 	}
 	my $n = scalar @$expected;
 	is(scalar @$result, $n, "$id - layers == $n");
@@ -132,13 +132,25 @@ SKIP: {
 	  [ qw(stdio crlf) ],
 	  "open :crlf");
 
+    binmode(F, ":crlf");
+
+    check([ PerlIO::get_layers(F) ],
+	  [ qw(stdio crlf) ],
+	  "binmode :crlf");
+
     binmode(F, ":encoding(cp1047)"); 
 
     check([ PerlIO::get_layers(F) ],
 	  [ qw[stdio crlf encoding(cp1047) utf8] ],
 	  ":encoding(cp1047)");
+
+    binmode(F, ":crlf");
+
+    check([ PerlIO::get_layers(F) ],
+	  [ qw[stdio crlf encoding(cp1047) utf8 crlf utf8] ],
+	  ":encoding(cp1047):crlf");
     
-    binmode(F, ":pop");
+    binmode(F, ":pop:pop");
 
     check([ PerlIO::get_layers(F) ],
 	  [ qw(stdio crlf) ],
-- 
1.7.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 20, 2011

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 28, 2011

From @cpansprout

On Thu Jan 20 15​:04​:30 2011, LeonT wrote​:

I've attached a possible fix for this bug. The patch is relative to my
patches for #82484 but it doesn't depend on it.

Currently, if a «​:crlf» layer is given it will first try to (re)enable
any crlf layer it can find or else push itself on the stack. This can
lead to data corruption. In this patch I've changed it to only check the
topmost layer.

Testing it would be very welcome, in particular on Windows (not my
habitat).

Leon

Thank you. I have just applied this as 7826b36. Let’s see whether it
breaks Windows. :-)

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Jan 28, 2011

@cpansprout - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this Jan 28, 2011
@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 25, 2011

From @Leont

On Fri, Jan 28, 2011 at 6​:43 AM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Thu Jan 20 15​:04​:30 2011, LeonT wrote​:

I've attached a possible fix for this bug. The patch is relative to my
patches for #82484 but it doesn't depend on it.

Currently, if a «​:crlf» layer is given it will first try to (re)enable
any crlf layer it can find or else push itself on the stack. This can
lead to data corruption. In this patch I've changed it to only check the
topmost layer.

Testing it would be very welcome, in particular on Windows (not my
habitat).

Leon

Thank you. I have just applied this as 7826b36. Let’s see whether it
breaks Windows. :-)

Comment and documentation updates with regard to the previous patch.
No code changes.

Leon

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 25, 2011

From @Leont

0001-Update-comment-for-PerlIOCrlf_pushed-wrt-7826b36.patch
From 27e349d88293e66ca5dcaaa21f258e31fc51cc15 Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Fri, 25 Mar 2011 12:55:46 +0100
Subject: [PATCH 1/2] Update comment for PerlIOCrlf_pushed wrt 7826b36

7826b36fbbf24cfa659558ee5af3de424faa2d5a changed the behavior of
PerlIOCrlf_pushed but its comment wasn't updated along with it.
---
 perlio.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/perlio.c b/perlio.c
index 42bdb84..f2f8729 100644
--- a/perlio.c
+++ b/perlio.c
@@ -4544,10 +4544,8 @@ PerlIOCrlf_pushed(pTHX_ PerlIO *f, const char *mode, SV *arg, PerlIO_funcs *tab)
 		 PerlIOBase(f)->flags);
 #endif
     {
-      /* Enable the first CRLF capable layer you can find, but if none
-       * found, the one we just pushed is fine.  This results in at
-       * any given moment at most one CRLF-capable layer being enabled
-       * in the whole layer stack. */
+      /* If the old top layer is a CRLF layer, reactivate it (if
+       * necessary) and remove this new layer from the stack */
 	 PerlIO *g = PerlIONext(f);
 	 if (PerlIOValid(g)) {
 	      PerlIOl *b = PerlIOBase(g);
-- 
1.7.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 25, 2011

From @Leont

0002-Update-PerlIO-docs-for-crlf-wrt-7826b36f.patch
From aaa228e7b6721a6ebb009c0486665f6cec624e6e Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Fri, 25 Mar 2011 12:58:51 +0100
Subject: [PATCH 2/2] Update PerlIO docs for :crlf wrt 7826b36f

7826b36fbbf24cfa659558ee5af3de424faa2d5a changed the behavior of
PerlIOCrlf_pushed but :crlf's documentation wasn't updated along with
it.
---
 lib/PerlIO.pm |   16 ++--------------
 1 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/lib/PerlIO.pm b/lib/PerlIO.pm
index f4a0197..e76e0b4 100644
--- a/lib/PerlIO.pm
+++ b/lib/PerlIO.pm
@@ -85,24 +85,12 @@ C<:perlio> will insert a C<:unix> layer below itself to do low level IO.
 
 A layer that implements DOS/Windows like CRLF line endings.  On read
 converts pairs of CR,LF to a single "\n" newline character.  On write
-converts each "\n" to a CR,LF pair.  Note that this layer likes to be
-one of its kind: it silently ignores attempts to be pushed into the
-layer stack more than once.
+converts each "\n" to a CR,LF pair.  Note that this layer will silently
+refuse to be pushed on top of itself.
 
 It currently does I<not> mimic MS-DOS as far as treating of Control-Z
 as being an end-of-file marker.
 
-(Gory details follow) To be more exact what happens is this: after
-pushing itself to the stack, the C<:crlf> layer checks all the layers
-below itself to find the first layer that is capable of being a CRLF
-layer but is not yet enabled to be a CRLF layer.  If it finds such a
-layer, it enables the CRLFness of that other deeper layer, and then
-pops itself off the stack.  If not, fine, use the one we just pushed.
-
-The end result is that a C<:crlf> means "please enable the first CRLF
-layer you can find, and if you can't find one, here would be a good
-spot to place a new one."
-
 Based on the C<:perlio> layer.
 
 =item :mmap
-- 
1.7.1

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 25, 2011

From @cpansprout

On Fri Mar 25 05​:20​:25 2011, LeonT wrote​:

On Fri, Jan 28, 2011 at 6​:43 AM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Thu Jan 20 15​:04​:30 2011, LeonT wrote​:

I've attached a possible fix for this bug. The patch is relative to my
patches for #82484 but it doesn't depend on it.

Currently, if a «​:crlf» layer is given it will first try to (re)enable
any crlf layer it can find or else push itself on the stack. This can
lead to data corruption. In this patch I've changed it to only
check the
topmost layer.

Testing it would be very welcome, in particular on Windows (not my
habitat).

Leon

Thank you. I have just applied this as 7826b36. Let’s see whether it
breaks Windows. :-)

Comment and documentation updates with regard to the previous patch.
No code changes.

Thank you. Applied as 8dcd593 and 5da08ab.

@p5pRT

This comment has been minimized.

Copy link
Collaborator Author

@p5pRT p5pRT commented Mar 25, 2011

From [Unknown Contact. See original ticket]

On Fri Mar 25 05​:20​:25 2011, LeonT wrote​:

On Fri, Jan 28, 2011 at 6​:43 AM, Father Chrysostomos via RT
<perlbug-followup@​perl.org> wrote​:

On Thu Jan 20 15​:04​:30 2011, LeonT wrote​:

I've attached a possible fix for this bug. The patch is relative to my
patches for #82484 but it doesn't depend on it.

Currently, if a «​:crlf» layer is given it will first try to (re)enable
any crlf layer it can find or else push itself on the stack. This can
lead to data corruption. In this patch I've changed it to only
check the
topmost layer.

Testing it would be very welcome, in particular on Windows (not my
habitat).

Leon

Thank you. I have just applied this as 7826b36. Let’s see whether it
breaks Windows. :-)

Comment and documentation updates with regard to the previous patch.
No code changes.

Thank you. Applied as 8dcd593 and 5da08ab.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.