Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use STOP_AT_PARTIAL in PerlIO::encoding #16345

Closed
p5pRT opened this issue Dec 29, 2017 · 9 comments
Closed

Always use STOP_AT_PARTIAL in PerlIO::encoding #16345

p5pRT opened this issue Dec 29, 2017 · 9 comments

Comments

@p5pRT
Copy link

@p5pRT p5pRT commented Dec 29, 2017

Migrated from rt.perl.org#132665 (status was 'open')

Searchable as RT132665$

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Dec 29, 2017

From @Leont

Created by @Leont

PerlIO​::encoding has a $fallback variable that allows one to set the
behavior on a encoding/decoding error, for example to make it throw an
exception on error. What is not documented (actually the example in the
documentation is even missing this) is that PerlIO​::encoding needs the
(equally undocumented) Encode​::STOP_AT_PARTIAL flag to be set, otherwise
a multi-byte character spanning buffer boundaries will be interpreted as
two invalid byte sequences. I could have fixed the documentation, but
instead I fixed the code to always pass this flag to Encode, simplifying
the use and making the current documentation correct again.

Perl Info

Flags:
    category=library
    severity=low
    module=PerlIO::encoding

Site configuration information for perl 5.24.0:

Configured by leont at Thu Jun  9 10:14:26 CEST 2016.

Summary of my perl5 (revision 5 version 24 subversion 0) configuration:

  Platform:
    osname=linux, osvers=4.5.4-1-arch, archname=x86_64-linux-thread-multi
    uname='linux leon-laptop 4.5.4-1-arch #1 smp preempt wed may 11
22:21:28 cest 2016 x86_64 gnulinux '
    config_args='-de
-Dprefix=/home/leont/perl5/perlbrew/perls/perl-5.24.0 -Dusethreads
-Aeval:scriptdir=/home/leont/perl5/perlbrew/perls/perl-5.24.0/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv
-fno-strict-aliasing -pipe -fstack-protector-strong
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing
-pipe -fstack-protector-strong -I/usr/local/include'
    ccversion='', gccversion='6.1.1 20160501', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8,
byteorder=12345678, doublekind=3
    d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=16, longdblkind=3
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib
/usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/include-fixed /usr/lib
/lib/../lib /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.23.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.23'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector-strong'

Locally applied patches:
    Devel::PatchPerl 1.40


@INC for perl 5.24.0:
    /home/leont/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/x86_64-linux-thread-multi
    /home/leont/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0
    /home/leont/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0/x86_64-linux-thread-multi
    /home/leont/perl5/perlbrew/perls/perl-5.24.0/lib/5.24.0
    .


Environment for perl 5.24.0:
    HOME=/home/leont
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    LC_MEASUREMENT=en_IE.UTF-8
    LC_MESSAGES=POSIX
    LC_MONETARY=en_IE.UTF-8
    LC_PAPER=en_IE.UTF-8
    LC_TIME=en_IE.UTF-8
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/leont/perl5/perlbrew/bin:/home/leont/perl5/perlbrew/perls/perl-5.24.0/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERLBREW_HOME=/home/leont/.perlbrew
    PERLBREW_MANPATH=/home/leont/perl5/perlbrew/perls/perl-5.24.0/man
    PERLBREW_PATH=/home/leont/perl5/perlbrew/bin:/home/leont/perl5/perlbrew/perls/perl-5.24.0/bin
    PERLBREW_PERL=perl-5.24.0
    PERLBREW_ROOT=/home/leont/perl5/perlbrew
    PERLBREW_SHELLRC_VERSION=0.82
    PERLBREW_VERSION=0.82
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Dec 29, 2017

From @Leont

0001-PerlIO-encoding-has-a-fallback-variable-that-allows-.patch
From a64eaf63b9e70e502d64f489a217fd6c4ede6c31 Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Thu, 28 Dec 2017 19:23:03 +0100
Subject: [PATCH] PerlIO::encoding has a $fallback variable that allows one to
 set the behavior on a encoding/decoding error, for example to make it throw
 an exception on error. What is not documented (actually the example in the
 documentation is even missing this) is that PerlIO::encoding needs the
 (equally undocumented) Encode::STOP_AT_PARTIAL flag to be set, otherwise a
 multi-byte character spanning buffer boundaries will be interpreted as two
 invalid byte sequences. I could have fixed the documentation, but instead I
 fixed the code to always pass this flag to Encode, simplifying the use and
 making the current documentation correct again.

---
 ext/PerlIO-encoding/encoding.pm |  5 ++---
 ext/PerlIO-encoding/encoding.xs | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/ext/PerlIO-encoding/encoding.pm b/ext/PerlIO-encoding/encoding.pm
index 08d2df4713..c3b36df2ee 100644
--- a/ext/PerlIO-encoding/encoding.pm
+++ b/ext/PerlIO-encoding/encoding.pm
@@ -1,7 +1,7 @@
 package PerlIO::encoding;
 
 use strict;
-our $VERSION = '0.25';
+our $VERSION = '0.26';
 our $DEBUG = 0;
 $DEBUG and warn __PACKAGE__, " called by ", join(", ", caller), "\n";
 
@@ -13,8 +13,7 @@ $DEBUG and warn __PACKAGE__, " called by ", join(", ", caller), "\n";
 require XSLoader;
 XSLoader::load();
 
-our $fallback =
-    Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::STOP_AT_PARTIAL();
+our $fallback = Encode::PERLQQ()|Encode::WARN_ON_ERR();
 
 1;
 __END__
diff --git a/ext/PerlIO-encoding/encoding.xs b/ext/PerlIO-encoding/encoding.xs
index bb4754f3d9..3445082a7c 100644
--- a/ext/PerlIO-encoding/encoding.xs
+++ b/ext/PerlIO-encoding/encoding.xs
@@ -5,6 +5,10 @@
 #define U8 U8
 
 #define OUR_DEFAULT_FB	"Encode::PERLQQ"
+#define OUR_STOP_AT_PARTIAL "Encode::STOP_AT_PARTIAL"
+
+/* This will be set during BOOT */
+static unsigned int encode_stop_at_partial = 0;
 
 #if defined(USE_PERLIO)
 
@@ -164,6 +168,7 @@ PerlIOEncode_pushed(pTHX_ PerlIO * f, const char *mode, SV * arg, PerlIO_funcs *
     }
 
     e->chk = newSVsv(get_sv("PerlIO::encoding::fallback", 0));
+    SvIV_set(e->chk, SvIV(e->chk) | encode_stop_at_partial);
     e->inEncodeCall = 0;
 
     FREETMPS;
@@ -685,6 +690,16 @@ BOOT:
     }
     SPAGAIN;
     sv_setsv(chk, POPs);
+
+    PUSHMARK(sp);
+    PUTBACK;
+    if (call_pv(OUR_STOP_AT_PARTIAL, G_SCALAR) != 1) {
+	    /* should never happen */
+	    Perl_die(aTHX_ "%s did not return a value", OUR_STOP_AT_PARTIAL);
+    }
+    SPAGAIN;
+    encode_stop_at_partial = POPu;
+
     PUTBACK;
 #ifdef PERLIO_LAYERS
     PerlIO_define_layer(aTHX_ PERLIO_FUNCS_CAST(&PerlIO_encode));
-- 
2.15.1

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 4, 2018

From @Leont

On Fri, 29 Dec 2017 05​:11​:38 -0800, LeonT wrote​:

PerlIO​::encoding has a $fallback variable that allows one to set the
behavior on a encoding/decoding error, for example to make it throw an
exception on error. What is not documented (actually the example in
the
documentation is even missing this) is that PerlIO​::encoding needs the
(equally undocumented) Encode​::STOP_AT_PARTIAL flag to be set,
otherwise
a multi-byte character spanning buffer boundaries will be interpreted
as
two invalid byte sequences. I could have fixed the documentation, but
instead I fixed the code to always pass this flag to Encode,
simplifying
the use and making the current documentation correct again.

It turned out that Encode allows one to pass a coderef instead of a set of flags to handle. This however doesn't allow one to pass STOP_AT_PARTIAL, which means it has always been buggy on buffer boundaries. With my new automatic STOP_AT_PARTIAL passing this would result in an unpredictable value (based on the pointer value of the CV), this is clearly undesirable.

Instead we now disallow it in PerlIO​::encoding. This could be changed at some point in the future once Encode adds support for it.

Leon

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 4, 2018

From @Leont

0001-Enforce-STOP_AT_PARTIAL-in-PerlIO-encoding-fallback.patch
From 35d99902af4832a40c4aa9d88895f98aa0b22755 Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Thu, 28 Dec 2017 19:23:03 +0100
Subject: [PATCH 1/2] Enforce STOP_AT_PARTIAL in $PerlIO::encoding::fallback

PerlIO::encoding has a $fallback variable that allows one to set the
behavior on a encoding/decoding error, for example to make it throw an
exception on error. What is not documented (actually the example in the
documentation is even missing this) is that PerlIO::encoding needs the
(equally undocumented) Encode::STOP_AT_PARTIAL flag to be set, otherwise
a multi-byte character spanning buffer boundaries will be interpreted as
two invalid byte sequences. I could have fixed the documentation, but
instead I fixed the code to always pass this flag to Encode, simplifying
the use and making the current documentation correct again.
---
 ext/PerlIO-encoding/encoding.pm |  5 ++---
 ext/PerlIO-encoding/encoding.xs | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/ext/PerlIO-encoding/encoding.pm b/ext/PerlIO-encoding/encoding.pm
index 08d2df4713..c3b36df2ee 100644
--- a/ext/PerlIO-encoding/encoding.pm
+++ b/ext/PerlIO-encoding/encoding.pm
@@ -1,7 +1,7 @@
 package PerlIO::encoding;
 
 use strict;
-our $VERSION = '0.25';
+our $VERSION = '0.26';
 our $DEBUG = 0;
 $DEBUG and warn __PACKAGE__, " called by ", join(", ", caller), "\n";
 
@@ -13,8 +13,7 @@ $DEBUG and warn __PACKAGE__, " called by ", join(", ", caller), "\n";
 require XSLoader;
 XSLoader::load();
 
-our $fallback =
-    Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::STOP_AT_PARTIAL();
+our $fallback = Encode::PERLQQ()|Encode::WARN_ON_ERR();
 
 1;
 __END__
diff --git a/ext/PerlIO-encoding/encoding.xs b/ext/PerlIO-encoding/encoding.xs
index bb4754f3d9..39ac40f182 100644
--- a/ext/PerlIO-encoding/encoding.xs
+++ b/ext/PerlIO-encoding/encoding.xs
@@ -5,6 +5,10 @@
 #define U8 U8
 
 #define OUR_DEFAULT_FB	"Encode::PERLQQ"
+#define OUR_STOP_AT_PARTIAL "Encode::STOP_AT_PARTIAL"
+
+/* This will be set during BOOT */
+static unsigned int encode_stop_at_partial = 0;
 
 #if defined(USE_PERLIO)
 
@@ -164,6 +168,7 @@ PerlIOEncode_pushed(pTHX_ PerlIO * f, const char *mode, SV * arg, PerlIO_funcs *
     }
 
     e->chk = newSVsv(get_sv("PerlIO::encoding::fallback", 0));
+    SvUV_set(e->chk, SvUV(e->chk) | encode_stop_at_partial);
     e->inEncodeCall = 0;
 
     FREETMPS;
@@ -685,6 +690,16 @@ BOOT:
     }
     SPAGAIN;
     sv_setsv(chk, POPs);
+
+    PUSHMARK(sp);
+    PUTBACK;
+    if (call_pv(OUR_STOP_AT_PARTIAL, G_SCALAR) != 1) {
+	    /* should never happen */
+	    Perl_die(aTHX_ "%s did not return a value", OUR_STOP_AT_PARTIAL);
+    }
+    SPAGAIN;
+    encode_stop_at_partial = POPu;
+
     PUTBACK;
 #ifdef PERLIO_LAYERS
     PerlIO_define_layer(aTHX_ PERLIO_FUNCS_CAST(&PerlIO_encode));
-- 
2.15.0-291-g0d8980c

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 4, 2018

From @Leont

0002-Disallow-coderef-in-PerlIO-encoding-fallback.patch
From f4e72dbfa4b0d957dd2dcfcf63b615ddd38ae786 Mon Sep 17 00:00:00 2001
From: Leon Timmermans <fawaka@gmail.com>
Date: Thu, 4 Jan 2018 19:56:03 +0100
Subject: [PATCH 2/2] Disallow coderef in $PerlIO::encoding::fallback

Encode allows one to pass a coderef instead of a set of flags to handle.
This however doesn't allow one to pass STOP_AT_PARTIAL, which means it
has always been buggy on buffer boundaries. With my new automatic
STOP_AT_PARTIAL passing this would result in an unpredictable value.
Instead we now disallow it in PerlIO::encoding.
---
 ext/PerlIO-encoding/encoding.xs | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ext/PerlIO-encoding/encoding.xs b/ext/PerlIO-encoding/encoding.xs
index 39ac40f182..66728734d6 100644
--- a/ext/PerlIO-encoding/encoding.xs
+++ b/ext/PerlIO-encoding/encoding.xs
@@ -168,6 +168,8 @@ PerlIOEncode_pushed(pTHX_ PerlIO * f, const char *mode, SV * arg, PerlIO_funcs *
     }
 
     e->chk = newSVsv(get_sv("PerlIO::encoding::fallback", 0));
+    if (SvROK(e->chk))
+	Perl_croak(aTHX_ "PerlIO::encoding::fallback must be an integer");
     SvUV_set(e->chk, SvUV(e->chk) | encode_stop_at_partial);
     e->inEncodeCall = 0;
 
-- 
2.15.0-291-g0d8980c

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Dec 6, 2018

From @jkeenan

On Sun, 04 Feb 2018 12​:09​:44 GMT, LeonT wrote​:

On Fri, 29 Dec 2017 05​:11​:38 -0800, LeonT wrote​:

PerlIO​::encoding has a $fallback variable that allows one to set the
behavior on a encoding/decoding error, for example to make it throw
an
exception on error. What is not documented (actually the example in
the
documentation is even missing this) is that PerlIO​::encoding needs
the
(equally undocumented) Encode​::STOP_AT_PARTIAL flag to be set,
otherwise
a multi-byte character spanning buffer boundaries will be interpreted
as
two invalid byte sequences. I could have fixed the documentation, but
instead I fixed the code to always pass this flag to Encode,
simplifying
the use and making the current documentation correct again.

It turned out that Encode allows one to pass a coderef instead of a
set of flags to handle. This however doesn't allow one to pass
STOP_AT_PARTIAL, which means it has always been buggy on buffer
boundaries. With my new automatic STOP_AT_PARTIAL passing this would
result in an unpredictable value (based on the pointer value of the
CV), this is clearly undesirable.

Instead we now disallow it in PerlIO​::encoding. This could be changed
at some point in the future once Encode adds support for it.

Leon

Would anyone like to comment on these two patches?

Thank you very much.
--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Dec 6, 2018

The RT System itself - Status changed from 'new' to 'open'

@Leont
Copy link
Contributor

@Leont Leont commented Feb 12, 2021

This has been implemented in #18496

@Leont Leont closed this Feb 12, 2021
@demerphq
Copy link
Contributor

@demerphq demerphq commented Feb 12, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants