[RFC] feature sysio_bytes #16732

Closed
p5pRT opened this issue Oct 23, 2018 · 18 comments

@p5pRT p5pRT commented Oct 23, 2018

Migrated from rt.perl.org#133610 (status was 'rejected')

Searchable as RT133610$

@p5pRT p5pRT commented Oct 23, 2018

From @tonycoz

In [perl #125760] I suggested obsoleting sysread and syswrite on :utf8
handles, and after some discussion on #p5p, also deprecated send and
recv on :utf8 handles.

With 5c0551a the deprecation was
carried through and these operators now croak when used on a :utf8
handle.

So how can we get to a saner behaviour for these operators without
silently changing the behaviour of existing code?

The attached patches add a new feature that prevents these operators
from croaking when used on a :utf8 handle, but also makes them work in
bytes, rather than the sketchy way they did before.

This feature is currently not part of any version feature bundles, but
this could change.

Tony
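
The effect the feature aims for can be approximated on an unpatched perl. The following is only a sketch, not code from the patch: `sysread_bytes` is a hypothetical helper that temporarily clears the `:utf8` flag with `binmode`, reads raw bytes, and then sets the flag again. It assumes the handle carries nothing but the plain `:utf8` flag; an `:encoding(...)` layer would be popped by `binmode` and not restored.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of the patch): read $len raw bytes into the
# caller's buffer, whatever the handle's :utf8 flag says.
sub sysread_bytes {
    my ($fh, undef, $len) = @_;          # the buffer is aliased as $_[1]
    my $had_utf8 = grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
    binmode($fh) if $had_utf8;           # clear :utf8 for the raw read
    my $n = sysread($fh, $_[1], $len);
    binmode($fh, ':utf8') if $had_utf8;  # put the flag back
    return $n;
}

# Scratch file holding three bytes that are not valid UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\x80\xC1\xFF";
close $out or die "close: $!";

open my $in, '<:raw:utf8', $path or die "open $path: $!";
my $buf;
my $count = sysread_bytes($in, $buf, 3);
my $still_utf8 = grep { $_ eq 'utf8' } PerlIO::get_layers($in);
```

Unlike the proposed feature, this helper changes the handle's state for the duration of the call, so it is not a drop-in equivalent, only an illustration of "work in bytes regardless of `:utf8`".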

@p5pRT p5pRT commented Oct 23, 2018

From @tonycoz

0001-add-feature-sysio_bytes.patch
From 6471b9f762528da33c242691992d5a42f7675f1e Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Wed, 17 Oct 2018 09:18:42 +1100
Subject: add feature sysio_bytes

Calling sysread(), syswrite(), recv(), send() on a :utf8 handle
currently throws an exception, due to the strangeness of their old
behaviour.

This feature allows those operators to be called on :utf8 handles, but
makes them ignore the :utf8 flag, always reading or writing bytes.
---
 feature.h          |  6 ++++++
 lib/feature.pm     | 12 ++++++++++--
 pod/perldiag.pod   | 13 ++++++++++---
 pod/perlfunc.pod   | 14 +++++++++-----
 pp_sys.c           | 29 +++++++++++++++++++----------
 regen/feature.pl   | 10 +++++++++-
 t/io/socket.t      | 27 ++++++++++++++++++++++-----
 t/lib/croak/pp_sys | 10 ++++++++--
 t/op/sysio.t       | 21 ++++++++++++++++++++-
 9 files changed, 113 insertions(+), 29 deletions(-)

diff --git a/feature.h b/feature.h
index 52ace09f6d..96bcdd5006 100644
--- a/feature.h
+++ b/feature.h
@@ -98,6 +98,12 @@
 	 FEATURE_IS_ENABLED("refaliasing") \
     )
 
+#define FEATURE_SYSIO_BYTES_IS_ENABLED \
+    ( \
+	CURRENT_FEATURE_BUNDLE == FEATURE_BUNDLE_CUSTOM && \
+	 FEATURE_IS_ENABLED("sysio_bytes") \
+    )
+
 #define FEATURE_POSTDEREF_QQ_IS_ENABLED \
     ( \
 	(CURRENT_FEATURE_BUNDLE >= FEATURE_BUNDLE_523 && \
diff --git a/lib/feature.pm b/lib/feature.pm
index 0301aa5935..7a8fff3a00 100644
--- a/lib/feature.pm
+++ b/lib/feature.pm
@@ -5,7 +5,7 @@
 
 package feature;
 
-our $VERSION = '1.54';
+our $VERSION = '1.55';
 
 our %feature = (
     fc              => 'feature_fc',
@@ -17,6 +17,7 @@ our %feature = (
     signatures      => 'feature_signatures',
     current_sub     => 'feature___SUB__',
     refaliasing     => 'feature_refaliasing',
+    sysio_bytes     => 'feature_sysio_bytes',
     postderef_qq    => 'feature_postderef_qq',
     unicode_eval    => 'feature_unieval',
     declared_refs   => 'feature_myref',
@@ -29,7 +30,7 @@ our %feature_bundle = (
     "5.15"    => [qw(current_sub evalbytes fc say state switch unicode_eval unicode_strings)],
     "5.23"    => [qw(current_sub evalbytes fc postderef_qq say state switch unicode_eval unicode_strings)],
     "5.27"    => [qw(bitwise current_sub evalbytes fc postderef_qq say state switch unicode_eval unicode_strings)],
-    "all"     => [qw(bitwise current_sub declared_refs evalbytes fc postderef_qq refaliasing say signatures state switch unicode_eval unicode_strings)],
+    "all"     => [qw(bitwise current_sub declared_refs evalbytes fc postderef_qq refaliasing say signatures state switch sysio_bytes unicode_eval unicode_strings)],
     "default" => [qw()],
 );
 
@@ -348,6 +349,13 @@ Reference to a Variable> for examples.
 
 This feature is available from Perl 5.26 onwards.
 
+=head2 The 'sysio_bytes' feature
+
+This allows the C<sysread>, C<syswrite>, C<recv> and C<send> operators
+to work on file handles that have the C<:utf8> flag, B<but> makes them
+operator in bytes, just as they do for handles without the C<:utf8>
+flag.
+
 =head1 FEATURE BUNDLES
 
 It's possible to load multiple features together, using
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index 82d3e4e768..02f3a262df 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3218,9 +3218,16 @@ Similarly, syswrite() and send() used only the C<:utf8> flag, otherwise ignoring
 any layers.  If the flag is set, both wrote the value UTF-8 encoded, even if
 the layer is some different encoding, such as the example above.
 
-Ideally, all of these operators would completely ignore the C<:utf8> state,
-working only with bytes, but this would result in silently breaking existing
-code.
+You can prevent this error by calling the operator within the scope
+of:
+
+  use feature 'sysio_bytes';
+
+B<but> this changes the behaviour from older perls so that these
+operators always work in bytes, rather than the older behaviour.
+
+Ideally, this would be the default, but this may result in silently
+breaking existing code.
 
 =item "%s" is more clearly written simply as "%s" in regex; marked by S<<-- HERE> in m/%s/
 
diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 9394e22343..2f47f41f00 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -6284,7 +6284,8 @@ This call is actually implemented in terms of the L<recvfrom(2)> system call.
 See L<perlipc/"UDP: Message Passing"> for examples.
 
 Note that if the socket has been marked as C<:utf8>, C<recv> will
-throw an exception.  The C<:encoding(...)> layer implicitly introduces
+throw an exception unless called within the scope of
+C< use feature "sysio_bytes"; >.  The C<:encoding(...)> layer implicitly introduces
 the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item redo LABEL
@@ -7078,7 +7079,8 @@ or the undefined value on error.  The L<sendmsg(2)> syscall is currently
 unimplemented.  See L<perlipc/"UDP: Message Passing"> for examples.
 
 Note that if the socket has been marked as C<:utf8>, C<send> will
-throw an exception.  The C<:encoding(...)> layer implicitly introduces
+throw an exception unless called within the scope of
+C< use feature "sysio_bytes"; >.  The C<:encoding(...)> layer implicitly introduces
 the C<:utf8> layer.  See L<C<binmode>|/binmode FILEHANDLE, LAYER>.
 
 =item setpgrp PID,PGRP
@@ -8712,8 +8714,9 @@ L<C<eof>|/eof FILEHANDLE> doesn't work well on device files (like ttys)
 anyway.  Use L<C<sysread>|/sysread FILEHANDLE,SCALAR,LENGTH,OFFSET> and
 check for a return value of 0 to decide whether you're done.
 
-Note that if the filehandle has been marked as C<:utf8>, C<sysread> will
-throw an exception.  The C<:encoding(...)> layer implicitly
+If the filehandle has been marked as C<:utf8>, C<sysread> will
+throw an exception unless it's called within the scope of
+C< use feature "sysio_bytes"; >.  The C<:encoding(...)> layer implicitly
 introduces the C<:utf8> layer.  See
 L<C<binmode>|/binmode FILEHANDLE, LAYER>,
 L<C<open>|/open FILEHANDLE,EXPR>, and the L<open> pragma.
@@ -8874,7 +8877,8 @@ string other than the beginning.  A negative OFFSET specifies writing
 that many characters counting backwards from the end of the string.
 If SCALAR is of length zero, you can only use an OFFSET of 0.
 
-B<WARNING>: If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception.
+If the filehandle is marked C<:utf8>, C<syswrite> will raise an exception,
+unless it is called within the scope of C< use feature "sysio_bytes"; >.
 The C<:encoding(...)> layer implicitly introduces the C<:utf8> layer.
 Alternately, if the handle is not marked with an encoding but you
 attempt to write characters with code points over 255, raises an exception.
diff --git a/pp_sys.c b/pp_sys.c
index 00faa7711f..fd85e752e7 100644
--- a/pp_sys.c
+++ b/pp_sys.c
@@ -30,6 +30,7 @@
 #define PERL_IN_PP_SYS_C
 #include "perl.h"
 #include "time64.h"
+#include "feature.h"
 
 #ifdef I_SHADOW
 /* Shadow password support for solaris - pdo@cs.umd.edu
@@ -1725,16 +1726,26 @@ PP(pp_sysread)
 
     if ((fp_utf8 = PerlIO_isutf8(IoIFP(io))) && !IN_BYTES) {
         if (PL_op->op_type == OP_SYSREAD || PL_op->op_type == OP_RECV) {
-            Perl_croak(aTHX_
-                       "%s() isn't allowed on :utf8 handles",
-                       OP_DESC(PL_op));
+            if (FEATURE_SYSIO_BYTES_IS_ENABLED) {
+                /* treat the handle as non-UTF8 for sysread() */
+                fp_utf8 = 0;
+                goto bytes;
+            }
+            else {
+                Perl_croak(aTHX_
+                           "%s() isn't allowed on :utf8 handles",
+                           OP_DESC(PL_op));
+            }
+        }
+        else {
+            buffer = SvPVutf8_force(bufsv, blen);
+            /* UTF-8 may not have been set if they are all low bytes */
+            SvUTF8_on(bufsv);
+            buffer_utf8 = 0;
         }
-	buffer = SvPVutf8_force(bufsv, blen);
-	/* UTF-8 may not have been set if they are all low bytes */
-	SvUTF8_on(bufsv);
-	buffer_utf8 = 0;
     }
     else {
+    bytes:
 	buffer = SvPV_force(bufsv, blen);
 	buffer_utf8 = !IN_BYTES && SvUTF8(bufsv);
     }
@@ -1776,8 +1787,6 @@ PP(pp_sysread)
 	SvCUR_set(bufsv, count);
 	*SvEND(bufsv) = '\0';
 	(void)SvPOK_only(bufsv);
-	if (fp_utf8)
-	    SvUTF8_on(bufsv);
 	SvSETMAGIC(bufsv);
 	/* This should not be marked tainted if the fp is marked clean */
 	if (!(IoFLAGS(io) & IOf_UNTAINT))
@@ -1985,7 +1994,7 @@ PP(pp_syswrite)
     buffer = SvPV_const(bufsv, blen);
     doing_utf8 = DO_UTF8(bufsv);
 
-    if (PerlIO_isutf8(IoIFP(io))) {
+    if (PerlIO_isutf8(IoIFP(io)) && !FEATURE_SYSIO_BYTES_IS_ENABLED) {
         Perl_croak(aTHX_
                    "%s() isn't allowed on :utf8 handles",
                    OP_DESC(PL_op));
diff --git a/regen/feature.pl b/regen/feature.pl
index 89d46af907..665b852961 100755
--- a/regen/feature.pl
+++ b/regen/feature.pl
@@ -35,6 +35,7 @@ my %feature = (
     unicode_strings => 'unicode',
     fc              => 'fc',
     signatures      => 'signatures',
+    sysio_bytes     => 'sysio_bytes',
 );
 
 # NOTE: If a feature is ever enabled in a non-contiguous range of Perl
@@ -375,7 +376,7 @@ read_only_bottom_close_and_rename($h);
 __END__
 package feature;
 
-our $VERSION = '1.54';
+our $VERSION = '1.55';
 
 FEATURES
 
@@ -660,6 +661,13 @@ Reference to a Variable> for examples.
 
 This feature is available from Perl 5.26 onwards.
 
+=head2 The 'sysio_bytes' feature
+
+This allows the C<sysread>, C<syswrite>, C<recv> and C<send> operators
+to work on file handles that have the C<:utf8> flag, B<but> makes them
+operator in bytes, just as they do for handles without the C<:utf8>
+flag.
+
 =head1 FEATURE BUNDLES
 
 It's possible to load multiple features together, using
diff --git a/t/io/socket.t b/t/io/socket.t
index be3abc0e1e..d9807048ee 100644
--- a/t/io/socket.t
+++ b/t/io/socket.t
@@ -169,10 +169,17 @@ SKIP: {
             binmode $accept, ':raw:utf8';
             ok(!eval { send($accept, "ABC", 0); 1 },
                "should die on send to :utf8 socket");
-            binmode $accept;
             # check bytes will be sent
             utf8::upgrade($send_data);
 	    my $sent_total = 0;
+            {
+                use feature 'sysio_bytes';
+                my $sent;
+                ok(eval { $sent = send($accept, $send_data, 0); 1 },
+                   "can send to :utf8 under sysio_bytes");
+                $sent_total += $sent;
+            }
+            binmode $accept;
 	    while ($sent_total < length $send_data) {
 		my $sent = send($accept, substr($send_data, $sent_total), 0);
 		defined $sent or last;
@@ -184,13 +191,13 @@ SKIP: {
 	    # transit on a certain broken implementation
 	    <$accept>;
 	    # child tests are printed once we hit eof
-	    curr_test(curr_test()+6);
+	    curr_test(curr_test()+7);
 	    waitpid($pid, 0);
 
 	    ok($shutdown, "shutdown() works");
 	}
 	elsif (defined $pid) {
-	    curr_test(curr_test()+3);
+	    curr_test(curr_test()+4);
 	    #sleep 1;
 	    # child
 	    ok_child(close($serv), "close server socket in child");
@@ -205,8 +212,12 @@ SKIP: {
             ok_child(!eval { recv($child, $buf, 1000, 0); 1 },
                      "recv on :utf8 should die");
             is_child($buf, "", "buf shouldn't contain anything");
+            {
+                use feature "sysio_bytes";
+                ok_child(eval { recv($child, $buf, 1000, 0); 1 },
+                         "recv under sysio_bytes on :utf8 doesn't die");
+            }
             binmode $child;
-	    my $recv_peer = recv($child, $buf, 1000, 0);
 	    while(defined recv($child, my $tmp, 1000, 0)) {
 		last if length $tmp == 0;
 		$buf .= $tmp;
@@ -277,11 +288,17 @@ sub ok_child {
     push @child_tests, ( $ok ? "ok " : "not ok ") . curr_test() . " - $note "
 	. ( $TODO ? "# TODO $TODO" : "" ) . "\n";
     curr_test(curr_test()+1);
+    $ok;
 }
 
 sub is_child {
     my ($got, $want, $note) = @_;
-    ok_child($got eq $want, $note);
+    unless (ok_child($got eq $want, $note)) {
+        $got =~ s/([^[:print:]])/ sprintf("\\x%02x", ord $1) /ge;
+        $want =~ s/([^[:print:]])/ sprintf("\\x%02x", ord $1) /ge;
+        push @child_tests, "#  got: $got (length ".length($got). ")\n",
+          "# want: $want (length ".length($want). ")\n";
+    }
 }
 
 sub end_child {
diff --git a/t/lib/croak/pp_sys b/t/lib/croak/pp_sys
index be100da27a..464c2ba65b 100644
--- a/t/lib/croak/pp_sys
+++ b/t/lib/croak/pp_sys
@@ -79,17 +79,23 @@ open my $fh, "<:raw", "../harness" or die "# $!";
 my $buf;
 sysread $fh, $buf, 10;
 binmode $fh, ':utf8';
+use feature "sysio_bytes";
+sysread $fh, $buf, 10;
+no feature "sysio_bytes";
 sysread $fh, $buf, 10;
 EXPECT
-sysread() isn't allowed on :utf8 handles at - line 5.
+sysread() isn't allowed on :utf8 handles at - line 8.
 ########
 # NAME syswrite() disallowed on :utf8
 my $file = "syswwarn.tmp";
 open my $fh, ">:raw", $file or die "# $!";
 syswrite $fh, 'ABC';
 binmode $fh, ':utf8';
+use feature "sysio_bytes";
+syswrite $fh, 'ABC';
+no feature "sysio_bytes";
 syswrite $fh, 'ABC';
 close $fh;
 END { unlink $file; }
 EXPECT
-syswrite() isn't allowed on :utf8 handles at - line 5.
+syswrite() isn't allowed on :utf8 handles at - line 8.
diff --git a/t/op/sysio.t b/t/op/sysio.t
index c6d9bd8917..68ec4c49fc 100644
--- a/t/op/sysio.t
+++ b/t/op/sysio.t
@@ -6,7 +6,7 @@ BEGIN {
   set_up_inc('../lib');
 }
 
-plan tests => 45;
+plan tests => 52;
 
 open(I, 'op/sysio.t') || die "sysio.t: cannot find myself: $!";
 binmode I;
@@ -219,6 +219,25 @@ ok(not defined sysseek(I, -1, 1));
 
 close(I);
 
+{
+    use feature "sysio_bytes";
+    open my $f, ">:raw:utf8", $outfile
+      or die "Cannot open $outfile: $!";
+    my $abc = "\x80\xC1\xFF";
+    is(syswrite($f, $abc), length $abc, "syswrite to :utf8 with sysio_bytes");
+    utf8::upgrade($abc);
+    is(syswrite($f, $abc), length $abc, "syswrite to :utf8 with sysio_bytes");
+    close $f;
+    open $f, "<:raw:utf8", $outfile
+      or die "Cannot open $outfile; $!";
+    my $x;
+    is(sysread($f, $x, 6), 6, "sysread from :utf8 with sysio_bytes");
+    is($x, "$abc$abc", "check we read as bytes");
+    is(sysseek($f, 0, 0)+0, 0, "seek back");
+    is(sysread($f, $x, 6, 6), 6, "sysread with offset from :utf8 with sysio_bytes");
+    is($x, $abc x 4, "check we wrote buffer correctly");
+}
+
 unlink_all $outfile;
 
 chdir('..');
-- 
2.11.0

@p5pRT p5pRT commented Oct 23, 2018

From @tonycoz

0002-perldelta-for-use-sysio_bytes.patch
From b2d4242df1c4b8142839cc69182304355f229d73 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Tue, 23 Oct 2018 10:44:28 +1100
Subject: perldelta for use sysio_bytes

---
 pod/perldelta.pod | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index d88bdb7ef1..553a29b605 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -27,6 +27,15 @@ here, but most should go in the L</Performance Enhancements> section.
 
 [ List each enhancement as a =head2 entry ]
 
+=head2 C< use feature 'sysio_bytes'; >
+
+This feature allows using C<sysread>, C<syswrite>, C<recv> and C<send>
+on C<:utf8> handles, B<but> makes them work in bytes rather than in
+the sort-of-UTF-8 way they did in older perls.
+
+We use a feature here rather than changing the default behaviour to
+avoid silently breaking existing code.
+
 =head1 Security
 
 XXX Any security-related notices go here.  In particular, any security
-- 
2.11.0

@p5pRT p5pRT commented Oct 26, 2018

From @jkeenan

On Tue, 23 Oct 2018 00:13:55 GMT, tonyc wrote:

> In [perl #125760] I suggested obsoleting sysread and syswrite on :utf8
> handles, and after some discussion on #p5p, also deprecated send and
> recv on :utf8 handles.
>
> With 5c0551a the deprecation was
> carried through and these operators now croak when used on a :utf8
> handle.
>
> So how can we get to a saner behaviour for these operators without
> silently changing the behaviour of existing code?
>
> The attached patches add a new feature that prevents these operators
> from croaking when used on a :utf8 handle, but also makes them work in
> bytes, rather than the sketchy way they did before.
>
> This feature is currently not part of any version feature bundles, but
> this could change.
>
> Tony

To facilitate evaluation of this feature request, I have placed the patches in this branch for smoke testing:

smoke-me/jkeenan/tonyc/133610-sysio-bytes

Thank you very much.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT p5pRT commented Oct 26, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented Oct 26, 2018

From @Leont

On Tue, Oct 23, 2018 at 2:14 AM Tony Cook (via RT) <perlbug-followup@perl.org> wrote:

> # New Ticket Created by Tony Cook
> # Please include the string: [perl #133610]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133610 >
>
> In [perl #125760] I suggested obsoleting sysread and syswrite on :utf8
> handles, and after some discussion on #p5p, also deprecated send and
> recv on :utf8 handles.
>
> With 5c0551a the deprecation was
> carried through and these operators now croak when used on a :utf8
> handle.
>
> So how can we get to a saner behaviour for these operators without
> silently changing the behaviour of existing code?
>
> The attached patches add a new feature that prevents these operators
> from croaking when used on a :utf8 handle, but also makes them work in
> bytes, rather than the sketchy way they did before.
>
> This feature is currently not part of any version feature bundles, but
> this could change.

What is the use-case of this feature?

I'm not really seeing any new possibilities, or even better syntax for old
possibilities.

Leon

@p5pRT p5pRT commented Oct 29, 2018

From @tonycoz

On Fri, 26 Oct 2018 14:20:59 -0700, LeonT wrote:

> On Tue, Oct 23, 2018 at 2:14 AM Tony Cook (via RT) <perlbug-followup@perl.org> wrote:
>
> > # New Ticket Created by Tony Cook
> > # Please include the string: [perl #133610]
> > # in the subject line of all future correspondence about this issue.
> > # <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133610 >
> >
> > In [perl #125760] I suggested obsoleting sysread and syswrite on :utf8
> > handles, and after some discussion on #p5p, also deprecated send and
> > recv on :utf8 handles.
> >
> > With 5c0551a the deprecation was
> > carried through and these operators now croak when used on a :utf8
> > handle.
> >
> > So how can we get to a saner behaviour for these operators without
> > silently changing the behaviour of existing code?
> >
> > The attached patches add a new feature that prevents these operators
> > from croaking when used on a :utf8 handle, but also makes them work in
> > bytes, rather than the sketchy way they did before.
> >
> > This feature is currently not part of any version feature bundles, but
> > this could change.
>
> What is the use-case of this feature?
>
> I'm not really seeing any new possibilities, or even better syntax for old
> possibilities.

The intent is to make sensible behaviour for these operators available even if the handle has the :utf8 flag.

I would have preferred (at the #125760 timeframe) to make these ops just work in file bytes, whatever layers had been pushed, but that would have been a silent change in behaviour and hence Bad(tm).

Of course, someone who wants such sane behaviour could just:

  binmode($foo);

Tony
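
A sketch of that `binmode` workaround, for illustration only (the scratch file and variable names are assumptions, not taken from the ticket). On perls where the croak is in effect (5.30 and later) the first `sysread` dies inside the `eval`; after a bare `binmode` the same call reads plain bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file holding three bytes that are not valid UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\x80\xC1\xFF";
close $out or die "close: $!";

open my $fh, '<:raw:utf8', $path or die "open $path: $!";

# With the croak in place this eval fails and $@ holds the
# "sysread() isn't allowed on :utf8 handles" message.
my $first;
my $allowed = eval { sysread($fh, $first, 3); 1 };
print "sysread on :utf8 handle: ", ($allowed ? "allowed\n" : "died: $@");

# Clear the :utf8 flag and rewind (older perls may have consumed input
# above); sysread now works in plain bytes.
binmode $fh;
sysseek($fh, 0, 0);
my $bytes;
my $got = sysread($fh, $bytes, 3);
```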

@p5pRT p5pRT commented Oct 29, 2018

From @Leont

On Mon, Oct 29, 2018 at 5:03 AM Tony Cook via RT <perlbug-followup@perl.org> wrote:

> On Fri, 26 Oct 2018 14:20:59 -0700, LeonT wrote:
>
> > What is the use-case of this feature?
> >
> > I'm not really seeing any new possibilities, or even better syntax for
> > old possibilities.
>
> The intent is to make sensible behaviour for these operators available
> even if the handle has the :utf8 flag.
>
> I would have preferred (at the #125760 timeframe) to make these ops just
> work in file bytes, whatever layers had been pushed, but that would have
> been a silent change in behaviour and hence Bad(tm).
>
> Of course, someone who wants such sane behaviour could just:
>
>   binmode($foo);
>
> Tony

It already requires performing a binmode (or an equivalent open) to get the
handle into a utf8 state; your proposal boils down to doing two actions that
cancel each other out to reach the default state again. Surely the sensible
solution is to get rid of that binmode $handle, ':utf8';

I genuinely can't think of any scenario where one would want to use this
feature.

Leon

@p5pRT p5pRT commented Jan 1, 2019

From @tonycoz

On Mon, 22 Oct 2018 17:13:55 -0700, tonyc wrote:

> In [perl #125760] I suggested obsoleting sysread and syswrite on :utf8
> handles, and after some discussion on #p5p, also deprecated send and
> recv on :utf8 handles.
>
> With 5c0551a the deprecation was
> carried through and these operators now croak when used on a :utf8
> handle.
>
> So how can we get to a saner behaviour for these operators without
> silently changing the behaviour of existing code?
>
> The attached patches add a new feature that prevents these operators
> from croaking when used on a :utf8 handle, but also makes them work in
> bytes, rather than the sketchy way they did before.
>
> This feature is currently not part of any version feature bundles, but
> this could change.

Rejecting, since the only responses have been negative.

Tony

@p5pRT p5pRT commented Jan 1, 2019

@tonycoz - Status changed from 'open' to 'rejected'

@p5pRT p5pRT closed this Jan 1, 2019
@p5pRT p5pRT added the Severity Low label Oct 19, 2019

@Astara Astara commented Oct 25, 2019

Here is a use case:

-------- Original Message --------
Subject: Re: [cpan-testers/CPAN-Reporter] CPAN-Reporter not functioning on cygwin/Windows due to warnings about deprecated feature use (#96)
Date: Thu, 24 Oct 2019 12:27:27 -0700
From: Slaven Rezić
Reply-To: cpan-testers/CPAN-Reporter reply+AABDATJ24H2FYVBBKGYXBQN3X4527EVBNHHB5AFA7A@reply.github.com
To: cpan-testers/CPAN-Reporter CPAN-Reporter@noreply.github.com
CC: Astara <>, Author author@noreply.github.com
References: cpan-testers/CPAN-Reporter/issues/96@github.com

sysread at line 8 and syswrite at line 9: it looks like Capture::Tiny is involved here: https://github.com/dagolden/Capture-Tiny/blob/master/lib/Capture/Tiny.pm#L74-L75
However, the wrapper script is called with -C0 which is supposed to turn all unicode off...


I don't think anyone used binmode here to put it in UTF8.

And this is the type of case that gets gratuitously broken when changes that shouldn't have been needed were implemented.

sysread/syswrite were supposed to be maps to the OS read/write calls. How can they not be in bytes when I have their contents memory mapped with the MMAP flag?

@Leont Leont commented Oct 25, 2019

> I don't think anyone used binmode here to put it in UTF8.

No, but you did put a -CSA in there, which is really the same.
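
To see this on a given setup, the layers on the standard handles can be inspected directly. This is a minimal sketch: `utf8_flagged` is a hypothetical helper built on the core `PerlIO::get_layers` function, and the scratch handle only demonstrates that an explicit `binmode` sets the same flag that the `S` part of `-CSA` is documented to apply to STDIN, STDOUT and STDERR.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the handle currently has the :utf8 flag set.
sub utf8_flagged {
    my ($fh) = @_;
    my $count = grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
    return $count > 0;
}

# Under -CS (or PERL_UNICODE=S) these should report "utf8";
# without such a setting they should report "bytes".
my %std = (STDIN => \*STDIN, STDOUT => \*STDOUT, STDERR => \*STDERR);
for my $name (sort keys %std) {
    printf "%-6s %s\n", $name, utf8_flagged($std{$name}) ? 'utf8' : 'bytes';
}

# An explicit binmode sets the same flag on any other handle.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
my $before = utf8_flagged($fh);
binmode $fh, ':utf8';
my $after = utf8_flagged($fh);
close $fh or die "close: $!";
```

Running the script once plainly and once with `perl -CS` should show the difference on the three standard handles.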

@Astara Astara commented Oct 25, 2019

@Leont Leont commented Oct 25, 2019

> At one point in time, that was the recommended way to get UTF-8
> compatibility for terminal functions but not force it on binary
> applications/files.

I don't know who recommended to do PERL5OPT=-CSA, but I'm pretty sure it wasn't p5p. I tried googling for it but I literally only found posts by you.

Quite frankly, I consider that kind of setup a case of "doctor, it hurts when I put my hands into the fire".

> Your advice to set UNICODE to SA, sounds good, but at the time -CSA was
> introduced, I don't recall that being an option. Is UNICODE in, say
> perl5.16?

It was introduced in perl 5.8.1.

@Astara Astara commented Oct 26, 2019

@Astara Astara commented Oct 31, 2019

@xsawyerx xsawyerx commented Nov 1, 2019

> [...]
> Please fix CPAN::Reporter to address this problem, [...]

CPAN::Reporter has its bug issue here. Please open a ticket there for any "fixes" to it.

> Please resist the temptation to marginalize or dismiss my input as such would seem to violate Perl's code of conduct. Thank you.

Having read this, I don't consider this marginalizing or dismissing. Leon was using a figure of speech to convey that the language is doing the correct thing given how you're using it, and thus he does not see the need for the language to address it as a special case.

@Astara Astara commented Nov 8, 2019
