Commit

Clarify descriptions of unicode_eval and evalbytes.
Issue #18801
FGasper committed May 20, 2021
1 parent f212efc commit 5434342
Showing 3 changed files with 35 additions and 52 deletions.
4 changes: 2 additions & 2 deletions lib/feature.pm
@@ -209,8 +209,8 @@ couldn't be changed without breaking some things that had come to rely on
 them, so the feature can be enabled and disabled. Details are at
 L<perlfunc/Under the "unicode_eval" feature>.

-C<evalbytes> is like string C<eval>, but operating on a byte stream that is
-not UTF-8 encoded. Details are at L<perlfunc/evalbytes EXPR>. Without a
+C<evalbytes> is like string C<eval>, but it treats its argument as a byte
+string. Details are at L<perlfunc/evalbytes EXPR>. Without a
 S<C<use feature 'evalbytes'>> nor a S<C<use v5.16>> (or higher) declaration in
 the current scope, you can still access it by instead writing
 C<CORE::evalbytes>.
79 changes: 31 additions & 48 deletions pod/perlfunc.pod
@@ -2199,29 +2199,13 @@ format definitions remain afterwards.
 =item Under the L<C<"unicode_eval"> feature|feature/The 'unicode_eval' and 'evalbytes' features>

 If this feature is enabled (which is the default under a C<use 5.16> or
-higher declaration), EXPR is considered to be
-in the same encoding as the surrounding program. Thus if
-S<L<C<use utf8>|utf8>> is in effect, the string will be treated as being
-UTF-8 encoded. Otherwise, the string is considered to be a sequence of
-independent bytes. Bytes that correspond to ASCII-range code points
-will have their normal meanings for operators in the string. The
-treatment of the other bytes depends on if the
-L<C<'unicode_strings"> feature|feature/The 'unicode_strings' feature> is
-in effect.
-
-In a plain C<eval> without an EXPR argument, being in S<C<use utf8>> or
-not is irrelevant; the UTF-8ness of C<$_> itself determines the
-behavior.
-
-Any S<C<use utf8>> or S<C<no utf8>> declarations within the string have
-no effect, and source filters are forbidden. (C<unicode_strings>,
-however, can appear within the string.) See also the
-L<C<evalbytes>|/evalbytes EXPR> operator, which works properly with
-source filters.
-
-Variables defined outside the C<eval> and used inside it retain their
-original UTF-8ness. Everything inside the string follows the normal
-rules for a Perl program with the given state of S<C<use utf8>>.
+higher declaration), Perl assumes that EXPR is a character string.
+Any S<C<use utf8>> or S<C<no utf8>> declarations within
+the string thus have no effect. Source filters are forbidden as well.
+(C<unicode_strings>, however, can appear within the string.)
+
+See also the L<C<evalbytes>|/evalbytes EXPR> operator, which works properly
+with source filters.

 =item Outside the C<"unicode_eval"> feature

@@ -2233,8 +2217,26 @@ breaking existing programs:

 =item *

-It can lose track of whether something should be encoded as UTF-8 or
-not.
+Perl's internal storage of EXPR affects the behavior of the executed code.
+For example:
+
+my $v = eval "use utf8; '$expr'";
+
+If $expr is C<"\xc4\x80"> (U+0100 in UTF-8), then the value stored in C<$v>
+will depend on whether Perl stores $expr "upgraded" (cf. L<utf8>) or
+not:
+
+=over
+
+=item * If upgraded, C<$v> will be C<"\xc4\x80"> (i.e., the
+C<use utf8> has no effect.)
+
+=item * If non-upgraded, C<$v> will be C<"\x{100}">.
+
+=back
+
+This is undesirable since being
+upgraded or not should not affect a string's behavior.

 =item *

@@ -2360,30 +2362,11 @@ X<evalbytes>

 This function is similar to a L<string eval|/eval EXPR>, except it
 always parses its argument (or L<C<$_>|perlvar/$_> if EXPR is omitted)
-as a string of independent bytes.
-
-If called when S<C<use utf8>> is in effect, the string will be assumed
-to be encoded in UTF-8, and C<evalbytes> will make a temporary copy to
-work from, downgraded to non-UTF-8. If this is not possible
-(because one or more characters in it require UTF-8), the C<evalbytes>
-will fail with the error stored in C<$@>.
-
-Bytes that correspond to ASCII-range code points will have their normal
-meanings for operators in the string. The treatment of the other bytes
-depends on if the L<C<'unicode_strings"> feature|feature/The
-'unicode_strings' feature> is in effect.
-
-Of course, variables that are UTF-8 and are referred to in the string
-retain that:
-
-my $a = "\x{100}";
-evalbytes 'print ord $a, "\n"';
-
-prints
-
-256
+as a byte string. If the string contains any code points above 255, then
+it cannot be a byte string, and the C<evalbytes> will fail with the error
+stored in C<$@>.

-and C<$@> is empty.
+C<use utf8> and C<no utf8> within the string have their usual effect.

 Source filters activated within the evaluated code apply to the code
 itself.
4 changes: 2 additions & 2 deletions regen/feature.pl
@@ -615,8 +615,8 @@ =head2 The 'unicode_eval' and 'evalbytes' features
 them, so the feature can be enabled and disabled. Details are at
 L<perlfunc/Under the "unicode_eval" feature>.

-C<evalbytes> is like string C<eval>, but operating on a byte stream that is
-not UTF-8 encoded. Details are at L<perlfunc/evalbytes EXPR>. Without a
+C<evalbytes> is like string C<eval>, but it treats its argument as a byte
+string. Details are at L<perlfunc/evalbytes EXPR>. Without a
 S<C<use feature 'evalbytes'>> nor a S<C<use v5.16>> (or higher) declaration in
 the current scope, you can still access it by instead writing
 C<CORE::evalbytes>.
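
Illustration (not part of the commit): a minimal sketch of the behavior the revised text describes, assuming Perl 5.16 or later. It reaches the operator as C<CORE::evalbytes> because no feature declaration is in scope, then passes it source text containing a code point above 255. The variable names are hypothetical. The failing call is wrapped in a block C<eval> so that the failure is captured whether C<evalbytes> dies or only sets C<$@>; the exact error message is not assumed here.

```perl
use strict;
use warnings;

# No "use feature 'evalbytes'" or "use v5.16" in this scope, so the
# operator is reached through its CORE:: name.
my $sum = CORE::evalbytes("1 + 2");

# chr(0x100) is above 255, so this source text cannot be a byte string.
my $wide_source = "'" . chr(0x100) . "'";

# Wrap the call so the failure is captured whether evalbytes dies
# or only sets $@ (assumption: either outcome counts as a failure).
my $ok = eval {
    CORE::evalbytes($wide_source);
    die $@ if $@;
    1;
};
my $error = $ok ? '' : $@;

print "sum: $sum\n";
print "wide source rejected\n" unless $ok;
```

The block C<eval> wrapper is a design choice for the sketch, not something the patch prescribes; code that only needs the documented behavior can inspect C<$@> in whatever way fits its own error handling.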
