Skip to content

Commit

Permalink
Merge support for UTF8 symbols
Browse files Browse the repository at this point in the history
This branch makes symbols support UTF8 internally, which means that
Unicode is supported properly at the perl level.  So ${"\xff"} will
give you the same scalar, regardless of the internal encoding of the
string.  Also, many  parts of the core are now nul-clean, too, as a
result of the UTF8 changes, which means that ‘$m = "a\0b"; foo->$m’
will try to call the method named "a\0b", instead of just "a".
Details follow.

• New API functions:
  Many of these take a _flags parameter, which accept the
  SVf_UTF8 flag.
  • HvNAMELEN
  • HvNAMEUTF8
  • HvENAMELEN
  • HvENAMEUTF8
  • gv_init_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv
  • gv_fetchmeth_pv(n)/sv_autoload
  • gv_fetchmethod_pv(n)/sv_flags — may change
  • gv_autoload_pv(n)/sv
  • newGVgen_flags
  • sv_derived_from_pv(n)/sv
  • sv_does_pv(n)/sv
  • whichsig_pv(n)/sv
• New internal functions:
  • GvNAMEUTF8
  • GvENAMELEN
  • GvENAME_HEK
  • CopSTASH_flags
  • CopSTASH_flags_set
  • PmopSTASH_flags
  • PmopSTASH_flags_set
  • sv_sethek
• Parts of Perl that handle Unicode symbol names correctly:
  • Method names (including those passed to ‘use overload’)
  • Typeglob names (including variable and filehandle names)
  • Package names
  • Constant subroutine names (not nul-clean yet)
  • goto
  • Symbolic dereferencing
  • Second argument to bless() and tie()
  • Return value of ref()
  • Package names returned by caller()
  • Subroutine prototypes
  • Attributes
  • Warnings and error messages that mention filehandles, packages,
    methods, variables, constant values, subroutines, symbolic refer-
    ences, format names and subroutine prototypes
• Parts of Perl that now handle embedded nuls correctly:
  • Method names
  • Typeglob names (including filehandle names)
  • Package names
  • Autoloading
  • Return value of ref()
  • Package names returned by caller()
  • Filehandle warnings
  • Typeglob elements (*foo{"THING\0stuff"})
  • Signal names
  • Warnings and error messages that mention (yes, it’s the same list
    as above) filehandles, packages, methods, variables, constant val-
    ues, subroutines, symbolic references, format names and subroutine
    prototypes
• Other bug fixes
  • *{é} now treats é as the name of the glob (the usual implicit
    quoting), instead of treating it as a bareword (strict-unsafe)
    or function call.  *{é} used to be equivalent to *{+é}, in
    other words.
• Modified modules:
  • constant has been modified not to apply the workaround for the bug
    that this branch fixes, if that workaround does not apply.
  • attributes has been modified as part of making Unicode attri-
    butes work.
  • XS::APItest
  • mro, as part of making method lookup account for Unicode.
• Side effects
  • Blessing into "\0" no longer causes ref() to return false.
  • *{"*é::..."} is now equivalent to *{"é::..."}, just as
    *{"*e::..."} is equivalent to *{"e::..."}.  Previously, the * was
    only stripped if followed by [A-Za-z].
  • $é is now subject to ‘Used only once’ warnings.  It used to be
    exempt, as the code that checked the named considered it a punctu-
    ation variable.
  • Loading branch information
Father Chrysostomos committed Oct 6, 2011
2 parents da83cd3 + 8cb149d commit dfb1828
Show file tree
Hide file tree
Showing 114 changed files with 8,841 additions and 674 deletions.
55 changes: 55 additions & 0 deletions MANIFEST
Expand Up @@ -3819,6 +3819,11 @@ ext/XS-APItest/t/exception.t XS::APItest extension
ext/XS-APItest/t/fetch_pad_names.t Tests for UTF8 names in pad
ext/XS-APItest/t/gotosub.t XS::APItest: tests goto &xsub and hints
ext/XS-APItest/t/grok.t XS::APItest: tests for grok* functions
ext/XS-APItest/t/gv_autoload4.t XS::APItest: tests for gv_autoload4() and variants
ext/XS-APItest/t/gv_fetchmeth_autoload.t XS::APItest: tests for gv_fetchmeth_autoload() and variants
ext/XS-APItest/t/gv_fetchmethod_flags.t XS::APItest: tests for gv_fetchmethod_flags() and variants
ext/XS-APItest/t/gv_fetchmeth.t XS::APItest: tests for gv_fetchmeth() and variants
ext/XS-APItest/t/gv_init.t XS::APItest: tests for gv_init and variants
ext/XS-APItest/t/hash.t XS::APItest: tests for hash related APIs
ext/XS-APItest/t/keyword_multiline.t test keyword plugin parsing across lines
ext/XS-APItest/t/keyword_plugin.t test keyword plugin mechanism
Expand All @@ -3833,6 +3838,7 @@ ext/XS-APItest/t/Markers.pm Helper for ./blockhooks.t
ext/XS-APItest/t/multicall.t XS::APItest: test MULTICALL macros
ext/XS-APItest/t/my_cxt.t XS::APItest: test MY_CXT interface
ext/XS-APItest/t/my_exit.t XS::APItest: test my_exit
ext/XS-APItest/t/newCONSTSUB.t XS::APItest: test newCONSTSUB(_flags)
ext/XS-APItest/t/Null.pm Helper for ./blockhooks.t
ext/XS-APItest/t/op_contextualize.t test op_contextualize() API
ext/XS-APItest/t/op_list.t test OP list construction API
Expand Down Expand Up @@ -3864,6 +3870,7 @@ ext/XS-APItest/t/temp_lv_sub.t XS::APItest: tests for lvalue subs returning temp
ext/XS-APItest/t/underscore_length.t Test find_rundefsv()
ext/XS-APItest/t/utf16_to_utf8.t Test behaviour of utf16_to_utf8{,reversed}
ext/XS-APItest/t/utf8.t Tests for code in utf8.c
ext/XS-APItest/t/whichsig.t XS::APItest: tests for whichsig() and variants
ext/XS-APItest/t/xs_special_subs_require.t for require too
ext/XS-APItest/t/xs_special_subs.t Test that XS BEGIN/CHECK/INIT/END work
ext/XS-APItest/t/xsub_h.t Tests for XSUB.h
Expand Down Expand Up @@ -4911,42 +4918,78 @@ t/lib/warnings/universal Tests for universal.c for warnings.t
t/lib/warnings/utf8 Tests for utf8.c for warnings.t
t/lib/warnings/util Tests for util.c for warnings.t
t/mro/basic_01_c3.t mro tests
t/mro/basic_01_c3_utf8.t utf8 mro tests
t/mro/basic_01_dfs.t mro tests
t/mro/basic_01_dfs_utf8.t utf8 mro tests
t/mro/basic_02_c3.t mro tests
t/mro/basic_02_c3_utf8.t utf8 mro tests
t/mro/basic_02_dfs.t mro tests
t/mro/basic_02_dfs_utf8.t utf8 mro tests
t/mro/basic_03_c3.t mro tests
t/mro/basic_03_c3_utf8.t utf8 mro tests
t/mro/basic_03_dfs.t mro tests
t/mro/basic_03_dfs_utf8.t utf8 mro tests
t/mro/basic_04_c3.t mro tests
t/mro/basic_04_c3_utf8.t utf8 mro tests
t/mro/basic_04_dfs.t mro tests
t/mro/basic_04_dfs_utf8.t utf8 mro tests
t/mro/basic_05_c3.t mro tests
t/mro/basic_05_c3_utf8.t utf8 mro tests
t/mro/basic_05_dfs.t mro tests
t/mro/basic_05_dfs_utf8.t utf8 mro tests
t/mro/basic.t mro tests
t/mro/basic_utf8.t utf8 mro tests
t/mro/c3_with_overload.t mro tests
t/mro/c3_with_overload_utf8.t utf8 mro tests
t/mro/complex_c3.t mro tests
t/mro/complex_c3_utf8.t utf8 mro tests
t/mro/complex_dfs.t mro tests
t/mro/complex_dfs_utf8.t utf8 mro tests
t/mro/dbic_c3.t mro tests
t/mro/dbic_c3_utf8.t utf8 mro tests
t/mro/dbic_dfs.t mro tests
t/mro/dbic_dfs_utf8.t utf8 mro tests
t/mro/inconsistent_c3.t mro tests
t/mro/inconsistent_c3_utf8.t utf8 mro tests
t/mro/isa_aliases.t tests for shared @ISA arrays
t/mro/isa_aliases_utf8.t utf8 mro tests
t/mro/isa_c3.t test for optimisatised mro_get_linear_isa_c3
t/mro/isa_c3_utf8.t utf8 mro tests
t/mro/isa_dfs.t test for optimisatised mro_get_linear_isa_dfs
t/mro/isa_dfs_utf8.t utf8 mro tests
t/mro/isarev.t PL_isarev/mro::get_isarev tests
t/mro/isarev_utf8.t utf8 mro tests
t/mro/method_caching.t mro tests
t/mro/method_caching_utf8.t utf8 mro tests
t/mro/next_edgecases.t mro tests
t/mro/next_edgecases_utf8.t utf8 mro tests
t/mro/next_goto.t mro tests
t/mro/next_goto_utf8.t utf8 mro tests
t/mro/next_inanon.t mro tests
t/mro/next_inanon_utf8.t utf8 mro tests
t/mro/next_ineval.t mro tests
t/mro/next_ineval_utf8.t utf8 mro tests
t/mro/next_method.t mro tests
t/mro/next_method_utf8.t utf8 mro tests
t/mro/next_NEXT.t mro tests
t/mro/next_NEXT_utf8.t utf8 mro tests
t/mro/next_skip.t mro tests
t/mro/next_skip_utf8.t utf8 mro tests
t/mro/overload_c3.t mro tests
t/mro/overload_c3_utf8.t utf8 mro tests
t/mro/overload_dfs.t mro tests
t/mro/package_aliases.t mro tests
t/mro/package_aliases_utf8.t utf8 mro tests
t/mro/pkg_gen.t mro tests
t/mro/pkg_gen_utf8.t utf8 mro tests
t/mro/recursion_c3.t mro tests
t/mro/recursion_c3_utf8.t utf8 mro tests
t/mro/recursion_dfs.t mro tests
t/mro/recursion_dfs_utf8.t utf8 mro tests
t/mro/vulcan_c3.t mro tests
t/mro/vulcan_c3_utf8.t utf8 mro tests
t/mro/vulcan_dfs.t mro tests
t/mro/vulcan_dfs_utf8.t utf8 mro tests
toke.c The tokener
t/op/64bitint.t See if 64 bit integers work
t/op/alarm.t See if alarm works
Expand Down Expand Up @@ -5226,24 +5269,36 @@ t/run/switchx.t Test the -x switch
t/TEST The regression tester
t/test.pl Simple testing library
t/thread_it.pl Run regression tests in a new thread
t/uni/attrs.t See if Unicode attributes work
t/uni/bless.t See if Unicode bless works
t/uni/cache.t See if Unicode swash caching works
t/uni/caller.t See if Unicode doesn't get mangled in caller()
t/uni/case.pl See if Unicode casing works
t/uni/chomp.t See if Unicode chomp works
t/uni/chr.t See if Unicode chr works
t/uni/class.t See if Unicode classes work (\p)
t/uni/fold.t See if Unicode folding works
t/uni/goto.t See if Unicode goto &sub works
t/uni/greek.t See if Unicode in greek works
t/uni/gv.t See if Unicode GVs work.
t/uni/latin2.t See if Unicode in latin2 works
t/uni/lex_utf8.t See if Unicode in lexer works
t/uni/lower.t See if Unicode casing works
t/uni/method.t See if Unicode methods work
t/uni/overload.t See if Unicode overloading works
t/uni/package.t See if Unicode in package declarations works
t/uni/parser.t See if Unicode in the parser works in edge cases.
t/uni/readline.t See if Unicode filehandles in <FH> work
t/uni/select.t See if Unicode filehandles aren't mangled by select()
t/uni/sprintf.t See if Unicode sprintf works
t/uni/stash.t See if Unicode stashes work
t/uni/tie.t See if Unicode tie works
t/uni/title.t See if Unicode casing works
t/uni/tr_7jis.t See if Unicode tr/// in 7jis works
t/uni/tr_eucjp.t See if Unicode tr/// in eucjp works
t/uni/tr_sjis.t See if Unicode tr/// in sjis works
t/uni/tr_utf8.t See if Unicode tr/// in utf8 works
t/uni/universal.t See if Unicode in calls to UNIVERSAL works
t/uni/upper.t See if Unicode casing works
t/uni/write.t See if Unicode formats work
t/win32/runenv.t Test if Win* perl honors its env variables
Expand Down
18 changes: 15 additions & 3 deletions cop.h
Expand Up @@ -389,6 +389,7 @@ struct cop {
#ifdef USE_ITHREADS
char * cop_stashpv; /* package line was compiled in */
char * cop_file; /* file name the following line # is from */
U32 cop_stashflags; /* currently only SVf_UTF8 */
#else
HV * cop_stash; /* package line was compiled in */
GV * cop_filegv; /* file the following line # is from */
Expand Down Expand Up @@ -433,9 +434,20 @@ struct cop {
# define CopSTASHPV_set(c,pv) ((c)->cop_stashpv = savesharedpv(pv))
# endif

# define CopSTASH(c) (CopSTASHPV(c) \
? gv_stashpv(CopSTASHPV(c),GV_ADD) : NULL)
# define CopSTASH_set(c,hv) CopSTASHPV_set(c, (hv) ? HvNAME_get(hv) : NULL)
# define CopSTASH_flags(c) ((c)->cop_stashflags)
# define CopSTASH_flags_set(c,flags) ((c)->cop_stashflags = flags)

# define CopSTASH(c) (CopSTASHPV(c) \
? gv_stashpv(CopSTASHPV(c), \
GV_ADD|(CopSTASH_flags(c) \
? CopSTASH_flags(c): 0 )) \
: NULL)
# define CopSTASH_set(c,hv) (CopSTASHPV_set(c, (hv) ? HvNAME_get(hv) : NULL), \
CopSTASH_flags_set(c, \
((hv) && HvNAME_HEK(hv) && \
HvNAMEUTF8(hv)) \
? SVf_UTF8 \
: 0))
# define CopSTASH_eq(c,hv) ((hv) && stashpv_hvname_match(c,hv))
# ifdef NETWARE
# define CopSTASH_free(c) SAVECOPSTASH_FREE(c)
Expand Down
25 changes: 20 additions & 5 deletions dist/constant/lib/constant.pm
Expand Up @@ -4,7 +4,7 @@ use strict;
use warnings::register;

use vars qw($VERSION %declared);
$VERSION = '1.22';
$VERSION = '1.23';

#=======================================================================

Expand All @@ -29,6 +29,19 @@ BEGIN {
no strict 'refs';
my $const = $] > 5.009002;
*_CAN_PCS = sub () {$const};

# Before this makes its way into a dev perl release, we have to do
# browser-sniffing, as it were....
return unless $const;
*{chr 256} = \3;
if (exists ${__PACKAGE__."::"}{"\xc4\x80"}) {
delete ${__PACKAGE__."::"}{"\xc4\x80"};
*_DOWNGRADE = sub () {1};
}
else {
delete ${__PACKAGE__."::"}{chr 256};
*_DOWNGRADE = sub () {0};
}
}

#=======================================================================
Expand Down Expand Up @@ -117,10 +130,12 @@ sub import {
if ($multiple || @_ == 1) {
my $scalar = $multiple ? $constants->{$name} : $_[0];

# Work around perl bug #31991: Sub names (actually glob
# names in general) ignore the UTF8 flag. So we have to
# turn it off to get the "right" symbol table entry.
utf8::is_utf8 $name and utf8::encode $name;
if (_DOWNGRADE) { # for 5.10 to 5.14
# Work around perl bug #31991: Sub names (actually glob
# names in general) ignore the UTF8 flag. So we have to
# turn it off to get the "right" symbol table entry.
utf8::is_utf8 $name and utf8::encode $name;
}

# The constant serves to optimise this entire block out on
# 5.8 and earlier.
Expand Down
27 changes: 15 additions & 12 deletions doio.c
Expand Up @@ -126,8 +126,8 @@ Perl_do_openn(pTHX_ GV *gv, register const char *oname, I32 len, int as_raw,
if (result == EOF && fd > PL_maxsysfd) {
/* Why is this not Perl_warn*() call ? */
PerlIO_printf(Perl_error_log,
"Warning: unable to close filehandle %s properly.\n",
GvENAME(gv));
"Warning: unable to close filehandle %"SVf" properly.\n",
SVfARG(sv_2mortal(newSVhek(GvENAME_HEK(gv)))));
}
IoOFP(io) = IoIFP(io) = NULL;
}
Expand Down Expand Up @@ -541,14 +541,14 @@ Perl_do_openn(pTHX_ GV *gv, register const char *oname, I32 len, int as_raw,
if ((IoTYPE(io) == IoTYPE_RDONLY) &&
(fp == PerlIO_stdout() || fp == PerlIO_stderr())) {
Perl_warner(aTHX_ packWARN(WARN_IO),
"Filehandle STD%s reopened as %s only for input",
"Filehandle STD%s reopened as %"SVf" only for input",
((fp == PerlIO_stdout()) ? "OUT" : "ERR"),
GvENAME(gv));
SVfARG(sv_2mortal(newSVhek(GvENAME_HEK(gv)))));
}
else if ((IoTYPE(io) == IoTYPE_WRONLY) && fp == PerlIO_stdin()) {
Perl_warner(aTHX_ packWARN(WARN_IO),
"Filehandle STDIN reopened as %s only for output",
GvENAME(gv));
"Filehandle STDIN reopened as %"SVf" only for output",
SVfARG(sv_2mortal(newSVhek(GvENAME_HEK(gv)))));
}
}

Expand Down Expand Up @@ -1337,8 +1337,8 @@ Perl_my_lstat_flags(pTHX_ const U32 flags)
return PL_laststatval;
}
if (ckWARN(WARN_IO)) {
Perl_warner(aTHX_ packWARN(WARN_IO), "Use of -l on filehandle %s",
GvENAME(cGVOP_gv));
Perl_warner(aTHX_ packWARN(WARN_IO), "Use of -l on filehandle %"SVf,
SVfARG(sv_2mortal(newSVhek(GvENAME_HEK(cGVOP_gv)))));
}
return (PL_laststatval = -1);
}
Expand Down Expand Up @@ -1567,6 +1567,7 @@ Perl_apply(pTHX_ I32 type, register SV **mark, register SV **sp)
register I32 tot = 0;
const char *const what = PL_op_name[type];
const char *s;
STRLEN len;
SV ** const oldmark = mark;

PERL_ARGS_ASSERT_APPLY;
Expand Down Expand Up @@ -1677,12 +1678,14 @@ nothing in the core.
APPLY_TAINT_PROPER();
if (mark == sp)
break;
s = SvPVx_nolen_const(*++mark);
s = SvPVx_const(*++mark, len);
if (isALPHA(*s)) {
if (*s == 'S' && s[1] == 'I' && s[2] == 'G')
if (*s == 'S' && s[1] == 'I' && s[2] == 'G') {
s += 3;
if ((val = whichsig(s)) < 0)
Perl_croak(aTHX_ "Unrecognized signal name \"%s\"",s);
len -= 3;
}
if ((val = whichsig_pvn(s, len)) < 0)
Perl_croak(aTHX_ "Unrecognized signal name \"%"SVf"\"", SVfARG(*mark));
}
else
val = SvIV(*mark);
Expand Down

0 comments on commit dfb1828

Please sign in to comment.