RH perl5.8.0 split loop in <STDIN> #6304

Closed
p5pRT opened this issue Feb 13, 2003 · 7 comments

@p5pRT p5pRT commented Feb 13, 2003

Migrated from rt.perl.org#20912 (status was 'resolved')

Searchable as RT20912$

@p5pRT p5pRT commented Feb 13, 2003

From widyono@cis.upenn.edu

Created by widyono@cis.upenn.edu

The following works:

perl -e '@parse=split(/[, ]+/,"io0, io1"); print "$parse[0]\n$parse[1]\n";'

The following works in 5.6.1 on Linux, but in 5.8.0 (Red Hat 8.0's default
RPM install, built for i386-linux-thread-multi) it dies with
Split loop, <STDIN> line 1.

perl -e '$input=<STDIN>;chop($input);@parse=split(/[, ]+/, $input);print "$parse[0]\n$parse[1]\n";'

It happens even if $input is copied to another variable and that variable is
used in split. It does not happen if [] is not used (what would an appropriate
regexp be in that case, without using []?).
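
On the question of a pattern without a bracketed character class: one option is an alternation inside a non-capturing group. The sketch below only shows that the two patterns split this sample input identically. Whether the alternation form actually avoids the 5.8.0 failure is an assumption, based solely on the observation above that the error does not occur without []. The variable names are illustrative.

```perl
use strict;
use warnings;

# Sample input taken from the report.
my $input = "io0, io1";

# Original pattern: one or more commas or spaces, as a character class.
my @with_class = split /[, ]+/, $input;

# Same delimiter set written as an alternation in a non-capturing group.
# (A capturing group would make split return the separators as well.)
my @without_class = split /(?:,| )+/, $input;

print "class:       @with_class\n";
print "alternation: @without_class\n";
```

The non-capturing group matters here: with plain parentheses, split would include the matched separators in its result list.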

Does not happen with LANG=C.

No idea what to put for severity...

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.0:

Configured by bhcompile at Sun Sep  1 23:55:07 EDT 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.18-11smp, archname=i386-linux-thread-multi
    uname='linux daffy.perf.redhat.com 2.4.18-11smp #1 smp thu aug 15 06:41:59 edt 2002 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -march=i386 -mcpu=i686',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/include/gdbm'
    ccversion='', gccversion='3.2 20020822 (Red Hat Linux Rawhide 3.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lpthread -lc -lcrypt -lutil
    libc=/lib/libc-2.2.92.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.2.92'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    .


Environment for perl v5.8.0:
    HOME=/home/widyono
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/home/widyono/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT p5pRT commented Feb 13, 2003

From @jhi

Verified to still exist in the maintenance branch. An easier way to reproduce
it (which doesn't require Red Hat) is as follows:

-e '$input="foo bar";utf8::upgrade($input);@parse=split(/[, ]+/, $input);print "$parse[0]\n$parse[1]\n"'

(For perl5-porters I) The "Split loop" error is raised when pp.c:Perl_pp_split() thinks that
the pp stack has grown too much. I don't think it has; I think the problem
is that the utf8 regex munging shuffles the pp stack.

(For perl5-porters II) If you are using "./perl ...", don't forget to also pass "-Ilib", since
otherwise utf8.pm won't be found and you'll get into a "POPSTACK" death spiral.
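
The one-liner above can also be written as a small script that traps the failure instead of letting it abort the program. This is a minimal sketch, not part of the original report: it wraps the split in eval so an affected perl reports the "Split loop" message, while a perl without the bug prints the two fields. @parse is kept as a package array, as in the one-liner.

```perl
use strict;
use warnings;

our @parse;    # package array, matching the one-liner above

my $input = "foo bar";
utf8::upgrade($input);    # force the internal UTF-8 representation

# Trap a possible "Split loop" death rather than aborting.
my $ok = eval { @parse = split /[, ]+/, $input; 1 };

if ($ok) {
    print "split returned: @parse\n";
}
else {
    print "split died: $@";
}
```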

@p5pRT p5pRT commented Feb 14, 2003

From enache@rdslink.ro

On Thu, Feb 13, 2003 at 05:31:38AM -0000, widyono@cis.upenn.edu (via RT) wrote:

> The following works:
>
> perl -e '@parse=split(/[, ]+/,"io0, io1"); print "$parse[0]\n$parse[1]\n";'
>
> The following works in 5.6.1 on Linux, but in 5.8.0 (Red Hat 8.0's default
> RPM install, built for i386-linux-thread-multi) it dies with
> Split loop, <STDIN> line 1.
>
> perl -e '$input=<STDIN>;chop($input);@parse=split(/[, ]+/, $input);print "$parse[0]\n$parse[1]\n";'
>
> It happens even if $input is copied to another variable and that variable is
> used in split. It does not happen if [] is not used (what would an appropriate
> regexp be in that case, without using []?).
>
> Does not happen with LANG=C.

I can get it on bleadperl too.
The example could be rewritten:

$ perl -le '$p="a,b"; utf8::upgrade $p; split(/[, ]+/,$p)'
Split loop at -e line 1.

(If the locale is UTF-8, the $input=<STDIN> above becomes
utf8 'colored', i.e. UTF-8 flagged, too.)
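
The flagging mentioned in the parenthetical can be observed with utf8::is_utf8. The sketch below does not depend on the locale: it reads from an in-memory filehandle opened with an explicit :utf8 layer, which is an assumption standing in for what 5.8.0 does to STDIN under a UTF-8 locale. The variable names are illustrative.

```perl
use strict;
use warnings;

my $bytes = "io0, io1\n";

# A plain ASCII literal does not carry the UTF-8 flag.
my $literal = "io0, io1";

# Read the same text through a :utf8 layer (in-memory filehandle).
open my $fh, '<:utf8', \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;
chomp $line;

printf "literal flagged: %s\n", utf8::is_utf8($literal) ? "yes" : "no";
printf "read-in flagged: %s\n", utf8::is_utf8($line)    ? "yes" : "no";
```

Both strings hold the same characters. Only the internal flag differs, and that flag is what sends split down the UTF-8 regexp path.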

However, this works:

$ perl -le '$p="a,b"; utf8::upgrade $p; print split(/[, ]+/,$p)'
ab

It obviously has to do with the trick pp_split() uses: if the
list it returns has to be assigned to an array (@_ if 'split'
was called in scalar context), it uses that array as the stack
(pp.c:4425).

When the string to be split is utf8 flagged, the regexp engine
(at pp.c:4550) may call subs from the utf8 Perl module, bracketing
those calls with PUSHSTACKi/POPSTACK pairs
(utf8.c - Perl_swash_init()/Perl_swash_fetch()).

POPSTACK pops there to PL_curstackinfo->si_stack, not to the
array/stack pp_split() has just switched to.
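
Going by the analysis above, only the form that assigns split's result directly to an array takes the stack-switching path, while a split whose result is consumed as a list does not. The following sketch runs both forms under eval so they can be compared on a given perl. It is an illustration under that assumption, not a test from the perl source tree, and the variable names are made up for the example. On a perl with the fix, both forms should succeed.

```perl
use strict;
use warnings;

our @assigned;    # package array, as in the examples above

my $p = "a,b";
utf8::upgrade($p);

# Form 1: result assigned straight to an array.
my $assign_err = eval { @assigned = split /[, ]+/, $p; 1 } ? '' : $@;

# Form 2: result consumed as a list by another function.
my $joined;
my $list_err = eval { $joined = join '', split(/[, ]+/, $p); 1 } ? '' : $@;

print "array assignment: ", ($assign_err eq '' ? "ok (@assigned)" : "died: $assign_err"), "\n";
print "list context:     ", ($list_err   eq '' ? "ok ($joined)"   : "died: $list_err"),   "\n";
```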

The following fixes this bug. Please try.
Regards
Adi


Inline Patch
--- /arc/perl-current/pp.c	2003-02-02 19:59:19.000000000 +0200
+++ perl-current/pp.c	2003-02-15 00:29:51.000000000 +0200
@@ -4423,6 +4423,7 @@ PP(pp_split)
 	    }
 	    /* temporarily switch stacks */
 	    SWITCHSTACK(PL_curstack, ary);
+	    PL_curstackinfo->si_stack = ary;
 	    make_mortal = 0;
 	}
     }
@@ -4620,6 +4621,7 @@ PP(pp_split)
     if (realarray) {
 	if (!mg) {
 	    SWITCHSTACK(ary, oldstack);
+	    PL_curstackinfo->si_stack = oldstack;
 	    if (SvSMAGICAL(ary)) {
 		PUTBACK;
 		mg_set((SV*)ary);
----------------------------------------------------------------------------
#!/usr/bin/perl
require "test.pl";

eval { $p="a,b"; utf8::upgrade $p; split(/[, ]+/,$p) };
  is ($@, '', '#20912 - split() fails with /[]+/ & utf8');

__END__


@p5pRT p5pRT commented Feb 15, 2003

From @jhi

Yes, thanks, just as I suspected, that somehow the stack had got
whacked by the use of the utf8.pm. I didn't notice the @a = split ...
stack juggling trick, though. I'll apply the patch henceforth.

--
Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen

@p5pRT p5pRT commented Feb 15, 2003

From @jhi

Now applied as change #18708.

@p5pRT p5pRT commented Feb 17, 2003

From @jhi

This issue has been resolved, so I am also marking the problem ticket as resolved.

@p5pRT p5pRT commented Feb 17, 2003

@jhi - Status changed from 'new' to 'resolved'
