More efficient s/// #3

Closed
rwstauner opened this Issue Aug 17, 2011 · 6 comments

Contributor

rwstauner commented Aug 17, 2011

There are more efficient substitutions than an alternation with /g, which you may wish to consider.

The example from perlop seems best:

                   s/^\s*(.*?)\s*$/$1/;        # trim whitespace in $_, expensively

                   for ($variable) {           # trim whitespace in $variable, cheap
                       s/^\s+//;
                       s/\s+$//;
                   }

Though doing a quick Benchmark suggests that using \A and \z may actually be the fastest (and arguably most appropriate):

perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(-5, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: running AZ, Az, ^$, alt, exp2 for at least 5 CPU seconds...
        AZ:  5 wallclock secs ( 5.40 usr +  0.00 sys =  5.40 CPU) @ 573766.85/s (n=3098341)
        Az:  4 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 588067.97/s (n=3157925)
        ^$:  6 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 580213.67/s (n=3098341)
       alt:  6 wallclock secs ( 5.38 usr +  0.00 sys =  5.38 CPU) @ 239821.56/s (n=1290240)
      exp2:  6 wallclock secs ( 5.39 usr +  0.00 sys =  5.39 CPU) @ 322235.81/s (n=1736851)
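
For readers unsure why the three anchored variants are interchangeable here, the following is a minimal sketch (the helper names are hypothetical and not taken from the module under discussion). It checks that `^`/`$`, `\A`/`\Z` and `\A`/`\z` all produce the same trimmed string for the benchmark input, and then shows the one case where the end anchors differ when no /m flag is used: a trailing newline that the pattern does not consume.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub trim_caret_dollar { my ($s) = @_; $s =~ s/^\s+//;  $s =~ s/\s+$//;  return $s }
sub trim_A_Z          { my ($s) = @_; $s =~ s/\A\s+//; $s =~ s/\s+\Z//; return $s }
sub trim_A_z          { my ($s) = @_; $s =~ s/\A\s+//; $s =~ s/\s+\z//; return $s }

my $input = " oh, why hello there \r\n";

# All three strip the trailing " \r\n": \s+ consumes the newline itself,
# so each end anchor is satisfied at the true end of the string.
for my $trim (\&trim_caret_dollar, \&trim_A_Z, \&trim_A_z) {
    die "unexpected trim result" unless $trim->($input) eq "oh, why hello there";
}

# Without /m, the end anchors differ only when a final newline is left unconsumed:
# $ and \Z may match just before it, \z matches only at the very end.
my $line = "abc\n";
my %matches = (
    '$'  => ($line =~ /c$/  ? 1 : 0),
    '\Z' => ($line =~ /c\Z/ ? 1 : 0),
    '\z' => ($line =~ /c\z/ ? 1 : 0),
);
printf "%-2s matches before the final newline: %s\n", $_, $matches{$_} ? "yes" : "no"
    for sort keys %matches;
```

Since the trimming patterns always swallow a trailing newline via `\s+`, the choice between these anchors is about intent and (possibly) speed, not about the result.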

Same result from a different perl (and specifying iterations instead of time):

perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(3222333, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: timing 3222333 iterations of AZ, Az, ^$, alt, exp2...
        AZ:  5 wallclock secs ( 5.60 usr +  0.00 sys =  5.60 CPU) @ 575416.61/s (n=3222333)
        Az:  7 wallclock secs ( 5.54 usr +  0.00 sys =  5.54 CPU) @ 581648.56/s (n=3222333)
        ^$:  6 wallclock secs ( 5.59 usr +  0.00 sys =  5.59 CPU) @ 576445.97/s (n=3222333)
       alt: 12 wallclock secs (12.57 usr +  0.00 sys = 12.57 CPU) @ 256351.07/s (n=3222333)
      exp2: 10 wallclock secs (10.45 usr +  0.00 sys = 10.45 CPU) @ 308357.22/s (n=3222333)

Tidied so it's easier to read:

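use strict;
use warnings;
use Benchmark qw(:all);

my $t = " oh, why hello there \r\n";

# Sanity check: die unless the string was trimmed correctly, then return it.
sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] }

timethese(
  3222333,
  {
    # Two separate substitutions, anchored with ^ and $.
    "^\$" => sub {
      my $s = $t;
      for ($s) { s/^\s+//; s/\s+$// }
      t $s;
    },
    # Same, anchored with \A and \Z.
    "AZ" => sub {
      my $s = $t;
      for ($s) { s/\A\s+//; s/\s+\Z// }
      t $s;
    },
    # Same, anchored with \A and \z.
    "Az" => sub {
      my $s = $t;
      for ($s) { s/\A\s+//; s/\s+\z// }
      t $s;
    },
    # Single substitution: alternation with /g, and the "expensive" perlop capture form.
    "alt"  => sub { ( my $s = $t ) =~ s/(^\s+|\s+$)//g;    t $s},
    "exp2" => sub { ( my $s = $t ) =~ s/^\s*(.*?)\s*$/$1/; t $s},
  }
);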
Owner

doherty commented Aug 17, 2011

If you've done the benchmarks, then please feel free to send me a pull request :)

Contributor

rwstauner commented Aug 19, 2011

It was more of a "hey, look what I found," since I'm not actually using your module,
but if you can break down and do a pull request (++) then I suppose I can too (not 'til next week, though). ;-)

doherty added a commit that referenced this issue Oct 5, 2011

Owner

doherty commented Oct 5, 2011

I think that's right... I'd appreciate you double-checking my work.

@doherty doherty closed this Oct 5, 2011

Contributor

rwstauner commented Oct 10, 2011

Sorry, never meant to neglect this. I've been out of town and am now catching up. Will review later today or tomorrow.

Also, out of curiosity I should bench using $_ (and a bare s///) versus binding a var ($s =~ s///)... I wonder if there's any performance difference there or if the speedup was due solely to simplifying the regexp.
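
A comparison along those lines might look like the sketch below. It is an assumption about how such a benchmark could be set up, not the benchmark that was actually run; the variant names `topic` and `bound` are made up for illustration. It uses `cmpthese` from the core Benchmark module, running each variant for at least one CPU second.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $t = " oh, why hello there \r\n";

# Hypothetical variant names: "topic" aliases the string to $_ and uses bare s///,
# "bound" binds the variable explicitly with =~.
my %variants = (
    topic => sub { my $s = $t; for ($s) { s/\A\s+//; s/\s+\z// } $s },
    bound => sub { my $s = $t; $s =~ s/\A\s+//; $s =~ s/\s+\z//; $s },
);

# Check that both variants trim correctly before timing them.
for my $name (sort keys %variants) {
    die "$name did not trim correctly"
        unless $variants{$name}->() eq "oh, why hello there";
}

# Negative count: run each variant for at least 1 CPU second, then print a comparison chart.
cmpthese(-1, \%variants);
```

Both variants use the same regexps, so any difference in the chart would come from the aliasing versus binding, within the usual noise of short benchmark runs.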

Owner

doherty commented Oct 13, 2011

No problem - I'd been neglecting it as well. Thanks again for your help.

Contributor

rwstauner commented Oct 25, 2011

I altered my original benchmark and was surprised to see that explicitly binding a variable ($s =~ s///) seemed faster than the bare version (s///).
However, I was not able to reproduce the improvement with your module.
Go figure.

I also compared:

  • /o (s/$t//o)
  • moving the my $t1 outside the function (single declaration)
  • dropping the vars and embedding the regexp (s/\A\s+//).

Strangely, those 3 things, which I thought would help, seem to slow it down.
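
The module's code is not shown in this thread, so the following is only a sketch of how the three options listed above could be compared against a baseline. All variable and variant names are hypothetical, and the use of `qr//` objects for the regexp variables is an assumption.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = " oh, why hello there \r\n";

# Hypothetical regexp variables declared once, outside the timed subs.
my $lead_outer  = qr/\A\s+/;
my $trail_outer = qr/\s+\z/;

my %variants = (
    # Baseline: regexp variables declared inside the sub on every call.
    inner_vars => sub {
        my $lead  = qr/\A\s+/;
        my $trail = qr/\s+\z/;
        my $s = $input; $s =~ s/$lead//; $s =~ s/$trail//; $s;
    },
    # Single declaration outside the function.
    outer_vars => sub { my $s = $input; $s =~ s/$lead_outer//;  $s =~ s/$trail_outer//;  $s },
    # Same variables with the /o modifier.
    o_flag     => sub { my $s = $input; $s =~ s/$lead_outer//o; $s =~ s/$trail_outer//o; $s },
    # No variables: the regexps embedded literally.
    literal    => sub { my $s = $input; $s =~ s/\A\s+//; $s =~ s/\s+\z//; $s },
);

# Check that every variant trims correctly before timing.
for my $name (sort keys %variants) {
    die "$name did not trim correctly"
        unless $variants{$name}->() eq "oh, why hello there";
}

# Run each variant for at least 1 CPU second and print a comparison chart.
cmpthese(-1, \%variants);
```

Results from a sketch like this may well vary between perl versions and machines, which fits the observation that these micro-optimizations did not behave as expected.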

I think that says more about the actual usefulness of benchmarks than anything else ;-)

Anyway, your change is definitely a speed improvement when benched against the previous version,
though likely not noticeable unless you're doing 1.5 million trims :-)
