More efficient s/// #3

Closed
rwstauner opened this Issue · 6 comments

2 participants

@rwstauner

There are more efficient ways to trim than a single substitution using an alternation and /g, which you may wish to consider.

The example from perlop seems best:

                   s/^\s*(.*?)\s*$/$1/;        # trim whitespace in $_, expensively

                   for ($variable) {           # trim whitespace in $variable, cheap
                       s/^\s+//;
                       s/\s+$//;
                   }
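
For reuse, the cheap form can be wrapped in a small helper. This is only a minimal sketch: the `trim` function name is hypothetical and is not taken from the module under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper (name assumed for illustration): returns a trimmed
# copy of its argument and leaves the caller's variable untouched.
sub trim {
    my ($s) = @_;
    return $s unless defined $s;
    for ($s) {        # alias $_ to $s so the bare s/// forms act on it
        s/^\s+//;     # strip leading whitespace
        s/\s+$//;     # strip trailing whitespace, including a final "\r\n"
    }
    return $s;
}

print '[', trim(" oh, why hello there \r\n"), "]\n";
```

Working on a copy costs one extra assignment per call, which is the same thing the benchmark subs do with `my $s = $t`.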

A quick Benchmark run, though, suggests that using \A and \z may actually be the fastest (and arguably most appropriate):

perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(-5, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: running AZ, Az, ^$, alt, exp2 for at least 5 CPU seconds...
        AZ:  5 wallclock secs ( 5.40 usr +  0.00 sys =  5.40 CPU) @ 573766.85/s (n=3098341)
        Az:  4 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 588067.97/s (n=3157925)
        ^$:  6 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 580213.67/s (n=3098341)
       alt:  6 wallclock secs ( 5.38 usr +  0.00 sys =  5.38 CPU) @ 239821.56/s (n=1290240)
      exp2:  6 wallclock secs ( 5.39 usr +  0.00 sys =  5.39 CPU) @ 322235.81/s (n=1736851)

Same result from a different perl (and specifying iterations instead of time):

perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(3222333, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: timing 3222333 iterations of AZ, Az, ^$, alt, exp2...
        AZ:  5 wallclock secs ( 5.60 usr +  0.00 sys =  5.60 CPU) @ 575416.61/s (n=3222333)
        Az:  7 wallclock secs ( 5.54 usr +  0.00 sys =  5.54 CPU) @ 581648.56/s (n=3222333)
        ^$:  6 wallclock secs ( 5.59 usr +  0.00 sys =  5.59 CPU) @ 576445.97/s (n=3222333)
       alt: 12 wallclock secs (12.57 usr +  0.00 sys = 12.57 CPU) @ 256351.07/s (n=3222333)
      exp2: 10 wallclock secs (10.45 usr +  0.00 sys = 10.45 CPU) @ 308357.22/s (n=3222333)

Tidied so it's easier to read:

my $t = " oh, why hello there \r\n";
sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] }
timethese(
  3222333,
  {
    "^\$" => sub {
      my $s = $t;
      for ($s) { s/^\s+//; s/\s+$// }
      t $s;
    },
    "AZ" => sub {
      my $s = $t;
      for ($s) { s/\A\s+//; s/\s+\Z// }
      t $s;
    },
    "Az" => sub {
      my $s = $t;
      for ($s) { s/\A\s+//; s/\s+\z// }
      t $s;
    },
    "alt"  => sub { ( my $s = $t ) =~ s/(^\s+|\s+$)//g;    t $s},
    "exp2" => sub { ( my $s = $t ) =~ s/^\s*(.*?)\s*$/$1/; t $s},
  }
);
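
The three end anchors in the benchmark differ only in how they treat a trailing newline. A short sketch of that difference, using a made-up test string rather than the benchmark input:

```perl
use strict;
use warnings;

my $str = "foo\n";    # illustrative input, not from the benchmark

# Without /m, $ and \Z match at the very end of the string or just
# before a final newline; \z matches only at the very end.
printf "%-3s %s\n", '$',  ($str =~ /foo$/  ? 'match' : 'no match');
printf "%-3s %s\n", '\Z', ($str =~ /foo\Z/ ? 'match' : 'no match');
printf "%-3s %s\n", '\z', ($str =~ /foo\z/ ? 'match' : 'no match');
```

In the trim patterns the greedy `\s+` consumes the trailing newline itself, so all three variants strip the same characters and pass the `t` check above.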
@doherty
Owner

If you've done the benchmarks, then please feel free to send me a pull request :)

@rwstauner

It was more of a "hey, look what I found," since I'm not actually using your module,
but if you can break down and do a pull request (++) then I suppose I can too (not 'til next week, though). ;-)

@doherty doherty referenced this issue from a commit
@doherty Faster regexes; rstauner++
Fixes GH #3
d2117e7
@doherty
Owner

I think that's right... I'd appreciate you double-checking my work.

@doherty doherty closed this
@rwstauner

Sorry, never meant to neglect this. I've been out of town and am now catching up. Will review later today or tomorrow.

Also, out of curiosity I should bench using $_ (and a bare s///) versus binding a var ($s =~ s///)... I wonder if there's any performance difference there or if it was solely on simplifying the regexp.

@doherty
Owner

No problem - I'd been neglecting it as well. Thanks again for your help

@rwstauner

I altered my original benchmark and was surprised to see that explicitly binding a variable ($s =~ s///) seemed faster than the bare version (s///).
However, I was not able to reproduce the improvement with your module.
Go figure.
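
A comparison along those lines might look like the sketch below. It is not the altered benchmark referred to above; the sub names `trim_bare` and `trim_bound` are assumptions made for illustration, and it uses `cmpthese` from the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $t = " oh, why hello there \r\n";

# Bare s/// operating on $_, aliased to $s by the for loop.
sub trim_bare {
    my $s = $t;
    for ($s) { s/\A\s+//; s/\s+\z// }
    return $s;
}

# s/// explicitly bound to the variable with =~.
sub trim_bound {
    my $s = $t;
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Sanity check: both variants must produce the same string.
die "results differ" unless trim_bare() eq trim_bound();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { bare => \&trim_bare, bound => \&trim_bound });
```

The bare variant also pays for setting up the `for` loop, so any difference this reports cannot be attributed to the binding alone.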

I also compared:

  • /o (s/$t//o)
  • moving the my $t1 outside the function (single declaration)
  • dropping the vars and embedding the regexp (s/\A\s+//).

Strangely, those three things, which I thought would help, seem to slow it down.

I think that says more about the actual usefulness of benchmarks than anything else ;-)

Anyway, your change is definitely a speed improvement when benched against the previous version,
though likely not noticeable unless you're doing 1.5 million trims :-)
