# More efficient s/// #3

Closed · opened this issue Aug 17, 2011 · 6 comments

2 participants
### rwstauner (Contributor) commented Aug 17, 2011

There are more efficient substitutions than using an alternation and `/g` which you may wish to consider. The example from `perlop` seems best:

```perl
s/^\s*(.*?)\s*$/$1/;    # trim whitespace in $_, expensively

for ($variable) {       # trim whitespace in $variable, cheap
    s/^\s+//;
    s/\s+$//;
}
```

Though doing a quick Benchmark suggests that using `\A` and `\z` may actually be the fastest (and arguably most appropriate):

```
perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(-5, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: running AZ, Az, ^$, alt, exp2 for at least 5 CPU seconds...
        AZ:  5 wallclock secs ( 5.40 usr +  0.00 sys =  5.40 CPU) @ 573766.85/s (n=3098341)
        Az:  4 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 588067.97/s (n=3157925)
        ^$:  6 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 580213.67/s (n=3098341)
       alt:  6 wallclock secs ( 5.38 usr +  0.00 sys =  5.38 CPU) @ 239821.56/s (n=1290240)
      exp2:  6 wallclock secs ( 5.39 usr +  0.00 sys =  5.39 CPU) @ 322235.81/s (n=1736851)
```

Same result from a different perl (and specifying iterations instead of time):

```
perl -Mstrict -Mwarnings -MBenchmark=:all -e 'my $t = " oh, why hello there \r\n"; sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] } timethese(3222333, { "^\$" => sub { my $s = $t; for($s){s/^\s+//;s/\s+$//} t $s}, "AZ" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\Z//} t $s}, "Az" => sub { my $s = $t; for($s){s/\A\s+//;s/\s+\z//} t $s}, "alt" => sub { (my $s = $t) =~ s/(^\s+|\s+$)//g; t $s}, "exp2" => sub { (my $s = $t) =~ s/^\s*(.*?)\s*$/$1/; t $s}, });'
Benchmark: timing 3222333 iterations of AZ, Az, ^$, alt, exp2...
        AZ:  5 wallclock secs ( 5.60 usr +  0.00 sys =  5.60 CPU) @ 575416.61/s (n=3222333)
        Az:  7 wallclock secs ( 5.54 usr +  0.00 sys =  5.54 CPU) @ 581648.56/s (n=3222333)
        ^$:  6 wallclock secs ( 5.59 usr +  0.00 sys =  5.59 CPU) @ 576445.97/s (n=3222333)
       alt: 12 wallclock secs (12.57 usr +  0.00 sys = 12.57 CPU) @ 256351.07/s (n=3222333)
      exp2: 10 wallclock secs (10.45 usr +  0.00 sys = 10.45 CPU) @ 308357.22/s (n=3222333)
```

Tidied so it's easier to read:

```perl
use strict;
use warnings;
use Benchmark qw(:all);

my $t = " oh, why hello there \r\n";

# Dies if a variant produced the wrong result; returns the string otherwise.
sub t ($) { die unless $_[0] eq "oh, why hello there"; $_[0] }

timethese(
    3222333,
    {
        "^\$" => sub { my $s = $t; for ($s) { s/^\s+//;  s/\s+$//  } t $s; },
        "AZ"   => sub { my $s = $t; for ($s) { s/\A\s+//; s/\s+\Z// } t $s; },
        "Az"   => sub { my $s = $t; for ($s) { s/\A\s+//; s/\s+\z// } t $s; },
        "alt"  => sub { ( my $s = $t ) =~ s/(^\s+|\s+$)//g;     t $s },
        "exp2" => sub { ( my $s = $t ) =~ s/^\s*(.*?)\s*$/$1/;  t $s },
    }
);
```
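For reuse outside a benchmark, the two-pass `\A`/`\z` approach could be wrapped in a small helper. This is a minimal sketch only: the name `trim` is hypothetical and not taken from the module under discussion. It also shows why `\z` is arguably the most appropriate anchor: without `/m`, `$` and `\Z` may also match just before a trailing newline, while `\z` matches only at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: trims leading and trailing whitespace with two
# anchored substitutions instead of one alternation under /g.
sub trim {
    my ($s) = @_;      # work on a copy so the caller's value is untouched
    for ($s) {         # alias $_ to $s so the bare s/// calls act on it
        s/\A\s+//;     # strip whitespace at the absolute start
        s/\s+\z//;     # strip whitespace at the absolute end
    }
    return $s;
}

my $input = " oh, why hello there \r\n";
print trim($input), "|\n";    # prints "oh, why hello there|"

# Anchor semantics (no /m flag): $ and \Z may match before a final "\n",
# \z only at the true end of the string.
my $line = "foo\n";
printf "%-3s matches: %s\n", '$',  ($line =~ /foo$/  ? 'yes' : 'no');
printf "%-3s matches: %s\n", '\Z', ($line =~ /foo\Z/ ? 'yes' : 'no');
printf "%-3s matches: %s\n", '\z', ($line =~ /foo\z/ ? 'yes' : 'no');
```

Because `\s` already includes `\n`, the choice of anchor rarely changes what a trim removes; `\z` simply states the intent ("end of string") most precisely.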
### doherty (Owner) commented Aug 17, 2011

 If you've done the benchmarks, then please feel free to send me a pull request `:)`
### rwstauner (Contributor) commented Aug 19, 2011

 It was more of a "hey, look what I found," since I'm not actually using your module, but if you can break down and do a pull request (++) then I suppose I can too (not 'til next week, though). ;-)

### doherty added a commit that referenced this issue Oct 5, 2011

`Faster regexes; rstauner++`

`Fixes GH #3`

`d2117e7`
### doherty (Owner) commented Oct 5, 2011

 I think that's right... I'd appreciate you double-checking my work.

### rwstauner (Contributor) commented Oct 10, 2011

 Sorry, never meant to neglect this. I've been out of town and am now catching up. Will review later today or tomorrow. Also, out of curiosity, I should bench using `$_` (and a bare `s///`) versus binding a var (`$s =~ s///`)... I wonder if there's any performance difference there or if it was solely due to simplifying the regexp.
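The two styles in question can be sketched side by side, assuming the same sample string as the benchmark above: `trim_bare` aliases `$_` with `for` and uses bare `s///`, while `trim_bound` binds the variable with `=~`. Both sub names, and the `RUN_BENCH` environment variable used to opt into timing, are hypothetical. Any speed difference will depend on the perl version and machine, so the sketch only checks that the two styles agree.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $t = " oh, why hello there \r\n";

# Style 1: alias $_ to the copy and use bare s/// (which matches against $_).
sub trim_bare {
    my $s = shift;
    for ($s) {
        s/\A\s+//;
        s/\s+\z//;
    }
    return $s;
}

# Style 2: bind the variable explicitly with =~.
sub trim_bound {
    my $s = shift;
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Both styles must agree before their speed is worth comparing.
die "styles disagree" unless trim_bare($t) eq trim_bound($t);

# Timing is opt-in via the hypothetical RUN_BENCH variable. A negative count
# means "run each for at least that many CPU seconds"; cmpthese prints a table.
if ($ENV{RUN_BENCH}) {
    cmpthese(-2, {
        bare  => sub { trim_bare($t) },
        bound => sub { trim_bound($t) },
    });
}
```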
### doherty (Owner) commented Oct 13, 2011

 No problem; I'd been neglecting it as well. Thanks again for your help.
### rwstauner (Contributor) commented Oct 25, 2011

 I altered my original benchmark and was surprised to see that explicitly binding a variable (`$s =~ s///`) seemed faster than the bare version (`s///`). However, I was not able to reproduce the improvement with your module. Go figure.

I also compared:

- `/o` (`s/$t//o`)
- moving the `my $t1` outside the function (single declaration)
- dropping the vars and embedding the regexp (`s/\A\s+//`)

Strangely, those 3 things, which I thought would help, seemed to slow it down. I think that says more about the actual usefulness of benchmarks than anything else ;-)

Anyway, your change is definitely a speed improvement when benched against the previous version, though likely not noticeable unless you're doing 1.5 million trims :-)
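The modified benchmark is not shown, so the following only sketches two of the variations as described: a pattern held in a variable (with and without `/o`) versus the regex embedded directly in the substitution. The variables `$lead` and `$trail`, the sub names, and the `RUN_BENCH` switch are hypothetical; the `my $t1` hoisting variant is left out because what `$t1` held is not stated. One possible reason `/o` did not help is that perl can already skip recompiling an interpolated pattern whose text has not changed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $t = " oh, why hello there \r\n";

# Hypothetical precompiled patterns held in variables.
my $lead  = qr/\A\s+/;
my $trail = qr/\s+\z/;

# Pattern interpolated from a variable.
sub trim_var {
    my $s = shift;
    $s =~ s/$lead//;
    $s =~ s/$trail//;
    return $s;
}

# Same, but /o asks perl to compile the interpolated pattern only once.
sub trim_var_o {
    my $s = shift;
    $s =~ s/$lead//o;
    $s =~ s/$trail//o;
    return $s;
}

# Regex embedded directly in the substitution.
sub trim_literal {
    my $s = shift;
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# All variants must produce the same trimmed string.
for my $f (\&trim_var, \&trim_var_o, \&trim_literal) {
    die "variant disagrees" unless $f->($t) eq "oh, why hello there";
}

# Opt-in timing, as in the earlier sketch.
if ($ENV{RUN_BENCH}) {
    cmpthese(-2, {
        var     => sub { trim_var($t) },
        var_o   => sub { trim_var_o($t) },
        literal => sub { trim_literal($t) },
    });
}
```

Results from a sketch like this will vary between perls, which fits the conclusion above about how far micro-benchmarks can be trusted.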