Memoize recurse regex #23

Merged: dmgk merged 1 commit into dmgk:master from mpchadwick:feature/memoize-recurse-regex on Nov 15, 2020
@mpchadwick (Contributor)

Hi - Thanks for this project!

I'm using it in my project dbanon. I did some profiling today and found that the regexp.MustCompile calls inside faker.Fetch were the worst performance culprit in dbanon by far. While I couldn't find a use-case for the recursive regex, I was able to drastically improve performance by memoizing the *Regexp.
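For readers unfamiliar with the technique, here is a minimal sketch of memoizing a compiled regular expression in Go. It is not the code from this PR's commit: the `regexpCache` type, its `MustCompile` method, and the pattern in `main` are hypothetical names chosen for illustration. The idea is only that each distinct pattern is compiled once and the resulting *Regexp is reused on later calls.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexpCache is a hypothetical helper (not part of faker's API) that
// remembers compiled patterns so each one is compiled at most once.
type regexpCache struct {
	mu sync.RWMutex
	m  map[string]*regexp.Regexp
}

func newRegexpCache() *regexpCache {
	return &regexpCache{m: make(map[string]*regexp.Regexp)}
}

// MustCompile returns the cached *regexp.Regexp for pattern, compiling it
// on first use. Like regexp.MustCompile, it panics on an invalid pattern.
func (c *regexpCache) MustCompile(pattern string) *regexp.Regexp {
	// Fast path: the pattern has already been compiled.
	c.mu.RLock()
	re, ok := c.m[pattern]
	c.mu.RUnlock()
	if ok {
		return re
	}

	// Slow path: compile under the write lock, re-checking in case
	// another goroutine stored the pattern in the meantime.
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.m[pattern]; ok {
		return re
	}
	re = regexp.MustCompile(pattern)
	c.m[pattern] = re
	return re
}

func main() {
	cache := newRegexpCache()
	pattern := `#\{(\w+)\}` // illustrative pattern, not taken from faker

	a := cache.MustCompile(pattern)
	b := cache.MustCompile(pattern)

	fmt.Println(a == b)                             // true: same compiled value reused
	fmt.Println(a.FindStringSubmatch("#{name}")[1]) // name
}
```

A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines, so sharing the cached value is fine; the lock only protects the map. When the pattern is a fixed string, a package-level `var re = regexp.MustCompile(...)` achieves the same effect with less machinery.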

Below are some benchmarks.

  • user table with 1 million rows
  • Faker consulted 4 times for each row
    • faker.Name().FirstName()
    • faker.Name().LastName()
    • faker.Internet().Email()
    • faker.Internet().Password(8, 14)

Before

$ go tool pprof ~/go/src/github.com/mpchadwick/dbanon/dbanon dbanon-before.prof 
File: dbanon
Type: cpu
Time: Nov 13, 2020 at 9:25pm (EST)
Duration: 1.29mins, Total samples = 1.57mins (121.72%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10 -cum
Showing nodes accounting for 0.72s, 0.76% of 94.48s total
Dropped 359 nodes (cum <= 0.47s)
Showing top 10 nodes out of 185
      flat  flat%   sum%        cum   cum%
         0     0%     0%     62.16s 65.79%  main.main
         0     0%     0%     62.16s 65.79%  runtime.main
         0     0%     0%     61.88s 65.50%  github.com/mpchadwick/dbanon/src.LineProcessor.ProcessLine
     0.16s  0.17%  0.17%     61.88s 65.50%  github.com/mpchadwick/dbanon/src.LineProcessor.processInsert
     0.11s  0.12%  0.29%     53.76s 56.90%  github.com/mpchadwick/dbanon/src.Provider.Get
     0.11s  0.12%   0.4%     49.36s 52.24%  syreclabs.com/go/faker.Fetch
     0.01s 0.011%  0.41%     43.31s 45.84%  regexp.MustCompile
         0     0%  0.41%     43.30s 45.83%  regexp.Compile (inline)
     0.32s  0.34%  0.75%     43.30s 45.83%  regexp.compile
     0.01s 0.011%  0.76%     42.01s 44.46%  syreclabs.com/go/faker.fakeInternet.Email
(pprof) 

After

$ go tool pprof ~/go/src/github.com/mpchadwick/dbanon/dbanon dbanon-after.prof
File: dbanon
Type: cpu
Time: Nov 13, 2020 at 9:58pm (EST)
Duration: 23.01s, Total samples = 20.93s (90.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10 -cum
Showing nodes accounting for 0.95s, 4.54% of 20.93s total
Dropped 198 nodes (cum <= 0.10s)
Showing top 10 nodes out of 173
      flat  flat%   sum%        cum   cum%
         0     0%     0%     16.07s 76.78%  main.main
         0     0%     0%     16.07s 76.78%  runtime.main
         0     0%     0%     15.77s 75.35%  github.com/mpchadwick/dbanon/src.LineProcessor.ProcessLine
     0.14s  0.67%  0.67%     15.77s 75.35%  github.com/mpchadwick/dbanon/src.LineProcessor.processInsert
     0.05s  0.24%  0.91%      8.26s 39.46%  github.com/mpchadwick/dbanon/src.Provider.Get
     0.01s 0.048%  0.96%      6.61s 31.58%  syreclabs.com/go/faker.fakeInternet.Email
     0.09s  0.43%  1.39%      4.82s 23.03%  syreclabs.com/go/faker.Fetch
     0.66s  3.15%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.(*yyParserImpl).Parse
         0     0%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.Parse
         0     0%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.yyParse (inline)
(pprof)

As you can see, this change cut dbanon's total sampled CPU time from about 95 seconds to about 21 seconds, and the profiled run's duration from 1.29 minutes to 23 seconds.

dmgk merged commit 94c4ac7 into dmgk:master on Nov 15, 2020.

dmgk (Owner) commented Nov 15, 2020:

Merged, thanks!