Memoize recurse regex #23

Merged: 1 commit, Nov 15, 2020

mpchadwick
Contributor

Hi - Thanks for this project!

I'm using it in my project dbanon. I did some profiling today and found that the regexp.MustCompile calls inside faker.Fetch were by far the biggest performance bottleneck in dbanon. I couldn't find a use case for the recursive regex, but I was able to drastically improve performance by memoizing the *Regexp.
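For readers unfamiliar with the technique, here is a minimal sketch of memoizing a compiled regular expression, so that each distinct pattern is compiled once and then reused. It is not the diff from this PR: the names `cachedRegexp`, `regexCache`, and `regexMu`, and the example pattern, are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Hypothetical package-level cache; not the code merged in this PR.
var (
	regexMu    sync.RWMutex
	regexCache = make(map[string]*regexp.Regexp)
)

// cachedRegexp returns the compiled form of pattern, compiling it only
// the first time the pattern is seen. A *regexp.Regexp is safe for
// concurrent use, so the cached value can be shared between goroutines.
func cachedRegexp(pattern string) *regexp.Regexp {
	// Fast path: most calls only need a read lock.
	regexMu.RLock()
	re, ok := regexCache[pattern]
	regexMu.RUnlock()
	if ok {
		return re
	}

	regexMu.Lock()
	defer regexMu.Unlock()
	// Re-check: another goroutine may have stored the pattern while
	// this one was waiting for the write lock.
	if cached, ok := regexCache[pattern]; ok {
		return cached
	}
	re = regexp.MustCompile(pattern)
	regexCache[pattern] = re
	return re
}

func main() {
	// Illustrative pattern only; the pattern used inside faker.Fetch
	// is not shown in this conversation.
	const pattern = `#\{([A-Za-z]+\.[A-Za-z_]+)\}`

	a := cachedRegexp(pattern)
	b := cachedRegexp(pattern)
	fmt.Println(a == b) // prints "true": the second call reuses the first result
}
```

When the set of patterns is fixed and known up front, a package-level `var re = regexp.MustCompile(...)` is simpler still. A keyed cache like the one above is mainly useful when the pattern string is only known at call time.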

Below are some benchmarks.

  • user table with 1 million rows
  • Faker consulted 4 times for each row
    • faker.Name().FirstName()
    • faker.Name().LastName()
    • faker.Internet().Email()
    • faker.Internet().Password(8, 14)

Before

$ go tool pprof ~/go/src/github.com/mpchadwick/dbanon/dbanon dbanon-before.prof 
File: dbanon
Type: cpu
Time: Nov 13, 2020 at 9:25pm (EST)
Duration: 1.29mins, Total samples = 1.57mins (121.72%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10 -cum
Showing nodes accounting for 0.72s, 0.76% of 94.48s total
Dropped 359 nodes (cum <= 0.47s)
Showing top 10 nodes out of 185
      flat  flat%   sum%        cum   cum%
         0     0%     0%     62.16s 65.79%  main.main
         0     0%     0%     62.16s 65.79%  runtime.main
         0     0%     0%     61.88s 65.50%  github.com/mpchadwick/dbanon/src.LineProcessor.ProcessLine
     0.16s  0.17%  0.17%     61.88s 65.50%  github.com/mpchadwick/dbanon/src.LineProcessor.processInsert
     0.11s  0.12%  0.29%     53.76s 56.90%  github.com/mpchadwick/dbanon/src.Provider.Get
     0.11s  0.12%   0.4%     49.36s 52.24%  syreclabs.com/go/faker.Fetch
     0.01s 0.011%  0.41%     43.31s 45.84%  regexp.MustCompile
         0     0%  0.41%     43.30s 45.83%  regexp.Compile (inline)
     0.32s  0.34%  0.75%     43.30s 45.83%  regexp.compile
     0.01s 0.011%  0.76%     42.01s 44.46%  syreclabs.com/go/faker.fakeInternet.Email
(pprof) 

After

$ go tool pprof ~/go/src/github.com/mpchadwick/dbanon/dbanon dbanon-after.prof 
File: dbanon
Type: cpu
Time: Nov 13, 2020 at 9:58pm (EST)
Duration: 23.01s, Total samples = 20.93s (90.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10 -cum
Showing nodes accounting for 0.95s, 4.54% of 20.93s total
Dropped 198 nodes (cum <= 0.10s)
Showing top 10 nodes out of 173
      flat  flat%   sum%        cum   cum%
         0     0%     0%     16.07s 76.78%  main.main
         0     0%     0%     16.07s 76.78%  runtime.main
         0     0%     0%     15.77s 75.35%  github.com/mpchadwick/dbanon/src.LineProcessor.ProcessLine
     0.14s  0.67%  0.67%     15.77s 75.35%  github.com/mpchadwick/dbanon/src.LineProcessor.processInsert
     0.05s  0.24%  0.91%      8.26s 39.46%  github.com/mpchadwick/dbanon/src.Provider.Get
     0.01s 0.048%  0.96%      6.61s 31.58%  syreclabs.com/go/faker.fakeInternet.Email
     0.09s  0.43%  1.39%      4.82s 23.03%  syreclabs.com/go/faker.Fetch
     0.66s  3.15%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.(*yyParserImpl).Parse
         0     0%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.Parse
         0     0%  4.54%      4.62s 22.07%  github.com/blastrain/vitess-sqlparser/sqlparser.yyParse (inline)
(pprof)

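As the profiles show, this change cut dbanon's sampled CPU time from about 94 seconds to about 21 seconds, and the profiled wall-clock duration from 1.29 minutes to 23 seconds. regexp.MustCompile, which accounted for 43.31s (45.84%) cumulative before the change, no longer appears in the top 10.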

dmgk merged commit 94c4ac7 into dmgk:master on Nov 15, 2020
Owner

dmgk commented Nov 15, 2020

Merged, thanks!
