@colinbendell colinbendell commented May 24, 2024

This PR focuses on optimizing the memory allocations made while converting bytes to and from the Crockford Base32 encoding. Specifically, encode32 and decode32 can better leverage the standard RFC 4648 Base32 alphabet for better performance.
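For context, a minimal sketch of the kind of conversion involved (this is illustrative only, not the PR's actual implementation): a 16-byte payload is treated as one 128-bit integer and emitted 5 bits at a time through Crockford's Base32 alphabet, which avoids building intermediate per-character arrays and strings.

```ruby
# Illustrative sketch only -- not the code in this PR.
# Crockford Base32 alphabet: 0-9 plus A-Z, excluding I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

def encode32(bytes)
  # Treat the 16 input bytes as a single 128-bit integer.
  n = bytes.unpack1("B*").to_i(2)
  # 26 characters x 5 bits = 130 bits, enough to cover 128 bits.
  26.times.map { |i| CROCKFORD[(n >> (5 * (25 - i))) & 0x1F] }.join
end

def decode32(str)
  # Fold each character's 5-bit value back into one big integer.
  n = str.each_char.reduce(0) { |acc, c| (acc << 5) | CROCKFORD.index(c) }
  # Re-emit the integer as 16 raw bytes.
  [n.to_s(16).rjust(32, "0")].pack("H*")
end

bytes = (1..16).to_a.pack("C*")
decode32(encode32(bytes)) == bytes # round-trips
```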

Encoding performance:

Calculating -------------------------------------
   encode32_baseline    321.188k (± 2.3%) i/s -      1.624M in   5.057650s
  encode32_optimized      1.218M (± 3.4%) i/s -      6.157M in   5.061423s

Comparison:
  encode32_optimized:  1218263.9 i/s
   encode32_baseline:   321188.2 i/s - 3.79x  (± 0.00) slower

Calculating -------------------------------------
   encode32_baseline     1.336k memsize (     0.000  retained)
                        29.000  objects (     0.000  retained)
                        18.000  strings (     0.000  retained)
  encode32_optimized   373.000  memsize (    80.000  retained)
                         8.000  objects (     2.000  retained)
                         3.000  strings (     0.000  retained)

Comparison:
  encode32_optimized:        373 allocated
   encode32_baseline:       1336 allocated - 3.58x more

Decoding performance:

Calculating -------------------------------------
   decode32_baseline    243.968k (± 2.4%) i/s -      1.240M in   5.084222s
  decode32_optimized      1.310M (± 9.4%) i/s -      6.722M in   5.218875s

Comparison:
  decode32_optimized:  1310294.2 i/s
   decode32_baseline:   243967.9 i/s - 5.37x  (± 0.00) slower

Calculating -------------------------------------
   decode32_baseline     1.800k memsize (     0.000  retained)
                        41.000  objects (     0.000  retained)
                        18.000  strings (     0.000  retained)
  decode32_optimized   267.000  memsize (     0.000  retained)
                         6.000  objects (     0.000  retained)
                         2.000  strings (     0.000  retained)

Comparison:
  decode32_optimized:        267 allocated
   decode32_baseline:       1800 allocated - 6.74x more

Even simple time parsing can also be optimized:

Calculating -------------------------------------
 time_parse_baseline      1.001M (± 7.3%) i/s -      5.007M in   5.051415s
time_parse_optimized      2.122M (± 7.4%) i/s -     10.511M in   5.028573s

Comparison:
time_parse_optimized:  2121531.7 i/s
 time_parse_baseline:  1000624.8 i/s - 2.12x  (± 0.00) slower

Calculating -------------------------------------
 time_parse_baseline   462.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         2.000  strings (     0.000  retained)
      time_parse_new   286.000  memsize (    40.000  retained)
                         6.000  objects (     1.000  retained)
                         2.000  strings (     0.000  retained)

Comparison:
      time_parse_new:        286 allocated
 time_parse_baseline:        462 allocated - 1.62x more
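To illustrate the kind of win possible here (again a hedged sketch, not the PR's code): `Time.parse` runs format-guessing heuristics on every call, so when the timestamp format is fixed and known, slicing the fields out directly and calling `Time.utc` skips that work and its intermediate allocations.

```ruby
require "time"

# Illustrative sketch only -- not the code in this PR.
# Assumes a fixed "YYYY-MM-DDTHH:MM:SSZ" layout, so fields can be
# sliced out by position instead of parsed heuristically.
def fast_time_parse(str)
  Time.utc(str[0, 4].to_i, str[5, 2].to_i, str[8, 2].to_i,
           str[11, 2].to_i, str[14, 2].to_i, str[17, 2].to_i)
end

ts = "2024-05-24T12:34:56Z"
fast_time_parse(ts) == Time.parse(ts) # same instant, less work
```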


@JonathanOfTheWell JonathanOfTheWell left a comment


Code looks good to me; I'm satisfied with the test coverage and performance report, although my context on the under-the-hood details of these optimizations is limited.


@martelogan martelogan left a comment


Really interesting to see that the simpler, idiomatic Ruby was faster after all in this case. Good case study, honestly :shipit:

@colinbendell colinbendell merged commit 0fbb1cb into master May 24, 2024
@martelogan martelogan added the #gsd:40924 Checkout Flash Cookies Migration: https://vault.shopify.io/gsd/projects/40924 label Aug 15, 2024