feat: improve performance when detecting country codes #274

Merged: 3 commits, Apr 4, 2023

ElMassimo (Contributor) commented Mar 31, 2023

Description 📖

This pull request makes parsing roughly 10x faster when the country for a phone number is unknown.

Ran tests locally, and they all pass.

Background 📜

Currently, when parsing international phone numbers, this library allocates a large number of regexes (one per country, 256 in total) and matches the number against all 256, even though at most one of them can match.

Country codes have 1, 2, or 3 digits, and have the interesting property that shorter codes are not prefixes of longer codes.

The global_phone library takes advantage of this to optimize country code detection.

By taking the first one, two, and three digits of the number as candidate prefixes, it's possible to do a hash-based lookup instead of cycling through all countries.
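The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR or from global_phone: the `COUNTRY_CODES` table is a small hypothetical sample, and `detect_country_code` is a made-up helper name.

```ruby
# Hypothetical sample table; a real one would list every country calling code.
# Values are example regions that share the code.
COUNTRY_CODES = {
  "1"   => %w[US CA],
  "44"  => %w[GB],
  "49"  => %w[DE],
  "351" => %w[PT],
  "598" => %w[UY]
}.freeze

# Returns [country_code, national_number], or nil when none of the
# 1-, 2-, or 3-digit prefixes is a known country code.
def detect_country_code(number)
  digits = number.gsub(/\D/, "") # drop "+", spaces, dashes, parentheses
  (1..3).each do |length|
    prefix = digits[0, length]
    # No shorter code is a prefix of a longer one, so the first hit is the only hit.
    return [prefix, digits[length..-1]] if COUNTRY_CODES.key?(prefix)
  end
  nil
end

p detect_country_code("+44 20 7946 0958")
p detect_country_code("+351 21 000 0000")
```

Because the codes are prefix-free, the order in which the three prefixes are tried does not affect the result.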

The Fix 🔨

This PR applies the technique described above to optimize detect_and_parse.

As a result, instead of creating 256 regexes and matching all of them every time a phone number with an unknown country code is parsed, the library now performs only 3 hash lookups.
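To get a feel for the difference between the two shapes of work, here is a rough, self-contained comparison. It does not use the library's real data or code paths: the 256 three-digit codes are synthetic, and both method names are hypothetical.

```ruby
require "benchmark"

# Synthetic stand-in for per-country data: 256 made-up three-digit codes.
CODES = ("200".."455").to_a.freeze
CODE_SET = CODES.each_with_object({}) { |code, table| table[code] = true }.freeze

# Old shape: build one regex per country on every call and try all of them.
def detect_with_regexes(digits)
  CODES.select { |code| Regexp.new("\\A#{code}").match?(digits) }.first
end

# New shape: three hash lookups on the leading digits.
def detect_with_hash(digits)
  (1..3).map { |length| digits[0, length] }.find { |prefix| CODE_SET.key?(prefix) }
end

digits = "4551234567"
regex_time = Benchmark.realtime { 200.times { detect_with_regexes(digits) } }
hash_time  = Benchmark.realtime { 200.times { detect_with_hash(digits) } }
puts format("regex scan: %.4fs, hash lookup: %.4fs", regex_time, hash_time)
```

The absolute timings depend on the machine and Ruby version, and they will not match the benchmark figures in this PR, since real parsing does much more than detect the country code.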

Benchmarks 📊

This optimization yields a 10x speedup when the country code is unknown! 🚀

Added a new benchmark in spec/phonelib_ips_bench.rb, which can be run with rspec.

Before

Calculating -------------------------------------
       known country     27.029  (± 0.0%) i/s -    136.000  in   5.032140s
     unknown country      2.253  (± 0.0%) i/s -     12.000  in   5.325396s

Comparison:
       known country:       27.0 i/s
     unknown country:        2.3 i/s - 11.99x  slower

After

Calculating -------------------------------------
       known country     26.913  (± 0.0%) i/s -    136.000  in   5.053798s
     unknown country     23.172  (± 0.0%) i/s -    116.000  in   5.006049s

Comparison:
       known country:       26.9 i/s
     unknown country:       23.2 i/s - 1.16x  slower

Now the library performs similarly whether a country code is provided or needs to be detected.

If we combine this with the work in:

it should make both cases even faster and bring them to comparable performance (only 1.03x slower).

Memory Usage 📊

After this pull request, this use case allocates about 5x less memory, so GC pressure is reduced as well.

Before

Calculating -------------------------------------
     unknown country    84.087M memsize (   160.465k retained)
                         1.859M objects (   513.000  retained)
                        50.000  strings (    50.000  retained)

After

Calculating -------------------------------------
     unknown country    15.234M memsize (   164.490k retained)
                       232.193k objects (   534.000  retained)
                        50.000  strings (    50.000  retained)
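As a quick check on the figures above, the reduction factors can be computed directly from the two reports. The helper name `reduction_factor` is just for illustration.

```ruby
# Totals copied from the "Before" and "After" memory reports above.
BEFORE = { memsize: 84.087e6, objects: 1.859e6 }.freeze
AFTER  = { memsize: 15.234e6, objects: 232.193e3 }.freeze

# How many times smaller the "after" value is than the "before" value.
def reduction_factor(before, after)
  before / after
end

BEFORE.each_key do |metric|
  factor = reduction_factor(BEFORE[metric], AFTER[metric])
  puts format("%-8s %.1fx less", metric, factor)
end
```

This works out to roughly 5.5x less allocated memory and roughly 8.0x fewer allocated objects, while the retained figures stay about the same.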

daddyz merged commit 1d46ab7 into daddyz:master on Apr 4, 2023

daddyz (Owner) commented Apr 4, 2023

@ElMassimo nice, thanks for PR

ElMassimo deleted the perf/detect-country-code branch on April 4, 2023 15:10