feat: improve performance when detecting country codes #274

ElMassimo · 2023-03-31T14:34:46Z

Description 📖

This pull request makes parsing roughly 10x faster when the country for a phone number is unknown.

Ran tests locally, and they all pass.

Background 📜

Currently, when parsing international phone numbers, this library allocates a large number of regexes (one per country, 256 in total) and matches the number against all 256 of them, even though in practice at most 1 can match.

Country codes have 1, 2, or 3 digits, and have the interesting property that shorter codes are not prefixes of longer codes.

The global_phone library takes advantage of this to optimize country code detection.

By taking the first one, two, and three digits of the number as candidate prefixes, it's possible to do a hash-based lookup instead of cycling through all countries.
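To make the idea concrete, here is a minimal sketch of a prefix-based lookup. It is not the code in this pull request or in global_phone: the `COUNTRY_CODES` table is a small hand-written subset, and `detect_country_code` is a hypothetical helper named only for this example. Values are arrays because several countries can share one calling code.

```ruby
# Hypothetical, hand-written subset of calling codes (not the library's data).
# Several countries can share one code, so each value is a list.
COUNTRY_CODES = {
  "1"   => ["US", "CA"],
  "7"   => ["RU", "KZ"],
  "44"  => ["GB", "GG", "IM", "JE"],
  "49"  => ["DE"],
  "598" => ["UY"],
}.freeze

# Tries the 1-, 2-, and 3-digit prefixes of `digits` in order and returns
# [code, countries] for the first one found in the table, or nil if none is.
# Because no code is a prefix of a longer code, at most one prefix can hit.
def detect_country_code(digits)
  (1..3).each do |length|
    prefix = digits[0, length]
    countries = COUNTRY_CODES[prefix]
    return [prefix, countries] if countries
  end
  nil
end

detect_country_code("59899123456") # => ["598", ["UY"]]
```

The lookup cost no longer depends on how many countries are in the table, only on the maximum code length.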

The Fix 🔨

This pull request applies the technique described above to optimize detect_and_parse.

As a result, instead of creating 256 regexes and matching all of them every time a phone number with an unknown country code is parsed, the library now performs at most 3 hash lookups.
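As a rough illustration of why the two strategies are interchangeable, the sketch below runs a per-code regex scan and a hash lookup side by side. Everything here is an assumption made for the example: `CALLING_CODES` is a tiny sample list, and `detect_by_regex_scan` and `detect_by_hash` are hypothetical names, not methods from this library.

```ruby
# Hypothetical sample of calling codes, for illustration only.
CALLING_CODES = %w[1 7 33 44 49 598].freeze

# Scan approach: build one anchored regex per code and test every one,
# even though at most one of them can match.
def detect_by_regex_scan(digits)
  CALLING_CODES.select { |code| Regexp.new("\\A#{code}").match?(digits) }
end

# Hash approach: index the codes once, then do at most three lookups.
CODE_INDEX = CALLING_CODES.to_h { |code| [code, true] }.freeze

def detect_by_hash(digits)
  (1..3).each do |length|
    prefix = digits[0, length]
    return [prefix] if CODE_INDEX.key?(prefix)
  end
  []
end

# Both strategies agree on every input, matching or not.
%w[59899123456 442071234567 79161234567 999].each do |digits|
  raise "mismatch for #{digits}" unless detect_by_regex_scan(digits) == detect_by_hash(digits)
end
```

The two only agree because the codes are prefix-free. If a short code could also start a longer one, the scan could return two matches while the hash version would stop at the first.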

Benchmarks 📊

This optimization yields a 10x speed up when the country code is unknown! 🚀

Added a new benchmark in spec/phonelib_ips_bench.rb, which can be run with rspec.

Before

Calculating -------------------------------------
       known country     27.029  (± 0.0%) i/s -    136.000  in   5.032140s
     unknown country      2.253  (± 0.0%) i/s -     12.000  in   5.325396s

Comparison:
       known country:       27.0 i/s
     unknown country:        2.3 i/s - 11.99x  slower

After

Calculating -------------------------------------
       known country     26.913  (± 0.0%) i/s -    136.000  in   5.053798s
     unknown country     23.172  (± 0.0%) i/s -    116.000  in   5.006049s

Comparison:
       known country:       26.9 i/s
     unknown country:       23.2 i/s - 1.16x  slower

Now the library performs about as well when the country code has to be detected as when it is provided.

If we combine this with the work in:

Only strip 00 prefix when country is not specified #268

it should make both cases even faster and bring them to comparable performance (the unknown-country case being only 1.03x slower).

Memory Usage 📊

After this pull request, this use case allocates about 5.5x less memory (84.1M vs. 15.2M memsize), which should reduce GC pressure as well.

Before

Calculating -------------------------------------
     unknown country    84.087M memsize (   160.465k retained)
                         1.859M objects (   513.000  retained)
                        50.000  strings (    50.000  retained)

After

Calculating -------------------------------------
     unknown country    15.234M memsize (   164.490k retained)
                       232.193k objects (   534.000  retained)
                        50.000  strings (    50.000  retained)

daddyz · 2023-04-04T14:41:59Z

@ElMassimo nice, thanks for PR

feat: improve performance when detecting country codes

a000eab

ElMassimo mentioned this pull request Mar 31, 2023

Only strip 00 prefix when country is not specified #268

Closed

ElMassimo added 2 commits March 31, 2023 11:58

chore: add benchmarks

1b3a9df

chore: add memory benchmark

3a650cc

daddyz merged commit 1d46ab7 into daddyz:master Apr 4, 2023

ElMassimo deleted the perf/detect-country-code branch April 4, 2023 15:10
