Improve performance when generating with a specific locale #1286

vitaly-ivanov · 2024-07-03T15:39:25Z

Overview

This is a partial fix for issue #1285.

Result before changes:

Time to process 1000000 values: 40.341209708s

Result after changes:

Time to process 1000000 values: 19.826039958s

Result with no locale:

Time to process 1000000 values: 2.310795584s

Results are based on the code here: https://github.com/vitaly-ivanov/datafaker-memory-leak/blob/main/app/src/main/kotlin/org/example/Performance.kt
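For readers who do not want to open the linked file, a measurement of this kind can be sketched roughly as follows. This is not the code from the repository above, only a minimal illustration; the class name `GenerationTimer` and the method `secondsToGenerate` are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Datafaker or the linked repository.
final class GenerationTimer {
    private GenerationTimer() {}

    /** Calls the generator `count` times and returns the elapsed wall-clock seconds. */
    static double secondsToGenerate(Supplier<String> generator, int count) {
        long totalLength = 0;
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Use every generated value so the calls cannot be optimized away.
            totalLength += generator.get().length();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (totalLength < 0) {
            // A sum of non-negative lengths is never negative; this only keeps the sum observable.
            throw new IllegalStateException("unexpected negative length sum");
        }
        return elapsedNanos / 1_000_000_000.0;
    }
}
```

With Datafaker on the classpath, a call could look like `GenerationTimer.secondsToGenerate(() -> faker.name().fullName(), 1_000_000)`, where `faker` is created with or without a locale. A single timed loop like this includes JIT warm-up, so it is only a rough indicator.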

Details

Nulls were interpreted as no attempt to load

If values were missing for the specified locale, the `loadValues` method returned null instead of an empty map. The cache treated that null as "not loaded yet", so every subsequent lookup attempted to load the values again.
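The idea behind the fix can be sketched as follows. This is a simplified illustration, not the actual Datafaker code: the class `LocaleValueCache`, its `valuesFor` method, and the injected loader are hypothetical names, and the sketch is not thread-safe.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache that remembers "nothing found" as an empty map.
final class LocaleValueCache {
    private final Map<String, Map<String, Object>> cache = new HashMap<>();
    private final Function<String, Map<String, Object>> loader;
    private int loadCount = 0;

    LocaleValueCache(Function<String, Map<String, Object>> loader) {
        this.loader = loader;
    }

    Map<String, Object> valuesFor(String locale) {
        Map<String, Object> cached = cache.get(locale);
        if (cached != null) {
            // Also hit when an earlier load found nothing and stored an empty map.
            return cached;
        }
        loadCount++;
        Map<String, Object> loaded = loader.apply(locale);
        // Store an empty map instead of null so a miss is not loaded again.
        Map<String, Object> result = (loaded == null) ? Collections.emptyMap() : loaded;
        cache.put(locale, result);
        return result;
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loadCount;
    }
}
```

If the null were stored or returned as is, `cache.get(locale)` would keep returning null and the loader would run on every lookup. `Map.computeIfAbsent` behaves the same way when the mapping function returns null, because no mapping is recorded in that case.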


codecov-commenter · 2024-07-03T20:29:22Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.95%. Comparing base (b37c566) to head (2af17cc).
Report is 203 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1286      +/-   ##
============================================
- Coverage     92.35%   91.95%   -0.41%     
- Complexity     2821     3086     +265     
============================================
  Files           292      310      +18     
  Lines          5609     6025     +416     
  Branches        599      627      +28     
============================================
+ Hits           5180     5540     +360     
- Misses          275      333      +58     
+ Partials        154      152       -2


asolntsev

LGTM

bodiam · 2024-07-03T22:34:19Z

Thanks for this PR. Any chance you add a small unit / performance test to this to prevent regressions?

vitaly-ivanov · 2024-07-04T07:45:56Z

> Thanks for this PR. Any chance you add a small unit / performance test to this to prevent regressions?

I think adding performance tests as unit tests may not be a good practice due to the execution time. I see that the benchmarks are in the separate repository https://github.com/datafaker-net/datafaker-benchmark. Should I add a new benchmark there?

snuyanzin · 2024-07-04T09:39:31Z

yep, benchmark might be a good idea

bodiam · 2024-07-04T10:18:00Z

> > Thanks for this PR. Any chance you add a small unit / performance test to this to prevent regressions?
>
> I think adding performance tests as unit tests may not be a good practice due to the execution time. I see that the benchmarks are in the separate repository https://github.com/datafaker-net/datafaker-benchmark. Should I add a new benchmark there?

We have a few lightweight performance tests as part of the build, so it depends a bit on how long the test run is. If it's 10 seconds, it should be fine to add it to this build; if it's more than that, perhaps datafaker-benchmark might be better.

vitaly-ivanov · 2024-07-04T15:32:28Z

I have added a new benchmark: datafaker-net/datafaker-benchmark#25

Before changes:

Benchmark                                        Mode  Cnt     Score     Error   Units
LocalePerformanceBenchmark.en_fullname          thrpt   10  2700.254 ± 161.029  ops/ms
LocalePerformanceBenchmark.en_gb_fullname       thrpt   10    12.079 ±   1.952  ops/ms
LocalePerformanceBenchmark.en_gb_streetAddress  thrpt   10    11.230 ±   1.599  ops/ms
LocalePerformanceBenchmark.en_streetAddress     thrpt   10  2045.465 ±  70.737  ops/ms

After changes:

Benchmark                                        Mode  Cnt     Score     Error   Units
LocalePerformanceBenchmark.en_fullname          thrpt   10  2645.880 ± 216.553  ops/ms
LocalePerformanceBenchmark.en_gb_fullname       thrpt   10    38.643 ±   3.966  ops/ms
LocalePerformanceBenchmark.en_gb_streetAddress  thrpt   10    51.417 ±  11.488  ops/ms
LocalePerformanceBenchmark.en_streetAddress     thrpt   10  1909.792 ± 213.204  ops/ms

It gets a little better, but the difference is still significant for some reason.
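To put numbers on this, the ratios can be computed directly from the scores in the two tables above. The snippet below is only arithmetic on those reported values; the class name `ThroughputRatio` is made up for illustration.

```java
import java.util.Locale;

// Arithmetic on the benchmark scores reported above (ops/ms).
final class ThroughputRatio {
    private ThroughputRatio() {}

    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Speedup of the en_GB benchmarks: score after the change / score before.
        double fullNameSpeedup = ratio(38.643, 12.079);
        double streetAddressSpeedup = ratio(51.417, 11.230);
        // Remaining gap after the change: en score / en_GB score.
        double fullNameGap = ratio(2645.880, 38.643);
        double streetAddressGap = ratio(1909.792, 51.417);

        System.out.printf(Locale.ROOT, "en_gb_fullname speedup: %.1fx%n", fullNameSpeedup);
        System.out.printf(Locale.ROOT, "en_gb_streetAddress speedup: %.1fx%n", streetAddressSpeedup);
        System.out.printf(Locale.ROOT, "fullname gap, en vs en_GB: %.1fx%n", fullNameGap);
        System.out.printf(Locale.ROOT, "streetAddress gap, en vs en_GB: %.1fx%n", streetAddressGap);
    }
}
```

So en_GB throughput improves by roughly 3x to 4.6x, while en remains tens of times faster. The error margins in the tables are fairly wide, so these ratios are approximate.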

asolntsev · 2024-07-08T20:47:06Z

@bodiam @snuyanzin Could you please release DF 2.3.1 with this fix?
I heard complaints about the performance of DF 2.3.0 (which became obvious after we fixed the memory leaks).

bodiam · 2024-07-09T07:37:34Z

Yes, can do, but I was hoping to release this one too: #1281

Increase performance for generation with specified locale

2af17cc

Nulls were interpreted as no attempt to load. If values were missing for the specified locale, the `loadValues` method returned null instead of an empty map, resulting in subsequent attempts to load values into the cache.

vitaly-ivanov mentioned this pull request Jul 3, 2024

Locale specific generation performance degradation #1285

Closed

asolntsev added the enhancement label Jul 3, 2024

asolntsev approved these changes Jul 3, 2024


vitaly-ivanov mentioned this pull request Jul 4, 2024

Add a benchmark to test the performance of generating the most common values on different locales datafaker-net/datafaker-benchmark#25

Merged

snuyanzin approved these changes Jul 4, 2024


snuyanzin merged commit 452793d into datafaker-net:main Jul 4, 2024
10 checks passed
