Minimize allocations when unpacking TimeZones from cache #423

@nickrobinson251 (Contributor) commented Dec 14, 2022

Even though we're careful about caching the creation of TimeZones, and made FixedTimeZone an isbitstype/allocatedinline, we still allocate every time TimeZone is called, even when it just returns a FixedTimeZone from the cache!

I think this happens because the values in the cache are of type TimeZone, i.e. the compiler can't know whether we're returning a FixedTimeZone or a VariableTimeZone.

This PR proposes to instead maintain separate caches (per thread) for FixedTimeZones and VariableTimeZones, rather than a single cache (per thread) for all TimeZones.

Effectively, this trades some code duplication for a performance improvement.

VariableTimeZone still allocates once, presumably since it's not isbitstype (#271).
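For illustration, here's a minimal sketch of the type-inference difference, using hypothetical stand-in types (AbstractTZ, FixedTZ, VariableTZ) and plain Dicts rather than the real TimeZones.jl types and per-thread caches. It only shows what the compiler can infer for each lookup; it doesn't try to reproduce the allocation numbers below.

```julia
# Hypothetical stand-ins for TimeZones.jl's types (not the real definitions).
abstract type AbstractTZ end

struct FixedTZ <: AbstractTZ      # isbits: only plain-data fields
    utc_offset::Int
end

struct VariableTZ <: AbstractTZ   # not isbits: holds references to heap objects
    name::String
    transitions::Vector{Int}
end

# Single cache: values are stored under the abstract type.
const MIXED_CACHE = Dict{String,AbstractTZ}()

# Separate caches: each one has a concrete value type.
const FIXED_CACHE = Dict{String,FixedTZ}()
const VARIABLE_CACHE = Dict{String,VariableTZ}()

# The compiler can only infer the abstract type `AbstractTZ` for this lookup.
lookup_mixed(str::String) = MIXED_CACHE[str]

# Probe the fixed-zone cache first and return early, mirroring the PR's approach.
function lookup_split(str::String)
    ftz = get(FIXED_CACHE, str, nothing)
    if ftz !== nothing
        return ftz::FixedTZ
    end
    return VARIABLE_CACHE[str]
end

MIXED_CACHE["UTC"] = FixedTZ(0)
FIXED_CACHE["UTC"] = FixedTZ(0)
VARIABLE_CACHE["America/Winnipeg"] = VariableTZ("America/Winnipeg", Int[])

println(Base.return_types(lookup_mixed, (String,)))  # inferred: the abstract type
println(Base.return_types(lookup_split, (String,)))  # inferred: a Union of the two concrete types
```

Base.return_types is an internal reflection helper; `@code_warntype lookup_split("UTC")` shows the same information interactively.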

Microbenchmarks

FixedTimeZone

This PR

julia> str = "UTC";

julia> TimeZone(str);  # compile and cache

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max):  46.733 ns … 156.619 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     53.994 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.265 ns ±   2.880 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                   ▂     █▂     █▃
  ▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂█▅▄▄▄▅██▅▆▇████▅▄▃▃▃▃▄▄▃▃▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂ ▃
  46.7 ns         Histogram: frequency by time         61.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

master (v1.9.1)

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 943 evaluations.
 Range (min … max):  100.168 ns …   1.490 μs  ┊ GC (min … max): 0.00% … 92.27%
 Time  (median):     104.277 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   107.123 ns ± 48.091 ns  ┊ GC (mean ± σ):  1.61% ±  3.33%

             ▁█▆▂
  ▂▃▃▄▃▄▄▅▅▅▇████▅▄▃▃▃▄▅▅▆▇▇▇▅▅▄▃▃▃▂▂▂▂▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  100 ns          Histogram: frequency by time          117 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

VariableTimeZone

This PR

julia> str = "America/Winnipeg";

julia> TimeZone(str);   # compile and cache

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 946 evaluations.
 Range (min … max):   98.441 ns …   1.144 μs  ┊ GC (min … max): 0.00% … 90.99%
 Time  (median):     101.039 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   102.185 ns ± 32.794 ns  ┊ GC (mean ± σ):  1.01% ±  2.87%

                ▅▄█▆▁
  ▁▂▃▄▄▆█▇▆▄▄▄▄▇█████▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  98.4 ns         Histogram: frequency by time          108 ns <

 Memory estimate: 48 bytes, allocs estimate: 1.

master (v1.9.1)

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 909 evaluations.
 Range (min … max):  119.179 ns …   1.722 μs  ┊ GC (min … max): 0.00% … 92.34%
 Time  (median):     120.187 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   123.116 ns ± 47.667 ns  ┊ GC (mean ± σ):  1.33% ±  3.17%

  ▁▅██▇▆▄▃▂▂▁▁▁▁▂▃▃▂▃▃▂▁▁▁▁▁▁▁                                 ▂
  ████████████████████████████▇▇███▇▇██▇▇▇▇▇▇▇▇▇█▆▇▅▇▆▆▆▆▆▅▆▅▄ █
  119 ns        Histogram: log(frequency) by time       134 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

@nickrobinson251 (Contributor Author)

~/r/TimeZones.jl> TZDATA_VERSION=2016j julia --project=benchmark/ -e 'using PkgBenchmark, TimeZones; export_markdown(stdout, judge(TimeZones, "origin/HEAD", verbose=false))'
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: creating benchmark tuning file /Users/nickr/repos/TimeZones.jl/benchmark/tune.json...
Tuning 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:36
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:43
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: using benchmark tuning data in /Users/nickr/repos/TimeZones.jl/benchmark/tune.json
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:48

Benchmark Report for ~/repos/TimeZones.jl

Job Properties

  • Time of benchmarks:
    • Target: 14 Dec 2022 - 18:05
    • Baseline: 14 Dec 2022 - 18:06
  • Package commits:
    • Target: 85eb32
    • Baseline: f8f39b
  • Julia commits:
    • Target: 36034a
    • Baseline: 36034a
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID                                                          time ratio   memory ratio
["ZonedDateTime", "local", "standard"]                      0.84 (5%) ✅  1.00 (1%)
["ZonedDateTime", "range", "VariableTimeZone/DatePeriod"]   0.91 (5%) ✅  1.00 (1%)
["arithmetic", "DatePeriod"]                                0.93 (5%) ✅  1.00 (1%)
["arithmetic", "TimePeriod"]                                1.09 (5%) ❌  1.00 (1%)
["arithmetic", "broadcast", "FixedTimeZone/TimePeriod"]     0.94 (5%) ✅  1.00 (1%)
["arithmetic", "broadcast", "VariableTimeZone/DatePeriod"]  0.93 (5%) ✅  1.00 (1%)
["interpret", "local", "ambiguous"]                         0.87 (5%) ✅  1.00 (1%)
["interpret", "local", "standard"]                          0.89 (5%) ✅  1.00 (1%)
["parse", "ISOZonedDateTimeFormat"]                         0.82 (5%) ✅  0.87 (1%) ✅
["parse", "issue#25"]                                       0.92 (5%) ✅  0.88 (1%) ✅
["transition_range", "local", "ambiguous"]                  0.83 (5%) ✅  1.00 (1%)
["transition_range", "local", "non-existent"]               0.79 (5%) ✅  1.00 (1%)
["transition_range", "local", "standard"]                   0.81 (5%) ✅  1.00 (1%)
["transition_range", "utc"]                                 1.43 (5%) ❌  1.00 (1%)
["tryparsenext_fixedtz", "+06"]                             0.94 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "+0600"]                           0.94 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "+06:00"]                          0.95 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "-06"]                             0.93 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "-0600"]                           0.95 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "UTC"]                             0.73 (5%) ✅  1.00 (1%)
["tryparsenext_fixedtz", "Z"]                               0.56 (5%) ✅  1.00 (1%)
["tz_data", "parse_components"]                             0.89 (5%) ✅  0.84 (1%) ✅
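To make the table easier to read, here's a sketch of the kind of rule that produces the ❌/✅ marks, assuming the percentage in parentheses is the judgement tolerance and that a result counts as significant only when the ratio falls outside 1 ± tolerance. classify is a hypothetical helper, not part of PkgBenchmark; BenchmarkTools' judge is the real function to check for the exact rule.

```julia
# Hypothetical helper: classify a target/baseline ratio against a relative tolerance.
# Assumption: significant only when the ratio lies outside 1 ± tolerance.
function classify(ratio::Real, tolerance::Real)
    if ratio - tolerance > 1
        return :regression    # shown with ❌
    elseif ratio + tolerance < 1
        return :improvement   # shown with ✅
    else
        return :invariant     # omitted from the table
    end
end

# A few rows from the table above (ratio, tolerance as a fraction).
rows = [
    ("transition_range/utc time", 1.43, 0.05),
    ("tryparsenext_fixedtz/Z time", 0.56, 0.05),
    ("parse/ISOZonedDateTimeFormat memory", 0.87, 0.01),
    ("arithmetic/DatePeriod memory", 1.00, 0.01),
]

for (name, ratio, tol) in rows
    println(rpad(name, 40), classify(ratio, tol))
end
```

Because the ratios in the table are rounded to two decimals, borderline rows such as 0.95 (5%) can't be reclassified exactly from the displayed values.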

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["ZonedDateTime"]
  • ["ZonedDateTime", "local"]
  • ["ZonedDateTime", "range"]
  • ["arithmetic"]
  • ["arithmetic", "broadcast"]
  • ["interpret", "local"]
  • ["interpret"]
  • ["parse"]
  • ["transition_range", "local"]
  • ["transition_range"]
  • ["tryparsenext_fixedtz"]
  • ["tryparsenext_tz"]
  • ["tz_data"]

Julia versioninfo

Target

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
  CPU: Apple M1 Max:
                 speed         user         nice          sys         idle          irq
       #1-10  2400 MHz    1580681 s          0 s    1469385 s   30832641 s          0 s
  Memory: 64.0 GB (2506.71875 MB free)
  Uptime: 446789.0 sec
  Load Avg:  2.73779296875  2.6376953125  2.58837890625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores

Baseline

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
  CPU: Apple M1 Max:
                 speed         user         nice          sys         idle          irq
       #1-10  2400 MHz    1581658 s          0 s    1469753 s   30837878 s          0 s
  Memory: 64.0 GB (2527.17578125 MB free)
  Uptime: 446854.0 sec
  Load Avg:  2.205078125  2.5078125  2.54345703125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores

@nickrobinson251 (Contributor Author)

Re-ran the benchmarks: the ones marked ❌ above are ✅ on the re-run (and others are no longer significant), so I think these benchmarks are very noisy, especially as most of them don't look like they even hit the code paths changed in this PR.

Anyway, from this it doesn't seem like the two caches cause any performance issues.

@nickrobinson251 nickrobinson251 marked this pull request as ready for review December 14, 2022 18:44
src/types/timezone.jl (outdated, resolved)
Comment on lines 70 to 90
tz, class = get!(_tz_cache(), str) do
tz_path = joinpath(_COMPILED_DIR[], split(str, "/")...)
ftz, class = get(_ftz_cache(), str, (nothing, Class(:NONE)))
if ftz !== nothing
_check_class(mask, class)
return ftz::FixedTimeZone
end

@NHDaly (Contributor) Dec 14, 2022

🤔 hrrmm, the downside of the https://github.com/JuliaConcurrent/MultiThreadedCaches.jl approach is that it currently only exposes get!() as its interface, for reasons that I can't fully remember...

So the trick you're doing here wouldn't work.
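The trick above relies on Base's non-mutating get(collection, key, default), which a get!-only API doesn't provide. A small sketch with a plain Dict standing in for a cache (MultiThreadedCaches.jl itself is not used here):

```julia
# A plain Dict stands in for one of the caches.
cache = Dict{String,Int}()

# `get` with a default only probes: a miss returns the default and inserts nothing.
probe = get(cache, "UTC", nothing)
println(probe === nothing, " ", length(cache))  # the cache is still empty

# `get!` with a do-block: on a miss it computes the value, stores it and returns it.
value = get!(cache, "UTC") do
    0  # placeholder for the expensive construction
end
println(value, " ", length(cache))              # the cache now has one entry

# With only `get!` available, asking "is `str` in the FixedTimeZone cache?"
# would also insert an entry for `str` into that cache.
```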

For one workaround idea, maybe you could figure out which type of TimeZone it is ahead of time? I don't know if that's possible though, just from its name? I guess you could have a third cache that just caches the type of the timezone?

So it'd be something like:

    type = get!(_tz_type_cache, str) do
        compute_tz_type(str)
    end
    if type === :FixedTimeZone
        tz, class = get!(ftz_cache, str) do
            compute_fixed_timezone(str)
        end
    elseif type === :VariableTimeZone
        tz, class = get!(vtz_cache, str) do
            compute_variable_timezone(str)
        end
    end

Something like that? I'm not sure if splitting things up this way is feasible though?

@nickrobinson251 (Contributor Author)

figure out which type of TimeZone it is ahead of time?

I think that's what's not easy, because we actually create the TimeZones by reading from a timezone-specific file:

    tz_constructor = if tzh_timecnt == 0 || (tzh_timecnt == 1 && transition_times[1] == TIMESTAMP_MIN)
        tzj_info = transition_types[1]
        name -> (FixedTimeZone(name, tzj_info.utc_offset, tzj_info.dst_offset), class)
    else
        transitions = Transition[]
        cutoff = timestamp2datetime(cutoff_time, nothing)
        prev_zone = nothing
        for i in eachindex(transition_times)
            timestamp = transition_times[i]
            tzj_info = transition_types[transition_indices[i]]
            # Sometimes tzfiles save on storage by having multiple names in one, for example:
            # "WSST\0" at index 1 turns into "WSST" whereas index 2 results in "SST"
            # for "Pacific/Apia".
            name = get_designation(combined_designations, tzj_info.designation_index)
            zone = FixedTimeZone(name, tzj_info.utc_offset, tzj_info.dst_offset)
            # Only record a transition when the zone actually changes
            if zone != prev_zone
                utc_datetime = timestamp2datetime(timestamp, typemin(DateTime))
                push!(transitions, Transition(utc_datetime, zone))
            end
            prev_zone = zone
        end
        name -> (VariableTimeZone(name, transitions, cutoff), class)
    end

i.e. we have to read the file to know what type it will be, and we don't want to read it multiple times.

So I think we'd have to change that read function to return the type, then put that value in a third cache, while holding on to the tz, class.

something like

    local tz, class
    type = get!(_tz_type_cache, str) do
        tz, class, type = compute_tz(str)
        return type
    end
    if type === :FixedTimeZone
        tz, class = get!(ftz_cache, str) do
            (tz, class)
        end
    elseif type === :VariableTimeZone
        tz, class = get!(vtz_cache, str) do
            (tz, class)
        end
    end
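A runnable sketch of this single-read idea, under loudly stated assumptions: read_tz, FAKE_TZFILES, the stand-in types and the :FixedTimeZone/:VariableTimeZone tags are all hypothetical, plain Dicts stand in for the per-thread caches, the class value is left out, and no thread safety is attempted.

```julia
# Hypothetical stand-ins; the real code reads a compiled tzfile from disk.
abstract type AbstractTZ end
struct FixedTZ <: AbstractTZ
    utc_offset::Int
end
struct VariableTZ <: AbstractTZ
    name::String
    transitions::Vector{Int}
end

# Hypothetical "file contents": transition times per zone name.
const FAKE_TZFILES = Dict(
    "UTC" => Int[],
    "America/Winnipeg" => [-1_000, 0, 1_000],
)

const TYPE_CACHE = Dict{String,Symbol}()
const FIXED_CACHE = Dict{String,FixedTZ}()
const VARIABLE_CACHE = Dict{String,VariableTZ}()
const READS = Ref(0)  # counts how often a "file" is read

# Read the "file" once and return the zone together with its kind.
function read_tz(str::String)
    READS[] += 1
    times = FAKE_TZFILES[str]
    if isempty(times)
        return FixedTZ(0), :FixedTimeZone
    else
        return VariableTZ(str, times), :VariableTimeZone
    end
end

function lookup(str::String)
    local tz  # only assigned when the type cache misses
    kind = get!(TYPE_CACHE, str) do
        tz, k = read_tz(str)
        k
    end
    # On a type-cache hit the typed cache also hits, so `tz` is never read here.
    if kind === :FixedTimeZone
        return get!(() -> tz::FixedTZ, FIXED_CACHE, str)
    else
        return get!(() -> tz::VariableTZ, VARIABLE_CACHE, str)
    end
end
```

Because tz is assigned inside one closure and read in another, Julia keeps it in a heap-allocated box on each call, so this particular shape would likely reintroduce an allocation even on cache hits.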

@nickrobinson251 (Contributor Author)

hmmmm, this way's not looking nice to me 🤔

how much benefit do we think MultiThreadedCaches.jl is going to add for us?

@nickrobinson251 (Contributor Author)

we could maybe go add get and empty! to the MultiThreadedCaches.jl API to reduce the necessary changes

@nickrobinson251 (Contributor Author)

yeah, i don't see a nice way to re-write this using MultiThreadedCaches.jl

at least not without extending the MultiThreadedCaches.jl API a bit, which feels like it should be done separately and not block this PR (unless we're worried about correctness issues in this PR as is)

@NHDaly (Contributor) left a comment

👍 👍 This is a nice change, @nickrobinson251! It makes sense why it reduces the allocations. 👍 Thanks for investigating this so deeply

@NHDaly (Contributor) commented Dec 14, 2022

I'm not sure I understand why the VariableTimeZones still allocate when you look them up. I agree with you that it's unexpected. I'm not sure the isbits thing should matter, since we're pointing to the existing vector rather than allocating a new one, and Julia is supposed to be able to stack-allocate an immutable struct that contains references to mutable objects (since Julia 1.5). So I don't understand it yet; it might be good to look through a profile together once the rest of the issues in the PR are addressed.

@nickrobinson251 (Contributor Author)

Okay, I've spent a couple of days playing around with different options, but I can't yet see how to remove the remaining VariableTimeZone allocation, so I propose leaving that as a follow-up. Likewise, I don't think a switch to MultiThreadedCaches.jl is straightforward, so I'd prefer to get this perf improvement in and leave that refactor to a follow-up.

@NHDaly and @omus please could you take another look at this PR? Thanks!

@omus (Member) commented Dec 21, 2022

I'll try to take a look soon. I'll call out that I also want to get #382 merged soon as well.

@codecov-commenter commented Dec 21, 2022

Codecov Report

Merging #423 (3218a4e) into master (804864c) will decrease coverage by 0.47%.
The diff coverage is 98.41%.

❗ Current head 3218a4e differs from pull request most recent head b158178. Consider uploading reports for the commit b158178 to get more accurate results

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   95.58%   95.11%   -0.47%     
==========================================
  Files          38       36       -2     
  Lines        1766     1803      +37     
==========================================
+ Hits         1688     1715      +27     
- Misses         78       88      +10     
Impacted Files           Coverage Δ
src/TimeZones.jl         100.00% <ø> (ø)
src/types/timezone.jl    98.64% <98.36%> (+1.50%) ⬆️
src/tzdata/compile.jl    94.78% <100.00%> (-0.97%) ⬇️

... and 9 files with indirect coverage changes

test/helpers.jl (outdated, resolved)
@omus (Member) commented Jan 5, 2023

I've been a bit swamped this week. Overall, I'm in favour of this change and will definitely merge this before #382.

@nickrobinson251 (Contributor Author)

Just following up to make sure this doesn't drop off your radar. Let me know if i can do anything to help, @omus :)

nickrobinson251 and others added 5 commits May 3, 2023 10:27
- Should not allocate for cached FixedTimeZone
- Should allocate only once for cached VariableTimeZone
- To minimize allocations when unpacking TimeZones from the cache
Co-authored-by: Nathan Daly <nathan.daly@relational.ai>
@omus (Member) commented Jun 19, 2023

It turned out #382 became urgent to merge for Julia 1.9, so the original plan changed. @nickrobinson251, if you have time to rebase this against the latest version of the code, that would be great. If not, I'll try to adapt this code myself sometime.

@PallHaraldsson (Contributor)

Is this just a (simple) rebase away? It seems good as is: getting down to 0 allocations is great and already an improvement. Getting rid of the 1 remaining allocation for VariableTimeZone would be ideal, but shouldn't stop us from merging as is? I didn't look closely into why it's there; maybe it's already fixed on 1.10?

@nickrobinson251 (Contributor Author)

I think this is just a (not-very-simple) rebase away. And I would love, love, love someone to take that on, as I don't have capacity at the moment -- please feel encouraged to do so!

@omus (Member) commented May 23, 2024

Superseded by #451

@omus omus closed this May 23, 2024

Linked issue: Creating TimeZone from string allocates unnecessarily

6 participants