Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manual allocation strategy #143

Closed
amilsted opened this issue Aug 3, 2023 · 2 comments · Fixed by #182
Closed

Manual allocation strategy #143

amilsted opened this issue Aug 3, 2023 · 2 comments · Fixed by #182

Comments

@amilsted
Copy link

amilsted commented Aug 3, 2023

Regarding the discussion in #140, do you think the tensoralloc, tensorfree interface is compatible with approaches like this?

Could one alternatively write backend code that uses malloc and free to avoid GC? Would this require introducing a special array wrapper type?

Is it possible to guarantee that manually allocated intermediates get freed in case of an exception, within the current interface? Would this involve defining a finalizer for the array wrapper type?

@lkdvos
Copy link
Collaborator

lkdvos commented Aug 7, 2023

As promised, I've started experimenting with some strategies, it is still very much WIP but you can follow it here: AllocationKit

So far I've tried just plain malloc/free, and the finalizer version which is less prone to memory leaks, and initial benchmarks on my laptop report the underlying times. The testcase is the application of a single-site update encountered for example in DMRG or VUMPS for an Ising-like hamiltonian, repeated 30 times. (benchmark here )

In summary, it seems as if just doing this already leads to various improvements, depending on the tensor sizes. I might continue playing around this week with something like Bumper.jl, allthough I ran into some issues trying to get the StrideArray/PtrArray to work nicely with Strided.jl, so I will need to fix that first.

D = 64:

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  400.062 μs … 715.187 μs  ┊ GC (min … max): 13.62% … 35.15%
 Time  (median):     509.756 μs               ┊ GC (median):    11.73%
 Time  (mean ± σ):   503.847 μs ±  51.078 μs  ┊ GC (mean ± σ):  13.61% ±  3.73%

      ▁                       ▂▂█▅▁▁       ▁                     
  ▃▂▄▄█▄▇▄▇▆▅▅▅▄▃▃▃▄▄▄▄▄▅▄▄▄▄▆██████▆▅█▇█▇▇█▅▆▅▅▅▄▄▁▃▃▄▂▃▃▁▃▃▃▃ ▄
  400 μs           Histogram: frequency by time          617 μs <

 Memory estimate: 1.63 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  271.277 μs … 727.405 μs  ┊ GC (min … max):  0.00% … 37.25%
 Time  (median):     345.764 μs               ┊ GC (median):    13.97%
 Time  (mean ± σ):   360.340 μs ±  61.898 μs  ┊ GC (mean ± σ):   8.75% ±  6.93%

    ▁▄▅          ▃▄▆▃█▅ ▁ ▂▁                                ▃    
  ▃████▄▆▆█▆▅▆▄▄▇████████▇████▆▅▅▅▆▃▅▄▃▃▄██▇▅▄▄▁▃▁▁▁▁▁▁▁▁▄█▇█▇▃ ▅
  271 μs           Histogram: frequency by time          493 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  276.067 μs … 568.403 μs  ┊ GC (min … max):  0.00% … 37.13%
 Time  (median):     362.419 μs               ┊ GC (median):    13.79%
 Time  (mean ± σ):   365.961 μs ±  39.841 μs  ┊ GC (mean ± σ):   8.66% ±  6.82%

                      ▃▃▃▇▅▁ ▁             ▁▇█▇▃▃                
  ▃▅▄▄▆▄▄▂▃▃▂▁▅▃▃▂▃▃▅▆██████▇█▅▇▄▆▅▄▃▄▃▄▂▆████████▅▄▄▃▃▄▂▁▁▁▁▁▂ ▄
  276 μs           Histogram: frequency by time          448 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

D = 128

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.929 ms …   3.871 ms  ┊ GC (min … max): 11.19% … 6.77%
 Time  (median):     2.403 ms               ┊ GC (median):    10.51%
 Time  (mean ± σ):   2.416 ms ± 271.936 μs  ┊ GC (mean ± σ):  10.48% ± 1.48%

                  ▁ ▃▄▇█▅                                      
  ▄▅▅▅▄▄▄▃▄▃▃▃▃▃▄▆███████▇▆▅▅▄▄▅▅▃▃▂▃▂▃▃▂▁▃▃▃▂▁▃▃▃▃▃▂▃▂▂▃▁▂▁▃ ▃
  1.93 ms         Histogram: frequency by time        3.29 ms <

 Memory estimate: 6.50 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   3.481 ms  ┊ GC (min … max): 5.90% … 3.47%
 Time  (median):     1.734 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.886 ms ± 299.257 μs  ┊ GC (mean ± σ):  6.39% ± 1.41%

   ▂█▃▇▁                                                       
  ▅█████▆▄▃▄▅▄▄▃▃▄▃▃▃▃▃▂▃▂▂▃▃▃▃▃▃▃▃▂▃▃▃▃▃▃▄▂▂▂▂▂▂▁▃▂▃▂▂▂▂▁▁▁▂ ▃
  1.62 ms         Histogram: frequency by time        2.85 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   2.556 ms  ┊ GC (min … max): 6.02% … 7.28%
 Time  (median):     1.716 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.761 ms ± 134.372 μs  ┊ GC (mean ± σ):  6.81% ± 1.37%

    ▂▄█▄▄▃▆▆                                                   
  ▃▄█████████▇▇▅▄▄▃▃▃▃▃▃▄▃▁▂▃▃▃▃▃▁▂▃▃▃▃▃▃▂▂▃▂▂▃▂▁▂▃▂▁▂▂▁▂▂▁▂▃ ▃
  1.63 ms         Histogram: frequency by time        2.26 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

@amilsted
Copy link
Author

Nice! And cool that this is very simple to implement with the tensoralloc scheme.

@lkdvos lkdvos linked a pull request Jul 12, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants