XLA Allocator stats #517

Pangoraw · 2025-01-12T16:09:00Z

No description provided.

deps/ReactantExtra/API.cpp

glou-nes · 2025-01-12T18:35:22Z

+    return std::numeric_limits<int64_t>::min();
+  return stats->peak_pool_bytes.value_or(std::numeric_limits<int64_t>::min());
+}
+


I think this could be replaced with a single function, to limit the number of GetAllocatorStats calls:

extern "C" tsl::AllocatorStats getAllocatorStats(PjRtDevice *Device) {
  auto stats = Device->GetAllocatorStats();
  return stats.value(); // probably use a tsl::AllocatorStats* to deal with unsupported devices
}

Optionals are represented with an 8-byte discriminant, so a Tuple can be used here, for instance:

struct AllocatorStats
    num_allocs::Int64
    bytes_in_use::Int64
    peak_bytes_in_use::Int64
    largest_alloc_size::Int64
    bytes_limit::Tuple{Int64,Int64}
    bytes_reserved::Int64
    peak_bytes_reserved::Int64
    bytes_reservable_limit::Tuple{Int64,Int64}
    largest_free_block_bytes::Int64
    pool_bytes::Tuple{Int64,Int64}
    peak_pool_bytes::Tuple{Int64,Int64}
end

@ccall mlir_c.getAllocatorStats(device::XLA.Device)::AllocatorStats

I cannot test this locally because I cannot build ReactantExtra with CUDA, and the CPU device doesn't use AllocatorStats at all.

wsmoses · Jan 12, 2025

I'm okay either way, but if you want to go this route, have the function return the result through a pointer passed as the first argument and return void, to avoid ABI issues with struct-returning functions.
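The out-pointer convention suggested here could look roughly like the sketch below. `DemoStats` and `demo_get_stats` are hypothetical names and the field values are placeholders; this is an illustration of the calling convention, not the code that landed in API.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-data struct: fixed-width integers only, no C++ objects.
struct DemoStats {
  int64_t num_allocs;
  int64_t bytes_in_use;
};

// The caller owns the storage and passes its address as the first argument.
// Returning void avoids the platform-specific rules for returning structs
// by value across a foreign-function boundary.
extern "C" void demo_get_stats(DemoStats *out) {
  assert(out != nullptr);
  out->num_allocs = 1;       // placeholder value for illustration
  out->bytes_in_use = 4096;  // placeholder value for illustration
}
```

On the Julia side, the caller would presumably allocate a struct with the same field layout and pass a reference to it.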

Pangoraw · Jan 12, 2025

How should the Tuple{Int,Int} be interpreted for std::optional<int64_t>?

glou-nes · Jan 12, 2025

The first one is the value, the second is the discriminant:

get_value(t::Tuple{Int,Int}) = t[2] % 2 == 1 ? t[1] : nothing

wsmoses · Jan 12, 2025

I'm slightly concerned about doing this directly (relying on the ABI of std::optional). Could we instead make our own struct that explicitly contains the ints, which we unwrap in C++?
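A minimal sketch of such an explicit struct follows. The names `COptionalInt64`, `to_c_optional` and `c_optional_get` are hypothetical, and the choice of an `int64_t` flag (rather than a `bool`) is an assumption made here so that the struct has no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical C-compatible stand-in for std::optional<int64_t>.
struct COptionalInt64 {
  int64_t value;      // meaningful only when has_value != 0
  int64_t has_value;  // 0 or 1; int64_t keeps the layout free of padding
};

static_assert(sizeof(COptionalInt64) == 16, "two int64_t fields, no padding");

// Unwrap on the C++ side so that no std::optional crosses the boundary.
inline COptionalInt64 to_c_optional(const std::optional<int64_t> &opt) {
  COptionalInt64 out{0, 0};
  if (opt.has_value()) {
    out.value = *opt;
    out.has_value = 1;
  }
  return out;
}

// Read the value back; only valid when the flag is set.
inline int64_t c_optional_get(const COptionalInt64 &o) {
  assert(o.has_value != 0);
  return o.value;
}
```

With this layout the receiving side reads two named integer fields and never depends on how a particular standard library arranges `std::optional`.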

mofeing · Jan 13, 2025

For this kind of thing, I use a std::tuple. You could use a std::tuple<bool,int>, where the first value says whether the value is valid.

Pangoraw · Jan 13, 2025

I think using typemin(Int64) as a sentinel, like I did here, should be OK and C-friendly?
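The sentinel approach mirrors the `value_or(std::numeric_limits<int64_t>::min())` call in the diff above. The sketch below isolates it; `kMissing`, `encode_optional` and `decode_optional` are hypothetical names, and the decoder is written in C++ only for symmetry (in the PR the decoding would happen on the Julia side).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Sentinel meaning "no value"; this equals typemin(Int64) in Julia.
constexpr int64_t kMissing = std::numeric_limits<int64_t>::min();

// Collapse an optional into a single int64_t that is safe to pass through C.
inline int64_t encode_optional(const std::optional<int64_t> &opt) {
  // A real value equal to the sentinel would be indistinguishable from "missing".
  assert(!opt.has_value() || *opt != kMissing);
  return opt.value_or(kMissing);
}

// Inverse mapping: the sentinel becomes an empty optional.
inline std::optional<int64_t> decode_optional(int64_t raw) {
  if (raw == kMissing) return std::nullopt;
  return raw;
}
```

The trade-off is that one integer value is reserved; for byte counts, which are never negative, that loss is harmless.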

mofeing · Jan 13, 2025

@glou-nes reading it more closely: don't rely on passing that tsl::AllocatorStats directly. You have no guarantee it will work.

You can instead:

1. save it into a pointer (e.g. using a copy/move constructor with new) and pass the pointer
2. create your own C struct and pass that, without C++ objects in it

> Optionals are represented with an 8-byte discriminant, so a Tuple can be used here, for instance.

I had problems with this kind of trick even with the most naive C++ classes.
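The first option listed above (copy the object to the heap and pass a pointer) could be sketched as follows. `MockStats` is a hypothetical stand-in for a C++ type such as tsl::AllocatorStats, and the three `mock_stats_*` functions are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical C++ type that should not cross the C boundary by value.
struct MockStats {
  int64_t bytes_in_use = 0;
  std::optional<int64_t> bytes_limit;
};

// Copy the object to the heap and hand out an opaque pointer to it.
extern "C" void *mock_stats_new() {
  MockStats local;
  local.bytes_in_use = 4096;    // placeholder value
  local.bytes_limit = 1 << 20;  // placeholder value
  return new MockStats(local);  // copy constructor
}

// Accessors unwrap fields in C++ and return plain integers.
extern "C" int64_t mock_stats_bytes_in_use(const void *handle) {
  assert(handle != nullptr);
  return static_cast<const MockStats *>(handle)->bytes_in_use;
}

// The caller must release the handle exactly once.
extern "C" void mock_stats_free(void *handle) {
  delete static_cast<MockStats *>(handle);
}
```

This keeps the C++ layout private, at the cost of one accessor per field and an explicit free. The second option avoids both by copying the fields into a plain C struct.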

glou-nes · Jan 13, 2025

Sure! Thanks for the feedback. It seems fair to only consider the C ABI; my solution is hacky anyway. That kind of bug is a pain to debug, and I hit several with Ops.convolution. Option 2 (your own C struct) is probably the way to go here.

Pangoraw · Jan 13, 2025

I updated the code to make a single call to GetAllocatorStats. I have not been able to try it on a CUDA machine so far, only with a manually constructed tsl::AllocatorStats.

wsmoses · 2025-01-16T03:37:09Z

@Pangoraw jll is now landed if you want to try btw

Pangoraw · 2025-01-16T13:08:29Z

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(0, 0)

julia> x = Reactant.to_rarray(randn(Float32, 1000));

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(1, 4096)

julia> x = nothing; GC.gc(true)

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(1, 0)
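The 4096 bytes reported above is slightly more than the 1000 × 4 = 4000 bytes of Float32 payload. The thread does not explain the difference. One plausible reading, assumed here purely for illustration, is that the allocator rounds each request up to a multiple of some granularity (256 bytes in this sketch, although some other values would give 4096 as well). `round_up` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Round `bytes` up to the next multiple of `granularity`.
inline int64_t round_up(int64_t bytes, int64_t granularity) {
  assert(bytes >= 0 && granularity > 0);
  return (bytes + granularity - 1) / granularity * granularity;
}
```

Under that assumption, `round_up(1000 * 4, 256)` evaluates to 4096, which matches the `bytes_in_use` value in the session.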

mofeing · 2025-01-16T13:46:34Z

is this working on CPU-only too?

Pangoraw · 2025-01-16T13:52:24Z

unfortunately, no

ERROR: UNIMPLEMENTED: GetAllocatorStats is not supported
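One C-friendly way to surface an unsupported device to the caller is a status code alongside the out-pointer. The sketch below uses invented names (`BackendStats`, `query_backend`, `demo_try_get_stats`) and a mock backend; it is not how the PR handles the error, which the commit list only describes as "throw error when unsupported".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical plain-data result struct.
struct BackendStats {
  int64_t num_allocs;
  int64_t bytes_in_use;
};

// Mock backend query: no stats for non-GPU devices, placeholder values otherwise.
inline std::optional<BackendStats> query_backend(bool is_gpu) {
  if (!is_gpu) return std::nullopt;
  return BackendStats{1, 4096};
}

// Returns 0 on success and 1 when stats are unsupported.
// *out is written only on success.
extern "C" int demo_try_get_stats(int is_gpu, BackendStats *out) {
  assert(out != nullptr);
  std::optional<BackendStats> stats = query_backend(is_gpu != 0);
  if (!stats.has_value()) return 1;
  *out = *stats;
  return 0;
}
```

The caller can then turn a nonzero return into a language-level error, or into a `nothing`-style result, without any exception crossing the C boundary.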

Pangoraw added 2 commits January 12, 2025 12:17

allocator

1a12fd5

allocator2

5abd857

Pangoraw mentioned this pull request Jan 12, 2025

Add get_memory_allocated() and get_memory_allocated_gb() functions #515

Closed

wsmoses reviewed Jan 12, 2025

glou-nes reviewed Jan 12, 2025

Pangoraw added 3 commits January 12, 2025 20:03

throw error when unsupported

4fd970e

single GetAllocatorStats call

592d279

format

c8349f5

Pangoraw marked this pull request as ready for review January 13, 2025 16:25

fixup

65d0d11

wsmoses merged commit 0e8a17a into EnzymeAD:main Jan 15, 2025
26 of 38 checks passed

Pangoraw deleted the allocator-stats branch January 16, 2025 13:05

glou-nes mentioned this pull request Mar 7, 2025

Thunk Change glou-nes/Reactant.jl#10

Closed

4 participants