Skip to content

Conversation

@Pangoraw
Copy link
Collaborator

No description provided.

return std::numeric_limits<int64_t>::min();
return stats->peak_pool_bytes.value_or(std::numeric_limits<int64_t>::min());
}

Copy link
Contributor

@glou-nes glou-nes Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to replace this in order to limit GetAllocatorStats calls:

extern "C" tsl::AllocatorStats getAllocatorStats(PjRtDevice *Device) {
  auto stats = Device->GetAllocatorStats();
  return stats.value(); //probably use a tsl::AllocatorStats* to deal with unsupported devices.
}

Optional are represented with a 8 bytes discriminant. So a Tuple can be used here for instance.

struct AllocatorStats
    num_allocs::Int64
    bytes_in_use::Int64
    peak_bytes_in_use::Int64
    largest_alloc_size::Int64
    bytes_limit::Tuple{Int64,Int64}
    bytes_reserved::Int64
    peak_bytes_reserved::Int64
    bytes_reservable_limit::Tuple{Int64,Int64}
    largest_free_block_bytes::Int64
    pool_bytes::Tuple{Int64,Int64}
    peak_pool_bytes::Tuple{Int64,Int64}
end

@ccall mlir_c.getAllocatorStats(device::XLA.Device)::getAllocatorStats

I cannot test this locally, I cannot build ReactantExtra with cuda. And cpu device doesn't use AllocatorStats at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m okay either way, but if you want to go this route, have it return the result via a pointer in the first arg and have the function return void (to avoid ABI issues with struct returning functions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should the Tuple{Int,Int} be interpreted for std::optional<int64_t> ?

Copy link
Contributor

@glou-nes glou-nes Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one is the value, the second is the discriminant:

get_value(t::Tuple{Int,Int}) = t[2] % 2 == 1 ? t[1] : nothing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly concerned with directly doing this (relying on the ABI of std optional), could we instead make our own struct which explicitly contains the ints (which we unwrap in c++)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this kind of things, i use a std::tuple. you could use a std::tuple<bool,int> where the first value says if the value is valid or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using typemin(Int64) as a sentinel like I did here should be ok and C friendly ?

Copy link
Collaborator

@mofeing mofeing Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glou-nes reading it more closely, don't rely on passing that tsl::AllocatorStats directly. you have no guarantee it will work.

you can instead:

  1. save it into a pointer (e.g. using a copy/move constructor with new) and pass the pointer
  2. create your own C-struct and pass that, without C++ objects in it

Optional are represented with a 8 bytes discriminant. So a Tuple can be used here for instance.

i had problems with this kind of tricks even with the most naive C++ classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure! Thanks for the feedback, seems fair to only consider C ABI. My solution is hacky anyway. That kind of bug are a pain to debug, I got several with Ops.convolution. 2) is probably the way here.

Copy link
Collaborator Author

@Pangoraw Pangoraw Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code to make a single call to GetAllocatorStats. But I could not try on a CUDA machine so far (only by using a manually constructed tsl::AllocatorStats).

@Pangoraw Pangoraw marked this pull request as ready for review January 13, 2025 16:25
@wsmoses wsmoses merged commit 0e8a17a into EnzymeAD:main Jan 15, 2025
26 of 38 checks passed
@wsmoses
Copy link
Member

wsmoses commented Jan 16, 2025

@Pangoraw jll is now landed if you want to try btw

@Pangoraw Pangoraw deleted the allocator-stats branch January 16, 2025 13:05
@Pangoraw
Copy link
Collaborator Author

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(0, 0)

julia> x = Reactant.to_rarray(randn(Float32, 1000));

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(1, 4096)

julia> x = nothing; GC.gc(true)

julia> stats = Reactant.XLA.allocatorstats()
       stats.num_allocs, stats.bytes_in_use
(1, 0)

@mofeing
Copy link
Collaborator

mofeing commented Jan 16, 2025

is this working on CPU-only too?

@Pangoraw
Copy link
Collaborator Author

unfortunately, no

ERROR: UNIMPLEMENTED: GetAllocatorStats is not supported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants