XLA Allocator stats #517

Conversation
    return std::numeric_limits<int64_t>::min();
  return stats->peak_pool_bytes.value_or(std::numeric_limits<int64_t>::min());
}
I think it's possible to replace this in order to limit GetAllocatorStats calls:

extern "C" tsl::AllocatorStats getAllocatorStats(PjRtDevice *Device) {
  auto stats = Device->GetAllocatorStats();
  return stats.value(); // probably use a tsl::AllocatorStats* to deal with unsupported devices
}

Optionals are represented with an 8-byte discriminant, so a Tuple can be used here, for instance:
struct AllocatorStats
    num_allocs::Int64
    bytes_in_use::Int64
    peak_bytes_in_use::Int64
    largest_alloc_size::Int64
    bytes_limit::Tuple{Int64,Int64}
    bytes_reserved::Int64
    peak_bytes_reserved::Int64
    bytes_reservable_limit::Tuple{Int64,Int64}
    largest_free_block_bytes::Int64
    pool_bytes::Tuple{Int64,Int64}
    peak_pool_bytes::Tuple{Int64,Int64}
end
@ccall mlir_c.getAllocatorStats(device::XLA.Device)::AllocatorStats

I cannot test this locally, since I cannot build ReactantExtra with CUDA, and the CPU device doesn't use AllocatorStats at all.
I’m okay either way, but if you want to go this route, have it return the result via a pointer in the first argument and have the function return void (to avoid ABI issues with struct-returning functions).
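A minimal sketch of that pattern, with a hypothetical two-field payload (the names StatsOut and getStatsInto are made up, and it assumes the ReactantExtra headers that declare PjRtDevice):

#include <cstdint>

// Hypothetical plain-C payload; no C++ objects, so its layout is predictable
// across the FFI boundary.
struct StatsOut {
  int64_t num_allocs;
  int64_t bytes_in_use;
};

// The caller passes a pointer as the first argument and the function returns
// void, so no struct-return calling convention is involved.
extern "C" void getStatsInto(StatsOut *out, PjRtDevice *device) {
  auto stats = device->GetAllocatorStats(); // same call the PR wraps; error handling omitted
  out->num_allocs = stats->num_allocs;
  out->bytes_in_use = stats->bytes_in_use;
}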
How should the Tuple{Int,Int} be interpreted for std::optional<int64_t>?
The first one is the value, the second is the discriminant:
get_value(t::Tuple{Int,Int}) = t[2] % 2 == 1 ? t[1] : nothing
I'm slightly concerned with directly doing this (relying on the ABI of std::optional). Could we instead make our own struct that explicitly contains the ints (which we unwrap in C++)?
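Something along these lines, for instance (a sketch that flattens each std::optional<int64_t> of tsl::AllocatorStats into an explicit flag/value pair; the struct and helper names are hypothetical):

#include <cstdint>
#include <optional>

// Plain-C mirror of tsl::AllocatorStats: no C++ objects, so the layout does
// not depend on the std::optional ABI.
struct AllocatorStatsC {
  int64_t num_allocs;
  int64_t bytes_in_use;
  int64_t peak_bytes_in_use;
  int64_t largest_alloc_size;
  int64_t has_bytes_limit;
  int64_t bytes_limit;
  int64_t bytes_reserved;
  int64_t peak_bytes_reserved;
  int64_t has_bytes_reservable_limit;
  int64_t bytes_reservable_limit;
  int64_t largest_free_block_bytes;
  int64_t has_pool_bytes;
  int64_t pool_bytes;
  int64_t has_peak_pool_bytes;
  int64_t peak_pool_bytes;
};

// Unwrap a std::optional<int64_t> into an explicit (flag, value) pair.
static void unwrapOptional(const std::optional<int64_t> &opt, int64_t &flag,
                           int64_t &value) {
  flag = opt.has_value() ? 1 : 0;
  value = opt.value_or(0);
}

The matching Julia struct could then use plain Int64 fields (flag plus value) instead of Tuple{Int64,Int64}.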
For this kind of thing, I use a std::tuple. You could use a std::tuple<bool, int> where the first value says whether the value is valid or not.
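For illustration, a minimal sketch of that suggestion (function name made up; note that std::tuple is itself a C++ type, so its layout is not guaranteed by the C ABI either):

#include <cstdint>
#include <optional>
#include <tuple>

// Encode "is the value valid" explicitly instead of relying on
// std::optional's internal representation.
std::tuple<bool, int64_t> toTuple(const std::optional<int64_t> &v) {
  return {v.has_value(), v.value_or(0)};
}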
I think using typemin(Int64) as a sentinel, like I did here, should be OK and C-friendly?
@glou-nes reading it more closely, don't rely on passing that tsl::AllocatorStats directly; you have no guarantee it will work.
You can instead:
- save it into a pointer (e.g. using a copy/move constructor with new) and pass the pointer (a rough sketch follows below)
- create your own C struct and pass that, without C++ objects in it

Regarding "Optionals are represented with an 8-byte discriminant, so a Tuple can be used here": I had problems with this kind of trick even with the most naive C++ classes.
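A rough sketch of the first option, with made-up function names, assuming tsl::AllocatorStats is copyable and that GetAllocatorStats succeeds:

#include <cstdint>

// Heap-allocate a copy of the C++ object and hand the caller an opaque
// pointer; the caller only passes it back to accessor/free functions.
extern "C" tsl::AllocatorStats *allocatorStatsNew(PjRtDevice *device) {
  auto stats = device->GetAllocatorStats(); // error handling omitted
  return new tsl::AllocatorStats(stats.value());
}

extern "C" int64_t allocatorStatsBytesInUse(const tsl::AllocatorStats *stats) {
  return stats->bytes_in_use;
}

extern "C" void allocatorStatsFree(tsl::AllocatorStats *stats) {
  delete stats;
}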
Sure! Thanks for the feedback; it seems fair to only consider the C ABI. My solution is hacky anyway, and that kind of bug is a pain to debug (I got several with Ops.convolution). Option 2 is probably the way to go here.
I updated the code to make a single call to GetAllocatorStats, but I could not try it on a CUDA machine so far (only with a manually constructed tsl::AllocatorStats).
@Pangoraw the jll is now landed, if you want to try it btw
julia> stats = Reactant.XLA.allocatorstats()

julia> stats.num_allocs, stats.bytes_in_use
(0, 0)

julia> x = Reactant.to_rarray(randn(Float32, 1000));

julia> stats = Reactant.XLA.allocatorstats()

julia> stats.num_allocs, stats.bytes_in_use
(1, 4096)

julia> x = nothing; GC.gc(true)

julia> stats = Reactant.XLA.allocatorstats()

julia> stats.num_allocs, stats.bytes_in_use
(1, 0)
is this working on CPU-only too?

unfortunately, no