-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Not running tests when more memory is needed than available. #1011
Conversation
This should be the standard for PRs for everyone, impressive. |
Please revert 42f8f01 to test on V340L |
This reverts commit 42f8f01.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good you cleaned out some old style allocators but could have been a separate PR.
clients/common/utility.cpp
Outdated
hipError_t err = hipMemGetInfo(&free_mem, &total_mem); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the cost of this in terms of time? Wondering if it is feasible to just move this into the device* allocators and return a failure that is handled by the CHECK_HIP_ERROR on device memory rather than all these special cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also replace the CHECK_HIP_ERROR calls with a CHECK_HIP_DEV_ALLOC if hipMalloc failure is fine to trigger without checking memory free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placeholder to give me time to review changes
I addressed Torre's comments on this and moved the checking code into the device*.hpp files instead of individual tests since the check takes only a few microseconds, and now it's generic when adding future tests. |
Potentially resolves SWDEV-223558 and SWDEV-220984.
Summary
This allows tests to check how much memory will be allocated on the device, and compares this to how much memory is free. If the amount of memory to be allocated is greater, it immediately succeeds, and allows logging that a test was skipped.
This seems a little hacky, so any advice is appreciated.
Implementation
In utility.cpp, I use
hipMemGetInfo
to query how much memory is available. In each test, I calculate how much memory is needed, and if it's greater, I use googletest's SUCCESS with a unique message. In rocblas_gtest_main.cpp the listener detects this message and records the skipped test.Example Output
[----------] Global test environment tear-down
[==========] 13697 tests from 2 test cases ran. (18047 ms total)
[ PASSED ] 13697 tests.
[ SKIPPED ] 592 tests.
Alternatives
Alternatively, GTEST_SKIP() is a more elegant solution that I believe is available in a newer release of googletest (1.10.0) (we're currently using 1.8.0, not sure if it would be an issue to update).
Notes
I only added the memory check for the functions noted in the two tickets. I also updated
ger_batched
andsyr_batched
to the newdevice_batch_vector
s as well.I haven't ran this on a V340L machine to see if it works, but it worked for my manual testing. Will do this and update.