Introduce fastpath backend#27
Conversation
49bd191 to
d407139
Compare
d407139 to
b0086a4
Compare
Moving backend initialization out of the DriverState constructor and into DriverState::incRefCount() resulted in the backends getting initialized after rocFileIo() cached an empty backend vector from DriverState::getBackends(). This change relies on compilers supporting thread-stafe static initialization. DriverState's backend vector is now initialized the first time DriverState::getBackends() is called.
There was a problem hiding this comment.
Pull request overview
This PR introduces a fastpath backend that enables direct GPU-to-file I/O operations, bypassing host memory staging when the destination buffer is on a GPU. This optimization leverages the hipAmdFileRead and hipAmdFileWrite functions when available, providing significant performance improvements for GPU workloads with direct I/O requirements.
Key changes:
- Implements Fastpath backend with alignment and O_DIRECT requirements for GPU device memory
- Adds lazy backend initialization with runtime detection of HIP file I/O capabilities
- Extends error handling to properly propagate std::system_error exceptions from HIP layer
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| rocfile/src/backend/fastpath.cpp | Core implementation of fastpath backend with scoring and I/O logic for direct GPU file operations |
| rocfile/src/backend/fastpath.h | Header defining Fastpath backend interface |
| rocfile/src/state.cpp | Modified to lazily initialize backends based on HIP runtime capabilities |
| rocfile/src/state.h | Changed backends member to mutable for lazy initialization |
| rocfile/src/rocfile.cpp | Added catch handler for std::system_error to properly convert to errno |
| rocfile/test/fastpath.cpp | Comprehensive test suite for fastpath backend covering scoring, alignment, and I/O operations |
| rocfile/test/io.cpp | Updated test setup to initialize backends with required HIP mock expectations |
| rocfile/test/CMakeLists.txt | Added fastpath.cpp to test sources |
| rocfile/src/CMakeLists.txt | Added backend/fastpath.cpp to build sources |
| hipfile/examples/aiscp/aiscp.cpp | Improved error messages and checking logic (with one bug fix needed) |
| .codespellrc | Added 'hsa' to ignore list for HSA API names |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * - HSA_STATUS_ERROR_INVALID_ARGUMENT if size is zero | ||
| * - This check should be be removed | ||
| * - HSA_STATUS_ERROR_INVALID_ARGUMENT if devicePtr is NULL | ||
| * - This check should be be removed |
There was a problem hiding this comment.
Typo in comment: "be be removed" has a duplicated "be".
| @@ -0,0 +1,354 @@ | |||
| /* Copyright (c) Advanced Micro Devices, Inc. All rights reserved. | |||
There was a problem hiding this comment.
The test file doesn't verify that the Fastpath backend is actually selected over the Fallback backend when appropriate conditions are met. While individual score() tests exist, there's no integration test that confirms the backend selection logic in state.cpp prioritizes Fastpath correctly. Consider adding a test that verifies Fastpath has a higher score than Fallback for device memory with O_DIRECT.
| accept_io &= 0 <= file_offset; | ||
| accept_io &= 0 <= buffer_offset; | ||
|
|
||
| accept_io &= !(size & 0xFFF); |
There was a problem hiding this comment.
The alignment check !(size & 0xFFF) accepts size=0 since 0 is 4KiB-aligned. However, according to the extensive comment block above (lines 45-46 and 63-64), the HIP runtime and HSA layers return errors for size=0. While these checks "should be removed" according to the comments, accepting size=0 here may lead to unnecessary calls that will fail downstream. Consider explicitly rejecting size=0 until the stack is fixed, or add a comment explaining why size=0 is acceptable despite known downstream issues.
|
|
||
| struct FastpathTest : public FastpathTestBase, public Test {}; | ||
|
|
||
| TEST_F(FastpathTest, DefaultExpecations) |
There was a problem hiding this comment.
Typo in test name: "Expecations" should be "Expectations".
| if (-1 == nbytes || file_size != static_cast<size_t>(nbytes)) { | ||
| fprintf(stderr, "Could not write to %s (%zd) (%s)\n", dst_path, nbytes, strerror(errno)); | ||
| if (nbytes < 0 || file_size != static_cast<size_t>(nbytes)) { | ||
| fprintf(stderr, "Could not write to %s (%zd) (%s)\n", src_path, nbytes, |
There was a problem hiding this comment.
The error message prints "src_path" but this is the write operation for "dst_path". This will confuse users when a write error occurs, as the error message will report the wrong file. Change src_path to dst_path in this error message.
| * - status and size_coppied untouched | ||
| * - returns | ||
| * - HSA_STATUS_ERROR_INVALID_ARGUMENT if size is zero | ||
| * - This check should be be removed |
There was a problem hiding this comment.
Typo in comment: "be be removed" has a duplicated "be".
| static bool once = [&]() { | ||
| if (getHipAmdFileReadPtr() && getHipAmdFileWritePtr()) { | ||
| backends.emplace_back(new Fastpath{}); | ||
| } | ||
| backends.emplace_back(new Fallback{}); | ||
| return true; | ||
| }(); |
There was a problem hiding this comment.
The lambda capture [&] is dangerous here because it captures the 'this' pointer by reference, but the lambda is stored in a static variable that persists beyond the lifetime of any single DriverState instance. Since DriverState is a singleton accessed through Context, this may work in practice, but the code is technically undefined behavior if the singleton is ever destroyed and recreated.
Additionally, the lambda modifies the mutable 'backends' member without any synchronization. While function-local static initialization is thread-safe in C++11+, multiple threads could call getBackends() concurrently after initialization, potentially causing a data race when reading the 'backends' vector while it's being modified by emplace_back.
Consider either: (1) protecting the initialization with std::call_once or a mutex, or (2) moving the initialization to the constructor where it can be properly synchronized, or (3) using a non-capturing lambda and accessing the singleton instance explicitly within the lambda.
| // Initialize DriverState::backends | ||
| EXPECT_CALL(mhip, hipRuntimeGetVersion); | ||
| EXPECT_CALL(mhip, hipGetProcAddress(StrEq("hipAmdFileRead"), _, _, _)); | ||
| Context<DriverState>::get()->getBackends(); |
There was a problem hiding this comment.
The test setup only expects a call to hipGetProcAddress for "hipAmdFileRead", but the getBackends() implementation in state.cpp also checks getHipAmdFileWritePtr(). This means the test is missing an expected mock call for hipGetProcAddress with "hipAmdFileWrite". While this may work if the test mocking framework is lenient, with StrictMock this should fail or at minimum be incomplete. Add an expectation for hipGetProcAddress with "hipAmdFileWrite" to match the actual implementation.
| * - methods: Device::amdFileRead(...), Device::amdFileWrite(...) | ||
| * - Does not perform any input validation | ||
| * - Logs on every IO failure | ||
| * - size_copied and status are never modified |
There was a problem hiding this comment.
Typo in comment: "size_copied" is misspelled as "size_coppied".
| * - repository: rocm-systems | ||
| * - file: projects/rocr-runtime/runtime/hsa-runtime/core/runtime/hsa_ext_amd.cpp | ||
| * - hsa_amd_ais_file_read(...), hsa_amd_ais_file_write(...) | ||
| * - status and size_coppied untouched |
There was a problem hiding this comment.
Typo in comment: "size_copied" is misspelled as "size_coppied".
Motivation
Avoid bouncing IO through host memory when the destination buffer is on a GPU.