Skip to content

Introduce fastpath backend#27

Merged
kurtmcmillan merged 12 commits into
developfrom
z/kumcmill/fastpath
Nov 10, 2025
Merged

Introduce fastpath backend#27
kurtmcmillan merged 12 commits into
developfrom
z/kumcmill/fastpath

Conversation

@kurtmcmillan
Copy link
Copy Markdown
Collaborator

Motivation

Avoid bouncing IO through host memory when the destination buffer is on a GPU.

Comment thread rocfile/src/backend/fastpath.cpp
@kurtmcmillan kurtmcmillan requested a review from derobins November 7, 2025 17:01
@kurtmcmillan kurtmcmillan marked this pull request as ready for review November 7, 2025 17:10
Comment thread rocfile/src/backend/fastpath.cpp Outdated
Comment thread rocfile/src/backend/fastpath.cpp Outdated
Comment thread rocfile/src/backend/fastpath.cpp Outdated
Moving backend initialization out of the DriverState constructor and into
DriverState::incRefCount() resulted in the backends getting initialized after
rocFileIo() cached an empty backend vector from DriverState::getBackends().

This change relies on compilers supporting thread-stafe static initialization.
DriverState's backend vector is now initialized the first time
DriverState::getBackends() is called.
Comment thread rocfile/src/backend/fastpath.h
Comment thread rocfile/test/fastpath.cpp
Comment thread rocfile/test/fastpath.cpp
Comment thread rocfile/test/fastpath.cpp Outdated
Comment thread rocfile/src/backend/fastpath.cpp Outdated
@kurtmcmillan kurtmcmillan merged commit caf16c1 into develop Nov 10, 2025
29 of 32 checks passed
@kurtmcmillan kurtmcmillan deleted the z/kumcmill/fastpath branch November 10, 2025 22:06
@gaoikawa gaoikawa requested a review from Copilot December 17, 2025 00:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a fastpath backend that enables direct GPU-to-file I/O operations, bypassing host memory staging when the destination buffer is on a GPU. This optimization leverages the hipAmdFileRead and hipAmdFileWrite functions when available, providing significant performance improvements for GPU workloads with direct I/O requirements.

Key changes:

  • Implements Fastpath backend with alignment and O_DIRECT requirements for GPU device memory
  • Adds lazy backend initialization with runtime detection of HIP file I/O capabilities
  • Extends error handling to properly propagate std::system_error exceptions from HIP layer

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
rocfile/src/backend/fastpath.cpp Core implementation of fastpath backend with scoring and I/O logic for direct GPU file operations
rocfile/src/backend/fastpath.h Header defining Fastpath backend interface
rocfile/src/state.cpp Modified to lazily initialize backends based on HIP runtime capabilities
rocfile/src/state.h Changed backends member to mutable for lazy initialization
rocfile/src/rocfile.cpp Added catch handler for std::system_error to properly convert to errno
rocfile/test/fastpath.cpp Comprehensive test suite for fastpath backend covering scoring, alignment, and I/O operations
rocfile/test/io.cpp Updated test setup to initialize backends with required HIP mock expectations
rocfile/test/CMakeLists.txt Added fastpath.cpp to test sources
rocfile/src/CMakeLists.txt Added backend/fastpath.cpp to build sources
hipfile/examples/aiscp/aiscp.cpp Improved error messages and checking logic (with one bug fix needed)
.codespellrc Added 'hsa' to ignore list for HSA API names

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

* - HSA_STATUS_ERROR_INVALID_ARGUMENT if size is zero
* - This check should be be removed
* - HSA_STATUS_ERROR_INVALID_ARGUMENT if devicePtr is NULL
* - This check should be be removed
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "be be removed" has a duplicated "be".

Copilot uses AI. Check for mistakes.
Comment thread rocfile/test/fastpath.cpp
@@ -0,0 +1,354 @@
/* Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test file doesn't verify that the Fastpath backend is actually selected over the Fallback backend when appropriate conditions are met. While individual score() tests exist, there's no integration test that confirms the backend selection logic in state.cpp prioritizes Fastpath correctly. Consider adding a test that verifies Fastpath has a higher score than Fallback for device memory with O_DIRECT.

Copilot uses AI. Check for mistakes.
accept_io &= 0 <= file_offset;
accept_io &= 0 <= buffer_offset;

accept_io &= !(size & 0xFFF);
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment check !(size & 0xFFF) accepts size=0 since 0 is 4KiB-aligned. However, according to the extensive comment block above (lines 45-46 and 63-64), the HIP runtime and HSA layers return errors for size=0. While these checks "should be removed" according to the comments, accepting size=0 here may lead to unnecessary calls that will fail downstream. Consider explicitly rejecting size=0 until the stack is fixed, or add a comment explaining why size=0 is acceptable despite known downstream issues.

Copilot uses AI. Check for mistakes.
Comment thread rocfile/test/fastpath.cpp

struct FastpathTest : public FastpathTestBase, public Test {};

TEST_F(FastpathTest, DefaultExpecations)
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name: "Expecations" should be "Expectations".

Copilot uses AI. Check for mistakes.
if (-1 == nbytes || file_size != static_cast<size_t>(nbytes)) {
fprintf(stderr, "Could not write to %s (%zd) (%s)\n", dst_path, nbytes, strerror(errno));
if (nbytes < 0 || file_size != static_cast<size_t>(nbytes)) {
fprintf(stderr, "Could not write to %s (%zd) (%s)\n", src_path, nbytes,
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message prints "src_path" but this is the write operation for "dst_path". This will confuse users when a write error occurs, as the error message will report the wrong file. Change src_path to dst_path in this error message.

Copilot uses AI. Check for mistakes.
* - status and size_coppied untouched
* - returns
* - HSA_STATUS_ERROR_INVALID_ARGUMENT if size is zero
* - This check should be be removed
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "be be removed" has a duplicated "be".

Copilot uses AI. Check for mistakes.
Comment thread rocfile/src/state.cpp
Comment on lines +280 to +286
static bool once = [&]() {
if (getHipAmdFileReadPtr() && getHipAmdFileWritePtr()) {
backends.emplace_back(new Fastpath{});
}
backends.emplace_back(new Fallback{});
return true;
}();
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda capture [&] is dangerous here because it captures the 'this' pointer by reference, but the lambda is stored in a static variable that persists beyond the lifetime of any single DriverState instance. Since DriverState is a singleton accessed through Context, this may work in practice, but the code is technically undefined behavior if the singleton is ever destroyed and recreated.

Additionally, the lambda modifies the mutable 'backends' member without any synchronization. While function-local static initialization is thread-safe in C++11+, multiple threads could call getBackends() concurrently after initialization, potentially causing a data race when reading the 'backends' vector while it's being modified by emplace_back.

Consider either: (1) protecting the initialization with std::call_once or a mutex, or (2) moving the initialization to the constructor where it can be properly synchronized, or (3) using a non-capturing lambda and accessing the singleton instance explicitly within the lambda.

Copilot uses AI. Check for mistakes.
Comment thread rocfile/test/io.cpp
Comment on lines +124 to +127
// Initialize DriverState::backends
EXPECT_CALL(mhip, hipRuntimeGetVersion);
EXPECT_CALL(mhip, hipGetProcAddress(StrEq("hipAmdFileRead"), _, _, _));
Context<DriverState>::get()->getBackends();
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test setup only expects a call to hipGetProcAddress for "hipAmdFileRead", but the getBackends() implementation in state.cpp also checks getHipAmdFileWritePtr(). This means the test is missing an expected mock call for hipGetProcAddress with "hipAmdFileWrite". While this may work if the test mocking framework is lenient, with StrictMock this should fail or at minimum be incomplete. Add an expectation for hipGetProcAddress with "hipAmdFileWrite" to match the actual implementation.

Copilot uses AI. Check for mistakes.
* - methods: Device::amdFileRead(...), Device::amdFileWrite(...)
* - Does not perform any input validation
* - Logs on every IO failure
* - size_copied and status are never modified
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "size_copied" is misspelled as "size_coppied".

Copilot uses AI. Check for mistakes.
* - repository: rocm-systems
* - file: projects/rocr-runtime/runtime/hsa-runtime/core/runtime/hsa_ext_amd.cpp
* - hsa_amd_ais_file_read(...), hsa_amd_ais_file_write(...)
* - status and size_coppied untouched
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "size_copied" is misspelled as "size_coppied".

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants