@akashveramd akashveramd commented Aug 13, 2025

This PR cherry-picks upstream commit 78300c8.
It fixes the test_fully_shard_training_memory test in /distributed/_composable/fsdp/test_fully_shard_memory.py.
The test was tracked as failing in Jira: https://ontrack-internal.amd.com/browse/SWDEV-544125

The default workspace for hipblaslt is larger than for cublas/cublaslt, so the buffer fudge factor the test allows for needs a slight increase.
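
For readers unfamiliar with this kind of check, here is a minimal sketch of how a platform-dependent buffer allowance in a peak-memory assertion might look. It is not the code from the upstream commit: the function names (`peak_memory_budget_mb`, `check_peak_memory`) and both buffer values are hypothetical and chosen only for illustration.

```python
def peak_memory_budget_mb(expected_mb: float, uses_hipblaslt: bool) -> float:
    """Return the largest peak memory (in MB) the check will accept."""
    # Hypothetical numbers for illustration only; they are NOT the values
    # used by test_fully_shard_training_memory.
    base_buffer_mb = 4.0       # slack allowed on every backend
    hipblaslt_extra_mb = 12.0  # extra slack for the larger hipblaslt workspace
    buffer_mb = base_buffer_mb + (hipblaslt_extra_mb if uses_hipblaslt else 0.0)
    return expected_mb + buffer_mb


def check_peak_memory(measured_mb: float, expected_mb: float, uses_hipblaslt: bool) -> bool:
    """True if the measured peak stays within the expected value plus buffer."""
    return measured_mb <= peak_memory_budget_mb(expected_mb, uses_hipblaslt)
```

With these made-up numbers, a measured peak of 110 MB against an expected 100 MB would pass when `uses_hipblaslt` is true and fail otherwise. In a real PyTorch test the flag could plausibly be derived from `torch.version.hip is not None`, but that wiring is an assumption here, not something stated in this PR.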

Forward fix for pytorch#150227, which broke ROCm distributed tests; the breakage wasn't part of the initial CI signal.

Pull Request resolved: pytorch#150348
Approved by: https://github.com/jeffdaily
@akashveramd akashveramd self-assigned this Aug 13, 2025

rocm-repo-management-api bot commented Aug 13, 2025

Jenkins build for 353f4ffb313e0f91191f65c8350bea11ced2705e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pragupta pragupta changed the title from "[Release/2.7][SWDEV-544125] update test buffer fudge factor for hipblaslt for test_fully_shard_training_memory test" to "[release/2.7][SWDEV-544125] update test buffer fudge factor for hipblaslt for test_fully_shard_training_memory test" Aug 15, 2025
@pragupta pragupta merged commit b0c5b24 into release/2.7 Aug 15, 2025
0 of 2 checks passed
@pragupta pragupta deleted the av_jira_544125_rel2.7 branch August 15, 2025 18:58
@akashveramd (Author)

! cherry-pick --onto release/2.8 rocm7.0_internal_testing rocm7.1_internal_testing

@dhonnappa-amd

Nothing to cherry-pick onto the release/2.8 branch

Nothing to cherry-pick onto the rocm7.0_internal_testing branch

Nothing to cherry-pick onto the rocm7.1_internal_testing branch

Comment processed by Build

5 participants