Skip to content

[amdsmi] Drop patches, bump submodule and enable testing#2015

Closed
dmitrii-galantsev wants to merge 9 commits intomainfrom
dgalants/test_amdsmi
Closed

[amdsmi] Drop patches, bump submodule and enable testing#2015
dmitrii-galantsev wants to merge 9 commits intomainfrom
dgalants/test_amdsmi

Conversation

@dmitrii-galantsev
Copy link
Copy Markdown
Contributor

@dmitrii-galantsev dmitrii-galantsev commented Nov 5, 2025

Motivation

  1. Combine Remove amdsmi patches after applying to base/amdsmi #1969 and [amdsmi] Fix amdsmitst's sourcing of libamd_smi.so #1869
  2. Bump amdsmi submodule (it includes the gtest, libamd_smi.so, and librocm_core.so patches)

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
@dmitrii-galantsev dmitrii-galantsev force-pushed the dgalants/test_amdsmi branch 2 times, most recently from e0922be to 9598021 Compare November 6, 2025 07:40
@dmitrii-galantsev dmitrii-galantsev changed the title DO NOT MERGE: [amdsmi] Apply patches and use amd-staging [amdsmi] Apply patches and use amd-staging Nov 6, 2025
@marbre marbre changed the title [amdsmi] Apply patches and use amd-staging [amdsmi] Apply patches and bump submodule Nov 6, 2025
@marbre marbre changed the title [amdsmi] Apply patches and bump submodule [amdsmi] Drop patches, bump submodule and enable testing Nov 6, 2025
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the patches now upstreamed need to specify the exact soname, including the version as the rocm-core package does not provide symlinks and only ships versioned shared objects.

Copy link
Copy Markdown
Contributor Author

@dmitrii-galantsev dmitrii-galantsev Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bumped smi to ROCm/amdsmi@2bccfd1 after merging
ROCm/amdsmi@8bdf951

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
@dmitrii-galantsev dmitrii-galantsev marked this pull request as ready for review November 6, 2025 10:04
@araravik-psd
Copy link
Copy Markdown
Contributor

We are seeing multiple .so file imports while running "rocm-sdk test". Removed skipping amdsmi tests from core tests, devel tests and library tests in rocm_sdk tests

Gist file showing the import failures:

https://gist.github.com/araravik-psd/b16048ba563d6a3c2eab5dbc2b7fae21

@araravik-psd
Copy link
Copy Markdown
Contributor

Tried adding the below .so file path imports in dist_info files for rocm_sdk, rocm_core and rocm_devel packages but dont see them taking effect

LibraryEntry(
"LLVMOffload",
"core",
"libLLVMOffload.so*",
"",
"lib/llvm/lib",
)

LibraryEntry("hipsolver_fortran", "core", "libhipsolver_fortran.so*", "")

@jayhawk-commits
Copy link
Copy Markdown
Contributor

Tried adding the below .so file path imports in dist_info files for rocm_sdk, rocm_core and rocm_devel packages but dont see them taking effect

LibraryEntry( "LLVMOffload", "core", "libLLVMOffload.so*", "", "lib/llvm/lib", )

LibraryEntry("hipsolver_fortran", "core", "libhipsolver_fortran.so*", "")

Based on the gist provided, it is erroring out on treating libLLVMOffload.so.22.0git as a .so file. The wild cards in the LibraryEntry are pulling in this file. Please take a look at the packages from a nightly run to confirm the actual name of the .so file and search specifically for that and not this git file.

@jayhawk-commits
Copy link
Copy Markdown
Contributor

jayhawk-commits commented Dec 2, 2025

I downloaded a nightly build on an Ubuntu system and those .so*git files are the actual .so files and ldd command works on them. So that theory was off.

@jayhawk-commits
Copy link
Copy Markdown
Contributor

Taking a look at the gist log again, this path below does not exist.

  testSharedLibrariesLoad (rocm_sdk.tests.devel_test.ROCmDevelTest.testSharedLibrariesLoad) [Check shared library loads] (so_path=PosixPath('/workspace/TheRock/.venv/lib/python3.12/site-packages/_rocm_sdk_devel/lib/python3.12/site-packages/rocpd/libpyrocpd.cpython-312-x86_64-linux-gnu.so')) ... ERROR

@jayhawk-commits
Copy link
Copy Markdown
Contributor

rocprofiler-sdk had a PR involving rocpd recently here #2384

I also see amd_smi in the gist being in the wrong path. The .so files are not expected to be in /share/ but in /lib/ .

  testSharedLibrariesLoad (rocm_sdk.tests.devel_test.ROCmDevelTest.testSharedLibrariesLoad) [Check shared library loads] (so_path=PosixPath('/workspace/TheRock/.venv/lib/python3.12/site-packages/_rocm_sdk_devel/share/amd_smi/amdsmi/libamd_smi.so')) ... ERROR

@jayhawk-commits
Copy link
Copy Markdown
Contributor

Resolved a merge conflict.

@araravik-psd
Copy link
Copy Markdown
Contributor

rocprofiler-sdk had a PR involving rocpd recently here #2384

I also see amd_smi in the gist being in the wrong path. The .so files are not expected to be in /share/ but in /lib/ .

  testSharedLibrariesLoad (rocm_sdk.tests.devel_test.ROCmDevelTest.testSharedLibrariesLoad) [Check shared library loads] (so_path=PosixPath('/workspace/TheRock/.venv/lib/python3.12/site-packages/_rocm_sdk_devel/share/amd_smi/amdsmi/libamd_smi.so')) ... ERROR

Thanks @jayhawk-commits , I picked up the latest version of rocm with these changes after rebase and tested it on a shark machine.

I see the rocpd and roctx and a few more packages are not picked up properly, see the files are available and named fine. Wil to add these library imports to dist_info like before to check if this can be fixed.

ERROR: testSharedLibrariesLoad (rocm_sdk.tests.devel_test.ROCmDevelTest.testSharedLibrariesLoad) [Check shared library loads] (so_path=PosixPath('/home/tester/TheRock/.venv/lib/python3.12/site-packages/_rocm_sdk_devel/lib/python3.11/site-packages/rocpd/libpyrocpd.cpython-311-x86_64-linux-gnu.so'))

ERROR: testSharedLibrariesLoad (rocm_sdk.tests.devel_test.ROCmDevelTest.testSharedLibrariesLoad) [Check shared library loads] (so_path=PosixPath('/home/tester/TheRock/.venv/lib/python3.12/site-packages/_rocm_sdk_devel/lib/python3.13/site-packages/roctx/libpyroctx.cpython-313-x86_64-linux-gnu.so'))

Full logs:

rocm_sdk_test_logs.txt

dmitrii-galantsev and others added 4 commits December 10, 2025 21:42
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Copilot AI review requested due to automatic review settings December 10, 2025 22:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ScottTodd
Copy link
Copy Markdown
Member

@araravik-psd please be careful with git pushes. This branch now contains additional commits.

@jayhawk-commits
Copy link
Copy Markdown
Contributor

Was CLI used to rebase or the 'update branch' button on GitHub?

@jayhawk-commits
Copy link
Copy Markdown
Contributor

jayhawk-commits commented Dec 10, 2025

#2470 bumps amdsmi to a newer SHA than this PR, also drops the patches, but is missing enabling the tests and the post-hook. Due to the bad state this PR is now in, I suggest we proceed to merge in #2470 as that has passed CI except for the rocwmma failure that is already pre-existing and unrelated. The enablement of tests with the post-hook can proceed on top after.

@araravik-psd
Copy link
Copy Markdown
Contributor

#2470 bumps amdsmi to a newer SHA than this PR, also drops the patches, but is missing enabling the tests and the post-hook. Due to the bad state this PR is now in, I suggest we proceed to merge in #2470 as that has passed CI except for the rocwmma failure that is already pre-existing and unrelated. The enablement of tests with the post-hook can proceed on top after.

Makes sense we can proceed with #2470 first and add the post-hook script on top of that. I was trying to use the CLI method for the rebase as update branch with rebase was disabled due to conflicts.

@jayhawk-commits
Copy link
Copy Markdown
Contributor

#2486 should also address the shared library load test failures that were from unrelated libraries.

Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the status of this PR? Are you waiting on a review? I see some merge conflicts.

@araravik-psd
Copy link
Copy Markdown
Contributor

This commits in the PR were cherry picked into

90a1ae6

Closing this PR.

@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Jan 14, 2026
@dmitrii-galantsev dmitrii-galantsev deleted the dgalants/test_amdsmi branch March 31, 2026 03:42
ronlieb added a commit that referenced this pull request Apr 5, 2026
7beee31bc454 [Comgr] Fix metadata merge crash using upstream MsgPackDocument fixes (#2015)
55474970990e [MsgPackDocument]: Fix DocNode comparison and add copyNode (#189436)
8db8163364cb Revert "[SimplifyCFG] Extend jump-threading to allow live local defs … (#190269)
b49bd8bc2f0d [ASAN][AsmPrinter] Emit sanitized padded global symbols after alignment (#1572)
7c2d37414e30 [SimplifyCFG] Allow phi folding for boolean logic over non-equality (#185124)
5494bf13d27f [DAG] SimplifyDemandedBits - limit BITCAST -> FGETSIGN fold to custom/legal scalar SimplifyDemandedBits cases (#189363)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants