hipErrorSharedObjectInitFailed when testing default example #1478

Open
flint-stone opened this issue Mar 8, 2022 · 7 comments

@flint-stone

Hello! I was trying to test Tensile by following the example provided at this link, running ../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_asm_only.yaml ./, but I'm getting a hipErrorSharedObjectInitFailed error. Here are the details:

Compiling source kernels: Done.
# Kernel Building elapsed time = 950.7 secs
# Actual Solutions: 192 / 192 after KernelWriter
+ set +e
+ ERR1=0
+ /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/0_Build/client/tensile_client --config-file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
loading config file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
Loading /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/source/library/Kernels.so-000-gfx1012.hsaco
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error 303(hipErrorSharedObjectInitFailed) /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Source/client/main.cpp:323:
retError
hipErrorSharedObjectInitFailed
/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/run.sh: line 6:  1976 Aborted                 (core dumped) /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/0_Build/client/tensile_client --config-file /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
+ ERR2=134
+ ERR=0
+ [[ 0 -ne 0 ]]
+ [[ 134 -ne 0 ]]
+ echo two
two
+ ERR=134
+ exit 134
Tensile::warning: ClientWriter Benchmark Process exited with code 134
Tensile::warning: BenchmarkProblems: Benchmark Process exited with code 134
################################################################################
# Cijk_Ailk_Bljk_SB_00
# 00_Final: End - 965.701s
################################################################################
clientExit=1 (ERROR) for /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Configs/test.yaml
Traceback (most recent call last):
  File "../Tensile/bin/Tensile", line 36, in <module>
    Tensile.main()
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 282, in main
    Tensile(sys.argv[1:])
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 239, in Tensile
    executeStepsInConfig(config)
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/Tensile.py", line 51, in executeStepsInConfig
    BenchmarkProblems.main( config["BenchmarkProblems"] )
  File "/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/Tensile/BenchmarkProblems.py", line 366, in main
    shutil.copy( resultsFileName, newResultsFileName )
  File "/usr/lib/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/Data/00_Final.csv'

Any suggestions on what could be the problem here? Thanks in advance!
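
A note for anyone reading the log above: the exit code 134 reported by run.sh follows the usual shell convention of 128 plus a signal number. Here that is signal 6 (SIGABRT), which matches the "Aborted (core dumped)" line and the uncaught std::runtime_error. Below is a minimal sketch of decoding such a status. The function name is hypothetical and is not part of Tensile.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell exit status, decoding the 128 + N signal convention."""
    if status > 128:
        signum = status - 128
        try:
            # Map the signal number back to its symbolic name.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with code {status}"


# Exit code taken from the log above.
print(describe_exit_status(134))
```

So the 134 is only the abort that follows the exception. The first failure is the hipErrorSharedObjectInitFailed raised while loading the code object.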

@syifan

syifan commented Mar 9, 2022

Here is some more detailed information about the environment we are using.

ROCm version: 5.0.0
GPUs: Radeon VII + RX5500XT, but we only care about the Radeon VII.
Tensile Version: The current commit on the master branch. Commit ID is d5eea38

Let me know if you need more information.
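
Since this machine has two GPUs, it may help to check which architecture the failing code object was built for. The "Loading" line in the first log names Kernels.so-000-gfx1012.hsaco. As far as I know, gfx1012 is the target name for the RX 5500 XT family and Radeon VII is gfx906, so the client may be picking up the card we do not care about. The sketch below pulls the target out of such a log line. It assumes the "-gfx<id>.hsaco" naming seen in this log holds in general, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: code object file names end in "-gfx<id>.hsaco", as in the log above.
_GFX_RE = re.compile(r"-(gfx[0-9a-z]+)\.hsaco\b")


def code_object_target(log_line: str) -> Optional[str]:
    """Return the gfx target named in a 'Loading ...hsaco' log line, if any."""
    match = _GFX_RE.search(log_line)
    return match.group(1) if match else None


# "Loading" line copied from the first log in this issue.
line = (
    "Loading /home/syifan/dev/src/github.com/ROCmSoftwarePlatform/Tensile/build/"
    "1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/source/library/"
    "Kernels.so-000-gfx1012.hsaco"
)
print(code_object_target(line))
```

If the target turns out not to be the GPU of interest, restricting device visibility (for example with the HIP_VISIBLE_DEVICES environment variable) may be worth trying. This is a guess and has not been confirmed in this thread.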

@babakpst
Collaborator

Thanks for filing the issue. Please find the updated config file attached. The original config file contains some obsolete parameters that result in this error.

Please let me know if that solves the problem.

@babakpst
Collaborator

I will update the sample Configs files in my next PR.

@babakpst
Collaborator

@flint-stone @syifan

@flint-stone
Author

Hi @babakpst, thanks for letting us know. I tried the new configuration file, but I'm still getting a similar error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error 303(hipErrorSharedObjectInitFailed) /home/lexu/Tensile/repo/Tensile/Source/client/main.cpp:323: 
retError
hipErrorSharedObjectInitFailed

/home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/run.sh: line 6: 28243 Aborted                 (core dumped) /home/lexu/Tensile/repo/build/0_Build/client/tensile_client --config-file /home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/build/../source/ClientParameters.ini
+ ERR2=134
+ ERR=0
+ [[ 0 -ne 0 ]]
+ [[ 134 -ne 0 ]]
+ echo two
two
+ ERR=134
+ exit 134
Tensile::WARNING: ClientWriter Benchmark Process exited with code 134
Tensile::WARNING: BenchmarkProblems: Benchmark Process exited with code 134
################################################################################
# Cijk_Ailk_Bljk_SB_00
# 00_Final: End - 172.577s
################################################################################

clientExit=1 (ERROR) for /home/lexu/rocblas_sgemm_asm_only_ChangeMyExtensionTo_yaml.txt
Traceback (most recent call last):
  File "../Tensile/bin/Tensile", line 36, in <module>
    Tensile.main()
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 282, in main
    Tensile(sys.argv[1:])
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 239, in Tensile
    executeStepsInConfig(config)
  File "/home/lexu/Tensile/repo/Tensile/Tensile.py", line 51, in executeStepsInConfig
    BenchmarkProblems.main( config["BenchmarkProblems"] )
  File "/home/lexu/Tensile/repo/Tensile/BenchmarkProblems.py", line 366, in main
    shutil.copy( resultsFileName, newResultsFileName )
  File "/usr/lib/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lexu/Tensile/repo/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/Data/00_Final.csv'

I simply replaced the yaml file from the instructions with the new file. Let me know if I need to provide more information.

Thanks.
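
One observation on the Python traceback in both logs: it appears to be a secondary failure. The benchmark client aborted, so 00_Final.csv was presumably never written, and the later shutil.copy call then raises FileNotFoundError. The sketch below shows one way such a copy could be guarded so the message points at the real cause. It is not Tensile's code, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_results_if_present(src: str, dst: str) -> bool:
    """Copy src to dst if src exists; otherwise report it and return False."""
    src_path = Path(src)
    if not src_path.is_file():
        # A missing results file most likely means the client failed earlier.
        print(f"results file missing, benchmark client probably failed: {src_path}")
        return False
    shutil.copy(src_path, dst)
    return True
```

Either way, the error to chase is the hipErrorSharedObjectInitFailed from tensile_client, not the missing CSV.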

@babakpst
Collaborator

Hi @flint-stone, and sorry for the late reply. I ran that yaml file on a couple of newer architectures and did not get any error messages. I managed to find a Radeon VII node and am updating it so that I can run Tensile there. It has been some time since we tuned Tensile for that architecture, so there might be other parameters in the yaml file that are not compatible with the Radeon VII architecture. I will update you once I can run Tensile on Radeon VII. Thanks for your patience.
