Memory access fault with rocm 2.9 #125
Comments
Can you report that failure in https://github.com/ROCmSoftwarePlatform/MIOpen? Thanks!
Sure, but can't someone from AMD move it?
Transferred over to MIOpen for now.
I checked the other kernels that were written to the log file before the core dump. conv -n 32 -c 64 -H 1 -W 256 -k 32 -y 1 -x 258 -p 0 -q 2 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1: MEM FAULT
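For anyone who wants to replay the logged kernels one by one, here is a minimal Python sketch that pulls the MIOpenDriver invocations out of a log. It assumes only that each logged command contains the literal text "MIOpenDriver". The rest of the log line format is not assumed, and the helper name driver_commands is hypothetical.

```python
def driver_commands(log_lines):
    """Return unique MIOpenDriver invocations, in order of first appearance."""
    seen = []
    for line in log_lines:
        idx = line.find("MIOpenDriver")
        if idx == -1:
            continue  # not a driver command line
        # Keep everything from the executable name onward; any path or
        # log prefix in front of it is dropped.
        cmd = line[idx:].strip()
        if cmd not in seen:
            seen.append(cmd)
    return seen
```

Each returned string can then be run by hand (prefixed with /opt/rocm/miopen/bin/) to see which configurations fault.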
This seems to be fixed with ROCm 3.0-6:
With the PyTorch Docker image rocm2.9_ubuntu16.04_py3.6_pytorch, I get the following error after starting training:
With MIOPEN_ENABLE_LOGGING_CMD=1 MIOPEN_LOG_LEVEL=5 I was able to track it down to the following kernel:
/opt/rocm/miopen/bin/MIOpenDriver conv -n 32 -c 64 -H 1 -W 256 -k 32 -y 1 -x 384 -p 0 -q 128 -u 1 -v 64 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
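To make the failing configuration easier to read, here is a rough Python sketch that decodes the driver flags and computes the expected output shape with the standard convolution size formula. The flag meanings in FLAG_NAMES are my reading of the MIOpenDriver conv options and should be treated as an assumption. The helper names are hypothetical and not part of MIOpen.

```python
import shlex

# Assumed meaning of the MIOpenDriver conv short flags (not taken from this issue).
FLAG_NAMES = {
    "-n": "batch", "-c": "in_channels", "-H": "in_h", "-W": "in_w",
    "-k": "out_channels", "-y": "fil_h", "-x": "fil_w",
    "-p": "pad_h", "-q": "pad_w", "-u": "stride_h", "-v": "stride_w",
    "-l": "dilation_h", "-j": "dilation_w", "-g": "groups",
}

def parse_conv_args(cmd):
    """Map the known integer flags of a driver command line to named values."""
    tokens = shlex.split(cmd)
    args = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in FLAG_NAMES:
            args[FLAG_NAMES[flag]] = int(value)
    return args

def out_dim(size, pad, dilation, kernel, stride):
    """Standard convolution output size along one spatial dimension."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

def output_shape(a):
    """Return the (N, K, H_out, W_out) shape implied by the parsed arguments."""
    out_h = out_dim(a["in_h"], a["pad_h"], a["dilation_h"], a["fil_h"], a["stride_h"])
    out_w = out_dim(a["in_w"], a["pad_w"], a["dilation_w"], a["fil_w"], a["stride_w"])
    return (a["batch"], a["out_channels"], out_h, out_w)

CMD = ("MIOpenDriver conv -n 32 -c 64 -H 1 -W 256 -k 32 -y 1 -x 384 "
       "-p 0 -q 128 -u 1 -v 64 -l 1 -j 1 -m conv -g 1 -F 1 -t 1")
print(output_shape(parse_conv_args(CMD)))  # (32, 32, 1, 3)
```

Under these assumptions the filter (1 x 384) is wider than the unpadded input (1 x 256), and the output is only 3 elements wide.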
Config:
GPU: gfx900
Card: Vega 10 XTX [Radeon Vega Frontier Edition]
hipconfig:
HIP version : 2.8.19361-cbe6b65
== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include
== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 10.0.0 (/data/jenkins_workspace/compute-rocm-rel-2.9/external/hcc-tot/clang fa40706d8ba0b8b958d42f579120eb9b89babc00) (/data/jenkins_workspace/compute-rocm-rel-2.9/external/hcc-tot/compiler b7f876231af7fdaf52e419088b8ba9e0c3a61845) (based on HCC 2.9.19392-75835c3-fa40706-b7f8762 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 10.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include -I/opt/rocm/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive
=== Environment Variables
PATH=/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HIP_PLATFORM=hcc
== Linux Kernel
Hostname : 08c9b8b666c9
Linux 08c9b8b666c9 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial