Building PyTorch for ROCm 3.0 directly on host #581

junqingchang · 2020-02-07T09:33:33Z

I'm compiling the steps that worked for me, since I had to piece them together from several different sites. They may help others.

References

Installation notes taken and edited from https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Fixes from
https://www.lizenghai.com/archives/43692.html
#550 (comment)

What made it work

More dependencies

apt install rock-dkms rocm-dev rocm-libs miopen-hip miopengemm hipsparse rccl rocthrust hipcub roctracer-dev

Using patch suggested on #550 (comment)

Editing pytorch/cmake/External/rccl.cmake file by adding set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl") to the top.

Full Build Notes

Option 4: Install directly on host

This is an option for advanced users who can analyze and solve potential problems arising from their environment and/or dependencies. It assumes a python3.6-based install; other versions may differ.

Note: We've received a report that, as of ROCm 2.6, the host system must have a gcc newer than version 6. The same report indicates that version 8 works.
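
A quick way to check the host compiler before starting could look like the sketch below. It assumes gcc is on PATH. The helper name gcc_major_ok is made up for this example, and the threshold only encodes the report above ("newer than version 6"); it is not an official requirement check.

```shell
#!/usr/bin/env bash
# Sketch: check that the gcc major version is newer than 6, per the report above.

# gcc_major_ok MAJOR -> succeeds when MAJOR is greater than 6 (hypothetical helper).
gcc_major_ok() {
  [ "$1" -gt 6 ]
}

if command -v gcc >/dev/null 2>&1; then
  # -dumpversion may print "7" or "7.5.0" depending on how gcc was built;
  # keep only the part before the first dot.
  major="$(gcc -dumpversion | cut -d. -f1)"
  if gcc_major_ok "$major"; then
    echo "gcc $major should be new enough"
  else
    echo "gcc $major may be too old; the report says version 8 works"
  fi
else
  echo "gcc not found on PATH"
fi
```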

Install ROCm and ROCm dependencies on host

Follow instructions in ROCm install documentation. In particular, the following packages must be installed:

apt install rock-dkms rocm-dev rocm-libs miopen-hip miopengemm hipsparse rccl rocthrust hipcub roctracer-dev

Install PyTorch package requirements:

Using the package manager of your distribution (apt or yum)
Package names are for Ubuntu 18.04; other distributions may use different names.

apt install git python3-pip libopenblas-dev cmake libnuma-dev autoconf build-essential ca-certificates curl libgoogle-glog-dev libhiredis-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libpthread-stubs0-dev libsnappy-dev sudo vim libprotobuf-dev protobuf-compiler

Install PyTorch pip requirements:

pip3 install enum34 numpy pyyaml setuptools typing cffi future hypothesis

Clone pytorch master:

cd ~
git clone https://github.com/pytorch/pytorch.git
# or, to use the ROCm fork instead: git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
cd pytorch
wget https://raw.githubusercontent.com/pytorch/pytorch/3a7ecd32eb7418e18146fe09dc9301076b5f0f17/caffe2/operators/relu_op.cu
mv relu_op.cu caffe2/operators/
git submodule update --init --recursive

Adjust ROCm internal dependency declarations:

This is a current shortcoming that we will address in a future ROCm release. Please be aware that this alters your ROCm installation!

sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/hipsparse/lib/cmake/hipsparse/hipsparse-config.cmake
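
The six sed commands above differ only in the component name, so they can be written as a loop. The sketch below does that and also keeps a .bak copy of each file it changes, since this step alters the ROCm installation. The function name patch_rocm_configs and the backup step are additions for illustration, not part of the original notes.

```shell
#!/usr/bin/env bash
# Sketch: apply the find_dependency(hip) -> find_dependency(HIP) rewrite to each
# ROCm component listed above, keeping a .bak copy of every file that is changed.

# patch_rocm_configs ROOT -> patches ROOT/<name>/lib/cmake/<name>/<name>-config.cmake
# (hypothetical helper; ROOT would normally be /opt/rocm)
patch_rocm_configs() {
  local root="$1" name cfg
  for name in rocsparse rocfft miopen rocblas rccl hipsparse; do
    cfg="$root/$name/lib/cmake/$name/$name-config.cmake"
    if [ -f "$cfg" ]; then
      # -i.bak edits in place and leaves the original next to it as <file>.bak
      sed -i.bak 's/find_dependency(hip)/find_dependency(HIP)/g' "$cfg"
      echo "patched $cfg"
    else
      echo "skipped $cfg (not found)"
    fi
  done
}

# Usage (needs write access to the ROCm tree, typically root):
#   patch_rocm_configs /opt/rocm
```

To undo the change for one component, copy its .bak file back over the patched file.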

Next, edit your pytorch/cmake/External/rccl.cmake file. Open it with your preferred editor and add set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl") as the first line. The file should then look something like this:

set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")
if (NOT __NCCL_INCLUDED)
  set(__NCCL_INCLUDED TRUE)

  if (USE_SYSTEM_NCCL)
    # NCCL_ROOT, NCCL_LIB_DIR, NCCL_INCLUDE_DIR will be accounted in the following line.
    find_package(RCCL REQUIRED)
    if (RCCL_FOUND)
      message (STATUS "RCCL Found!")
      add_library(__caffe2_nccl INTERFACE)
      target_link_libraries(__caffe2_nccl INTERFACE ${PYTORCH_RCCL_LIBRARIES})
      target_include_directories(__caffe2_nccl INTERFACE ${RCCL_INCLUDE_DIRS})
    else()
      message (STATUS "RCCL NOT Found!")
    endif()
  else()
    message (STATUS "USE_SYSTEM_NCCL=OFF is not supported yet when using RCCL")
  endif()
endif()
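
If you would rather script the rccl.cmake edit than do it by hand, something like the sketch below should work. It only adds the line when it is not already there, so it is safe to run twice. The function name prepend_rccl_dir is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: add the set(RCCL_DIR ...) line as the first line of rccl.cmake,
# unless the file already contains it.

# prepend_rccl_dir FILE (hypothetical helper)
prepend_rccl_dir() {
  local file="$1"
  local line='set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")'
  local tmp
  if grep -qF "$line" "$file"; then
    echo "already present in $file"
    return 0
  fi
  # Build "new line + old contents" in a temp file, then write it back into
  # the original so the original file's permissions are kept.
  tmp="$(mktemp)"
  { printf '%s\n' "$line"; cat "$file"; } > "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
  echo "added RCCL_DIR to $file"
}

# Usage, from the directory that contains the pytorch checkout:
#   prepend_rccl_dir pytorch/cmake/External/rccl.cmake
```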

Run "hipify" to prepare source code:

cd ~/pytorch
python tools/amd_build/build_amd.py

Build and install pytorch:

By default, PyTorch builds for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 covers Vega10-type GPUs (MI25, Vega56, Vega64, ...), which work best. To compile only for your uarch, export PYTORCH_ROCM_ARCH set to gfx803, gfx900, or gfx906, for example export PYTORCH_ROCM_ARCH=gfx900. Then build with
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=4 python setup.py install --user
Use MAX_JOBS=n to limit peak memory usage. If the build fails, try falling back to fewer jobs. 4 jobs assume 16 GB or more of available main memory.
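
One rough way to pick a starting value for MAX_JOBS is sketched below. It assumes about 4 GB of memory per job, which is only an extrapolation from the "4 jobs assume 16 GB" figure above and not a documented rule. The helper name jobs_for_mem_kb is made up for this example, and the script reads MemTotal from /proc/meminfo, so it assumes Linux.

```shell
#!/usr/bin/env bash
# Sketch: suggest a MAX_JOBS value from total memory, assuming ~4 GB per job
# (extrapolated from "4 jobs assume 16 GB"; treat the result as a starting point).

# jobs_for_mem_kb KB -> prints the job count for KB kilobytes of memory,
# rounded to the nearest whole job and never less than 1 (hypothetical helper).
jobs_for_mem_kb() {
  local kb="$1"
  local per_job_kb=$((4 * 1024 * 1024))
  local jobs=$(( (kb + per_job_kb / 2) / per_job_kb ))
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

if [ -r /proc/meminfo ]; then
  # MemTotal is reported in kB and is usually a little below the installed size.
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  echo "suggested: MAX_JOBS=$(jobs_for_mem_kb "$mem_kb")"
fi
```

If the build still runs out of memory with the suggested value, lower it further as described above.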

Confirm working installation:

PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
This runs all unit tests, skipping those that do not apply to your system based on ROCm and, e.g., single- or multi-GPU configuration. No tests will fail if the compilation and installation are correct.
Individual test sets can be run with
PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
where test_nn.py can be replaced with any other test set.
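
To run a handful of test sets one after another and see which ones fail, a small wrapper like the sketch below may help. The function name run_test_sets, the PYTHON override variable, and the per-set log files are additions for illustration; the test set names in the usage line are only examples.

```shell
#!/usr/bin/env bash
# Sketch: run several test sets in turn and print PASS/FAIL for each one.

# run_test_sets NAME... -> runs test/NAME.py for every name given, writes the
# output of each run to NAME.log, and succeeds only if every set passed.
# Set PYTHON to use a different interpreter (hypothetical helper and variable).
run_test_sets() {
  local py="${PYTHON:-python}"
  local failed=0
  local name
  for name in "$@"; do
    if PYTORCH_TEST_WITH_ROCM=1 "$py" "test/$name.py" --verbose > "$name.log" 2>&1; then
      echo "PASS $name"
    else
      echo "FAIL $name (see $name.log)"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Usage, from the top of the pytorch checkout:
#   run_test_sets test_nn test_torch
```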

System Info

This was built on Ubuntu 18.04 with ROCm 3.0 and Python 3.6.9.

Note

Running PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose does not pass every test, but some basic testing with torch works. I will keep an eye out for further errors.


Ge0rges · 2020-04-29T22:36:03Z

Hello, I'm attempting to follow these instructions, but the pytorch build fails with the error below.

HIP (/opt/rocm-3.3.0/hip) was built using hcc 3.1.20114-6776c83f-1ce0fe5e88b, but you are using /opt/rocm/hcc/hcc with version from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.

I'd appreciate any clues as to how I can rebuild HIP.

Here's what I get when running the mentioned command:

gio@gio-linux-desktop:~/pytorch$ /opt/rocm-3.3.0/hip/bin/hipcc --help
/opt/rocm/hcc/bin/hcc: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
Use of uninitialized value $HCC_VERSION_MAJOR in substitution (s///) at /opt/rocm-3.3.0/hip/bin/hipcc line 242.
Use of uninitialized value $HCC_VERSION_MAJOR in string eq at /opt/rocm-3.3.0/hip/bin/hipcc line 256.
Use of uninitialized value $HCC_VERSION in string ne at /opt/rocm-3.3.0/hip/bin/hipcc line 754.
Use of uninitialized value $HCC_VERSION in concatenation (.) or string at /opt/rocm-3.3.0/hip/bin/hipcc line 755.
HIP (/opt/rocm-3.3.0/hip) was built using hcc 3.1.20114-6776c83f-1ce0fe5e88b, but you are using /opt/rocm/hcc/hcc with version  from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.
Died at /opt/rocm-3.3.0/hip/bin/hipcc line 756.

junqingchang · 2020-04-30T01:06:27Z

Maybe you can try the solution on https://github.com/ROCmSoftwarePlatform/hipBLAS/wiki/Build#common-build-problems

Ge0rges · 2020-04-30T01:07:25Z

Thanks, I missed that!

junqingchang closed this as completed Feb 7, 2020

MichaelEssich mentioned this issue Mar 8, 2020

errors while compiling PyTroch ROCm/ROCm#1036

Closed

mritunjaymusale mentioned this issue Mar 25, 2020

Any update on 5700 Xt support? ROCm/ROCm#887

Closed
