Building PyTorch for ROCm 3.0 directly on host #581

Closed
junqingchang opened this issue Feb 7, 2020 · 3 comments


junqingchang commented Feb 7, 2020

Here is a list of the steps that worked for me, compiled in case it helps others, since I had to piece them together from several different sites.

References

Installation notes taken and edited from https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Fixes from
https://www.lizenghai.com/archives/43692.html
#550 (comment)

What made it work

More dependencies

apt install rock-dkms rocm-dev rocm-libs miopen-hip miopengemm hipsparse rccl rocthrust hipcub roctracer-dev

Using the patch suggested in #550 (comment)

Editing the pytorch/cmake/External/rccl.cmake file to add set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl") at the top.

Full Build Notes

Option 4: Install directly on host

This is an option for advanced users who can analyze and solve potential problems arising from their environment and/or dependencies. This assumes a python3.6 based install; other versions may differ.

Note: We've received a report that, as of ROCm 2.6, the host system must have gcc newer than version 6. The same report indicates that version 8 works.

  1. Install ROCm and ROCm dependencies on host

Follow instructions in ROCm install documentation. In particular, the following packages must be installed:

apt install rock-dkms rocm-dev rocm-libs miopen-hip miopengemm hipsparse rccl rocthrust hipcub roctracer-dev
  1. Install PyTorch package requirements:

Use the package manager of your distribution (apt or yum).
Package names are for Ubuntu 18.04; other distributions may use different names.

apt install git python3-pip libopenblas-dev cmake libnuma-dev autoconf build-essential ca-certificates curl libgoogle-glog-dev libhiredis-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libpthread-stubs0-dev libsnappy-dev sudo vim libprotobuf-dev protobuf-compiler
  1. Install PyTorch pip requirements:
pip3 install enum34 numpy pyyaml setuptools typing cffi future hypothesis
  1. Clone pytorch master:
cd ~
git clone https://github.com/pytorch/pytorch.git
# or, to use the ROCm fork instead (run only one of the two clones): git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
cd pytorch
wget https://raw.githubusercontent.com/pytorch/pytorch/3a7ecd32eb7418e18146fe09dc9301076b5f0f17/caffe2/operators/relu_op.cu
mv relu_op.cu caffe2/operators/
git submodule update --init --recursive
  1. Adjust ROCm internal dependency declarations:

This is a current shortcoming that we will address in a future ROCm release. Please be aware that this alters your ROCm installation!

sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rccl/lib/cmake/rccl/rccl-config.cmake
sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/hipsparse/lib/cmake/hipsparse/hipsparse-config.cmake
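The sed commands above can also be sketched as a small Python helper that keeps a backup of each file before altering the ROCm installation. This is only a sketch under assumptions: the function names `config_path` and `patch_config` and the `.orig` backup suffix are made up for illustration, and the paths simply mirror the ones in the commands above.

```python
import shutil
from pathlib import Path

# Components named in the sed commands above.
COMPONENTS = ["rocsparse", "rocfft", "miopen", "rocblas", "rccl", "hipsparse"]


def config_path(component, rocm_root="/opt/rocm"):
    """Build the <root>/<name>/lib/cmake/<name>/<name>-config.cmake path."""
    return (Path(rocm_root) / component / "lib" / "cmake" / component
            / f"{component}-config.cmake")


def patch_config(path, backup=True):
    """Replace find_dependency(hip) with find_dependency(HIP) in one file.

    Returns True if the file was changed, False if it is missing or
    already patched. With backup=True the original is copied to
    <name>.orig first (the suffix is an arbitrary choice).
    """
    path = Path(path)
    if not path.is_file():
        return False
    text = path.read_text()
    patched = text.replace("find_dependency(hip)", "find_dependency(HIP)")
    if patched == text:
        return False
    if backup:
        shutil.copy2(path, path.with_name(path.name + ".orig"))
    path.write_text(patched)
    return True
```

Calling `patch_config(config_path(name))` for each `name` in `COMPONENTS` (as root) would correspond to the six sed lines; running it twice is harmless because an already patched file is left alone.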

Next, edit your pytorch/cmake/External/rccl.cmake file. Open the file with your preferred editor and add set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl") as the first line. The file should then look something like this:

set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")
if (NOT __NCCL_INCLUDED)
  set(__NCCL_INCLUDED TRUE)

  if (USE_SYSTEM_NCCL)
    # NCCL_ROOT, NCCL_LIB_DIR, NCCL_INCLUDE_DIR will be accounted in the following line.
    find_package(RCCL REQUIRED)
    if (RCCL_FOUND)
      message (STATUS "RCCL Found!")
      add_library(__caffe2_nccl INTERFACE)
      target_link_libraries(__caffe2_nccl INTERFACE ${PYTORCH_RCCL_LIBRARIES})
      target_include_directories(__caffe2_nccl INTERFACE ${RCCL_INCLUDE_DIRS})
    else()
      message (STATUS "RCCL NOT Found!")
    endif()
  else()
    message (STATUS "USE_SYSTEM_NCCL=OFF is not supported yet when using RCCL")
  endif()
endif()
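If you would rather script the rccl.cmake edit than do it by hand, something along these lines might work. It is a sketch only: `ensure_rccl_dir` is a hypothetical helper name, and the line it prepends is the one quoted above.

```python
from pathlib import Path

# The line the instructions above say to add at the top of rccl.cmake.
RCCL_LINE = 'set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")'


def ensure_rccl_dir(cmake_file, line=RCCL_LINE):
    """Prepend `line` to cmake_file unless the file already contains it.

    Returns True if the file was modified, False if the line was
    already present (so the helper is safe to run repeatedly).
    """
    path = Path(cmake_file)
    text = path.read_text()
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    path.write_text(line + "\n" + text)
    return True
```

From the pytorch checkout this would be called as `ensure_rccl_dir("cmake/External/rccl.cmake")`.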
  1. Run "hipify" to prepare source code:
cd ~/pytorch
python tools/amd_build/build_amd.py
  1. Build and install pytorch:

By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 are Vega10-type GPUs (MI25, Vega56, Vega64, ...) and work best. If you want to compile only for your uarch, set PYTORCH_ROCM_ARCH to gfx803, gfx900, or gfx906, for example export PYTORCH_ROCM_ARCH=gfx900. Then build with
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=4 python setup.py install --user
Use MAX_JOBS=n to limit peak memory usage. If the build fails, try falling back to fewer jobs. Four jobs assume at least 16 GB of available main memory.
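The choice of PYTORCH_ROCM_ARCH and MAX_JOBS described above could be derived with a helper like the one below. Treat it as a rough sketch: the function names are hypothetical, the 4 GiB-per-job ratio is merely extrapolated from the "4 jobs for 16 GB" remark, and the assumption that rocm_agent_enumerator prints one gfx name per line (with gfx000 standing for the CPU agent) should be checked against your own output.

```python
def parse_gfx_targets(enumerator_output):
    """Return the unique gfx targets found in rocm_agent_enumerator output.

    Assumes one name per line and skips gfx000, which is assumed here
    to denote the CPU agent rather than a GPU.
    """
    targets = []
    for line in enumerator_output.splitlines():
        name = line.strip()
        if name.startswith("gfx") and name != "gfx000" and name not in targets:
            targets.append(name)
    return targets


def jobs_for_memory(mem_gib, gib_per_job=4):
    """Pick a MAX_JOBS value; 4 GiB per job is an extrapolated guess."""
    return max(1, int(mem_gib // gib_per_job))


def build_env(enumerator_output, mem_gib):
    """Assemble the environment variables used by the build command above."""
    env = {
        "USE_ROCM": "1",
        "USE_LMDB": "1",
        "USE_OPENCV": "1",
        "MAX_JOBS": str(jobs_for_memory(mem_gib)),
    }
    targets = parse_gfx_targets(enumerator_output)
    if targets:
        # Only the first detected target is used in this sketch.
        env["PYTORCH_ROCM_ARCH"] = targets[0]
    return env
```

The resulting dict can be merged into a copy of `os.environ` and passed as `env=` to `subprocess.run` when invoking `python setup.py install --user`. If no GPU target is detected, PYTORCH_ROCM_ARCH is left unset so the default multi-arch build applies.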

  1. Confirm working installation:

PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
This runs all unit tests, skipping those that do not apply to your system based on ROCm and, e.g., single- or multi-GPU configuration. No tests will fail if the compilation and installation are correct.
Individual test sets can be run with
PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
where test_nn.py can be replaced with any other test set.

System Info

This was built on Ubuntu 18.04 with ROCm 3.0 and Python 3.6.9.

Note

Running PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose does not pass all tests; however, some basic testing using torch works. I will keep an eye out for further errors.
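For reference, a basic check of this kind might look like the sketch below. It is not the exact test that was run; `rocm_smoke_test` is a hypothetical name, and it relies on ROCm builds of PyTorch exposing the GPU through the usual `torch.cuda` API. `torch.version.hip` is read defensively because it may not exist in every build.

```python
def rocm_smoke_test():
    """Report whether torch imports, sees a GPU, and can multiply on it."""
    report = {
        "torch_imported": False,
        "hip_version": None,
        "gpu_available": False,
        "matmul_ok": False,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch_imported"] = True
    # May be missing or None depending on the build, hence getattr.
    report["hip_version"] = getattr(torch.version, "hip", None)

    # ROCm builds expose the GPU through the torch.cuda API.
    if not torch.cuda.is_available():
        return report
    report["gpu_available"] = True

    # Compare a small GPU matrix product against the CPU result.
    a = torch.rand(64, 64, device="cuda")
    b = torch.rand(64, 64, device="cuda")
    expected = a.cpu() @ b.cpu()
    report["matmul_ok"] = bool(
        torch.allclose((a @ b).cpu(), expected, atol=1e-3)
    )
    return report
```

Printing `rocm_smoke_test()` after the install gives a quick signal before committing to the full test suite.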


Ge0rges commented Apr 29, 2020

Hello, I'm attempting to follow these instructions, but the pytorch build fails with the error below.

HIP (/opt/rocm-3.3.0/hip) was built using hcc 3.1.20114-6776c83f-1ce0fe5e88b, but you are using /opt/rocm/hcc/hcc with version from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.

I'd appreciate any clues as to how I can rebuild HIP.

Here's what I get when running the mentioned command:

gio@gio-linux-desktop:~/pytorch$ /opt/rocm-3.3.0/hip/bin/hipcc --help
/opt/rocm/hcc/bin/hcc: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
Use of uninitialized value $HCC_VERSION_MAJOR in substitution (s///) at /opt/rocm-3.3.0/hip/bin/hipcc line 242.
Use of uninitialized value $HCC_VERSION_MAJOR in string eq at /opt/rocm-3.3.0/hip/bin/hipcc line 256.
Use of uninitialized value $HCC_VERSION in string ne at /opt/rocm-3.3.0/hip/bin/hipcc line 754.
Use of uninitialized value $HCC_VERSION in concatenation (.) or string at /opt/rocm-3.3.0/hip/bin/hipcc line 755.
HIP (/opt/rocm-3.3.0/hip) was built using hcc 3.1.20114-6776c83f-1ce0fe5e88b, but you are using /opt/rocm/hcc/hcc with version  from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.
Died at /opt/rocm-3.3.0/hip/bin/hipcc line 756.
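When reading output like the above, the first line is usually the one to look at: the later "uninitialized value" warnings follow from hcc failing to start. A small sketch for pulling the missing library names out of such a log is shown here; `missing_shared_libs` is a hypothetical helper and the regular expression is written against the loader message format seen in this log.

```python
import re

# Matches the dynamic loader message seen in the log above.
_MISSING_LIB = re.compile(
    r"error while loading shared libraries: (\S+): "
    r"cannot open shared object file"
)


def missing_shared_libs(log_text):
    """Return the names of shared libraries reported missing, in order."""
    found = []
    for match in _MISSING_LIB.finditer(log_text):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found
```

Each name returned can then be looked up in your distribution's package index to find the package that provides it.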

@junqingchang (Author)

Ge0rges commented Apr 30, 2020

Thanks, I missed that!
