Building PyTorch for ROCm 3.0 directly on host #581
Hello, I'm attempting to follow these instructions, but the PyTorch build fails with the error below.
I'd appreciate any clues as to how I can rebuild HIP. Here's what I get when running the mentioned command:
Maybe you can try the solution at https://github.com/ROCmSoftwarePlatform/hipBLAS/wiki/Build#common-build-problems
Thanks, I missed that!
Compiling a list of the steps that worked for me, since I had to consult several different sites to get there; it may help others.
References
Installation notes taken and edited from https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host
Fixes from
https://www.lizenghai.com/archives/43692.html
#550 (comment)
What made it work
More dependencies
Using patch suggested on #550 (comment)
Editing the
pytorch/cmake/External/rccl.cmake
file by adding
set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")
to the top.
Full Build Notes
Option 4: Install directly on host
This is an option for advanced users who can analyze and solve potential problems arising from their environment and/or dependencies. It assumes a Python 3.6-based install; other versions may differ.
Note: We've received a report that as of ROCm 2.6, the host system must have gcc newer than version 6; the same report indicates that version 8 works.
Follow instructions in ROCm install documentation. In particular, the following packages must be installed:
Using the package manager of your distribution (apt or yum)
Package names are for Ubuntu 18.04, other distributions may have different naming.
This is a current shortcoming that we will address in a future ROCm release. Please be aware that this alters your ROCm installation!
Next you need to edit your
pytorch/cmake/External/rccl.cmake
file. Open the file with your preferred editor and add
set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")
to the top, so that it is the first line of the file.
By default, PyTorch will build for gfx803, gfx900, and gfx906 simultaneously. To see which AMD uarch you have, run /opt/rocm/bin/rocm_agent_enumerator; gfx900 are Vega10-type GPUs (MI25, Vega56, Vega64, ...) and work best. If you want to compile only for your uarch, run
export PYTORCH_ROCM_ARCH=gfx900
with gfx900 replaced by gfx803, gfx900, or gfx906 as appropriate. Then build with
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 MAX_JOBS=4 python setup.py install --user
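The two preparation steps described above (the rccl.cmake edit and the architecture choice) could be scripted roughly as follows. This is only a sketch: the helper names prepend_rccl_dir and pick_arch are made up for illustration, and it assumes that rocm_agent_enumerator prints one target name per line, with CPU agents reported under a name that is not in the supported list.

```python
from pathlib import Path

# The line the notes above say to add at the top of rccl.cmake.
RCCL_LINE = 'set(RCCL_DIR "/opt/rocm/rccl/lib/cmake/rccl")'

# The targets the notes above list as the default build set.
SUPPORTED_ARCHS = ("gfx803", "gfx900", "gfx906")


def prepend_rccl_dir(cmake_file):
    """Make RCCL_LINE the first line of cmake_file.

    Does nothing if the line is already present anywhere in the file,
    so it is safe to run more than once. Returns True if the file changed.
    """
    path = Path(cmake_file)
    text = path.read_text()
    if RCCL_LINE in text:
        return False
    path.write_text(RCCL_LINE + "\n" + text)
    return True


def pick_arch(enumerator_output):
    """Return the first supported gfx target in the enumerator output, or None.

    Assumption: the output is a whitespace-separated list of target names.
    """
    for token in enumerator_output.split():
        if token in SUPPORTED_ARCHS:
            return token
    return None
```

One would call prepend_rccl_dir("pytorch/cmake/External/rccl.cmake") from the directory containing the checkout, then pass the captured output of /opt/rocm/bin/rocm_agent_enumerator to pick_arch and export the result as PYTORCH_ROCM_ARCH before building.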
Use MAX_JOBS=n to limit peak memory usage. If building fails try falling back to fewer jobs. 4 jobs assume available main memory of 16 GB or larger.
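As a rough way to pick n, the one data point above (4 jobs for 16 GB) can be read as about 4 GB of memory per job. That ratio is an extrapolation for illustration, not a documented rule, and suggest_max_jobs is a hypothetical helper name.

```python
import os

# Assumption: extrapolated from "4 jobs assume 16 GB"; not an official figure.
GB_PER_JOB = 4


def suggest_max_jobs(total_mem_gb, cpu_count=None):
    """Suggest a MAX_JOBS value bounded by both memory and CPU count."""
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(total_mem_gb // GB_PER_JOB)
    # Never suggest fewer than one job, even on very small machines.
    return max(1, min(by_memory, cpus))
```

If the build still runs out of memory with the suggested value, fall back to fewer jobs as described above.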
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
runs all unit tests, skipping those that do not apply to your system based on ROCm and, e.g., single- or multi-GPU configuration. No tests will fail if the compilation and installation are correct.
Individual test sets can be run with
PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
Where
test_nn.py
can be replaced with any other test set.
System Info
This was built on Ubuntu 18.04, ROCm 3.0, and Python 3.6.9
Note
Test using
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
does not pass all tests; however, some basic testing using torch works. I will keep an eye out for further errors.
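For reference, the kind of basic check meant here could look like the sketch below. It is not the exact test that was run; rocm_smoke_test is a hypothetical name, and the function only reports whether a tiny matrix multiply on the GPU returns the expected values. ROCm builds of PyTorch expose the GPU through the usual torch.cuda API, so the device string is "cuda".

```python
def rocm_smoke_test():
    """Return a short status string for a tiny GPU computation."""
    try:
        import torch
    except ImportError:
        return "torch is not importable"

    # On ROCm builds, torch.cuda refers to the AMD GPU.
    if not torch.cuda.is_available():
        return "torch imported, but no GPU is visible"

    a = torch.ones(2, 2, device="cuda")
    b = (a @ a).cpu()  # every entry of a 2x2 ones matrix squared is 2
    ok = bool((b == 2).all())
    return "GPU matmul OK" if ok else "GPU matmul gave unexpected values"
```

Running print(rocm_smoke_test()) after the install gives a quick signal before committing to the full test suite.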