python3Packages.{torch,torch-bin}: 2.1.1 -> 2.2.0 #285249

Merged
6 commits merged into NixOS:master on Feb 12, 2024

Conversation

GaetanLepage
Contributor

@GaetanLepage GaetanLepage commented Jan 31, 2024

Description of changes

Update the torch ecosystem:

cc @teh @thoughtpolice @tscholak

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@GaetanLepage
Contributor Author

It fails with

* Building wheel...
Building wheel torch-2.2.0
-- Building version 2.2.0
cmake --build . --target install --config Release -- -j 48
[3/4] Generating ATen sources
[545/6025] Generating src/x86_64-fma/2d-fourier-8x8.py.o
FAILED: confu-deps/NNPACK/src/x86_64-fma/2d-fourier-8x8.py.o /build/source/build/confu-deps/NNPACK/src/x86_64-fma/2d-fourier-8x8.py.o 
cd /build/source/build/confu-deps/NNPACK && PYTHONPATH=/build/source/build/confu-srcs/six:/build/source/third_party/python-peachpy /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python>
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/build/source/third_party/python-peachpy/peachpy/__init__.py", line 39, in <module>
    from peachpy.literal import Constant
  File "/build/source/third_party/python-peachpy/peachpy/literal.py", line 1, in <module>
    import six
ModuleNotFoundError: No module named 'six'
[592/6025] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotobuf.dir/__/src/google/protobuf/descriptor.cc.o
ninja: build stopped: subcommand failed.

This happens even though six is in the propagatedBuildInputs.

@junjihashimoto
Member

Thx! As you may have noticed, torchvision, torchvision-bin, and torchaudio-bin also need to be updated.

@GaetanLepage
Contributor Author

> Thx! As you may have noticed, torchvision, torchvision-bin, and torchaudio-bin also need to be updated.

Absolutely! But for this, I first need to fix torch and torch-bin, which both fail for weird reasons...

@daniel-fahey
Contributor

daniel-fahey commented Feb 8, 2024

@GaetanLepage I got torch to build without NNPACK; see GaetanLepage#1, commit 186bbe1. I also tried adding six to the nativeBuildInputs, but that didn't help.

The same problem happened for @jkachmar, see #283343.

@GaetanLepage
Contributor Author

Thanks to @daniel-fahey, all the source packages build fine.

Now the issue with torch-bin remains.

@GaetanLepage
Contributor Author

> Now the issue with torch-bin remains.

auto-patchelf: 20 dependencies could not be satisfied
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_python.so
error: auto-patchelf could not satisfy dependency libcudnn.so.8 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_python.so
warn: auto-patchelf ignoring missing libcuda.so.1 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libcaffe2_nvrtc.so
error: auto-patchelf could not satisfy dependency libnvrtc.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libcaffe2_nvrtc.so
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcusparse.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcufft.so.11 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcurand.so.10 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcublas.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcublasLt.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcudnn.so.8 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libnccl.so.2 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
error: auto-patchelf could not satisfy dependency libcusolver.so.11 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
error: auto-patchelf could not satisfy dependency libcublas.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
error: auto-patchelf could not satisfy dependency libcusparse.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cuda_linalg.so
error: auto-patchelf could not satisfy dependency libcupti.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libc10_cuda.so
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/gsk74ydbfm25vda3wr27n513p3xck7cs-python3.11-torch-2.2.0/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.so
auto-patchelf failed to find all the required dependencies.
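
The log above repeats the same few sonames for several libraries. A minimal sketch, assuming the auto-patchelf output has been saved to a file, that reduces it to the unique set of unsatisfied libraries. The function name `missing_sonames` and the file name `build.log` are hypothetical and only used for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list the unique sonames that auto-patchelf reported
# as unsatisfied. Reads a build log on stdin. Only "error:" lines are matched;
# "warn:" lines (such as the ignored libcuda.so.1) are skipped.
missing_sonames() {
  sed -n 's/^error: auto-patchelf could not satisfy dependency \([^ ]*\) wanted by .*/\1/p' \
    | LC_ALL=C sort -u
}

# Example usage, assuming the log was saved as build.log:
#   missing_sonames < build.log
```

Run against the log above, this would print eleven distinct sonames (libcudart.so.12, libcudnn.so.8, libnccl.so.2 and so on), which is easier to compare against the package's build inputs than the raw per-file errors.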

@daniel-fahey
Contributor

daniel-fahey commented Feb 9, 2024

I'm getting a new failure when building torch with

config = {
  allowUnfree = true;
  cudaSupport = true;
};

Looks like the patch you made @GaetanLepage doesn't work any more:

$ nix log /nix/store/3h86msbim7ziklba01gwf1gn5aizw71s-python3.11-torch-2.2.0.drv^*
Sourcing python-remove-tests-dir-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing python-remove-bin-bytecode-hook.sh
Sourcing pypa-build-hook
Using pypaBuildPhase
Sourcing python-runtime-deps-check-hook
Using pythonRuntimeDepsCheckHook
Sourcing pypa-install-hook
Using pypaInstallPhase
Sourcing python-imports-check-hook.sh
Using pythonImportsCheckPhase
Sourcing python-namespaces-hook
Sourcing python-catch-conflicts-hook.sh
Sourcing auto-add-opengl-runpath-hook
Using autoAddOpenGLRunpathPhase
Sourcing setup-cuda-hook
@nix { "action": "setPhase", "phase": "unpackPhase" }
Running phase: unpackPhase
unpacking source archive /nix/store/chpsb0nf5ik31gnrhyiy5g0dsh3ydjcm-source
source root is source
setting SOURCE_DATE_EPOCH to timestamp 315619200 of file source/version.txt
@nix { "action": "setPhase", "phase": "patchPhase" }
Running phase: patchPhase
applying patch /nix/store/mv20hpf1x27afyzn4hlfz4bbx4dna8ln-fix-cmake-cuda-toolkit.patch
patching file CMakeLists.txt
Hunk #1 succeeded at 1160 (offset 8 lines).
patching file cmake/public/cuda.cmake
Hunk #1 FAILED at 62.
1 out of 1 hunk FAILED -- saving rejects to file cmake/public/cuda.cmake.rej
patching file tools/setup_helpers/cmake.py

I also tried directly applying on my copy of the pytorch repo:

$ git apply ../nixpkgs/pkgs/development/python-modules/torch/fix-cmake-cuda-toolkit.patch
error: patch failed: cmake/public/cuda.cmake:62
error: cmake/public/cuda.cmake: patch does not apply

I did however get the patch to apply on commit a6b452dfdcb484d5dfdbb577b74cecbd7021df2e:

[daniel@laptop:~/Source/pytorch]$ git checkout a6b452dfdcb484d5dfdbb577b74cecbd7021df2e
Previous HEAD position was 93cea394dee CMake: Loosen CUDA consistency check (#113174)
HEAD is now at a6b452dfdcb [2/N] Enable Wunused-result, Wunused-variable and Wmissing-braces in torch targets (#110836)

[daniel@laptop:~/Source/pytorch]$ git apply ../nixpkgs/pkgs/development/python-modules/torch/fix-cmake-cuda-toolkit.patch
[daniel@laptop:~/Source/pytorch]$ echo $?
0

Have a great weekend anyway. I probably won't get back to this until Monday.
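
The manual checkout-and-apply steps above can be scripted when more than a couple of revisions need checking. A rough sketch, assuming it is run inside a clean pytorch checkout; the function name `first_applicable_rev` is hypothetical, and the only git commands used are `git checkout` and `git apply --check`:

```shell
#!/bin/sh
# Hypothetical helper: print the first of the given revisions on which the
# patch applies cleanly.
# Usage: first_applicable_rev PATCH_FILE REV [REV...]
# Note: this checks out each revision in turn (detached HEAD), so run it in
# a clean working tree. `git apply --check` itself only tests whether the
# patch would apply and does not modify any files.
first_applicable_rev() {
  patch_file=$1
  shift
  for rev in "$@"; do
    git checkout --quiet --detach "$rev" || return 2
    if git apply --check "$patch_file" 2>/dev/null; then
      echo "$rev"
      return 0
    fi
  done
  return 1
}
```

For example, passing the path to fix-cmake-cuda-toolkit.patch followed by a list of candidate commits would print the first commit where the patch still applies, or return a non-zero status if none of them does.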

@GaetanLepage
Contributor Author

@daniel-fahey I updated the patch and it now applies fine :)

Member

@junjihashimoto junjihashimoto left a comment

Thank you!

Contributor

@SomeoneSerge SomeoneSerge left a comment

Merging tomorrow noon (UTC) unless there are objections

@SomeoneSerge SomeoneSerge merged commit db19b28 into NixOS:master Feb 12, 2024
25 checks passed
@GaetanLepage GaetanLepage deleted the torch branch February 12, 2024 15:00
7 participants