python3Packages.torch-bin: 1.13.1 -> 2.0.0 #221652
Conversation
Out of curiosity, I would like to ask a few questions. It seems that

Thanks a lot!
@breakds IIRC PyTorch distributes their own builds of Triton from the branch OpenAI maintains for (with?) them: https://github.com/openai/triton/tree/torch-inductor-stable. I'll try it out later to see if it works. To be clear, I personally don't think a broken

Thanks a lot for the explanation, especially the information about the

Agree that a broken
@junjihashimoto I was unable to build the derivation:

```
$ nix build --impure -L github:NixOS/nixpkgs/refs/pull/221652/head#python3Packages.torch-bin
python3.10-torch> Sourcing python-remove-tests-dir-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> Sourcing python-remove-bin-bytecode-hook.sh
python3.10-torch> Sourcing wheel setup hook
python3.10-torch> Using wheelUnpackPhase
python3.10-torch> Sourcing pip-install-hook
python3.10-torch> Using pipInstallPhase
python3.10-torch> Sourcing python-imports-check-hook.sh
python3.10-torch> Using pythonImportsCheckPhase
python3.10-torch> Sourcing python-namespaces-hook
python3.10-torch> Sourcing python-catch-conflicts-hook.sh
python3.10-torch> unpacking sources
python3.10-torch> Executing wheelUnpackPhase
python3.10-torch> Finished executing wheelUnpackPhase
python3.10-torch> patching sources
python3.10-torch> configuring
python3.10-torch> no configure script, doing nothing
python3.10-torch> building
python3.10-torch> no Makefile or custom buildPhase, doing nothing
python3.10-torch> installing
python3.10-torch> Executing pipInstallPhase
python3.10-torch> /build/dist /build
python3.10-torch> Processing ./torch-2.0.0-cp310-cp310-linux_x86_64.whl
python3.10-torch> Requirement already satisfied: typing-extensions in /nix/store/bndw06wps3i7xpqdk6ryq6wiqg11ggy8-python3.10-typing-extensions-4.5.0/lib/python3.10/site-packages (from torch==2.0.0) (4.5.0)
python3.10-torch> ERROR: Could not find a version that satisfies the requirement sympy (from torch) (from versions: none)
python3.10-torch> ERROR: No matching distribution found for sympy

error: builder for '/nix/store/wrbw0hl6p10cc0x230j8z7f655yal0x4-python3.10-torch-2.0.0.drv' failed with exit code 1
```
Ah shoot, so while Triton may be bundled with their distribution when installing via pip, Nixpkgs' Python builder prevents that, so we need to make sure we install it ourselves. @junjihashimoto you'll probably have to add a new derivation for it. After that's added you'll need to update the dependencies. Seems like with the following it at least builds torch and torchvision successfully. Haven't tried any tests though.

```nix
(final: prev: {
  python3Packages = prev.python3Packages.overrideScope (pfinal: pprev: {
    triton-bin = pprev.buildPythonPackage {
      version = "2.0.0";
      pname = "triton";
      format = "wheel";
      dontStrip = true;
      pythonRemoveDeps = [ "cmake" "torch" ];
      nativeBuildInputs = [
        prev.lit
        pprev.pythonRelaxDepsHook
      ];
      propagatedBuildInputs = [
        pprev.filelock
      ];
      src = prev.fetchurl {
        name = "triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        url = "https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl";
        hash = "sha256-OIBu6WY/Sw981keQ6WxXk3QInlj0mqxKZggSGqVeJQU=";
      };
    };
    torch-bin = pprev.torch-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
      propagatedBuildInputs = oldAttrs.propagatedBuildInputs ++ [
        pprev.sympy
        pprev.jinja2
        pprev.networkx
        pprev.filelock
        pfinal.triton-bin
      ];
    });
    torchvision-bin = pprev.torchvision-bin.overrideAttrs (oldAttrs: {
      nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [
        prev.lit
      ];
    });
  });
})
```
(force-pushed from 64e70ab to 40e5ffd)
@junjihashimoto those commits helped! Trying to use it still failed for me, though. Try adding

```
chmod +x $out/lib/<whatever python version>/site-packages/triton/third_party/cuda/bin/ptxas
```

to whatever phase feels appropriate in the triton-bin derivation. After I did that, I was able to use it.
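In derivation form, that might look something like the sketch below. The phase choice and the use of `${python.sitePackages}` are assumptions based on other snippets in this thread, not something the PR itself confirms:

```nix
# Hypothetical addition to the triton-bin derivation: the bundled ptxas
# binary is not marked executable after wheel unpacking, so restore it.
postFixup = ''
  chmod +x "$out/${python.sitePackages}/triton/third_party/cuda/bin/ptxas"
'';
```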
(force-pushed from ca58698 to 7d06d9d)
@breakds @ConnorBaker
I gave this a shot along with #222273. Triton appeared to build, but when I ran the sample test above, it resulted in this when trying to import triton:
@katanallama I created the environment as follows.
Why is it linked to a different file? |
(force-pushed from 7d06d9d to 944c56e)
```diff
-    url = "https://download.pytorch.org/whl/cu117/torch-1.13.1%2Bcu117-cp38-cp38-linux_x86_64.whl";
-    hash = "sha256-u/lUbw0Ni1EmPKR5Y3tCaogzX8oANPQs7GPU0y3uBa8=";
+    name = "torch-2.0.0-cp38-cp38-linux_x86_64.whl";
+    url = "https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp38-cp38-linux_x86_64.whl";
```
Had another question. It seems to me that the binaries here would only work when torch is used with CUDA 11.8, is that right? I tried to build it with CUDA 11.7, and it builds and runs:

```
>>> import torch
>>> torch.version.cuda
'11.7'
```

Can this be a potential problem, if the original torch binary is built with CUDA 11.8? Thanks!
@breakds There are two CUDA versions at play: the cudatoolkit version and the driver version. The driver supports multiple versions of cudatoolkit, and a new GPU like the H100 needs a new cudatoolkit that supports its new instruction set (known as a compute capability, or PTX target).

`torch-2.0.0+cu118-xxx.whl` means that the torch-2.0.0 binary was built with cudatoolkit-11.8. The important point is that a new GPU like the H100 is only supported by cudatoolkit 11.8 or later: https://docs.nvidia.com/deploy/cuda-compatibility/index.html

We can just use the latest NVIDIA driver supporting CUDA 12, or any driver supporting the binary. For example, torch-1.13.1+cu117 works with an A100 and a CUDA 12 driver, but torch-1.13.1+cu117 does not work with an H100 and a CUDA 12 driver.
Thanks for the explanation, Junji!
(force-pushed from 944c56e to 0782be2)
```diff
-    license = licenses.bsd3;
+    # torch's license is BSD3.
+    # torch-bin includes CUDA and MKL binaries, therefore unfreeRedistributable is set.
+    license = licenses.unfreeRedistributable;
```
Changed the license of torch-bin to unfreeRedistributable.
I'm very much in support of this change! However, this definitely should go in a separate git commit, maybe even in a separate pull request, for the affected people to land on and leave comments.
@SomeoneSerge
I've created a separate git commit to update the license to unfreeRedistributable.
The new comment overlaps with the old one, we need to merge them. Thoughts:

- [👍🏻] You already explain that Pytorch is redistributed under BSD3
- [✅] Keep the links to the CUDA EULA and the Intel Open Source License
- When mentioning the Intel Open Source License, also include the link to the short identifier: https://spdx.org/licenses/Intel.html (kudos @fabaff)
- Add a link to `lib.licenses.issl`: https://www.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf
- Explain that Intel's oneAPI and mkl-dnn are free ASL20
- Explain that upstream wheels link Pytorch statically against MKL (which is `lib.licenses.issl`: unfree but redistributable)
- Explain that upstream wheels include CUDA applications, subject to the CUDA EULA
- Explain that since the whole thing is distributed as a single package, we have to mark the final derivation as `unfreeRedistributable`

I actually didn't notice any components that refer to the Intel Open Source License, but we should keep the links for reference
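A sketch of what the merged comment could look like in the derivation's meta (illustrative only; the exact wording and attribute layout in the PR may differ):

```nix
meta = {
  # torch itself is BSD3, but the upstream wheels statically link MKL
  # (lib.licenses.issl: unfree but redistributable) and bundle CUDA
  # applications subject to the CUDA EULA. Intel's oneAPI and mkl-dnn
  # are free ASL20. Since everything ships as a single package, the
  # final derivation must be marked unfreeRedistributable.
  license = lib.licenses.unfreeRedistributable;
};
```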
(force-pushed from 0782be2 to 937bc1c)
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-46/26872/1
```nix
  description = "A language and compiler for custom Deep Learning operations";
  homepage = "https://github.com/openai/triton/";
  changelog = "https://github.com/openai/triton/releases/tag/v${version}";
  license = licenses.mit;
```
Technically, it includes a copy of NVIDIA's ptxas.
```nix
  pushd $out/${python.sitePackages}/torch/lib
  LIBNVRTC=`ls libnvrtc-* | grep -v libnvrtc-builtins`
  if [ ! -z "$LIBNVRTC" ]; then
    ln -s "$LIBNVRTC" libnvrtc.so
```
I have a feeling that using the nix-packaged libnvrtc (`${cudaPackages.cuda_nvrtc}/lib/libnvrtc.so`) should be less fragile. At the very least, we have control over it.
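A sketch of that alternative, assuming the derivation takes `cudaPackages` and `python` as arguments (the phase placement is a guess):

```nix
# Link the nix-packaged libnvrtc instead of globbing for whatever
# library name the upstream wheel happens to bundle.
postInstall = ''
  ln -s "${cudaPackages.cuda_nvrtc}/lib/libnvrtc.so" \
    "$out/${python.sitePackages}/torch/lib/libnvrtc.so"
'';
```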
Disclaimer: there's an open issue (#225240) about libnvrtc.so locating libnvrtc-builtins.so.
```nix
  postFixup = let
    rpath = lib.makeLibraryPath [ stdenv.cc.cc.lib ];
  in ''
    find $out/${python.sitePackages}/torchaudio/lib -type f \( -name '*.so' -or -name '*.so.*' \) | while read lib; do
```
We have `autoPatchelfHook` for exactly this kind of logic. It'll add libraries from `buildInputs` and things like `$out/lib` into the runpaths of libraries and executables, depending on what they declare as `DT_NEEDED`.
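A minimal sketch of what that could look like, assuming `stdenv.cc.cc.lib` is what the bundled libraries actually need (the real `buildInputs` depend on their `DT_NEEDED` entries):

```nix
# autoPatchelfHook fills in runpaths from buildInputs and $out/lib,
# replacing the manual find/patchelf loop.
nativeBuildInputs = [ autoPatchelfHook ];
buildInputs = [ stdenv.cc.cc.lib ];  # provides libstdc++ for the bundled .so files
```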
Just an update: we have just merged a source build of triton into a different location; it happens to be pkgs/development/python-modules/openai-triton/default.nix. I think moving this all over to openai-triton/ is the easiest way to go from here.

Btw, I think we should keep a -bin version of triton as well, because there are differences between it and our source build. For one thing, upstream has already moved to LLVM 17, and we're only on LLVM 15 at the moment. So it's good to have both.
```diff
@@ -11890,6 +11890,8 @@ self: super: with self; {

   trove-classifiers = callPackage ../development/python-modules/trove-classifiers { };

+  triton-bin = callPackage ../development/python-modules/triton/bin.nix { };
```
Similarly, I suggest we rename this to openai-triton-bin.
```nix
, jinja2
, networkx
, filelock
, triton-bin
```
Hm... it seems that other derivations (e.g. torchvision-bin) do it this way as well. Nonetheless, I'd rather declare the formal parameter with the source build's name, i.e. I'd take openai-triton but pass openai-triton-bin in the callPackage. This way the overrides to torch and torch-bin look exactly the same, and I don't have to guess which naming scheme to use.

Feel free to ignore this comment though.
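A sketch of the suggested naming scheme; the file paths here are hypothetical:

```nix
# In bin.nix: declare the formal parameter under the source build's name.
# { buildPythonPackage, openai-triton, ... }:

# In python-packages.nix: pass the -bin variant under that name, so
# overrides to torch and torch-bin look identical.
torch-bin = callPackage ../development/python-modules/torch/bin.nix {
  openai-triton = openai-triton-bin;
};
```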
```nix
  patchelf --add-needed ${zlib.out}/lib/libz.so \
    "$out/${python.sitePackages}/triton/_C/libtriton.so"
'';
```
`pythonImportsCheck`? (Or does it still fail because of the circular dependency?)
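If the circular dependency turns out not to be a problem, the suggestion presumably amounts to something like:

```nix
# Verify at build time that the wheel's contents actually import.
pythonImportsCheck = [ "triton" ];
```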
Description of changes

Pytorch 2.0 is released.

Things done

- Built with `sandbox = true` set in `nix.conf`? (See Nix manual)
- Tested with `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested execution of all binary files (usually in `./result/bin/`)