
python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

Closed

SomeoneSerge opened this issue Sep 21, 2023 · 9 comments
Labels: 6.topic: closure size (The final size of a derivation, including its dependencies), 6.topic: python

Comments

@SomeoneSerge (Contributor)

Issue description

...one brought in by mlir, and one seemingly brought in directly. E.g. on my current system (paths included only to show they're different):

  1. python3-3.10.12-env → python3.10-triton-2.0.0 → rocm-llvm-mlir-5.4.4 → hip-amd-5.4.4 → rocm-opencl-runtime-5.4.4 → rocm-runtime-5.4.3 → rocm-llvm-clang-5.4.4 → rocm-llvm-llvm-5.4.4
    • Path: /nix/store/hg2nby9s72yc2lx2by4fn9smffnv0iyd-rocm-llvm-llvm-5.4.4
    • Added Size: 1.43 GiB
  2. python3-3.10.12-env → python3.10-triton-2.0.0 → rocm-llvm-llvm-5.4.4:
    • Path: /nix/store/j0vfk0ybphyv9398svhxz303x11aanpa-rocm-llvm-llvm-5.4.4
    • Added Size: 1.47 GiB
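
For context on the mechanism, two store paths can share a name and version while still being distinct: the hash covers every derivation input, so any difference in how llvm was overridden along the two chains yields a separate copy. A minimal, hypothetical illustration (not the actual rocm-llvm derivations):

  # Hypothetical: same package name, one changed input, two store
  # paths -- the same mechanism that duplicates rocm-llvm-llvm above.
  let
    pkgs = import <nixpkgs> { };
    a = pkgs.hello;
    b = pkgs.hello.overrideAttrs (old: {
      postInstall = (old.postInstall or "") + "\ntouch $out/.marker";
    });
  in
  {
    pathA = a.outPath; # /nix/store/<hash1>-hello-<version>
    pathB = b.outPath; # /nix/store/<hash2>-hello-<version>, different hash
  }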


@NixOS/cuda-maintainers

@samuela (Member) commented Sep 21, 2023

IIRC triton requires its own patched version of llvm, but it's weird that we'd somehow end up with two different patched versions 🤔

@ConnorBaker (Contributor) commented Sep 21, 2023

I thought they had moved largely towards building LLVM HEAD (or at least trying to track it more closely). I do remember the python setup script downloading a pre-built copy of LLVM with MLIR (MLIR might not be present in the default version of LLVM most distributions make available?), but I think that was more for convenience than anything else. https://github.com/openai/triton/blob/c4bc3fd92f79a1a470f0a3a086c0fa525124bc6d/python/setup.py#L69

It looks like it supports building with arbitrary LLVM (https://github.com/openai/triton#building-with-a-custom-llvm), just not arbitrary versions due to breaking changes and whatnot.

EDIT: This issue is probably also still relevant: triton-lang/triton#1617
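
If we do build against our own LLVM, wiring it into triton's setup.py would presumably go through the environment variables that README documents (LLVM_INCLUDE_DIRS, LLVM_LIBRARY_DIR, LLVM_SYSPATH). A rough sketch; the attribute name openai-triton and the llvm output layout here are assumptions, not the current derivation:

  # Sketch only: point triton's setup.py at a Nix-provided LLVM via
  # the env vars documented in its "custom LLVM" README section.
  let
    pkgs = import <nixpkgs> { };
    inherit (pkgs) llvmPackages;
  in
  pkgs.python3.override {
    packageOverrides = pfinal: pprev: {
      # Assumed attribute name for the triton python package:
      openai-triton = pprev.openai-triton.overrideAttrs (_: {
        LLVM_SYSPATH = "${llvmPackages.llvm}";
        LLVM_INCLUDE_DIRS = "${llvmPackages.llvm.dev}/include";
        LLVM_LIBRARY_DIR = "${llvmPackages.llvm.lib}/lib";
      });
    };
  }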

@SomeoneSerge (Contributor, Author) commented Sep 22, 2023

> I thought they had moved largely towards building LLVM HEAD

Well, since we're bringing this up, there's actually even more divergence: upstream PyTorch wheels are built against the HEAD of the fork of openai/triton, which in turn is built against the HEAD of LLVM. We're rebuilding LLVM for PyTorch anyway; maybe we should use both LLVM and Triton from ROCmSoftwarePlatform/.

As for the original issue, we probably just need to override llvmPackages.mlir to use this llvm:

  llvm = llvmPackages.llvm.override {
    llvmTargetsToBuild = [ "NATIVE" "NVPTX" ];
    # Upstream CI sets these too:
    # targetProjects = [ "mlir" ];
    extraCMakeFlags = [
      "-DLLVM_INSTALL_UTILS=ON"
    ];
  };
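
...and then point mlir at it, roughly like so (a sketch; whether our mlir derivation takes an overridable llvm argument is an assumption):

  # Sketch: reuse the llvm above so mlir and triton resolve to a single
  # store path. Assumes the mlir derivation accepts `llvm` as an argument:
  mlir = llvmPackages.mlir.override { inherit llvm; };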

EDIT: But also, idk what exactly the llvm package includes, or how much of that openai/triton actually uses...

@Madouura (Contributor)

You may want to see the openai-triton changes in #258328.

@Madouura (Contributor)

Shouldn't this be fixed now, since openai-triton has its own custom LLVM?
Excluding torchWithRocm, which will probably always have rocm-llvm.

@FliegendeWurst added the 6.topic: closure size label on Nov 11, 2023
@SomeoneSerge (Contributor, Author)

Yes, this might have been fixed. We need to inspect the closure with nix-tree (if not add an automated check...) before we close the issue.
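
Such a check could be a small derivation over the closure; a sketch (the grep pattern and threshold here are assumptions about what we'd actually want to assert):

  # Sketch of an automated closure check: fail if more than one
  # *-llvm-* store path ends up in the environment's closure.
  let
    pkgs = import <nixpkgs> { };
    env = pkgs.python3.withPackages (ps: [ ps.torchWithRocm ]);
    closure = pkgs.closureInfo { rootPaths = [ env ]; };
  in
  pkgs.runCommand "check-single-llvm" { } ''
    count=$(grep -c -- '-llvm-' ${closure}/store-paths || true)
    if [ "$count" -gt 1 ]; then
      echo "expected at most one llvm in the closure, found $count:"
      grep -- '-llvm-' ${closure}/store-paths
      exit 1
    fi
    touch $out
  ''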

@Madouura (Contributor)

Inspected with nix-tree.
openai-triton only has one copy of LLVM.
torch has no copies of LLVM.
torchWithRocm only has an indirect copy of LLVM (rocm-llvm).

@SomeoneSerge (Contributor, Author) commented Nov 12, 2023

I'm afraid we may still be suffering from this issue: there's only rocm-llvm-llvm in the closure of python3Packages.torchWithRocm, but there are both a rocm-llvm-llvm and an openai-triton-llvm in the closure of python3.withPackages (ps: [ ps.torchWithRocm ]), which includes the propagatedBuildInputs.

Each adds about 1.5 GiB judging by nix-tree (though these are negligible compared to the total 17 GiB of rocm-merged).
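
For reference, the environment can be materialized for inspection with something like the following (a sketch; build it and point nix-tree at the result):

  # env.nix -- build with `nix-build env.nix`, then inspect ./result
  # with nix-tree to see both llvm copies in the closure.
  let
    pkgs = import <nixpkgs> { };
  in
  pkgs.python3.withPackages (ps: [ ps.torchWithRocm ])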

@Madouura could you please confirm my interpretation and reopen the issue?

EDIT: the cudaSupport = true variant seems fine at this point; it only contains the openai-triton-llvm.

@Madouura (Contributor)

Ah, I didn't realize we had to use python3.withPackages for this.
Regular torch and torchWithCuda should be fine.
torchWithRocm will unfortunately have to keep its two copies, because we can't reliably build openai-triton against rocm-llvm. Thus, I don't think the issue needs to be reopened.

@rrbutani added and then removed the 6.topic: llvm/clang label on May 27, 2024