ZLUDA: init at 3 #288644

Merged
merged 1 commit into from Apr 13, 2024

Conversation

errnoh
Contributor

@errnoh errnoh commented Feb 13, 2024

Description of changes

The goal is to provide a package for ZLUDA (#288392), which lets you run unmodified CUDA applications on AMD GPUs.

See comment below for current issue blocking the build.

In addition: as this is only providing /lib contents, should this be named libzluda or similar?

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@errnoh
Contributor Author

errnoh commented Feb 13, 2024

The current problem: building with the same build inputs works fine when I just run cargo xtask --release directly, but nix-build -A zluda with this configuration results in the following:

error: failed to run custom build command for `llvm-sys v150.1.2 (/build/source/ext/llvm-sys.rs)`

Caused by:
  process didn't exit successfully: `/build/source/target/release/build/llvm-sys-15b6c27fc49b5c7a/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-changed=/build/source/ext/llvm-project/llvm
  cargo:rerun-if-changed=/build/source/ext/llvm-sys.rs/build.cmake
  CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
  CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_TOOLCHAIN_FILE = None
  CMAKE_TOOLCHAIN_FILE = None
  CMAKE_GENERATOR_x86_64-unknown-linux-gnu = None
  CMAKE_GENERATOR_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_GENERATOR = None
  CMAKE_GENERATOR = None
  CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
  CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_PREFIX_PATH = None
  CMAKE_PREFIX_PATH = Some("/nix/store/4k89msq5ifwlcizq9kc5dkf5kfbmpnfq-compiler-rt-libc-16.0.6-dev:/nix/store/7dnixd9bx0a84b8anifscwmy1dydds71-compiler-rt-libc-16.0.6")
  CMAKE_x86_64-unknown-linux-gnu = None
  CMAKE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE = None
  CMAKE = None
  running: cd "/build/source/target/release/build/llvm-sys-65bdd1fa3c987ace/out/build" && CMAKE_PREFIX_PATH="/nix/store/4k89msq5ifwlcizq9kc5dkf5kfbmpnfq-compiler-rt-libc-16.0.6-dev:/nix/store/7dnixd9bx0a84b8anifscwmy1dydds71-compiler-rt-libc-16.0.6" "cmake" "/build/source/ext/llvm-project/llvm" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_BUILD_TOOLS=OFF" "-DLLVM_TARGETS_TO_BUILD=" "-DLLVM_ENABLE_PROJECTS=" "-DCMAKE_PROJECT_INCLUDE_BEFORE=/build/source/ext/llvm-sys.rs/build.cmake" "-DCMAKE_INSTALL_PREFIX=/build/source/target/release/build/llvm-sys-65bdd1fa3c987ace/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/gcc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/g++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/gcc" "-DCMAKE_BUILD_TYPE=Release"

  --- stderr
  thread 'main' panicked at /build/zluda-3-vendor.tar.gz/cmake/src/lib.rs:1098:5:

  failed to execute command: No such file or directory (os error 2)
  is `cmake` not installed?

  build script failed, must exit now
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
warning: `zluda_dnn` (lib) generated 13 warnings
error: builder for '/nix/store/s7ryy831swlhjbvpxhf7wsnc19vh303k-zluda-3.drv' failed with exit code 101;
       last 10 log lines:
       >   --- stderr
       >   thread 'main' panicked at /build/zluda-3-vendor.tar.gz/cmake/src/lib.rs:1098:5:
       >
       >   failed to execute command: No such file or directory (os error 2)
       >   is `cmake` not installed?
       >
       >   build script failed, must exit now
       >   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       > warning: build failed, waiting for other jobs to finish...
       > warning: `zluda_dnn` (lib) generated 13 warnings
       For full logs, run 'nix log /nix/store/s7ryy831swlhjbvpxhf7wsnc19vh303k-zluda-3.drv'.

Not sure how to fix the issue so help is appreciated.
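
For anyone debugging a similar failure: the panic comes from the cmake crate failing to spawn `cmake`, which in a Nix build usually means the tool is not on the builder's PATH (build tools typically have to be listed in nativeBuildInputs). A minimal sketch of a PATH check follows; `require_tools` is a hypothetical helper name, not something provided by nixpkgs or ZLUDA.

```shell
# Hypothetical helper: report every named tool that is not on PATH.
# Returns 0 when all tools are found and 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing from PATH: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_tools cmake` at the start of a build phase would fail early with a clearer message than the build-script panic.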

@errnoh errnoh changed the title from "[WIP] zluda: init at 3" to "[WIP] ZLUDA: init at 3" Feb 13, 2024
@errnoh
Contributor Author

errnoh commented Feb 22, 2024

Yup, just ran out of disk space for a couple of days, which prevented me from building things 😅. Anyway, getting closer. With the latest changes the llvm part actually seems to compile fine, but the build errors out later with the following:

   Compiling dynasmrt v1.2.3
error: failed to run custom build command for `zluda v0.0.0 (/build/source/zluda)`

Caused by:
  process didn't exit successfully: `/build/source/target/release/build/zluda-80c9fbe76904c49f/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at zluda/build.rs:4:29:
  called `Result::unwrap()` on an `Err` value: could not find repository from '/build/source/zluda'; class=Repository (6); code=NotFound (-3)
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: builder for '/nix/store/1almnpy0bpkmsa1gm3c0g2hxg3fvpivf-zluda-3.drv' failed with exit code 101;
       last 10 log lines:
       >    Compiling dynasmrt v1.2.3
       > error: failed to run custom build command for `zluda v0.0.0 (/build/source/zluda)`
       >
       > Caused by:
       >   process didn't exit successfully: `/build/source/target/release/build/zluda-80c9fbe76904c49f/build-script-build` (exit status: 101)
       >   --- stderr
       >   thread 'main' panicked at zluda/build.rs:4:29:
       >   called `Result::unwrap()` on an `Err` value: could not find repository from '/build/source/zluda'; class=Repository (6); code=NotFound (-3)
       >   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       > warning: build failed, waiting for other jobs to finish...
       For full logs, run 'nix log /nix/store/1almnpy0bpkmsa1gm3c0g2hxg3fvpivf-zluda-3.drv'.

zluda/build.rs is the following (https://github.com/vosen/ZLUDA/blob/master/zluda/build.rs):

use vergen::{Config, vergen};

fn main() {
  vergen(Config::default()).unwrap()
}

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/help-with-packaging-complex-rust-library/40197/1

@KiaraGrouwstra
Contributor

this now gets me:

nix run nixpkgs#nixpkgs-review -- pr 288644
$ git -c fetch.prune=false fetch --no-tags --force https://github.com/NixOS/nixpkgs master:refs/nixpkgs-review/0 pull/288644/head:refs/nixpkgs-review/1
remote: Enumerating objects: 328, done.
remote: Counting objects: 100% (272/272), done.
remote: Compressing objects: 100% (102/102), done.
remote: Total 328 (delta 196), reused 217 (delta 164), pack-reused 56
Receiving objects: 100% (328/328), 396.56 KiB | 4.90 MiB/s, done.
Resolving deltas: 100% (205/205), completed with 73 local objects.
From https://github.com/NixOS/nixpkgs
   923c6a8b5ceb..3beaece283ae  master                -> refs/nixpkgs-review/0
 + 024196a54e42...fdea3dbc3756 refs/pull/288644/head -> refs/nixpkgs-review/1  (forced update)
$ git worktree add /home/kiara/.cache/nixpkgs-review/pr-288644/nixpkgs 3beaece283ae37b855f92ccc4f42e1e3ab40c8fd
Preparing worktree (detached HEAD 3beaece283ae)
Updating files: 100% (39716/39716), done.
HEAD is now at 3beaece283ae Merge pull request #288566 from Yarny0/foomatic-db-engine-update
$ git merge --no-commit --no-ff fdea3dbc375683b601529d09a29a3f8efc5ba2ad
Automatic merge went well; stopped before committing as requested
$ nix build --nix-path nixpkgs=/home/kiara/.cache/nixpkgs-review/pr-288644/nixpkgs nixpkgs-overlays=/tmp/tmpygwuk3cw --extra-experimental-features nix-command no-url-literals --no-link --keep-going --no-allow-import-from-derivation --option build-use-sandbox relaxed -f /home/kiara/.cache/nixpkgs-review/pr-288644/build.nix
[1/2/4 built, 23 copied (12558.1/12558.2 MiB), 1889.4 MiB DL] building zluda-3 (buildPhase): warning: build failed, waiting for other jobs to finish...
error: builder for '/nix/store/rwjpny287hyrfbzx9i8jv2shil7c6xm0-zluda-3.drv' failed with exit code 101;
       last 10 log lines:
       >    Compiling zluda v0.0.0 (/build/source/zluda)
       > error: failed to run custom build command for `zluda v0.0.0 (/build/source/zluda)`
       >
       > Caused by:
       >   process didn't exit successfully: `/build/source/target/release/build/zluda-80c9fbe76904c49f/build-script-build` (exit status: 101)
       >   --- stderr
       >   thread 'main' panicked at zluda/build.rs:4:29:
       >   called `Result::unwrap()` on an `Err` value: could not find repository from '/build/source/zluda'; class=Repository (6); code=NotFound (-3)
       >   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       > warning: build failed, waiting for other jobs to finish...
       For full logs, run 'nix log /nix/store/rwjpny287hyrfbzx9i8jv2shil7c6xm0-zluda-3.drv'.
error: 1 dependencies of derivation '/nix/store/klsik0x8zl188p296anp9lxya11zsknx-review-shell.drv' failed to build

Link to currently reviewing PR:
https://github.com/NixOS/nixpkgs/pull/288644

1 package failed to build:
zluda

@errnoh
Contributor Author

errnoh commented Feb 28, 2024

this now gets me:

       ...failed with exit code 101;
       last 10 log lines:
       >    Compiling zluda v0.0.0 (/build/source/zluda)
       > error: failed to run custom build command for `zluda v0.0.0 (/build/source/zluda)`
       >
       > Caused by:
       >   process didn't exit successfully: `/build/source/target/release/build/zluda-80c9fbe76904c49f/build-script-build` (exit status: 101)
       >   --- stderr
       >   thread 'main' panicked at zluda/build.rs:4:29:
       >   called `Result::unwrap()` on an `Err` value: could not find repository from '/build/source/zluda'; class=Repository (6); code=NotFound (-3)
       >   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       > warning: build failed, waiting for other jobs to finish...
       For full logs, run 'nix log /nix/store/rwjpny287hyrfbzx9i8jv2shil7c6xm0-zluda-3.drv'.
error: 1 dependencies of derivation '/nix/store/klsik0x8zl188p296anp9lxya11zsknx-review-shell.drv' failed to build

Thanks for verifying that it's at least not just me. This seems to match what I'm getting with the current branch #288644 (comment) (The comment after that was just local testing while trying to bypass this error)

Any ideas?

@KiaraGrouwstra
Contributor

i feel a bit over my head there 🙈, i've yet to really get into rust

@ulrikstrid
Member

It seems like this is an issue with vergen < 8.0.0-beta1 not supporting being run in a directory without a .git folder. If we can patch zluda to use a newer version we will probably get past the above error. I might try my hand at this next week if no one else has time to do it.
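
A rough illustration of why the build script fails inside the sandbox: the error text looks like a libgit2 repository-discovery failure, and the fetched source tree has no .git directory. The sketch below mimics that upward search in plain shell. `find_git_root` is a hypothetical helper and a simplification; it is not what vergen or libgit2 actually run.

```shell
# Hypothetical helper: walk from a directory up to "/" looking for a .git
# entry, loosely imitating repository discovery.
# Prints the directory that contains .git, or returns 1 if none is found.
find_git_root() {
  dir=$(cd "$1" && pwd) || return 1
  while :; do
    if [ -e "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    if [ "$dir" = "/" ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}
```

Run against an unpacked source tarball, a search like this finds nothing, which matches the NotFound error in the log above.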

@errnoh
Contributor Author

errnoh commented Mar 4, 2024

It seems like this is an issue with vergen < 8.0.0-beta1 not supporting being run in a directory without a .git folder. If we can patch zluda to use a newer version we will probably get past the above error. I might try my hand at this next week if no one else has time to do it.

Did some testing over the weekend, using cargoPatches for the changes, but so far no luck. Again, very likely because I haven't really taken the time to learn Rust packaging properly 😅. Naively updating to 8.3.1 didn't seem to work for me, and neither did some testing with beta versions of 9 and vergen-gix. That said, I was mostly just editing Cargo.lock, zluda/Cargo.toml and zluda_rt/Cargo.toml by hand in the patch files without properly testing whether the changes even worked outside of the nix build, so the errors might've been caused by human error 🤷

So @ulrikstrid, if you have time to test that idea properly I'd appreciate it :)

@jcaesar
Contributor

jcaesar commented Mar 5, 2024

I got it to build: pass.patch.txt, mostly by looking for another package that deals with vergen: it's completely sidestepping it.
I did run into two more small problems

  • cargo xtask doesn't allow --target, but the nix rust hooks expect the build output to be in target/$platform/release.
  • One of the integration tests looks like it can't actually work on linux? Not sure.

Now, I don't think the resulting build output will work as is:

  • It's missing the zluda binary
  • cargo xtask creates some versioned .so.123 files in target/result which aren't copied - not sure if required
  • How would you actually use this? One might actually want a useZluda setting analogous to useCuda that builds packages with cuda support but links in the zluda libs instead?

So, more work remaining.

@errnoh
Contributor Author

errnoh commented Mar 5, 2024

@jcaesar great! I've added your changes and indeed the build finishes now. Not resulting in binaries is not an issue as ZLUDA on Linux is a library and you just need to add it to your LD_LIBRARY_PATH. e.g. you'd basically do LD_LIBRARY_PATH="$PWD/result/lib:$LD_LIBRARY_PATH" binaryThatRequiresCudaHere
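
The LD_LIBRARY_PATH invocation can be wrapped in a small function, sketched below. `with_zluda` is a hypothetical name, and the library directory is whatever path the build produced (for example "$PWD/result/lib").

```shell
# Hypothetical helper: run one command with a ZLUDA lib directory prepended
# to LD_LIBRARY_PATH, without changing the calling shell's environment.
with_zluda() {
  zluda_lib_dir=$1
  shift
  # ${LD_LIBRARY_PATH:+:...} avoids a trailing ":" when the variable is
  # unset or empty; an empty entry would be treated as the current directory.
  LD_LIBRARY_PATH="${zluda_lib_dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

Usage would look like `with_zluda "$PWD/result/lib" binaryThatRequiresCudaHere`.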

That said, the result is not yet what's expected. Currently:

> ls result/lib/
libcublas.so  libcufft.so     libnccl.so    libnvml.so
libcudnn.so   libcusparse.so  libnvcuda.so  libzluda_dump.so

and when trying to run it:

> LD_LIBRARY_PATH="$PWD/result/lib:$LD_LIBRARY_PATH" /nix/store/6nx3cp9h0pkjvyg2rh4cnrpn7yp7inr7-cuda_demo_suite-12.2.140/demo_suite/vectorAdd 
[Vector addition of 50000 elements]
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!

while manually building it outside nixpkgs resulted in:

> ls ../zluda/target/release/
build         libcublas.so.10  libcudnn.so.8      libnccl.d          libnvml.d
deps          libcublas.so.11  libcufft.d         libnccl.so         libnvml.so
dump          libcuda.so       libcufft.so        libnccl.so.2       libzluda_dump.d
examples      libcuda.so.1     libcufft.so.10     libnvcuda.d        libzluda_dump.so
incremental   libcudnn.d       libcusparse.d      libnvcuda.so
libcublas.d   libcudnn.so      libcusparse.so     libnvidia-ml.so
libcublas.so  libcudnn.so.7    libcusparse.so.11  libnvidia-ml.so.1

and running it:

> LD_LIBRARY_PATH="$PWD/../zluda/target/release:$LD_LIBRARY_PATH" /nix/store/6nx3cp9h0pkjvyg2rh4cnrpn7yp7inr7-cuda_demo_suite-12.2.140/demo_suite/vectorAdd 
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

@errnoh
Contributor Author

errnoh commented Mar 5, 2024

Though those should be all .so files and missing ones are just symlinks (apart from .d files but I'd assume those aren't necessary). I'll check if not having the symlinks is the only remaining issue

EDIT: yup, that's it. After creating those symlinks it works:

LD_LIBRARY_PATH="$PWD/result/lib:$LD_LIBRARY_PATH" /nix/store/6nx3cp9h0pkjvyg2rh4cnrpn7yp7inr7-cuda_demo_suite-12.2.140/demo_suite/vectorAdd 
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

So the remaining question is: what's causing the symlinks not to be created automatically? I'd prefer not to add those manually in the nix package if possible.

@errnoh
Contributor Author

errnoh commented Mar 5, 2024

Added version with the symlinks manually created, this is the first version that actually results in a working package!
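
Roughly what the manual step amounts to, as a sketch rather than the exact code in the package. The link and target pairs are assumptions inferred from the file names in the manual-build listing earlier in this thread; they are not taken from the upstream xtask script, and `make_zluda_links` is a hypothetical name.

```shell
# Hypothetical sketch: create the versioned/aliased symlinks next to the
# real libraries. Each input line is "<link name> <assumed target>".
make_zluda_links() {
  lib_dir=$1
  while read -r link target; do
    # Only create a link when its target library was actually built.
    if [ -e "$lib_dir/$target" ]; then
      ln -sf "$target" "$lib_dir/$link"
    fi
  done <<'EOF'
libcuda.so libnvcuda.so
libcuda.so.1 libnvcuda.so
libnvidia-ml.so libnvml.so
libnvidia-ml.so.1 libnvml.so
libcublas.so.10 libcublas.so
libcublas.so.11 libcublas.so
libcudnn.so.7 libcudnn.so
libcudnn.so.8 libcudnn.so
libcufft.so.10 libcufft.so
libcusparse.so.11 libcusparse.so
libnccl.so.2 libnccl.so
EOF
}
```

The links are relative, so they keep working when the lib directory is moved as a whole.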

@deftdawg

deftdawg commented Mar 8, 2024

Quick how to test zluda out using Blender:

# nvcc must be present for blender to attempt to detect CUDA
NIXPKGS_CUDA_SUPPORT=1 NIXPKGS_ALLOW_UNFREE=1 nix-shell -p cudaPackages.cuda_nvcc

## Blender
nix-shell -p blender

nix run nixpkgs#nixpkgs-review -- pr 288644
alias zluda="LD_LIBRARY_PATH=$(env | grep -oE '\-L([^ ]*zluda[^ ]*/lib)' | sed -e 's/^-L//' | head -1):${LD_LIBRARY_PATH}"
zluda blender
# Edit -> Preferences -> System -> CUDA - should see AMD card with (Zluda)

Also tried to run pytorch, but unfortunately couldn't get it working because of these (ran out of time):

  1. libtorch_cuda.so: undefined symbol: ncclCommRegister (fixable by downgrading to 2.1.2; but don't know how to do that from nix-shell)
     NIXPKGS_CUDA_SUPPORT=1 NIXPKGS_ALLOW_UNFREE=1 nix-shell -p python311Packages.pytorch-bin python311Packages.numpy
  2. import torch -> OSError: libstdc++.so.6: cannot open shared object file: No such file or directory
     python -m venv .venv
     . .venv/bin/activate
     pip install "torch==2.1.2" numpy

@ulrikstrid
Member

@deftdawg I think you can add lib.stdenv.cc.lib (or something like that, I'm on my phone) to get past that last error

@errnoh
Contributor Author

errnoh commented Mar 23, 2024

There doesn't seem to be that many unresolved issues here. Any suggestions for remaining work or should we start moving towards getting this merged?

preInstall = ''
mkdir -p $out/lib/
find target/release/ -maxdepth 1 -type l -name '*.so*' -exec \
cp --recursive --no-clobber --target-directory=$out/lib/ {} +
Contributor

(not a change request)

  • Do buildRustPackage/cargo setup hooks not install these automatically?
  • Do we need --recursive for .so?

Contributor

@jcaesar jcaesar Apr 8, 2024

  • buildRustPackage doesn't install these automatically because they're created by an upstream script that doesn't respect cargo's environment variables correctly. (The script places them into target/release instead of target/$ARCH/release.)
  • I think --recursive is necessary here to correctly copy the symlinks instead of turning them into files. But my memory is hazy.
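
The symlink point can be checked in a scratch directory. The sketch below assumes GNU cp behaviour (a symlink named on the command line is followed unless copying recursively); the file names are only illustrative.

```shell
# Scratch demo: how cp treats a symlink that is named on the command line.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/plain" "$demo/recursive"
printf 'fake library\n' > "$demo/src/libnvcuda.so"
ln -s libnvcuda.so "$demo/src/libcuda.so.1"

# Without -R, cp follows the link and writes a regular file.
cp "$demo/src/libcuda.so.1" "$demo/plain/"
# With -R (the short form of --recursive), the link itself is copied.
cp -R "$demo/src/libcuda.so.1" "$demo/recursive/"

ls -l "$demo/plain" "$demo/recursive"
```

With GNU cp, --no-dereference (-P) should preserve the links as well, so --recursive is one of several ways to get this behaviour.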

Comment on lines 41 to 44
# Comment out zluda_blaslt in Cargo.toml
sed -i '/zluda_blaslt/d' Cargo.toml
# TODO: investigate test failure (the test seems to require build time env vars that aren't set on linux?)
rm zluda_inject/tests/inject.rs
Contributor

A reference to more detailed documentation of the errors is desirable: a gist with the build logs or, better yet, a GitHub issue. Plus a very short textual description of the issue, so people can get the idea without opening the link.

E.g.

  • "seems to require build time env vars that aren't set on linux" -> "[seems to ]require(s) variables X and Y during ZZZZ"
  • "Comment out zluda_blaslt in Cargo.toml" -> "zluda_blaslt used upstream for XXXX, disabled because YYYY (link to the issue)"

Contributor

@jcaesar jcaesar Apr 8, 2024

  • For inject: disable test written for windows only: https://github.com/vosen/ZLUDA/blob/774f4bcb37c39f876caf80ae0d39420fa4bc1c8b/zluda_inject/tests/inject.rs#L55? (Though I see quite a few packages that just set doCheck = false; with no explanation or comments like tests fail.)
  • For blaslt: Sorry, already forgot, you'll have to reinvestigate this @errnoh

Contributor Author

Added the above note for the inject.rs, starting a separate conversation to address the zluda_blaslt part
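
For reference, the sed expression in the hunk under discussion deletes every line that mentions zluda_blaslt, rather than commenting it out as the code comment says. A small demonstration on a scratch file; the member list here is illustrative and not the real ZLUDA Cargo.toml.

```shell
# Scratch copy of a workspace manifest (illustrative contents only).
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
[workspace]
members = [
    "zluda",
    "zluda_blas",
    "zluda_blaslt",
    "zluda_inject",
]
EOF

# Same expression as the sed -i call in the hunk, written without -i so it
# does not depend on GNU sed: drop every line that mentions zluda_blaslt.
sed '/zluda_blaslt/d' "$manifest" > "$manifest.new"
cat "$manifest.new"
```

Note that the pattern leaves the "zluda_blas" line alone, because that line does not contain the full string zluda_blaslt.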

@SomeoneSerge
Contributor

SomeoneSerge commented Apr 8, 2024

Result of nixpkgs-review pr 288644 run on x86_64-linux

1 package failed to build:
  • zluda

EDIT: Ignore this, it was a timeout

@errnoh
Contributor Author

errnoh commented Apr 9, 2024

Resolved most of the remaining conversations in the latest commit.

Thought it's probably also a good idea to mention that the Cargo.lock was generated by just running cargo xtask --release in the ZLUDA repo root, as per the official install instructions, and then copying the resulting Cargo.lock file from there, in case someone wants to update it or make some changes while testing things.

@errnoh
Contributor Author

errnoh commented Apr 9, 2024

Main conversation still remaining is the commenting out of zluda_blaslt during the build process. That was likely initially done by myself in order to unblock the progress and get at least something to build.

When left uncommented the build results in:

   Compiling hip_common v0.0.0 (/build/ZLUDA/hip_common)
   Compiling zluda_dark_api v0.0.0 (/build/ZLUDA/zluda_dark_api)
   Compiling comgr v0.0.0 (/build/ZLUDA/comgr)
   Compiling zluda_blaslt v0.0.0 (/build/ZLUDA/zluda_blaslt)
   Compiling zluda_sparse v0.0.0 (/build/ZLUDA/zluda_sparse)
   Compiling zluda_blas v0.0.0 (/build/ZLUDA/zluda_blas)
   Compiling zluda_fft v0.0.0 (/build/ZLUDA/zluda_fft)
warning: crate `cublasLt` should have a snake case name
  |
  = help: convert the identifier to snake case: `cublas_lt`
  = note: `#[warn(non_snake_case)]` on by default

error: linking with `/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/cc` failed: exit status: 1
  |
  = note: LC_ALL="C" PATH="/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/bin:/nix/store/y027d3bvlaizbri04c1bzh28hqd6lj01-python3-3.11.7/bin:/nix/store/r7a1rz942f5yvbknm262sg369kwbv7b7-cargo-1.75.0/bin:/nix/store/qjv64w8q2higlmsa5wl9dxnizvqplkrp-rustc-wrapper-1.75.0/bin:/nix/store/hkhmxs4n1agpdpyamlh2b78pm9wch0br-cmake-3.27.9/bin:/nix/store/lnl2zcfs4gd0cj2mpc7744s63babv37g-clang-wrapper-16.0.6/bin:/nix/store/s0rk29zc6n3x6xmpb39rypac36k2gpbj-clang-16.0.6/bin:/nix/store/36wymklsa60bigdhb0p3139ws02r46lw-glibc-2.38-44-bin/bin:/nix/store/bicmg5gd50q6igk0y5mga1v0p1lk8f26-coreutils-9.4/bin:/nix/store/3avks95g4s9rij1s47ldzh7h93m43lss-binutils-wrapper-2.40/bin:/nix/store/2ab5740x0cy1d74qvbpl5s28qikmppl5-binutils-2.40/bin:/nix/store/4sf3mmnawkgjyyyzqz5nn8wm0gdvp0wa-auditable-cargo-1.75.0/bin:/nix/store/v3b4la4kh5l7dqzdyraqb1lyfrajfl5w-patchelf-0.15.0/bin:/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin:/nix/store/qs1nwzbp2ml3cxzsxihn82hl0w73snr0-gcc-13.2.0/bin:/nix/store/c53f8hagyblvx52zylsnqcc0b3nxbrcl-binutils-wrapper-2.40/bin:/nix/store/bicmg5gd50q6igk0y5mga1v0p1lk8f26-coreutils-9.4/bin:/nix/store/p6fd7piqrin2h0mqxzmvyxyr6pyivndj-findutils-4.9.0/bin:/nix/store/2d582qba31ii28nyrww9bzb00aq06d1g-diffutils-3.10/bin:/nix/store/vd92lhcxs39hbdnzj8ycak5wvj466s3l-gnused-4.9/bin:/nix/store/mn911d51n5lklwr3zy4mdhxa77wzancb-gnugrep-3.11/bin:/nix/store/h53ycc406fmbq3ff0n0rjxdzb6lk9zcn-gawk-5.2.2/bin:/nix/store/1ds6c0i7z4advdr0z210sxgvmq786h09-gnutar-1.35/bin:/nix/store/nf4fhdqgjka360nkibx1yg14gybwb018-gzip-1.13/bin:/nix/store/v3hp6kidlb9yz6j51a0wlbnpclqpi94f-bzip2-1.0.8-bin/bin:/nix/store/15xrks0frcgils8qxfkhspyg6gi9rxdh-gnumake-4.4.1/bin:/nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin:/nix/store/2pi9hb31np2vhy8r9lfih47rf9n51crz-patch-2.7.6/bin:/nix/store/h8vfiwhq6kmvrnj96w52n36c6qm4lbyl-xz-5.4.6-bin/bin:/nix/store/rn6yfzxwp12z0zqavxx1841mh0ypr7jg-file-5.45/bin" VSLANG="1033" 
"/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/cc" "-Wl,--version-script=/build/rustcahiD0a/list" "-Wl,--no-undefined-version" "-m64" "/build/rustcahiD0a/symbols.o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/cublasLt.cublasLt.aee62bd5d56fa81a-cgu.0.rcgu.o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/cublasLt.47gcc4jonrh11t15.rcgu.o" "-Wl,--as-needed" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps" "-L" "/build/ZLUDA/target/release/deps" "-L" "/opt/rocm/lib/" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/build/libsqlite3-sys-6f6f9cf7ba865a7f/out" "-L" "/opt/rocm/lib/" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/build/lz4-sys-cbf01ed93ef0cfde/out" "-L" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libzluda_dark_api-309235fa31e9aab1.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libthread_id-581b343fe573f8b0.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblz4_sys-ca4d797647e2a031.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibc-8fdb7fcb9a4819e0.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbitflags-5430a0b02c754ead.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbit_vec-3e4c5620a30ac5e5.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhip_common-f1ec58a835dcc419.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibloading-a276f34f50629942.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libmemchr-5c4a7f1ebb9fd8aa.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/librustc_hash-dc1a96583e6f3d89.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcapnp-3dfebef2d6d75177.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libgoblin-92470a2530cf50d1.rlib" 
"/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libplain-094ec0584b4e1528.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblog-ccd73e91eb01d483.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libscroll-bd3a3726a47e373d.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/librusqlite-10164dcf0927e589.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbitflags-9dda00baa75ae047.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libsmallvec-b0b884bc679f621a.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libserde_json-45d32efa3cf75a1d.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libitoa-118cb57f35bbd8fc.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libryu-462642de0f633fe4.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libserde-adf6c4e5a8285439.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libfallible_streaming_iterator-4cc98b876b55a4ce.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libfallible_iterator-68abc77b7f56c66f.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhashlink-396894a1c08b212c.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhashbrown-4d90f8b654fad04f.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libahash-2c0cabd2d851f5a7.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libonce_cell-76d12902c72ad724.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcfg_if-3c02971f6388fdf8.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libzerocopy-ab20fc1c86e5a482.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liballocator_api2-4d43a7f0bdc365e3.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibsqlite3_sys-2b4f0e9c94c331a6.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libitertools-e6ec8afd82417cdf.rlib" 
"/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libeither-d65617cfa6ca3fd6.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhip_runtime_sys-3f6b8fa5c8000e15.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcuda_types-f36bb99fa8b90530.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhipblaslt_sys-a74f769aef006537.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-bf2160fadd66da13.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-56ffc7344a3fa9ec.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-45b585256bbdad6f.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libmemchr-4ff2a73349a27351.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-13b90ddabcfcdff7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-d24d3e21c4b6d183.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-09fdf503f250d6a7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4c6d792e86d76f74.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-67ad1ad36ffec836.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-ab6e54dadf25bfd1.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-0f5ce74ab0128e5b.rlib" 
"/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-c41e85cdd0e0dfff.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-b9b23c28f438e60f.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-6771b1ae9cbfc2c9.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-84992ee57e15ddc0.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-bc9174b398261284.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-ce683bfa6346b7ff.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-baa5449bbf3e5ae7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-e8cdafc9faa29ecd.rlib" "-Wl,-Bdynamic" "-ldl" "-lamdhip64" "-lhipblaslt" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcublasLt.so" "-Wl,--gc-sections" "-shared" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
  = note: /nix/store/2ab5740x0cy1d74qvbpl5s28qikmppl5-binutils-2.40/bin/ld: cannot find -lhipblaslt: No such file or directory
          collect2: error: ld returned 1 exit status

Is this library likely generated by the hipblaslt-sys directory? It seems like something we should be able to fix in the build. Any ideas?
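
To narrow this kind of failure down, it can help to pull the `-l` flags and the `cannot find -l...` diagnostics out of the log mechanically. The following is a minimal sketch, not part of the package: the function names `requested_libraries` and `missing_libraries` are made up for illustration, and it assumes the log has the same shape as the rustc/ld output above (quoted `"-lname"` arguments and GNU ld's `cannot find -lname` message).

```python
import re

# GNU ld reports an unresolved library as "cannot find -l<name>".
MISSING_LIB = re.compile(r"cannot find -l([A-Za-z0-9_+.-]+)")
# rustc prints the failing link command with each argument quoted, e.g. "-lamdhip64".
REQUESTED_LIB = re.compile(r'"-l([A-Za-z0-9_+.-]+)"')


def _unique(names):
    """Drop duplicates while keeping first-seen order."""
    seen = []
    for name in names:
        if name not in seen:
            seen.append(name)
    return seen


def requested_libraries(log: str) -> list[str]:
    """Libraries the link command asked for via quoted -l flags."""
    return _unique(REQUESTED_LIB.findall(log))


def missing_libraries(log: str) -> list[str]:
    """Libraries the linker reported it could not find."""
    return _unique(MISSING_LIB.findall(log))


if __name__ == "__main__":
    sample = (
        '"-Wl,-Bdynamic" "-ldl" "-lamdhip64" "-lhipblaslt" "-lgcc_s"\n'
        "  = note: ld: cannot find -lhipblaslt: No such file or directory\n"
    )
    print("requested:", requested_libraries(sample))
    print("missing:  ", missing_libraries(sample))
```

Anything listed as missing has to be provided by one of the build inputs. In the log above, `amdhip64` is found while `hipblaslt` is not.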

@errnoh
Contributor Author

errnoh commented Apr 10, 2024

Ah, I see request for hipblaslt added in #197885. Should we aim to release this without zluda_blaslt for now and patch it in once it's available?

@SomeoneSerge
Contributor

> Ah, I see request for hipblaslt added in #197885. Should we aim to release this without zluda_blaslt for now and patch it in once it's available?

If some non-trivial part of zluda is usable without hipblaslt, we should release it. We can update the comment with a link to the issue(s).

@errnoh
Contributor Author

errnoh commented Apr 10, 2024

> > Ah, I see request for hipblaslt added in #197885. Should we aim to release this without zluda_blaslt for now and patch it in once it's available?

> If some non-trivial part of zluda is usable without hipblaslt, we should release it. We can update the comment with a link to the issue(s).

I modified the comment to point at the issue, or more specifically at your comment there. It's a slightly awkward one to follow, since it's a long-running issue that doesn't get closed when the package is implemented, but it's the closest there is, so it should be OK.

Based on the earlier message by @deftdawg, this already seems to provide enough functionality to work in Blender, for example. Sounds like value to me :) The main risk I can see is that some users might be confused or make wrong assumptions about zluda because blaslt is missing, but as you said, it's likely better to release since it already provides value.

@SomeoneSerge
Contributor

Can't afford to test locally rn because of #301937 and I suspect Ofborg would fail as well. @errnoh can you confirm zluda is in a buildable state rn?

Also please update the commit messages ("working nixpkgs build" etc) as per the manual

@errnoh
Contributor Author

errnoh commented Apr 11, 2024

It builds fine on the branch I'm developing it on, but before answering I tried with the current nixpkgs head, and the build that used to take 1-2 minutes is now taking hours with 24 cores at 100%, so it's hard to say definitively that it works there. Is this related to #301937, and wth is going on?

edit: looking at the log it's trying to compile composable_kernel-6.0.2
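
Two things stand out in the log below: `nix-build` plans to build `composable_kernel` and `miopen` from source instead of substituting them, and `GPU_TARGETS=` is empty, so CK reports building for every supported target. That may explain the long build, but I haven't confirmed it. Here is a rough sketch for extracting both facts from a saved log. The helper names `planned_derivations` and `ck_targets` are hypothetical, and it only handles the plural "these N derivations will be built:" header shown here.

```python
import re

HEADER = re.compile(r"these \d+ derivations? will be built:")
CK_TARGETS = re.compile(r"Building CK for the following targets:\s*(\S+)")


def planned_derivations(log: str) -> list[str]:
    """Return the .drv paths listed right after the nix-build header line."""
    drvs = []
    in_list = False
    for raw in log.splitlines():
        line = raw.strip()
        if HEADER.match(line):
            in_list = True
            continue
        if in_list:
            if line.startswith("/nix/store/") and line.endswith(".drv"):
                drvs.append(line)
            else:
                # First non-.drv line ends the list.
                in_list = False
    return drvs


def ck_targets(log: str) -> list[str]:
    """Return the GPU targets composable_kernel says it is building for."""
    match = CK_TARGETS.search(log)
    return match.group(1).split(";") if match else []


if __name__ == "__main__":
    import sys

    text = sys.stdin.read()
    for drv in planned_derivations(text):
        print("will build:", drv)
    print("CK targets:", ", ".join(ck_targets(text)) or "(not found)")
```

It reads the log from standard input, so a saved build log can be piped straight into the script.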

[errnoh@desk:~/dev/nixpkgs]$ nix-build -A zluda
these 3 derivations will be built:
  /nix/store/ycw27kbfiymaxn661ss60xzfw6p59m2s-composable_kernel-6.0.2.drv
  /nix/store/wxxvxzz0rhwh1rzf4xrfpn3ghh485pgx-miopen-6.0.2.drv
  /nix/store/jhxswf95dy2k6l7npj6fq95q6sy9yaz2-zluda-3.drv
building '/nix/store/ycw27kbfiymaxn661ss60xzfw6p59m2s-composable_kernel-6.0.2.drv'...
Running phase: unpackPhase
unpacking source archive /nix/store/awil5vbp1dhj6if3qf4iac8yw7jxkv1m-source
source root is source
Running phase: patchPhase
substituteStream(): WARNING: '--replace' is deprecated, use --replace-{fail,warn,quiet}. (file 'CMakeLists.txt')
Running phase: updateAutotoolsGnuConfigScriptsPhase
Running phase: configurePhase
fixing cmake files...
cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/doc/composable_kernel -DCMAKE_INSTALL_INFODIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/include -DCMAKE_INSTALL_SBINDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_STRIP=/nix/store/m4d89r3829qzkxr98nzkfdfcs0z4h3cw-rocm-llvm-binutils-6.0.2/bin/strip -DCMAKE_RANLIB=/nix/store/yb0flwbqcwh66b80lyx0ifmxybqzklaj-rocm-llvm-clang-wrapper-6.0.2/bin/ranlib -DCMAKE_AR=/nix/store/yb0flwbqcwh66b80lyx0ifmxybqzklaj-rocm-llvm-clang-wrapper-6.0.2/bin/ar -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_INSTALL_PREFIX=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2 -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc 
-- The C compiler identification is Clang 17.0.0
-- The CXX compiler identification is Clang 17.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /nix/store/z5lnja1kc4l9cwn99dm1jgghvgsxj0y4-clr-6.0.2/bin/hipcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /nix/store/z5lnja1kc4l9cwn99dm1jgghvgsxj0y4-clr-6.0.2/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /nix/store/y56vmnczakd9p0dsjl6jgnqrkqv04yxx-git-2.44.0/bin/git (found version "2.44.0") 
fatal: not a git repository (or any of the parent directories): .git
GPU_TARGETS= 
checking which targets are supported
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Success
Supported GPU_TARGETS= gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102
Building CK for the following targets: gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
hip_version_flat=600000000
Adding the fno-offload-uniform-block compiler flag
CMAKE_CXX_COMPILER_ID: Clang
OpenMP_CXX_LIB_NAMES: libomp;libgomp;libiomp5
OpenMP_gomp_LIBRARY: 
OpenMP_pthread_LIBRARY: 
OpenMP_CXX_FLAGS: -fopenmp=libomp -Wno-unused-command-line-argument
-- Build with HIP 6.0.0
-- Clang tidy found: 17.0.0git
-- Clang tidy checks: *,-abseil-*,-android-cloexec-fopen,-cert-msc30-c,-bugprone-exception-escape,-bugprone-macro-parentheses,-cert-env33-c,-cert-msc32-c,-cert-msc50-cpp,-cert-msc51-cpp,-cert-dcl37-c,-cert-dcl51-cpp,-clang-analyzer-alpha.core.CastToStruct,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-fuchsia-*,-google-explicit-constructor,-google-readability-braces-around-statements,-google-readability-todo,-google-runtime-int,-google-runtime-references,-hicpp-vararg,-hicpp-braces-around-statements,-hicpp-explicit-conversions,-hicpp-named-parameter,-hicpp-no-array-decay,-hicpp-avoid-c-arrays,-hicpp-signed-bitwise,-hicpp-special-member-functions,-hicpp-uppercase-literal-suffix,-hicpp-use-auto,-hicpp-use-equals-default,-hicpp-use-override,-llvm-header-guard,-llvm-include-order,-llvmlibc-restrict-system-libc-headers,-llvmlibc-callee-namespace,-llvmlibc-implementation-in-namespace,-llvm-else-after-return,-llvm-qualified-auto,-misc-misplaced-const,-misc-non-private-member-variables-in-classes,-misc-no-recursion,-modernize-avoid-bind,-modernize-avoid-c-arrays,-modernize-pass-by-value,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-equals-default,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-unnecessary-value-param,-readability-braces-around-statements,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-isolate-declaration,-readability-magic-numbers,-readability-named-parameter,-readability-uppercase-literal-suffix,-readability-convert-member-functions-to-static,-readability-qualified-auto,-readability-redundant-string-init,-bugprone-narrowing-conversions,-cppcoreguidelines-narrowing-conversions,-altera-struct-pack-align,-cppcoreguidelines-prefer-member-initializer
CMAKE_CXX_FLAGS: 
instance should be built for all types!
adding instance device_avg_pool3d_bwd_instance
instance should be built for all types!
adding instance device_batched_gemm_instance
instance should be built for all types!
adding instance device_batched_gemm_add_relu_gemm_add_instance
instance should be built for all types!
adding instance device_batched_gemm_bias_permute_instance
instance should be built for all types!
adding instance device_batched_gemm_gemm_instance
instance should be built for all types!
Found only dl instances, but DL_KERNELS is not set. Skipping.
instance should be built for all types!
adding instance device_batched_gemm_reduce_instance
instance should be built for all types!
adding instance device_batched_gemm_softmax_gemm_instance
instance should be built for all types!
adding instance device_batched_gemm_softmax_gemm_permute_instance
instance should be built for all types!
adding instance device_batchnorm_instance
instance should be built for all types!
adding instance device_column_to_image_instance
instance should be built for all types!
adding instance device_contraction_bilinear_instance
instance should be built for all types!
adding instance device_contraction_scale_instance
instance should be built for all types!
adding instance device_conv1d_bwd_data_instance
instance should be built for all types!
adding instance device_conv2d_bwd_data_instance
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f32_instance.cpp 
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f16_instance.cpp 
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_int8_instance.cpp 
instance should be built for all types!
adding instance device_conv2d_fwd_instance
instance should be built for all types!
adding instance device_conv2d_fwd_bias_relu_instance
instance should be built for all types!
adding instance device_conv2d_fwd_bias_relu_add_instance
instance should be built for all types!
adding instance device_conv3d_bwd_data_instance
instance should be built for all types!
adding instance device_elementwise_instance
instance should be built for all types!
adding instance device_elementwise_normalization_instance
instance should be built for all types!
adding instance device_gemm_instance
removing dl instance device_gemm_dl_f32_f32_f32_mk_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_f32_f32_f32_mk_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_f32_f32_f32_km_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_f32_f32_f32_km_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_irregular_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_instance.cpp 
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_irregular_instance.cpp 
instance should be built for all types!
adding instance device_gemm_add_add_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_add_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_add_multiply_instance
instance should be built for all types!
adding instance device_gemm_add_relu_add_layernorm_instance
instance should be built for all types!
adding instance device_gemm_bias_add_reduce_instance
instance should be built for all types!
adding instance device_gemm_bilinear_instance
instance should be built for all types!
adding instance device_gemm_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_multiply_add_instance
instance should be built for all types!
adding instance device_gemm_reduce_instance
instance should be built for all types!
adding instance device_gemm_splitk_instance
instance should be built for all types!
adding instance device_gemm_streamk_instance
instance should be built for all types!
adding instance device_grouped_conv1d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv1d_fwd_instance
instance should be built for all types!
adding instance device_grouped_conv2d_bwd_data_instance
instance should be built for all types!
adding instance device_grouped_conv2d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv2d_fwd_instance
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp 
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp 
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp 
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp 
instance should be built for all types!
adding instance device_grouped_conv3d_bwd_data_instance
instance should be built for all types!
adding instance device_grouped_conv3d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv3d_fwd_instance
instance should be built for all types!
adding instance device_grouped_gemm_instance
instance should be built for all types!
adding instance device_grouped_gemm_bias_instance
instance should be built for all types!
adding instance device_grouped_gemm_fastgelu_instance
instance should be built for all types!
adding instance device_grouped_gemm_fixed_nk_instance
instance should be built for all types!
adding instance device_image_to_column_instance
instance should be built for all types!
adding instance device_max_pool_bwd_instance
instance should be built for all types!
adding instance device_normalization_instance
instance should be built for all types!
adding instance device_pool3d_fwd_instance
instance should be built for all types!
adding instance device_quantization_instance
removing dl instance conv2d_fwd/device_conv2d_dl_perlayer_quantization_int8_instance.cpp 
removing dl instance conv2d_fwd/device_conv2d_dl_perchannel_quantization_int8_instance.cpp 
removing dl instance conv2d_fwd/device_conv2d_dl_bias_perlayer_quantization_int8_instance.cpp 
removing dl instance conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp 
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_kn_mn_instance.cpp 
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_nk_mn_instance.cpp 
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_kn_mn_instance.cpp 
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_nk_mn_instance.cpp 
instance should be built for all types!
adding instance device_reduce_instance
instance should be built for all types!
adding instance device_softmax_instance
-- Configuring done (13.2s)
-- Generating done (0.3s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_TESTING
    CMAKE_EXPORT_NO_PACKAGE_REGISTRY
    CMAKE_POLICY_DEFAULT_CMP0025


-- Build files have been written to: /build/source/build
cmake: enabled parallel building
cmake: enabled parallel installing
Running phase: buildPhase
build flags: -j24 SHELL=/nix/store/a1s263pmsci9zykm5xcdf7x9rv26w6d5-bash-5.2p26/bin/bash
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce1.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gkn_gmn_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_bias_permute/CMakeFiles/device_batched_gemm_bias_permute_instance.dir/device_batched_gemm_bias_permute_m2_n3_k1_xdl_c_shuffle_f16_f16_f16_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gkn_gmn_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm/CMakeFiles/device_batched_gemm_softmax_gemm_instance.dir/device_batched_gemm_softmax_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_kknn_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_1d_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_kkn_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu/CMakeFiles/device_conv2d_fwd_bias_relu_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_nhwc_kyxc_nhwk_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu_add/CMakeFiles/device_conv2d_fwd_bias_relu_add_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_add_nhwc_kyxc_nhwk_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/elementwise_normalization/CMakeFiles/device_elementwise_normalization_instance.dir/device_elementwise_normalization_f16_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/elementwise/CMakeFiles/device_elementwise_instance.dir/device_normalize_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[  2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_kn_mn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_kn_mn_instance.cpp.o
[  4%] Built target device_elementwise_instance
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_nk_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_bf16_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce2.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce3.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_kn_mn_instance.cpp.o
[  4%] Built target device_elementwise_normalization_instance
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_nk_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f32_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_2d_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_knn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce1.cpp.o
[  4%] Built target device_avg_pool3d_bwd_instance
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mkn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gnk_gmn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gkn_gmn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_kn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_knnn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f32_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce2.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gnk_gmn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_3d_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce3.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mnn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_nk_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[  4%] Built target device_batched_gemm_bias_permute_instance
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce4.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f32_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_nk_mn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_kn_mn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_nk_mn_mn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mknn_instance.cpp.o
[  4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mnnn_instance.cpp.o
[  6%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_bf16_instance.cpp.o
[  6%] Built target device_column_to_image_instance
[  6%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_kknn_instance.cpp.o
[  6%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gnk_gmn_instance.cpp.o
[  8%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gkn_gmn_instance.cpp.o
[  8%] Built target device_batched_gemm_gemm_instance
[ 10%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_knnn_instance.cpp.o
[ 12%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce1.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f32_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_kkn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce2.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gkn_gmn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce3.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce1.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f64_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f16_instance.cpp.o
[ 14%] Built target device_batched_gemm_add_relu_gemm_add_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f32_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_bf16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mknn_instance.cpp.o
[ 14%] Built target device_conv2d_fwd_bias_relu_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 14%] Built target device_conv2d_fwd_bias_relu_add_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gnk_gmn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_knn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mkn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce2.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gnk_gmn_instance.cpp.o
[ 16%] Built target device_batched_gemm_softmax_gemm_instance
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce3.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_bf16_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mnn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 16%] Built target device_gemm_add_fastgelu_instance
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_kn_mn_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce4.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_kn_mn_mn_mn_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mnnn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_nk_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f64_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gkn_gmn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o
[ 20%] Built target device_softmax_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_bf16_instance.cpp.o
[ 22%] Built target device_contraction_scale_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_int8_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_kn_mn_instance.cpp.o
[ 22%] Built target device_batched_gemm_reduce_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 22%] Built target device_contraction_bilinear_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gnk_gmn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o
[ 22%] Built target device_gemm_add_add_fastgelu_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f16_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f32_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_bf16_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_km_kn_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_km_nk_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gkn_gmn_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f64_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 24%] Built target device_batchnorm_instance
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_km_nk_mn_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 24%] Built target device_conv2d_fwd_instance
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_mk_kn_mn_mn_instance.cpp.o
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 28%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 28%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gnk_gmn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_mk_nk_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_km_kn_mn_mn_instance.cpp.o
[ 30%] Built target device_conv2d_bwd_data_instance
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_km_nk_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_mk_kn_mn_mn_instance.cpp.o
[ 30%] Built target device_gemm_add_multiply_instance
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_mk_nk_mn_mn_instance.cpp.o
[ 30%] Built target device_conv3d_bwd_data_instance
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f8_f32_f32_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f8_f32_f32_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gkn_gmn_instance.cpp.o
[ 32%] Built target device_conv1d_bwd_data_instance
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_kn_mn_irregular_instance.cpp.o
[ 32%] Built target device_gemm_bias_add_reduce_instance
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_kn_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gnk_gmn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_nk_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_mk_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/device_gemm_xdl_streamk_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_mk_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gkn_gmn_instance.cpp.o
[ 34%] Built target device_gemm_fastgelu_instance
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_km_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_km_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gnk_gmn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_f16_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_f32_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gkn_gmn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gnk_gmn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 36%] Built target device_gemm_multiply_add_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_mk_kn_mn_instance.cpp.o
[ 36%] Built target device_gemm_bilinear_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 36%] Built target device_gemm_add_relu_add_layernorm_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 38%] Built target device_gemm_streamk_instance
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 38%] Built target device_gemm_reduce_instance
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_2_stage_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_add_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_km_kn_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v1_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 40%] Built target device_batched_gemm_softmax_gemm_permute_instance
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 42%] Built target device_batched_gemm_instance
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp.o
[ 44%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp.o
[ 44%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_km_nk_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_mk_kn_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_mk_nk_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_opt_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_interwave_pipeline_v1_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 48%] Built target device_grouped_conv1d_bwd_weight_instance
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_km_kn_mn_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_km_nk_mn_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v1_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v2_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_interwave_pipeline_v1_instance.cpp.o
[ 48%] Built target device_grouped_conv1d_fwd_instance
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_add_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v1_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_opt_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_gemm/CMakeFiles/device_grouped_gemm_instance.dir/device_grouped_gemm_xdl_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 50%] Built target device_gemm_splitk_instance
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_interwave_pipeline_v1_instance.cpp.o
^Cerror: interrupted by the user

That's completely different from the build from my dev branch.

@Atemu
Member

Atemu commented Apr 11, 2024

@errnoh that's likely just some transitive dep which hydra simply hasn't built yet. Master is not guaranteed to be in the binary cache; it's the development trunk basically. Use a channel such as nixpkgs-unstable or nixos-unstable as your base; that's good enough usually.

Member

@Atemu Atemu left a comment

I concur with @SomeoneSerge, this is working for one real-world use-case already; let's ship it.

Some minor nits.

Also please squash the commits into logical units and title them according to the contributor's manual.

pkgs/by-name/zl/zluda/package.nix (outdated)
pkgs/by-name/zl/zluda/package.nix
@SomeoneSerge
Contributor

> that's likely just some transitive dep which hydra simply hasn't built yet

I linked #301937, but it's actually a separate issue: Hydra discards composable_kernel's outputs because of the size limit.

@ulrikstrid
Member

I can run a nixpkgs-review tomorrow

@ulrikstrid
Member

Result of nixpkgs-review pr 288644 run on x86_64-linux

1 package built:
  • zluda

Member

@ulrikstrid ulrikstrid left a comment

From my point of view this looks good. I'm not sure how I can test it beyond building it, but others seem to have done that.

So after some squashing this should be good to go.

@errnoh
Contributor Author

errnoh commented Apr 13, 2024

Newest version with minor changes based on previous round of comments. Now squashed into a single commit.

Contributor

@SomeoneSerge SomeoneSerge left a comment

with import (builtins.getFlake github:errnoh/nixpkgs/add-zluda) { config.allowUnfree = true; };

let
  saxpy' = runCommand "saxpy-zluda" { nativeBuildInputs = [ makeWrapper ]; } ''
    mkdir "$out/bin" -p
    args=(
      makeWrapper
      ${lib.getExe cudaPackages.saxpy}
      "$out/bin/$name"
      --prefix LD_LIBRARY_PATH : "${lib.getLib zluda}/lib"
    )
    "''${args[@]}"
  '';
in

(singularity-tools.buildImage rec {
  name = "zluda";
  contents = [
    cudaPackages.saxpy
    saxpy'
  ];
  diskSize = 1024 * 64;
  memSize = diskSize;
}) // {
  passthru = { inherit saxpy'; };
}
❯ nom build -f zluda.nix
❯ rsync -LP zluda.sif lumi.csc.fi:proj-nixpkgs/
❯ ssh lumi.csc.fi srun --account=project_$lumi_project --partition=small-g --ntasks=1 --gpus-per-node=1 --time=00:05:00 singularity exec proj-nixpkgs/zluda.sif saxpy-zluda
srun: job 6896251 queued and waiting for resources
srun: job 6896251 has been allocated resources
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
Start
Runtime version: 12020
Driver version: 12020
Host memory initialized, copying to the device
Scheduled a cudaMemcpy, calling the kernel
Scheduled a kernel call
Max error: 0.000000
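
For readers unfamiliar with the wrapper used above: the `--prefix LD_LIBRARY_PATH : "${lib.getLib zluda}/lib"` argument asks makeWrapper to put ZLUDA's library directory at the front of `LD_LIBRARY_PATH` before the real `saxpy` binary starts, presumably so that the dynamic loader picks up ZLUDA's replacement libraries first. The sketch below only approximates that behaviour in plain shell. `prefix_var` is a hypothetical helper written for illustration and is not part of nixpkgs; the generated wrapper may handle more cases than this.

```shell
#!/bin/sh
# prefix_var VAR DIR
# Hypothetical helper (not from nixpkgs): prepend DIR to the colon-separated
# variable named VAR and export it. This is a rough approximation of what a
# wrapper created with `--prefix VAR : DIR` does before running the program.
prefix_var() {
  var=$1
  dir=$2
  # Read the current value of the variable whose name is stored in $var.
  eval "current=\${$var-}"
  if [ -n "$current" ]; then
    # Existing entries are kept, but DIR is searched first.
    eval "$var=\$dir:\$current"
  else
    # Variable was unset or empty: avoid leaving a trailing colon.
    eval "$var=\$dir"
  fi
  export "$var"
}
```

Calling `prefix_var LD_LIBRARY_PATH /some/zluda/lib` (a placeholder path) and then running the unwrapped program would be the manual counterpart of running `saxpy-zluda`.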

@SomeoneSerge SomeoneSerge merged commit 1b69196 into NixOS:master Apr 13, 2024
20 checks passed
Projects
Status: Done
9 participants