ollama 0.1.31 -> 0.1.33 #309330

Closed
wants to merge 1 commit

Conversation

nsbuitrago

Description of changes

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@drupol (Contributor) left a comment


The commit log message is wrong; that is why the CI is not triggering the tests.

@abysssol (Contributor) commented May 5, 2024

I tested this (by running nix-build -A ollama.passthru.tests in the nixpkgs root), and 0.1.33 has the same problem as 0.1.32 (see #304823) in that it breaks the rocm compilation. It seems that everything builds right, until the end when a check in the build script finds that the compiled artifact doesn't have any dependencies on the rocm libraries. I did test what happens if I disable the check and run the compiled ollama binary anyway, but it just crashes almost instantly after running it.

So, I'm against merging this into the current nixpkgs-unstable: this is a breaking change that removes support for rocm, and breaking changes are restricted since the next stable release is coming soon (#303286).

I may be alright with merging this into the next unstable after nixos-24.05 has been released, but until then I think it would be a bad idea to merge a breaking change.

@abysssol (Contributor) left a comment


Also, since it's known that rocm is broken, meta.broken should be updated accordingly, unless/until the rocm build can be fixed.

   meta = {
     description = "Get up and running with large language models locally";
     homepage = "https://github.com/ollama/ollama";
     changelog = "https://github.com/ollama/ollama/releases/tag/v${version}";
     license = licenses.mit;
     platforms = platforms.unix;
+    broken = enableRocm;
     mainProgram = "ollama";
     maintainers = with maintainers; [ abysssol dit7ya elohmeier ];
   };
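
For context, with broken = enableRocm the rocm variant would then fail at evaluation time unless broken packages are explicitly allowed. A sketch of how it could still be force-built for testing in a nixpkgs checkout:

$ nix-build -E 'with import ./. { config.allowBroken = true; }; ollama.override { acceleration = "rocm"; }'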

@drupol mentioned this pull request May 6, 2024
@onny (Contributor) commented May 10, 2024

ollama 0.1.34 released

@onny (Contributor) commented May 10, 2024

If possible, please move the package derivation to pkgs/by-name.
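
For reference, the by-name convention puts the derivation at a fixed path derived from the first two letters of the package name, where it is picked up automatically without an entry in all-packages.nix; a sketch of the layout (not what this PR currently does):

pkgs/by-name/ol/ollama/package.nix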

@redyf (Member) commented May 11, 2024

ollama 0.1.36 has been released
https://github.com/ollama/ollama/releases/tag/v0.1.36

@ejiektpobehuk

ollama 0.1.37 has been released

@nsbuitrago (Author) commented May 13, 2024

I tested this (by running nix-build -A ollama.passthru.tests in the nixpkgs root), and 0.1.33 has the same problem as 0.1.32 (see #304823) in that it breaks the rocm compilation. It seems that everything builds right, until the end when a check in the build script finds that the compiled artifact doesn't have any dependencies on the rocm libraries. I did test what happens if I disable the check and run the compiled ollama binary anyway, but it just crashes almost instantly after running it.

As for the missing dependencies, I was able to get past this by adding the rocm-core package to rocmPath. There is a collision between rocmPackages.clr and the clr from rocm-core, so I removed clr for now to test.

  rocmPath = buildEnv {
    name = "rocm-path";
    paths = [
      rocmPackages.hipblas
      rocmPackages.rocblas
      rocmPackages.rocsolver
      rocmPackages.rocsparse
      rocmPackages.rocm-device-libs
      rocmClang
      rocmPackages.rocm-core
    ];
  };
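
As an aside, if clr turns out to be needed in this environment after all, buildEnv can be told to tolerate the collision instead of dropping a package; an untested sketch (ignoreCollisions is a standard buildEnv argument):

  rocmPath = buildEnv {
    name = "rocm-path";
    # keep both colliding clr variants instead of failing the build
    ignoreCollisions = true;
    paths = [
      rocmPackages.clr
      rocmPackages.rocm-core
      # ... the rest of the list above
    ];
  };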

I also had to set the CLBlast_DIR path and append some paths to CMAKE_PREFIX_PATH in the preBuild step for this to work for me.

  preBuild = ''
    # disable uses of `git`, since nix removes the git directory
    export OLLAMA_SKIP_PATCHING=true
    # for rocm: set CLBlast_DIR and prepend the rocm paths to CMAKE_PREFIX_PATH
    ${lib.optionalString enableRocm ''
      export CLBlast_DIR="${clblast}/lib/cmake/CLBlast"
      export CMAKE_PREFIX_PATH="${rocmPath}:${rocmPackages.rocm-comgr}/lib/cmake:$CMAKE_PREFIX_PATH"
    ''}
    # build llama.cpp libraries for ollama
    go generate ${lib.optionalString enableRocm ''-tags rocm''} ./...
  '';

However, this brings up other errors, specifically when building for gfx1010. It seems like some libraries are still not built correctly. I have played around with the build script but have had no luck so far.

@volfyd (Contributor) commented May 13, 2024

I haven't fully tested it, but I was able to get ollama to see my AMD card. It may just work?

Here's what I changed from your original PR from last week:

diff --git a/pkgs/tools/misc/ollama/default.nix b/pkgs/tools/misc/ollama/default.nix
index 1934ac80a..b0dc7a9da 100644
--- a/pkgs/tools/misc/ollama/default.nix
+++ b/pkgs/tools/misc/ollama/default.nix
@@ -24,7 +24,7 @@
 
 , config
   # one of `[ null false "rocm" "cuda" ]`
-, acceleration ? null
+, acceleration ? "rocm"
 }:
 
 let
@@ -104,6 +104,13 @@ let
   };
 
   runtimeLibs = lib.optionals enableRocm [
+    rocmPackages.clr
+    rocmPackages.hipblas
+    rocmPackages.rocblas
+    rocmPackages.rocsolver
+    rocmPackages.rocsparse
+    rocmPackages.rocm-device-libs
+    rocmClang
     rocmPackages.rocm-smi
   ] ++ lib.optionals enableCuda [
     linuxPackages.nvidia_x11
@@ -166,6 +173,7 @@ goBuild ((lib.optionalAttrs enableRocm {
   postPatch = ''
     # replace inaccurate version number with actual release version
     substituteInPlace version/version.go --replace-fail 0.0.0 '${version}'
+    substituteInPlace llm/generate/gen_linux.sh --replace-fail 'exit 1' ""
   '';
   preBuild = ''
     # disable uses of `git`, since nix removes the git directory

Obviously changing the default acceleration to rocm was just so I could run

nix build .#ollama

I don't know if there is a way to enable rocm when running that build command.
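
(With classic nix-build in a nixpkgs checkout, one way to do this without editing the file is the override mechanism; a sketch:)

$ nix-build -E 'with import ./. { }; ollama.override { acceleration = "rocm"; }'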

Also, the substituteInPlace should really be a patch (or even reported as an upstream bug?).

Basically, the problem is that the upstream change in ollama/ollama#3218 made it so that we need the rocm libraries as runtime dependencies instead of build-time ones, since they aren't fully pulled in until runtime. That's my theory, anyway; I haven't completely tested this.

Here it is working (at least to the point of seeing my AMD card):

$ HSA_OVERRIDE_GFX_VERSION="11.0.0" result/bin/ollama serve 
time=2024-05-13T13:58:50.843-04:00 level=INFO source=images.go:828 msg="total blobs: 0"
time=2024-05-13T13:58:50.843-04:00 level=INFO source=images.go:835 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyModelHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowModelHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2024-05-13T13:58:50.843-04:00 level=INFO source=routes.go:1071 msg="Listening on 127.0.0.1:11434 (version 0.1.33)"
time=2024-05-13T13:58:50.843-04:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1030226676/runners
time=2024-05-13T13:58:50.987-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm]"
time=2024-05-13T13:58:50.987-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-13T13:58:50.988-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-13T13:58:50.988-04:00 level=WARN source=amd_linux.go:49 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-13T13:58:50.988-04:00 level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="20464.0 MiB"
time=2024-05-13T13:58:50.988-04:00 level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="20464.0 MiB"

@abysssol (Contributor)

@volfyd I tested these changes with 0.1.37, but it doesn't appear to work for me; I'm not sure if I'm doing something wrong. My changes are in this branch. I did test with rocmClang included as well, like in your diff, but that didn't work either.

Ollama does run, but doesn't detect the libraries. Below is the relevant section of ollama's debug logs. I also confirmed that only the CPU is used when actually running a model.

time=2024-05-14T06:38:09.748-04:00 level=INFO source=routes.go:1052 msg="Listening on 127.0.0.1:11434 (version 0.1.37)"
time=2024-05-14T06:38:09.753-04:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama902218830/runners
time=2024-05-14T06:38:09.869-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm]"
time=2024-05-14T06:38:09.869-04:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-14T06:38:09.870-04:00 level=WARN source=amd_linux.go:346 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-05-14T06:38:09.870-04:00 level=WARN source=amd_linux.go:278 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
time=2024-05-14T06:38:09.870-04:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.0 GiB" available="2.6 GiB"

@volfyd (Contributor) commented May 14, 2024

@abysssol Maybe you need HSA_OVERRIDE_GFX_VERSION?

I tried 0.1.37 and it is working for me.

diff --git a/pkgs/tools/misc/ollama/default.nix b/pkgs/tools/misc/ollama/default.nix
index fdda6ba3f1e8..73f9b08c2ca0 100644
--- a/pkgs/tools/misc/ollama/default.nix
+++ b/pkgs/tools/misc/ollama/default.nix
@@ -24,28 +24,28 @@
 
 , config
   # one of `[ null false "rocm" "cuda" ]`
-, acceleration ? null
+, acceleration ? "rocm"
 }:
 
 let
   pname = "ollama";
   # don't forget to invalidate all hashes each update
-  version = "0.1.31";
+  version = "0.1.37";
 
   src = fetchFromGitHub {
     owner = "jmorganca";
     repo = "ollama";
     rev = "v${version}";
-    hash = "sha256-Ip1zrhgGpeYo2zsN206/x+tcG/bmPJAq4zGatqsucaw=";
+    hash = "sha256-ZorOrIOWjXltxqOXNkFJ9190EXTAn+YcjZZhDBJsLqc=";
     fetchSubmodules = true;
   };
-  vendorHash = "sha256-Lj7CBvS51RqF63c01cOCgY7BCQeCKGu794qzb/S80C0=";
+  vendorHash = "sha256-zOQGhNcGNlQppTqZdPfx+y4fUrxH0NOUl38FN8J6ffE=";
   # ollama's patches of llama.cpp's example server
   # `ollama/llm/generate/gen_common.sh` -> "apply temporary patches until fix is upstream"
   # each update, these patches should be synchronized with the contents of `ollama/llm/patches/`
   llamacppPatches = [
     (preparePatch "03-load_exception.diff" "sha256-1DfNahFYYxqlx4E4pwMKQpL+XR0bibYnDFGt6dCL4TM=")
-    (preparePatch "04-locale.diff" "sha256-r5nHiP6yN/rQObRu2FZIPBKpKP9yByyZ6sSI2SKj6Do=")
+    #(preparePatch "04-locale.diff" "sha256-r5nHiP6yN/rQObRu2FZIPBKpKP9yByyZ6sSI2SKj6Do=")
   ];
 
   preparePatch = patch: hash: fetchpatch {
@@ -103,6 +103,13 @@ let
   };
 
   runtimeLibs = lib.optionals enableRocm [
+    rocmPackages.clr
+    rocmPackages.hipblas
+    rocmPackages.rocblas
+    rocmPackages.rocsolver
+    rocmPackages.rocsparse
+    rocmPackages.rocm-device-libs
+    rocmClang
     rocmPackages.rocm-smi
   ] ++ lib.optionals enableCuda [
     linuxPackages.nvidia_x11
@@ -164,9 +171,10 @@ goBuild ((lib.optionalAttrs enableRocm {
   ] ++ llamacppPatches;
   postPatch = ''
     # replace a hardcoded use of `g++` with `$CXX` so clang can be used on darwin
-    substituteInPlace llm/generate/gen_common.sh --replace-fail 'g++' '$CXX'
+    #substituteInPlace llm/generate/gen_common.sh --replace-fail 'g++' '$CXX'
     # replace inaccurate version number with actual release version
     substituteInPlace version/version.go --replace-fail 0.0.0 '${version}'
+    substituteInPlace llm/generate/gen_linux.sh --replace-fail 'exit 1' ""
   '';
   preBuild = ''
     # disable uses of `git`, since nix removes the git directory

So I run the daemon here:

HSA_OVERRIDE_GFX_VERSION="11.0.0" ~/..code/nixpkgs/result/bin/ollama serve

and then in another window

$ ~/..code/nixpkgs/result/bin/ollama --version
ollama version is 0.1.37

~                                                                                                       lhuhn@chlorine
$ ~/..code/nixpkgs/result/bin/ollama run llama3
pulling manifest 
pulling 00e1317cbf74... 100% ▕███████████████████████████████████████████████████████▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕███████████████████████████████████████████████████████▏  12 KB                         
pulling 8ab4849b038c... 100% ▕███████████████████████████████████████████████████████▏  254 B                         
pulling 577073ffcc6c... 100% ▕███████████████████████████████████████████████████████▏  110 B                         
pulling ad1518640c43... 100% ▕███████████████████████████████████████████████████████▏  483 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
>>> Hello.
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?

>>> 

and then checking back with the logs in the first window:

time=2024-05-14T13:38:58.973-04:00 level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0

...

ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  4155.99 MiB
llm_load_tensors:        CPU buffer size =   281.81 MiB

@volfyd (Contributor) commented May 14, 2024

I made an override. It's really messy because overriding Go builds is really messy.

_: super: {
  ollama = let
    version = "0.1.37";
    hash = "sha256-ZorOrIOWjXltxqOXNkFJ9190EXTAn+YcjZZhDBJsLqc=";
    vendorHash = "sha256-zOQGhNcGNlQppTqZdPfx+y4fUrxH0NOUl38FN8J6ffE=";
    src = super.fetchFromGitHub {
      owner = "jmorganca";
      repo = "ollama";
      rev = "v${version}";
      inherit hash;
      fetchSubmodules = true;
    };
  in
    (super.ollama.overrideAttrs (old: rec {
      inherit version src;
      patches = let
        preparePatch = patch: hash:
          super.fetchpatch {
            url = "file://${src}/llm/patches/${patch}";
            inherit hash;
            stripLen = 1;
            extraPrefix = "llm/llama.cpp/";
          };
      in
        [(builtins.head old.patches)]
        ++ [
          (preparePatch "02-clip-log.diff" "sha256-rMWbl3QgrPlhisTeHwD7EnGRJyOhLB4UeS7rqa0tdXM=")
          (preparePatch "03-load_exception.diff" "sha256-1DfNahFYYxqlx4E4pwMKQpL+XR0bibYnDFGt6dCL4TM=")
          (preparePatch "04-metal.diff" "sha256-Ne8J9R8NndUosSK0qoMvFfKNwqV5xhhce1nSoYrZo7Y=")
          (preparePatch "05-clip-fix.diff" "sha256-rCc3xNuJR11OkyiXuau8y46hb+KYk40ZqH1Llq+lqWc=")
        ];

      postPatch = ''
        # replace inaccurate version number with actual release version
        substituteInPlace version/version.go --replace-fail 0.0.0 '${version}'
        substituteInPlace llm/generate/gen_linux.sh --replace-fail 'exit 1' ""
      '';
    }))
    .override {
      buildGo122Module = args:
        super.buildGo122Module (args
          // {
            inherit version src vendorHash;
            postFixup = let
              runtimeLibs = [
                super.rocmPackages.clr
                super.rocmPackages.hipblas
                super.rocmPackages.rocblas
                super.rocmPackages.rocsolver
                super.rocmPackages.rocsparse
                super.rocmPackages.rocm-device-libs
                (super.linkFarm
                  "rocm-clang"
                  {
                    llvm = super.rocmPackages.llvm.clang;
                  })
                super.rocmPackages.rocm-smi
              ];
              rocmPath = super.buildEnv {
                name = "rocm-path";
                paths = runtimeLibs;
              };
            in ''
              # the app doesn't appear functional at the moment, so hide it
              mv "$out/bin/app" "$out/bin/.ollama-app"
              # expose runtime libraries necessary to use the gpu
              mv "$out/bin/ollama" "$out/bin/.ollama-unwrapped"
              makeWrapper "$out/bin/.ollama-unwrapped" "$out/bin/ollama" --set-default HIP_PATH '${rocmPath}' \
                --suffix LD_LIBRARY_PATH : '/run/opengl-driver/lib:${super.lib.makeLibraryPath runtimeLibs}'
            '';
          });
    };
};
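
In case anyone wants to try it, the overlay can be wired into a NixOS configuration roughly like this (a sketch assuming the snippet above is saved as ollama-overlay.nix next to the config):

  nixpkgs.overlays = [ (import ./ollama-overlay.nix) ];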

@abysssol (Contributor)

@volfyd I tried setting HSA_OVERRIDE_GFX_VERSION to 10.3.0, but that seemed to have no effect. For reference, I have an rx 6950 xt, and it's worked correctly with previous versions of ollama.

Have you tested the changes on my branch to see if they work for you? If it doesn't work, then I must have just set something up wrong, but if it does work for you then there must be some strange hardware specific incompatibility.

@volfyd (Contributor) commented May 15, 2024

Your nixpkgs works for me if I use the HSA_OVERRIDE_GFX_VERSION="11.0.0" environment variable. (I omitted some output.)

$ mkdir ~/..code/abysssol
$ cd ~/..code/abysssol
$ git clone --reference ~/..code/nixpkgs https://github.com/abysssol/nixpkgs.git
$ cd nixpkgs
$ git checkout ollama-update-0.1.37                                             
$ sed -i -e 's/acceleration ? null/acceleration ? "rocm"/' pkgs/tools/misc/ollama/default.nix 
$ nix build .#ollama
$ sudo systemctl stop ollama                                                                  
$ result/bin/ollama serve
...
time=2024-05-15T10:21:12.215-04:00 level=INFO source=amd_linux.go:311 msg="no compatible amdgpu devices detected"
^C
$ HSA_OVERRIDE_GFX_VERSION="11.0.0" result/bin/ollama serve
...
time=2024-05-15T10:21:48.861-04:00 level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-05-15T10:21:48.861-04:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1100 driver=0.0 name=1002:744c total="20.0 GiB" available="20.0 GiB"
^C

So then I got curious, and I removed the change I proposed (adding the libraries as runtime dependencies). Note that the diff is relative to your commit, so it shows me removing the runtime deps.

$ vi pkgs/tools/misc/ollama/default.nix
$ git diff
diff --git a/pkgs/tools/misc/ollama/default.nix b/pkgs/tools/misc/ollama/default.nix
index dbe46f2fc4c6..69fe04da371c 100644
--- a/pkgs/tools/misc/ollama/default.nix
+++ b/pkgs/tools/misc/ollama/default.nix
@@ -24,7 +24,7 @@
 
 , config
   # one of `[ null false "rocm" "cuda" ]`
-, acceleration ? null
+, acceleration ? "rocm"
 }:
 
 let
@@ -107,9 +107,9 @@ let
   };
 
   runtimeLibs = lib.optionals enableRocm
-    (rocmLibs ++ [
+    [
       rocmPackages.rocm-smi
-    ])
+    ]
   ++ lib.optionals enableCuda [
     linuxPackages.nvidia_x11
   ];
$ nix build .#ollama
$ HSA_OVERRIDE_GFX_VERSION="11.0.0" result/bin/ollama serve
...
time=2024-05-15T10:47:09.055-04:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1100 driver=0.0 name=1002:744c total="20.0 GiB" available="20.0 GiB"
^C

And that still worked! So my theory may be incorrect. I don't know why this is working on my system and not yours. Perhaps I have something different in my NixOS config? I will try again, reverting even more of the change...

@volfyd (Contributor) commented May 15, 2024

OK, so here I am trying it with most of the "fixes" removed (except the build fix related to the exit 1).

I ran git reset to peel off your commit, unstaged the bits I didn't want, and ran git restore to get rid of them. The diff below is then relative to plain nixpkgs.

$ git reset --soft HEAD~1
$ lazygit
$ git restore .
$ sed -i -e 's/acceleration ? null/acceleration ? "rocm"/' pkgs/tools/misc/ollama/default.nix 
$ git diff HEAD
diff --git a/pkgs/tools/misc/ollama/default.nix b/pkgs/tools/misc/ollama/default.nix
index fdda6ba3f1e8..2cd2792da1f0 100644
--- a/pkgs/tools/misc/ollama/default.nix
+++ b/pkgs/tools/misc/ollama/default.nix
@@ -24,28 +24,30 @@
 
 , config
   # one of `[ null false "rocm" "cuda" ]`
-, acceleration ? null
+, acceleration ? "rocm"
 }:
 
 let
   pname = "ollama";
   # don't forget to invalidate all hashes each update
-  version = "0.1.31";
+  version = "0.1.37";
 
   src = fetchFromGitHub {
     owner = "jmorganca";
     repo = "ollama";
     rev = "v${version}";
-    hash = "sha256-Ip1zrhgGpeYo2zsN206/x+tcG/bmPJAq4zGatqsucaw=";
+    hash = "sha256-ZorOrIOWjXltxqOXNkFJ9190EXTAn+YcjZZhDBJsLqc=";
     fetchSubmodules = true;
   };
-  vendorHash = "sha256-Lj7CBvS51RqF63c01cOCgY7BCQeCKGu794qzb/S80C0=";
+  vendorHash = "sha256-zOQGhNcGNlQppTqZdPfx+y4fUrxH0NOUl38FN8J6ffE=";
   # ollama's patches of llama.cpp's example server
   # `ollama/llm/generate/gen_common.sh` -> "apply temporary patches until fix is upstream"
   # each update, these patches should be synchronized with the contents of `ollama/llm/patches/`
   llamacppPatches = [
+    (preparePatch "02-clip-log.diff" "sha256-rMWbl3QgrPlhisTeHwD7EnGRJyOhLB4UeS7rqa0tdXM=")
     (preparePatch "03-load_exception.diff" "sha256-1DfNahFYYxqlx4E4pwMKQpL+XR0bibYnDFGt6dCL4TM=")
-    (preparePatch "04-locale.diff" "sha256-r5nHiP6yN/rQObRu2FZIPBKpKP9yByyZ6sSI2SKj6Do=")
+    (preparePatch "04-metal.diff" "sha256-Ne8J9R8NndUosSK0qoMvFfKNwqV5xhhce1nSoYrZo7Y=")
+    (preparePatch "05-clip-fix.diff" "sha256-rCc3xNuJR11OkyiXuau8y46hb+KYk40ZqH1Llq+lqWc=")
   ];
 
   preparePatch = patch: hash: fetchpatch {
@@ -161,10 +163,10 @@ goBuild ((lib.optionalAttrs enableRocm {
     # this also disables necessary patches contained in `ollama/llm/patches/`
     # those patches are added to `llamacppPatches`, and reapplied here in the patch phase
     ./disable-git.patch
+    # TODO: add reason
+    ./disable-lib-check.patch
   ] ++ llamacppPatches;
   postPatch = ''
-    # replace a hardcoded use of `g++` with `$CXX` so clang can be used on darwin
-    substituteInPlace llm/generate/gen_common.sh --replace-fail 'g++' '$CXX'
     # replace inaccurate version number with actual release version
     substituteInPlace version/version.go --replace-fail 0.0.0 '${version}'
   '';
diff --git a/pkgs/tools/misc/ollama/disable-lib-check.patch b/pkgs/tools/misc/ollama/disable-lib-check.patch
new file mode 100644
index 000000000000..8ce5fcb04e25
--- /dev/null
+++ b/pkgs/tools/misc/ollama/disable-lib-check.patch
@@ -0,0 +1,10 @@
+--- a/llm/generate/gen_linux.sh
++++ b/llm/generate/gen_linux.sh
+@@ -245,7 +245,6 @@
+     if [ $(cat "${BUILD_DIR}/bin/deps.txt" | wc -l ) -lt 8 ] ; then
+         cat "${BUILD_DIR}/bin/deps.txt"
+         echo "ERROR: deps file short"
+-        exit 1
+     fi
+     compress
+ fi
$ nix build .#ollama
$ HSA_OVERRIDE_GFX_VERSION="11.0.0" result/bin/ollama serve
...
time=2024-05-15T12:10:24.603-04:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1100 driver=0.0 name=1002:744c total="20.0 GiB" available="20.0 GiB"
^C

And you can see that it still works! I should probably take ollama out of my NixOS config and see if that makes a difference.

@volfyd (Contributor) commented May 15, 2024

So in my NixOS config, I have

hardware.opengl.extraPackages = with pkgs; [
  rocmPackages.clr.icd #following for GPU AI acceleration
  rocmPackages.rocm-smi
  rocmPackages.clr
  rocmPackages.hipblas
  rocmPackages.rocblas
  rocmPackages.rocsolver
  rocmPackages.rocm-comgr
  rocmPackages.rocm-runtime
  rocmPackages.rocsparse

  # unnecessary for OLLAMA ?
  amdvlk
  rocm-opencl-icd #gaming?
  rocm-opencl-runtime #gaming?
  libva #some hardware acceleration for stuff like OBS
  vaapiVdpau
  libvdpau-va-gl
];

If I remove that, I get

time=2024-05-15T14:48:47.092-04:00 level=WARN source=amd_linux.go:278 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"

Higher up, @abysssol, you said:

I did test what happens if I disable the check and run the compiled ollama binary anyway, but it just crashes almost instantly after running it.

I am not sure if "crash" means it exits with an error or without, but I have never really seen it crash. Either it works, when I run it with the environment variable and the necessary hardware.opengl.extraPackages, or it falls back to CPU without crashing.

@abysssol (Contributor)

I am not sure if "crash" means it exits with an error or without, but I have never seen it crash really. It either works if I run it with the environment variable and the necessary hardware.opengl.extraPackages or it falls back to CPU without crashing.

I seem to remember testing it a few weeks ago and it crashing without any output/logs/error, somewhat like a segfault or linker error (from unavailable libs). However, my current testing has never had this outcome either, so I'm not sure if it's a false memory, if I misattributed a crash from a different experiment to removing the lib check, or if a different ollama version behaved differently in this regard. You can probably safely ignore that comment, as it seems to be irrelevant/outdated/false. My apologies for the confusion.

I tried adding the packages from your hardware.opengl.extraPackages into both the ollama package's runtimeLibs and into my own hardware.opengl.extraPackages, and I tried this both with and without HSA_OVERRIDE_GFX_VERSION set. Unfortunately, I've still been unable to get ollama to use rocm. My ability to help figure this out will be limited, as ollama seems simply unwilling to use rocm on my machine.

However, since this seems to work right for you, it may work right for others, so it's probably worth putting these changes into whatever version ends up in unstable, to at least allow correct functionality for some users.

Could you try removing everything from your hardware.opengl.extraPackages and adding that into the ollama package's runtimeLibs to see if that correctly exposes those libraries to ollama?

   runtimeLibs = lib.optionals enableRocm
     (rocmLibs ++ [
       rocmPackages.rocm-smi
+      rocmPackages.clr.icd
+      rocmPackages.rocm-comgr
+      rocmPackages.rocm-runtime
+      pkgs.rocm-opencl-icd
+      pkgs.rocm-opencl-runtime
+      pkgs.libva
+      pkgs.vaapiVdpau
+      pkgs.libvdpau-va-gl
+      pkgs.amdvlk
     ])
   ++ lib.optionals enableCuda [
     linuxPackages.nvidia_x11
   ];

If that works, then try removing libraries to find the minimum set necessary for ollama to function correctly. That way someone can use ollama without having to first add anything to their hardware.opengl.extraPackages, as the package will be self-contained.

Thank you for all your help with this.

@volfyd (Contributor) commented May 16, 2024

Here are the actual packages I need in my NixOS config:

  # With these packages in the NixOS config, ollama 0.1.37 works.
  hardware.opengl.extraPackages = with pkgs; [
    rocmPackages.hipblas
    rocmPackages.rocblas
  ];

Putting them into the nixpkgs definition instead doesn't work:

  # This incorrectly falls back to CPU
  runtimeLibs = lib.optionals enableRocm [
    rocmPackages.rocm-smi
    rocmPackages.hipblas
    rocmPackages.rocblas
  ] ++ lib.optionals enableCuda [
    linuxPackages.nvidia_x11
  ];
nix build .#ollama
HSA_OVERRIDE_GFX_VERSION="11.0.0" result/bin/ollama serve
...
time=2024-05-16T15:23:57.751-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 rocm cpu cpu_avx]"
time=2024-05-16T15:23:57.751-04:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-05-16T15:23:57.752-04:00 level=WARN source=amd_linux.go:346 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-05-16T15:23:57.752-04:00 level=WARN source=amd_linux.go:278 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"

I don't have an explanation.

@volfyd (Contributor) commented May 16, 2024

I have an explanation!

ollama looks for hipblas and rocblas and requires them to be in the same directory. They can be in the same directory if they are installed by the NixOS system, but otherwise they will each be in their own store directory.

https://github.com/ollama/ollama/blob/5bece945090b94a3f1eab03be48fb6f6b25e1e79/gpu/amd_linux.go#L33

	ROCmLibGlobs          = []string{"libhipblas.so.2*", "rocblas"} // TODO - probably include more coverage of files here...

https://github.com/ollama/ollama/blob/main/gpu/amd_common.go#L14

// Determine if the given ROCm lib directory is usable by checking for existence of some glob patterns
func rocmLibUsable(libDir string) bool {
	slog.Debug("evaluating potential rocm lib dir " + libDir)
	for _, g := range ROCmLibGlobs {
		res, _ := filepath.Glob(filepath.Join(libDir, g))
		if len(res) == 0 {
			return false
		}
	}
	return true
}
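
Given that, one plausible nix-side fix is to merge the two packages into a single store path so that both globs match in one directory; an untested sketch using buildEnv:

  rocmBlas = buildEnv {
    name = "rocm-blas-merged";
    # hipblas and rocblas land side by side under one lib/ directory,
    # satisfying both entries in ROCmLibGlobs
    paths = [ rocmPackages.hipblas rocmPackages.rocblas ];
  };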

@volfyd (Contributor) commented May 17, 2024

One thing to note: if I make the check (the check for hipblas and rocblas in the same directory) less strict, so that only rocblas needs to be found, then ollama appears to work with the graphics card.

diff --git a/pkgs/tools/misc/ollama/default.nix b/pkgs/tools/misc/ollama/default.nix
index 861f0901ea88..66aef69cd10e 100644
--- a/pkgs/tools/misc/ollama/default.nix
+++ b/pkgs/tools/misc/ollama/default.nix
@@ -24,7 +24,7 @@
 
 , config
   # one of `[ null false "rocm" "cuda" ]`
-, acceleration ? null
+, acceleration ? "rocm"
 }:
 
 let
@@ -106,24 +106,8 @@ let
 
   runtimeLibs = lib.optionals enableRocm [
     rocmPackages.rocm-smi
-    rocmPackages.clr.icd #following for GPU AI acceleration
-    rocmPackages.clr
     rocmPackages.hipblas
     rocmPackages.rocblas
-
-    rocmPackages.rocsolver
-    rocmPackages.rocm-comgr
-    rocmPackages.rocm-runtime
-    rocmPackages.rocsparse
-
-    # unnecessary for OLLAMA ?
-    pkgs.amdvlk
-    pkgs.rocm-opencl-icd #gaming?
-    pkgs.rocm-opencl-runtime #gaming?
-    pkgs.libva #some hardware acceleration for stuff like OBS
-    pkgs.vaapiVdpau
-    pkgs.libvdpau-va-gl
-
   ] ++ lib.optionals enableCuda [
     linuxPackages.nvidia_x11
   ];
@@ -187,6 +171,7 @@ goBuild ((lib.optionalAttrs enableRocm {
   postPatch = ''
     # replace inaccurate version number with actual release version
     substituteInPlace version/version.go --replace-fail 0.0.0 '${version}'
+    substituteInPlace gpu/amd_linux.go --replace-fail '"libhipblas.so.2*", "rocblas"' '"rocblas"'
   '';
   preBuild = ''
     # disable uses of `git`, since nix removes the git directory

ollama does not appear to explicitly load hipblas at runtime (though it does use it at build time), so I think the check in the code is just overly strict.
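
A rough way to double-check that at runtime (a sketch; it assumes strace is available and reuses the serve invocation from above):

$ strace -f -e trace=openat result/bin/ollama serve 2>&1 | grep -E 'hipblas|rocblas'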

@abysssol mentioned this pull request May 18, 2024
@abysssol (Contributor)

I've finally managed to figure out how to get the rocm build to work for me, all thanks to you @volfyd!
I couldn't have done it without your help; I was stuck, and had entirely given up on getting ollama to work right with rocm.

I've created PR #312608 for further testing and discussion. If all goes well (rocm works correctly for others and no further issues are found), I want to merge it as soon as the next stable nixos version is released.

@nsbuitrago closed this May 18, 2024