ollama: 0.0.17 -> 0.1.7 #257760

Merged
merged 2 commits into NixOS:master on Nov 3, 2023
Conversation

@elohmeier (Contributor) commented Sep 28, 2023

Description of changes

Updated the package and added llama-cpp as a required dependency. Also added support for Metal, CUDA, and ROCm hardware acceleration.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@IogaMaster (Contributor)

We need to allow this to be built with GPU support before merging.

@zopieux (Contributor) commented Oct 3, 2023

Note 0.1.1 is already available.

@elohmeier (Contributor, Author)

Thanks, I've updated to 0.1.1 and also added CUDA/ROCm support (based on https://github.com/ggerganov/llama.cpp/blob/master/flake.nix) by integrating llama-cpp as a package. I don't have hardware to test CUDA and ROCm locally; maybe someone could try that out?
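For anyone with the hardware who wants to try it, here is a hypothetical helper expression for building the CUDA variant from a checkout of this branch. This is a sketch only: it assumes the new llama-cpp argument on ollama and its cudaSupport flag as described in this thread, and it is untested here.

    # test-cuda.nix (hypothetical): build with `nix-build test-cuda.nix` from the PR checkout
    let
      # CUDA is unfree, so it has to be allowed explicitly for a local build
      pkgs = import ./. { config.allowUnfree = true; };
    in
    pkgs.ollama.override {
      # swap cudaSupport for rocmSupport = true to test ROCm instead
      llama-cpp = pkgs.llama-cpp.override { cudaSupport = true; };
    }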

@benneti (Contributor) commented Oct 12, 2023

Thank you for the effort. Is there a reason that there is no OpenCL option? I think it should be supported and might be easier to set up on some devices than ROCm and CUDA; see
ollama/ollama#259
and https://github.com/ggerganov/llama.cpp
(or is it the default and I misunderstood something in the nixpkg, as I don't see clblast or the like).

Additionally, I tested the compiled binary with and without rocmSupport (ollama.override { llama-cpp = llama-cpp.override { rocmSupport = true; }; }),
and while both seem to build and start the server, when running a model I get "Error: failed to start a llama runner", and in the serve logs I see:

2023/10/12 15:33:49 llama.go:316: starting llama runner
2023/10/12 15:33:49 llama.go:352: waiting for llama runner to start responding
{"timestamp":1697117629,"level":"WARNING","function":"server_params_parse","line":887,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
error: unknown argument: --gqa
usage: /nix/store/srlrbyrfs3kjfiaacnv2nfni7yjf8vcb-llama-cpp-2023-10-11/bin/llama-cpp-server [options]

options:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output (default: disabled)
  -t N, --threads N     number of threads to use during computation (default: 8)
  -c N, --ctx-size N    size of the prompt context (default: 512)
  --rope-freq-base N    RoPE base frequency (default: loaded from model)
  --rope-freq-scale N   RoPE frequency scaling factor (default: loaded from model)
  -b N, --batch-size N  batch size for prompt processing (default: 512)
  --memory-f32          use f32 instead of f16 for memory key+value (default: disabled)
                        not recommended: doubles context memory required and no measurable increase in quality
  --mlock               force system to keep model in RAM rather than swapping or compressing
  --no-mmap             do not memory-map model (slower load but may reduce pageouts if not using mlock)
  --numa                attempt optimizations that help on some NUMA systems
  -m FNAME, --model FNAME
                        model path (default: models/7B/ggml-model-f16.gguf)
  -a ALIAS, --alias ALIAS
                        set an alias for the model, will be added as `model` field in completion response
  --lora FNAME          apply LoRA adapter (implies --no-mmap)
  --lora-base FNAME     optional model to use as a base for the layers modified by the LoRA adapter
  --host                ip address to listen (default  (default: 127.0.0.1)
  --port PORT           port to listen (default  (default: 8080)
  --path PUBLIC_PATH    path from which to serve static files (default examples/server/public)
  -to N, --timeout N    server read/write timeout in seconds (default: 600)
  --embedding           enable embedding vector output (default: disabled)

2023/10/12 15:33:49 llama.go:326: llama runner exited with error: exit status 1
2023/10/12 15:33:50 llama.go:333: error starting llama runner: llama runner process has terminated
[GIN] 2023/10/12 - 15:33:50 | 500 |  200.926851ms |       127.0.0.1 | POST     "/api/generate"

With the version in nixpkgs/master it works (though without the option for GPU support).

@elohmeier (Contributor, Author)

Rebased & updated ollama to 0.1.3.

@elohmeier (Contributor, Author)

@benneti Thanks for testing. I'll look into that ROCm issue. I've also added OpenCL support; could you test that as well?
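A hedged sketch of what testing the OpenCL variant might look like, assuming the openclSupport flag on llama-cpp that appears later in this thread:

    # sketch only: OpenCL-accelerated llama-cpp (CLBlast backend) under ollama,
    # evaluated from a checkout of this branch
    with import ./. { };
    ollama.override {
      llama-cpp = llama-cpp.override { openclSupport = true; };
    }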

@elohmeier (Contributor, Author)

I've patched the passing of the deprecated --gqa flag, which seems to be needed only for GGML-format models. Could you test that again?

@benneti (Contributor) commented Oct 14, 2023

As I am unable to test ROCm (I run out of space while compiling rocblas and am too lazy to change my tmpfs setup for this), even with the necessary fixes (see code comment), I only tried the standard version and OpenCL support.
Both work and seem to be much more performant than the previous version; thanks for the effort again.
For others who quickly want to check whether it works: only GGUF models are supported in this version, and they are not clearly labeled; a model that worked for me was ollama run zephyr (see ollama/ollama#738 (comment)).

Review comment on pkgs/by-name/ll/llama-cpp/package.nix (outdated, resolved)
@a-kenji (Contributor) commented Oct 30, 2023

I can also confirm that it now works as intended; thanks for keeping at it!

@elohmeier (Contributor, Author)

The issue was fixed upstream, so I updated & removed the patch.

@pbsds (Contributor) commented Nov 3, 2023

I read through the changelogs and it seems there are only additions and no major changes, and I don't consider this a breaking change for the 23.11 release. But the aarch64-darwin ofborg builder isn't doing too well; I think it could use some friends.

@DrRuhe mentioned this pull request Nov 3, 2023
@markuskowa (Member)

Version 0.0.17 is broken with the current models. This update puts it back in a working state.
Is there anyone who can confirm that this PR builds on aarch64-darwin?

@elohmeier (Contributor, Author)

I can confirm that; I'm actively using it on aarch64-darwin with Metal support.

@markuskowa merged commit 47ccb89 into NixOS:master on Nov 3, 2023
21 of 22 checks passed
@elohmeier deleted the ollama branch on November 3, 2023 at 14:58
@geekodour (Contributor)

I installed ollama from unstable and I am getting:

{"timestamp":1699272675,"level":"WARNING","function":"server_params_parse","line":1994,"message":"Not compiled with GPU offload support, --n-
gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}

Related upstream issues:

I am not sure what the issue is. I have the 11.8 toolkit running. Building ollama directly also doesn't seem to pick up the includes. I think this (ollama/ollama#958) should finally fix it?

@geekodour (Contributor) commented Nov 6, 2023

The override looks like this:

ollamagpu = pkgs.unstable.ollama.override { llama-cpp = (pkgs.unstable.llama-cpp.override {cudaSupport = true;}); };

But it seems like compiling llama-cpp fails with the issues mentioned here: ggerganov/llama.cpp#2481

EDIT: the following worked:

ollamagpu = pkgs.unstable.ollama.override { llama-cpp = (pkgs.unstable.llama-cpp.override {cudaSupport = true; openblasSupport = false; }); };

@pbsds (Contributor) commented Nov 6, 2023

Should we provide ollamaWithNvidia and ollamaWithRocm in all-packages like torch does?

@markuskowa (Member)

I would avoid creating *withNvidia|Rocm attributes. A better way to turn on CUDA support is to set cudaSupport = true in the nixpkgs config, which consistently enables CUDA across nixpkgs (a similar thing will probably be established soon for ROCm). Note that Hydra would not build CUDA packages anyway, since CUDA itself is marked as unfree (i.e. you have to build it locally anyway).
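For example, in a NixOS configuration that global switch could look like the sketch below (cudaSupport and allowUnfree are standard nixpkgs config options; treat the snippet as illustrative rather than a tested setup):

    # configuration.nix (sketch): enable CUDA across nixpkgs instead of per-package overrides
    {
      nixpkgs.config = {
        allowUnfree = true;  # CUDA is unfree, so Hydra does not build it and it must be allowed locally
        cudaSupport = true;  # packages that honor this flag are built with CUDA support
      };
    }

Packages that respect the global flag then no longer need per-package overrides like the ones above.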

@teto (Member) commented Dec 3, 2023

Just wanted to thank @geekodour for his fix #257760 (comment). It worked for me as well.

@dltacube

Does this still work for you guys? I'm getting the following error:

error: llama-cpp: exactly one of openclSupport, openblasSupport and rocmSupport should be enabled

And if I remove the option that disables cuBLAS, it then complains about running an unsupported compiler version.

@teto (Member) commented Dec 26, 2023

It stopped working for me, so I removed the check altogether. Not sure what the proper fix is.

@dltacube

So you're just using the CPU, right? I removed the override and it installs just fine, but holy hell is it slow. There must be some new configuration for llama-cpp that I'm going to look up if I have time later.

@dltacube commented Dec 28, 2023

I don't have a proper fix but ran into the following errors:

  • It complained about using gcc 12 rather than version 11.
  • Passing -DALLOW_UNSUPPORTED_COMPILER=ON to gcc had no impact on nvcc.
  • Overriding nativeBuildInputs to use super.gcc11 yielded other errors.
  • It complained about git not being found, so I added that in an overlay.
  • It then proceeds to run the git command, except it's not in any git folder, so it has no effect.
  • Here's what I'm left with, and for some reason it works with GPU usage (checked with nvtop):
    (self: super: {
      # adds git to llama-cpp's build environment
      llama-cpp = super.llama-cpp.overrideAttrs (oldAttrs: {
        nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [ super.git ];
      });
    })
    (self: super: {
      # switches llama-cpp from OpenBLAS to OpenCL
      llama-cpp = super.llama-cpp.override (args: {
        openclSupport = true;
        openblasSupport = false;
      });
    })

I don't think that first overlay is doing much of anything.

Globally, CUDA support is not turned on. Just thought I'd leave this here in case someone else stumbles on this issue. I'm also not using unstable.

@happysalada (Contributor)

To try to address this, I made #277709.
Let me know, of course, if I missed something.

@Green-Sky commented Dec 30, 2023

ggerganov/llama.cpp@68eccbd

There has been a rewrite of the upstream Nix files. It included a stdenv fix for CUDA versions.

(edit: oops, I actually wanted to reply in #277709, but here works too)

@happysalada (Contributor)

Thank you! SomeoneSerge nicely commented on what had to change; I think it's all incorporated now. Let me know if you notice something I missed.

@elohmeier mentioned this pull request Feb 15, 2024