Problems when launching from other programs like llama-swap #49

@links486

ds4-server loads the GGUF and Metal files relative to the current working directory, not relative to the ds4-server binary, so llama-swap and other external programs can't simply execute the binary from an arbitrary directory:

username@MacStudio ~ % /Users/username/git/ds4/ds4-server --port 5555 --ctx 262144 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 32768
ds4: cannot open model 'ds4flash.gguf': No such file or directory

Alright, let's give it the full path to the model:

username@MacStudio ~ % /Users/username/git/ds4/ds4-server -m /Users/username/git/ds4/gguf/DeepSeek-V4-Flash-Q4KExperts-F16HC-F16Compressor-F16Indexer-Q8Attn-Q8Shared-Q8Out-chat-v2.gguf --port 5555 --ctx 262144 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 32768
ds4: Metal source metal/flash_attn.metal not found (set DS4_METAL_FLASH_ATTN_SOURCE to override)
ds4: Metal backend unavailable; aborting startup

A simple workaround is to create a wrapper shell script that changes into the ds4 directory first, and launch that instead:

#!/bin/sh
set -e
cd /Users/username/git/ds4
exec ./ds4-server --port 5555 --ctx 262144 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 32768
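
The hard-coded path above can be avoided by having the wrapper locate its own directory. The following is a minimal sketch, not something from the ds4 repository: it assumes the script is saved in the same directory as the ds4-server binary, and the helper names dir_of and launch_ds4 are made up for illustration.

```sh
#!/bin/sh
# Hypothetical relocatable variant of the wrapper script above.
# Assumption: this file lives in the same directory as ds4-server.
set -e

# Print the absolute path of the directory containing the file named in $1.
dir_of() {
    CDPATH= cd -- "$(dirname -- "$1")" && pwd
}

# Change into the script's own directory so the relative GGUF and Metal
# paths resolve, then replace the shell with the server process.
launch_ds4() {
    cd "$(dir_of "$0")"
    exec ./ds4-server "$@"
}
```

Ending the script with `launch_ds4 --port 5555 --ctx 262144 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 32768` should behave like the original wrapper. If the script is invoked through a symlink, `$0` points at the link rather than the real file, so this sketch would need extra handling for that case.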

llama-swap also performs a health check (requesting /health by default). ds4-server doesn't respond on that endpoint, so requests are queued indefinitely, waiting for a successful health check that never arrives. /health could be a good candidate for a future endpoint, especially if other software checks it too.

This is also easy to work around by pointing the health check at an endpoint that ds4-server does serve, such as /v1/models. Here's a working config for llama-swap:

models:
  "deepseek-v4-flash":
    cmd: /Users/username/git/ds4/start-ds4.sh
    proxy: http://127.0.0.1:5555
    checkEndpoint: /v1/models

This project is amazing. Thank you to all contributors.
