
sd: sync to master-593-3d6064b#2175

Open
wbruna wants to merge 4 commits into LostRuins:concedo_experimental from wbruna:kcpp_sd_update_202604_4

Conversation

@wbruna

@wbruna wbruna commented Apr 30, 2026

master-592-b8079e2 removes all build-time backend dependencies from the sd.cpp sources.

The backend change conflicts with several ggml includes: I had to remove util.h from sdtypes_adapter.cpp because it triggered odd errors in llama.h. A better solution might be splitting util.h into two headers, but I didn't want to change anything outside otherarch/sdcpp in the middle of this PR. I also had to adapt our main_gpu support, but haven't tested it thoroughly.

On the other hand, since sd.cpp became backend-agnostic, sdtypes_adapter.cpp now only needs to be built once for all backends. Both Vulkan and ROCm seem to be running fine with the change.

After #2155. Edit: moving the discussion here to make it easier to follow.

wbruna added 2 commits May 1, 2026 07:15
Since master-592-b8079e2, no sd.cpp source depends on the ggml
backend build anymore.
@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch from c1de647 to 9f2f7a4 Compare May 1, 2026 10:27
@wbruna wbruna marked this pull request as ready for review May 1, 2026 10:28
@wbruna
Author

wbruna commented May 1, 2026

Copied from #2155:

> oof, the backend change does not look very pleasant, even though I can see why you added it.
>
> How exactly is backend determination supposed to be done now?

We first pick the device name with this:

__STATIC_INLINE__ std::string get_default_backend_name() {
    ggml_backend_load_all_once();
    // should pick the same backend as ggml_backend_init_best
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    dev                    = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_IGPU);
    dev                    = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    if (dev == nullptr) {
        return "";
    }
    return ggml_backend_dev_name(dev);
}

then initialize it by name.

> because currently when selecting a desired backend (via compile flags for that target), the selected backend is expected to be used for everything (e.g. if I pick CPU, even though I might have cuda or vulkan, I expect to use CPU for TTS, LLM and image gen).

It still works like this: ggml will only expose two backend types at a time (Vulkan and CPU, ROCm and CPU, etc.), so it prefers the first discrete GPU, then the first iGPU, and uses it for everything (the --x-on-cpu flags still work the same way as before).

> I do not want automatic backend determination from the C++ side, kcpp backend selection should always be explicit (which means it must use pure CPU, or pure Vulkan with a specified device if the user chooses it, even though I launched the cuda build with CUDA available and supported). I think since ggml is built with the expected flags this should be doable but I'm very unclear on how this auto works right now.

In the sense of, say, Vulkan0 versus Vulkan1, I think the selection has changed: I have a lot of scripts to convince everything that my dGPU sits at a specific slot (it often swaps places with the iGPU at boot 😕), and sd.cpp now seems to pick the dGPU correctly regardless of them. But note that even if the previous algorithm was just "pick the first device", technically it was already the C++ layer choosing it (unless forced with main_gpu).

We will likely get more sophisticated selection soon (like "Vulkan1 for the diffusion, Vulkan0 for the VAE, CPU for the conditioner"), but we are not there yet: it's still the same backend for everything, except for --x-on-cpu.

I made a hackish adaptation for main_gpu to keep the interface working as-is, but I believe a more future-proof approach would be:

  • we add a function to sdtype_adapter that publishes the list of device names to the Python layer, together with useful metadata (e.g. whether it is a dGPU, whether it would be the C++ choice, etc.);
  • we change the model loading interface to receive device names for the diffusion model / clip / vae;
  • sdtype_adapter passes the main device name directly to the default device selection in util.cpp (like the main_gpu adaptation I did).

This way, if the Python layer decides it wants to force a device, let C++ pick whatever, or even support fancy aliases like "iGPU", it can. In fact, we could even add support for placing models on different devices before upstream: the diff would be pretty simple at this point.
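A rough sketch of what the published device list could look like (the struct fields, function name, and serialization format here are all made up for illustration; a real implementation would query ggml for the entries):

```cpp
#include <string>
#include <vector>

// Hypothetical shape of the per-device metadata sdtype_adapter could expose
// to the Python layer. Field names are illustrative only.
struct sd_device_entry {
    std::string name;   // e.g. "Vulkan0"
    bool is_dgpu;       // discrete GPU?
    bool is_default;    // the device the C++ layer would pick on its own
};

// Serialize one device per line, simple enough for the Python side to split.
std::string sd_format_device_list(const std::vector<sd_device_entry> &devs) {
    std::string out;
    for (const auto &d : devs) {
        out += d.name + "|" + (d.is_dgpu ? "dgpu" : "igpu/cpu")
             + "|" + (d.is_default ? "default" : "-") + "\n";
    }
    return out;
}
```

With metadata like this on the Python side, aliases such as "iGPU" or "first dGPU" become simple lookups rather than C++-level policy.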

@wbruna
Author

wbruna commented May 1, 2026

@LostRuins , I've just noticed this:

    bool load_model(const load_model_inputs inputs)
    {
        (...)
        // vulkan_info holds one device digit per character; join them with
        // commas for GGML_VK_VISIBLE_DEVICES (e.g. "01" -> "0,1")
        std::string vulkan_info_raw = inputs.vulkan_info;
        std::string vulkan_info_str = "";
        for (size_t i = 0; i < vulkan_info_raw.length(); ++i) {
            vulkan_info_str += vulkan_info_raw[i];
            if (i < vulkan_info_raw.length() - 1) {
                vulkan_info_str += ",";
            }
        }
        const char* existingenv = getenv("GGML_VK_VISIBLE_DEVICES");
        if(!existingenv && vulkan_info_str!="")
        {
            vulkandeviceenv = "GGML_VK_VISIBLE_DEVICES="+vulkan_info_str;
            putenv((char*)vulkandeviceenv.c_str());
        }

GGML_VK_* env vars shouldn't be set here: ggml's behavior for image, TTS, etc. could change depending on the presence of a text model. Maybe it's better to move it to the Python layer, or to a general 'init' function?

@wbruna
Author

wbruna commented May 1, 2026

And as far as I can tell, apart from main_gpu and global env vars, we currently don't have device selection for image gen: it's left to the upstream code to decide.

I've added device reporting to the sdtype_get_info function right before realizing I won't need it 😅 Selecting by number is actually easier. I'll keep the info support for now, just in case we want e.g. debug logging.

My plan is to change the "on_gpu" booleans in the C++ interface to receive a device index instead. So we'd have integers clip_gpu and vae_gpu, with:

  • -2: CPU (default for clip_gpu)
  • -1: for clip_gpu and vae_gpu, same as main_gpu (default for vae_gpu); for main_gpu, means "let sd.cpp choose";
  • >= 0: use that GPU number

I believe this will be enough for any behavior we need:

  • to avoid having the C++ layer choose the backend, the Python layer can simply set main_gpu;
  • things like "main SD device is CPU, with clip on GPU 1, and VAE on GPU 0" will already be supported at the C++ level (of course, we'd still need launcher and command-line support).
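The mapping above can be pinned down with a tiny helper (the name and return format are hypothetical, just to make the semantics concrete):

```cpp
#include <string>

// Hypothetical helper: resolve a clip_gpu / vae_gpu request against main_gpu,
// following the proposed convention:
//   -2  -> CPU
//   -1  -> follow main_gpu (if main_gpu itself is -1, let sd.cpp choose)
//   >=0 -> that GPU index
std::string resolve_device(int requested, int main_gpu) {
    if (requested == -2) return "CPU";
    if (requested == -1) {
        if (main_gpu < 0) return "auto";  // let sd.cpp pick
        return "GPU" + std::to_string(main_gpu);
    }
    return "GPU" + std::to_string(requested);
}
```

So the proposed defaults (clip_gpu = -2, vae_gpu = -1) would give CPU for clip and the main device for the VAE.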

@LostRuins
Owner

If you search GGML_VK_VISIBLE_DEVICES in koboldcpp you'll see it initialized in every single load function - because it's not known which modules (e.g. tts + video gen, or whatever) the user might wish to load, so I duplicated it. We could probably deduplicate it and have a unified pre-loader maybe...
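A minimal sketch of what such a unified pre-loader could look like, assuming a hypothetical ggml_env_preload() called once before any module's load function (mirroring the existing vulkandeviceenv pattern; the string handed to putenv must stay alive, hence the static):

```cpp
#include <cstdlib>
#include <string>

// Storage must outlive putenv(): the environment keeps pointing into it.
static std::string vulkandeviceenv;

// Hypothetical unified pre-loader: set GGML_VK_VISIBLE_DEVICES once, before
// any module (LLM, TTS, image gen) is loaded, instead of duplicating this
// setup in every load function.
void ggml_env_preload(const std::string &vulkan_info_str) {
    if (!getenv("GGML_VK_VISIBLE_DEVICES") && !vulkan_info_str.empty()) {
        vulkandeviceenv = "GGML_VK_VISIBLE_DEVICES=" + vulkan_info_str;
        putenv(const_cast<char *>(vulkandeviceenv.c_str()));
    }
}
```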

@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch from 21e2cdf to 1d5c5da Compare May 1, 2026 16:20
@wbruna wbruna force-pushed the kcpp_sd_update_202604_4 branch from 1d5c5da to c002d36 Compare May 1, 2026 17:29
@wbruna
Author

wbruna commented May 1, 2026

I've simplified the main_gpu handling. Should be good to go already; I can add the multi-device support afterwards.

About the GGML_VK_VISIBLE_DEVICES handling: is there any reason to keep doing it at the C++ level? I saw CUDA_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES being set in koboldcpp.py.

