sd: sync to master-593-3d6064b #2175

wbruna wants to merge 4 commits into LostRuins:concedo_experimental from …
Conversation
Since master-592-b8079e2, no sd.cpp source depends on the ggml backend build anymore.
Copied from #2155:
We first pick the device name with this:

```cpp
__STATIC_INLINE__ std::string get_default_backend_name() {
    ggml_backend_load_all_once();
    // should pick the same backend as ggml_backend_init_best
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_IGPU);
    dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    if (dev == nullptr) {
        return "";
    }
    return ggml_backend_dev_name(dev);
}
```

then initialize it by name.
It still works like this: ggml only exposes two types of backend at a time (Vulkan and CPU, ROCm and CPU, etc.), so it'll prefer the first discrete GPU, then the first iGPU, and use that device for everything (the `--x-on-cpu` flags still work the same way as before).
In the sense of, say, Vulkan0 versus Vulkan1, I think the selection has changed: I have a lot of scripts to convince everybody that my dGPU is at a specific slot (it often switches places with the iGPU at boot 😕), and sd.cpp now seems to correctly pick the dGPU regardless of them. But note: even if the previous algorithm was just "pick the first device", technically it was already the C++ layer choosing it (unless forced with …).

We will likely get more sophisticated selection soon (like "Vulkan1 for the diffusion, Vulkan0 for the VAE, CPU for the conditioner"), but we are not there yet: it's still the same backend for everything, except for `--x-on-cpu`. I made a hackish adaptation for …
This way, if the Python layer decides it wants to force a device, let C++ pick whatever, or even support fancy aliases like "iGPU", it can. In fact, we could even add support for placing models on different devices before upstream does: the diff would be pretty simple at this point.
@LostRuins , I've just noticed this:

```cpp
bool load_model(const load_model_inputs inputs)
{
    (...)
    std::string vulkan_info_raw = inputs.vulkan_info;
    std::string vulkan_info_str = "";
    for (size_t i = 0; i < vulkan_info_raw.length(); ++i) {
        vulkan_info_str += vulkan_info_raw[i];
        if (i < vulkan_info_raw.length() - 1) {
            vulkan_info_str += ",";
        }
    }
    const char* existingenv = getenv("GGML_VK_VISIBLE_DEVICES");
    if(!existingenv && vulkan_info_str!="")
    {
        vulkandeviceenv = "GGML_VK_VISIBLE_DEVICES="+vulkan_info_str;
        putenv((char*)vulkandeviceenv.c_str());
    }
```
And as far as I can tell, apart from …

I've added device reporting to the …

My plan is changing the "on_gpu" booleans at the C++ interface to receive the device index instead. So we'd have integers …
I believe this will be enough for any behavior we need:
If you search …
I've simplified the …

About the …
master-592-b8079e2 removes all build-time backend dependencies on sd.cpp sources. The backend change is very incompatible with many ggml includes: I've had to remove `util.h` from `sdtypes_adapter.cpp` because it triggered weird errors on `llama.h`. A better solution could be splitting `util.h` into two headers, but I didn't want to change stuff outside `otherarch/sdcpp` in the middle of this PR. I've also had to adapt our `main_gpu` support, but didn't test it very well.

On the other hand, since sd.cpp became backend-agnostic, `sdtypes_adapter.cpp` now only needs to be built once for all backends. Both Vulkan and ROCm seem to be running fine with the change.

After #2155.

Edit: moving the discussion here to make it easier to follow.