CUDA support

Woolverine94 edited this page Apr 14, 2024 · 7 revisions

  • biniou was designed as a CPU-only, no-GPU-required application, but it is really easy to make it use your NVIDIA GPU to accelerate inference. As of commit [630c975] (11/27/23), biniou supports the following features :
    • Autodetection of a CUDA device and configuration of biniou to use it
    • If CUDA is enabled, use of the fp16 torch_dtype, which will force you to re-download the models but halves their size (when supported).
    • If CUDA is enabled, use of cpu_offload to save as much VRAM as possible (when supported).
  • Additional prerequisites are an NVIDIA GPU with 4GB+ VRAM, a working CUDA 12.1 environment, and an already functional biniou standard installation.
  • You can easily activate CUDA support by selecting the type of optimization to activate (CPU, CUDA or ROCm for Linux) in the WebUI control module.
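The autodetection and fp16 behavior described above can be sketched as follows. This is a hedged illustration, not biniou's actual code: `pick_device_and_dtype` is a hypothetical helper, and the commented pipeline lines only show the general diffusers-style pattern.

```python
import torch

def pick_device_and_dtype():
    # Hypothetical helper: if a CUDA device is visible, use it with fp16
    # weights (half the download size, as noted above); otherwise fall
    # back to CPU with full-precision fp32.
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32

device, dtype = pick_device_and_dtype()

# With a diffusers-style pipeline, VRAM can then be saved via CPU
# offload when the pipeline supports it, along the lines of:
#   pipe = SomePipeline.from_pretrained(model_id, torch_dtype=dtype)
#   if device == "cuda":
#       pipe.enable_model_cpu_offload()
```

On a machine without a CUDA device this selects `("cpu", torch.float32)`, which matches biniou's default CPU-only behavior.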

Note : the CUDA support option for the Chatbot and Llava modules is in a separate settings panel, also in the WebUI control module.
