CUDA support

Woolverine94 edited this page Apr 14, 2024 · 7 revisions

  • biniou was designed as a CPU-only, no-GPU-required application, but it is really easy to make it use your NVIDIA GPU to accelerate inference. As of commit [630c975] (11/27/23), biniou supports the following features :
    • Autodetection of a CUDA device and configuration of biniou to use it
    • If CUDA is enabled, use of the fp16 torch_dtype, which will force you to re-download the models but halves their size (when supported).
    • If CUDA is enabled, use of cpu_offload to save as much VRAM as possible (when supported).
  • Additional prerequisites are an NVIDIA GPU with 4GB+ VRAM, a working CUDA 12.1 environment, and an already functional biniou standard installation.
  • You can easily activate CUDA support by selecting the type of optimization to activate (CPU, CUDA or ROCm for Linux) in the WebUI control module.
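The autodetection and fp16 behavior described above can be sketched as follows. This is a hedged illustration, not biniou's actual code: `pick_device_and_dtype` is a hypothetical helper, and the commented pipeline lines only show the general diffusers-style pattern.

```python
import torch

def pick_device_and_dtype():
    # Hypothetical helper: if a CUDA device is visible, use it with fp16
    # weights (half the download size, as noted above); otherwise fall
    # back to CPU with full-precision fp32.
    if torch.cuda.is_available():
        return "cuda", torch.float16
    return "cpu", torch.float32

device, dtype = pick_device_and_dtype()

# With a diffusers-style pipeline, VRAM can then be saved via CPU
# offload when the pipeline supports it, along the lines of:
#   pipe = SomePipeline.from_pretrained(model_id, torch_dtype=dtype)
#   if device == "cuda":
#       pipe.enable_model_cpu_offload()
```

On a machine without a CUDA device this selects `("cpu", torch.float32)`, which matches biniou's default CPU-only behavior.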

Note : the CUDA support option for the Chatbot and Llava modules is in a separate settings panel, also in the WebUI control module.
