I know others have requested alternative backends/APIs in other threads, but I wanted to ask about supporting Ollama and/or LiteLLM. The reason I'm asking is that consumer hardware is often limited: my 4090 only has 24GB of VRAM, and Tabby takes up about 10GB when I'm using it.
Say I'm working on a project with Tabby in my code editor and want to jump over to my WebUI to ask a coding question. I'd first need to SSH into my dedicated inference server, stop Tabby to free the VRAM, ask my question in the WebUI (which uses Ollama as a backend), then restart Tabby afterwards. That is not a realistic workflow.
The benefit of a single backend like Ollama (or LiteLLM on top of Ollama) is that Ollama can swap models in and out on the fly and queue incoming requests. It would be much better if local LLM/AI projects supported such backends out of the box, enabling more efficient management of precious VRAM. If every tool relied on its own separate backend, we'd never be able to use multiple tools at once.
I've only been using Tabby for a day or so, and it seems like something I'd definitely like to integrate into my workflow. However, since I also rely heavily on Ollama in my current workflow, I can't really use both simultaneously without creating extra hassle.
I guess the other alternative would be the ability to unload models via a keybinding in the text editor (I use nvim). I've seen issue #624, but that seems to be about shutting down the whole Docker container (which runs on the same machine). Just having the option to temporarily unload the model with an API call would be more suitable.
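For comparison, Ollama already exposes this kind of "temporarily unload" behavior: its documented `keep_alive` parameter on `/api/generate` can be set to `0` to evict a model from VRAM immediately. A minimal sketch of triggering that from a script (the `localhost:11434` host is Ollama's default, and the model name is just an example):

```python
import json
import urllib.request

def unload_request(model: str, host: str = "http://localhost:11434"):
    """Build a request asking Ollama to evict `model` from VRAM.

    Ollama's /api/generate endpoint accepts keep_alive: 0, which
    unloads the model right away instead of keeping it resident.
    """
    payload = json.dumps({"model": model, "keep_alive": 0}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running Ollama instance, e.g.:
# urllib.request.urlopen(unload_request("codellama:13b"))
```

Something equivalent in Tabby (an unload endpoint, or honoring a similar idle-timeout setting) would make the keybinding idea trivial to wire up from nvim.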
Thank you for submitting such a detailed FR. I thoroughly understand your use case. Before delving into another post on why Tabby relies on the token decoding interface ...
There's a chat playground within Tabby when the --chat-model and --webserver arguments are set.