
Ollama / Litellm support #1092

Closed
madsamjp opened this issue Dec 20, 2023 · 1 comment

Comments


madsamjp commented Dec 20, 2023

I know some others have requested alternative backends/APIs in other threads, but I wanted to ask about supporting Ollama and/or Litellm. The reason I'm asking is that consumer hardware is often limited: my 4090 only has 24GB of VRAM, and Tabby takes up about 10GB when I'm using it.

Let's say I'm working on a project using Tabby in my code editor, and then I want to jump to my WebUI to ask a coding question. I'd first need to SSH into my dedicated inference server, stop Tabby to clear the VRAM, ask my question in the WebUI (which uses Ollama as a backend), and then restart Tabby afterwards. That is not a realistic workflow.

The benefit of using a single backend like Ollama (or Litellm on top of Ollama) is that Ollama can dynamically swap models on the fly and queue requests. It would be much better if local LLM/AI projects supported such backends out of the box, to enable more efficient management of precious VRAM. If every tool relied on its own separate backend, we'd never be able to use multiple tools at once.
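
To make the shared-backend idea concrete, here's a rough sketch of what any tool could do if everything talked to one OpenAI-compatible endpoint (e.g. a LiteLLM proxy sitting in front of Ollama). The address, model name, and API key below are placeholders, not anything Tabby supports today; the point is just that a completion plugin and a chat WebUI would hit the same URL and let the backend handle model swapping and queueing.

```python
# Sketch: two different "tools" (code completion, chat) sharing one
# OpenAI-compatible endpoint, e.g. a LiteLLM proxy in front of Ollama.
import requests

BASE_URL = "http://localhost:4000/v1"  # assumed LiteLLM proxy address
API_KEY = "anything"                   # local proxies typically ignore the key


def chat(model: str, prompt: str) -> str:
    """Send a chat completion request through the shared backend."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # A code-completion tool and a chat WebUI could each make calls like this;
    # the backend decides which model is resident in VRAM at any moment.
    print(chat("ollama/codellama", "Write a Python one-liner that reverses a string."))
```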

I've only been using Tabby for a day or so, and it seems like something I'd definitely like to integrate into my workflow. However, since I also rely heavily on Ollama, I can't really use both simultaneously without creating extra hassle.

I guess the other alternative would be the ability to unload models using a keybinding in the text editor (I use nvim). I've seen issue #624; however, that seems to be about shutting down the whole Docker container (which is running on the same machine). Just having the option to temporarily unload the model with an API call would be more suitable.
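
For comparison, recent Ollama versions document a `keep_alive` parameter, and setting it to 0 asks Ollama to unload a model immediately, which is roughly the behaviour being requested here for Tabby. Below is a rough sketch only: the Ollama call reflects that documented parameter, while the Tabby endpoint is hypothetical and does not exist today; it's exactly the kind of API this request is asking for.

```python
# Rough sketch. The Ollama call uses the documented keep_alive parameter
# (0 = unload the model right away). The Tabby endpoint is HYPOTHETICAL --
# Tabby does not expose an unload API today; that is the feature request.
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama address
TABBY_URL = "http://localhost:8080"    # assumed Tabby address


def unload_ollama_model(model: str) -> None:
    """Free the VRAM held by an Ollama model by setting keep_alive to 0."""
    requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "keep_alive": 0},
        timeout=10,
    )


def unload_tabby_model() -> None:
    """Hypothetical endpoint -- not something Tabby provides today."""
    requests.post(f"{TABBY_URL}/v1/models/unload", timeout=10)


if __name__ == "__main__":
    unload_ollama_model("codellama")
```

A small script like this could be wired to a keybinding in nvim as an external command, which would cover the "temporarily free the VRAM" case without restarting any containers.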

madsamjp added the enhancement label Dec 20, 2023
wsxiaoys (Member) commented Dec 21, 2023

Thank you for submitting such a detailed FR. I thoroughly understand your use case. Before delving into another post on why Tabby relies on the token decoding interface ...

These enhancements should provide a reasonable tradeoff / deployment choice for chat use cases.

wsxiaoys removed the enhancement label Dec 21, 2023
TabbyML locked and limited conversation to collaborators Dec 22, 2023
wsxiaoys converted this issue into discussion #1096 Dec 22, 2023

