
[Feature Request] Conversational Search (RAG) with a local LLM #1732

Open

elliot-sawyer opened this issue May 15, 2024 · 8 comments

@elliot-sawyer

Description

Is it possible to use Conversational Search (RAG) with a local LLM? The documentation suggests it is only possible with OpenAI and Cloudflare. I was wondering if any of the HuggingFace models could be used with an available GPU instead to avoid making slow network calls.

Metadata

Typesense Version: 26

OS: Linux

@piccaso commented May 17, 2024

It only takes care of the R part of RAG, but yes, custom models and GPU inference are supported.
You also have the option to generate the embeddings yourself and store them.

Check out all the subtopics of this part of the documentation:
https://typesense.org/docs/26.0/api/vector-search.html#index-embeddings
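
As a minimal sketch of what that looks like (the collection name, field names, and connection details below are illustrative; `ts/all-MiniLM-L12-v2` is one of the built-in models Typesense can download and run locally, including on the GPU build):

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
});

// The `embed` block tells Typesense to generate the vector itself at
// index time from the listed fields, using a locally-run model.
await client.collections().create({
  name: 'articles',
  fields: [
    { name: 'title', type: 'string' },
    {
      name: 'embedding',
      type: 'float[]',
      embed: {
        from: ['title'],
        model_config: { model_name: 'ts/all-MiniLM-L12-v2' },
      },
    },
  ],
});

// Alternatively, omit `embed`, declare a plain float[] field with
// `num_dim`, and index vectors you computed with your own model.
```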

@jasonbosco (Member)

@piccaso Typesense does support the "AG" part of RAG, by integrating with ChatGPT / Cloudflare APIs: https://typesense.org/docs/26.0/api/conversational-search-rag.html

@elliot-sawyer We don't yet have a way to integrate with local LLMs. But I'll leave this open as a feature request.
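
For reference, registering one of the currently supported (remote) models goes through the conversations endpoint described in the v26 docs linked above; in the sketch below, the host, API keys, and prompt are placeholders:

```ts
// POST /conversations/models registers an LLM for conversational search.
const res = await fetch('http://localhost:8108/conversations/models', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-TYPESENSE-API-KEY': 'xyz',
  },
  body: JSON.stringify({
    model_name: 'openai/gpt-3.5-turbo',
    api_key: 'OPENAI_API_KEY',
    system_prompt: 'Answer only from the provided documents.',
    max_bytes: 16384,
  }),
});
// The response includes the model id to pass as conversation_model_id
// in subsequent searches with conversation=true.
console.log(await res.json());
```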

@jasonbosco (Member)

May I know which local LLMs you're looking for?

@jasonbosco jasonbosco changed the title Is it possible to use Conversational Search (RAG) with a local LLM? [Feature Request] Conversational Search (RAG) with a local LLM May 17, 2024
@elliot-sawyer (Author)

I don't have a particular one in mind yet - would any of the Typesense models on HuggingFace be appropriate? I'll have an NVIDIA A100 available in a couple of months to do some Typesense work with, but only under the stipulation that I use a locally downloaded LLM (no network calls or API keys).

@jasonbosco (Member) commented May 20, 2024

I misspoke earlier. Turns out that we actually added support for vLLM through which you can run several local LLMs. Just haven't documented it yet.

Will post a link here once we update the docs.
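
Since the vLLM integration was still undocumented at this point, the following is only a sketch of what registering a local model might look like, assuming the config mirrors the OpenAI one with a URL pointing at a local vLLM server; the `vllm/` model-name prefix and the `vllm_url` field are assumptions, not confirmed API:

```ts
// SKETCH ONLY: field names below are assumptions until the official docs land.
const res = await fetch('http://localhost:8108/conversations/models', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-TYPESENSE-API-KEY': 'xyz',
  },
  body: JSON.stringify({
    model_name: 'vllm/meta-llama/Meta-Llama-3-8B-Instruct', // assumed naming scheme
    vllm_url: 'http://localhost:8000',                      // assumed field: your local vLLM server
    system_prompt: 'Answer only from the provided documents.',
    max_bytes: 16384,
  }),
});
```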

@Ku3mi41 commented May 23, 2024

@jasonbosco It's nice to know that this is being done. I already started testing this myself, without documentation (heh), and ran into an authorization problem. Right now the api_key is not passed to vLLM at all; could you add this? When the LLM inference runs on a different server, it's important to have authentication.
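
(For context: vLLM's OpenAI-compatible server can itself require a key when started with `--api-key`, after which it expects a standard bearer header on every request - which is what Typesense would need to send. A sketch of the call shape, with host, token, and model as placeholders:)

```ts
// vLLM started with:
//   python -m vllm.entrypoints.openai.api_server \
//     --model meta-llama/Meta-Llama-3-8B-Instruct --api-key secret-token
const res = await fetch('http://vllm-host:8000/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer secret-token', // the header the api_key setting would need to populate
  },
  body: JSON.stringify({
    model: 'meta-llama/Meta-Llama-3-8B-Instruct',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
```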

@jasonbosco (Member)

CC: @ozanarmagan

@piccaso commented May 30, 2024

> I misspoke earlier. Turns out that we actually added support for vLLM through which you can run several local LLMs. Just haven't documented it yet.
>
> Will post a link here once we update the docs.

That's huge. If you manage to make this easily accessible, it could generate quite a bit of hype.
Looking forward to trying it!
