
[Feature Request] Conversational Search (RAG) with a local LLM #1732

Open

elliot-sawyer opened this issue May 15, 2024 · 8 comments

@elliot-sawyer

Description

Is it possible to use Conversational Search (RAG) with a local LLM? The documentation suggests it is only possible with OpenAI and Cloudflare. I was wondering if any of the HuggingFace models could be used with an available GPU instead to avoid making slow network calls.

Metadata

Typesense Version: 26

OS: Linux

@piccaso commented May 17, 2024

It only takes care of the R part of RAG, but yes, custom models and GPU inference are supported.
You also have the option to generate the embeddings yourself and store them.

Check out all the subtopics of this part of the documentation:
https://typesense.org/docs/26.0/api/vector-search.html#index-embeddings
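
As a minimal sketch of what that looks like (the collection name, field names, and connection details below are illustrative; `ts/all-MiniLM-L12-v2` is one of the built-in models Typesense can download and run locally, including on the GPU build):

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
});

// The `embed` block tells Typesense to generate the vector itself at
// index time from the listed fields, using a locally-run model.
await client.collections().create({
  name: 'articles',
  fields: [
    { name: 'title', type: 'string' },
    {
      name: 'embedding',
      type: 'float[]',
      embed: {
        from: ['title'],
        model_config: { model_name: 'ts/all-MiniLM-L12-v2' },
      },
    },
  ],
});

// Alternatively, omit `embed`, declare a plain float[] field with
// `num_dim`, and index vectors you computed with your own model.
```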

@jasonbosco (Member)

@piccaso Typesense does support the "AG" part of RAG, by integrating with ChatGPT / Cloudflare APIs: https://typesense.org/docs/26.0/api/conversational-search-rag.html

@elliot-sawyer We don't yet have a way to integrate with local LLMs. But I'll leave this open as a feature request.
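
For reference, registering one of the currently supported (remote) models goes through the conversations endpoint described in the v26 docs linked above; in the sketch below, the host, API keys, and prompt are placeholders:

```ts
// POST /conversations/models registers an LLM for conversational search.
const res = await fetch('http://localhost:8108/conversations/models', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-TYPESENSE-API-KEY': 'xyz',
  },
  body: JSON.stringify({
    model_name: 'openai/gpt-3.5-turbo',
    api_key: 'OPENAI_API_KEY',
    system_prompt: 'Answer only from the provided documents.',
    max_bytes: 16384,
  }),
});
// The response includes the model id to pass as conversation_model_id
// in subsequent searches with conversation=true.
console.log(await res.json());
```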

@jasonbosco (Member)

May I know which local LLMs you're looking for?

@jasonbosco jasonbosco changed the title Is it possible to use Conversational Search (RAG) with a local LLM? [Feature Request] Conversational Search (RAG) with a local LLM May 17, 2024
@elliot-sawyer (Author)

I don't have a particular one in mind yet - would any of the Typesense models on HuggingFace be appropriate? I'll have an NVIDIA A100 available in a couple of months to do some Typesense work with, but only under the stipulation that I use a locally downloaded LLM (no network calls or API keys).

@jasonbosco (Member) commented May 20, 2024

I misspoke earlier. Turns out that we actually added support for vLLM through which you can run several local LLMs. Just haven't documented it yet.

Will post a link here once we update the docs.
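
Since the vLLM integration was still undocumented at this point, the following is only a sketch of what registering a local model might look like, assuming the config mirrors the OpenAI one with a URL pointing at a local vLLM server; the `vllm/` model-name prefix and the `vllm_url` field are assumptions, not confirmed API:

```ts
// SKETCH ONLY: field names below are assumptions until the official docs land.
const res = await fetch('http://localhost:8108/conversations/models', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-TYPESENSE-API-KEY': 'xyz',
  },
  body: JSON.stringify({
    model_name: 'vllm/meta-llama/Meta-Llama-3-8B-Instruct', // assumed naming scheme
    vllm_url: 'http://localhost:8000',                      // assumed field: your local vLLM server
    system_prompt: 'Answer only from the provided documents.',
    max_bytes: 16384,
  }),
});
```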

@Ku3mi41 commented May 23, 2024

@jasonbosco It's nice to know that this is being done. I already started testing this myself, without documentation (heh), and ran into an authorization problem. Right now the api_key is not passed to vLLM at all; could you add this? When the LLM inference runs on a different server, it's important to have authentication.
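
(For context: vLLM's OpenAI-compatible server can itself require a key when started with `--api-key`, after which it expects a standard bearer header on every request - which is what Typesense would need to send. A sketch of the call shape, with host, token, and model as placeholders:)

```ts
// vLLM started with:
//   python -m vllm.entrypoints.openai.api_server \
//     --model meta-llama/Meta-Llama-3-8B-Instruct --api-key secret-token
const res = await fetch('http://vllm-host:8000/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer secret-token', // the header the api_key setting would need to populate
  },
  body: JSON.stringify({
    model: 'meta-llama/Meta-Llama-3-8B-Instruct',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
```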

@jasonbosco (Member)

CC: @ozanarmagan

@piccaso commented May 30, 2024

> I misspoke earlier. Turns out that we actually added support for vLLM through which you can run several local LLMs. Just haven't documented it yet.
>
> Will post a link here once we update the docs.

That's huge. If you manage to make this easily accessible, it could generate quite a bit of hype.
Looking forward to trying it!
