feat: Add vLLM to k8s environment #67

Open · 1 task done · Tracked by #82
mfreeman451 opened this issue Apr 4, 2024 · 0 comments
Labels: enhancement (New feature or request), k8s, LLM


What

We need to use local open models to run inference on conversations and build relationships, since relying on regular expressions alone will not produce accurate results.

A user in our community suggested we not use ollama/llama.cpp and instead check out vLLM (links under "How" below).

Why

We need to be able to feed conversations in, probably on a schedule (perhaps more frequently), and use the model to analyze and identify relationships between users in the logs. Feeding conversations in would most likely just be a query against our existing data set in Neo4j, so this basically fits into the ETL pipeline.
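
As a first cut, the extract step might look like the sketch below. The Bolt URI, credentials, and the `(:User)-[:SENT]->(:Message)` shape are all assumptions until we check them against our actual Neo4j schema.

```python
# Hypothetical "extract" step: pull recent conversation lines out of Neo4j.
# URI, credentials, labels, and properties are placeholders, not decisions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))

def fetch_recent_conversations(since_ts):
    query = """
    MATCH (u:User)-[:SENT]->(m:Message)
    WHERE m.timestamp >= $since
    RETURN u.name AS user, m.text AS text, m.timestamp AS ts
    ORDER BY m.timestamp
    """
    with driver.session() as session:
        return [record.data() for record in session.run(query, since=since_ts)]
```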

How

Install vLLM in the cluster, probably deploying with KServe? (See the links below, and the smoke-test sketch after them.)

https://kserve.github.io/website/latest/modelserving/v1beta1/llm/vllm/
https://docs.vllm.ai/en/latest/index.html
https://docs.vllm.ai/en/latest/serving/deploying_with_kserve.html
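
Whichever deployment route we pick, vLLM serves an OpenAI-compatible HTTP API, so a smoke test from inside the cluster could be as simple as the sketch below. The in-cluster service hostname and the model name are placeholders.

```python
# Smoke test against vLLM's OpenAI-compatible endpoint. The service name
# and model below are assumptions until we actually deploy something.
import requests

resp = requests.post(
    "http://vllm.default.svc.cluster.local/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "messages": [
            {"role": "system", "content": "You extract relationships from chat logs."},
            {"role": "user", "content": "alice: hey bob, thanks for reviewing my patch!"},
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```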

API

We'll need to build a gated API around this unless vLLM has its own auth system. I'm not sure exactly what the inputs will be yet; probably a system_prompt and a query. We'll most likely have to build LangChain into the API so it can perform function calling; otherwise the model will just return unstructured text every time, and we can't rely on that programmatically.
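
Rough shape of the gated wrapper, assuming FastAPI and a shared API key for the gate; both are placeholders, and the LangChain/function-calling layer would eventually replace the raw forwarding shown here.

```python
# Hypothetical gated API: checks an API key, then forwards system_prompt +
# query to the in-cluster vLLM endpoint. Names and URLs are assumptions.
import os

import requests
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

VLLM_URL = "http://vllm.default.svc.cluster.local/v1/chat/completions"
API_KEY = os.environ.get("GATEWAY_API_KEY", "change-me")

app = FastAPI()

class InferenceRequest(BaseModel):
    system_prompt: str
    query: str

@app.post("/infer")
def infer(req: InferenceRequest, x_api_key: str = Header(...)):
    if x_api_key != API_KEY:  # the "gate": reject callers without our key
        raise HTTPException(status_code=401, detail="invalid API key")
    resp = requests.post(
        VLLM_URL,
        json={
            "model": "mistralai/Mistral-7B-Instruct-v0.2",
            "messages": [
                {"role": "system", "content": req.system_prompt},
                {"role": "user", "content": req.query},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return {"answer": resp.json()["choices"][0]["message"]["content"]}
```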

Usage

We might even consider using mage.ai for the processing piece once the vLLM stuff is set up. Since local inference is usually a bit slower, we'll probably do most of our work in batch jobs.
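
For the batch side, the load half of the pipeline might be as small as the sketch below, where the triples are whatever structured (user_a, relation, user_b) tuples we manage to parse out of the model's output; the schema is again hypothetical.

```python
# Hypothetical "load" step: write extracted relationships back to Neo4j.
# Assumes triples like ("alice", "MENTORS", "bob") parsed from model output.
def write_relationships(driver, triples):
    query = """
    MATCH (a:User {name: $a}), (b:User {name: $b})
    MERGE (a)-[r:RELATED_TO {kind: $kind}]->(b)
    """
    with driver.session() as session:
        for a, kind, b in triples:
            session.run(query, a=a, b=b, kind=kind)
```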

Extra Links

https://docs.vllm.ai/en/latest/serving/serving_with_langchain.html
https://python.langchain.com/docs/integrations/llms/vllm/
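
Going by the LangChain docs linked above, pointing LangChain at the in-cluster server would look roughly like this (hostname and model name are the same placeholders as before):

```python
# LangChain talking to vLLM's OpenAI-compatible server, per the docs above.
from langchain_community.llms import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_key="EMPTY",  # vLLM only enforces a key if started with --api-key
    openai_api_base="http://vllm.default.svc.cluster.local/v1",
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
)
print(llm.invoke("Summarize the relationship between alice and bob."))
```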
