
Deploying for 1000+ Users #217

Open
orgh0 opened this issue May 28, 2024 · 2 comments

Comments


orgh0 commented May 28, 2024

Hi, thanks for the awesome work on this project!

What would be the best way to get this project to work at scale? I've seen the Docker images released; is deploying with Kubernetes a sustainable solution?

We only need the smallest model, but GPU inference is not an option for us.

Any support would be super helpful.

Contributor

zoq commented May 28, 2024

Great to hear you find the project helpful. To serve multiple users, I would suggest looking into batching, which is on the roadmap but not currently supported. After that, you'll probably want some kind of router/load balancer to forward users to the correct endpoint.
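Cross-user batching along these lines is usually implemented as a small queue in front of the model: collect whatever requests arrive within a short window (or until the batch is full), run them through the model in one forward pass, and hand each caller back its own result. A minimal sketch, where `run_model_batch`, the batch size, and the wait window are all hypothetical placeholders to be swapped for the project's real model call and tuned for your CPU:

```python
import queue
import threading
import time

MAX_BATCH_SIZE = 8       # hypothetical limit; tune for your model/CPU
MAX_WAIT_SECONDS = 0.05  # how long to wait for more requests to arrive

request_queue: "queue.Queue" = queue.Queue()

def run_model_batch(inputs):
    # Placeholder for the real batched model call; here it just echoes.
    return [f"result for {x}" for x in inputs]

def batcher():
    """Collect requests from many users into one batch, run the model once."""
    while True:
        item = request_queue.get()  # block until the first request arrives
        batch = [item]
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        inputs = [data for data, _ in batch]
        results = run_model_batch(inputs)  # one forward pass serves the batch
        for (_, reply), result in zip(batch, results):
            reply.put(result)  # hand each caller its own result

def submit(data):
    """Called per user request; blocks until the batched result is ready."""
    reply: "queue.Queue" = queue.Queue(maxsize=1)
    request_queue.put((data, reply))
    return reply.get()

threading.Thread(target=batcher, daemon=True).start()
```

The trade-off is the wait window: a larger window gives bigger batches and better throughput per core, at the cost of added latency per request.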

Author

orgh0 commented May 29, 2024

@zoq - thanks for the quick response, really appreciate it.

Quick Questions:

  1. Do you have any suggestions on resources for getting started with batching? When you say batching, I'm assuming you're talking about sending batches of data to the model. I'm trying to understand how that helps scale the model to more users on CPU. Do you suggest combining requests from multiple users into one batch? It's hard to imagine batching requests from a single user, although since real-time audio arrives as a continuous stream, there could be something there.
  2. Is there a particular reference architecture for building live inference infrastructure at scale that you have in mind when you talk about a router/load balancer combined with batching in the model? I'm unclear on how you picture batching and request redirection working together.
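One common way the two pieces fit together is sticky routing: the router hashes each user's id to a fixed worker endpoint, so a streaming user's chunks always land on the same worker, and each worker batches across whichever users it owns. A minimal sketch, where the worker addresses are hypothetical (in a Kubernetes deployment they would be your pod/Service addresses):

```python
import hashlib

# Hypothetical worker endpoints; replace with your real pod/Service addresses.
WORKERS = [
    "http://worker-0:9090",
    "http://worker-1:9090",
    "http://worker-2:9090",
]

def route(user_id: str) -> str:
    """Sticky routing: hash the user id so the same user's stream always
    reaches the same worker, which can then batch across its users."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(WORKERS)
    return WORKERS[index]
```

Stickiness matters for streaming because a worker typically holds per-user decoding state; a plain round-robin balancer would scatter one user's chunks across workers.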
