
Deploying for 1000+ Users #217

Open
orgh0 opened this issue May 28, 2024 · 2 comments

Comments


orgh0 commented May 28, 2024

Hi, thanks for the awesome work on this project!

What would be the best way to get this project to work at scale? I've seen the Docker images released; is deploying with Kubernetes a sustainable solution?

We only need the smallest model, but GPU inference is not an option for us.

Any support would be super helpful.

Contributor

zoq commented May 28, 2024

Great to hear you find the project helpful. To serve multiple users, I would suggest looking into batching, which is on the roadmap but not currently supported. After that, you'll probably want some kind of router/load balancer to forward users to the correct endpoint.
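Cross-user batching along these lines is usually implemented as a small queue in front of the model: collect whatever requests arrive within a short window (or until the batch is full), run them through the model in one forward pass, and hand each caller back its own result. A minimal sketch, where `run_model_batch`, the batch size, and the wait window are all hypothetical placeholders to be swapped for the project's real model call and tuned for your CPU:

```python
import queue
import threading
import time

MAX_BATCH_SIZE = 8       # hypothetical limit; tune for your model/CPU
MAX_WAIT_SECONDS = 0.05  # how long to wait for more requests to arrive

request_queue: "queue.Queue" = queue.Queue()

def run_model_batch(inputs):
    # Placeholder for the real batched model call; here it just echoes.
    return [f"result for {x}" for x in inputs]

def batcher():
    """Collect requests from many users into one batch, run the model once."""
    while True:
        item = request_queue.get()  # block until the first request arrives
        batch = [item]
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        inputs = [data for data, _ in batch]
        results = run_model_batch(inputs)  # one forward pass serves the batch
        for (_, reply), result in zip(batch, results):
            reply.put(result)  # hand each caller its own result

def submit(data):
    """Called per user request; blocks until the batched result is ready."""
    reply: "queue.Queue" = queue.Queue(maxsize=1)
    request_queue.put((data, reply))
    return reply.get()

threading.Thread(target=batcher, daemon=True).start()
```

The trade-off is the wait window: a larger window gives bigger batches and better throughput per core, at the cost of added latency per request.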

Author

orgh0 commented May 29, 2024

@zoq - thanks for the quick response, really appreciate it.

Quick Questions:

  1. Do you have any suggestions on resources for getting started with batching? When you say batching, I'm assuming you're talking about sending batches of data to the model. I'm trying to understand how that helps scale the model to more users on CPU. Do you suggest combining requests from multiple users into one batch? It's hard to imagine batching requests from a single user, although since real-time audio arrives as a continuous stream, there could be something there.
  2. Is there a particular reference architecture for building live inference infrastructure at scale that you have in mind when you talk about a router/load balancer combined with batching in the model? I'm unclear on how you picture batching and request redirection working together.
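One common way the two pieces fit together is sticky routing: the router hashes each user's id to a fixed worker endpoint, so a streaming user's chunks always land on the same worker, and each worker batches across whichever users it owns. A minimal sketch, where the worker addresses are hypothetical (in a Kubernetes deployment they would be your pod/Service addresses):

```python
import hashlib

# Hypothetical worker endpoints; replace with your real pod/Service addresses.
WORKERS = [
    "http://worker-0:9090",
    "http://worker-1:9090",
    "http://worker-2:9090",
]

def route(user_id: str) -> str:
    """Sticky routing: hash the user id so the same user's stream always
    reaches the same worker, which can then batch across its users."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(WORKERS)
    return WORKERS[index]
```

Stickiness matters for streaming because a worker typically holds per-user decoding state; a plain round-robin balancer would scatter one user's chunks across workers.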
