🚀 The feature, motivation and pitch
In production scenarios, registering multiple models is a necessary feature: it supports auto-scaling, rolling model updates, and centralized dispatch behind a fixed URL. Previously we used FastChat with vLLM, and it served our purpose well.
But vLLM has been expanding rapidly, adding support for images, video, and other modalities, and its engine arguments keep growing to cover various needs; FastChat's OpenAI-compatible interface seems unable to keep pace with these changes on the vLLM side.
So could we consider hosting something like FastChat's controller feature, where model workers are loosely coupled with the controller, can dynamically join and leave the controller's backend, and the controller chooses the best route for each prompt request?
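To make the proposal concrete, here is a minimal sketch of the controller/worker registration pattern being described. This is a hypothetical illustration (the class, method names, and heartbeat policy are assumptions for this sketch, not FastChat's or vLLM's actual API): workers register themselves and send periodic heartbeats, and the controller routes each request to a live worker for the requested model.

```python
import random
import time

class Controller:
    """Sketch of a FastChat-style controller (hypothetical API, not an
    actual FastChat or vLLM implementation)."""

    def __init__(self, heartbeat_ttl: float = 90.0):
        # Workers that miss heartbeats for this long are considered dead.
        self.heartbeat_ttl = heartbeat_ttl
        # model name -> {worker URL -> timestamp of last heartbeat}
        self.workers: dict[str, dict[str, float]] = {}

    def register(self, model: str, worker_url: str) -> None:
        # A worker announces itself; repeated calls act as heartbeats,
        # so workers can join (and rejoin) dynamically.
        self.workers.setdefault(model, {})[worker_url] = time.monotonic()

    def deregister(self, model: str, worker_url: str) -> None:
        # A worker leaves gracefully, e.g. before a model update.
        self.workers.get(model, {}).pop(worker_url, None)

    def route(self, model: str) -> str:
        # Drop workers whose heartbeat expired, then pick one at random.
        # A real controller could use load-aware policies instead, such
        # as shortest-queue routing.
        now = time.monotonic()
        alive = [url for url, ts in self.workers.get(model, {}).items()
                 if now - ts < self.heartbeat_ttl]
        if not alive:
            raise LookupError(f"no live worker for model {model!r}")
        return random.choice(alive)
```

A fixed-URL API server would then call `route(model)` for each incoming request and proxy it to the returned worker, so clients never need to know which vLLM instances are currently up.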
Alternatives
I'm not sure whether another OpenAI-compatible API server already handles this loosely coupled controller/worker mode well while also keeping pace with vLLM's rapidly changing API.
Additional context
No response