Description
The Feature
In general, the OpenAI models in Azure (and, I suspect, other models elsewhere) get progressively slower as the size of the request increases. Combine that with how Azure behaves when its backend is overloaded (requests hang), and it is sometimes challenging to set the proper timeout for a model. If the timeout is too low, callers may miss responses that were being handled fine but were taking a while; if it is too high, callers unnecessarily wait on requests that will never come back.
This RFE is about supporting something like this in the proxy:
```yaml
general_settings:
  custom_timeout_setter: timeout.my_timeout_setter
```
The custom timeout setter function would take the outgoing request as input and return a timeout value, or None if the timeouts defined in config.yaml should be used.

This would allow us to set lower timeouts on small requests and higher timeouts on larger requests, using our telemetry data to choose appropriate thresholds.
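To illustrate, here is a minimal sketch of what such a setter could look like. This hook does not exist in LiteLLM today, so everything here is an assumption: the function name, the request shape (an OpenAI-style dict with a `messages` list), and the size thresholds are all hypothetical placeholders for values that would come from telemetry.

```python
from typing import Optional


def estimate_prompt_chars(request: dict) -> int:
    """Rough request-size proxy: total characters across message contents.

    Assumes an OpenAI-style request dict with a "messages" list;
    a real implementation might count tokens instead.
    """
    return sum(len(str(m.get("content", ""))) for m in request.get("messages", []))


def my_timeout_setter(request: dict) -> Optional[float]:
    """Hypothetical custom timeout setter.

    Returns a timeout in seconds, or None to fall back to the
    timeouts defined in config.yaml. Thresholds are made-up examples.
    """
    size = estimate_prompt_chars(request)
    if size < 2_000:
        return 15.0   # small prompts should come back quickly
    if size < 20_000:
        return 60.0   # medium prompts get more headroom
    return None       # very large requests: defer to configured timeouts
```

The key design point is the `None` escape hatch: the setter only overrides the configured timeouts when it has enough signal to do so.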
Motivation, pitch
Better handling of current model performance behavior (in Azure)
LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?
No