Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: async timeout error on proxy #1278

Closed
krrishdholakia opened this issue Dec 30, 2023 · 11 comments
Closed

[Bug]: async timeout error on proxy #1278

krrishdholakia opened this issue Dec 30, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@krrishdholakia
Copy link
Contributor

What happened?

Hey, it seems after a few hours, the python server starts timing out for every call, then when it's restarted, it's fine but then starts timing out all the calls again.

Relevant log output

ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err
ERROR    raise APITimeoutError(request=request) from err

Twitter / LinkedIn details

No response

@krrishdholakia krrishdholakia added the bug Something isn't working label Dec 30, 2023
@krrishdholakia
Copy link
Contributor Author

User is using:

  • litellm proxy
  • azure chat completions

@ishaan-jaff
Copy link
Contributor

we have a deployed instance of the proxy here, we can run load testing against it: https://litellm-api.up.railway.app/ and try to repro

@ishaan-jaff
Copy link
Contributor

seems related to this: openai/openai-python#821

I suspect it has something to do with the Azure/OpenAI client Initialization

@krrishdholakia
Copy link
Contributor Author

"It's like consistently around 4 hours" - user

"[7:51 AM, 12/30/2023] Okay happened, again, checking logs
[7:53 AM, 12/30/2023] No errors
[7:53 AM, 12/30/2023] Only warning is

WARNINGWarning: the config option 'server.enableCORS=false' is not compatible with 'server.enableXsrfProtection=true'.
[7:54 AM, 12/30/2023] any subsequent calls to gpt-3.5-turbo (azure) hung. calls to gpt-4 (openai) were fine
[7:55 AM, 12/30/2023] After restart, all is good"

@krrishdholakia
Copy link
Contributor Author

Pretty sure it's an azure-openai client issue.

@ishaan-jaff open to thoughts on how to repro this within a shorter timeperiod. Can play around with min timeouts, and see how to better test this.

@krrishdholakia
Copy link
Contributor Author

This seems pretty similar to what we're facing: openai/openai-python#769 (comment)
Screenshot 2023-12-30 at 9 55 10 AM

@krrishdholakia
Copy link
Contributor Author

krrishdholakia commented Dec 30, 2023

Seems like this might be related to connections not being properly closed, I don't think bumping the number is a good solution, as it'll just show up a little later on -

Screenshot 2023-12-30 at 10 00 09 AM Screenshot 2023-12-30 at 10 01 06 AM

@krrishdholakia
Copy link
Contributor Author

Another relevant issue - openai/openai-python#874 (comment)

@krrishdholakia
Copy link
Contributor Author

Seems like this - #1270, could be a solution for the async timeout problem.

We can just cache clients for ~1hr, and reinitialize them if they're invalid.

@krrishdholakia
Copy link
Contributor Author

krrishdholakia commented Dec 30, 2023

have this be a controllable param, for users who are seeing higher traffic, and need to re-initalize clients more frequently (there's obviously a trade-off here).

This is also a patch, until openai has a more permanent fix out for how they handle connections (seems to be some sort of underlying bug with connections not being closed properly).

@krrishdholakia
Copy link
Contributor Author

this is now added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants