Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong order state after multiple connection reset errors #464

Closed
flxbe opened this issue Jul 10, 2021 · 11 comments
Closed

Wrong order state after multiple connection reset errors #464

flxbe opened this issue Jul 10, 2021 · 11 comments

Comments

@flxbe
Copy link
Contributor

flxbe commented Jul 10, 2021

Hi!

I sometimes get multiple ConnectionResetErrors when trying to cancel an order, even when retrying. After the last failed attempt, the order state is not properly reset, but remains CANCELLING.

I looked at the code to execute the cancel operation:
https://github.com/liampauling/flumine/blob/4da58825ae529ddde2b3bfee2a434b20ed41ec3c/flumine/execution/betfairexecution.py#L55

The actual exception is handled here:
https://github.com/liampauling/flumine/blob/4da58825ae529ddde2b3bfee2a434b20ed41ec3c/flumine/execution/betfairexecution.py#L244

However, after the last attempt, there is simply no new retry and the order is not updated again to reflect the failed request.
Maybe the exception handler should re-raise the error when not retrying the operation. Then, the execute_cancel method could catch the error and set the order state back to EXECUTABLE, when the state is still CANCELLING (to mostly avoid the current problem with race conditions in the order state processing).

Also: I noticed flumine is using its own session pool instead of relying on the connection pool of the session itself. Why is that?

@liampauling
Copy link
Member

Its probably overkill but with requests.Session not being thread safe (apparently) flumine has its own connection pool so that each thread in the executor can have its own session. We then have some logic to try and keep these sessions/connections active to reduce latency. Tbh this is probably overkill and we should do some tests on the feasibility of just a single shared session.

I will look into the issue now, should be a quick fix.

@flxbe
Copy link
Contributor Author

flxbe commented Jul 12, 2021

Regarding the thread safety of requests.Session: they are using urllib3 under the hood for the connection pooling, which is pretty mature AFAIK. They even put thread-safety as their first goal of the library. Do you have a concrete example for the session not being thread-safe? If so, we should probably upstream the problem to urllib3 for them to fix it.

@liampauling
Copy link
Member

Thoughts?

Screenshot 2021-07-12 at 09 23 16

@flxbe
Copy link
Contributor Author

flxbe commented Jul 12, 2021

https://stackoverflow.com/questions/18188044/is-the-session-object-from-pythons-requests-library-thread-safe

I had a look into some issues of urllib3 and requests:
urllib3/urllib3#1252
psf/requests#2766

I must say I am almost horrified by these issues. Seems pretty bold to me to put "thread-safety" as their first bullet point into the description of urllib3.

Regarding the code change: Personally, I think it is not very intuitive that calling retry potentially changes the order status. I would prefer to explicitly call reset_orders in the _execution_helper after retry returns False. I see however that this is only a cosmetic issue, so otherwise lgtm. Thanks for the fast fix!

@flxbe
Copy link
Contributor Author

flxbe commented Jul 12, 2021

Another thought on the flumine implementation of the session pool: Currently, pop and append are used to manage the pool of active sessions, creating a stack of sessions. This implementation could lead to problems with older sessions only being reused after all "fresher" sessions are already in use, increasing the probability of reusing a dropped connection eventually. Maybe a queue-like behavior (by using pop(0))would reduce connection errors, since flumine always reuses the oldest available session. What do you think?

@flxbe
Copy link
Contributor Author

flxbe commented Jul 12, 2021

https://stackoverflow.com/questions/18188044/is-the-session-object-from-pythons-requests-library-thread-safe

I had a look into some issues of urllib3 and requests:
urllib3/urllib3#1252
psf/requests#2766

I must say I am almost horrified by these issues. Seems pretty bold to me to put "thread-safety" as their first bullet point into the description of urllib3.

Just to clarify: I had a longer look at the issues and it seems like the ConnectionPool of urllib3 is thread-safe, the requests.Session object is not.

Maybe directly using urllib3 could be a potential solution to drop the custom session pool. The retries could even be handled via urllib3 directly: https://urllib3.readthedocs.io/en/stable/reference/urllib3.connectionpool.html#urllib3.HTTPConnectionPool.urlopen

@liampauling
Copy link
Member

Regarding the code change: Personally, I think it is not very intuitive that calling retry potentially changes the order status. I would prefer to explicitly call reset_orders in the _execution_helper after retry returns False. I see however that this is only a cosmetic issue, so otherwise lgtm. Thanks for the fast fix!

Agreed, updated to handle within the helper.

@liampauling
Copy link
Member

Another thought on the flumine implementation of the session pool: Currently, pop and append are used to manage the pool of active sessions, creating a stack of sessions. This implementation could lead to problems with older sessions only being reused after all "fresher" sessions are already in use, increasing the probability of reusing a dropped connection eventually. Maybe a queue-like behavior (by using pop(0))would reduce connection errors, since flumine always reuses the oldest available session. What do you think?

Agreed, updated to pop(0), should be easy to see if error rate decreases.

@liampauling
Copy link
Member

https://stackoverflow.com/questions/18188044/is-the-session-object-from-pythons-requests-library-thread-safe

I had a look into some issues of urllib3 and requests:
urllib3/urllib3#1252
psf/requests#2766
I must say I am almost horrified by these issues. Seems pretty bold to me to put "thread-safety" as their first bullet point into the description of urllib3.

Just to clarify: I had a longer look at the issues and it seems like the ConnectionPool of urllib3 is thread-safe, the requests.Session object is not.

Maybe directly using urllib3 could be a potential solution to drop the custom session pool. The retries could even be handled via urllib3 directly: https://urllib3.readthedocs.io/en/stable/reference/urllib3.connectionpool.html#urllib3.HTTPConnectionPool.urlopen

It's a bit of a mess, happy to try some tests using urllib3, keen to keep it simple though and if we can get away with a single session maybe we should try that first?

@flxbe
Copy link
Contributor Author

flxbe commented Jul 12, 2021

Just using a simple, shared requests.Session is I think the optimal solution, as long as it does not cause any problems. I agree that using urllib3 directly would increase complexity and should only be a fallback solution.

If the change to the flumine session pool already fixes or significantly reduces the the connection errors, I also see no cause for immediate action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants