Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_tx_details_by_hash futures are deadlocked when run on same executor. #592

Closed
2 tasks done
artemii235 opened this issue Feb 29, 2020 · 5 comments
Closed
2 tasks done
Assignees

Comments

@artemii235
Copy link
Member

artemii235 commented Feb 29, 2020

I've received a report from cipi that some of his swaps failed with future timed out errors when there're perfect network conditions.
Cipi provided a tcpdump that contained successful electrum request and response records but MM2 didn't process them for some reason.

I was able to recreate the error on my local environment using this test: https://github.com/KomodoPlatform/atomicDEX-API/blob/mm2-electrum-deadlock-fix/mm2src/coins/utxo/utxo_tests.rs#L540. It runs 2 requests in async loop and spawns them on shared executor where TCP processing also runs. It typically dead locks in less than 1 minute.

At the beginning I thought that it's std Mutex fault, but refactoring the lock to async version didn't solve the issue. I also stopped storing responses in shared Hashmap in response processing, it checked the response existence every 200 ms. Now it works on top of async oneshot channels.
Response processing refactoring to channels also didn't solve the problem.

After all I tried to run the TCP processing futures on another executor: https://github.com/KomodoPlatform/atomicDEX-API/blob/mm2-electrum-deadlock-fix/mm2src/coins/utxo/rpc_clients.rs#L1334. This change fixed the dead lock, cipi also reported that it's fixed on his side.

Heuristic method is ok as hot fix, but I'd like to research it more, this issue is created to share ideas and track the status of research.

  • look for tokio issues mentioning TCP handling and dead locks.
  • try to run the dead lock test in debugger and see if some threads are locked with involved code path.
@artemii235 artemii235 self-assigned this Feb 29, 2020
@ArtemGr
Copy link

ArtemGr commented Feb 29, 2020

Tough cookie. I remember seeing this article - https://tokio.rs/blog/2019-10-scheduler/ - maybe it'll shed some line on the architecture of tokio executors. Just my two cents.

P.S. Would also be interesting whether it works just fine under a single async-std executor.
BTW, I'm experimenting with migrating some of my code to async-std, and they have some interesting blocking execution support, cf. https://async.rs/blog/stop-worrying-about-blocking-the-new-async-std-runtime/, async-rs/async-std#631

@artemii235
Copy link
Member Author

I will check it out on this week, thanks!

@artemii235
Copy link
Member Author

Well, it turns out to be much easier, there was a couple of blocking .wait() calls in async tx_details_by_hash function: e373916#diff-03e28973db1da355a83fccb7d7c53933L1623
The test runs fine after I replaced them with .await.
@cipig Could you please pull the latest changes from mm2-electrum-deadlock-fix and retest with several concurrent swaps?

@cipig
Copy link
Member

cipig commented Mar 4, 2020

i did 100 RICK/MORTY swaps between my home node and a VPS, both electrum
there were more than 40 swaps in parallel
success rate was 100% and not a single error or timeout

@artemii235 artemii235 changed the title Electrum handling futures are deadlocked when run on same executor. get_tx_details_by_hash futures are deadlocked when run on same executor. Mar 5, 2020
artemii235 added a commit that referenced this issue Mar 5, 2020
#592 (#596)

* WIP. Using different executors for Electrum connect loop and requests prevents deadlock.

* Use .await instead of blocking .wait() in utxo get_tx_details_by_hash.
@artemii235
Copy link
Member Author

@cipig thanks for testing! Fix released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants