get_tx_details_by_hash futures are deadlocked when run on same executor. #592
Comments
Tough cookie. I remember seeing this article - https://tokio.rs/blog/2019-10-scheduler/ - maybe it'll shed some light on the architecture of tokio executors. Just my two cents. P.S. It would also be interesting to see whether it works fine under a single async-std executor.
I will check it out this week, thanks!
Well, it turns out to be much easier, there was a couple of blocking |
i did 100 RICK/MORTY swaps between my home node and a VPS, both electrum |
@cipig thanks for testing! Fix released.
I've received a report from cipi that some of his swaps failed with `future timed out` errors even under perfect network conditions. Cipi provided a tcpdump that contained successful Electrum request and response records, but MM2 didn't process them for some reason.
I was able to reproduce the error in my local environment using this test: https://github.com/KomodoPlatform/atomicDEX-API/blob/mm2-electrum-deadlock-fix/mm2src/coins/utxo/utxo_tests.rs#L540. It runs 2 requests in an async loop and spawns them on the shared executor where TCP processing also runs. It typically deadlocks in less than 1 minute.
At first I thought the std `Mutex` was at fault, but refactoring the lock to an async version didn't solve the issue. I also stopped storing responses in a shared `HashMap` during response processing; the old code checked for the response every 200 ms. Responses are now delivered over async oneshot channels.
Refactoring response processing to channels didn't solve the problem either.
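For readers following along, the move from polling a shared map to per-request channels can be sketched roughly as below. This is only an illustration under stated assumptions: it uses `std::sync::mpsc` and a plain thread in place of async oneshot channels and the real TCP reader, and `PendingResponses`, `register`, `deliver` and `request_with_timeout` are hypothetical names, not taken from the MM2 code.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical registry: maps a request id to the channel waiting for its response.
#[derive(Clone, Default)]
pub struct PendingResponses {
    inner: Arc<Mutex<HashMap<u64, Sender<String>>>>,
}

impl PendingResponses {
    /// Registers a request and returns the receiving end for its response.
    pub fn register(&self, id: u64) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.inner.lock().unwrap().insert(id, tx);
        rx
    }

    /// Called by the response-processing side.
    /// Returns false if nobody is waiting for this id.
    pub fn deliver(&self, id: u64, response: String) -> bool {
        // The guard is dropped at the end of this statement,
        // so the lock is not held while sending.
        let sender = self.inner.lock().unwrap().remove(&id);
        match sender {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}

/// Registers request `id`, lets a stand-in "reader" thread deliver the
/// response, and waits for it without polling the map.
pub fn request_with_timeout(
    pending: &PendingResponses,
    id: u64,
    timeout: Duration,
) -> Option<String> {
    let rx = pending.register(id);

    // Stand-in for the socket reader that would parse a real response.
    let reader = pending.clone();
    let handle = thread::spawn(move || {
        reader.deliver(id, format!("response to request {}", id));
    });

    let result = rx.recv_timeout(timeout).ok();
    handle.join().unwrap();
    result
}
```

Calling `request_with_timeout(&PendingResponses::default(), 7, Duration::from_secs(1))` hands the response straight to the waiting request. The waiter is woken by the send itself, so there is no 200 ms polling interval.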
Finally, I tried running the TCP processing futures on a separate executor: https://github.com/KomodoPlatform/atomicDEX-API/blob/mm2-electrum-deadlock-fix/mm2src/coins/utxo/rpc_clients.rs#L1334. This change fixed the deadlock, and cipi also reported that it's fixed on his side.
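One way a shared executor can stall like this, and why a separate executor helps, is sketched below. It is a toy model and not the MM2 or tokio implementation: `Worker` is a hypothetical single-threaded job queue built on std threads, and the "request" is assumed to block its worker while waiting for a response. Whether this is the actual cause here is not established by the observations above.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Toy single-threaded "executor": runs queued jobs one at a time, in order.
pub struct Worker {
    queue: Sender<Job>,
}

impl Worker {
    pub fn new() -> Self {
        let (queue, jobs) = mpsc::channel::<Job>();
        // The loop ends once the Worker (and with it the Sender) is dropped.
        thread::spawn(move || {
            for job in jobs {
                job();
            }
        });
        Worker { queue }
    }

    pub fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.queue.send(Box::new(job)).unwrap();
    }
}

/// Runs a blocking "request" on `request_worker` and the "response
/// processing" on `io_worker`. Returns true if the request received its
/// response before `timeout`.
pub fn run_request(request_worker: &Worker, io_worker: &Worker, timeout: Duration) -> bool {
    let (response_tx, response_rx) = mpsc::channel::<&'static str>();
    let (done_tx, done_rx) = mpsc::channel::<bool>();

    // The request blocks its worker thread until a response arrives or the
    // timeout expires.
    request_worker.spawn(move || {
        let got_response = response_rx.recv_timeout(timeout).is_ok();
        let _ = done_tx.send(got_response);
    });

    // Response processing is queued after the request. On a shared worker it
    // cannot start until the blocked request gives up.
    io_worker.spawn(move || {
        let _ = response_tx.send("response");
    });

    done_rx.recv().unwrap()
}
```

With a single shared worker, `run_request(&shared, &shared, timeout)` returns `false`: the request times out because the job that would deliver its response is stuck behind it in the same queue. With two workers, `run_request(&requests, &io, timeout)` returns `true`, which mirrors the effect of moving TCP processing to its own executor.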
A heuristic fix like this is OK as a hotfix, but I'd like to research the root cause further. This issue was created to share ideas and track the status of that research.