Mitogen intermittent hangs on "Connection timed out" target #598
First, thanks for the awesome project! I got an 11x performance improvement after switching to Mitogen. However, I am facing the problem below and am wondering if you could take a look.
Submit ansible playbook jobs in a for loop 20 times (96 hosts)
and it will intermittently hang on (Observed output)
If it doesn't hang it looks like below (Expected output)
if I take out
Below is the content of the ping.yml playbook
More log output with "MITOGEN_DUMP_THREAD_STACKS=10"
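For anyone trying to reproduce this, the loop described above can be sketched as a small harness that reruns a command a fixed number of times and treats any run exceeding a timeout as a hang. This is only a rough sketch, not the script used in this report: `run_once`, `soak`, the timeout value, and the example command line in the note below are all assumptions made for illustration.

```python
import os
import subprocess
import sys
from collections import Counter


def run_once(cmd, timeout_s, extra_env=None):
    """Run cmd once and classify the result as 'ok', 'failed', or 'hung'.

    A run counts as 'hung' when it does not finish within timeout_s seconds.
    (Helper name and classification labels are hypothetical.)
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        proc = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def soak(cmd, runs, timeout_s, extra_env=None):
    """Repeat run_once `runs` times and tally the outcomes."""
    return Counter(run_once(cmd, timeout_s, extra_env) for _ in range(runs))
```

Assuming an inventory file named `hosts` (a placeholder), a call such as `soak(["ansible-playbook", "-i", "hosts", "ping.yml"], runs=20, timeout_s=600, extra_env={"MITOGEN_DUMP_THREAD_STACKS": "10"})` would mirror the 20-iteration loop with thread stack dumps enabled. Note that killing only the top-level process on timeout may leave child processes behind.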
Sorry for the late acknowledgement, I've been busy elsewhere. :) Thanks for an amazing bug report! It sounds like you might be hitting a deadlock early during startup.
I will aim to set up a reproduction 'real soon'. The current master branch needs a soak test before release, so I will try running it against 100 nodes to see if we can flush this one out.
There are some forking-related deadlocks on current master that might explain this. I have not changed anything relating to forking in the recent releases, and the last soak I did was fine.
How is the quality of the network? I don't think it is a network issue, but it is worth asking just in case.
I reproduced your issue using 96 Google Cloud nodes ( https://github.com/dw/mitogen/blob/master/tests/ansible/gcloud/mitogen-load-testing.tf ); 0.2.7 fails very quickly.
I have found and fixed two deadlocks: one during startup in the target that looks like it could have impacted a lot of people ( 769a8b2 ), and one in the master ( f78a5f0 ). After 120 runs (11,520 connections) I can no longer reproduce your issue.
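As a quick sanity check on the connection count quoted above, here is a tiny sketch. It assumes each run opens exactly one connection per host, which is an inference from the numbers rather than something stated in this thread; `total_connections` is a hypothetical helper.

```python
def total_connections(runs, hosts):
    """Connections made over a soak test, assuming one connection per host per run."""
    return runs * hosts


# 120 runs against 96 hosts
print(total_connections(120, 96))  # 11520
```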
Thanks for reporting this, and apologies for the delay in responding.
This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/