The error message:
There may be a lot of timeouts when ddfs is queried too often. I have been able to reproduce this issue with the following steps:
1. Start disco.
2. Start a long-running job with a lot of tasks (I use test_50k).
3. Run a lot of ddfs queries concurrently. I use the following loop to do so:
for i in $(seq 100); do
    ddfs chunk test$i ./AUTHORS &
    ddfs xcat test$i &
    ddfs rm test$i &
done
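To put a number on the failures from a burst like the one above, the same commands can be wrapped in a small helper that records each background job's exit status. This is only a sketch: `ddfs_burst` and the `DDFS` variable are hypothetical names, not part of disco, and it assumes that ddfs exits with a non-zero status when a request fails or times out, which is not confirmed here.

```shell
# Hypothetical helper, not part of disco.
# DDFS names the command to run; it defaults to the real ddfs binary.
DDFS="${DDFS:-ddfs}"

# ddfs_burst ROUNDS FILE
# Starts chunk, xcat and rm for test1..testROUNDS in the background,
# then waits for every job and counts the non-zero exit statuses.
ddfs_burst() {
    rounds=$1; file=$2; pids=""; total=0; failed=0
    for i in $(seq "$rounds"); do
        # stdout is discarded so only the summary line is printed;
        # stderr is left alone so ddfs error messages stay visible.
        "$DDFS" chunk "test$i" "$file" >/dev/null & pids="$pids $!"
        "$DDFS" xcat "test$i" >/dev/null & pids="$pids $!"
        "$DDFS" rm "test$i" >/dev/null & pids="$pids $!"
    done
    for pid in $pids; do
        total=$((total + 1))
        # wait PID returns the exit status of that background job
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $total ddfs commands failed"
}
```

Running `ddfs_burst 100 ./AUTHORS` issues the same 300 commands as the loop above. Because the three commands for each tag race with one another, some failures (for example an xcat that runs before its chunk) are expected even on a healthy master, so the count is more useful for comparing runs than as an absolute number.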
As soon as the HTTP requests kick in, the disco master starts to behave oddly. These are some of the behaviors I have seen:
The disco master has been made more resilient by increasing the timeouts for the more important gen_server calls.