What is the current behavior?
Hang during multi-worker crash + recovery. See the master-crasher.sh command below to reproduce.
Last messages from each worker that match the regexp --|~~|RECOVERY|Recovery|recovery are:
initializer:
1580343036.556997,UNEXPECTED CALL to add_expected_boundary_count on recovery reconnector phase Not Recovery Reconnecting Phase. Ignoring!
worker1:
1580343036.555562,UNEXPECTED CALL to add_expected_boundary_count on recovery reconnector phase Not Recovery Reconnecting Phase. Ignoring!
worker2:
1580343036.768260,|~~ - Recovery initiated at worker5. Ceding control. - ~~|
1580343036.768268,_RecoveryPhase transition to _NotRecovering
1580343036.768335,Sent control message to worker5: AckRecoveryInitiatedMsg
1580343036.768343,Recovering worker: Skipping Phase III
1580343036.768347,|~~ INIT PHASE IV: Cluster is ready to work! ~~|
worker3:
1580343036.558269,UNEXPECTED CALL to add_expected_boundary_count on recovery reconnector phase Not Recovery Reconnecting Phase. Ignoring!
worker4:
1580343036.571309,UNEXPECTED CALL to add_expected_boundary_count on recovery reconnector phase Not Recovery Reconnecting Phase. Ignoring!
worker5:
1580343036.771454,Received msg on Control Channel: AckRecoveryInitiatedMsg
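A per-worker summary like the one above can be pulled out of the logs with grep and tail. The following is only a minimal sketch: the function name last_matches, the default of 5 lines, and the ./logs/*.log paths in the usage comment are hypothetical and do not come from the Wallaroo scripts.

```shell
#!/bin/sh
# Print the last N lines of a log file that match the recovery regexp.
# Usage: last_matches LOGFILE [N]   (N defaults to 5; hypothetical helper)
last_matches() {
    logfile=$1
    count=${2:-5}
    # -E enables extended regexps, so "|" means alternation.
    # -e stops grep from parsing the leading "--" of the pattern as an option.
    grep -E -e '--|~~|RECOVERY|Recovery|recovery' "$logfile" | tail -n "$count"
}

# Example (assumed log location, one file per worker):
#   for f in ./logs/*.log; do echo "$f:"; last_matches "$f"; done
```

Passing the pattern through -e matters here, because a pattern that starts with "--" would otherwise be read as the end-of-options marker.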
What is the expected behavior?
Successful restart after multi-worker crash & restart
What OS and version of Wallaroo are you using?
Ubuntu 16.04 LTS/Xenial and master branch @ commit master
Steps to reproduce?
cd /path/to/wallaroo/testing/correctness/scripts/effectively-once
make -C ../../../.. resilience=on PONYCFLAGS="--verbose=1 -d -Dtrace -Dcheckpoint_trace -Didentify_routing_ids" build-examples-pony-passthrough build-testing-tools-external_sender
./master-crasher.sh 6 crash5 crash{0,1,2,3,4}.slow crash-sink no-ack-progress
I see this hang only about once every 1-3 hours, so it's rare. If you're short on disk space, I recommend running the following loop, which keeps only the 20 newest /tmp/wallaroo.*gz log files and prints the free space in /tmp every 15 seconds.
while true; do printf '%s ' "$(date)"; df -h /tmp; ls -t /tmp/wallaroo.*gz | sed 1,20d | xargs rm -f; sleep 15; done
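The pruning step of that loop can also be written as a small function, which makes the directory and the number of files to keep explicit. This is a sketch under assumptions: the name prune_logs and its two parameters are hypothetical, and it relies on the log file names containing no whitespace.

```shell
#!/bin/sh
# Delete all but the newest KEEP files matching DIR/wallaroo.*gz.
# Usage: prune_logs [DIR] [KEEP]   (defaults: /tmp and 20; KEEP must be >= 1)
prune_logs() {
    dir=${1:-/tmp}
    keep=${2:-20}
    # ls -t lists the newest files first; sed drops the first $keep names,
    # so only the older files are passed on to rm.
    ls -t "$dir"/wallaroo.*gz 2>/dev/null | sed "1,${keep}d" | xargs rm -f
}

# Example: keep the 20 newest logs in /tmp, checking every 15 seconds.
#   while true; do prune_logs /tmp 20; sleep 15; done
```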
Is this a bug, feature request, or feedback?
Bug
Full logs for this hang are available at http://wallaroolabs-dev.s3.amazonaws.com/logs/logs.1580344129.tar.gz