Since I added a fast node to an existing cluster, the nightly apply-updates procedure blocks during update-core.
● redis.service - Core Redis DB
Loaded: loaded (/etc/systemd/system/redis.service; enabled; preset: disabled)
Active: active (running) since Tue 2024-02-20 02:45:50 CET; 13h ago
Feb 20 02:45:49.688798 ns8n5 agent@node[11056]: Running /var/lib/nethserver/node/update-core.d/95cleanup_images...
Feb 20 02:45:49.822616 ns8n5 agent@node[11056]: Failed to publish the action status on channel progress/node/15/task/9d6cec45-13e1-4799-a481-846ed0eb3469
Feb 20 02:45:49.822864 ns8n5 agent@node[11056]: task/node/15/9d6cec45-13e1-4799-a481-846ed0eb3469: update-core/95cleanup_images is starting
Feb 20 02:45:49.900384 ns8n5 agent@node[11056]: Failed to publish the action status on channel progress/node/15/task/9d6cec45-13e1-4799-a481-846ed0eb3469
Feb 20 02:45:49.900954 ns8n5 agent@node[11056]: Redis command failed: dial tcp 10.5.4.5:6379: connect: connection refused
Feb 20 02:45:49.900954 ns8n5 agent@node[11056]: task/node/15/9d6cec45-13e1-4799-a481-846ed0eb3469: action "update-core" status is "completed" (0) at step 95cleanup_images
From the log trace, there is no retry attempt to write the task output to Redis: after Redis is restarted, the node agent running on the fast node fails to publish its update-core exit status. As a result, the task outcome is never found by the controlling task running on the cluster leader, and the whole action blocks.
Steps to reproduce
Install NS8.
Define the check-bug-6854 action and run it with api-cli.
To define such an action:
mkdir /var/lib/nethserver/cluster/actions/check-bug-6854
vi /var/lib/nethserver/cluster/actions/check-bug-6854/10restart_redis
chmod +x /var/lib/nethserver/cluster/actions/check-bug-6854/10restart_redis
In 10restart_redis:
#!/bin/bash
systemctl stop redis
systemctl start redis --no-block
Expected results
The action terminates.
Actual results
The action is blocked until I manually create a fake task exit status with MPUT.
Fix proposal
During Redis restarts, the default go-redis library retry settings may not suffice.
Increase the retry period of our agent.
Components