Network failure with geographical redundancy breaks barman cron #202

Closed
martinmarques opened this issue Mar 16, 2019 · 1 comment

commented Mar 16, 2019

When geographical redundancy is set up between two Barman servers and the connection between the two nodes fails, every subsequent barman cron run crashes with an exception.

Skipping the node that the Barman server cannot reach is fine, but the exception is not caught, so none of the remaining PostgreSQL servers are processed.

[barman@endor ~]$ barman cron
Skipping inactive server 'endor-11'
WARNING: sync-info is out of sync. Self-recovery procedure started: requesting full synchronisation from primary server alderaan-10
EXCEPTION: sync-info execution on remote primary server alderaan-10 failed: ssh: connect to host 192.168.0.101 port 22: No route to host

See log file for more details.
[barman@endor ~]$ barman list-server
alderaan-10 - Example of a Barman passive server (Passive)
endor-10 - Endor Postgres Database (Streaming-Only)
endor-11 - Endor PG11 Database (Streaming-Only) (inactive)
endor-9.6 - Endor Postgres 9.6 Database

commented Mar 16, 2019

This patch fixed the bug:

$ git diff
diff --git a/barman/cli.py b/barman/cli.py
index d559259..547b6e8 100644
--- a/barman/cli.py
+++ b/barman/cli.py
@@ -141,7 +141,10 @@ def cron():
         # server is None and to report inactive and disabled servers,
         # but here we have only active and well configured servers.
 
-        server.cron()
+        try:
+            server.cron()
+        except:
+            output.error("Unable to run cron on server %s" % name)
 
     output.close_and_exit()
 

@mnencia mnencia closed this in 6bacce9 Mar 20, 2019

mnencia added a commit that referenced this issue Mar 20, 2019

Make `barman cron` resilient to unhandled exceptions
Before this patch, any unhandled exception raised while handling
a server would terminate the cron, preventing any other
maintenance operation required by subsequent servers.

Now the exception is logged and the cron continues with any
remaining servers.

Thanks to Martín Marqués for the analysis.

Closes: #202

Signed-off-by: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@2ndQuadrant.it>
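
For readers who want to see the per-server isolation described in the commit message in a form they can run, here is a minimal sketch. It is not Barman's actual code: `FakeServer`, `run_cron`, and the logger name are hypothetical names made up for illustration, and the failing server just raises a `RuntimeError` to stand in for the SSH failure. The sketch catches `Exception` rather than using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` still propagate.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("cron_sketch")


class FakeServer:
    """Hypothetical stand-in for a configured server (not Barman's class)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def cron(self):
        # Simulate the unreachable passive node from the report.
        if self.fail:
            raise RuntimeError("sync-info failed: No route to host")


def run_cron(servers):
    """Run cron on every server, isolating failures per server.

    Returns a (completed, failed) pair of lists of server names.
    """
    completed, failed = [], []
    for name, server in sorted(servers.items()):
        try:
            server.cron()
        except Exception:
            # Log the traceback and move on to the next server.
            # KeyboardInterrupt and SystemExit are not subclasses of
            # Exception, so they still stop the whole run.
            logger.exception("Unable to run cron on server %s", name)
            failed.append(name)
        else:
            completed.append(name)
    return completed, failed


if __name__ == "__main__":
    demo = {
        "alderaan-10": FakeServer("alderaan-10", fail=True),
        "endor-10": FakeServer("endor-10"),
        "endor-9.6": FakeServer("endor-9.6"),
    }
    print(run_cron(demo))
```

Collecting the failed names instead of only logging them is a design choice of this sketch; it makes it easy to report them at the end of the run or to set a non-zero exit status.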