-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only exit at upgrade when the on-disk binary is ready. #771
Only exit at upgrade when the on-disk binary is ready. #771
Conversation
keeperVersion.pg_autoctl_version, | ||
keeperVersion.required_extension_version); | ||
exit(EXIT_CODE_MONITOR); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do anything here in the else
case? e.g. log something and/or return false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point indeed. I have added extra logging (but that will happen every second) and returned false in that case, so that we don't try to use a monitor protocol version that we don't know. I have yet to figure a nice way to test this code path now (if I just upgrade the binary locally in a make cluster
environment, because it's the same binary for every node, the right thing will happen...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My trick for testing this locally was to change the version number everywhere to 1.7
, but change this to compare against hardcoded "1.6"
instead of the PG_AUTOCTL_EXTENSION_VERSION
constant:
https://github.com/citusdata/pg_auto_failover/blob/0fecd653f9ef43019e4d76e347afce378adcc481/src/bin/pg_autoctl/keeper.c#L1405
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually no, I think it was a different line where I did this trick. I made pg_autoctl version
always return 1.6
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I added preliminary support for running an upgrade from within a docker compose environment, where it's possible to use the following command to replace a running container with a new container freshly built from the repository:
$ docker compose up -d --no-deps --build monitor
I added a little Makefile in tests/upgrade
along with the docker-compose file to use. Then the idea would be to install such a patch as the following, and run make upgrade
, which replaces the monitor and we can see the new error message about the upgrade to be done on the other nodes.
The docker compose network is sometimes acting up though, and I have had other errors, such as the following, making it pretty hard to reliably test for upgrade paths. Still, it's a promising work in progress, I believe.
monitor_1 | 21:29:44 25 WARN Failed to resolve DNS name "79a3a3ebe79e": Temporary failure in name resolution
monitor_1 | 21:29:44 25 WARN Failed to find IP address for hostname "79a3a3ebe79e", see above for details
monitor_1 | 21:29:44 25 WARN Failed to grant connection to local network.
monitor_1 | 21:29:44 1 ERROR pg_autoctl service listener exited with exit status 12
monitor_1 | 21:29:44 1 INFO Restarting service listener
diff --git a/src/bin/pg_autoctl/defaults.h b/src/bin/pg_autoctl/defaults.h
index 8e7e2eaf..643dc96a 100644
--- a/src/bin/pg_autoctl/defaults.h
+++ b/src/bin/pg_autoctl/defaults.h
@@ -17,7 +17,7 @@
#define PG_AUTOCTL_VERSION "1.6.1"
/* version of the extension that we requite to talk to on the monitor */
-#define PG_AUTOCTL_EXTENSION_VERSION "1.6"
+#define PG_AUTOCTL_EXTENSION_VERSION "1.7"
/* environment variable to use to make DEBUG facilities available */
#define PG_AUTOCTL_DEBUG "PG_AUTOCTL_DEBUG"
diff --git a/src/monitor/Makefile b/src/monitor/Makefile
index 8e6bf321..478fb2a7 100644
--- a/src/monitor/Makefile
+++ b/src/monitor/Makefile
@@ -2,7 +2,7 @@
# Licensed under the PostgreSQL License.
EXTENSION = pgautofailover
-EXTVERSION = 1.6
+EXTVERSION = 1.7
SRC_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
diff --git a/src/monitor/metadata.h b/src/monitor/metadata.h
index 9e50ab72..0f8b39a3 100644
--- a/src/monitor/metadata.h
+++ b/src/monitor/metadata.h
@@ -15,7 +15,7 @@
#include "storage/lockdefs.h"
-#define AUTO_FAILOVER_EXTENSION_VERSION "1.6"
+#define AUTO_FAILOVER_EXTENSION_VERSION "1.7"
#define AUTO_FAILOVER_EXTENSION_NAME "pgautofailover"
#define AUTO_FAILOVER_SCHEMA_NAME "pgautofailover"
#define AUTO_FAILOVER_FORMATION_TABLE "pgautofailover.formation"
diff --git a/src/monitor/pgautofailover.control b/src/monitor/pgautofailover.control
index a649eb76..d0504410 100644
--- a/src/monitor/pgautofailover.control
+++ b/src/monitor/pgautofailover.control
@@ -1,5 +1,5 @@
comment = 'pg_auto_failover'
-default_version = '1.6'
+default_version = '1.7'
module_pathname = '$libdir/pgautofailover'
relocatable = false
requires = 'btree_gist'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happened with the DNS resolution is actually that on the first run in the docker compose environment, the hostname would be set to a dynamic hash (the container hash/name). We would store that as the default --hostname
in our pg_autoctl configuration file. And at the next start-up of course the container name/hash would change, but pg_autoctl would still try to resolve the previous container name, which the DNS now knows nothing about.
I have added a retry loop to the DNS temporary resolution failures, and more importantly fixed the docker compose file to use known hostnames thanks to the hostname:
service parameter.
I have also improved the tests/upgrade/Makefile
to handle the patching of the source code for the 1.7 version, which we would have to maintain for other versions later of course. And added a README that makes it easy enough to navigate the tooling.
Oh, in passing, I have validated that the current PR actually works as intended. The Makefile also allows to run a full upgrade of all the nodes, using the docker compose commands.
f3f2308
to
5544491
Compare
This happens when e.g. the HBA rules have not been set yet on the monitor. We have a timing where we could ping successfully and then fail to connect, and in that situation we had a spurious call to pgsql_finish() that would reset the connection statement type to SINGLE, even when it was before set to SINGLE.
The new directory tests/upgrade contains a docker compose file that allows running a monitor and three nodes in a way that will allow testing upgrades thanks to the following command: $ docker-compose up --build monitor This will rebuild the docker image used in the docker compose environment from the local repository checkout and restart the monitor with the new binaries. To test upgrades from earlier versions of pg_auto_failover though, hacking the docker compose file to its argument-based style (instead of the current environment based style) would be necessary, because this PR introduces new code to enhance our support for environment variables at create time.
Only allow connections to be made to the monitor when everything else has been checked and is known to be ready.
Debian bullseye got released 5 days ago and this broke the docker build file.
34708ad
to
c827134
Compare
See #749.