keystore-setup fails to create agent.properties on fresh KVM hosts; KVM addHost cert flow then loops with "Failed to find keystore passphrase" and setupAgentSecurity() throws "Failed to setup certificate in the KVM agent's keystore file" (4.22)
problem
When LibvirtServerDiscoverer.setupAgentSecurity() invokes scripts/util/keystore-setup on a freshly-installed KVM agent (one whose /etc/cloudstack/agent/ directory contains only the package-shipped environment.properties and log4j-cloud.xml - i.e. the natural state on a first-time addHost, or after an operator has wiped agent state to recover from a botched provisioning attempt), the script silently fails to create agent.properties and silently fails to persist keystore.passphrase=... to it. The downstream keystore-cert-import invocation then reads the missing/empty agent.properties, exits 1 with "Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!", and after the 3-attempt retry loop in setupAgentSecurity() the management server throws CloudRuntimeException("Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!"). The host never gets added.
The defect is in scripts/util/keystore-setup. The passphrase-write block is gated on [ -f "$PROPS_FILE" ]:
PROPS_FILE="$1"
KS_FILE="$2"
KS_PASS="$3"
KS_VALIDITY="$4"
CSR_FILE="$5"
...
# Re-use existing password or use the one provided
if [ -f "$PROPS_FILE" ]; then # <- gates the entire write block
OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
if [ -n "${OLD_PASS// }" ]; then
KS_PASS="$OLD_PASS"
else
sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE" # <- never reached if file missing
fi
fi
The intent appears to have been "if a previous props file exists, reuse its passphrase; otherwise use the one provided as $3" - and on the otherwise path, write $3 into a fresh file. But the write is also inside the gate, so on a fresh host the new passphrase is never persisted. The script does proceed (it has KS_PASS from arg $3 so keytool -genkey and keytool -certreq work), so cloud.jks and cloud.csr come out fine. The trailing chmod 600 "$PROPS_FILE" fails with chmod: cannot access '/etc/cloudstack/agent/agent.properties': No such file or directory to stderr, but that's swallowed via || $LOGGER_CMD ... and the script's final exit status is the logger's (always 0). MS reads the CSR from stdout and proceeds.
The next stage (keystore-cert-import, called with all 10 args correctly populated by setupAgentSecurity()) then reads passphrase from agent.properties:
KS_PASS=$(sed -n '/keystore.passphrase/p' "$PROPS_FILE" 2>/dev/null | sed 's/keystore.passphrase=//g' 2>/dev/null)
if [ -z "${KS_PASS// }" ]; then
echo "Failed to find keystore passphrase from file: $PROPS_FILE, quitting!"
exit 1
fi
(Note: arg $2 KS_PASS is overwritten here on the KVM path - the arg $2 is only honoured inside the [ ! -f "$LIBVIRTD_FILE" ] systemvm branch above. So the file is the only source of truth on KVM agents.)
KS_PASS ends up empty, script exits 1.
setupAgentSecurity() retries three times (each attempt deterministically the same since the bug is reproduced from clean state on every retry), then throws.
evidence
bash -x trace of keystore-cert-import on the agent during a failing addHost, captured by wrapping the upstream script with a tracing shim:
+ '[' '!' -f /etc/libvirt/libvirtd.conf ']'
++ sed -n /keystore.passphrase/p /etc/cloudstack/agent/agent.properties
++ sed s/keystore.passphrase=//g
+ KS_PASS=
+ '[' -z '' ']'
+ echo 'Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!'
+ exit 1
State of /etc/cloudstack/agent/ immediately after keystore-setup returned (note the missing agent.properties despite the script having "succeeded"):
total 24
-rwxr-xr-x 1 root root 904 Nov 7 environment.properties <- package
-rwxr-xr-x 1 root root 2601 Nov 7 log4j-cloud.xml <- package
-rw------- 1 root root 2742 ... cloud.jks <- created by keystore-setup
<- agent.properties NOT created
<- cloud.csr written + cat'd to stdout (consumed by MS)
Argument-count verification on the agent side (instrumented script logging argc/argv on entry):
keystore-setup argc=5 argv=[/etc/cloudstack/agent/agent.properties /etc/cloudstack/agent/cloud.jks <16-char-random> 365 /etc/cloudstack/agent/cloud.csr]
keystore-cert-import argc=10 arg6_len=1696 arg8_len=1814 arg10_len=0
So setupAgentSecurity() is sending all 5 / 10 args correctly - the dispatch side is fine. The defect is purely inside scripts/util/keystore-setup.
management-server.log for the same addHost attempt - three back-to-back invocations of keystore-cert-import, all returning the same "Failed to find keystore passphrase" output, then the wrapper exception:
... DEBUG c.c.u.s.SSHCmdHelper Executing cmd: sudo /usr/share/cloudstack-common/scripts/util/keystore-cert-import ...
... DEBUG c.c.u.s.SSHCmdHelper SSH command output: Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!
... [retry 2: same output]
... [retry 3: same output]
... WARN c.c.h.k.d.KvmServerDiscoverer can't setup agent, due to com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!
versions
- Apache CloudStack 4.22.0.0 (apache packaging, apt install)
- Hypervisor: KVM
- Agent host OS: Ubuntu Server 24.04 LTS, OpenJDK 17
- Management server OS: Ubuntu Server 24.04 LTS, OpenJDK 17, CA framework
provider.plugin=root
- Agent dir before addHost: only
environment.properties + log4j-cloud.xml (i.e. the natural state right after apt install cloudstack-agent on a fresh host that has not yet been registered)
steps to reproduce
- Install
cloudstack-management 4.22.0.0 on a clean host. Bootstrap CA (default - ca.framework.provider.plugin=root works fine for this; not the issue).
- Install
cloudstack-agent 4.22.0.0 on a clean KVM host. Verify /etc/cloudstack/agent/ contains only environment.properties + log4j-cloud.xml (the package-shipped files).
- Stop
cloudstack-agent on the KVM host (so no startup process can pre-create state).
- From the management server, dispatch
addHost for the KVM host - e.g. via cmk add host or ngine_io.cloudstack.cs_host.
- Observe failure with
errortext: 'Could not add host at [http://...] ... due to: [ can't setup agent, due to com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!]'.
- SSH to the KVM host as root and inspect:
/etc/cloudstack/agent/cloud.jks exists (created by keystore-setup)
/etc/cloudstack/agent/cloud.csr does not exist on disk after the call (consumed by MS via stdout, then trailing chmod fails silently - this is a separate cosmetic issue)
/etc/cloudstack/agent/agent.properties does not exist - this is the bug
- Re-running
addHost after manually creating agent.properties with a keystore.passphrase=<any-non-empty-value> line lets the second invocation succeed, because keystore-setup's if [ -f "$PROPS_FILE" ] gate now passes and the existing passphrase is reused. This is a workable manual workaround per host.
what to do about it
Primary - patch scripts/util/keystore-setup to ensure PROPS_FILE exists before the passphrase-write branch. Two-block change, restructure the gate so the file is created if missing, then the passphrase is always either reused or written:
-# Re-use existing password or use the one provided
-if [ -f "$PROPS_FILE" ]; then
- $LOGGER_CMD "Previous props file exists, trying to extract password"
- OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
- if [ -n "${OLD_PASS// }" ]; then
- KS_PASS="$OLD_PASS"
- $LOGGER_CMD "Password extraction successful"
- else
- sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
- echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
- if [ $? != 0 ]; then
- $LOGGER_CMD "Could not add new password to agent.properties"
- else
- $LOGGER_CMD "New keystore password set"
- fi
- fi
-fi
+# Ensure props file exists, then re-use existing passphrase or persist the one provided.
+if [ ! -f "$PROPS_FILE" ]; then
+ touch "$PROPS_FILE"
+ chmod 600 "$PROPS_FILE"
+ $LOGGER_CMD "Created new props file at $PROPS_FILE"
+fi
+
+OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
+if [ -n "${OLD_PASS// }" ]; then
+ KS_PASS="$OLD_PASS"
+ $LOGGER_CMD "Password extraction successful"
+else
+ sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
+ echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
+ if [ $? != 0 ]; then
+ $LOGGER_CMD "Could not add new password to agent.properties"
+ else
+ $LOGGER_CMD "New keystore password set"
+ fi
+fi
After this, fresh-host KVM addHost works first try. Re-runs are idempotent (existing passphrase reused).
Secondary - harden scripts/util/keystore-cert-import to fail loudly instead of exit 0 in error branches. Same script has bare exit (no status) in two error branches, which return exit code 0 and let the MS see SSHCmdResult.isSuccess()=true even when nothing was imported:
# Import certificate
if [ ! -z "${CERT// }" ]; then
echo "$CERT" > "$CERT_FILE"
elif [ ! -f "$CERT_FILE" ]; then
- echo "Cannot find certificate file: $CERT_FILE, exiting"
- exit
+ echo "Cannot find certificate file: $CERT_FILE, exiting" >&2
+ exit 1
fi
# Import ca certs
if [ ! -z "${CACERT// }" ]; then
echo "$CACERT" > "$CACERT_FILE"
elif [ ! -f "$CACERT_FILE" ]; then
- echo "Cannot find ca certificate file: $CACERT_FILE, exiting!"
- exit
+ echo "Cannot find ca certificate file: $CACERT_FILE, exiting!" >&2
+ exit 1
fi
exit 1 (with status) lets SSHCmdResult.isSuccess() correctly return false, and the existing 3-retry-then-throw loop in setupAgentSecurity() will surface the real failure to the operator instead of leaving the agent with a broken keystore that only manifests later as runtime TLS handshake errors.
relationship to #13064
This is a sibling defect, not the same one as #13064 - same script family, same general class of silent-failure footguns, but a different trigger and a different downstream symptom:
|
this issue |
#13064 |
| Hypervisor |
KVM |
VMware |
| Triggered by |
fresh KVM agent in addHost (no prior agent.properties) |
VMware SSH dispatch of SetupCertificateCommand to a SystemVM |
| Buggy script |
keystore-setup (passphrase-write gated on file existing) |
keystore-cert-import (bare exit in error branch) |
| Args sent by MS |
all 5 / 10 args correct |
only 6 of 10 args ($7/$8 CACERT missing on the VMware path) |
| Exit code from script |
exit 1 (loud failure) |
exit 0 (silent success masking real failure) |
| Downstream symptom |
setupAgentSecurity() throws "Failed to setup certificate in the KVM agent's keystore file" |
agent's cloud.jks ends up missing the trustedCertEntry; agent later fails TLS handshake to MS:8250 with certificate_unknown |
| Surface time |
immediate, on first addHost |
minutes later, once the agent actually tries to connect |
The proposed Secondary fix above (replacing bare exit with exit 1) addresses one half of #13064 - once those branches return non-zero, the MS setupAgentSecurity() retry-and-throw loop will surface the VMware-dispatch arg-count bug as a proper failure instead of letting the silent-success continue to hide it. The actual primary fix for #13064 (passing CACERT args from the VMware SSH dispatch path) is independent of this change and orthogonal.
operator workaround (until upstream fix lands)
Replace /usr/share/cloudstack-common/scripts/util/keystore-setup on each KVM agent with a thin wrapper that pre-creates agent.properties with the passphrase from arg $3 before exec'ing the original. Originals preserved as .orig:
#!/bin/bash
# Workaround for upstream keystore-setup not creating agent.properties on
# fresh hosts. See <this issue link>.
if [ $# -ge 5 ]; then
PROPS_FILE="$1"
KS_PASS="$3"
if [ ! -f "$PROPS_FILE" ]; then
touch "$PROPS_FILE"
chmod 600 "$PROPS_FILE"
fi
EXISTING=$(sed -n '/^keystore\.passphrase=/p' "$PROPS_FILE" | head -1 | sed 's/^keystore\.passphrase=//')
if [ -z "${EXISTING// }" ]; then
echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
fi
exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"
fi
exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"
Verified working on Ubuntu 24.04 + CloudStack 4.22.0.0 KVM - addHost succeeds first try after the wrapper is in place and the agent dir is wiped to clean state.
keystore-setupfails to createagent.propertieson fresh KVM hosts; KVMaddHostcert flow then loops with "Failed to find keystore passphrase" andsetupAgentSecurity()throws "Failed to setup certificate in the KVM agent's keystore file" (4.22)problem
When
LibvirtServerDiscoverer.setupAgentSecurity()invokesscripts/util/keystore-setupon a freshly-installed KVM agent (one whose/etc/cloudstack/agent/directory contains only the package-shippedenvironment.propertiesandlog4j-cloud.xml- i.e. the natural state on a first-timeaddHost, or after an operator has wiped agent state to recover from a botched provisioning attempt), the script silently fails to createagent.propertiesand silently fails to persistkeystore.passphrase=...to it. The downstreamkeystore-cert-importinvocation then reads the missing/emptyagent.properties, exits 1 with "Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!", and after the 3-attempt retry loop insetupAgentSecurity()the management server throwsCloudRuntimeException("Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!"). The host never gets added.The defect is in
scripts/util/keystore-setup. The passphrase-write block is gated on[ -f "$PROPS_FILE" ]:The intent appears to have been "if a previous props file exists, reuse its passphrase; otherwise use the one provided as
$3" - and on the otherwise path, write$3into a fresh file. But the write is also inside the gate, so on a fresh host the new passphrase is never persisted. The script does proceed (it hasKS_PASSfrom arg$3sokeytool -genkeyandkeytool -certreqwork), socloud.jksandcloud.csrcome out fine. The trailingchmod 600 "$PROPS_FILE"fails withchmod: cannot access '/etc/cloudstack/agent/agent.properties': No such file or directoryto stderr, but that's swallowed via|| $LOGGER_CMD ...and the script's final exit status is the logger's (always 0). MS reads the CSR from stdout and proceeds.The next stage (
keystore-cert-import, called with all 10 args correctly populated bysetupAgentSecurity()) then reads passphrase fromagent.properties:(Note: arg
$2KS_PASSis overwritten here on the KVM path - the arg$2is only honoured inside the[ ! -f "$LIBVIRTD_FILE" ]systemvm branch above. So the file is the only source of truth on KVM agents.)KS_PASSends up empty, script exits 1.setupAgentSecurity()retries three times (each attempt deterministically the same since the bug is reproduced from clean state on every retry), then throws.evidence
bash -xtrace ofkeystore-cert-importon the agent during a failingaddHost, captured by wrapping the upstream script with a tracing shim:State of
/etc/cloudstack/agent/immediately afterkeystore-setupreturned (note the missingagent.propertiesdespite the script having "succeeded"):Argument-count verification on the agent side (instrumented script logging argc/argv on entry):
So
setupAgentSecurity()is sending all 5 / 10 args correctly - the dispatch side is fine. The defect is purely insidescripts/util/keystore-setup.management-server.logfor the sameaddHostattempt - three back-to-back invocations ofkeystore-cert-import, all returning the same "Failed to find keystore passphrase" output, then the wrapper exception:versions
provider.plugin=rootenvironment.properties+log4j-cloud.xml(i.e. the natural state right afterapt install cloudstack-agenton a fresh host that has not yet been registered)steps to reproduce
cloudstack-management4.22.0.0 on a clean host. Bootstrap CA (default -ca.framework.provider.plugin=rootworks fine for this; not the issue).cloudstack-agent4.22.0.0 on a clean KVM host. Verify/etc/cloudstack/agent/contains onlyenvironment.properties+log4j-cloud.xml(the package-shipped files).cloudstack-agenton the KVM host (so no startup process can pre-create state).addHostfor the KVM host - e.g. viacmk add hostorngine_io.cloudstack.cs_host.errortext: 'Could not add host at [http://...] ... due to: [ can't setup agent, due to com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!]'./etc/cloudstack/agent/cloud.jksexists (created bykeystore-setup)/etc/cloudstack/agent/cloud.csrdoes not exist on disk after the call (consumed by MS via stdout, then trailingchmodfails silently - this is a separate cosmetic issue)/etc/cloudstack/agent/agent.propertiesdoes not exist - this is the bugaddHostafter manually creatingagent.propertieswith akeystore.passphrase=<any-non-empty-value>line lets the second invocation succeed, becausekeystore-setup'sif [ -f "$PROPS_FILE" ]gate now passes and the existing passphrase is reused. This is a workable manual workaround per host.what to do about it
Primary - patch
scripts/util/keystore-setupto ensurePROPS_FILEexists before the passphrase-write branch. Two-block change, restructure the gate so the file is created if missing, then the passphrase is always either reused or written:After this, fresh-host KVM
addHostworks first try. Re-runs are idempotent (existing passphrase reused).Secondary - harden
scripts/util/keystore-cert-importto fail loudly instead ofexit 0in error branches. Same script has bareexit(no status) in two error branches, which return exit code 0 and let the MS seeSSHCmdResult.isSuccess()=trueeven when nothing was imported:# Import certificate if [ ! -z "${CERT// }" ]; then echo "$CERT" > "$CERT_FILE" elif [ ! -f "$CERT_FILE" ]; then - echo "Cannot find certificate file: $CERT_FILE, exiting" - exit + echo "Cannot find certificate file: $CERT_FILE, exiting" >&2 + exit 1 fi # Import ca certs if [ ! -z "${CACERT// }" ]; then echo "$CACERT" > "$CACERT_FILE" elif [ ! -f "$CACERT_FILE" ]; then - echo "Cannot find ca certificate file: $CACERT_FILE, exiting!" - exit + echo "Cannot find ca certificate file: $CACERT_FILE, exiting!" >&2 + exit 1 fiexit 1(with status) letsSSHCmdResult.isSuccess()correctly return false, and the existing 3-retry-then-throw loop insetupAgentSecurity()will surface the real failure to the operator instead of leaving the agent with a broken keystore that only manifests later as runtime TLS handshake errors.relationship to #13064
This is a sibling defect, not the same one as #13064 - same script family, same general class of silent-failure footguns, but a different trigger and a different downstream symptom:
addHost(no prioragent.properties)SetupCertificateCommandto a SystemVMkeystore-setup(passphrase-write gated on file existing)keystore-cert-import(bareexitin error branch)$7/$8CACERT missing on the VMware path)setupAgentSecurity()throws "Failed to setup certificate in the KVM agent's keystore file"cloud.jksends up missing the trustedCertEntry; agent later fails TLS handshake to MS:8250 withcertificate_unknownaddHostThe proposed Secondary fix above (replacing bare
exitwithexit 1) addresses one half of #13064 - once those branches return non-zero, the MSsetupAgentSecurity()retry-and-throw loop will surface the VMware-dispatch arg-count bug as a proper failure instead of letting the silent-success continue to hide it. The actual primary fix for #13064 (passing CACERT args from the VMware SSH dispatch path) is independent of this change and orthogonal.operator workaround (until upstream fix lands)
Replace
/usr/share/cloudstack-common/scripts/util/keystore-setupon each KVM agent with a thin wrapper that pre-createsagent.propertieswith the passphrase from arg$3before exec'ing the original. Originals preserved as.orig:Verified working on Ubuntu 24.04 + CloudStack 4.22.0.0 KVM -
addHostsucceeds first try after the wrapper is in place and the agent dir is wiped to clean state.