
[4.22] keystore-setup script does not create agent.properties on fresh KVM hosts #13079

@CMStevenKusters

keystore-setup fails to create agent.properties on fresh KVM hosts; KVM addHost cert flow then loops with "Failed to find keystore passphrase" and setupAgentSecurity() throws "Failed to setup certificate in the KVM agent's keystore file" (4.22)

problem

When LibvirtServerDiscoverer.setupAgentSecurity() invokes scripts/util/keystore-setup on a freshly-installed KVM agent (one whose /etc/cloudstack/agent/ directory contains only the package-shipped environment.properties and log4j-cloud.xml - i.e. the natural state on a first-time addHost, or after an operator has wiped agent state to recover from a botched provisioning attempt), the script silently fails to create agent.properties and silently fails to persist keystore.passphrase=... to it. The downstream keystore-cert-import invocation then reads the missing/empty agent.properties, exits 1 with "Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!", and after the 3-attempt retry loop in setupAgentSecurity() the management server throws CloudRuntimeException("Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!"). The host never gets added.

The defect is in scripts/util/keystore-setup. The passphrase-write block is gated on [ -f "$PROPS_FILE" ]:

PROPS_FILE="$1"
KS_FILE="$2"
KS_PASS="$3"
KS_VALIDITY="$4"
CSR_FILE="$5"
...
# Re-use existing password or use the one provided
if [ -f "$PROPS_FILE" ]; then           # <- gates the entire write block
    OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
    if [ -n "${OLD_PASS// }" ]; then
        KS_PASS="$OLD_PASS"
    else
        sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
        echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"   # <- never reached if file missing
    fi
fi

The intent appears to have been "if a previous props file exists, reuse its passphrase; otherwise use the one provided as $3" - and on the otherwise path, write $3 into a fresh file. But the write is also inside the gate, so on a fresh host the new passphrase is never persisted. The script does proceed (it has KS_PASS from arg $3 so keytool -genkey and keytool -certreq work), so cloud.jks and cloud.csr come out fine. The trailing chmod 600 "$PROPS_FILE" fails with chmod: cannot access '/etc/cloudstack/agent/agent.properties': No such file or directory to stderr, but that's swallowed via || $LOGGER_CMD ... and the script's final exit status is the logger's (always 0). MS reads the CSR from stdout and proceeds.
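The gate behaviour described above can be reproduced in isolation, without touching a real agent directory. The following is a minimal sketch, not the upstream script: `persist_passphrase_gated` is a hypothetical function name that mirrors the structure of the quoted block, the scratch directory stands in for /etc/cloudstack/agent/, and a plain `echo` stands in for `$LOGGER_CMD`.

```shell
#!/bin/bash
# Sketch only: mirrors the gated passphrase block against a scratch directory.
# persist_passphrase_gated is an illustrative name, not part of CloudStack.
persist_passphrase_gated() {
    local props_file="$1" ks_pass="$2" old_pass
    if [ -f "$props_file" ]; then                      # same gate as upstream
        old_pass=$(sed -n '/^keystore.passphrase/p' "$props_file" | sed 's/^keystore.passphrase=//g')
        if [ -n "${old_pass// }" ]; then
            ks_pass="$old_pass"
        else
            sed -i "/^keystore.passphrase.*/d" "$props_file"
            echo "keystore.passphrase=$ks_pass" >> "$props_file"
        fi
    fi
    echo "$ks_pass"   # the passphrase the rest of the script would go on to use
}

scratch=$(mktemp -d)
used=$(persist_passphrase_gated "$scratch/agent.properties" "s3cret")
echo "passphrase used for keytool: $used"
if [ -f "$scratch/agent.properties" ]; then
    echo "agent.properties: present"
else
    echo "agent.properties: missing"
fi

# The trailing chmod failure is masked in the same way: the command on the
# right-hand side of || (echo here, standing in for the logger) sets the status.
chmod 600 "$scratch/agent.properties" 2>/dev/null || echo "chmod failed (logged only)"
echo "exit status seen by caller: $?"
rm -rf "$scratch"
```

On a missing file the function still returns the provided passphrase, so key generation can proceed, but nothing is written to disk and the caller sees status 0.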

The next stage (keystore-cert-import, called with all 10 args correctly populated by setupAgentSecurity()) then reads passphrase from agent.properties:

KS_PASS=$(sed -n '/keystore.passphrase/p' "$PROPS_FILE" 2>/dev/null  | sed 's/keystore.passphrase=//g' 2>/dev/null)

if [ -z "${KS_PASS// }" ]; then
    echo "Failed to find keystore passphrase from file: $PROPS_FILE, quitting!"
    exit 1
fi

(Note: arg $2 KS_PASS is overwritten here on the KVM path - the arg $2 is only honoured inside the [ ! -f "$LIBVIRTD_FILE" ] systemvm branch above. So the file is the only source of truth on KVM agents.)

KS_PASS ends up empty, script exits 1.
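To see why the extraction comes back empty, the same sed pipeline can be run against a missing file and a populated one. This is a small sketch under the assumption that only the read path matters here; `read_passphrase` is a hypothetical wrapper around the quoted one-liner and the scratch path replaces the real agent.properties.

```shell
#!/bin/bash
# Sketch only: read_passphrase wraps the extraction pipeline quoted above.
read_passphrase() {
    sed -n '/keystore.passphrase/p' "$1" 2>/dev/null | sed 's/keystore.passphrase=//g' 2>/dev/null
}

scratch=$(mktemp -d)

# Missing file: sed's error goes to /dev/null and the result is an empty string.
missing=$(read_passphrase "$scratch/agent.properties")
echo "missing file   -> '${missing}'"

# Populated file: the value after the '=' is returned.
echo "keystore.passphrase=s3cret" > "$scratch/agent.properties"
present=$(read_passphrase "$scratch/agent.properties")
echo "populated file -> '${present}'"

rm -rf "$scratch"
```

Because stderr is discarded, a missing file and a file without the key are indistinguishable to the caller; both end in the `exit 1` branch.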

setupAgentSecurity() retries three times. Each attempt fails identically, because agent.properties is still missing on every retry. It then throws.

evidence

bash -x trace of keystore-cert-import on the agent during a failing addHost, captured by wrapping the upstream script with a tracing shim:

+ '[' '!' -f /etc/libvirt/libvirtd.conf ']'
++ sed -n /keystore.passphrase/p /etc/cloudstack/agent/agent.properties
++ sed s/keystore.passphrase=//g
+ KS_PASS=
+ '[' -z '' ']'
+ echo 'Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!'
+ exit 1

State of /etc/cloudstack/agent/ immediately after keystore-setup returned (note the missing agent.properties despite the script having "succeeded"):

total 24
-rwxr-xr-x  1 root root   904 Nov  7 environment.properties     <- package
-rwxr-xr-x  1 root root  2601 Nov  7 log4j-cloud.xml            <- package
-rw-------  1 root root  2742 ...    cloud.jks                  <- created by keystore-setup
                                                                 <- agent.properties NOT created
                                                                 <- cloud.csr written + cat'd to stdout (consumed by MS)

Argument-count verification on the agent side (instrumented script logging argc/argv on entry):

keystore-setup       argc=5  argv=[/etc/cloudstack/agent/agent.properties /etc/cloudstack/agent/cloud.jks <16-char-random> 365 /etc/cloudstack/agent/cloud.csr]
keystore-cert-import argc=10 arg6_len=1696 arg8_len=1814 arg10_len=0

So setupAgentSecurity() is sending all 5 / 10 args correctly - the dispatch side is fine. The defect is purely inside scripts/util/keystore-setup.

management-server.log for the same addHost attempt - three back-to-back invocations of keystore-cert-import, all returning the same "Failed to find keystore passphrase" output, then the wrapper exception:

... DEBUG c.c.u.s.SSHCmdHelper Executing cmd: sudo /usr/share/cloudstack-common/scripts/util/keystore-cert-import ...
... DEBUG c.c.u.s.SSHCmdHelper SSH command output: Failed to find keystore passphrase from file: /etc/cloudstack/agent/agent.properties, quitting!
... [retry 2: same output]
... [retry 3: same output]
... WARN  c.c.h.k.d.KvmServerDiscoverer can't setup agent, due to com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!

versions

  • Apache CloudStack 4.22.0.0 (apache packaging, apt install)
  • Hypervisor: KVM
  • Agent host OS: Ubuntu Server 24.04 LTS, OpenJDK 17
  • Management server OS: Ubuntu Server 24.04 LTS, OpenJDK 17, CA framework provider.plugin=root
  • Agent dir before addHost: only environment.properties + log4j-cloud.xml (i.e. the natural state right after apt install cloudstack-agent on a fresh host that has not yet been registered)

steps to reproduce

  1. Install cloudstack-management 4.22.0.0 on a clean host. Bootstrap CA (default - ca.framework.provider.plugin=root works fine for this; not the issue).
  2. Install cloudstack-agent 4.22.0.0 on a clean KVM host. Verify /etc/cloudstack/agent/ contains only environment.properties + log4j-cloud.xml (the package-shipped files).
  3. Stop cloudstack-agent on the KVM host (so no startup process can pre-create state).
  4. From the management server, dispatch addHost for the KVM host - e.g. via cmk add host or ngine_io.cloudstack.cs_host.
  5. Observe failure with errortext: 'Could not add host at [http://...] ... due to: [ can't setup agent, due to com.cloud.utils.exception.CloudRuntimeException: Failed to setup certificate in the KVM agent's keystore file, please see logs and configure manually!]'.
  6. SSH to the KVM host as root and inspect:
    • /etc/cloudstack/agent/cloud.jks exists (created by keystore-setup)
    • /etc/cloudstack/agent/cloud.csr does not exist on disk after the call (consumed by MS via stdout, then trailing chmod fails silently - this is a separate cosmetic issue)
    • /etc/cloudstack/agent/agent.properties does not exist - this is the bug
  7. Re-running addHost after manually creating agent.properties with a keystore.passphrase=<any-non-empty-value> line lets the second invocation succeed, because keystore-setup's if [ -f "$PROPS_FILE" ] gate now passes and the existing passphrase is reused. This is a workable manual workaround per host.
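The manual inspection in step 6 can be scripted when checking several hosts. The helper below is a sketch: `check_agent_dir` is a hypothetical name, and it only reports the three files named above plus whether a non-empty keystore.passphrase line is present. It does not modify anything.

```shell
#!/bin/bash
# Sketch only: report the agent directory state that step 6 inspects by hand.
# check_agent_dir is an illustrative helper, not part of CloudStack.
check_agent_dir() {
    local dir="$1" f pass
    for f in cloud.jks cloud.csr agent.properties; do
        if [ -f "$dir/$f" ]; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
    # First keystore.passphrase value, if any; empty when the file is absent.
    pass=$(sed -n 's/^keystore\.passphrase=//p' "$dir/agent.properties" 2>/dev/null | head -n 1)
    if [ -n "${pass// }" ]; then
        echo "keystore.passphrase: set"
        return 0
    fi
    echo "keystore.passphrase: not set"
    return 1
}

# Example invocation on a KVM host, as root:
# check_agent_dir /etc/cloudstack/agent
```

A non-zero return indicates the state in which keystore-cert-import would quit with "Failed to find keystore passphrase".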

what to do about it

Primary - patch scripts/util/keystore-setup so that PROPS_FILE exists before the passphrase logic runs. The change restructures the gate: create the file if it is missing, then always either reuse the existing passphrase or write the provided one:

-# Re-use existing password or use the one provided
-if [ -f "$PROPS_FILE" ]; then
-    $LOGGER_CMD "Previous props file exists, trying to extract password"
-    OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
-    if [ -n "${OLD_PASS// }" ]; then
-        KS_PASS="$OLD_PASS"
-        $LOGGER_CMD "Password extraction successful"
-    else
-        sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
-        echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
-        if [ $? != 0 ]; then
-                $LOGGER_CMD "Could not add new password to agent.properties"
-        else
-                $LOGGER_CMD "New keystore password set"
-        fi
-    fi
-fi
+# Ensure props file exists, then re-use existing passphrase or persist the one provided.
+if [ ! -f "$PROPS_FILE" ]; then
+    touch "$PROPS_FILE"
+    chmod 600 "$PROPS_FILE"
+    $LOGGER_CMD "Created new props file at $PROPS_FILE"
+fi
+
+OLD_PASS=$(sed -n '/^keystore.passphrase/p' "$PROPS_FILE" | sed 's/^keystore.passphrase=//g')
+if [ -n "${OLD_PASS// }" ]; then
+    KS_PASS="$OLD_PASS"
+    $LOGGER_CMD "Password extraction successful"
+else
+    sed -i "/^keystore.passphrase.*/d" "$PROPS_FILE" 2>&1 | $LOGGER_CMD || true
+    echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
+    if [ $? != 0 ]; then
+        $LOGGER_CMD "Could not add new password to agent.properties"
+    else
+        $LOGGER_CMD "New keystore password set"
+    fi
+fi

After this, fresh-host KVM addHost works first try. Re-runs are idempotent (existing passphrase reused).
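The idempotency claim can be checked outside the real script. The sketch below copies the patched logic into a hypothetical function, `persist_passphrase_fixed`, with the logger calls dropped, and runs it twice against a scratch file with two different passphrases.

```shell
#!/bin/bash
# Sketch only: the patched logic as a standalone function, logger calls omitted.
# persist_passphrase_fixed is an illustrative name, not part of CloudStack.
persist_passphrase_fixed() {
    local props_file="$1" ks_pass="$2" old_pass
    if [ ! -f "$props_file" ]; then
        touch "$props_file"
        chmod 600 "$props_file"
    fi
    old_pass=$(sed -n '/^keystore.passphrase/p' "$props_file" | sed 's/^keystore.passphrase=//g')
    if [ -n "${old_pass// }" ]; then
        ks_pass="$old_pass"                       # re-use what is already on disk
    else
        sed -i "/^keystore.passphrase.*/d" "$props_file"
        echo "keystore.passphrase=$ks_pass" >> "$props_file"
    fi
    echo "$ks_pass"
}

scratch=$(mktemp -d)
first=$(persist_passphrase_fixed "$scratch/agent.properties" "first-pass")
second=$(persist_passphrase_fixed "$scratch/agent.properties" "second-pass")
echo "first run used:  $first"
echo "second run used: $second"
echo "passphrase lines in file: $(grep -c '^keystore.passphrase=' "$scratch/agent.properties")"
rm -rf "$scratch"
```

Both runs should report the first passphrase, and the file should contain a single keystore.passphrase line, which is the behaviour a re-run of addHost relies on.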

Secondary - harden scripts/util/keystore-cert-import to fail loudly in its error branches. The script has a bare exit (no status) in two error branches. A bare exit returns the status of the last command executed, here the preceding echo, so the script exits 0 and the MS sees SSHCmdResult.isSuccess()=true even when nothing was imported:

 # Import certificate
 if [ ! -z "${CERT// }" ]; then
     echo "$CERT" > "$CERT_FILE"
 elif [ ! -f "$CERT_FILE" ]; then
-    echo "Cannot find certificate file: $CERT_FILE, exiting"
-    exit
+    echo "Cannot find certificate file: $CERT_FILE, exiting" >&2
+    exit 1
 fi

 # Import ca certs
 if [ ! -z "${CACERT// }" ]; then
     echo "$CACERT" > "$CACERT_FILE"
 elif [ ! -f "$CACERT_FILE" ]; then
-    echo "Cannot find ca certificate file: $CACERT_FILE, exiting!"
-    exit
+    echo "Cannot find ca certificate file: $CACERT_FILE, exiting!" >&2
+    exit 1
 fi

exit 1 (with status) lets SSHCmdResult.isSuccess() correctly return false, and the existing 3-retry-then-throw loop in setupAgentSecurity() will surface the real failure to the operator instead of leaving the agent with a broken keystore that only manifests later as runtime TLS handshake errors.
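The difference between the two forms is easy to confirm in a subshell. This sketch uses two hypothetical helper functions that each run a one-line stand-in for the error branch and print the exit status a caller would observe.

```shell
#!/bin/bash
# Sketch only: compare the status of a bare `exit` with an explicit `exit 1`.
# Both helpers are illustrative stand-ins for the error branches above.
bare_exit_status() {
    bash -c 'echo "Cannot find certificate file, exiting"; exit' >/dev/null
    echo $?
}

explicit_exit_status() {
    bash -c 'echo "Cannot find certificate file, exiting" >&2; exit 1' 2>/dev/null
    echo $?
}

echo "bare exit     -> $(bare_exit_status)"      # status of the preceding echo
echo "explicit exit -> $(explicit_exit_status)"
```

The bare form inherits the successful status of the echo, which is why the failure is invisible to anything that only checks the exit code.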

relationship to #13064

This is a sibling defect, not the same one as #13064 - same script family, same general class of silent-failure footguns, but a different trigger and a different downstream symptom:

| | this issue | #13064 |
|---|---|---|
| Hypervisor | KVM | VMware |
| Triggered by | fresh KVM agent in addHost (no prior agent.properties) | VMware SSH dispatch of SetupCertificateCommand to a SystemVM |
| Buggy script | keystore-setup (passphrase-write gated on file existing) | keystore-cert-import (bare exit in error branch) |
| Args sent by MS | all 5 / 10 args correct | only 6 of 10 args ($7/$8 CACERT missing on the VMware path) |
| Exit code from script | exit 1 (loud failure) | exit 0 (silent success masking real failure) |
| Downstream symptom | setupAgentSecurity() throws "Failed to setup certificate in the KVM agent's keystore file" | agent's cloud.jks ends up missing the trustedCertEntry; agent later fails TLS handshake to MS:8250 with certificate_unknown |
| Surface time | immediate, on first addHost | minutes later, once the agent actually tries to connect |

The proposed Secondary fix above (replacing bare exit with exit 1) addresses one half of #13064: once those branches return non-zero, the MS setupAgentSecurity() retry-and-throw loop will surface the VMware-dispatch arg-count bug as a proper failure instead of letting the silent success hide it. The primary fix for #13064 (passing CACERT args from the VMware SSH dispatch path) is independent of this change.

operator workaround (until upstream fix lands)

Replace /usr/share/cloudstack-common/scripts/util/keystore-setup on each KVM agent with a thin wrapper that pre-creates agent.properties with the passphrase from arg $3 before exec'ing the original. Originals preserved as .orig:

#!/bin/bash
# Workaround for upstream keystore-setup not creating agent.properties on
# fresh hosts. See <this issue link>.
if [ $# -ge 5 ]; then
    PROPS_FILE="$1"
    KS_PASS="$3"
    if [ ! -f "$PROPS_FILE" ]; then
        touch "$PROPS_FILE"
        chmod 600 "$PROPS_FILE"
    fi
    EXISTING=$(sed -n '/^keystore\.passphrase=/p' "$PROPS_FILE" | head -1 | sed 's/^keystore\.passphrase=//')
    if [ -z "${EXISTING// }" ]; then
        echo "keystore.passphrase=$KS_PASS" >> "$PROPS_FILE"
    fi
    exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"
fi
exec /usr/share/cloudstack-common/scripts/util/keystore-setup.orig "$@"

Verified working on Ubuntu 24.04 + CloudStack 4.22.0.0 KVM - addHost succeeds first try after the wrapper is in place and the agent dir is wiped to clean state.
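Putting the wrapper in place could look like the sketch below. It is an assumption about the install procedure, not something taken from the report: `install_wrapper` is a hypothetical helper, the mode 0755 is assumed, and the guard on the .orig copy is there so that a second run does not overwrite the preserved original with the wrapper itself.

```shell
#!/bin/bash
# Sketch only: preserve the packaged script as .orig, then install the wrapper.
# install_wrapper is an illustrative helper; mode 0755 is an assumption.
install_wrapper() {
    local target="$1" wrapper="$2"
    # Keep the first copy only: on a re-run, $target is already the wrapper.
    if [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    install -m 0755 "$wrapper" "$target"
}

# Example invocation on a KVM host, as root (wrapper path is hypothetical):
# install_wrapper /usr/share/cloudstack-common/scripts/util/keystore-setup ./keystore-setup-wrapper.sh
```

A package upgrade may replace the wrapper with the shipped script again, so the step would need repeating until a fixed version is installed.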
