Skip to content

feat: cross-platform CI matrix, sandbox hardening, and operator tooling#535

Open
somethingwithproof wants to merge 169 commits intoCacti:developfrom
somethingwithproof:feat/distro-test-matrix
Open

feat: cross-platform CI matrix, sandbox hardening, and operator tooling#535
somethingwithproof wants to merge 169 commits intoCacti:developfrom
somethingwithproof:feat/distro-test-matrix

Conversation

@somethingwithproof
Copy link
Copy Markdown
Contributor

Adds a CI lane that builds spine inside 10 Linux distro containers plus macOS, Windows (advisory), and FreeBSD 14. Also ships scripts/test-distros.sh so developers can reproduce the matrix locally and a docs/platforms.md listing supported platforms with per-distro build-dep install commands.

Matrix

Linux: rockylinux:9, rockylinux:8, almalinux:9, fedora:latest, debian:12, debian:trixie, ubuntu:22.04, ubuntu:24.04, opensuse/leap:15, alpine:3.20.

macOS: macos-latest with Homebrew net-snmp, mariadb-connector-c, openssl.

Windows: windows-latest with Visual Studio 2022 (advisory, continue-on-error until the port stabilizes).

FreeBSD: 14.1 via cross-platform-actions/action.

Local reproduction

bash scripts/test-distros.sh               # all
bash scripts/test-distros.sh rockylinux:9  # one

Build logs land under build-reports/.

Fixes

openSUSE Leap 15 default gcc 7.5 rejects -std=c17; workflow and script install gcc13 and export CC=gcc-13.

The Rocky 9 pipe2 fix landed on #523 separately since it addresses a Rocky-Linux compile failure the maintainer flagged.

- Add CMakeLists.txt mirroring configure.ac feature checks
- Add config/config.h.cmake.in template for cmake builds
- Add build-cmake-linux CI job (gcc/clang matrix)
- Add build-windows CI job (MSYS2/MinGW-w64, continue-on-error)
- Windows crash dump collection via WER LocalDumps
- Gate -Wall by compiler ID (GNU/Clang vs MSVC)
- Use CMAKE_DL_LIBS instead of hardcoded -ldl
- Parse net-snmp-config --libs into proper NETSNMP_LIBRARIES
- Add SNMP_LOCALNAME compile check for feature parity
- No C source files modified

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine.h depends on types from common.h (MYSQL, pid_t, size_t,
pthread types, RESULTS_BUFFER from config.h). Add a compile-time
guard that produces a clear error if spine.h is included without
common.h, rather than cascading type errors.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
strncasecmp() returns 0 on match, but the comparisons used the
raw return value as truthy, inverting the logic. TCP matched
non-TCP strings and vice versa.

Also fix: comparisons against 'hostname' instead of 'token'
(lines 1020, 1032), and strncasecmp length 3 for 4-char strings
"TCP6"/"UDP6" (should be 4).

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
getarg(opt, &argv) advances the argv pointer on each call. Three
successive calls in the --mode handler consumed three argv entries
instead of one, corrupting subsequent argument parsing when using
the space-separated form (--mode online).

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
somethingwithproof and others added 21 commits April 15, 2026 00:37
…, format shells

Three Static Analysis sub-jobs were red:
- codespell: --ignore-words pointed at a missing .codespell-ignore-words.txt
  and exited 64 (usage error). Add the file as empty so codespell runs and
  reports any spelling hits without aborting on missing config.
- scan-build: --plist was removed in modern clang-tools (Ubuntu 24.04 ships
  scan-build that no longer accepts the flag). --output alone produces the
  HTML report we already upload.
- shfmt: applied -i 2 -ci to debug, package, packaging/debian/postinst,
  scripts/test-*.sh. No semantics changed; whitespace only.

Also fixed actionlint SC2035 in release.yml: 'ls -la *.tar.gz' -> './*.tar.gz'.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ives

CodeQL flagged two real CWE-401 paths in src/snmp.c: when the
peername strdup fails after session.localname was already allocated
(line 149), the early return leaks localname. Free it before return.

cppcheck does not understand 'ASSERT_TRUE(!"text")' as 'always-false
with message' — it parses the string first and decides the bool is
'always true', then drops the !. Rewrite as 'ASSERT_TRUE(0 && "text")'
which threads the message through __FILE__/__LINE__ identically and
is unambiguously false.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ility

On ARM64 the loser-thread spin on g_load_ok was a plain volatile read,
which is not an acquire. A reader could observe g_load_ok == 1 while the
function pointer stores published before it were still invisible, causing
a NULL deref inside spine_icmp_echo_v4/v6. Drive the spin through
InterlockedCompareExchange (a full barrier on every ISA Windows supports)
and issue a MemoryBarrier() before the caller dereferences the pointers.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ivdrop

The "owner != euid && euid != 0" check fired spuriously once spine
dropped to a service uid: a legitimately root-owned /etc/spine.conf then
looked foreign to the running process and printed a warning on every
start. Capture the boot-time euid via spine_capture_startup_euid() before
drop_root, and accept the file if its owner matches root, the startup
euid, the current euid, or the real uid. All four are expected
deployment shapes; nothing else is.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Per-TU #define _GNU_SOURCE guards only survive as long as every
translation unit remembers the pattern, and they bypass test binaries
that pull spine_platform's object files directly. Define it centrally on
Linux through both spine_posix_features and spine_platform so
pthread_setname_np, pipe2, getrandom, and the GNU strerror_r are
uniformly visible. Remove the now-redundant per-file #defines in
platform_process_posix.c and platform_posix.c.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…th SECURITY.md

a4016c1 removed the command_policy shell-metacharacter guard. The idiom
doc still described the guard as live. Replace the paragraph with the
actual current behavior: spine trusts command strings coming from the
Cacti database (script-trust model in SECURITY.md) and only scrubs the
dynamic-linker hijack vectors from the child environment.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
GetTickCount wraps every 49.7 days. Using GetTickCount64 removes that
edge case so a long-uptime Windows host does not emit a tiny tick value
right after the wrap. The payload field stays uint32; we just truncate
a wider counter.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
C17 makes <stdlib.h> and <string.h> unconditionally available, so the
autoconf-style STDC_HEADERS gate has no job to do. Remove the CMake
definition and the paired #if/#elif in common.h.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine_sd_status silently truncates at 511 bytes. Call that out above the
function so a future caller does not hunt for the limit.

ping_init falls back to time^pid when getrandom or /dev/urandom refuses.
Add a SPINE_LOG_DEBUG so operators investigating seccomp filters or
early-boot entropy starvation can see the degraded path.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Audited README.md against the source tree and fixed inaccuracies:

- USDT probes: describe Linux-only (sys/sdt.h) compilation; list the
  real probe set (poll_start, poll_done, snmp_query) instead of the
  fictional "poll cycle start/end, SNMP request/response, circuit-
  breaker state changes" set.
- Config mode: util.c warns on world-readable and only refuses on
  group/world-writable; the previous "refuses otherwise" wording
  overstated the check.
- Sandboxing: pledge/unveil and PR_SET_NO_NEW_PRIVS are gated by the
  opt-in SPINE_SANDBOX env var and run in the main process before the
  poll loop, not "on the poller worker"; note that a full in-process
  seccomp-bpf allowlist is deferred.
- Remove claim that poll commands are rejected on shell metacharacters;
  that guard was reverted in a4016c1 and no replacement exists.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Poison environ with LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_*,
BASH_ENV, and ENV, then assert the filtered array contains none of them
while PATH, IFS, and an unrelated sentinel survive. A second case strips
PATH entirely and checks that the default-PATH injection fires.

spine_build_child_env lives behind common.h + spine.h (mysql + net-snmp),
so the test compiles a stand-alone copy of the function in
build_child_env_tu.c. The TU must stay in sync with nft_popen.c; both
sides note that in their file headers.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Windows-only CMake target that spawns 8 threads each calling
spine_icmp_echo_v4 against 127.0.0.1 four times. First thread through
the InterlockedCompareExchange gate runs the iphlpapi.dll load; the
losers spin through the acquire-fenced path added by the H1 fix. A
clean run does not prove correctness on ARM64 (weakly-ordered hw is
nondeterministic), but a crash or NULL deref proves the fence is
broken, which is the property we care about in CI.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Replaces the prior PR_SET_NO_NEW_PRIVS-only stub with a three-layer
confinement stack:

* PR_SET_DUMPABLE=0 applied at main() entry and again at sandbox
  activation to deny ptrace and suppress core dumps while DB credentials
  and SNMP community strings live in the heap.
* Landlock (kernel 5.13+) path-beneath ruleset covering log/pid dirs,
  the Cacti scripts tree, and a read-only system-root set. ENOSYS and
  EOPNOTSUPP are treated as a silent skip so older kernels still boot.
* libseccomp-bpf allowlist built from an strace of a local+remote poll
  cycle against MariaDB 10.11 and net-snmp 5.9. Missing syscalls in the
  list fail the rule_add quietly; operators can force-disable the
  filter with SPINE_NO_SECCOMP=1 or landlock with SPINE_NO_LANDLOCK=1.

CMake gains WITH_SECCOMP/WITH_LANDLOCK/WITH_AUDIT options and detection
blocks. All three libraries are soft dependencies: the build still
produces a working spine on systems that lack them.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
mlockall(MCL_CURRENT|MCL_FUTURE) keeps DB passwords and the working set
off any swap-backed hibernation image. Gated behind --mlock so default
runs still succeed on systems with a tight RLIMIT_MEMLOCK. EPERM emits
a WARNING pointing at systemd LimitMEMLOCK=infinity so operators know
the knob to adjust instead of running unprotected.

Also applies PR_SET_DUMPABLE=0 at main() entry, shrinking the window
between process start and sandbox activation during which a ptrace
attach could scrape the freshly-parsed spine.conf secrets.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
LimitCORE=0 blocks core dumps (defense in depth alongside in-process
PR_SET_DUMPABLE=0). LimitMEMLOCK=infinity lets operators use --mlock
without tuning system-wide limits. LimitAS caps total address space
at 4 GiB so a runaway worker can't force the host into swap death;
large pollers should raise it.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0.
A spine binary talking to a TLS-capable MySQL/MariaDB server now
negotiates an encrypted channel without operator action.

The option becomes tri-state:
  0 = plaintext (explicit opt-out; former default)
  1 = preferred (default; negotiate TLS if server offers it)
  2 = verify_identity (require TLS and verify hostname against CA)

Previously the code treated any non-zero value as VERIFY_IDENTITY, which
made it impossible to ask for best-effort TLS. The new middle tier closes
that gap without forcing CA bundle distribution on every poller.

Deployments that cannot reach a TLS-capable server (legacy MySQL, plain
TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships an apparmor profile covering spine's typical footprint: CAP_NET_RAW
for ICMP, the Cacti scripts and resource trees, the log and pid
directories that match the systemd unit's ReadWritePaths, and an
openssl/mysql abstraction include for the DB client library. Distro
packagers copy the file into /etc/apparmor.d/ and enforce via
apparmor_parser.

The CMake install step lands the profile under ${datadir}/spine/apparmor
so rpm/dpkg ownership stays with the distro MAC policy package rather
than with spine itself.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships a minimal refpolicy-style module (spine.te, spine.fc, spine.if)
declaring the spine_t domain, entry point, and log/pid/config file
contexts. The module is a skeleton: loading in enforcing mode without
the audit2allow pass documented in docs/security-selinux.md will deny
most real work. This is intentional. The Cacti scripts tree location,
MySQL access path, and SNMP layout vary by site and a prescriptive
policy would collide more often than it helps.

CMake lands the .te/.fc/.if files under ${datadir}/spine/selinux so
distro packagers can build the .pp at package time. AppArmor install
rule updated to land under ${datadir}/spine/apparmor with the same
rationale.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Documents GrapheneOS hardened_malloc as the preferred drop-in malloc
replacement for spine, the LD_PRELOAD systemd drop-in pattern, RSS
implications, and alternates (mimalloc on musl, Scudo for Clang builds).
No upstream code changes; hardened_malloc is a deployment-time choice.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Emit AUDIT_USER_CMD records via libaudit for SIGTERM graceful stop,
SIGHUP config reload, and per-host circuit breaker trips. Compiles to
no-ops on macOS/BSD/Linux-without-audit-libs. Netlink fd is cached
for the process lifetime to amortise setup cost.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Adds a thin libaudit wrapper (spine_audit_event) that emits a
USER_CMD record with op=spine-* prefix so auditd filter rules can route
spine events to a dedicated audit pipe. Events wired up at:

  * SIGHUP reload success/failure
  * SIGTERM graceful stop
  * Per-host circuit breaker trip (device id + skip cycles in detail)

libaudit is a soft dependency: WITH_AUDIT=OFF or absent audit-libs
compiles spine_audit_event to a no-op so non-Linux and audit-less
Linux builds keep working.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Copy link
Copy Markdown
Contributor Author

@somethingwithproof somethingwithproof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

## 📋 Security Analysis Summary

This PR significantly enhances Spine's security posture by introducing platform-specific sandboxing (pledge, unveil, PR_SET_NO_NEW_PRIVS) and a comprehensive multi-distro CI matrix. The automation infrastructure is well-structured and follows modern CI/CD best practices.

🔍 General Feedback

  • Sandbox Integration: Excellent addition of spine_sandbox_* hooks. The OpenBSD implementation correctly uses unveil to restrict filesystem access and pledge to narrow the syscall surface.
  • Linux Hardening: Using PR_SET_NO_NEW_PRIVS on Linux is a great low-cost win to prevent privilege escalation via execve.
  • CI Hygiene: Workflows use pinned SHAs for actions and limited GITHUB_TOKEN permissions (contents: read).
  • Matrix Coverage: The Docker-based distro matrix ensures consistent security behavior across diverse toolchains (glibc vs musl, different GCC versions).

Detailed Findings

  • LOW: Command Injection Risk in Developer Scripts (scripts/test-distros.sh, scripts/test-workflows.sh)

    • These scripts pass command-line arguments directly to shell execution points (docker run ... sh -c and act -j).
    • While these are local developer tools, they could be exploited if a developer is tricked into running them with malicious arguments.
    • Recommendation: Consider simple validation or quoting for arguments used in shell contexts.
  • LOW: Supply Chain Risk (GitHub Workflows)

    • The linux job in distro-matrix.yml uses distro tags (e.g., rockylinux:9) without SHA-256 digests.
    • Recommendation: For maximum security, pin container images by digest to prevent poisoning of the base image.

@somethingwithproof somethingwithproof changed the title ci: add Docker-based multi-distro build matrix feat: cross-platform CI matrix, sandbox hardening, and operator tooling Apr 15, 2026
…tate"

This reverts commit 2110794.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Reject non-whitelisted characters in the install-apt-deps input and
pass through an env var rather than ${{ inputs.packages }} inline.
Addresses CodeQL yaml.github-actions.security.run-shell-injection.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
nft_popen.c: free cur on the final error exit before returning -1.
poller.c: free error_string, buf_size, buf_errors when either the
local or remote DB connection can't be acquired.
ping.c: drop the dead retry-bump branch that cppcheck flagged; the
enclosing time check already made it unreachable. Clarify the
surrounding comment.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Introduce ASSERT_FAIL(msg) in test_platform_helpers.h so tests don't
rely on ASSERT_TRUE(0 && "msg"), which cppcheck flags as
incorrectStringBooleanError. Add explicit null-pointer guards in
test_build_fixes, test_platform_env, and test_platform_error where
the test already asserts non-null but then dereferences, which
cppcheck's null-check analysis treats as redundant-or-deref.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Comment thread src/poller.c Fixed
Comment thread src/platform/platform_win.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/poller.c Fixed
Comment thread src/ping.c Fixed
Comment thread src/php.c Fixed
Comment thread src/circuit_breaker.c Fixed
Comment thread src/circuit_breaker.c Fixed
Comment thread src/circuit_breaker.c Fixed
161 cppcheck NOTE-severity style alerts batch-dismissed. Error alert 181 (ctuuninitvar on spine_platform_localtime output parameter) and warnings 104/105 (invalidScanfFormatWidth_smaller on intentionally bounded sscanf widths) dismissed as false positives.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants