Jetmon 2 - Rewrite of core services into Go #60
Open
chrisbliss18 wants to merge 20 commits into master
…orld ReadMemStats
- refreshVeriflierClients now diffs addr|token fingerprints and skips rebuilding when the verifier list is unchanged, preserving TCP connection pools between rounds
- Remove runtime.ReadMemStats stop-the-world call; it was logging but taking no action, and memory metrics are already covered by EmitMemStats
- Remove unused statusDown constant; the DB transition path goes directly from statusRunning to statusConfirmedDown
- Add comment to per-round ClaimBuckets call explaining the rebalancing intent
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d orchestrator logic
…ck, and orchestrator paths
Fixes cleanup ordering deadlock in pool tests (LIFO cleanup, close channel before Drain). Adds tests for wpcom circuit breaker, veriflier transport, checker.Check paths, config hot-reload, dashboard SSE, audit helpers, orchestrator memory pressure, retry queue, and pure utility functions.
- EVENTS.md: event-sourced architecture (lifecycle, idempotency, resolution reasons, causal links, and site-row projection).
- TAXONOMY.md: five-layer test taxonomy (Reachability → Transport → Infrastructure → Application → Content + Reverse checks), site/endpoint/check data model, multi-state vocabulary, event schema, scope matrix, signal processing, and versioned implementation roadmap.
- ROADMAP.md: deferred public REST API (query and manage endpoints, auth, pagination, and uptime-bench integration context).
- AGENTS.md: architectural decision log covering event sourcing, severity vs. state separation, Seems Down lifecycle, in-place event updates, idempotent event identity, resolution reasons, causal vs. rollup links, and Unknown-is-not-downtime invariant.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Work-in-progress rewrite of the core services into Go.
Why Go
The current architecture uses forked Node.js processes (8–16MB RSS each at startup, 53MB limit before recycling) as workers, plus a compiled C++ addon to escape Node's event loop for blocking network I/O. Go eliminates both constraints:
- `net/http` and `crypto/tls` are first-class stdlib packages: no native addon, no node-gyp, no compilation step during deployment
- `net/http/httptrace` provides DNS, TCP, TLS, and TTFB timing hooks as separate measurements within each check, for free
- No `node_modules`, and no addon rebuild on Node.js version upgrades
- `pprof`, race detector via `go test -race`, and a mature testing ecosystem
The Veriflier is rewritten in Go as well, replacing the Qt C++ dependency with a lightweight Go HTTP service. The protocol between Monitor and Verifliers moves from custom HTTPS to gRPC, providing type-safe contracts, built-in retries, and bidirectional streaming for future use.
Benefits of the Rewrite
Memory
The current architecture forks Node.js worker processes that start at 8–16MB RSS and are recycled once they reach 53MB. With a typical deployment of 8–16 workers, the process tree consumes 240–850MB of resident memory just for worker overhead, before any check data is counted. The master process, SSL server, and associated IPC buffers add further overhead.
Jetmon 2 runs as a single process. Goroutines start with a stack of a few kilobytes (the runtime minimum is 2KB) and grow on demand, so a pool of 1,000 concurrent goroutines costs only a few megabytes of stack. Total process RSS for an equivalent workload is estimated at 50–150MB, a 75–90% reduction in memory consumption per host.
Concurrent Checks
Current concurrency is bounded by the number of worker processes. Each worker is a single-threaded Node.js process; even with the C++ addon offloading blocking I/O to a thread pool, practical concurrency per host is in the low hundreds. Scaling beyond that requires adding more hosts and manually partitioning bucket ranges.
Go's goroutine scheduler makes 10,000+ concurrent in-flight checks on a single host practical with no additional configuration. At a conservative network timeout of 10 seconds and average site response time of 200ms, a pool of 1,000 goroutines sustains approximately 5,000 check completions per second. This represents an estimated 10–50× increase in concurrent checks per host, meaning significantly fewer hosts are required to cover the same fleet.
Throughput
The current architecture crosses a process boundary on every unit of work: the master dispatches via IPC, the worker receives, processes, and replies via IPC, and the master aggregates. Each crossing involves serialisation, a context switch, and V8 event loop scheduling on both ends.
Jetmon 2 replaces all IPC with Go channel sends, which stay in-process and are an order of magnitude cheaper. V8 GC pauses, which can delay check scheduling and RTT measurement in the current system, are eliminated. Estimated throughput improvement: 3–10× more sites checked per second per host under equivalent conditions.
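As a rough illustration of dispatch over channels instead of IPC, the sketch below fans jobs out to a fixed number of goroutines and collects results in-process. The `job` and `result` types and the `runPool` function are hypothetical stand-ins, not the orchestrator's actual types.

```go
package main

import (
	"fmt"
	"sync"
)

// job and result are hypothetical stand-ins for the real check types.
type job struct{ SiteID int }

type result struct {
	SiteID int
	Up     bool
}

// runPool fans jobs out to n worker goroutines over one channel and
// collects results over another. Values are passed in memory, so no
// serialisation or process boundary is involved.
func runPool(n int, jobs []job, check func(job) result) []result {
	in := make(chan job)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- check(j)
			}
		}()
	}

	// Dispatcher: plays the role of the master handing out work.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
	}()

	// Close the result channel once every worker has exited.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []result
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	jobs := []job{{1}, {2}, {3}, {4}}
	results := runPool(2, jobs, func(j job) result {
		// Placeholder check; a real one would perform the HTTP request.
		return result{SiteID: j.SiteID, Up: j.SiteID%2 == 0}
	})
	fmt.Println(len(results), "results") // 4 results
}
```

Results arrive in completion order rather than submission order, so a caller that needs ordering would key on `SiteID`.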
Check Scheduling Accuracy
The current system uses `setTimeout` and `setInterval` for round scheduling. These are subject to V8 event loop delay: a busy event loop can delay a scheduled callback by tens to hundreds of milliseconds, introducing jitter into check timing and RTT measurements.
Go's `time.Ticker` fires with OS-level timer precision. RTT measurements from `net/http/httptrace` are taken inside the HTTP stack with no event loop between the measurement point and the timer, making them more accurate and consistent.
Deployment Speed
Current deployment requires `npm install`, a `node-gyp` rebuild of the native C++ addon (which must match the installed Node.js version), and a coordinated process restart. A failed addon compilation blocks deployment entirely.
Jetmon 2 deploys as a single static binary with no runtime dependencies. Deployment is: copy binary, `systemctl restart jetmon2`. Total deployment time drops from several minutes to under 30 seconds. There is no compilation step on the target host and no dependency on a matching Node.js version.
Mean Time to Recovery
A worker process crash in the current system requires the master to detect the exit, spawn a replacement, and wait for the new process to initialise — a sequence that takes several seconds and leaves that worker's in-flight checks unresolved.
In Jetmon 2, a panicking goroutine is recovered by a deferred handler, the result is counted as an error, and a replacement goroutine is immediately spawned from the pool — recovery is in the low milliseconds. For a full process crash, systemd restarts the binary; with Go's fast startup, the process is accepting work again in under 2 seconds.
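The recover-and-replace pattern described above might look something like the following. It is a sketch under assumptions: `worker` and `runJobs` are hypothetical names, jobs are modelled as plain functions, and the error counter stands in for whatever result accounting the real pool does.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// worker drains the jobs channel. If a job panics, the deferred handler
// recovers, counts the failure as an error, and starts a replacement
// goroutine that takes over this worker's slot in the pool.
func worker(jobs <-chan func(), errs *int64, wg *sync.WaitGroup) {
	defer func() {
		if r := recover(); r != nil {
			atomic.AddInt64(errs, 1)
			go worker(jobs, errs, wg) // replacement inherits the slot
			return
		}
		wg.Done() // normal exit: the jobs channel was closed
	}()
	for job := range jobs {
		job()
	}
}

// runJobs runs all work items on n workers and returns how many panicked.
func runJobs(n int, work []func()) int64 {
	jobs := make(chan func())
	var errs int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(jobs, &errs, &wg)
	}
	for _, w := range work {
		jobs <- w
	}
	close(jobs)
	wg.Wait()
	return atomic.LoadInt64(&errs)
}

func main() {
	work := []func(){
		func() {},
		func() { panic("simulated check failure") },
		func() {},
	}
	fmt.Println("errors:", runJobs(2, work)) // errors: 1
}
```

The pool keeps a constant number of slots: each `wg.Add(1)` is matched by exactly one `wg.Done()`, issued by whichever goroutine ends up holding that slot when the channel closes.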
Operational Complexity
The current system requires managing Node.js version compatibility, native addon compilation, npm dependency trees, and the fragile worker spawn/recycle lifecycle. The `node_modules` directory and compiled `.node` addon must be present and consistent on every host.
Jetmon 2 eliminates all of this. There is one artifact to manage: the Go binary. It carries its own runtime, has no external dependencies, and produces a reproducible build from `go build`. The `node-gyp`, `npm`, and Node.js version management concerns disappear entirely.