
Jetmon 2 - Rewrite of core services into Go#60

Open
chrisbliss18 wants to merge 20 commits into master from refactor/jetmon2

Conversation

@chrisbliss18
Contributor

Work in progress rewrite of the core services into Go.

Why Go

The current architecture uses forked Node.js processes (8–16MB RSS each at startup, 53MB limit before recycling) as workers, plus a compiled C++ addon to escape Node's event loop for blocking network I/O. Go eliminates both constraints:

  • Goroutines start at ~4KB of stack and grow on demand, making 50,000 concurrent checks on a single host practical without the memory overhead of forked processes or libuv thread pools
  • net/http and crypto/tls are first-class stdlib packages — no native addon, no node-gyp, no compilation step during deployment
  • net/http/httptrace provides DNS, TCP, TLS, and TTFB timing hooks as separate measurements within each check, for free
  • Single static binary deployment with no runtime dependencies, no node_modules, and no addon rebuild on Node.js version upgrades
  • Built-in profiling via pprof, race detector via go test -race, and a mature testing ecosystem
  • Graceful goroutine lifecycle management replaces the fragile worker spawn/recycle/evaporate lifecycle

The Veriflier is rewritten in Go as well, replacing the Qt C++ dependency with a lightweight Go HTTP service. The protocol between Monitor and Verifliers moves from custom HTTPS to gRPC, providing type-safe contracts, built-in retries, and bidirectional streaming for future use.

Benefits of the Rewrite

Memory

The current architecture forks Node.js worker processes that start at 8–16MB RSS and are recycled once they reach 53MB. With a typical deployment of 8–16 workers, the process tree consumes 240–850MB of resident memory just for worker overhead, before any check data is counted. The master process, SSL server, and associated IPC buffers add further overhead.

Jetmon 2 runs as a single process. Go goroutines start at 4KB of stack and grow on demand. A pool of 1,000 concurrent goroutines costs roughly 4MB of stack. Total process RSS for an equivalent workload is estimated at 50–150MB — a 75–90% reduction in memory consumption per host.
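The stack-cost claim can be sanity-checked with the runtime's own counters. This is an illustrative sketch (the helper `stackGrowth` is not Jetmon 2 code); exact numbers vary by Go version, since the initial goroutine stack size is an implementation detail:

```go
package main

import (
	"fmt"
	"runtime"
)

// stackGrowth parks n goroutines and returns how many bytes of
// stack memory (runtime.MemStats.StackInuse) they added.
func stackGrowth(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	ready := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			ready <- struct{}{} // signal that this goroutine is live
			<-release           // then park, holding only its small stack
		}()
	}
	for i := 0; i < n; i++ {
		<-ready // wait until all n goroutines exist before measuring
	}
	runtime.ReadMemStats(&after)
	close(release)
	return after.StackInuse - before.StackInuse
}

func main() {
	grew := stackGrowth(1000)
	fmt.Printf("1000 parked goroutines added ~%d KB of stack\n", grew/1024)
}
```

On a typical build this lands in the low single-digit megabytes for 1,000 goroutines, versus tens of megabytes per forked Node.js worker.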

Concurrent Checks

Current concurrency is bounded by the number of worker processes. Each worker is a single-threaded Node.js process; even with the C++ addon offloading blocking I/O to a thread pool, practical concurrency per host is in the low hundreds. Scaling beyond that requires adding more hosts and manually partitioning bucket ranges.

Go's goroutine scheduler makes 10,000+ concurrent in-flight checks on a single host practical with no additional configuration. At a conservative network timeout of 10 seconds and average site response time of 200ms, a pool of 1,000 goroutines sustains approximately 5,000 check completions per second. This represents an estimated 10–50× increase in concurrent checks per host, meaning significantly fewer hosts are required to cover the same fleet.
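The bounded-pool shape described above can be sketched in a few lines. The function `runPool` and the simulated 2ms check are illustrative assumptions, not the actual Jetmon 2 checker:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runPool processes every job with at most n checks in flight,
// mirroring a bounded goroutine pool, and returns the completion count.
func runPool(n int, jobs []string, check func(string)) int64 {
	var completed int64
	ch := make(chan string)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for site := range ch {
				check(site)
				atomic.AddInt64(&completed, 1)
			}
		}()
	}
	for _, j := range jobs {
		ch <- j
	}
	close(ch)
	wg.Wait()
	return completed
}

func main() {
	jobs := make([]string, 1000)
	for i := range jobs {
		jobs[i] = fmt.Sprintf("site-%d.example", i) // hypothetical hostnames
	}
	start := time.Now()
	// Simulated 2ms check: 100 workers drain 1,000 jobs in roughly 20ms of wall time.
	n := runPool(100, jobs, func(string) { time.Sleep(2 * time.Millisecond) })
	fmt.Printf("%d checks in %v\n", n, time.Since(start))
}
```

Scaling the same arithmetic to 1,000 goroutines at a 200ms average check time yields the ~5,000 completions per second estimated above.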

Throughput

The current architecture crosses a process boundary on every unit of work: the master dispatches via IPC, the worker receives, processes, and replies via IPC, and the master aggregates. Each crossing involves serialisation, a context switch, and V8 event loop scheduling on both ends.

Jetmon 2 replaces all IPC with Go channel sends, which are in-process and orders of magnitude cheaper. V8 GC pauses, which can delay check scheduling and RTT measurement in the current system, are eliminated. Estimated throughput improvement: 3–10× more sites checked per second per host under equivalent conditions.
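The master/worker round-trip collapses into two channel sends: jobs out, results back, all inside one process. A minimal sketch, assuming a hypothetical `result` type and placeholder check outcome:

```go
package main

import (
	"fmt"
	"sync"
)

type result struct {
	site string
	up   bool
}

// aggregate dispatches nJobs over a channel to nWorkers goroutines and
// fans results back in — the in-process replacement for IPC round-trips.
func aggregate(nWorkers, nJobs int) int {
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for site := range jobs {
				results <- result{site: site, up: true} // placeholder outcome
			}
		}()
	}
	// Close results once every worker has drained its jobs.
	go func() { wg.Wait(); close(results) }()
	go func() {
		for i := 0; i < nJobs; i++ {
			jobs <- fmt.Sprintf("site-%d", i)
		}
		close(jobs)
	}()

	count := 0
	for range results {
		count++
	}
	return count
}

func main() {
	fmt.Println("aggregated", aggregate(4, 8), "results") // prints: aggregated 8 results
}
```

No serialisation, no context switch between processes, and no V8 scheduling on either end of the hand-off.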

Check Scheduling Accuracy

The current system uses setTimeout and setInterval for round scheduling. These are subject to V8 event loop delay — a busy event loop can delay a scheduled callback by tens to hundreds of milliseconds, introducing jitter into check timing and RTT measurements.

Go's time.Ticker fires with OS-level timer precision. RTT measurements from net/http/httptrace are taken inside the HTTP stack with no event loop between the measurement point and the timer, making them more accurate and consistent.

Deployment Speed

Current deployment requires npm install, a node-gyp rebuild of the native C++ addon (which must match the installed Node.js version), and a coordinated process restart. A failed addon compilation blocks deployment entirely.

Jetmon 2 deploys as a single static binary with no runtime dependencies. Deployment is: copy binary, systemctl restart jetmon2. Total deployment time drops from several minutes to under 30 seconds. There is no compilation step on the target host and no dependency on a matching Node.js version.

Mean Time to Recovery

A worker process crash in the current system requires the master to detect the exit, spawn a replacement, and wait for the new process to initialise — a sequence that takes several seconds and leaves that worker's in-flight checks unresolved.

In Jetmon 2, a panicking goroutine is recovered by a deferred handler, the result is counted as an error, and a replacement goroutine is immediately spawned from the pool — recovery is in the low milliseconds. For a full process crash, systemd restarts the binary; with Go's fast startup, the process is accepting work again in under 2 seconds.
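The recover-and-count-as-error path can be sketched as follows. The helper `safeCheck` and the hostnames are hypothetical, standing in for the real checker:

```go
package main

import (
	"fmt"
	"sync"
)

// safeCheck runs one check, converting a panic into an error result so a
// single bad check never takes down the process.
func safeCheck(site string, check func(string) error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("check of %s panicked: %v", site, r)
		}
	}()
	return check(site)
}

func main() {
	sites := []string{"ok.example", "bad.example"} // hypothetical hosts
	errs := make(chan error, len(sites))
	var wg sync.WaitGroup
	for _, s := range sites {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			errs <- safeCheck(s, func(site string) error {
				if site == "bad.example" {
					panic("simulated crash") // stands in for an unexpected bug
				}
				return nil
			})
		}(s)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			fmt.Println("recovered:", err)
		}
	}
}
```

The panicking check is recorded as an error and the rest of the round continues; there is no process to respawn and no in-flight work lost.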

Operational Complexity

The current system requires managing Node.js version compatibility, native addon compilation, npm dependency trees, and the fragile worker spawn/recycle lifecycle. The node_modules directory and compiled .node addon must be present and consistent on every host.

Jetmon 2 eliminates all of this. There is one artifact to manage: the Go binary. It carries its own runtime, has no external dependencies, and produces a reproducible build from go build. The node-gyp, npm, and Node.js version management concerns disappear entirely.

Chris Jean and others added 13 commits April 19, 2026 16:37
…orld ReadMemStats

- refreshVeriflierClients now diffs addr|token fingerprints and skips
  rebuilding when the verifier list is unchanged, preserving TCP
  connection pools between rounds
- Remove runtime.ReadMemStats stop-the-world call — it was logging but
  taking no action; memory metrics are already covered by EmitMemStats
- Remove unused statusDown constant; the DB transition path goes directly
  from statusRunning to statusConfirmedDown
- Add comment to per-round ClaimBuckets call explaining the rebalancing intent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes cleanup ordering deadlock in pool tests (LIFO cleanup, close channel
before Drain). Adds tests for wpcom circuit breaker, veriflier transport,
checker.Check paths, config hot-reload, dashboard SSE, audit helpers,
orchestrator memory pressure, retry queue, and pure utility functions.
EVENTS.md: event-sourced architecture — lifecycle, idempotency,
resolution reasons, causal links, and site-row projection.

TAXONOMY.md: five-layer test taxonomy (Reachability → Transport →
Infrastructure → Application → Content + Reverse checks), site/endpoint/
check data model, multi-state vocabulary, event schema, scope matrix,
signal processing, and versioned implementation roadmap.

ROADMAP.md: deferred public REST API — query and manage endpoints,
auth, pagination, and uptime-bench integration context.

AGENTS.md: architectural decision log covering event sourcing, severity
vs. state separation, Seems Down lifecycle, in-place event updates,
idempotent event identity, resolution reasons, causal vs. rollup links,
and Unknown-is-not-downtime invariant.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
