Skip to content

Logging improvements with zerolog#429

Merged
driv3r merged 6 commits intomainfrom
logging-improvements-with-zerolog
Apr 15, 2026
Merged

Logging improvements with zerolog#429
driv3r merged 6 commits intomainfrom
logging-improvements-with-zerolog

Conversation

@driv3r
Copy link
Copy Markdown
Contributor

@driv3r driv3r commented Apr 14, 2026

  • align default zerolog log level with logrus (info vs trace)
  • introduce slog handler wrapper around our Logger interface which we can pass into BinlogStreamer

driv3r added 3 commits April 14, 2026 19:30
zerolog's own ctx.go init() sets SetGlobalLevel(TraceLevel), which
passes all messages — including Debug — when no explicit level is
configured. logrus defaults to InfoLevel, so switching to the zerolog
backend produced unexpected debug noise.

Use a package-level var initializer in zerolog_logger.go to set InfoLevel
before ghostferry's own init() runs, ensuring env-var and Config.LogLevel
overrides still take effect after the default is applied.

Also fix the test teardown helper to explicitly reset zerolog's global
level on every teardown: because zerolog's level is a separate atomic
from logrus's, restoring the logrus backend without also resetting
zerolog's level left it contaminated for subsequent zerolog tests.
…ostferry

Introduce slog_handler.go with a backend-agnostic slog.Handler that wraps
the ghostferry Logger interface. It maps slog levels to Logger.Debug/Info/
Warn/Error, flattens slog attributes and groups into structured fields via
WithField, and delegates level filtering to the active backend. Works with
both zerolog and logrus without any backend-specific code.

Wire it into BinlogStreamer.createBinlogSyncer via BinlogSyncerConfig.Logger
so that go-mysql-org/go-mysql's BinlogSyncer logs flow through the same
tagged ghostferry logger as the rest of the streamer, instead of bypassing
it via slog.Default().
The only usage was a single SetDefaultLogger(NullHandler{}) call in
ferry.go whose purpose was to silence go-log output. No vendored runtime
dependency imports go-log anymore — go-mysql-org/go-mysql switched to
log/slog — so the suppression call and the import are both dead code.

Remove the import and call, run go mod tidy and go mod vendor to drop
the module from go.mod, go.sum and vendor/.
@driv3r driv3r requested review from a team, austenLacy and forge33 April 14, 2026 19:00
driv3r added 2 commits April 14, 2026 21:49
test_interrupt_resume_idempotence_with_uuid_table (and similar tests)
failed spuriously when GHOSTFERRY_LOG_BACKEND=zerolog. The failure was
a test-harness race, not a product bug.

Root cause: Ghostferry panics on SIGTERM before cutover (the intended
behaviour). send_signal snapshots @pid before checking it, but the
subprocess thread can set @pid = 0 and the OS can reap the child between
the snapshot and Process.kill. This raises Errno::ESRCH, which propagates
out of the WEBrick status handler and fails the test.

The uuid-table variant is more susceptible because it has two tables in
scope and DataIterationConcurrency defaults to 4, so multiple concurrent
AFTER_ROW_COPY callbacks can arrive while the process is already
shutting down.

Fix 1 — ghostferry_helper.rb send_signal: snapshot @pid atomically,
guard against pid == 0, and rescue Errno::ESRCH as benign.

Fix 2 — test_helper.rb new_ghostferry_with_interrupt_after_row_copy:
add an `interrupted` flag so TERM is sent at most once per instance,
regardless of how many AFTER_ROW_COPY callbacks fire.

Add ghostferry_helper_test.rb with two deterministic regression tests
that verify send_signal is a no-op when @pid == 0 and when the target
process has already exited.
The previous commit introduced an `interrupted` one-shot guard in
new_ghostferry_with_interrupt_after_row_copy that changed the threshold
from `>=` to `>`, shifting the interrupt to one batch later than intended.

Tests such as test_interrupt_resume_idempotence_with_multiple_interrupts
and _with_multiple_interrupts_and_writes_to_source resume from a state
where few batches remain; with the off-by-one the process completed
normally instead of being interrupted, causing run_expecting_interrupt to
raise "Ghostferry did not get interrupted" consistently.

Revert new_ghostferry_with_interrupt_after_row_copy to the original `>=`
semantics with no one-shot guard. The Errno::ESRCH flake is already
handled by the ESRCH rescue in send_signal, so the guard is not needed.
@driv3r driv3r merged commit 9ac8861 into main Apr 15, 2026
13 checks passed
@driv3r driv3r deleted the logging-improvements-with-zerolog branch April 15, 2026 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants