Skip to content

Hub: silent error suppression in hubsync hook + replication loop #100

@v0lkan

Description

@v0lkan

Summary

Two hub-adjacent code paths swallow errors silently with no logging surface. Operators can't tell whether a session-start hub sync succeeded, partially failed, or never reached the network at all — only the count of successfully-fetched entries shows up in the nudge. Same story for the cluster-mode replication loop: dial, stream, send, close, and receive failures all return into nothing.

This was tracked in TASKS.md (sub-items 2 and 3 of a broader "fix silent error suppression in hub" entry, added 2026-04-08). Sub-item 1 from that entry (ctx add --share publish failure warning) is already done at internal/cli/add/core/run/run.go:105-107 via writeConnect.PublishFailed. This issue covers only the two remaining sub-items.

Locations

Hubsync hook — internal/cli/system/core/hubsync/sync.go

Four silent return sites in Sync:

func Sync(_ string) string {
    cfg, loadErr := connectCfg.Load()
    if loadErr != nil {
        return ""                          // ← silent
    }

    client, dialErr := hub.NewClient(cfg.HubAddr, cfg.Token)
    if dialErr != nil {
        return ""                          // ← silent
    }
    defer func() { _ = client.Close() }()

    entries, syncErr := client.Sync(context.Background(), cfg.Types, 0)
    if syncErr != nil || len(entries) == 0 {
        return ""                          // ← silent (and conflates error with empty result)
    }

    if writeErr := render.WriteEntries(entries); writeErr != nil {
        return ""                          // ← silent
    }
    ...
}

The package doc at internal/cli/system/core/hubsync/doc.go:26 is explicit about the design intent:

Every error is silently swallowed so the hook never blocks the session start.

The "never block" constraint is correct. The "silently" part is the bug — the hook can preserve its non-blocking behavior while still emitting structured warnings that operators (and CI / agents reading the event log) can see.

Replication loop — internal/hub/replicate.go

Five silent return sites in replicateOnce plus a discarded store.Append result:

func replicateOnce(ctx context.Context, masterAddr string, store *Store, clientToken string) {
    conn, dialErr := grpc.NewClient(masterAddr, ...)
    if dialErr != nil {
        return                             // ← silent
    }
    defer func() { _ = conn.Close() }()

    _, lastSeq := store.lastSequence()
    authed := addBearerMD(ctx, clientToken)

    stream, streamErr := conn.NewStream(authed, ..., cfgHub.PathSync)
    if streamErr != nil {
        return                             // ← silent
    }

    if sendErr := stream.SendMsg(&SyncRequest{SinceSequence: lastSeq}); sendErr != nil {
        return                             // ← silent
    }
    if closeErr := stream.CloseSend(); closeErr != nil {
        return                             // ← silent
    }

    for {
        msg := &EntryMsg{}
        if recvErr := stream.RecvMsg(msg); recvErr != nil {
            return                         // ← silent
        }
        ...
        _, _ = store.Append([]Entry{entry})     // ← silent
    }
}

Note: startReplication is currently held behind a var _ = startReplication sentinel at line 22 (the function is not wired into Server.Start yet, awaiting cluster-mode work). Logging needs to land regardless — the function will be hot the moment cluster mode activates, and silent failures will be impossible to debug.

Proposed Fix

Use the existing internal/log/warn package — warn.Warn(format, args...) writes to stderr with a ctx: prefix, is intentionally best-effort (silent on sink failures), and is the documented pattern for "errors not actionable by the caller but should not be silently swallowed."

Hubsync hook

import "github.com/ActiveMemory/ctx/internal/log/warn"

func Sync(_ string) string {
    cfg, loadErr := connectCfg.Load()
    if loadErr != nil {
        warn.Warn("hubsync: load connection config: %v", loadErr)
        return ""
    }

    client, dialErr := hub.NewClient(cfg.HubAddr, cfg.Token)
    if dialErr != nil {
        warn.Warn("hubsync: dial %s: %v", cfg.HubAddr, dialErr)
        return ""
    }
    defer func() { _ = client.Close() }()

    entries, syncErr := client.Sync(context.Background(), cfg.Types, 0)
    if syncErr != nil {
        warn.Warn("hubsync: sync from %s: %v", cfg.HubAddr, syncErr)
        return ""
    }
    if len(entries) == 0 {
        return ""                          // genuine empty result — not an error, no log
    }

    if writeErr := render.WriteEntries(entries); writeErr != nil {
        warn.Warn("hubsync: write %d entries: %v", len(entries), writeErr)
        return ""
    }
    ...
}

Note the un-conflation of syncErr and len(entries) == 0 — the current code treats them as the same condition, so a real sync error looks identical to "nothing new". Split them; only the error case warns.

Replication loop

Same pattern — wrap each silent return with warn.Warn including enough context to debug (masterAddr, sequence number where relevant). The discarded store.Append result at line 121 becomes:

if _, appendErr := store.Append([]Entry{entry}); appendErr != nil {
    warn.Warn("replication: append entry %s (seq=%d): %v", entry.ID, entry.Sequence, appendErr)
    // Don't return — keep consuming the stream; the next entry might succeed.
}

Update internal/cli/system/core/hubsync/doc.go to replace "Every error is silently swallowed" with the new contract: "Every error is warned via warn.Warn but does not block the session start."

Tests Required

  • TestSync_WarnsOnLoadError — wire a sink-replacement helper (the warn package's sink is a package-level io.Writer; tests can swap it for a bytes.Buffer via an unexported setter, similar to how existing warn tests likely work). Call Sync with a broken config and assert the buffer contains the expected hubsync: load connection config: prefix.
  • TestSync_WarnsOnDialError — same shape with an unreachable HubAddr.
  • TestSync_NoWarnOnEmptyResult — assert the buffer stays empty when the hub returns zero entries (the un-conflation case).
  • TestReplicateOnce_WarnsOnDialError — same shape for the replication loop.
  • TestReplicateOnce_KeepsConsumingAfterAppendError — the only behavior change beyond logging: append failures don't abort the receive loop. Inject a store that fails on the first append, succeeds on the second; assert both entries were attempted.

Out of Scope

  • Structured (JSON) event log emission. The task description mentions "event system"; for now, stderr via warn.Warn is the established pattern. If a separate JSON event log becomes a thing (the internal/log/event package exists but its surface is rotate/query, not emit-warn), a follow-up can route through both.
  • Making Sync return an error. The non-blocking-hook contract is a hard requirement; the function must continue returning string. Logging is the visibility mechanism.
  • Wiring startReplication into the active cluster path. Separate issue (Hub status: surface cluster leadership (IsLeader, LeaderAddr) in Status RPC + ctx hub status #96 covers cluster-leadership wiring; this issue is logging-only).

Scope for "good first issue"

Both halves are mechanical: find the silent return, add one warn.Warn call, repeat. The pattern is identical (existing internal/write/warn.Warn calls elsewhere in the codebase give a clear example). The only design judgment needed is the un-conflation of syncErr and empty-result in the hubsync hook — well-flagged above. A newcomer can land this in ~50 lines across two files plus tests.

🤖 Filed via gh by Claude Code on behalf of @v0lkan. Discovered during stale-task triage; companion to #93, #94, #95, #96.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinggood first issueGood for newcomers

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions