Skip to content

Migrate to slog + Sentry fingerprinting#333

Merged
simonsmallchua merged 28 commits into
mainfrom
work/cranky-albattani
Apr 18, 2026
Merged

Migrate to slog + Sentry fingerprinting#333
simonsmallchua merged 28 commits into
mainfrom
work/cranky-albattani

Conversation

@simonsmallchua
Copy link
Copy Markdown
Contributor

@simonsmallchua simonsmallchua commented Apr 18, 2026

Summary

  • Replaces all ~1,024 zerolog calls across ~48 files with a unified internal/logging package built on stdlib log/slog + sentry-go/slog v0.45.0
  • Adds 14-component taxonomy (worker, queue, batch, db, api, crawler, archive, notify, auth, startup, techdetect, util, jobs, pressure) — every log line carries its component as a structured field and Sentry tag
  • Fixes Sentry fingerprint fragmentation: dynamic values (IDs, domains, counts) moved to structured fields; fingerprint set to [component, staticMessage]; BeforeSend hook normalises UUID/numeric tokens as a safety net
  • logging.NoCapture(ctx) suppresses Sentry capture for expected errors (404s, validation failures) without silencing the log
  • Removes github.com/rs/zerolog dependency entirely
  • Updates scripts/process_logs.py and scripts/aggregate_logs.py to extract the new component field, producing components_per_minute.csv and a component breakdown in summary.md
  • Updates .claude/skills/structured-logging/SKILL.md to document the new patterns

Test plan

  • go build ./... — clean compile
  • go test ./... — all 21 packages pass
  • go vet ./... — no issues
  • Spot-check JSON log output includes component field and static message strings
  • Verify Sentry issues group by [component, message] in staging
  • Run scripts/process_logs.py against a sample log file and confirm component_counts in output JSON
  • Run scripts/aggregate_logs.py and confirm components_per_minute.csv is written

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Component-scoped structured logging with deterministic message normalisation, automatic error capture integration, and context-based capture suppression.
  • Documentation

    • Added logging and Sentry audit reports and updated guidance and examples for component loggers, error-capture control and tracing.
  • Chores

    • Large-scale migration to the new logging API across the codebase, log-aggregation output changes, dependency updates, and added a pre-commit Go lint step.
  • Tests

    • New tests for logging behaviour, message normalisation and Sentry BeforeSend integration.

@supabase
Copy link
Copy Markdown

supabase Bot commented Apr 18, 2026

Updates to Preview Branch (work/cranky-albattani) ↗︎

Deployments Status Updated
Database Sat, 18 Apr 2026 21:11:09 UTC
Services Sat, 18 Apr 2026 21:11:09 UTC
APIs Sat, 18 Apr 2026 21:11:09 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Sat, 18 Apr 2026 21:11:11 UTC
Migrations Sat, 18 Apr 2026 21:11:12 UTC
Seeding Sat, 18 Apr 2026 21:11:14 UTC
Edge Functions Sat, 18 Apr 2026 21:11:14 UTC

View logs for this Workflow Run ↗︎.
Learn more about Supabase for Git ↗︎.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)
📝 Walkthrough

Walkthrough

Adds a new slog-backed centralized logging package (internal/logging) with Sentry integration and message normalisation; migrates code from zerolog + ad‑hoc Sentry captures to component-scoped loggers and structured key/value calls; updates deps, docs, tests, scripts and pre-commit tooling.

Changes

Cohort / File(s) Summary
Core logging infra & tests
internal/logging/logging.go, internal/logging/logging_test.go
New slog-based logging package: component-scoped Logger, ErrorContext, Fatal, NoCapture, BeforeSend Sentry normaliser, fanout handler (stdout + Sentry), Setup, ParseLevel and tests.
Entrypoints / startup
cmd/app/main.go, cmd/archive_key_migrate/main.go, cmd/test_jobs/main.go
Introduce startup component logger; call logging.Setup(...); replace global zerolog calls and inline Sentry captures with component-style structured logs and BeforeSend; small logging payload changes.
API foundation & middleware
internal/api/logging.go, internal/api/middleware.go, internal/api/errors.go, internal/api/response.go
Add api component logger; loggerWithRequest now returns *logging.Logger; middleware/error/response logging migrated to component API with explicit "error", err fields.
API handlers & auth
internal/api/*, internal/auth/*
Mass migration of handlers to component loggers and structured calls; several functions now accept *logging.Logger; many ad‑hoc Sentry captures removed while preserving control flow.
Database layer
internal/db/*
Introduce db (and related) component loggers; migrate zerolog usage to component key/value style, remove many explicit Sentry captures, add redactURL for safe DB URL logging.
Jobs & workers
internal/jobs/*
Add jobs/worker component loggers; migrate logging, use ErrorContext for context-aware error/panic logging; removed scattered Sentry captures in recovery paths.
Archive & crawler
internal/archive/..., internal/crawler/...
Add archive/crawler component loggers; convert log calls to new API while preserving messages and structured fields.
Notifications & listeners
internal/notifications/*
Introduce notify logger; migrate notification listener and Slack-related logging to component API.
Utilities & detection
internal/util/url.go, internal/techdetect/detector.go
Add util/techdetect component loggers and migrate debug/info/warn calls to structured API.
Scripts: log processing & aggregation
scripts/process_logs.py, scripts/aggregate_logs.py
Switch aggregation to component/event-centric outputs (component_counts, event_counts), add flat CSV export; updated function signatures and CLI behaviour.
Docs & audits
.claude/skills/structured-logging/SKILL.md, docs/diagnostics/LOGGING_AUDIT.md, docs/diagnostics/SENTRY_AUDIT.md
Rewrite logging guidance; add Zerolog and Sentry call-site audits and migration notes.
Dependency updates
go.mod
Remove rs/zerolog; upgrade github.com/getsentry/sentry-go and add github.com/getsentry/sentry-go/slog.
Pre-commit & tooling
.husky/pre-commit, scripts/lint-go.js
Pre-commit now runs scripts/lint-go.js which invokes golangci-lint for staged Go files when available.
Wide migration (many files)
internal/*, cmd/*, ...
Mass migration from zerolog fluent API and ad‑hoc Sentry captures to logging.Component(...) usage and structured "error", err fields across most packages; control flow mostly unchanged.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Logging as internal/logging
    participant Sentry as Sentry SDK
    participant Stdout as Stdout/JSON

    App->>Logging: Setup(level, env)
    Logging->>Stdout: install slog handler (text/json)
    Logging->>Sentry: install Sentry slog handler (BeforeSend)
    App->>Logging: l := Component("component")
    App->>Logging: l.ErrorContext(ctx, "failed operation", "error", err)
    alt ctx marked NoCapture
        Logging->>Sentry: BeforeSend drops event (no_capture)
        Logging->>Stdout: emit structured slog record (with component)
    else
        Logging->>Sentry: send normalised event (message/fingerprint/tags)
        Logging->>Stdout: emit structured slog record (with component)
    end
Loading

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Migrate to slog + Sentry fingerprinting' accurately captures the main change: transitioning from zerolog to slog-based logging with Sentry fingerprinting improvements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 18, 2026

Release Versions

App patch: v0.32.6v0.32.7

Changelog

Added

  • internal/logging package: component-scoped slog + Sentry fanout logger with
    14 named components (worker, api, crawler, db, queue, batch, archive, notify,
    auth, startup, techdetect, util, jobs, pressure), BeforeSend UUID/number
    normalisation for Sentry issue grouping, and static fingerprinting
  • Cross-platform pre-commit golangci-lint hook (scripts/lint-go.js) — skips
    gracefully when golangci-lint is absent, fails closed on git errors

Changed

  • Migrated all logging from zerolog to stdlib log/slog with sentry-go/slog
    v0.45.0; all packages now use logging.Component("name") — no direct zerolog
    or bare sentry imports remain
  • Removed github.com/rs/zerolog dependency
  • Sentry error capture is now automatic for Error/ErrorContext calls via the
    fanout handler; use logging.NoCapture(ctx) inline to suppress capture for
    expected errors (e.g. 404s, client validation)
  • Log monitoring pipeline restructured for agent debugging: process_logs.py
    emits a flat per-iteration CSV
    (timestamp, level, component, message, extras) alongside the JSON summary;
    aggregate_logs.py produces time_series.csv (per-minute level counts) and
    events_per_minute.csv (per-minute component: message counts) using a
    lossless .aggregate_data.json state file

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
internal/api/response.go (1)

19-25: 🧹 Nitpick | 🔵 Trivial

Avoid duplicating request_id in structured fields.

Line 25 already logs through loggerWithRequest(r), which includes request_id. Passing request_id again can produce duplicate keys in the same event.

Proposed cleanup
 func WriteJSON(w http.ResponseWriter, r *http.Request, data any, status int) {
-	requestID := GetRequestID(r)
-
 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(status)

 	if err := json.NewEncoder(w).Encode(data); err != nil {
-		loggerWithRequest(r).Error("Failed to encode JSON response", "error", err, "request_id", requestID)
+		loggerWithRequest(r).Error("Failed to encode JSON response", "error", err)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/response.go` around lines 19 - 25, The log call duplicates the
request_id because loggerWithRequest(r) already attaches it; update the error
logging in the response flow (surrounding GetRequestID(r),
json.NewEncoder(w).Encode(data), and loggerWithRequest(r)) to remove the
explicit "request_id" field from the Error invocation so only
loggerWithRequest(r) supplies the request_id—keep the message and "error" field
intact.
internal/crawler/tls_aia.go (1)

3-17: ⚠️ Potential issue | 🟠 Major

Formatting check is currently blocking CI for this file.

The pipeline reports a gofmt failure. Please run gofmt and goimports over this touched file (and other touched Go files) before merge.

As per coding guidelines, **/*.go: Run gofmt and goimports formatting on all touched Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/crawler/tls_aia.go` around lines 3 - 17, The import block in
tls_aia.go is improperly formatted (extra blank line and spacing) causing
gofmt/goimports to fail CI; run gofmt and goimports over this file (and all
other touched .go files) to normalize imports and formatting, specifically fix
the import block around the import list (the import parentheses/import block) so
there are no stray blank lines and unused imports are organized, then re-run
gofmt/goimports and commit the formatted changes.
internal/crawler/robots.go (1)

3-14: ⚠️ Potential issue | 🟠 Major

Please run gofmt and goimports on this file.

The current PR is failing formatting checks, and this import section reflects that.

As per coding guidelines, **/*.go: Run gofmt and goimports formatting on all touched Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/crawler/robots.go` around lines 3 - 14, Run gofmt and goimports on
the file to fix the improperly formatted import block and any other formatting
issues; specifically reformat the import list and remove the stray blank line
and ensure imports are grouped/ordered correctly (the import block in
internal/crawler/robots.go), then save the file so the CI formatting checks
pass.
internal/db/webflow_connections.go (1)

3-10: ⚠️ Potential issue | 🟠 Major

This file still needs gofmt/goimports.

Formatting is currently failing in CI; please normalise this file before merge.

As per coding guidelines, **/*.go: Run gofmt and goimports formatting on all touched Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/webflow_connections.go` around lines 3 - 10, The import block in
webflow_connections.go is not formatted/organized (extra blank line and unsorted
imports) causing CI to fail; run gofmt and goimports on the file (or the repo)
to normalize formatting and fix imports, ensuring the import list (context,
database/sql, errors, fmt, time) is properly formatted/organized and any unused
imports removed so the file compiles cleanly; then save the file and re-run
build/CI.
internal/db/schedulers.go (1)

3-11: ⚠️ Potential issue | 🟠 Major

Formatting check is blocking merge — run gofmt and goimports on this file.

The CI output indicates formatting is failing, and this import block is not fully normalised.

As per coding guidelines, **/*.go: Run gofmt and goimports formatting on all touched Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/schedulers.go` around lines 3 - 11, The import block at the top
of schedulers.go is not formatted correctly (extra blank line and unordered
imports); run gofmt and goimports on this file and fix the import section so
imports are grouped and ordered per goimports (remove the stray blank line and
unused import placeholders), then re-run gofmt to normalise spacing throughout
the file; target the file's import block and save to make CI pass.
scripts/aggregate_logs.py (1)

339-357: ⚠️ Potential issue | 🟠 Major

Bug: Component counts not extracted during incremental aggregation.

The aggregate_logs() function has inline processing logic (lines 339-357) that extracts level_counts and message_counts from new JSON files, but it does not extract component_counts. The process_json_file() function (lines 121-125) correctly handles component_counts, but it's not called here.

This means components_per_minute.csv will be empty or incomplete when processing new files incrementally.

🐛 Proposed fix to extract component_counts
             # Process message counts
             for timestamp, messages in data.get('message_counts', {}).items():
                 minute_key = timestamp[:16]
                 for msg_data in messages:
                     message = msg_data.get('message', 'unknown')
                     count = msg_data.get('count', 0)
                     by_minute[minute_key]['message_counts'][message] += count
                     all_messages[message] += count

+            # Process component counts (new field from slog migration)
+            for timestamp, components in data.get('component_counts', {}).items():
+                minute_key = timestamp[:16]
+                for component, count in components.items():
+                    by_minute[minute_key]['component_counts'][component] += count
+
             processed_set.add(json_file.name)
             success_count += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` around lines 339 - 357, aggregate_logs() currently
processes level_counts and message_counts inline but omits component_counts,
causing incomplete components_per_minute.csv; update the loop in aggregate_logs
(near the level_counts/message_counts handling) to also iterate
data.get('component_counts', {}) or extract component_counts from each message
entry like process_json_file() does, compute minute_key = timestamp[:16], ensure
by_minute[minute_key] is initialized, then increment
by_minute[minute_key]['component_counts'][component] and a top-level
all_components[component] accordingly (use the same keys/structures used for
level_counts/message_counts to mirror process_json_file()).
internal/jobs/worker.go (1)

1534-1543: ⚠️ Potential issue | 🟠 Major

Use a static error message for Sentry grouping.

Line 1539 logs a dynamic message (message includes streak and possibly the last error text). That defeats the new [component, staticMessage] grouping and will fragment Sentry issues again.

Suggested fix
-	message := fmt.Sprintf("Job failed after %d consecutive task failures", streak)
+	dbMessage := fmt.Sprintf("Job failed after %d consecutive task failures", streak)
 	if lastErr != nil {
-		message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error())
+		dbMessage = fmt.Sprintf("%s (last error: %s)", dbMessage, lastErr.Error())
 	}
 
-	workerLog.Error(message,
+	workerLog.Error("Job failed after consecutive task failures",
 		"job_id", jobID,
 		"failure_streak", streak,
 		"threshold", wp.jobFailureThreshold,
+		"last_error", lastErr,
 	)
@@
-		`, JobStatusFailed, now, message, jobID, JobStatusFailed, JobStatusCancelled)
+		`, JobStatusFailed, now, dbMessage, jobID, JobStatusFailed, JobStatusCancelled)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 1534 - 1543, The current error call
builds a dynamic message (variable message) including streak and lastErr which
prevents Sentry grouping; update the workerLog.Error invocation in the job
failure path (where message, lastErr, jobID, streak and wp.jobFailureThreshold
are used) to pass a fixed/static error message string (e.g. "job failed after
consecutive task failures") as the first argument and move all dynamic details
(streak, lastErr.Error() or lastErr as a field, job_id, threshold) into
structured key/value pairs so the static message is used for grouping while
still recording the dynamic context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/test_jobs/main.go`:
- Line 73: The file fails gofmt/goimports checks; run automatic formatting on
the touched Go file and update the commit: run gofmt -w (or gofmt) and goimports
-w on cmd/test_jobs/main.go (or run gofmt/goimports over the package) to fix
formatting and import ordering so the call site startupLog.Info("Worker pool
started with "+strconv.Itoa(jobWorkers)+" workers") and surrounding code are
properly formatted; re-run `go vet`/CI to ensure the formatting errors are
resolved before pushing.

In `@docs/diagnostics/LOGGING_AUDIT.md`:
- Around line 1-4: Add an explicit “historical baseline / pre-migration
snapshot” note under the header "# Zerolog Call Site Audit" (near the
"Generated: 2026-04-11" line) so readers know these numbers reflect pre-slog
migration state; insert a single-sentence clarification such as "NOTE: This
document is a pre-migration snapshot and represents the state prior to the slog
migration (historical baseline)." to avoid confusion.

In `@internal/api/admin.go`:
- Around line 51-54: Extract the nil-safe organisation-ID logic into a small
helper (e.g., getOrgID or formatOrgID) that accepts the user (or a pointer to
user.OrganisationID) and returns the string to assign to orgIDStr; replace the
duplicated blocks that set orgIDStr by calling this helper wherever you
currently check user.OrganisationID and set orgIDStr to "none" or the
dereferenced value, and update all occurrences (the blocks using orgIDStr around
the two duplicated spots) to use the new helper to keep the logic DRY.

In `@internal/api/auth_webflow.go`:
- Around line 62-63: Remove the raw user identifier from the error-level log
call in the auth webflow: update the logger.Error invocation (the call using
logger.Error and userClaims.UserID) so it no longer attaches userClaims.UserID
directly; instead rely on the request-scoped correlation fields already present
on the logger or include a redacted/hashed version of userClaims.UserID (or
another non-PII token). Make the same change for the other occurrences
referenced (the logger.Error calls paired with InternalError) so error-level
logs don't contain raw PII while keeping the InternalError(w, r, err) behavior
unchanged.

In `@internal/api/auth.go`:
- Around line 130-131: Remove raw user identifiers from error-level logs in the
auth handlers by deleting the "user_id" field from the logger.Error calls (e.g.,
the call to logger.Error("Failed to create user with organisation", "error",
err, "user_id", req.UserID)) and any similar logger.Error usages around the
InternalError invocation and the other occurrence noted (~lines 213-214); if you
must record user context, replace raw req.UserID with a non-identifying token
(e.g., masked ID or request-scoped trace ID) or omit it entirely, and keep the
InternalError(w, r, err) call unchanged.

In `@internal/api/errors.go`:
- Around line 80-85: The loggerWithRequest(r) call already injects request_id,
so remove the redundant "request_id", requestID pair from the logger.Error and
logger.Debug calls in the error response path; locate the calls that follow
loggerWithRequest(r) (the logger.Error("API error response", ...) and
logger.Debug("API client error response", ...)) and leave the other structured
fields ("status", "code", "message") intact.
- Around line 54-59: The log calls in the error response path duplicate the
request_id because loggerWithRequest(r) already injects it; remove the explicit
"request_id", requestID argument from both logger.Error and logger.Debug
invocations in the function that calls loggerWithRequest(r) so the structured
logger only includes request_id once and retains the other fields ("error",
"status", "code").

In `@internal/auth/middleware.go`:
- Line 92: Remove the sensitive token material from the auth failure log call:
locate the authLog.Warn invocation that currently passes "token_prefix",
tokenString[:min(10, len(tokenString))] (in middleware.go) and remove that
key/value so no part of tokenString is logged; if you need non-sensitive info,
replace it with a safe alternative such as the token length or a fixed marker
(e.g., "token_present": true or "token_length": len(tokenString)) and ensure no
other logging in the same auth middleware references tokenString or slices of
it.

In `@internal/db/dashboard.go`:
- Around line 171-172: Update the error logs that call dbLog.Error("Failed to
scan activity row", "error", err) (and the other similar scan/iteration error
logs at the listed locations) to include the current tenant/organisation context
as a key-value pair—for example add "organisation_id", organisationID (or the
appropriate local variable name such as orgID/organisationID available in the
same scope) to the log call so it becomes dbLog.Error("Failed to scan activity
row", "error", err, "organisation_id", organisationID); make the same change for
the other occurrences (lines ~180, ~304, ~427, ~712, ~795, ~811) ensuring you
reference the correct in-scope identifier for the organisation in each
function/loop.

In `@internal/db/db.go`:
- Line 466: The call to dbLog.Info that logs the raw DATABASE_URL should be
changed to avoid leaking credentials; implement a helper (e.g., redactURL or
similar) that parses the connection string (using net/url), zeroes or replaces
parsed.User (username/password) and returns the sanitized URL, then replace the
direct "connection_url, url" usage in the dbLog.Info call with the redacted
value; ensure the helper returns a safe placeholder on parse error (e.g.,
"<invalid URL>") and update the log invocation in the same scope where
dbLog.Info is called.

In `@internal/db/webflow_connections.go`:
- Around line 202-203: The delete-failure log call in the delete path for
webflow connections currently logs "connection_id" and the error via
dbLog.Error; update that call to also include the existing organisationID (use
the key "organisation_id") so the log entry contains tenant context for
correlation (refer to dbLog.Error, connectionID, and organisationID in the
delete handler for webflow connections).

In `@internal/db/webflow_site_settings.go`:
- Around line 10-14: There are two loggers declared (webflowSettingsLog and
dbLog) with the same "db" component; remove the duplicate by deleting the
webflowSettingsLog variable and update all uses of webflowSettingsLog to use the
existing dbLog instead (ensure the single dbLog variable remains the shared
logger), and remove any other duplicate webflowSettingsLog declarations
elsewhere in the file so only dbLog is used throughout.

In `@internal/jobs/domain_limiter.go`:
- Around line 120-121: This file is not formatted with gofmt/goimports; run
gofmt and goimports on the modified Go files (including
internal/jobs/domain_limiter.go) to fix formatting and import ordering so CI
passes. Specifically, format the block around the DomainLimiter construction
(referencing jobsLog.Info and cfg.RobotsDelayMultiplier) and the nearby code
(lines ~514-535) so it adheres to gofmt standards and ensure imports are
organized via goimports before committing.

In `@internal/jobs/worker.go`:
- Around line 4562-4564: The error logs calling workerLog.Error that currently
include full objects (e.g., the call logging "request_diagnostics",
result.RequestDiagnostics and other occurrences that log "headers",
"second_headers", "cache_attempts") must stop emitting raw payloads; update
those Error calls to either omit the full objects or replace them with
redacted/summarized values (e.g., boolean flags, counts, masked header keys, or
a truncated hash/ID) and/or lower the log level to Debug where full content is
safe; locate the logging sites by searching for workerLog.Error invocations that
include keys "request_diagnostics", "headers", "second_headers", or
"cache_attempts" (including the shown call and the ranges around lines
~4923-4939 and ~4951-4953) and change the arguments to only include
non-sensitive summaries.
- Around line 3793-3797: The log call using workerLog.Info records
cleanupDuration (a time.Duration) under the key "duration_ms", causing unit
mismatch; update the third argument value to a numeric millisecond value by
replacing cleanupDuration with cleanupDuration.Milliseconds() (or cast the
result if necessary) in the workerLog.Info call that logs "Orphaned task cleanup
completed for job" alongside targetJobID and totalCleaned so downstream parsers
see correct milliseconds.

In `@internal/logging/logging_test.go`:
- Around line 1-14: The file package logging (internal/logging/logging_test.go)
is not gofmt-formatted; run gofmt -w on internal/logging/logging_test.go and
then run goimports (or gofmt -w followed by goimports -w) to fix import
formatting and ordering so the test file compiles and CI passes; ensure the file
remains unchanged semantically (imports like bytes, context, encoding/json,
errors, log/slog, strings, testing, time, and github.com/getsentry/sentry-go
stay present) and commit the formatted file.

In `@internal/logging/logging.go`:
- Around line 75-96: The Logger currently stores a concrete *slog.Logger which
snapshots handlers at init time; change Logger to keep only the component string
and accumulated attributes (e.g., []any) and update Component(name string) and
(l *Logger) With(args ...any) to return new Logger values that only set
component and append attributes; remove the cached slog field and, in all
log-emitting methods (the Logger methods that call slog (e.g.,
Debug/Info/Error/Log or whatever emit functions exist)), rebuild the concrete
logger at emit time by calling slog.Default().With("component", l.component,
l.attrs...) and use that to perform the output so the logger picks up
slog.SetDefault() done in Setup.
- Around line 299-308: The default branch in processAttr currently sends all
slog.KindString attributes into event.Tags, which causes high-cardinality
dynamic identifiers to become Sentry tags; change processAttr so that only
attributes coming from the explicit "tags" group are mapped into event.Tags and
all other string attributes (including user_id, organisation_id, connection_id,
account_id, property_id, google_account_id, session_id) are put into event.Extra
instead—locate the switch on a.Value.Kind() in processAttr and alter the
slog.KindString case to route non-"tags" keys to event.Extra (keeping the
existing group handling for slog.KindGroup and default Any() behavior).

---

Outside diff comments:
In `@internal/api/response.go`:
- Around line 19-25: The log call duplicates the request_id because
loggerWithRequest(r) already attaches it; update the error logging in the
response flow (surrounding GetRequestID(r), json.NewEncoder(w).Encode(data), and
loggerWithRequest(r)) to remove the explicit "request_id" field from the Error
invocation so only loggerWithRequest(r) supplies the request_id—keep the message
and "error" field intact.

In `@internal/crawler/robots.go`:
- Around line 3-14: Run gofmt and goimports on the file to fix the improperly
formatted import block and any other formatting issues; specifically reformat
the import list and remove the stray blank line and ensure imports are
grouped/ordered correctly (the import block in internal/crawler/robots.go), then
save the file so the CI formatting checks pass.

In `@internal/crawler/tls_aia.go`:
- Around line 3-17: The import block in tls_aia.go is improperly formatted
(extra blank line and spacing) causing gofmt/goimports to fail CI; run gofmt and
goimports over this file (and all other touched .go files) to normalize imports
and formatting, specifically fix the import block around the import list (the
import parentheses/import block) so there are no stray blank lines and unused
imports are organized, then re-run gofmt/goimports and commit the formatted
changes.

In `@internal/db/schedulers.go`:
- Around line 3-11: The import block at the top of schedulers.go is not
formatted correctly (extra blank line and unordered imports); run gofmt and
goimports on this file and fix the import section so imports are grouped and
ordered per goimports (remove the stray blank line and unused import
placeholders), then re-run gofmt to normalise spacing throughout the file;
target the file's import block and save to make CI pass.

In `@internal/db/webflow_connections.go`:
- Around line 3-10: The import block in webflow_connections.go is not
formatted/organized (extra blank line and unsorted imports) causing CI to fail;
run gofmt and goimports on the file (or the repo) to normalize formatting and
fix imports, ensuring the import list (context, database/sql, errors, fmt, time)
is properly formatted/organized and any unused imports removed so the file
compiles cleanly; then save the file and re-run build/CI.

In `@internal/jobs/worker.go`:
- Around line 1534-1543: The current error call builds a dynamic message
(variable message) including streak and lastErr which prevents Sentry grouping;
update the workerLog.Error invocation in the job failure path (where message,
lastErr, jobID, streak and wp.jobFailureThreshold are used) to pass a
fixed/static error message string (e.g. "job failed after consecutive task
failures") as the first argument and move all dynamic details (streak,
lastErr.Error() or lastErr as a field, job_id, threshold) into structured
key/value pairs so the static message is used for grouping while still recording
the dynamic context.

In `@scripts/aggregate_logs.py`:
- Around line 339-357: aggregate_logs() currently processes level_counts and
message_counts inline but omits component_counts, causing incomplete
components_per_minute.csv; update the loop in aggregate_logs (near the
level_counts/message_counts handling) to also iterate
data.get('component_counts', {}) or extract component_counts from each message
entry like process_json_file() does, compute minute_key = timestamp[:16], ensure
by_minute[minute_key] is initialized, then increment
by_minute[minute_key]['component_counts'][component] and a top-level
all_components[component] accordingly (use the same keys/structures used for
level_counts/message_counts to mirror process_json_file()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 39edca18-45ef-4b2a-9a2e-541e7e44b6bc

📥 Commits

Reviewing files that changed from the base of the PR and between f69f922 and 6664664.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (58)
  • .claude/skills/structured-logging/SKILL.md
  • archive_key_migrate
  • cmd/app/main.go
  • cmd/archive_key_migrate/main.go
  • cmd/test_jobs/main.go
  • docs/diagnostics/LOGGING_AUDIT.md
  • docs/diagnostics/SENTRY_AUDIT.md
  • go.mod
  • internal/api/admin.go
  • internal/api/auth.go
  • internal/api/auth_google.go
  • internal/api/auth_webflow.go
  • internal/api/domains.go
  • internal/api/errors.go
  • internal/api/google_data_api.go
  • internal/api/handlers.go
  • internal/api/jobs.go
  • internal/api/logging.go
  • internal/api/middleware.go
  • internal/api/notifications.go
  • internal/api/organisations.go
  • internal/api/response.go
  • internal/api/schedulers.go
  • internal/api/share_links.go
  • internal/api/slack.go
  • internal/api/webflow_sites.go
  • internal/archive/scheduler.go
  • internal/auth/middleware.go
  • internal/crawler/crawler.go
  • internal/crawler/robots.go
  • internal/crawler/sitemap.go
  • internal/crawler/tls_aia.go
  • internal/db/batch.go
  • internal/db/dashboard.go
  • internal/db/db.go
  • internal/db/google_connections.go
  • internal/db/notifications.go
  • internal/db/pages.go
  • internal/db/pressure.go
  • internal/db/queue.go
  • internal/db/reset_migrations.go
  • internal/db/retry.go
  • internal/db/schedulers.go
  • internal/db/slack.go
  • internal/db/users.go
  • internal/db/webflow_connections.go
  • internal/db/webflow_site_settings.go
  • internal/jobs/domain_limiter.go
  • internal/jobs/manager.go
  • internal/jobs/worker.go
  • internal/logging/logging.go
  • internal/logging/logging_test.go
  • internal/notifications/listener.go
  • internal/notifications/slack.go
  • internal/techdetect/detector.go
  • internal/util/url.go
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread cmd/test_jobs/main.go Outdated
Comment thread docs/diagnostics/LOGGING_AUDIT.md
Comment thread internal/api/admin.go
Comment on lines +51 to +54
orgIDStr := "none"
if user.OrganisationID != nil {
orgIDStr = *user.OrganisationID
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Extract duplicated organisation-ID formatting into a helper.

Both handlers repeat the same nil-safe organisation ID logic; a small helper would keep this DRY.

Refactor sketch
+func orgIDOrNone(user *db.User) string {
+	if user.OrganisationID == nil {
+		return "none"
+	}
+	return *user.OrganisationID
+}
...
-	orgIDStr := "none"
-	if user.OrganisationID != nil {
-		orgIDStr = *user.OrganisationID
-	}
+	orgIDStr := orgIDOrNone(user)
...
-	orgIDStr2 := "none"
-	if user.OrganisationID != nil {
-		orgIDStr2 = *user.OrganisationID
-	}
+	orgIDStr2 := orgIDOrNone(user)

Also applies to: 165-168

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/admin.go` around lines 51 - 54, Extract the nil-safe
organisation-ID logic into a small helper (e.g., getOrgID or formatOrgID) that
accepts the user (or a pointer to user.OrganisationID) and returns the string to
assign to orgIDStr; replace the duplicated blocks that set orgIDStr by calling
this helper wherever you currently check user.OrganisationID and set orgIDStr to
"none" or the dereferenced value, and update all occurrences (the blocks using
orgIDStr around the two duplicated spots) to use the new helper to keep the
logic DRY.

Comment thread internal/api/auth_webflow.go Outdated
Comment thread internal/api/auth.go Outdated
Comment thread internal/jobs/worker.go
Comment thread internal/jobs/worker.go
Comment thread internal/logging/logging_test.go
Comment thread internal/logging/logging.go
Comment on lines +299 to +308
default:
switch a.Value.Kind() {
case slog.KindString:
event.Tags[a.Key] = a.Value.String()
case slog.KindGroup:
for _, ga := range a.Value.Group() {
event.Extra[a.Key+"."+ga.Key] = ga.Value.Any()
}
default:
event.Extra[a.Key] = a.Value.Any()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and check its size
fd -t f "logging.go" | grep internal/logging
wc -l internal/logging/logging.go

Repository: Harvey-AU/hover

Length of output: 118


🏁 Script executed:

# Read the specific lines mentioned in the review (299-308)
sed -n '290,315p' internal/logging/logging.go

Repository: Harvey-AU/hover

Length of output: 718


🏁 Script executed:

# Search for the processAttr function to understand the full context
rg "func.*processAttr" -A 50 internal/logging/

Repository: Harvey-AU/hover

Length of output: 2707


🏁 Script executed:

# Check internal/api/auth_google.go for logging patterns
rg "slog\." internal/api/auth_google.go | head -20

Repository: Harvey-AU/hover

Length of output: 41


🏁 Script executed:

# Find auth_google.go
fd -t f "auth_google.go" internal/api/

Repository: Harvey-AU/hover

Length of output: 86


🏁 Script executed:

# Check if it exists and what logging it does
find . -name "*auth*google*" -type f

Repository: Harvey-AU/hover

Length of output: 88


🏁 Script executed:

# Search for any logging calls in internal/api/ related to auth
rg "logger\.|slog\." internal/api/ -l

Repository: Harvey-AU/hover

Length of output: 418


🏁 Script executed:

# Check for any error logs with string attributes that might be dynamic
rg "Error|With" internal/api/ | grep -E "(user_id|organisation_id|connection_id|google_account_id|request_id)" | head -10

Repository: Harvey-AU/hover

Length of output: 1286


🏁 Script executed:

# Check auth_google.go for logging patterns with dynamic fields
cat -n internal/api/auth_google.go | grep -A 2 -B 2 "logger\|slog"

Repository: Harvey-AU/hover

Length of output: 23543


🏁 Script executed:

# Look at how logger is defined in the codebase
rg "func.*Error" internal/logging/ -A 5 | head -30

Repository: Harvey-AU/hover

Length of output: 906


🏁 Script executed:

# Check the logger.Error signature and how it converts to slog attrs
rg "type.*Logger|func.*logger" internal/logging/logging.go -A 3 | head -40

Repository: Harvey-AU/hover

Length of output: 400


🏁 Script executed:

# Look for any existing PR context or comments about Tags vs Extra strategy
rg "Tags|Extra|cardinality|fingerprint" internal/logging/logging.go -B 2 -A 2

Repository: Harvey-AU/hover

Length of output: 2779


Move dynamic string attributes to Extra instead of Tags.

The default branch in processAttr() currently routes every slog.KindString attribute to event.Tags, which contradicts the PR's design objective. Error logs throughout internal/api/auth_google.go attach dynamic string identifiers—such as user_id, organisation_id, connection_id, account_id, property_id, google_account_id, and session_id—that would become Sentry tags under this logic. This reintroduces high-cardinality values into Sentry metadata, undermining the intent to reserve tags for stable taxonomy like component and maintain dynamic values in structured fields.

Restrict tags to the explicit tags group and route ordinary string attributes to event.Extra instead.

Suggested fix
 	default:
 		switch a.Value.Kind() {
 		case slog.KindString:
-			event.Tags[a.Key] = a.Value.String()
+			event.Extra[a.Key] = a.Value.String()
 		case slog.KindGroup:
 			for _, ga := range a.Value.Group() {
 				event.Extra[a.Key+"."+ga.Key] = ga.Value.Any()
 			}
 		default:
 			event.Extra[a.Key] = a.Value.Any()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
default:
switch a.Value.Kind() {
case slog.KindString:
event.Tags[a.Key] = a.Value.String()
case slog.KindGroup:
for _, ga := range a.Value.Group() {
event.Extra[a.Key+"."+ga.Key] = ga.Value.Any()
}
default:
event.Extra[a.Key] = a.Value.Any()
default:
switch a.Value.Kind() {
case slog.KindString:
event.Extra[a.Key] = a.Value.String()
case slog.KindGroup:
for _, ga := range a.Value.Group() {
event.Extra[a.Key+"."+ga.Key] = ga.Value.Any()
}
default:
event.Extra[a.Key] = a.Value.Any()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/logging/logging.go` around lines 299 - 308, The default branch in
processAttr currently sends all slog.KindString attributes into event.Tags,
which causes high-cardinality dynamic identifiers to become Sentry tags; change
processAttr so that only attributes coming from the explicit "tags" group are
mapped into event.Tags and all other string attributes (including user_id,
organisation_id, connection_id, account_id, property_id, google_account_id,
session_id) are put into event.Extra instead—locate the switch on a.Value.Kind()
in processAttr and alter the slog.KindString case to route non-"tags" keys to
event.Extra (keeping the existing group handling for slog.KindGroup and default
Any() behavior).

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 18, 2026

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cmd/app/main.go (1)

407-425: ⚠️ Potential issue | 🟠 Major

Initialise Sentry before logging.Setup.

internal/logging.Setup is documented to run after sentry.Init, but this path wires the slog fanout first. That can leave the Sentry handler initialised before the client exists, so error logs may not be captured reliably. Move setupLogging(config) until after the Sentry branch has either initialised Sentry or intentionally skipped it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/main.go` around lines 407 - 425, The call to setupLogging(config)
runs before sentry.Init causing the slog fanout to wire Sentry handler before
the client exists; move the setupLogging(config) call so it executes after the
Sentry branch has either initialised the client or skipped it: keep the
sentry.Init(...) block (using logging.BeforeSend) and any error handling as-is,
and only then call setupLogging(config) (or call it inside both branches after
successful init/skip) so the logging stack is configured after sentry.Init has
completed.
internal/jobs/worker.go (2)

1533-1543: ⚠️ Potential issue | 🟠 Major

Keep this error log’s message static.

This message includes streak and optionally lastErr.Error(), so the Sentry fingerprint will still fragment on error text. Keep the log message static and move the variable parts into fields.

♻️ Proposed change
 	message := fmt.Sprintf("Job failed after %d consecutive task failures", streak)
 	if lastErr != nil {
 		message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error())
 	}
 
-	workerLog.Error(message,
-		"job_id", jobID,
-		"failure_streak", streak,
-		"threshold", wp.jobFailureThreshold,
-	)
+	logArgs := []any{
+		"job_id", jobID,
+		"failure_streak", streak,
+		"threshold", wp.jobFailureThreshold,
+	}
+	if lastErr != nil {
+		logArgs = append(logArgs, "last_error", lastErr)
+	}
+	workerLog.Error("Job failed after consecutive task failures", logArgs...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 1533 - 1543, The log message in
markJobFailedDueToConsecutiveFailures must be made static so Sentry fingerprints
don't fragment: stop interpolating streak/lastErr into the message string and
instead pass them as structured fields to workerLog.Error; keep the message text
like "Job failed due to consecutive task failures" and add fields
"failure_streak": streak, "last_error": lastErr (or lastErr.Error() only if
non-nil), "job_id": jobID, and "threshold": wp.jobFailureThreshold when calling
workerLog.Error.

2057-2069: ⚠️ Potential issue | 🟠 Major

Don’t capture the same panic twice.

This block calls hub.Recover/sentry.CaptureException and then emits workerLog.Error, which is already auto-captured by the logging fanout. One panic will generate two Sentry events. Keep one capture path only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 2057 - 2069, The defer in
processNextTask currently calls hub.Recover and also sentry.CaptureException
then logs via workerLog.Error, causing duplicate Sentry events; remove the
explicit sentry.CaptureException branch and keep only hub.Recover when
sentry.CurrentHub() != nil (or rely solely on the logging fanout), leaving the
workerLog.Error logging intact. Update the defer so it calls sentry.CurrentHub()
and invokes hub.Recover(r) if hub != nil and do not call sentry.CaptureException
anywhere in that block (symbols: processNextTask, sentry.CurrentHub,
hub.Recover, sentry.CaptureException, workerLog.Error).
♻️ Duplicate comments (3)
internal/db/webflow_connections.go (1)

201-201: ⚠️ Potential issue | 🟡 Minor

Add organisation_id to delete-failure log attributes.

At Line 201, this error log still only includes connection_id; organisation_id is available in scope and should be logged for tenant traceability.

Suggested change
-		dbLog.Error("Failed to delete webflow connection", "error", err, "connection_id", connectionID)
+		dbLog.Error("Failed to delete webflow connection", "error", err, "connection_id", connectionID, "organisation_id", organisationID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/webflow_connections.go` at line 201, The delete-failure log call
using dbLog.Error currently only includes "connection_id"; update the
dbLog.Error invocation in the webflow connection delete routine (the function
that performs the deletion and references connectionID and organisationID) to
also include "organisation_id", passing the organisationID variable so the log
line contains both "connection_id" and "organisation_id" for tenant
traceability; ensure you modify the dbLog.Error call (the one that logs "Failed
to delete webflow connection") to add the "organisation_id", organisationID
key/value pair.
internal/jobs/worker.go (2)

4562-4564: ⚠️ Potential issue | 🟠 Major

Don’t send raw diagnostic/header payloads at error level.

These error logs include full request diagnostics and header objects. With error-level auto-capture to Sentry, that risks leaking sensitive data and oversizing events. Log a small summary instead.

Also applies to: 4923-4925, 4937-4939, 4951-4953

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 4562 - 4564, The error logs currently
pass full request diagnostics and header payloads to workerLog.Error (see the
workerLog.Error call that includes "request_diagnostics",
result.RequestDiagnostics and similar calls around the other occurrences), which
can leak sensitive data; change each of those error logs to avoid including the
raw objects and instead log a small, non-sensitive summary (e.g., task.ID,
length of the diagnostics payload, number of headers, and a redacted flag for
auth/PII) and the marshal error; update the workerLog.Error invocations at the
shown sites to replace result.RequestDiagnostics / header objects with that
concise summary string or struct.

3793-3797: ⚠️ Potential issue | 🟡 Minor

Log actual milliseconds under duration_ms.

cleanupDuration is a time.Duration, not a millisecond count, so this field is unit-mismatched.

🛠️ Proposed fix
 	workerLog.Info("Orphaned task cleanup completed for job",
 		"job_id", targetJobID,
 		"tasks_cleaned", totalCleaned,
-		"duration_ms", cleanupDuration,
+		"duration_ms", cleanupDuration.Milliseconds(),
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 3793 - 3797, The log currently passes
cleanupDuration (a time.Duration) to workerLog.Info under the "duration_ms" key;
change that to pass the actual millisecond value by calling
cleanupDuration.Milliseconds() (or int64(cleanupDuration / time.Millisecond)) so
"duration_ms" is a numeric millisecond count—update the workerLog.Info call that
logs "job_id", targetJobID, "tasks_cleaned", totalCleaned, "duration_ms",
cleanupDuration to use cleanupDuration.Milliseconds() instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/test_jobs/main.go`:
- Line 73: The log currently builds a dynamic message string in startupLog.Info
using strconv.Itoa(jobWorkers); change this to a static message and pass
jobWorkers as a structured field instead (e.g., use startupLog.Info or
startupLog.With that sets "worker_count" or "jobWorkers" to jobWorkers) so the
message text is constant while the count is emitted as a field; update the call
around startupLog.Info and references to jobWorkers in main (or the function
creating the worker pool) to use the structured field API rather than string
concatenation.

In `@internal/db/retry.go`:
- Around line 51-53: The log call in internal/db/retry.go uses the key
"attempts" which is inconsistent with other retry logs; update the
dbLog.Info/dbLog.Warn/dbLog.Error calls (the ones that include "attempts" at the
success path around the DB connection retry and the other occurrences around
lines 142-145) to use the singular key "attempt" instead so all retry events use
the same field name; keep the value (attempt variable) and the "elapsed" field
unchanged to preserve semantics.

In `@internal/db/slack.go`:
- Line 172: The dbLog.Error calls that only log the error on row-scan failures
should include identifying fields from the scanned entity so failures can be
correlated; at each dbLog.Error invocation in internal/db/slack.go (the failing
scan points where you call dbLog.Error("Failed to scan slack connection row",
"error", err)) attach available identifiers from the row/struct being scanned
(for example connection ID, slack team ID, workspace ID or user ID) as
additional key/value pairs so the log reads with both the error and entity
context; update all three occurrences (around the current lines shown) to
include those identifier fields when present.
- Line 242: Replace the non-canonical log key "slack_connection_id" with the
standardized "connection_id" in the Slack DB error log call so logging is
consistent; locate the dbLog.Error invocation in internal/db/slack.go (the call
that logs "Failed to create slack user link" and currently includes "user_id",
link.UserID, "slack_connection_id", link.SlackConnectionID) and change that key
to "connection_id" while keeping the same value (link.SlackConnectionID).

In `@internal/db/webflow_connections.go`:
- Line 162: The log call dbLog.Error("Failed to scan webflow connection row",
"error", err) omits tenant context; update that call in webflow_connections.go
(the block that scans rows / handles webflow connection rows) to include the
organisation identifier field (e.g., add "organisation_id", organisationID or
orgID) as an additional key/value in the dbLog.Error invocation so
tenant/organisation context is logged alongside the error.

In `@internal/logging/logging_test.go`:
- Around line 223-239: Update the test that builds a slog.Record and calls
componentConverter so it no longer asserts a high-cardinality identifier
(job_id) is in event.Tags; instead assert that event.Tags["component"] ==
"worker" but check that event.Extra["job_id"] == "abc-123" (and remove or change
the assertion that reads event.Tags for job_id). Locate the test using the
record variable and the call to componentConverter, and change the expectation
from event.Tags to event.Extra for the "job_id" key while leaving
low-cardinality tag checks (like "component") unchanged.

---

Outside diff comments:
In `@cmd/app/main.go`:
- Around line 407-425: The call to setupLogging(config) runs before sentry.Init
causing the slog fanout to wire Sentry handler before the client exists; move
the setupLogging(config) call so it executes after the Sentry branch has either
initialised the client or skipped it: keep the sentry.Init(...) block (using
logging.BeforeSend) and any error handling as-is, and only then call
setupLogging(config) (or call it inside both branches after successful
init/skip) so the logging stack is configured after sentry.Init has completed.

In `@internal/jobs/worker.go`:
- Around line 1533-1543: The log message in
markJobFailedDueToConsecutiveFailures must be made static so Sentry fingerprints
don't fragment: stop interpolating streak/lastErr into the message string and
instead pass them as structured fields to workerLog.Error; keep the message text
like "Job failed due to consecutive task failures" and add fields
"failure_streak": streak, "last_error": lastErr (or lastErr.Error() only if
non-nil), "job_id": jobID, and "threshold": wp.jobFailureThreshold when calling
workerLog.Error.
- Around line 2057-2069: The defer in processNextTask currently calls
hub.Recover and also sentry.CaptureException then logs via workerLog.Error,
causing duplicate Sentry events; remove the explicit sentry.CaptureException
branch and keep only hub.Recover when sentry.CurrentHub() != nil (or rely solely
on the logging fanout), leaving the workerLog.Error logging intact. Update the
defer so it calls sentry.CurrentHub() and invokes hub.Recover(r) if hub != nil
and do not call sentry.CaptureException anywhere in that block (symbols:
processNextTask, sentry.CurrentHub, hub.Recover, sentry.CaptureException,
workerLog.Error).

---

Duplicate comments:
In `@internal/db/webflow_connections.go`:
- Line 201: The delete-failure log call using dbLog.Error currently only
includes "connection_id"; update the dbLog.Error invocation in the webflow
connection delete routine (the function that performs the deletion and
references connectionID and organisationID) to also include "organisation_id",
passing the organisationID variable so the log line contains both
"connection_id" and "organisation_id" for tenant traceability; ensure you modify
the dbLog.Error call (the one that logs "Failed to delete webflow connection")
to add the "organisation_id", organisationID key/value pair.

In `@internal/jobs/worker.go`:
- Around line 4562-4564: The error logs currently pass full request diagnostics
and header payloads to workerLog.Error (see the workerLog.Error call that
includes "request_diagnostics", result.RequestDiagnostics and similar calls
around the other occurrences), which can leak sensitive data; change each of
those error logs to avoid including the raw objects and instead log a small,
non-sensitive summary (e.g., task.ID, length of the diagnostics payload, number
of headers, and a redacted flag for auth/PII) and the marshal error; update the
workerLog.Error invocations at the shown sites to replace
result.RequestDiagnostics / header objects with that concise summary string or
struct.
- Around line 3793-3797: The log currently passes cleanupDuration (a
time.Duration) to workerLog.Info under the "duration_ms" key; change that to
pass the actual millisecond value by calling cleanupDuration.Milliseconds() (or
int64(cleanupDuration / time.Millisecond)) so "duration_ms" is a numeric
millisecond count—update the workerLog.Info call that logs "job_id",
targetJobID, "tasks_cleaned", totalCleaned, "duration_ms", cleanupDuration to
use cleanupDuration.Milliseconds() instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9054fdfd-8f32-45f2-a2e7-35102206fed3

📥 Commits

Reviewing files that changed from the base of the PR and between 6664664 and 5cdd1b2.

📒 Files selected for processing (11)
  • cmd/app/main.go
  • cmd/test_jobs/main.go
  • internal/crawler/robots.go
  • internal/crawler/tls_aia.go
  • internal/db/retry.go
  • internal/db/schedulers.go
  • internal/db/slack.go
  • internal/db/webflow_connections.go
  • internal/jobs/domain_limiter.go
  • internal/jobs/worker.go
  • internal/logging/logging_test.go

Comment thread cmd/test_jobs/main.go Outdated
Comment thread internal/db/retry.go
Comment thread internal/db/slack.go Outdated
Comment thread internal/db/slack.go Outdated
Comment thread internal/db/webflow_connections.go Outdated
Comment thread internal/logging/logging_test.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.husky/pre-commit:
- Around line 3-4: Modify the pre-commit hook so golangci-lint only runs when
staged files include Go files: detect staged filenames (use git diff --name-only
--cached) and check for entries matching *.go; if any exist, invoke the existing
golangci-lint command (golangci-lint run --config .golangci.yml --fast-only) as
before, otherwise skip it and exit successfully; update the .husky/pre-commit
logic around the golangci-lint invocation to gate execution on that staged-Go
check.
- Line 3: Replace the Bash-specific redirection in the pre-commit hook
conditional: update the line containing "if command -v golangci-lint &>
/dev/null; then" to use the POSIX-compatible redirection ">/dev/null 2>&1" so
the check for golangci-lint works in dash/other POSIX shells.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48d7dcdb-d92c-41f7-b365-bdb830d911d5

📥 Commits

Reviewing files that changed from the base of the PR and between 5cdd1b2 and f21267a.

📒 Files selected for processing (1)
  • .husky/pre-commit

Comment thread .husky/pre-commit Outdated
Comment on lines +3 to +4
if command -v golangci-lint &> /dev/null; then
golangci-lint run --config .golangci.yml --fast-only
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Gate golangci-lint to staged Go changes for faster commits.

Line [4] currently runs golangci-lint on every commit when installed. Consider only running it when staged files include *.go, then let CI handle full coverage.

Proposed refactor
-if command -v golangci-lint >/dev/null 2>&1; then
-  golangci-lint run --config .golangci.yml --fast-only
+if git diff --cached --name-only -- '*.go' | grep -q .; then
+  if command -v golangci-lint >/dev/null 2>&1; then
+    golangci-lint run --config .golangci.yml --fast-only
+  else
+    echo "Warning: golangci-lint not installed, skipping local lint (will still run in CI)"
+  fi
 else
-  echo "Warning: golangci-lint not installed, skipping local lint (will still run in CI)"
+  echo "No staged Go files; skipping golangci-lint"
 fi

Based on learnings: Applies to **/*.go : Prefer go test and targeted checks before broader validation on Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.husky/pre-commit around lines 3 - 4, Modify the pre-commit hook so
golangci-lint only runs when staged files include Go files: detect staged
filenames (use git diff --name-only --cached) and check for entries matching
*.go; if any exist, invoke the existing golangci-lint command (golangci-lint run
--config .golangci.yml --fast-only) as before, otherwise skip it and exit
successfully; update the .husky/pre-commit logic around the golangci-lint
invocation to gate execution on that staged-Go check.

Comment thread .husky/pre-commit Outdated
@@ -1 +1,7 @@
npx lint-staged

if command -v golangci-lint &> /dev/null; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify POSIX shell compatibility for the hook
sh -n .husky/pre-commit

Repository: Harvey-AU/hover

Length of output: 41


🏁 Script executed:

cat -n .husky/pre-commit

Repository: Harvey-AU/hover

Length of output: 324


Use POSIX-compatible redirection for broader shell compatibility.

Line 3 contains &> /dev/null, which is Bash-specific syntax. While it passes syntax validation on this system, it is not POSIX-standard and may fail on stricter POSIX shells (e.g., dash). Use the portable equivalent >/dev/null 2>&1 to ensure the hook works reliably across different shell environments.

Suggested improvement
-if command -v golangci-lint &> /dev/null; then
+if command -v golangci-lint >/dev/null 2>&1; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if command -v golangci-lint &> /dev/null; then
if command -v golangci-lint >/dev/null 2>&1; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.husky/pre-commit at line 3, Replace the Bash-specific redirection in the
pre-commit hook conditional: update the line containing "if command -v
golangci-lint &> /dev/null; then" to use the POSIX-compatible redirection
">/dev/null 2>&1" so the check for golangci-lint works in dash/other POSIX
shells.

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
scripts/lint-go.js (1)

16-20: 🧹 Nitpick | 🔵 Trivial

Gate golangci-lint to staged Go changes before running the full lint command.

Line [16] runs linting whenever golangci-lint is installed, even when no Go files are staged. That slows non-Go commits and widens local checks unnecessarily.

Proposed refactor
 const check = spawnSync("golangci-lint", ["--version"], {
   stdio: "ignore",
   shell: false,
 });
 if (check.error) {
   console.log(
     "Warning: golangci-lint not installed, skipping local lint (will still run in CI)"
   );
   process.exit(0);
 }
 
+const stagedGo = spawnSync("git", ["diff", "--cached", "--name-only", "--", "*.go"], {
+  encoding: "utf8",
+  shell: false,
+});
+if (stagedGo.status !== 0) {
+  console.error("Failed to inspect staged files; aborting Go lint step.");
+  process.exit(stagedGo.status ?? 1);
+}
+if (!stagedGo.stdout.trim()) {
+  console.log("No staged Go files; skipping golangci-lint");
+  process.exit(0);
+}
+
 const result = spawnSync(
   "golangci-lint",
   ["run", "--config", ".golangci.yml", "--fast-only"],
   { stdio: "inherit", shell: false }
 );
 process.exit(result.status ?? 1);

Based on learnings: Applies to **/*.go : Prefer go test and targeted checks before broader validation on Go files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lint-go.js` around lines 16 - 20, The script currently always invokes
spawnSync("golangci-lint", ...) which runs the full linter even when no Go files
are staged; update scripts/lint-go.js to first collect staged file names (e.g.,
via git diff --name-only --staged), filter for files ending in .go, and only
call spawnSync("golangci-lint", ["run", ...]) when that filtered list is
non-empty; if there are staged Go files prefer running targeted checks (e.g.,
run golangci-lint only on those paths or run go test for packages touched) and
otherwise skip the full golangci-lint invocation to avoid slowing non-Go
commits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scripts/lint-go.js`:
- Around line 16-20: The script currently always invokes
spawnSync("golangci-lint", ...) which runs the full linter even when no Go files
are staged; update scripts/lint-go.js to first collect staged file names (e.g.,
via git diff --name-only --staged), filter for files ending in .go, and only
call spawnSync("golangci-lint", ["run", ...]) when that filtered list is
non-empty; if there are staged Go files prefer running targeted checks (e.g.,
run golangci-lint only on those paths or run go test for packages touched) and
otherwise skip the full golangci-lint invocation to avoid slowing non-Go
commits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 13d6144a-c125-4631-8da9-468455778309

📥 Commits

Reviewing files that changed from the base of the PR and between f21267a and 8fc8e65.

📒 Files selected for processing (2)
  • .husky/pre-commit
  • scripts/lint-go.js

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/jobs/worker.go (2)

1533-1543: ⚠️ Potential issue | 🟠 Major

Keep the error message static for fingerprinting.

message includes streak and lastErr, so each failure shape becomes a different log message and therefore a different Sentry fingerprint. That undoes the new [component, staticMessage] grouping.

Suggested fix
-	message := fmt.Sprintf("Job failed after %d consecutive task failures", streak)
-	if lastErr != nil {
-		message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error())
-	}
+	message := fmt.Sprintf("Job failed after %d consecutive task failures", streak)
+	logArgs := []any{
+		"job_id", jobID,
+		"failure_streak", streak,
+		"threshold", wp.jobFailureThreshold,
+	}
+	if lastErr != nil {
+		logArgs = append(logArgs, "last_error", lastErr)
+	}
 
-	workerLog.Error(message,
-		"job_id", jobID,
-		"failure_streak", streak,
-		"threshold", wp.jobFailureThreshold,
-	)
+	workerLog.Error("Job failed after consecutive task failures", logArgs...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 1533 - 1543, The log message in
markJobFailedDueToConsecutiveFailures should be a static string to preserve
Sentry grouping; change the interpolated message to a constant like "Job failed
due to consecutive task failures" and stop including streak or lastErr in the
formatted message, and instead pass the dynamic data as structured fields to
workerLog.Error (e.g. keep "job_id", "failure_streak", "threshold" and add a
field like "last_error" or "error" with lastErr) so the message text stays
constant while details remain in metadata.

2057-2069: ⚠️ Potential issue | 🟠 Major

Avoid double-capturing panics to Sentry.

This defer already reports the panic via hub.Recover / sentry.CaptureException. workerLog.Error auto-captures error-level logs as well, so one panic becomes two Sentry events.

Suggested fix
-			workerLog.Error("Recovered from panic in processNextTask",
+			workerLog.ErrorContext(logging.NoCapture(ctx), "Recovered from panic in processNextTask",
 				"panic", fmt.Sprintf("%v", r),
 				"stack", string(stack),
 			)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 2057 - 2069, The defer in
processNextTask double-reports panics to Sentry because it calls hub.Recover /
sentry.CaptureException and then logs with workerLog.Error (which auto-captures
error-level logs); remove the explicit Sentry calls in that defer and keep only
the formatted recoveredErr, stack and workerLog.Error call so the panic is
reported once, or alternatively disable auto-capture on workerLog.Error and keep
only hub.Recover/sentry.CaptureException—update the defer that references
hub.Recover, sentry.CaptureException, workerLog.Error, recoveredErr and
debug.Stack accordingly so a single Sentry event is emitted.
♻️ Duplicate comments (2)
internal/jobs/worker.go (2)

4562-4564: ⚠️ Potential issue | 🟠 Major

Don't attach raw diagnostics and header payloads to error logs.

These Error calls still emit full request_diagnostics, headers, second_headers, and cache_attempts objects. With central error-level capture, that risks leaking sensitive values such as authorisation or cookie data and can blow up Sentry payload size. Keep only safe summaries here, or move the full payloads to Debug.

Suggested fix pattern
-		workerLog.Error("Failed to marshal request diagnostics",
-			"error", err, "task_id", task.ID, "request_diagnostics", result.RequestDiagnostics)
+		workerLog.Error("Failed to marshal request diagnostics",
+			"error", err, "task_id", task.ID)

-			workerLog.Error("Failed to marshal headers",
-				"error", err, "task_id", task.ID, "headers", result.Headers)
+			workerLog.Error("Failed to marshal headers",
+				"error", err, "task_id", task.ID, "header_keys", len(result.Headers))

-			workerLog.Error("Failed to marshal second headers",
-				"error", err, "task_id", task.ID, "second_headers", result.SecondHeaders)
+			workerLog.Error("Failed to marshal second headers",
+				"error", err, "task_id", task.ID, "second_header_keys", len(result.SecondHeaders))

-			workerLog.Error("Failed to marshal cache check attempts",
-				"error", err, "task_id", task.ID, "cache_attempts", result.CacheCheckAttempts)
+			workerLog.Error("Failed to marshal cache check attempts",
+				"error", err, "task_id", task.ID, "attempt_count", len(result.CacheCheckAttempts))

Also applies to: 4923-4939, 4951-4953

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 4562 - 4564, The Error-level logging
calls in worker.go (e.g., the workerLog.Error invocation that currently logs
"Failed to marshal request diagnostics") must not include raw payloads like
result.RequestDiagnostics, headers, second_headers, or cache_attempts; instead
replace those fields with safe summaries (e.g., booleans, counts, and
redacted/hashed indicators) and move the full objects to a Debug-level log if
needed. Update the Error calls (search for workerLog.Error lines around the
referenced sections and the uses of request_diagnostics, headers,
second_headers, cache_attempts and task.ID) to emit only task.ID, a short
summary (e.g., len(...) or "present"/"redacted"), and the error; add a
workerLog.Debug that includes the full payloads for troubleshooting. Ensure no
Error-level log contains raw header or cookie content.

3793-3797: ⚠️ Potential issue | 🟡 Minor

duration_ms is still logging the wrong unit.

The field name says milliseconds, but Line 3796 still emits the raw time.Duration value instead of an integer millisecond count.

Suggested fix
 	workerLog.Info("Orphaned task cleanup completed for job",
 		"job_id", targetJobID,
 		"tasks_cleaned", totalCleaned,
-		"duration_ms", cleanupDuration,
+		"duration_ms", cleanupDuration.Milliseconds(),
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` around lines 3793 - 3797, The log call to
workerLog.Info is emitting cleanupDuration (symbol cleanupDuration) which is a
time.Duration but the field name "duration_ms" expects an integer millisecond
count; update the call that logs the orphaned task cleanup (workerLog.Info with
keys "job_id", "tasks_cleaned", "duration_ms") to pass
cleanupDuration.Milliseconds() (or int64(cleanupDuration / time.Millisecond))
instead of the raw duration so the logged value is milliseconds as an integer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/jobs/worker.go`:
- Line 142: The package-level variable workerLog created with
logging.Component("worker") captures the slog.Default() too early; change it to
be resolved lazily so it uses the handler installed by logging.Setup() — for
example replace the package-scope workerLog with a function or getter (e.g.,
getWorkerLog or workerLogger()) that calls logging.Component("worker") on first
use, or initialize workerLog from that getter after logging.Setup() runs; update
all references to use the new getter so logging.Component("worker") is not
invoked at package init time.

---

Outside diff comments:
In `@internal/jobs/worker.go`:
- Around line 1533-1543: The log message in
markJobFailedDueToConsecutiveFailures should be a static string to preserve
Sentry grouping; change the interpolated message to a constant like "Job failed
due to consecutive task failures" and stop including streak or lastErr in the
formatted message, and instead pass the dynamic data as structured fields to
workerLog.Error (e.g. keep "job_id", "failure_streak", "threshold" and add a
field like "last_error" or "error" with lastErr) so the message text stays
constant while details remain in metadata.
- Around line 2057-2069: The defer in processNextTask double-reports panics to
Sentry because it calls hub.Recover / sentry.CaptureException and then logs with
workerLog.Error (which auto-captures error-level logs); remove the explicit
Sentry calls in that defer and keep only the formatted recoveredErr, stack and
workerLog.Error call so the panic is reported once, or alternatively disable
auto-capture on workerLog.Error and keep only
hub.Recover/sentry.CaptureException—update the defer that references
hub.Recover, sentry.CaptureException, workerLog.Error, recoveredErr and
debug.Stack accordingly so a single Sentry event is emitted.

---

Duplicate comments:
In `@internal/jobs/worker.go`:
- Around line 4562-4564: The Error-level logging calls in worker.go (e.g., the
workerLog.Error invocation that currently logs "Failed to marshal request
diagnostics") must not include raw payloads like result.RequestDiagnostics,
headers, second_headers, or cache_attempts; instead replace those fields with
safe summaries (e.g., booleans, counts, and redacted/hashed indicators) and move
the full objects to a Debug-level log if needed. Update the Error calls (search
for workerLog.Error lines around the referenced sections and the uses of
request_diagnostics, headers, second_headers, cache_attempts and task.ID) to
emit only task.ID, a short summary (e.g., len(...) or "present"/"redacted"), and
the error; add a workerLog.Debug that includes the full payloads for
troubleshooting. Ensure no Error-level log contains raw header or cookie
content.
- Around line 3793-3797: The log call to workerLog.Info is emitting
cleanupDuration (symbol cleanupDuration) which is a time.Duration but the field
name "duration_ms" expects an integer millisecond count; update the call that
logs the orphaned task cleanup (workerLog.Info with keys "job_id",
"tasks_cleaned", "duration_ms") to pass cleanupDuration.Milliseconds() (or
int64(cleanupDuration / time.Millisecond)) instead of the raw duration so the
logged value is milliseconds as an integer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cca6ecd6-035d-45f4-a307-4038ece26267

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc8e65 and b2217fe.

📒 Files selected for processing (1)
  • internal/jobs/worker.go

Comment thread internal/jobs/worker.go
// pick a different task rather than blocking on the delay.
var ErrDomainDelay = errors.New("domain rate limit delay")

var workerLog = logging.Component("worker")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Initialising workerLog here binds it to the wrong handler.

logging.Component() snapshots slog.Default() when this package is initialised. logging.Setup() swaps the default later, so this file can keep logging through the pre-setup handler and miss the new JSON/Sentry fan-out entirely. Please resolve the logger lazily, or only construct the component logger after setup has run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/worker.go` at line 142, The package-level variable workerLog
created with logging.Component("worker") captures the slog.Default() too early;
change it to be resolved lazily so it uses the handler installed by
logging.Setup() — for example replace the package-scope workerLog with a
function or getter (e.g., getWorkerLog or workerLogger()) that calls
logging.Component("worker") on first use, or initialize workerLog from that
getter after logging.Setup() runs; update all references to use the new getter
so logging.Component("worker") is not invoked at package init time.

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 193-205: The CSV writing in write_components_csv currently
concatenates strings and will break if component names contain commas or quotes;
change it to use Python's csv.writer: open the file with open(csv_path, "w",
newline="") and create a csv.writer(f), then write the header with
writer.writerow(["timestamp"] + all_components) and write each row using
writer.writerow([minute] + counts) (ensure counts are strings or ints) so the
csv module handles quoting/escaping for component names and values.

In `@scripts/process_logs.py`:
- Around line 22-24: The for-loop that checks timestamp keys currently uses the
verbose pattern `if key in record and record[key]:` — make it consistent with
`_normalise_full_timestamp` by using the cleaner `if record.get(key):` pattern;
update the loop that assigns `raw = str(record[key])` to retrieve the value via
`record.get(key)` (and keep assigning to `raw`) so the key-checking style
matches the `_normalise_full_timestamp` implementation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7848c7f7-095c-438d-a571-5264c69818ac

📥 Commits

Reviewing files that changed from the base of the PR and between b6c1c0c and 9c89179.

📒 Files selected for processing (2)
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/process_logs.py
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
scripts/aggregate_logs.py (1)

172-190: ⚠️ Potential issue | 🟠 Major

Don’t use the top-50 CSV as incremental state.

load_existing_data() restores event_counts from events_per_minute.csv, but this writer only persists the current top 50 events. After the next incremental run, every historical event outside that set is gone, so summary.md and future top-event selection undercount older rare events. Please keep the full per-minute event map in durable state and treat this CSV as presentation only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` around lines 172 - 190, The write_events_csv
function currently writes only the top_n events to events_per_minute.csv which
is then read by load_existing_data, causing loss of older events; instead
persist the full per-minute event_counts to durable storage and reserve
write_events_csv for presentation: modify write_events_csv to write only a
human-facing CSV and add a new durable writer (or change the existing file
target) that serializes the complete by_minute[minute]["event_counts"] (e.g.,
JSON or full CSV with event keys) so load_existing_data reads the full
per-minute map rather than the truncated top-N; update references to
events_per_minute.csv, load_existing_data, write_events_csv, and summary.md to
use the full durable store for state and keep the top-N CSV as a read-only
report.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 132-136: The current merge only adds file-level totals when
file_minutes is non-empty, causing summaries with parsed==0 to be dropped;
always merge total_lines and failed_to_parse into the aggregate even if
file_minutes is empty by adding them to a dedicated bucket or global counters.
Update the block that references file_minutes, first_minute, by_minute,
total_lines and failed_to_parse so that if file_minutes is empty you still
record the file-level counts (e.g., create a sentinel minute key or increment
top-level aggregate counters) rather than skipping the merge. Ensure the change
preserves existing behavior when file_minutes is non-empty (still add to
by_minute[first_minute]).
- Around line 308-311: The summary currently reports attempted files because
write_summary(summary_path, by_minute, len(new_files)) uses len(new_files) even
though failed files are excluded from by_minute; change the call to pass the
actual successful list/count from process_json_file (use the success variable) —
e.g., call write_summary(summary_path, by_minute, len(success)) or pass success
directly if write_summary expects the list — so summary.md reflects processed
(successful) inputs; update any write_summary parameter expectation if needed.

In `@scripts/process_logs.py`:
- Around line 119-123: The current branch skips creating the flat CSV when
flat_rows is empty; always open flat_csv_path and write the CSV header so an
empty-run doesn't leave stale files—inside the block checking flat_csv_path is
not None, open flat_csv_path with the same newline/encoding settings, create
csv.DictWriter(...) and call writer.writeheader() unconditionally, then only
call writer.writerows(flat_rows) when flat_rows is truthy; update the code
around flat_csv_path, flat_rows and the writer/DictWriter usage in
scripts/process_logs.py accordingly.

---

Duplicate comments:
In `@scripts/aggregate_logs.py`:
- Around line 172-190: The write_events_csv function currently writes only the
top_n events to events_per_minute.csv which is then read by load_existing_data,
causing loss of older events; instead persist the full per-minute event_counts
to durable storage and reserve write_events_csv for presentation: modify
write_events_csv to write only a human-facing CSV and add a new durable writer
(or change the existing file target) that serializes the complete
by_minute[minute]["event_counts"] (e.g., JSON or full CSV with event keys) so
load_existing_data reads the full per-minute map rather than the truncated
top-N; update references to events_per_minute.csv, load_existing_data,
write_events_csv, and summary.md to use the full durable store for state and
keep the top-N CSV as a read-only report.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b8b6514-3dc5-4633-bab0-212549cad631

📥 Commits

Reviewing files that changed from the base of the PR and between 9c89179 and d2ce703.

📒 Files selected for processing (2)
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/process_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 132-136: The current code uses the sentinel anchor =
"_unparseable" and writes totals into by_minute[anchor], which later gets
treated as a timestamp key; instead, avoid injecting that sentinel into the
time-series map by moving those file-level totals out of by_minute: if
file_minutes is empty, accumulate total_lines and failed_to_parse into a
separate scalar or dict (e.g., unparseable_totals) or always add totals only to
the first minute when file_minutes exists; alternatively ensure any writer that
iterates by_minute filters out the "_unparseable" key before
sorting/formatting—update the logic around anchor, by_minute, total_lines, and
failed_to_parse (and any writer functions that consume by_minute) to implement
one of these fixes so the sentinel never becomes a timestamp row in outputs.
- Around line 58-109: The current load_existing_data function rebuilds by_minute
from lossy CSV reports (time_series.csv, events_per_minute.csv,
components_per_minute.csv) which drops low-volume events and doesn't persist
totals (total_lines, failed_to_parse); instead persist and load a lossless JSON
state file and stop reconstructing full state from the CSVs. Change
load_existing_data to first look for a single durable state file (e.g.,
aggregate_state.json) and load the complete by_minute structure and aggregate
fields (total_lines, failed_to_parse) from it; only fall back to CSV parsing as
a best-effort / partial fallback (or remove CSV-based reconstruction entirely),
and ensure the code paths that write state (where incremental runs update state)
write the full JSON snapshot so functions referencing by_minute, total_lines,
and failed_to_parse can trust restored data.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7f4bb65-ba9a-4a7c-b2c0-aebd9d4ddcbe

📥 Commits

Reviewing files that changed from the base of the PR and between d2ce703 and e809670.

📒 Files selected for processing (2)
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/aggregate_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 283-284: The current logic only calls save_data(log_path,
by_minute, totals, processed_set) when incremental is true, so a full rebuild
(when incremental is False) leaves the on-disk state (.aggregate_data.json)
stale; either call save_data unconditionally after rebuilding (remove the
incremental check so save_data(...) runs for both full and incremental paths),
or explicitly remove the stale state file when running with --full (delete the
.aggregate_data.json path referenced by log_path or the module-level state
filename) to ensure future incremental runs don't reload outdated
processed_set/by_minute/totals.
- Around line 104-125: The code mutates totals and by_minute while iterating
data (variables totals, by_minute, and data) so a partial failure (during
event_counts or component_counts) leaves partial updates; fix by building local
temporary accumulators (e.g., local_totals and local_by_minute) and apply the
changes to totals and by_minute only after all parsing of data["level_counts"],
data["event_counts"], and data["component_counts"] completes successfully;
ensure you copy the same update logic (incrementing samples, level_counts,
event_counts, component_counts) into the temp structures and only commit the
temp accumulators to totals and by_minute at the end of the function (or return
them) to make each file’s merge atomic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b98254f3-bdf0-4d4c-b7aa-22d2e8181c16

📥 Commits

Reviewing files that changed from the base of the PR and between e809670 and 605f36b.

📒 Files selected for processing (1)
  • scripts/aggregate_logs.py

Comment thread scripts/aggregate_logs.py
Comment thread scripts/aggregate_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 238-245: The current warn_and_error grouping in
scripts/aggregate_logs.py (using event_totals and the warn_and_error variable)
relies on message keyword heuristics because scripts/process_logs.py serialises
a flattened event_counts without level information; update the pipeline so the
serialized summary includes per-level counts (e.g., event_counts_by_level or a
dedicated warn_error_map) in process_logs.py instead of dropping level, then
change aggregate_logs.py to build warn_and_error from those per-level counts
(reference event_counts/event_totals and the warn_and_error construction) so the
table is generated from true severity counts rather than message keyword
matching.
- Around line 93-95: The current write to data_file (constructed from log_dir /
DATA_FILE_NAME) truncates the target before json.dump finishes; change the write
to first serialize to a temp file in the same directory (e.g., create a temp
file alongside data_file using tempfile.NamedTemporaryFile or a unique suffix),
write json.dump(data, temp_f, separators=(",", ":")) and flush/fsync as needed,
then atomically replace the target with os.replace(temp_path, data_file) so the
aggregate state is only swapped in after a successful write.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1751b3f4-35d6-4f75-94de-a4fcfea989ac

📥 Commits

Reviewing files that changed from the base of the PR and between 605f36b and d457d5f.

📒 Files selected for processing (1)
  • scripts/aggregate_logs.py

Comment thread scripts/aggregate_logs.py
Comment thread scripts/aggregate_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/process_logs.py (1)

88-128: ⚠️ Potential issue | 🟠 Major

Stream the flat CSV instead of buffering every parsed row.

main() always passes flat_csv_path, so this path holds one dict per parsed log line in flat_rows before anything is written. On large Fly.io dumps that adds O(n) memory for output that is fully streamable. Open the writer once and emit rows inside the read loop.

🛠️ Proposed refactor
-    flat_rows = []
+    flat_file = None
+    writer = None
+    if flat_csv_path is not None:
+        flat_file = flat_csv_path.open("w", newline="", encoding="utf-8")
+        writer = csv.DictWriter(
+            flat_file,
+            fieldnames=["timestamp", "level", "component", "message", "extras"],
+        )
+        writer.writeheader()

-    with raw_path.open("r", encoding="utf-8", errors="ignore") as handle:
-        for record, _original in _iter_records(handle):
+    try:
+        with raw_path.open("r", encoding="utf-8", errors="ignore") as handle:
+            for record, _original in _iter_records(handle):
                 total += 1
                 if record is None:
                     errors += 1
                     continue
@@
-            if flat_csv_path is not None:
+            if writer is not None:
                 extras = {k: v for k, v in record.items() if k not in _CORE_FIELDS}
-                flat_rows.append({
+                writer.writerow({
                     "timestamp": _normalise_full_timestamp(record),
                     "level": level,
                     "component": component,
                     "message": message,
                     "extras": json.dumps(extras, separators=(",", ":")) if extras else "",
                 })
-
-    if flat_csv_path is not None:
-        with flat_csv_path.open("w", newline="", encoding="utf-8") as f:
-            writer = csv.DictWriter(f, fieldnames=["timestamp", "level", "component", "message", "extras"])
-            writer.writeheader()
-            if flat_rows:
-                writer.writerows(flat_rows)
+    finally:
+        if flat_file is not None:
+            flat_file.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process_logs.py` around lines 88 - 128, The code currently buffers
every parsed row in flat_rows when flat_csv_path is set; change this to stream
rows as they are parsed by opening the CSV file and creating a csv.DictWriter
once (writeheader()) before iterating _iter_records, then inside the loop build
the row dict (use _normalise_full_timestamp(record), level, component, message
and json.dumps extras for keys not in _CORE_FIELDS) and call
writer.writerow(row) immediately instead of appending to flat_rows; remove the
flat_rows accumulation and keep the existing logic that only opens/writes the
CSV when flat_csv_path is not None so the writer is only active when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 134-138: The commit of warn_error_counts must be made atomic in
process_json_file: currently totals["total_lines"] and totals["failed_to_parse"]
are updated before merging warn_error_counts, so exceptions from a
missing/malformed warn_error_counts can leave scalars double-counted on retry;
instead, validate and stage warn_error_counts first (ensure
data.get("warn_error_counts") is a dict and all counts are numeric), build a
local staged_counts dict summing the file's warn_error_counts, and only after
successful validation update totals by adding file_total_lines, file_failed, and
merging staged_counts into totals["warn_error_counts"] so the three updates
occur together.

---

Outside diff comments:
In `@scripts/process_logs.py`:
- Around line 88-128: The code currently buffers every parsed row in flat_rows
when flat_csv_path is set; change this to stream rows as they are parsed by
opening the CSV file and creating a csv.DictWriter once (writeheader()) before
iterating _iter_records, then inside the loop build the row dict (use
_normalise_full_timestamp(record), level, component, message and json.dumps
extras for keys not in _CORE_FIELDS) and call writer.writerow(row) immediately
instead of appending to flat_rows; remove the flat_rows accumulation and keep
the existing logic that only opens/writes the CSV when flat_csv_path is not None
so the writer is only active when needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e10bcc74-b521-4fcb-8a84-aaf4571cc7bb

📥 Commits

Reviewing files that changed from the base of the PR and between 605f36b and 4a33a10.

📒 Files selected for processing (2)
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread scripts/aggregate_logs.py
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-333.fly.dev
Dashboard: https://hover-pr-333.fly.dev/dashboard

@simonsmallchua simonsmallchua merged commit aa2c69d into main Apr 18, 2026
11 checks passed
@simonsmallchua simonsmallchua deleted the work/cranky-albattani branch April 18, 2026 21:14
simonsmallchua added a commit that referenced this pull request Apr 19, 2026
Migrate to slog + Sentry fingerprinting
@coderabbitai coderabbitai Bot mentioned this pull request Apr 19, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant