Skip to content

Merge main into feature/redis-broker-v2#335

Merged
simonsmallchua merged 46 commits into
feature/redis-broker-v2from
work/funny-bohr-6c8c3e
Apr 19, 2026
Merged

Merge main into feature/redis-broker-v2#335
simonsmallchua merged 46 commits into
feature/redis-broker-v2from
work/funny-bohr-6c8c3e

Conversation

@simonsmallchua
Copy link
Copy Markdown
Contributor

@simonsmallchua simonsmallchua commented Apr 19, 2026

Summary

Brings origin/main (37 commits ahead, notably the zerolog → slog migration) into the Stage 3 Redis broker branch so PR #330 can resume from a fresh base.

Conflict resolution

  • Delete/modify kept as deletes: worker.go, domain_limiter.go, test_jobs/main.go — main's changes were logger-only; the files remain removed under the Redis dispatch architecture.
  • Content: go.mod, Dockerfile, fly.toml, cmd/app/main.go, internal/jobs/manager.go, internal/db/queue.go — resolved in favour of the Redis broker wiring (kept OnTasksEnqueued callback, dropped legacy worker-pool env vars + DB queue helpers with no remaining callers).

zerolog → slog port

Converted 13 files to the project's existing logging.Component("...") pattern (no injected loggers, matches internal/jobs/manager.go):

  • internal/broker/{redis,scheduler,counters,pacer,dispatcher,consumer}.go + tests (miniredis tests no longer need zerolog.Nop())
  • internal/jobs/{stream_worker,executor,link_discovery,manager}.go
  • cmd/worker/main.go (added logging.Setup + sentry.Init with BeforeSend: logging.BeforeSend)
  • cmd/app/main.go (startup log ported; removed unused getEnvInt)

go.mod drops rs/zerolog entirely; keeps redis/go-redis/v9 v9.18.0.

Inherited from main

archive_key_migrate — a 37 MB Mach-O binary accidentally committed in baca9f6 on main. Left as-is; should be addressed on main.

Verification

  • go build ./... clean
  • go vet ./... clean
  • go test ./... all packages pass (broker, jobs, db, api, logging, …)
  • gofmt -l . / goimports -l cmd internal clean
  • prettier --check clean

Summary by CodeRabbit

  • New Features

    • Added Grafana Cloud metrics agent for enhanced application monitoring and observability.
    • Implemented centralized structured logging with automatic error capture and correlation.
    • Added configurable crawler concurrency control via environment variable.
  • Refactoring

    • Modernized logging infrastructure from legacy logger to unified structured logging system.
    • Simplified error reporting through automatic capture mechanisms.
  • Chores

    • Updated Go toolchain from 1.26.1 to 1.26.2.
    • Upgraded Sentry integration to v0.45.0.
    • Added pre-commit Go linting validation.

simonsmallchua and others added 30 commits April 14, 2026 22:45
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 19d420ce-d0b1-43f1-b678-a1b558d9a999

📥 Commits

Reviewing files that changed from the base of the PR and between 7ccce3b and 601340a.

📒 Files selected for processing (6)
  • .coderabbit.yaml
  • .github/workflows/fly-deploy.yml
  • .github/workflows/review-apps.yml
  • .github/workflows/test.yml
  • Dockerfile
  • go.mod
✅ Files skipped from review due to trivial changes (3)
  • .coderabbit.yaml
  • .github/workflows/test.yml
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/review-apps.yml
  • .github/workflows/fly-deploy.yml
  • Dockerfile

📝 Walkthrough

Walkthrough

This PR migrates the application's logging infrastructure from zerolog to stdlib log/slog with integrated Sentry error capture, introduces a new internal/logging package providing component-scoped structured loggers, and adds Grafana Alloy observability infrastructure for metrics collection and forwarding to Grafana Cloud.

Changes

Cohort / File(s) Summary
Internal Logging Package
internal/logging/logging.go, internal/logging/logging_test.go
New unified structured logger built on log/slog with automatic Sentry error capture, component naming, request-scoped attributes, message normalization, and NoCapture(ctx) suppression mechanism. Includes Sentry BeforeSend processor and fanout handler for multiplexing to multiple slog.Handlers.
Go Module and Toolchain Updates
go.mod, Dockerfile, .github/workflows/fly-deploy.yml, .github/workflows/review-apps.yml, .github/workflows/test.yml
Bumped Go toolchain from 1.26.1 to 1.26.2 across build configurations and go.mod; upgraded sentry-go from v0.36.2 to v0.45.0 and added sentry-go/slog integration; removed zerolog dependency and related indirect requirements.
Application Entry Points
cmd/app/main.go, cmd/worker/main.go, cmd/archive_key_migrate/main.go
Migrated startup logging to component-scoped loggers using new internal/logging package; replaced direct Sentry calls with logging integration that auto-captures errors; updated broker/service initialization to remove zerolog logger arguments.
API Handler Logging Migration
internal/api/logging.go, internal/api/middleware.go, internal/api/handlers.go, internal/api/admin.go, internal/api/auth.go, internal/api/auth_google.go, internal/api/auth_webflow.go, internal/api/domains.go, internal/api/errors.go, internal/api/google_data_api.go, internal/api/jobs.go, internal/api/notifications.go, internal/api/organisations.go, internal/api/response.go, internal/api/schedulers.go, internal/api/share_links.go, internal/api/slack.go, internal/api/webflow_sites.go
Converted all logging calls from zerolog chained builders to structured key/value API; replaced package-level zerolog/log imports with request-scoped loggers; updated loggerWithRequest to return *logging.Logger and added loggerWithRequestPath helper; standardized error field passing via "error", err pattern.
Database and Queue Logging
internal/db/batch.go, internal/db/dashboard.go, internal/db/db.go, internal/db/google_connections.go, internal/db/notifications.go, internal/db/pages.go, internal/db/pressure.go, internal/db/queue.go, internal/db/reset_migrations.go, internal/db/retry.go, internal/db/schedulers.go, internal/db/slack.go, internal/db/users.go, internal/db/webflow_connections.go, internal/db/webflow_site_settings.go
Replaced zerolog and Sentry calls with dbLog/queueLog component loggers; removed .CaptureException calls and converted structured logging to key/value pairs; added ErrUserNotFound sentinel error and refined error handling logic in user/connection lookups; added redactURL helper for safe connection string logging.
Jobs and Crawler Logging
internal/jobs/executor.go, internal/jobs/link_discovery.go, internal/jobs/manager.go, internal/jobs/stream_worker.go, internal/crawler/crawler.go, internal/crawler/robots.go, internal/crawler/sitemap.go, internal/crawler/tls_aia.go, internal/techdetect/detector.go
Migrated from zerolog global logger to component-scoped loggers (jobsLog, crawlerLog, techdetectLog); removed logger field from LinkDiscoveryDeps and StreamWorkerPool; removed Sentry capture calls; updated all logging to structured key/value format; added AIA URL malformation handling in tls_aia.go.
Broker Components Logging Refactor
internal/broker/redis.go, internal/broker/consumer.go, internal/broker/counters.go, internal/broker/counters_test.go, internal/broker/dispatcher.go, internal/broker/pacer.go, internal/broker/pacer_test.go, internal/broker/scheduler.go, internal/broker/scheduler_test.go
Removed logger field and zerolog dependency from Client, Consumer, RunningCounters, Dispatcher, and DomainPacer; updated constructor signatures to eliminate logger parameter; replaced all logging with brokerLog component logger; updated test constructors accordingly.
Notifications and Auth
internal/notifications/listener.go, internal/notifications/slack.go, internal/auth/middleware.go
Migrated from zerolog to component-scoped loggers (notifyLog, authLog); replaced Sentry capture calls with logging; refactored listener.go to return early when NewListener returns nil; converted all structured logging to key/value pattern.
Utilities
internal/util/url.go
Added utilLog component logger; replaced zerolog debug logging with structured key/value calls in URL normalization.
Crawler Configuration
internal/crawler/config.go, internal/crawler/config_test.go
Added environment variable GNH_CRAWLER_MAX_CONCURRENCY support with fallback to 10; implemented getEnvInt helper; added two unit tests validating env var parsing and fallback behavior.
Deployment and Infrastructure
Dockerfile, fly.toml, scripts/start.sh, .gitignore
Updated Dockerfile to include Grafana Alloy metrics agent stage; added alloy.river config copying and execution; changed process command to ./start.sh entrypoint; updated Go builder base image to 1.26.2; added gcompat library for Alloy compatibility; added scripts/start.sh shell entrypoint that manages file descriptor limits, conditionally starts Alloy agent, and forwards signals to both processes; removed alloy.river from .gitignore.
Grafana Cloud Integration
alloy.river, .github/workflows/fly-deploy.yml, .github/workflows/review-apps.yml
Added prometheus.scrape "hover" job targeting localhost:9464 with 60s interval and identity labels; added prometheus.remote_write "grafana_cloud" block for metrics push to Grafana Cloud with HTTP basic auth using GRAFANA_CLOUD_USER and GRAFANA_CLOUD_API_KEY; extended CI workflows to export and set these credentials via 1Password secrets to Fly environment.
Documentation and Logging Audit
.claude/skills/structured-logging/SKILL.md, docs/diagnostics/LOGGING_AUDIT.md, docs/diagnostics/SENTRY_AUDIT.md, docs/architecture/CONFIG_REFERENCE.md, docs/operations/ENV_VARS.md, CHANGELOG.md
Updated skill documentation to direct to internal/logging package; added component taxonomy and Sentry best practices; created LOGGING_AUDIT and SENTRY_AUDIT diagnostic reports; documented crawler concurrency configuration; added changelog entry for v0.32.7 covering logging migration and Alloy infrastructure.
Log Processing and Linting
scripts/process_logs.py, scripts/aggregate_logs.py, scripts/lint-go.js, .husky/pre-commit, .coderabbit.yaml
Refactored log processing to extract and aggregate component/event counts per minute; added flat CSV output with extras field; replaced incremental state tracking with fingerprint-based change detection; added new write_events_csv() and write_components_csv() output generators; added lint-go.js Node.js script for conditional golangci-lint execution on staged Go files; added pre-commit hook to run lint-go.js; enabled CodeRabbit auto-review in .coderabbit.yaml.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Log as slog.Default()
    participant Fanout as fanoutHandler
    participant Text as Text Handler
    participant Sentry as Sentry Handler
    participant SentryAPI as Sentry Backend

    App->>Log: logger.Error("msg", "error", err, ...)
    Log->>Fanout: Handle(record)
    par Text Output
        Fanout->>Text: Handle(record)
        Text->>Text: Format to stdout
    and Sentry Capture
        Fanout->>Sentry: Handle(record)
        Note over Sentry: Check level >= error<br/>Normalize message (UUIDs, numbers)<br/>Check no_capture attribute
        alt no_capture present
            Sentry->>Sentry: Drop event
        else capture enabled
            Sentry->>Sentry: Build Sentry event<br/>component tag, fingerprint,<br/>exception from error field
            Sentry->>SentryAPI: Send event
            SentryAPI->>Sentry: Event accepted
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

This PR involves substantial scope—a comprehensive logging infrastructure migration across dozens of files—but the changes are largely homogeneous (repetitive application of the same refactoring pattern). The new internal/logging package introduces moderate complexity with Sentry integration, message normalization, and fanout handling. Careful review is needed for: (1) the core logging package implementation and Sentry integration logic; (2) exported type/function signature changes in broker and db packages; (3) error handling refinements (e.g., ErrUserNotFound sentinel); (4) deployment script correctness and signal handling; (5) spot-checking a sample of the many logging refactors for consistency. The repetitive nature of most changes (same pattern applied many times) reduces per-file cognitive load, but the sheer volume and breadth of affected code requires thorough validation.

Possibly related PRs

  • Move Alloy metrics agent to hover sidecar #334 — Adds Grafana Alloy sidecar infrastructure and Grafana Cloud secret wiring (identical deployment/observability changes as this PR: Dockerfile, scripts/start.sh, alloy.river, CI workflows, fly.toml modifications).
  • Migrate to slog + Sentry fingerprinting #333 — Implements the same internal/logging package with slog-based component loggers and Sentry integration, and replaces zerolog across the codebase (directly overlapping logging migration).
  • Fix crawler throughput under load #325 — Modifies internal/db/batch.go, internal/db/pressure.go, and multiple internal/jobs files for throughput/queue optimization, directly overlapping with this PR's logging refactors in the same subsystems.

Poem

🐰 Logs now flow in structured streams so clean,
With components named, no noise in between,
Sentry listens to errors that dare,
While Alloy gathers metrics in the air,
A hoppy migration, from logs to slog cheer! 🌟

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch work/funny-bohr-6c8c3e

@supabase
Copy link
Copy Markdown

supabase Bot commented Apr 19, 2026

Updates to Preview Branch (work/funny-bohr-6c8c3e) ↗︎

Deployments Status Updated
Database Sun, 19 Apr 2026 05:43:18 UTC
Services Sun, 19 Apr 2026 05:43:18 UTC
APIs Sun, 19 Apr 2026 05:43:18 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Sun, 19 Apr 2026 05:43:19 UTC
Migrations Sun, 19 Apr 2026 05:43:21 UTC
Seeding Sun, 19 Apr 2026 05:43:23 UTC
Edge Functions Sun, 19 Apr 2026 05:43:23 UTC

View logs for this Workflow Run ↗︎.
Learn more about Supabase for Git ↗︎.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 19, 2026

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/handlers.go (1)

627-635: ⚠️ Potential issue | 🟠 Major

Avoid Sentry-flooding /config.js misconfigurations.

apiLog.Error(...) is auto-captured, and /config.js is requested on every page load. One missing env var will now produce an event per request until the deploy is fixed. Please validate this once at startup, or log this handler failure without capture.

Minimal mitigation in this handler
-		apiLog.Error("supabase config missing", "error", err)
+		apiLog.Warn("Supabase config unavailable for config.js", "error", err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handlers.go` around lines 627 - 635, Call buildConfigSnippet
once at startup (init/bootstrap) and cache its result so the /config.js handler
reads the cached snippet instead of calling buildConfigSnippet per request; if
the startup call fails, fail fast or record a single non-capturing error there.
In the handler (the block currently calling buildConfigSnippet and
apiLog.Error), remove the apiLog.Error call that triggers automatic capture and
instead use the cached value and, on missing config, log with a non-capturing
mechanism (e.g., standard log.Printf or a warning-level logger) and return the
same HTTP 500 message. Ensure references: buildConfigSnippet, the /config.js
handler block, and apiLog.Error so you locate and replace the request-time error
capture with startup validation and non-capturing request-time logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/structured-logging/SKILL.md:
- Around line 31-46: The component taxonomy references deleted files—remove or
update the `worker` row (currently pointing to `internal/jobs/worker.go`) and
the `jobs` row that lists `domain_limiter.go` so contributors aren’t led to dead
paths; replace them with the current locations (or consolidate under the `jobs`
component to point to existing files in `internal/jobs/`), and ensure any
mention of `domain_limiter.go` is removed or replaced with the new
symbol/location for the domain-limiting logic.

In `@alloy.river`:
- Around line 1-6: The scrape target in prometheus.scrape "hover" currently uses
identical addresses so add a unique label on the target before forwarding:
update the targets block for prometheus.scrape "hover" to include a
distinguishing label (for example an env/app/instance label) for each
machine/app, so that when forward_to =
[prometheus.remote_write.grafana_cloud.receiver] the remote write stream
contains the unique label; modify the target definition inside prometheus.scrape
"hover" (the targets array) to inject that label so the remote write receiver
sees non-colliding series.

In `@CHANGELOG.md`:
- Around line 89-95: The release note is inaccurate about removals—while zerolog
is gone, svc-level Sentry SDK imports remain in cmd/app/main.go and
cmd/worker/main.go for initialization; update the CHANGELOG entry to tighten
wording to state that zerolog has been removed and that only SDK initialization
still imports sentry-go in the main entrypoints (e.g., cmd/app/main.go and
cmd/worker/main.go), or otherwise rephrase to avoid claiming “no direct ...
sentry imports remain.”

In `@cmd/app/main.go`:
- Around line 699-706: Replace the hard-coded development message so it’s
environment-neutral and optionally include the current env; specifically, change
the call to startupLog.Info("Hover Development Server Ready!") to a neutral
message like startupLog.Info("Hover Server Ready") or startupLog.Info("Hover
Server Ready", "env", config.Env); update references around baseURL and other
startupLog.Info calls to keep consistent logging format.

In `@cmd/worker/main.go`:
- Around line 76-77: Remove the explicit sentry.CaptureException calls and rely
on workerLog.Fatal to emit the error via the slog→Sentry fanout; specifically
delete the sentry.CaptureException(err) calls that appear immediately before
workerLog.Fatal("failed to connect to PostgreSQL", "error", err) and the similar
block around the Redis/startup failure (lines referenced in the review), leaving
only the workerLog.Fatal calls; also remove the Sentry package import if it
becomes unused after these deletions.

In `@docs/diagnostics/SENTRY_AUDIT.md`:
- Around line 24-35: Regenerate the SENTRY_AUDIT.md from the current branch
state and update the table so it no longer shows stale findings (remove entries
for symbols no longer present such as internal/jobs/worker.go and ensure
cmd/app/main.go’s actual BeforeSend status is reflected); run the
audit-generation tool or script you used originally to produce the table, verify
entries for startHealthMonitoring, Flush, flushIndividualUpdates, GetJobStats,
and ProcessJob (or their replacements) match the live code, update rows covering
the 95-108 range accordingly, and commit the regenerated SENTRY_AUDIT.md.

In `@internal/api/admin.go`:
- Around line 51-54: Introduce a small helper function
organisationIDOrNone(orgID *string) string that returns "none" when orgID is nil
and the dereferenced string otherwise, then replace the duplicated conversion
logic that sets orgIDStr and orgIDStr2 in admin.go with calls to
organisationIDOrNone(user.OrganisationID) (and the corresponding other variable)
so both places use the helper and remove the repeated if-nil branches.

In `@internal/api/auth_google.go`:
- Around line 328-330: The code is logging the raw pending GA session ID
(sessionID) which is a bearer key; remove it from the logger.Info call in the
block that calls storePendingGASession(session) and only log non-sensitive
metadata (e.g., "account_count" and "property_count"). If you still need a
correlate-able value for debugging, replace session_id with a non-reversible
fingerprint (e.g., a hash or masked prefix) computed from sessionID and log that
instead, but do not emit the raw sessionID; update references to
storePendingGASession(session) and the logger.Info invocation accordingly.

In `@internal/crawler/tls_aia.go`:
- Around line 184-187: The code calls parsed.Host even when url.Parse(rawURL)
may return a nil parsed and non-nil err; update the logic in the TLS AIA
handling so you first check if err != nil and handle/log the parse error
(include rawURL and err) and return nil before referencing parsed; then perform
the scheme check on the non-nil parsed (ensure parsed.Scheme is only accessed
when parsed != nil) and log the rejected non-HTTP(S) URL using parsed.Host only
after successful parse.

In `@internal/db/pages.go`:
- Line 73: The log call dbLog.Warn("Skipping invalid URL", "error", err, "url",
u) writes the raw untrusted URL value (u) to logs; instead sanitize or omit
sensitive parts before logging: parse u with the URL parser, remove or mask
RawQuery and Fragment (or only log hostname and path), then pass the sanitized
string to dbLog.Warn (e.g., replace the "url" field value with the
sanitized/masked URL or a boolean flag). Update the call site that constructs
this warning so it no longer logs the raw u variable but the safely redacted
value.

In `@internal/db/reset_migrations.go`:
- Around line 16-18: Replace the banner log calls that use dbLog.Warn with
dbLog.Info because these are normal lifecycle progress messages; specifically
locate the calls emitting "=== DATABASE RESET STARTED ===", "Dropping
job-related tables, clearing migrations, and rebuilding schema", and "Users and
organisations will be preserved" (and the similar set at the later block around
the other banner lines) and change them from dbLog.Warn(...) to dbLog.Info(...);
keep the message text identical and only adjust the log level so dashboards
won't be skewed by benign warnings.

In `@internal/db/webflow_connections.go`:
- Line 201: The db log at the dbLog.Error call in webflow_connections.go uses
the key "org_id"; change that key to the standardized "organisation_id" so it
matches other DB logs (update the call in the error path that logs connectionID
and organisationID inside the function handling deletion of webflow connections,
e.g., the dbLog.Error invocation referencing connectionID and organisationID).
Ensure any other nearby log statements in the same file use the same
"organisation_id" key for consistency.

In `@internal/jobs/stream_worker.go`:
- Around line 418-420: The refreshActiveJobs loop currently calls jobsLog.Error
on failure which will auto-capture to Sentry every 10s; change the call in
refreshActiveJobs to jobsLog.Warn (or use the logging API flag to suppress
capture) so transient control DB outages do not flood Sentry. Locate the error
handling in refreshActiveJobs where jobsLog.Error("failed to refresh active
jobs", "error", err) is invoked and replace it with a non-capturing warning
(e.g., jobsLog.Warn with the same message and error) or call the logger method
that explicitly disables Sentry capture.

In `@internal/notifications/listener.go`:
- Around line 136-139: The warning branch in NewListener currently starts the
polling goroutine unconditionally, which will dereference a nil service and
panic; modify the logic so startPolling is only invoked when service is non-nil
(or replace the call with a safe no-op) — update the branch that checks if
listener == nil and the similar block around lines 198-200 so they do not call
startPolling(ctx, service) when service == nil, referencing the NewListener
function, the listener variable, the service variable, and the startPolling
function to locate the code to change.

In `@scripts/aggregate_logs.py`:
- Around line 5-8: The documentation header under the Outputs block in
scripts/aggregate_logs.py is stale because the script now always writes
components_per_minute.csv; update the Outputs list to include a line describing
components_per_minute.csv (e.g., "components_per_minute.csv — timestamp |
component: count | ...") alongside the existing time_series.csv,
events_per_minute.csv and summary.md entries so the built-in usage docs match
the actual outputs.

In `@scripts/process_logs.py`:
- Around line 38-52: The function _normalise_full_timestamp currently drops the
source timezone by using dt.strftime("%Y-%m-%dT%H:%M:%S"); change it to emit the
timezone offset as well by returning dt.isoformat(timespec="seconds") (the
cleaned variable already converts "Z" to "+00:00" so isoformat will preserve the
offset), leaving the rest of the parsing and the ValueError fallback unchanged.

---

Outside diff comments:
In `@internal/api/handlers.go`:
- Around line 627-635: Call buildConfigSnippet once at startup (init/bootstrap)
and cache its result so the /config.js handler reads the cached snippet instead
of calling buildConfigSnippet per request; if the startup call fails, fail fast
or record a single non-capturing error there. In the handler (the block
currently calling buildConfigSnippet and apiLog.Error), remove the apiLog.Error
call that triggers automatic capture and instead use the cached value and, on
missing config, log with a non-capturing mechanism (e.g., standard log.Printf or
a warning-level logger) and return the same HTTP 500 message. Ensure references:
buildConfigSnippet, the /config.js handler block, and apiLog.Error so you locate
and replace the request-time error capture with startup validation and
non-capturing request-time logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5b871799-64ad-4b98-a92a-bc1a23e1f972

📥 Commits

Reviewing files that changed from the base of the PR and between b0f7a5e and d6dd3df.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (82)
  • .claude/skills/structured-logging/SKILL.md
  • .github/workflows/fly-deploy.yml
  • .github/workflows/review-apps.yml
  • .gitignore
  • .husky/pre-commit
  • CHANGELOG.md
  • Dockerfile
  • alloy.river
  • archive_key_migrate
  • cmd/app/main.go
  • cmd/archive_key_migrate/main.go
  • cmd/worker/main.go
  • docs/architecture/CONFIG_REFERENCE.md
  • docs/diagnostics/LOGGING_AUDIT.md
  • docs/diagnostics/SENTRY_AUDIT.md
  • docs/operations/ENV_VARS.md
  • fly.toml
  • go.mod
  • internal/api/admin.go
  • internal/api/auth.go
  • internal/api/auth_google.go
  • internal/api/auth_webflow.go
  • internal/api/domains.go
  • internal/api/errors.go
  • internal/api/google_data_api.go
  • internal/api/handlers.go
  • internal/api/jobs.go
  • internal/api/logging.go
  • internal/api/middleware.go
  • internal/api/notifications.go
  • internal/api/organisations.go
  • internal/api/response.go
  • internal/api/schedulers.go
  • internal/api/share_links.go
  • internal/api/slack.go
  • internal/api/webflow_sites.go
  • internal/archive/scheduler.go
  • internal/auth/middleware.go
  • internal/broker/consumer.go
  • internal/broker/counters.go
  • internal/broker/counters_test.go
  • internal/broker/dispatcher.go
  • internal/broker/pacer.go
  • internal/broker/pacer_test.go
  • internal/broker/redis.go
  • internal/broker/scheduler.go
  • internal/broker/scheduler_test.go
  • internal/crawler/config.go
  • internal/crawler/config_test.go
  • internal/crawler/crawler.go
  • internal/crawler/robots.go
  • internal/crawler/sitemap.go
  • internal/crawler/tls_aia.go
  • internal/db/batch.go
  • internal/db/dashboard.go
  • internal/db/db.go
  • internal/db/google_connections.go
  • internal/db/notifications.go
  • internal/db/pages.go
  • internal/db/pressure.go
  • internal/db/queue.go
  • internal/db/reset_migrations.go
  • internal/db/retry.go
  • internal/db/schedulers.go
  • internal/db/slack.go
  • internal/db/users.go
  • internal/db/webflow_connections.go
  • internal/db/webflow_site_settings.go
  • internal/jobs/executor.go
  • internal/jobs/link_discovery.go
  • internal/jobs/manager.go
  • internal/jobs/stream_worker.go
  • internal/logging/logging.go
  • internal/logging/logging_test.go
  • internal/notifications/listener.go
  • internal/notifications/slack.go
  • internal/techdetect/detector.go
  • internal/util/url.go
  • scripts/aggregate_logs.py
  • scripts/lint-go.js
  • scripts/process_logs.py
  • scripts/start.sh
💤 Files with no reviewable changes (1)
  • .gitignore

Comment thread .claude/skills/structured-logging/SKILL.md Outdated
Comment thread alloy.river
Comment thread CHANGELOG.md
Comment thread cmd/app/main.go
Comment thread cmd/worker/main.go Outdated
Comment thread internal/db/webflow_connections.go Outdated
Comment thread internal/jobs/stream_worker.go
Comment thread internal/notifications/listener.go
Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/process_logs.py
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
internal/db/reset_migrations.go (1)

91-103: ⚠️ Potential issue | 🟡 Minor

Handle row-iteration errors after rows.Next() loops.

These loops ignore rows.Err() and silently skip some scan failures. That can mask partial reset failures and leave stale DB objects/connections.

Suggested pattern to apply to each loop
  for viewRows.Next() {
    var v string
-   if err := viewRows.Scan(&v); err == nil {
-     views = append(views, v)
-   }
+   if err := viewRows.Scan(&v); err != nil {
+     _ = viewRows.Close()
+     return fmt.Errorf("failed to scan view row: %w", err)
+   }
+   views = append(views, v)
  }
+ if err := viewRows.Err(); err != nil {
+   _ = viewRows.Close()
+   return fmt.Errorf("failed while iterating views: %w", err)
+ }

Also applies to: 118-130, 145-157, 352-360, 379-387

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/reset_migrations.go` around lines 91 - 103, The row-iteration
loops (e.g., viewRows with viewRows.Scan) currently ignore rows.Err() and
potential Scan failures; after each for viewRows.Next() { ... } loop call and
check err := viewRows.Err() and handle it (log via dbLog.Error/dbLog.Warn and
return the error or abort the reset) rather than silently continuing, also
ensure any scan failure inside the loop is propagated instead of skipped; apply
the same fix to the other row loops (e.g., seqRows, tableRows, typeRows or
similarly named row variables) and ensure rows.Close() is checked (or use defer
rows.Close()) so Close errors are not ignored.
scripts/process_logs.py (1)

90-130: ⚠️ Potential issue | 🟠 Major

Stream the flat CSV instead of buffering every row.

Lines 90-130 keep a full copy of every parsed record in flat_rows before writing it out. On large log exports that doubles memory usage for no benefit and can turn this into an OOM-prone path. Open the CSV once and write each row inside the read loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/process_logs.py` around lines 90 - 130, The code currently
accumulates every row in flat_rows before writing, which uses unbounded memory;
instead, if flat_csv_path is not None open the CSV file and create a
csv.DictWriter and write the header before entering the _iter_records loop, then
inside the loop (where flat_rows.append is called) build the same dict using
_normalise_full_timestamp(record), level, component, message and extras
(filtered by _CORE_FIELDS and json.dumps) and call writer.writerow(...)
immediately rather than appending to flat_rows; remove the flat_rows list and
the post-loop writer.writerows block so the file is streamed while iterating
through _iter_records.
internal/api/auth_google.go (1)

328-337: ⚠️ Potential issue | 🟠 Major

Do not ship the pending GA session as a URL query token.

Line 336 puts sessionID into ga_session on the redirect URL. That ID is the bearer key for pendingGASessions, so it can leak via browser history, copied links, and referrers even though you've stopped logging it. Bind the pending session to the authenticated user/org server-side and move the token into an HttpOnly cookie or another non-URL channel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/auth_google.go` around lines 328 - 337, Currently the code
stores the pending GA session via storePendingGASession(session) and places the
returned sessionID into the redirect URL using buildSettingsURL("analytics",
params, "google-analytics"); instead of exposing sessionID in the query string,
bind the pending session server-side to the authenticated user/org (e.g.
associate the pending session ID with the current user/org record in the
pendingGASessions store) and return the redirect without ga_session, or set the
sessionID in a Secure, HttpOnly, SameSite cookie (set by the handler that calls
storePendingGASession) so the client can continue the flow without the token
appearing in URLs or logs; ensure cookie attributes (HttpOnly, Secure,
appropriate Path/Expiry) are applied and remove any code that adds ga_session to
the URL in the redirect logic.
♻️ Duplicate comments (2)
internal/db/pages.go (1)

73-73: ⚠️ Potential issue | 🟠 Major

Avoid logging raw URL content via err at Line 73.

Even after removing the "url" field, the logged err can still contain the original URL (including query params/tokens) from parse/normalisation failures. Log a sanitised reason instead of the raw error text.

🔧 Proposed fix
-			dbLog.Warn("Skipping invalid URL", "error", err, "domain", domain)
+			dbLog.Warn("Skipping invalid URL", "domain", domain, "reason", "URL normalisation failed")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/pages.go` at line 73, The dbLog.Warn call in pages.go currently
logs the raw err which may contain the original URL and sensitive query params;
change the warning to log a sanitized reason instead of the raw error text by
replacing the err argument in dbLog.Warn("Skipping invalid URL", ...) with a
concise, non-sensitive message (e.g., "parse/normalisation failed" or err type
only) and/or call a small helper (e.g., sanitizeError or buildSafeErrorMessage)
that returns a redacted string; keep the existing "domain" and other context but
never include err.Error() or the raw URL in the log entry.
internal/jobs/stream_worker.go (1)

170-173: ⚠️ Potential issue | 🟠 Major

Avoid auto-capturing stream read failures in the hot loop.

Line 172 now emits an Error on every failed ReadNonBlocking. With the new logger semantics, that becomes one Sentry event per worker × job iteration while Redis is degraded, which will drown out the signal from real incidents very quickly. Please downgrade this path to Warn, or suppress capture explicitly, because the loop is already retrying.

Possible fix
-				logger.Error("stream read error", "error", err, "job_id", jobID)
+				logger.Warn("stream read error", "error", err, "job_id", jobID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/jobs/stream_worker.go` around lines 170 - 173, The hot loop
currently logs every ReadNonBlocking failure at Error level (logger.Error) for
swp.consumer.ReadNonBlocking with jobID, which creates noisy Sentry events;
change that call to a non-capturing/Warn-level log (e.g., logger.Warn) or
explicitly disable error capture for that log so retries don't generate Sentry
events—update the logging call in stream_worker.go where
swp.consumer.ReadNonBlocking is handled to use Warn (or pass a flag/field to
suppress capture) while keeping jobID and the error message for local debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/app/main.go`:
- Around line 697-708: The readiness logs are emitted before the HTTP listener
is bound; change the flow to create a net.Listener first, then log readiness and
URLs only after Listen succeeds, and finally call server.Serve(listener) instead
of server.ListenAndServe(). Specifically, replace the current pre-Listen
startupLog.Info calls around server.ListenAndServe() with code that calls
net.Listen("tcp", ":"+config.Port), checks for errors, then on success compute
baseURL and call startupLog.Info("Starting server"...), startupLog.Info("Hover
server ready"...), startupLog.Info("Open Homepage"...), startupLog.Info("Open
Dashboard"...), startupLog.Info("Health Check"...), and the Supabase Studio log
under config.Env == "development", and then pass that listener into
server.Serve(listener) (referencing server.ListenAndServe and server.Serve and
the startupLog.Info and config.Env identifiers).
- Around line 59-63: In startJobScheduler, replace the startupLog.Error calls in
the retry branches (the error handling after calls like
pgDB.GetSchedulersReadyToRun and the other similar DB/service calls in the loop)
with startupLog.Warn (or otherwise use a non-error/no-capture logging path) so
transient/retryable dependency failures don’t generate Error-level Sentry
alerts; update each occurrence referenced (the error branches in
startJobScheduler at the shown ranges) to log a warning with the error attached
instead of an error-level log.

In `@docs/diagnostics/SENTRY_AUDIT.md`:
- Around line 44-65: The audit table still references removed legacy DB queue
helpers (executeOnce, executeOnceWithContext, ExecuteMaintenance,
CleanupStuckJobs) in internal/db/queue.go; update
docs/diagnostics/SENTRY_AUDIT.md by removing or replacing those rows (and the
duplicate references at lines 84–86) with the current live symbols or file
locations that replaced the legacy helpers, and regenerate the table entries so
the audit points to existing functions (search the repo for the new
queue-related functions to reference instead of the deleted symbols).

In `@internal/api/handlers.go`:
- Around line 979-983: The DB lookup errors for webhook auth resolution (calls
to h.DB.GetPlatformOrgMapping and the other platform/webhook lookup around the
second block) are being treated identically and returned as a 404; change the
error handling to distinguish a "not found" result from other DB failures: if
the error indicates no rows (e.g., sql.ErrNoRows or your project's NotFound
sentinel) then call NotFound(w, r, ...) as before, otherwise log the full error
(use logger.Error or logger.Warn with the error) and return an internal server
error response (500) so real backend issues are surfaced and webhook retries are
not masked. Apply this change to the GetPlatformOrgMapping handling and the
other lookup block referenced near the second call so only true
missing-resources produce 404s.

In `@internal/db/reset_migrations.go`:
- Around line 207-214: The schema verification currently only logs warnings and
still marks Step 4 complete; modify ResetSchema() so that any schema
verification failure causes the function to return an error instead of
continuing. In the verification loop (where dbLog.Warn("Schema verification
warning", "table", table, "exists", exists) is called) set a failure flag (e.g.,
verificationFailed := true) or immediately call dbLog.Error with context and
return a descriptive error from ResetSchema(); ensure you do not log "Step 4/4
completed" when verificationFailed is true. Update the final block that logs
time.Since(verifyStart) to check the flag and either return the error or log
completion only on success.

In `@scripts/aggregate_logs.py`:
- Around line 69-70: The code currently keys incremental state by filename (see
processed_files and saved.get("processed_files", [])), which causes stale skips
when a file is rewritten; change processed_files to store a fingerprint per file
(e.g., dict mapping basename -> fingerprint) instead of a plain list, and
persist that in the .aggregate_data.json save/load logic. Update the check that
treats a file as already-processed to compare the stored fingerprint against the
current fingerprint (use os.path.getmtime/size or a content hash), and when a
file changes update its fingerprint and reprocess; apply the same change to the
other occurrences referenced around lines 77-85 and 277-304 so all load/save and
membership checks use the fingerprinted mapping rather than filenames alone.

---

Outside diff comments:
In `@internal/api/auth_google.go`:
- Around line 328-337: Currently the code stores the pending GA session via
storePendingGASession(session) and places the returned sessionID into the
redirect URL using buildSettingsURL("analytics", params, "google-analytics");
instead of exposing sessionID in the query string, bind the pending session
server-side to the authenticated user/org (e.g. associate the pending session ID
with the current user/org record in the pendingGASessions store) and return the
redirect without ga_session, or set the sessionID in a Secure, HttpOnly,
SameSite cookie (set by the handler that calls storePendingGASession) so the
client can continue the flow without the token appearing in URLs or logs; ensure
cookie attributes (HttpOnly, Secure, appropriate Path/Expiry) are applied and
remove any code that adds ga_session to the URL in the redirect logic.

In `@internal/db/reset_migrations.go`:
- Around line 91-103: The row-iteration loops (e.g., viewRows with
viewRows.Scan) currently ignore rows.Err() and potential Scan failures; after
each for viewRows.Next() { ... } loop call and check err := viewRows.Err() and
handle it (log via dbLog.Error/dbLog.Warn and return the error or abort the
reset) rather than silently continuing, also ensure any scan failure inside the
loop is propagated instead of skipped; apply the same fix to the other row loops
(e.g., seqRows, tableRows, typeRows or similarly named row variables) and ensure
rows.Close() is checked (or use defer rows.Close()) so Close errors are not
ignored.

In `@scripts/process_logs.py`:
- Around line 90-130: The code currently accumulates every row in flat_rows
before writing, which uses unbounded memory; instead, if flat_csv_path is not
None open the CSV file and create a csv.DictWriter and write the header before
entering the _iter_records loop, then inside the loop (where flat_rows.append is
called) build the same dict using _normalise_full_timestamp(record), level,
component, message and extras (filtered by _CORE_FIELDS and json.dumps) and call
writer.writerow(...) immediately rather than appending to flat_rows; remove the
flat_rows list and the post-loop writer.writerows block so the file is streamed
while iterating through _iter_records.

---

Duplicate comments:
In `@internal/db/pages.go`:
- Line 73: The dbLog.Warn call in pages.go currently logs the raw err which may
contain the original URL and sensitive query params; change the warning to log a
sanitized reason instead of the raw error text by replacing the err argument in
dbLog.Warn("Skipping invalid URL", ...) with a concise, non-sensitive message
(e.g., "parse/normalisation failed" or err type only) and/or call a small helper
(e.g., sanitizeError or buildSafeErrorMessage) that returns a redacted string;
keep the existing "domain" and other context but never include err.Error() or
the raw URL in the log entry.

In `@internal/jobs/stream_worker.go`:
- Around line 170-173: The hot loop currently logs every ReadNonBlocking failure
at Error level (logger.Error) for swp.consumer.ReadNonBlocking with jobID, which
creates noisy Sentry events; change that call to a non-capturing/Warn-level log
(e.g., logger.Warn) or explicitly disable error capture for that log so retries
don't generate Sentry events—update the logging call in stream_worker.go where
swp.consumer.ReadNonBlocking is handled to use Warn (or pass a flag/field to
suppress capture) while keeping jobID and the error message for local debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 377c8fe7-f06d-479e-aa6a-3b4e9198ecaf

📥 Commits

Reviewing files that changed from the base of the PR and between d6dd3df and bee8078.

📒 Files selected for processing (17)
  • .claude/skills/structured-logging/SKILL.md
  • CHANGELOG.md
  • alloy.river
  • cmd/app/main.go
  • cmd/worker/main.go
  • docs/diagnostics/SENTRY_AUDIT.md
  • internal/api/admin.go
  • internal/api/auth_google.go
  • internal/api/handlers.go
  • internal/crawler/tls_aia.go
  • internal/db/pages.go
  • internal/db/reset_migrations.go
  • internal/db/webflow_connections.go
  • internal/jobs/stream_worker.go
  • internal/notifications/listener.go
  • scripts/aggregate_logs.py
  • scripts/process_logs.py

Comment thread cmd/app/main.go
Comment thread cmd/app/main.go Outdated
Comment thread docs/diagnostics/SENTRY_AUDIT.md Outdated
Comment on lines +44 to +65
| # | File | Line | Function | Error Source |
| --- | ----------------------------- | ---- | ------------------------ | ----------------------------- |
| 1 | `cmd/app/main.go` | 183 | `startHealthMonitoring` | DB job update query |
| 2 | `cmd/app/main.go` | 508 | `main` (goroutine) | Metrics HTTP server |
| 3 | `cmd/app/main.go` | 533 | `main` | PostgreSQL close |
| 4 | `cmd/app/main.go` | 558 | `main` | Queue DB close |
| 5 | `cmd/app/main.go` | 767 | `main` | Server.Shutdown |
| 6 | `cmd/app/main.go` | 818 | `main` | Final server error |
| 7 | `internal/auth/middleware.go` | 103 | `AuthMiddleware` | Token signature invalid |
| 8 | `internal/auth/middleware.go` | 108 | `AuthMiddleware` | JWKS/keyfunc config |
| 9 | `internal/api/auth.go` | 132 | `SignupUser` | CreateUser DB failure |
| 10 | `internal/api/auth.go` | 216 | `GetCurrentUser` | GetOrCreateUser DB failure |
| 11 | `internal/db/batch.go` | 252 | `Flush` | Poison pill detection |
| 12 | `internal/db/batch.go` | 349 | `Flush` | DB unavailable on shutdown |
| 13 | `internal/db/batch.go` | 371 | `Flush` | Shutdown data errors |
| 14 | `internal/db/batch.go` | 604 | `flushIndividualUpdates` | Poison pill individual update |
| 15 | `internal/db/dashboard.go` | 76 | `GetJobStats` | Dashboard stats query |
| 16 | `internal/db/queue.go` | 489 | `executeOnce` | Begin transaction |
| 17 | `internal/db/queue.go` | 521 | `executeOnce` | Commit transaction |
| 18 | `internal/db/queue.go` | 553 | `executeOnceWithContext` | Begin transaction |
| 19 | `internal/db/queue.go` | 585 | `executeOnceWithContext` | Commit transaction |
| 20 | `internal/db/queue.go` | 762 | `ExecuteMaintenance` | Maintenance begin |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Regenerate the internal/db/queue.go audit rows.

The PR objectives say this merge removed the legacy DB queue helpers, but the audit still points at executeOnce, executeOnceWithContext, ExecuteMaintenance, and CleanupStuckJobs under internal/db/queue.go. That leaves the document directing the next clean-up pass at dead symbols.

Also applies to: 84-86

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/diagnostics/SENTRY_AUDIT.md` around lines 44 - 65, The audit table still
references removed legacy DB queue helpers (executeOnce, executeOnceWithContext,
ExecuteMaintenance, CleanupStuckJobs) in internal/db/queue.go; update
docs/diagnostics/SENTRY_AUDIT.md by removing or replacing those rows (and the
duplicate references at lines 84–86) with the current live symbols or file
locations that replaced the legacy helpers, and regenerate the table entries so
the audit points to existing functions (search the repo for the new
queue-related functions to reference instead of the deleted symbols).

Comment thread internal/api/handlers.go
Comment thread internal/db/reset_migrations.go Outdated
Comment thread scripts/aggregate_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/app/main.go (1)

554-597: ⚠️ Potential issue | 🟠 Major

Don't accept jobs when Redis dispatch is unavailable.

This branch has no local worker fallback, but both failure paths here still let the app continue: REDIS_URL missing only warns, and ScheduleBatch failure only logs. That leaves tasks committed in Postgres with no consumer, so job creation can succeed while execution is impossible. Please fail fast when Redis dispatch is required, or make enqueueing propagate an error / durable retry so the job is not considered created until dispatch succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/main.go` around lines 554 - 597, The code currently allows job
creation to succeed when Redis dispatch is unavailable (redisCfg.URL empty or
scheduler.ScheduleBatch fails); change this to fail fast and/or propagate
enqueue errors: if redisCfg.URL == "" replace startupLog.Warn with a
startupLog.Fatal (or otherwise exit) so the process doesn't accept jobs without
a dispatcher, and change jobsManager.OnTasksEnqueued (currently assigned as
func(ctx context.Context, jobID string, entries []jobs.TaskScheduleEntry)) to
return an error (e.g., func(... ) error) and update all callers so that
scheduler.ScheduleBatch errors are returned up the stack and cause job creation
to fail (or trigger durable retries) instead of only logging; ensure
scheduler.ScheduleBatch remains used but its errors are propagated from
OnTasksEnqueued to the job creation flow so failures block commit.
♻️ Duplicate comments (1)
docs/diagnostics/SENTRY_AUDIT.md (1)

3-8: ⚠️ Potential issue | 🟡 Minor

Regenerate this audit from the current branch state.

The note on Lines 3-8 says the symbols and files are accurate, but the tables still are not: cmd/app/main.go no longer uses the bare CaptureException call-sites listed here, and its stuck-job/task reporting now goes through startupLog with static messages plus logging.BeforeSend. The audit also still points at removed internal/db/queue.go symbols like executeOnceWithContext, ExecuteMaintenance, and CleanupStuckJobs. As written, this document will send the next clean-up pass back to stale or dead code.

Also applies to: 29-63, 82-86

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/diagnostics/SENTRY_AUDIT.md` around lines 3 - 8, The audit is stale:
regenerate the SENTRY_AUDIT.md from current branch state and update all
symbol/file references—remove obsolete internal/db/queue.go entries
(executeOnceWithContext, ExecuteMaintenance, CleanupStuckJobs), stop listing
bare CaptureException call-sites for cmd/app/main.go (it now uses startupLog and
logging.BeforeSend), and replace removed worker references with the current
worker lifecycle locations (cmd/worker/main.go and
internal/jobs/stream_worker.go) including the slog→Sentry fanout behavior; run
the repo-wide symbol scan and refresh the tables so the document points only to
existing functions and the correct logging hooks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/app/main.go`:
- Around line 474-477: Change the metrics goroutine to bind the socket before
logging readiness: replace the direct call to metricsSrv.ListenAndServe() with
net.Listen("tcp", config.MetricsAddr) to obtain a listener, log readiness via
startupLog.Info("Metrics server listening", "addr", config.MetricsAddr) only
after the bind succeeds, and then call metricsSrv.Serve(listener); keep the
existing error check and startupLog.Error("Metrics server failed", "error", err)
but treat listener bind errors separately so you don't emit a false ready
message for metricsSrv, referencing metricsSrv, config.MetricsAddr,
startupLog.Info, startupLog.Error, ListenAndServe and Serve in your changes.

In `@internal/api/handlers.go`:
- Around line 949-950: The request-scoped logger is inheriting r.URL.Path which
on routes like /v1/webhooks/webflow/{token} contains the raw secret; before
calling loggerWithRequest(r) or before logging (e.g., logger.Warn in the Webflow
handler and subsequent calls shown), sanitize the path by redacting the token
segment (replace that path segment with "<redacted>" or remove it) and use that
sanitized path when attaching the "path" field or by cloning the request with a
sanitized URL.Path; update all occurrences (the logger.Warn/BadRequest calls
around the Webflow handler and the other ranges noted) to log the sanitized path
instead of the raw r.URL.Path.

In `@internal/db/users.go`:
- Around line 19-21: GetOrCreateUser currently treats any error from GetUser as
“user missing” and calls CreateUser; change it so only ErrUserNotFound triggers
creation. Specifically, in GetOrCreateUser check the error returned by GetUser
with errors.Is(err, ErrUserNotFound) and call CreateUser only in that case; for
any other non-nil error return/propagate the original error instead of creating
a new user. Ensure you reference ErrUserNotFound, GetUser, and CreateUser in the
updated logic.
- Around line 432-436: The organisation lookup failure is being
swallowed—replace the dbLog.Warn + "return existingUser, nil, nil" behavior so
that the function logs the failure as an error (use dbLog.Error and include the
error and *existingUser.OrganisationID) and returns the lookup error to callers
(return existingUser, nil, err or a wrapped error) instead of nil error; locate
the handling around GetOrganisation (the block that currently calls dbLog.Warn
and returns existingUser, nil, nil) and change it to log the error at error
level with context and propagate the error upward.

In `@scripts/aggregate_logs.py`:
- Line 263: The current write uses event[:80] which truncates by code points not
visual display width; replace this by truncating based on display width (e.g.,
use the wcwidth library) before escaping '|' so the table column aligns
visually. Add a small helper (referencing event and the f.write call) that walks
characters, accumulating wcwidth.wcwidth() until the target width (80) would be
exceeded, append a truncation marker if needed, then escape '|' and use that
result in the existing f.write call. If you prefer not to depend on wcwidth, add
a clear comment near event[:80] and keep the code-point truncation but document
it as not being a visual-width trim.
- Around line 69-77: The current incremental logic only tracks processed_files
as a fingerprint map and thus cannot remove prior contributions when a file
changes; update the code to persist per-file contributions and subtract them on
reprocess: extend processed_files entries to store both the fingerprint and the
file's contribution (e.g., a map of event keys to counts and per-minute
buckets), on load handle the legacy non-dict case by coercing to an empty map,
and in the reprocessing path (where you compare the new fingerprint against
processed_files[basename]) if the fingerprint differs first subtract the stored
per-file contribution from totals and by_minute before adding the newly parsed
data, then overwrite the processed_files entry with the new fingerprint and
contribution; ensure functions/variables mentioned—processed_files,
raw_processed, by_minute, totals—are updated accordingly so stale data is
removed when files are regenerated.

---

Outside diff comments:
In `@cmd/app/main.go`:
- Around line 554-597: The code currently allows job creation to succeed when
Redis dispatch is unavailable (redisCfg.URL empty or scheduler.ScheduleBatch
fails); change this to fail fast and/or propagate enqueue errors: if
redisCfg.URL == "" replace startupLog.Warn with a startupLog.Fatal (or otherwise
exit) so the process doesn't accept jobs without a dispatcher, and change
jobsManager.OnTasksEnqueued (currently assigned as func(ctx context.Context,
jobID string, entries []jobs.TaskScheduleEntry)) to return an error (e.g.,
func(... ) error) and update all callers so that scheduler.ScheduleBatch errors
are returned up the stack and cause job creation to fail (or trigger durable
retries) instead of only logging; ensure scheduler.ScheduleBatch remains used
but its errors are propagated from OnTasksEnqueued to the job creation flow so
failures block commit.

---

Duplicate comments:
In `@docs/diagnostics/SENTRY_AUDIT.md`:
- Around line 3-8: The audit is stale: regenerate the SENTRY_AUDIT.md from
current branch state and update all symbol/file references—remove obsolete
internal/db/queue.go entries (executeOnceWithContext, ExecuteMaintenance,
CleanupStuckJobs), stop listing bare CaptureException call-sites for
cmd/app/main.go (it now uses startupLog and logging.BeforeSend), and replace
removed worker references with the current worker lifecycle locations
(cmd/worker/main.go and internal/jobs/stream_worker.go) including the
slog→Sentry fanout behavior; run the repo-wide symbol scan and refresh the
tables so the document points only to existing functions and the correct logging
hooks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ce01afc-57a9-4a5e-9805-c555ac37cdde

📥 Commits

Reviewing files that changed from the base of the PR and between bee8078 and b95eee9.

📒 Files selected for processing (6)
  • cmd/app/main.go
  • docs/diagnostics/SENTRY_AUDIT.md
  • internal/api/handlers.go
  • internal/db/reset_migrations.go
  • internal/db/users.go
  • scripts/aggregate_logs.py

Comment thread cmd/app/main.go Outdated
Comment thread internal/api/handlers.go Outdated
Comment thread internal/db/users.go
Comment thread internal/db/users.go Outdated
Comment thread scripts/aggregate_logs.py
Comment on lines +69 to +77
# processed_files is a {basename: "mtime:size"} fingerprint map.
# Older runs stored a plain list; coerce that into an empty map so
# rewritten files are reprocessed rather than silently skipped.
raw_processed = saved.get("processed_files", {})
if isinstance(raw_processed, dict):
processed_files = dict(raw_processed)
else:
processed_files = {}
return by_minute, totals, processed_files
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Changed files cause double-counting in incremental mode.

The fingerprint mechanism correctly detects when a file has been modified (comparing mtime:size), but when reprocessing, the old contribution from that file is not subtracted before adding the new one. This leads to inflated aggregates.

Example: If foo.json is processed with {event_A: 10}, then regenerated with {event_A: 5}, the result is {event_A: 15} instead of {event_A: 5}.

To fix properly, either:

  1. Store per-file contributions so old data can be removed before adding new, or
  2. Document that --full is required when input files may be regenerated

This addresses the detection part of the past review comment but not the stale data aspect.

Also applies to: 295-299, 312-313

🧰 Tools
🪛 Ruff (0.15.10)

[warning] 77-77: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` around lines 69 - 77, The current incremental
logic only tracks processed_files as a fingerprint map and thus cannot remove
prior contributions when a file changes; update the code to persist per-file
contributions and subtract them on reprocess: extend processed_files entries to
store both the fingerprint and the file's contribution (e.g., a map of event
keys to counts and per-minute buckets), on load handle the legacy non-dict case
by coercing to an empty map, and in the reprocessing path (where you compare the
new fingerprint against processed_files[basename]) if the fingerprint differs
first subtract the stored per-file contribution from totals and by_minute before
adding the newly parsed data, then overwrite the processed_files entry with the
new fingerprint and contribution; ensure functions/variables
mentioned—processed_files, raw_processed, by_minute, totals—are updated
accordingly so stale data is removed when files are regenerated.

Comment thread scripts/aggregate_logs.py Outdated
f.write("| Count | Event |\n")
f.write("|-------|-------|\n")
for event, count in sorted(event_totals.items(), key=lambda x: -x[1])[:30]:
f.write(f"| {count:,} | {event[:80].replace('|', '\\|')} |\n")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

String slicing may split multi-byte UTF-8 characters.

event[:80] slices by code points in Python 3, so this is safe for Unicode. However, if you intend to limit display width (e.g., for terminal alignment), code points don't equal visual width for CJK or emoji characters.

If display width matters, consider using a library like wcwidth or simply note that this is a code-point limit, not a visual-width limit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` at line 263, The current write uses event[:80]
which truncates by code points not visual display width; replace this by
truncating based on display width (e.g., use the wcwidth library) before
escaping '|' so the table column aligns visually. Add a small helper
(referencing event and the f.write call) that walks characters, accumulating
wcwidth.wcwidth() until the target width (80) would be exceeded, append a
truncation marker if needed, then escape '|' and use that result in the existing
f.write call. If you prefer not to depend on wcwidth, add a clear comment near
event[:80] and keep the code-point truncation but document it as not being a
visual-width trim.

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/db/users.go (1)

469-477: ⚠️ Potential issue | 🟠 Major

Do not treat an organisation lookup failure as “organisation missing”.

Any GetOrganisationByName error other than nil currently falls through to the create path. If that lookup fails transiently, this code will try to create a fresh organisation instead of surfacing the read failure, which can duplicate data. Mirror the new ErrUserNotFound pattern here and only create on an explicit not-found result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/db/users.go` around lines 469 - 477, The current logic assumes any
non-nil error from db.GetOrganisationByName means "not found" and proceeds to
create an organisation; change this to mirror the ErrUserNotFound pattern: after
calling db.GetOrganisationByName (used when isBusinessEmail(email) is true),
handle three cases—if err == nil use existing org; else if errors.Is(err,
ErrOrganisationNotFound) (or the project equivalent) proceed to create a new
organisation; otherwise return or propagate the read error (do not fall through
to create). Update the branch around db.GetOrganisationByName to explicitly
check for the not-found sentinel and to surface other errors instead of creating
a duplicate organisation.
cmd/app/main.go (1)

577-600: ⚠️ Potential issue | 🟠 Major

Don't silently drop Redis dispatch failures when tasks are already persisted to Postgres.

The OnTasksEnqueued callback only logs ScheduleBatch errors, allowing job creation to succeed with tasks written to Postgres but never published to Redis. Because the worker service retrieves tasks exclusively from Redis (via DueItems), these stranded tasks will never execute. No replay or outbox mechanism exists to recover them.

Either make the callback retryable or propagate the failure back into the job creation path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/main.go` around lines 577 - 600, The OnTasksEnqueued handler
currently swallows scheduler.ScheduleBatch failures (jobsManager.OnTasksEnqueued
-> scheduler.ScheduleBatch) which leaves tasks persisted in Postgres but never
published to Redis; change the handler to make ScheduleBatch retry with backoff
(context-aware exponential backoff + jitter using ctx.Done()) and, if retries
exhaust, propagate the failure back to the job creation flow instead of only
logging: after final failure call the job-facing failure path (e.g., return or
invoke a job-failure/rollback method on jobsManager or signal the caller so job
creation returns an error) and include startupLog.Error("failed to schedule
tasks into Redis", "error", err, "job_id", jobID, "count", len(schedEntries))
with the actual error. Ensure you use jobsManager.OnTasksEnqueued,
scheduler.ScheduleBatch and startupLog.Error as the points to change.
♻️ Duplicate comments (2)
internal/api/handlers.go (1)

937-949: ⚠️ Potential issue | 🟠 Major

Sanitise the request passed to shared response helpers as well.

This only redacts the local logger. The handler still passes the original r into BadRequest, NotFound, InternalError, and WriteSuccess. If any of those helpers derive their own request logger from r.URL.Path, the webhook token or workspace ID will still reach logs or Sentry. Please either clone r with a redacted URL.Path before any helper call, or plumb the redacted path through those helpers too.

Run this to confirm whether the shared API helpers rebuild log context from the raw request path:

#!/bin/bash
rg -n -C4 'func (BadRequest|NotFound|InternalError|WriteSuccess)\(' internal/api
rg -n -C4 'loggerWithRequest\(r\)|r\.URL\.Path' internal/api
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handlers.go` around lines 937 - 949, The request's URL.Path is
still passed unredacted into shared helpers (BadRequest, NotFound,
InternalError, WriteSuccess) so sensitive tokens can leak; create a redacted
copy of the request (e.g., clone r via r.Clone(r.Context()) or shallow-copy and
replace URL.Path) and set its URL.Path to "/v1/webhooks/webflow/[redacted]" or
"/v1/webhooks/webflow/workspaces/[redacted]" to match the loggerWithRequestPath
logic, then pass that redacted request into the helper calls (or alternatively
modify those helpers to accept and use the already-created request-scoped logger
from loggerWithRequest/loggerWithRequestPath) so no code derives log context
from the original r.URL.Path.
scripts/aggregate_logs.py (1)

78-85: ⚠️ Potential issue | 🟠 Major

Changed inputs still inflate the aggregate.

processed_files only stores the latest fingerprint, so when a summary is regenerated you merge the new counts without removing the old contribution first. A file changing from {event_A: 10} to {event_A: 5} leaves 15 in by_minute/warn_error_counts, which then skews every derived CSV and the summary. Persist per-file contributions and subtract them before re-merging, or fall back to a full rebuild whenever a fingerprint changes.

Also applies to: 105-120, 304-323

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` around lines 78 - 85, The aggregate inflation bug
occurs because processed_files only stores the latest fingerprint and you
re-merge counts without removing a previous contribution; update the logic
around raw_processed/processed_files to either (a) persist per-file contribution
vectors (e.g., store per-basename contribution objects) and on detecting a
fingerprint change subtract the stored per-file contributions from by_minute and
warn_error_counts before adding the regenerated counts, or (b) simpler: detect
any fingerprint mismatch for a basename and trigger a full rebuild of the
aggregated structures instead of incremental merge; change the code that reads
saved.get("processed_files", {}) and the merge paths that update by_minute and
warn_error_counts to implement one of these two approaches and ensure
processed_files stores whatever per-file contribution you need to subtract
later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/db/users.go`:
- Around line 103-108: GetOrCreateUser now returns ErrUserNotFound only for
genuine "not found" and surfaces other DB failures; update the notification
handlers that call GetOrCreateUser so they distinguish errors: when
errors.Is(err, ErrUserNotFound) return the existing Unauthorised("User not
found") response, but for any other error return a 5xx/internal error (or
propagate the error) instead of collapsing everything into Unauthorised. Locate
the calls to GetOrCreateUser in the notification handlers and change the error
handling to check errors.Is(err, ErrUserNotFound) first, then handle
non-ErrUserNotFound errors as server/database failures.

In `@scripts/aggregate_logs.py`:
- Around line 296-313: The current early return when incremental is True and
new_files is empty prevents rewriting derived outputs (events_per_minute.csv,
components_per_minute.csv, summary.md) even though load_data() has already
restored the aggregates; remove the early return and instead fall through to the
existing output-writing logic so that by_minute, totals, and processed_map (from
load_data) are used to regenerate the CSVs and summary; in practice modify the
block around new_files/if not new_files to skip processing new files but still
call the same output writers (the code that writes events_per_minute.csv,
components_per_minute.csv, and summary.md) and then return True, keeping use of
load_data, new_files, processed_map, and _file_fingerprint.
- Around line 271-282: Precompute the escaped event string before using it in
f-strings to avoid backslash-in-expression syntax issues: in the loop that
iterates over event_totals (symbol: event_totals) and in the warn_and_error loop
(symbols: warn_and_error, totals["warn_error_counts"]) compute something like
escaped_event = event[:80].replace('|', '\\|') and then use f.write(f"|
{count:,} | {escaped_event} |\n") instead of embedding .replace(...) directly
inside the f-string.

---

Outside diff comments:
In `@cmd/app/main.go`:
- Around line 577-600: The OnTasksEnqueued handler currently swallows
scheduler.ScheduleBatch failures (jobsManager.OnTasksEnqueued ->
scheduler.ScheduleBatch) which leaves tasks persisted in Postgres but never
published to Redis; change the handler to make ScheduleBatch retry with backoff
(context-aware exponential backoff + jitter using ctx.Done()) and, if retries
exhaust, propagate the failure back to the job creation flow instead of only
logging: after final failure call the job-facing failure path (e.g., return or
invoke a job-failure/rollback method on jobsManager or signal the caller so job
creation returns an error) and include startupLog.Error("failed to schedule
tasks into Redis", "error", err, "job_id", jobID, "count", len(schedEntries))
with the actual error. Ensure you use jobsManager.OnTasksEnqueued,
scheduler.ScheduleBatch and startupLog.Error as the points to change.

In `@internal/db/users.go`:
- Around line 469-477: The current logic assumes any non-nil error from
db.GetOrganisationByName means "not found" and proceeds to create an
organisation; change this to mirror the ErrUserNotFound pattern: after calling
db.GetOrganisationByName (used when isBusinessEmail(email) is true), handle
three cases—if err == nil use existing org; else if errors.Is(err,
ErrOrganisationNotFound) (or the project equivalent) proceed to create a new
organisation; otherwise return or propagate the read error (do not fall through
to create). Update the branch around db.GetOrganisationByName to explicitly
check for the not-found sentinel and to surface other errors instead of creating
a duplicate organisation.

---

Duplicate comments:
In `@internal/api/handlers.go`:
- Around line 937-949: The request's URL.Path is still passed unredacted into
shared helpers (BadRequest, NotFound, InternalError, WriteSuccess) so sensitive
tokens can leak; create a redacted copy of the request (e.g., clone r via
r.Clone(r.Context()) or shallow-copy and replace URL.Path) and set its URL.Path
to "/v1/webhooks/webflow/[redacted]" or
"/v1/webhooks/webflow/workspaces/[redacted]" to match the loggerWithRequestPath
logic, then pass that redacted request into the helper calls (or alternatively
modify those helpers to accept and use the already-created request-scoped logger
from loggerWithRequest/loggerWithRequestPath) so no code derives log context
from the original r.URL.Path.

In `@scripts/aggregate_logs.py`:
- Around line 78-85: The aggregate inflation bug occurs because processed_files
only stores the latest fingerprint and you re-merge counts without removing a
previous contribution; update the logic around raw_processed/processed_files to
either (a) persist per-file contribution vectors (e.g., store per-basename
contribution objects) and on detecting a fingerprint change subtract the stored
per-file contributions from by_minute and warn_error_counts before adding the
regenerated counts, or (b) simpler: detect any fingerprint mismatch for a
basename and trigger a full rebuild of the aggregated structures instead of
incremental merge; change the code that reads saved.get("processed_files", {})
and the merge paths that update by_minute and warn_error_counts to implement one
of these two approaches and ensure processed_files stores whatever per-file
contribution you need to subtract later.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 38d8ee16-8df4-47dc-ad9d-a199237f3905

📥 Commits

Reviewing files that changed from the base of the PR and between b95eee9 and be56f50.

📒 Files selected for processing (5)
  • cmd/app/main.go
  • internal/api/handlers.go
  • internal/api/logging.go
  • internal/db/users.go
  • scripts/aggregate_logs.py

Comment thread internal/db/users.go
Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/aggregate_logs.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/api/notifications.go`:
- Around line 90-100: Create a single helper (e.g., resolveCurrentUser or
loadCurrentUser) that encapsulates the repeated GetOrCreateUser call and the
ErrUserNotFound/5xx branching so all three handlers use the same behaviour and
logging: call h.DB.GetOrCreateUser, if errors.Is(err, db.ErrUserNotFound) then
logger.Warn("User not found", "error", err) and call Unauthorised(w, r, "User
not found"); on other errors log via logger.Error("Failed to load user for
notifications", "error", err) and call InternalError(w, r, fmt.Errorf("failed to
load user: %w", err)); return the resolved user on success so the existing
handlers can replace their duplicated blocks with a single call to this helper.

In `@scripts/aggregate_logs.py`:
- Around line 179-181: Narrow the broad except in the log-processing function by
replacing "except Exception as e:" with a specific tuple of exceptions (e.g.,
OSError, json.JSONDecodeError, KeyError, TypeError) to avoid masking bugs;
follow the same pattern used in load_data() so the error handling in the
function that prints "Error processing {json_file}: {e}" only catches expected
IO/parsing/key errors and lets unexpected exceptions surface.
- Around line 16-23: Clarify the docstring in scripts/aggregate_logs.py to
explicitly state the double-counting limitation: note that incremental mode
fingerprints inputs by mtime+size and, if a file's fingerprint changes and it is
re-ingested, the old contribution is not subtracted from aggregates (so
regenerated files can inflate totals); advise callers to run with --full to
force a clean reprocess when upstream producers may rewrite files in place, and
add a short example or sentence mentioning this specific caveat next to the
existing fingerprinting explanation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4376cf98-6845-4a0c-a7bb-9fbf2bca036a

📥 Commits

Reviewing files that changed from the base of the PR and between be56f50 and 137c41b.

📒 Files selected for processing (2)
  • internal/api/notifications.go
  • scripts/aggregate_logs.py

Comment thread internal/api/notifications.go Outdated
Comment thread scripts/aggregate_logs.py Outdated
Comment thread scripts/aggregate_logs.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/notifications.go (1)

183-192: 🧹 Nitpick | 🔵 Trivial

Consider using a sentinel error instead of string matching.

The strings.Contains(err.Error(), "not found") check on line 185 is fragile—it relies on the error message text which could change. This pattern exists elsewhere in the codebase too.

A more robust approach would be to define a sentinel error (e.g., db.ErrNotificationNotFound) in the DB layer and check with errors.Is(), consistent with how ErrUserNotFound is handled.

This is pre-existing code, so not blocking this PR, but worth addressing in a follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/notifications.go` around lines 183 - 192, Replace the fragile
string match on err.Error() in the MarkNotificationRead call with a sentinel
error check: add a package-level sentinel (e.g., db.ErrNotificationNotFound) in
the DB layer and have MarkNotificationRead return that sentinel when the row is
absent; then change the handler to use errors.Is(err,
db.ErrNotificationNotFound) (similar to existing ErrUserNotFound usage) and call
NotFound(w, r, ...) on that condition; keep all other error logging/handling the
same.
♻️ Duplicate comments (1)
scripts/aggregate_logs.py (1)

319-337: ⚠️ Potential issue | 🟠 Major

Incremental reprocessing still double-counts changed files.

When a fingerprint changes, the old contribution for that basename is still present in by_minute and totals; process_json_file() only adds the new snapshot on top. A regenerated summary therefore inflates counts instead of replacing them. Persist per-file contributions and subtract them before re-merge, or force --full whenever an already-seen file changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/aggregate_logs.py` around lines 319 - 337, The incremental path
double-counts because when a file's fingerprint changes you currently only add
the new snapshot via process_json_file() without removing the old contribution;
update the logic so processed_map stores per-file contribution summaries (e.g.,
per-file minute buckets and totals) keyed by filename instead of only a
fingerprint, and when you detect processed_map.get(f.name) !=
_file_fingerprint(f) subtract the stored per-file contribution from by_minute
and totals before calling process_json_file() (or replace the stored
contribution entirely after processing); use the existing symbols processed_map,
_file_fingerprint, new_files, process_json_file, by_minute and totals to locate
where to persist, subtract, and update per-file aggregates so regenerated
summaries replace rather than accumulate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@internal/api/notifications.go`:
- Around line 183-192: Replace the fragile string match on err.Error() in the
MarkNotificationRead call with a sentinel error check: add a package-level
sentinel (e.g., db.ErrNotificationNotFound) in the DB layer and have
MarkNotificationRead return that sentinel when the row is absent; then change
the handler to use errors.Is(err, db.ErrNotificationNotFound) (similar to
existing ErrUserNotFound usage) and call NotFound(w, r, ...) on that condition;
keep all other error logging/handling the same.

---

Duplicate comments:
In `@scripts/aggregate_logs.py`:
- Around line 319-337: The incremental path double-counts because when a file's
fingerprint changes you currently only add the new snapshot via
process_json_file() without removing the old contribution; update the logic so
processed_map stores per-file contribution summaries (e.g., per-file minute
buckets and totals) keyed by filename instead of only a fingerprint, and when
you detect processed_map.get(f.name) != _file_fingerprint(f) subtract the stored
per-file contribution from by_minute and totals before calling
process_json_file() (or replace the stored contribution entirely after
processing); use the existing symbols processed_map, _file_fingerprint,
new_files, process_json_file, by_minute and totals to locate where to persist,
subtract, and update per-file aggregates so regenerated summaries replace rather
than accumulate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 941b173c-3a56-4a36-b9fe-d302c1b27c35

📥 Commits

Reviewing files that changed from the base of the PR and between be56f50 and 7ccce3b.

📒 Files selected for processing (2)
  • internal/api/notifications.go
  • scripts/aggregate_logs.py

@github-actions
Copy link
Copy Markdown
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-335.fly.dev
Dashboard: https://hover-pr-335.fly.dev/dashboard

@simonsmallchua simonsmallchua merged commit ca34309 into feature/redis-broker-v2 Apr 19, 2026
10 checks passed
@simonsmallchua simonsmallchua deleted the work/funny-bohr-6c8c3e branch April 19, 2026 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant