Migrate to slog + Sentry fingerprinting#333
Conversation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
Updates to Preview Branch (work/cranky-albattani) ↗︎
Tasks are run on every commit but only new migration files are pushed.
View logs for this Workflow Run ↗︎. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new slog-backed centralized logging package ( Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Logging as internal/logging
participant Sentry as Sentry SDK
participant Stdout as Stdout/JSON
App->>Logging: Setup(level, env)
Logging->>Stdout: install slog handler (text/json)
Logging->>Sentry: install Sentry slog handler (BeforeSend)
App->>Logging: l := Component("component")
App->>Logging: l.ErrorContext(ctx, "failed operation", "error", err)
alt ctx marked NoCapture
Logging->>Sentry: BeforeSend drops event (no_capture)
Logging->>Stdout: emit structured slog record (with component)
else
Logging->>Sentry: send normalised event (message/fingerprint/tags)
Logging->>Stdout: emit structured slog record (with component)
end
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Release VersionsApp patch: ChangelogAdded
Changed
|
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
internal/api/response.go (1)
19-25: 🧹 Nitpick | 🔵 TrivialAvoid duplicating
request_idin structured fields.Line 25 already logs through
loggerWithRequest(r), which includesrequest_id. Passingrequest_idagain can produce duplicate keys in the same event.Proposed cleanup
func WriteJSON(w http.ResponseWriter, r *http.Request, data any, status int) { - requestID := GetRequestID(r) - w.Header().Set("Content-Type", "application/json") w.WriteHeader(status) if err := json.NewEncoder(w).Encode(data); err != nil { - loggerWithRequest(r).Error("Failed to encode JSON response", "error", err, "request_id", requestID) + loggerWithRequest(r).Error("Failed to encode JSON response", "error", err) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/response.go` around lines 19 - 25, The log call duplicates the request_id because loggerWithRequest(r) already attaches it; update the error logging in the response flow (surrounding GetRequestID(r), json.NewEncoder(w).Encode(data), and loggerWithRequest(r)) to remove the explicit "request_id" field from the Error invocation so only loggerWithRequest(r) supplies the request_id—keep the message and "error" field intact.internal/crawler/tls_aia.go (1)
3-17:⚠️ Potential issue | 🟠 MajorFormatting check is currently blocking CI for this file.
The pipeline reports a
gofmtfailure. Please rungofmtandgoimportsover this touched file (and other touched Go files) before merge.As per coding guidelines,
**/*.go: Rungofmtandgoimportsformatting on all touched Go files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/crawler/tls_aia.go` around lines 3 - 17, The import block in tls_aia.go is improperly formatted (extra blank line and spacing) causing gofmt/goimports to fail CI; run gofmt and goimports over this file (and all other touched .go files) to normalize imports and formatting, specifically fix the import block around the import list (the import parentheses/import block) so there are no stray blank lines and unused imports are organized, then re-run gofmt/goimports and commit the formatted changes.internal/crawler/robots.go (1)
3-14:⚠️ Potential issue | 🟠 MajorPlease run
gofmtandgoimportson this file.The current PR is failing formatting checks, and this import section reflects that.
As per coding guidelines,
**/*.go: Rungofmtandgoimportsformatting on all touched Go files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/crawler/robots.go` around lines 3 - 14, Run gofmt and goimports on the file to fix the improperly formatted import block and any other formatting issues; specifically reformat the import list and remove the stray blank line and ensure imports are grouped/ordered correctly (the import block in internal/crawler/robots.go), then save the file so the CI formatting checks pass.internal/db/webflow_connections.go (1)
3-10:⚠️ Potential issue | 🟠 MajorThis file still needs
gofmt/goimports.Formatting is currently failing in CI; please normalise this file before merge.
As per coding guidelines,
**/*.go: Rungofmtandgoimportsformatting on all touched Go files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/db/webflow_connections.go` around lines 3 - 10, The import block in webflow_connections.go is not formatted/organized (extra blank line and unsorted imports) causing CI to fail; run gofmt and goimports on the file (or the repo) to normalize formatting and fix imports, ensuring the import list (context, database/sql, errors, fmt, time) is properly formatted/organized and any unused imports removed so the file compiles cleanly; then save the file and re-run build/CI.internal/db/schedulers.go (1)
3-11:⚠️ Potential issue | 🟠 MajorFormatting check is blocking merge — run
gofmtandgoimportson this file.The CI output indicates formatting is failing, and this import block is not fully normalised.
As per coding guidelines,
**/*.go: Rungofmtandgoimportsformatting on all touched Go files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/db/schedulers.go` around lines 3 - 11, The import block at the top of schedulers.go is not formatted correctly (extra blank line and unordered imports); run gofmt and goimports on this file and fix the import section so imports are grouped and ordered per goimports (remove the stray blank line and unused import placeholders), then re-run gofmt to normalise spacing throughout the file; target the file's import block and save to make CI pass.scripts/aggregate_logs.py (1)
339-357:⚠️ Potential issue | 🟠 MajorBug: Component counts not extracted during incremental aggregation.
The
aggregate_logs()function has inline processing logic (lines 339-357) that extractslevel_countsandmessage_countsfrom new JSON files, but it does not extractcomponent_counts. Theprocess_json_file()function (lines 121-125) correctly handlescomponent_counts, but it's not called here.This means
components_per_minute.csvwill be empty or incomplete when processing new files incrementally.🐛 Proposed fix to extract component_counts
# Process message counts for timestamp, messages in data.get('message_counts', {}).items(): minute_key = timestamp[:16] for msg_data in messages: message = msg_data.get('message', 'unknown') count = msg_data.get('count', 0) by_minute[minute_key]['message_counts'][message] += count all_messages[message] += count + # Process component counts (new field from slog migration) + for timestamp, components in data.get('component_counts', {}).items(): + minute_key = timestamp[:16] + for component, count in components.items(): + by_minute[minute_key]['component_counts'][component] += count + processed_set.add(json_file.name) success_count += 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/aggregate_logs.py` around lines 339 - 357, aggregate_logs() currently processes level_counts and message_counts inline but omits component_counts, causing incomplete components_per_minute.csv; update the loop in aggregate_logs (near the level_counts/message_counts handling) to also iterate data.get('component_counts', {}) or extract component_counts from each message entry like process_json_file() does, compute minute_key = timestamp[:16], ensure by_minute[minute_key] is initialized, then increment by_minute[minute_key]['component_counts'][component] and a top-level all_components[component] accordingly (use the same keys/structures used for level_counts/message_counts to mirror process_json_file()).internal/jobs/worker.go (1)
1534-1543:⚠️ Potential issue | 🟠 MajorUse a static error message for Sentry grouping.
Line 1539 logs a dynamic message (
messageincludes streak and possibly the last error text). That defeats the new[component, staticMessage]grouping and will fragment Sentry issues again.Suggested fix
- message := fmt.Sprintf("Job failed after %d consecutive task failures", streak) + dbMessage := fmt.Sprintf("Job failed after %d consecutive task failures", streak) if lastErr != nil { - message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error()) + dbMessage = fmt.Sprintf("%s (last error: %s)", dbMessage, lastErr.Error()) } - workerLog.Error(message, + workerLog.Error("Job failed after consecutive task failures", "job_id", jobID, "failure_streak", streak, "threshold", wp.jobFailureThreshold, + "last_error", lastErr, ) @@ - `, JobStatusFailed, now, message, jobID, JobStatusFailed, JobStatusCancelled) + `, JobStatusFailed, now, dbMessage, jobID, JobStatusFailed, JobStatusCancelled)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 1534 - 1543, The current error call builds a dynamic message (variable message) including streak and lastErr which prevents Sentry grouping; update the workerLog.Error invocation in the job failure path (where message, lastErr, jobID, streak and wp.jobFailureThreshold are used) to pass a fixed/static error message string (e.g. "job failed after consecutive task failures") as the first argument and move all dynamic details (streak, lastErr.Error() or lastErr as a field, job_id, threshold) into structured key/value pairs so the static message is used for grouping while still recording the dynamic context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/test_jobs/main.go`:
- Line 73: The file fails gofmt/goimports checks; run automatic formatting on
the touched Go file and update the commit: run gofmt -w (or gofmt) and goimports
-w on cmd/test_jobs/main.go (or run gofmt/goimports over the package) to fix
formatting and import ordering so the call site startupLog.Info("Worker pool
started with "+strconv.Itoa(jobWorkers)+" workers") and surrounding code are
properly formatted; re-run `go vet`/CI to ensure the formatting errors are
resolved before pushing.
In `@docs/diagnostics/LOGGING_AUDIT.md`:
- Around line 1-4: Add an explicit “historical baseline / pre-migration
snapshot” note under the header "# Zerolog Call Site Audit" (near the
"Generated: 2026-04-11" line) so readers know these numbers reflect pre-slog
migration state; insert a single-sentence clarification such as "NOTE: This
document is a pre-migration snapshot and represents the state prior to the slog
migration (historical baseline)." to avoid confusion.
In `@internal/api/admin.go`:
- Around line 51-54: Extract the nil-safe organisation-ID logic into a small
helper (e.g., getOrgID or formatOrgID) that accepts the user (or a pointer to
user.OrganisationID) and returns the string to assign to orgIDStr; replace the
duplicated blocks that set orgIDStr by calling this helper wherever you
currently check user.OrganisationID and set orgIDStr to "none" or the
dereferenced value, and update all occurrences (the blocks using orgIDStr around
the two duplicated spots) to use the new helper to keep the logic DRY.
In `@internal/api/auth_webflow.go`:
- Around line 62-63: Remove the raw user identifier from the error-level log
call in the auth webflow: update the logger.Error invocation (the call using
logger.Error and userClaims.UserID) so it no longer attaches userClaims.UserID
directly; instead rely on the request-scoped correlation fields already present
on the logger or include a redacted/hashed version of userClaims.UserID (or
another non-PII token). Make the same change for the other occurrences
referenced (the logger.Error calls paired with InternalError) so error-level
logs don't contain raw PII while keeping the InternalError(w, r, err) behavior
unchanged.
In `@internal/api/auth.go`:
- Around line 130-131: Remove raw user identifiers from error-level logs in the
auth handlers by deleting the "user_id" field from the logger.Error calls (e.g.,
the call to logger.Error("Failed to create user with organisation", "error",
err, "user_id", req.UserID)) and any similar logger.Error usages around the
InternalError invocation and the other occurrence noted (~lines 213-214); if you
must record user context, replace raw req.UserID with a non-identifying token
(e.g., masked ID or request-scoped trace ID) or omit it entirely, and keep the
InternalError(w, r, err) call unchanged.
In `@internal/api/errors.go`:
- Around line 80-85: The loggerWithRequest(r) call already injects request_id,
so remove the redundant "request_id", requestID pair from the logger.Error and
logger.Debug calls in the error response path; locate the calls that follow
loggerWithRequest(r) (the logger.Error("API error response", ...) and
logger.Debug("API client error response", ...)) and leave the other structured
fields ("status", "code", "message") intact.
- Around line 54-59: The log calls in the error response path duplicate the
request_id because loggerWithRequest(r) already injects it; remove the explicit
"request_id", requestID argument from both logger.Error and logger.Debug
invocations in the function that calls loggerWithRequest(r) so the structured
logger only includes request_id once and retains the other fields ("error",
"status", "code").
In `@internal/auth/middleware.go`:
- Line 92: Remove the sensitive token material from the auth failure log call:
locate the authLog.Warn invocation that currently passes "token_prefix",
tokenString[:min(10, len(tokenString))] (in middleware.go) and remove that
key/value so no part of tokenString is logged; if you need non-sensitive info,
replace it with a safe alternative such as the token length or a fixed marker
(e.g., "token_present": true or "token_length": len(tokenString)) and ensure no
other logging in the same auth middleware references tokenString or slices of
it.
In `@internal/db/dashboard.go`:
- Around line 171-172: Update the error logs that call dbLog.Error("Failed to
scan activity row", "error", err) (and the other similar scan/iteration error
logs at the listed locations) to include the current tenant/organisation context
as a key-value pair—for example add "organisation_id", organisationID (or the
appropriate local variable name such as orgID/organisationID available in the
same scope) to the log call so it becomes dbLog.Error("Failed to scan activity
row", "error", err, "organisation_id", organisationID); make the same change for
the other occurrences (lines ~180, ~304, ~427, ~712, ~795, ~811) ensuring you
reference the correct in-scope identifier for the organisation in each
function/loop.
In `@internal/db/db.go`:
- Line 466: The call to dbLog.Info that logs the raw DATABASE_URL should be
changed to avoid leaking credentials; implement a helper (e.g., redactURL or
similar) that parses the connection string (using net/url), zeroes or replaces
parsed.User (username/password) and returns the sanitized URL, then replace the
direct "connection_url, url" usage in the dbLog.Info call with the redacted
value; ensure the helper returns a safe placeholder on parse error (e.g.,
"<invalid URL>") and update the log invocation in the same scope where
dbLog.Info is called.
In `@internal/db/webflow_connections.go`:
- Around line 202-203: The delete-failure log call in the delete path for
webflow connections currently logs "connection_id" and the error via
dbLog.Error; update that call to also include the existing organisationID (use
the key "organisation_id") so the log entry contains tenant context for
correlation (refer to dbLog.Error, connectionID, and organisationID in the
delete handler for webflow connections).
In `@internal/db/webflow_site_settings.go`:
- Around line 10-14: There are two loggers declared (webflowSettingsLog and
dbLog) with the same "db" component; remove the duplicate by deleting the
webflowSettingsLog variable and update all uses of webflowSettingsLog to use the
existing dbLog instead (ensure the single dbLog variable remains the shared
logger), and remove any other duplicate webflowSettingsLog declarations
elsewhere in the file so only dbLog is used throughout.
In `@internal/jobs/domain_limiter.go`:
- Around line 120-121: This file is not formatted with gofmt/goimports; run
gofmt and goimports on the modified Go files (including
internal/jobs/domain_limiter.go) to fix formatting and import ordering so CI
passes. Specifically, format the block around the DomainLimiter construction
(referencing jobsLog.Info and cfg.RobotsDelayMultiplier) and the nearby code
(lines ~514-535) so it adheres to gofmt standards and ensure imports are
organized via goimports before committing.
In `@internal/jobs/worker.go`:
- Around line 4562-4564: The error logs calling workerLog.Error that currently
include full objects (e.g., the call logging "request_diagnostics",
result.RequestDiagnostics and other occurrences that log "headers",
"second_headers", "cache_attempts") must stop emitting raw payloads; update
those Error calls to either omit the full objects or replace them with
redacted/summarized values (e.g., boolean flags, counts, masked header keys, or
a truncated hash/ID) and/or lower the log level to Debug where full content is
safe; locate the logging sites by searching for workerLog.Error invocations that
include keys "request_diagnostics", "headers", "second_headers", or
"cache_attempts" (including the shown call and the ranges around lines
~4923-4939 and ~4951-4953) and change the arguments to only include
non-sensitive summaries.
- Around line 3793-3797: The log call using workerLog.Info records
cleanupDuration (a time.Duration) under the key "duration_ms", causing unit
mismatch; update the third argument value to a numeric millisecond value by
replacing cleanupDuration with cleanupDuration.Milliseconds() (or cast the
result if necessary) in the workerLog.Info call that logs "Orphaned task cleanup
completed for job" alongside targetJobID and totalCleaned so downstream parsers
see correct milliseconds.
In `@internal/logging/logging_test.go`:
- Around line 1-14: The file package logging (internal/logging/logging_test.go)
is not gofmt-formatted; run gofmt -w on internal/logging/logging_test.go and
then run goimports (or gofmt -w followed by goimports -w) to fix import
formatting and ordering so the test file compiles and CI passes; ensure the file
remains unchanged semantically (imports like bytes, context, encoding/json,
errors, log/slog, strings, testing, time, and github.com/getsentry/sentry-go
stay present) and commit the formatted file.
In `@internal/logging/logging.go`:
- Around line 75-96: The Logger currently stores a concrete *slog.Logger which
snapshots handlers at init time; change Logger to keep only the component string
and accumulated attributes (e.g., []any) and update Component(name string) and
(l *Logger) With(args ...any) to return new Logger values that only set
component and append attributes; remove the cached slog field and, in all
log-emitting methods (the Logger methods that call slog (e.g.,
Debug/Info/Error/Log or whatever emit functions exist)), rebuild the concrete
logger at emit time by calling slog.Default().With("component", l.component,
l.attrs...) and use that to perform the output so the logger picks up
slog.SetDefault() done in Setup.
- Around line 299-308: The default branch in processAttr currently sends all
slog.KindString attributes into event.Tags, which causes high-cardinality
dynamic identifiers to become Sentry tags; change processAttr so that only
attributes coming from the explicit "tags" group are mapped into event.Tags and
all other string attributes (including user_id, organisation_id, connection_id,
account_id, property_id, google_account_id, session_id) are put into event.Extra
instead—locate the switch on a.Value.Kind() in processAttr and alter the
slog.KindString case to route non-"tags" keys to event.Extra (keeping the
existing group handling for slog.KindGroup and default Any() behavior).
---
Outside diff comments:
In `@internal/api/response.go`:
- Around line 19-25: The log call duplicates the request_id because
loggerWithRequest(r) already attaches it; update the error logging in the
response flow (surrounding GetRequestID(r), json.NewEncoder(w).Encode(data), and
loggerWithRequest(r)) to remove the explicit "request_id" field from the Error
invocation so only loggerWithRequest(r) supplies the request_id—keep the message
and "error" field intact.
In `@internal/crawler/robots.go`:
- Around line 3-14: Run gofmt and goimports on the file to fix the improperly
formatted import block and any other formatting issues; specifically reformat
the import list and remove the stray blank line and ensure imports are
grouped/ordered correctly (the import block in internal/crawler/robots.go), then
save the file so the CI formatting checks pass.
In `@internal/crawler/tls_aia.go`:
- Around line 3-17: The import block in tls_aia.go is improperly formatted
(extra blank line and spacing) causing gofmt/goimports to fail CI; run gofmt and
goimports over this file (and all other touched .go files) to normalize imports
and formatting, specifically fix the import block around the import list (the
import parentheses/import block) so there are no stray blank lines and unused
imports are organized, then re-run gofmt/goimports and commit the formatted
changes.
In `@internal/db/schedulers.go`:
- Around line 3-11: The import block at the top of schedulers.go is not
formatted correctly (extra blank line and unordered imports); run gofmt and
goimports on this file and fix the import section so imports are grouped and
ordered per goimports (remove the stray blank line and unused import
placeholders), then re-run gofmt to normalise spacing throughout the file;
target the file's import block and save to make CI pass.
In `@internal/db/webflow_connections.go`:
- Around line 3-10: The import block in webflow_connections.go is not
formatted/organized (extra blank line and unsorted imports) causing CI to fail;
run gofmt and goimports on the file (or the repo) to normalize formatting and
fix imports, ensuring the import list (context, database/sql, errors, fmt, time)
is properly formatted/organized and any unused imports removed so the file
compiles cleanly; then save the file and re-run build/CI.
In `@internal/jobs/worker.go`:
- Around line 1534-1543: The current error call builds a dynamic message
(variable message) including streak and lastErr which prevents Sentry grouping;
update the workerLog.Error invocation in the job failure path (where message,
lastErr, jobID, streak and wp.jobFailureThreshold are used) to pass a
fixed/static error message string (e.g. "job failed after consecutive task
failures") as the first argument and move all dynamic details (streak,
lastErr.Error() or lastErr as a field, job_id, threshold) into structured
key/value pairs so the static message is used for grouping while still recording
the dynamic context.
In `@scripts/aggregate_logs.py`:
- Around line 339-357: aggregate_logs() currently processes level_counts and
message_counts inline but omits component_counts, causing incomplete
components_per_minute.csv; update the loop in aggregate_logs (near the
level_counts/message_counts handling) to also iterate
data.get('component_counts', {}) or extract component_counts from each message
entry like process_json_file() does, compute minute_key = timestamp[:16], ensure
by_minute[minute_key] is initialized, then increment
by_minute[minute_key]['component_counts'][component] and a top-level
all_components[component] accordingly (use the same keys/structures used for
level_counts/message_counts to mirror process_json_file()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 39edca18-45ef-4b2a-9a2e-541e7e44b6bc
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (58)
.claude/skills/structured-logging/SKILL.mdarchive_key_migratecmd/app/main.gocmd/archive_key_migrate/main.gocmd/test_jobs/main.godocs/diagnostics/LOGGING_AUDIT.mddocs/diagnostics/SENTRY_AUDIT.mdgo.modinternal/api/admin.gointernal/api/auth.gointernal/api/auth_google.gointernal/api/auth_webflow.gointernal/api/domains.gointernal/api/errors.gointernal/api/google_data_api.gointernal/api/handlers.gointernal/api/jobs.gointernal/api/logging.gointernal/api/middleware.gointernal/api/notifications.gointernal/api/organisations.gointernal/api/response.gointernal/api/schedulers.gointernal/api/share_links.gointernal/api/slack.gointernal/api/webflow_sites.gointernal/archive/scheduler.gointernal/auth/middleware.gointernal/crawler/crawler.gointernal/crawler/robots.gointernal/crawler/sitemap.gointernal/crawler/tls_aia.gointernal/db/batch.gointernal/db/dashboard.gointernal/db/db.gointernal/db/google_connections.gointernal/db/notifications.gointernal/db/pages.gointernal/db/pressure.gointernal/db/queue.gointernal/db/reset_migrations.gointernal/db/retry.gointernal/db/schedulers.gointernal/db/slack.gointernal/db/users.gointernal/db/webflow_connections.gointernal/db/webflow_site_settings.gointernal/jobs/domain_limiter.gointernal/jobs/manager.gointernal/jobs/worker.gointernal/logging/logging.gointernal/logging/logging_test.gointernal/notifications/listener.gointernal/notifications/slack.gointernal/techdetect/detector.gointernal/util/url.goscripts/aggregate_logs.pyscripts/process_logs.py
| orgIDStr := "none" | ||
| if user.OrganisationID != nil { | ||
| orgIDStr = *user.OrganisationID | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Extract duplicated organisation-ID formatting into a helper.
Both handlers repeat the same nil-safe organisation ID logic; a small helper would keep this DRY.
Refactor sketch
+func orgIDOrNone(user *db.User) string {
+ if user.OrganisationID == nil {
+ return "none"
+ }
+ return *user.OrganisationID
+}
...
- orgIDStr := "none"
- if user.OrganisationID != nil {
- orgIDStr = *user.OrganisationID
- }
+ orgIDStr := orgIDOrNone(user)
...
- orgIDStr2 := "none"
- if user.OrganisationID != nil {
- orgIDStr2 = *user.OrganisationID
- }
+ orgIDStr2 := orgIDOrNone(user)Also applies to: 165-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/api/admin.go` around lines 51 - 54, Extract the nil-safe
organisation-ID logic into a small helper (e.g., getOrgID or formatOrgID) that
accepts the user (or a pointer to user.OrganisationID) and returns the string to
assign to orgIDStr; replace the duplicated blocks that set orgIDStr by calling
this helper wherever you currently check user.OrganisationID and set orgIDStr to
"none" or the dereferenced value, and update all occurrences (the blocks using
orgIDStr around the two duplicated spots) to use the new helper to keep the
logic DRY.
| default: | ||
| switch a.Value.Kind() { | ||
| case slog.KindString: | ||
| event.Tags[a.Key] = a.Value.String() | ||
| case slog.KindGroup: | ||
| for _, ga := range a.Value.Group() { | ||
| event.Extra[a.Key+"."+ga.Key] = ga.Value.Any() | ||
| } | ||
| default: | ||
| event.Extra[a.Key] = a.Value.Any() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate the file and check its size
fd -t f "logging.go" | grep internal/logging
wc -l internal/logging/logging.goRepository: Harvey-AU/hover
Length of output: 118
🏁 Script executed:
# Read the specific lines mentioned in the review (299-308)
sed -n '290,315p' internal/logging/logging.goRepository: Harvey-AU/hover
Length of output: 718
🏁 Script executed:
# Search for the processAttr function to understand the full context
rg "func.*processAttr" -A 50 internal/logging/Repository: Harvey-AU/hover
Length of output: 2707
🏁 Script executed:
# Check internal/api/auth_google.go for logging patterns
rg "slog\." internal/api/auth_google.go | head -20Repository: Harvey-AU/hover
Length of output: 41
🏁 Script executed:
# Find auth_google.go
fd -t f "auth_google.go" internal/api/Repository: Harvey-AU/hover
Length of output: 86
🏁 Script executed:
# Check if it exists and what logging it does
find . -name "*auth*google*" -type fRepository: Harvey-AU/hover
Length of output: 88
🏁 Script executed:
# Search for any logging calls in internal/api/ related to auth
rg "logger\.|slog\." internal/api/ -lRepository: Harvey-AU/hover
Length of output: 418
🏁 Script executed:
# Check for any error logs with string attributes that might be dynamic
rg "Error|With" internal/api/ | grep -E "(user_id|organisation_id|connection_id|google_account_id|request_id)" | head -10Repository: Harvey-AU/hover
Length of output: 1286
🏁 Script executed:
# Check auth_google.go for logging patterns with dynamic fields
cat -n internal/api/auth_google.go | grep -A 2 -B 2 "logger\|slog"Repository: Harvey-AU/hover
Length of output: 23543
🏁 Script executed:
# Look at how logger is defined in the codebase
rg "func.*Error" internal/logging/ -A 5 | head -30Repository: Harvey-AU/hover
Length of output: 906
🏁 Script executed:
# Check the logger.Error signature and how it converts to slog attrs
rg "type.*Logger|func.*logger" internal/logging/logging.go -A 3 | head -40Repository: Harvey-AU/hover
Length of output: 400
🏁 Script executed:
# Look for any existing PR context or comments about Tags vs Extra strategy
rg "Tags|Extra|cardinality|fingerprint" internal/logging/logging.go -B 2 -A 2Repository: Harvey-AU/hover
Length of output: 2779
Move dynamic string attributes to Extra instead of Tags.
The default branch in processAttr() currently routes every slog.KindString attribute to event.Tags, which contradicts the PR's design objective. Error logs throughout internal/api/auth_google.go attach dynamic string identifiers—such as user_id, organisation_id, connection_id, account_id, property_id, google_account_id, and session_id—that would become Sentry tags under this logic. This reintroduces high-cardinality values into Sentry metadata, undermining the intent to reserve tags for stable taxonomy like component and maintain dynamic values in structured fields.
Restrict tags to the explicit tags group and route ordinary string attributes to event.Extra instead.
Suggested fix
default:
switch a.Value.Kind() {
case slog.KindString:
- event.Tags[a.Key] = a.Value.String()
+ event.Extra[a.Key] = a.Value.String()
case slog.KindGroup:
for _, ga := range a.Value.Group() {
event.Extra[a.Key+"."+ga.Key] = ga.Value.Any()
}
default:
event.Extra[a.Key] = a.Value.Any()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| default: | |
| switch a.Value.Kind() { | |
| case slog.KindString: | |
| event.Tags[a.Key] = a.Value.String() | |
| case slog.KindGroup: | |
| for _, ga := range a.Value.Group() { | |
| event.Extra[a.Key+"."+ga.Key] = ga.Value.Any() | |
| } | |
| default: | |
| event.Extra[a.Key] = a.Value.Any() | |
| default: | |
| switch a.Value.Kind() { | |
| case slog.KindString: | |
| event.Extra[a.Key] = a.Value.String() | |
| case slog.KindGroup: | |
| for _, ga := range a.Value.Group() { | |
| event.Extra[a.Key+"."+ga.Key] = ga.Value.Any() | |
| } | |
| default: | |
| event.Extra[a.Key] = a.Value.Any() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/logging/logging.go` around lines 299 - 308, The default branch in
processAttr currently sends all slog.KindString attributes into event.Tags,
which causes high-cardinality dynamic identifiers to become Sentry tags; change
processAttr so that only attributes coming from the explicit "tags" group are
mapped into event.Tags and all other string attributes (including user_id,
organisation_id, connection_id, account_id, property_id, google_account_id,
session_id) are put into event.Extra instead—locate the switch on a.Value.Kind()
in processAttr and alter the slog.KindString case to route non-"tags" keys to
event.Extra (keeping the existing group handling for slog.KindGroup and default
Any() behavior).
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cmd/app/main.go (1)
407-425:⚠️ Potential issue | 🟠 MajorInitialise Sentry before
logging.Setup.
internal/logging.Setupis documented to run aftersentry.Init, but this path wires the slog fanout first. That can leave the Sentry handler initialised before the client exists, so error logs may not be captured reliably. MovesetupLogging(config)until after the Sentry branch has either initialised Sentry or intentionally skipped it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/main.go` around lines 407 - 425, The call to setupLogging(config) runs before sentry.Init causing the slog fanout to wire Sentry handler before the client exists; move the setupLogging(config) call so it executes after the Sentry branch has either initialised the client or skipped it: keep the sentry.Init(...) block (using logging.BeforeSend) and any error handling as-is, and only then call setupLogging(config) (or call it inside both branches after successful init/skip) so the logging stack is configured after sentry.Init has completed.internal/jobs/worker.go (2)
1533-1543:⚠️ Potential issue | 🟠 MajorKeep this error log’s message static.
This message includes
streakand optionallylastErr.Error(), so the Sentry fingerprint will still fragment on error text. Keep the log message static and move the variable parts into fields.♻️ Proposed change
message := fmt.Sprintf("Job failed after %d consecutive task failures", streak) if lastErr != nil { message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error()) } - workerLog.Error(message, - "job_id", jobID, - "failure_streak", streak, - "threshold", wp.jobFailureThreshold, - ) + logArgs := []any{ + "job_id", jobID, + "failure_streak", streak, + "threshold", wp.jobFailureThreshold, + } + if lastErr != nil { + logArgs = append(logArgs, "last_error", lastErr) + } + workerLog.Error("Job failed after consecutive task failures", logArgs...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 1533 - 1543, The log message in markJobFailedDueToConsecutiveFailures must be made static so Sentry fingerprints don't fragment: stop interpolating streak/lastErr into the message string and instead pass them as structured fields to workerLog.Error; keep the message text like "Job failed due to consecutive task failures" and add fields "failure_streak": streak, "last_error": lastErr (or lastErr.Error() only if non-nil), "job_id": jobID, and "threshold": wp.jobFailureThreshold when calling workerLog.Error.
2057-2069:⚠️ Potential issue | 🟠 MajorDon’t capture the same panic twice.
This block calls
hub.Recover/sentry.CaptureExceptionand then emitsworkerLog.Error, which is already auto-captured by the logging fanout. One panic will generate two Sentry events. Keep one capture path only.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 2057 - 2069, The defer in processNextTask currently calls hub.Recover and also sentry.CaptureException then logs via workerLog.Error, causing duplicate Sentry events; remove the explicit sentry.CaptureException branch and keep only hub.Recover when sentry.CurrentHub() != nil (or rely solely on the logging fanout), leaving the workerLog.Error logging intact. Update the defer so it calls sentry.CurrentHub() and invokes hub.Recover(r) if hub != nil and do not call sentry.CaptureException anywhere in that block (symbols: processNextTask, sentry.CurrentHub, hub.Recover, sentry.CaptureException, workerLog.Error).
♻️ Duplicate comments (3)
internal/db/webflow_connections.go (1)
201-201:⚠️ Potential issue | 🟡 MinorAdd
organisation_idto delete-failure log attributes.At Line 201, this error log still only includes
connection_id;organisation_idis available in scope and should be logged for tenant traceability.Suggested change
- dbLog.Error("Failed to delete webflow connection", "error", err, "connection_id", connectionID) + dbLog.Error("Failed to delete webflow connection", "error", err, "connection_id", connectionID, "organisation_id", organisationID)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/db/webflow_connections.go` at line 201, The delete-failure log call using dbLog.Error currently only includes "connection_id"; update the dbLog.Error invocation in the webflow connection delete routine (the function that performs the deletion and references connectionID and organisationID) to also include "organisation_id", passing the organisationID variable so the log line contains both "connection_id" and "organisation_id" for tenant traceability; ensure you modify the dbLog.Error call (the one that logs "Failed to delete webflow connection") to add the "organisation_id", organisationID key/value pair.internal/jobs/worker.go (2)
4562-4564:⚠️ Potential issue | 🟠 MajorDon’t send raw diagnostic/header payloads at error level.
These error logs include full request diagnostics and header objects. With error-level auto-capture to Sentry, that risks leaking sensitive data and oversizing events. Log a small summary instead.
Also applies to: 4923-4925, 4937-4939, 4951-4953
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 4562 - 4564, The error logs currently pass full request diagnostics and header payloads to workerLog.Error (see the workerLog.Error call that includes "request_diagnostics", result.RequestDiagnostics and similar calls around the other occurrences), which can leak sensitive data; change each of those error logs to avoid including the raw objects and instead log a small, non-sensitive summary (e.g., task.ID, length of the diagnostics payload, number of headers, and a redacted flag for auth/PII) and the marshal error; update the workerLog.Error invocations at the shown sites to replace result.RequestDiagnostics / header objects with that concise summary string or struct.
3793-3797:⚠️ Potential issue | 🟡 MinorLog actual milliseconds under
duration_ms.
cleanupDurationis atime.Duration, not a millisecond count, so this field is unit-mismatched.🛠️ Proposed fix
workerLog.Info("Orphaned task cleanup completed for job", "job_id", targetJobID, "tasks_cleaned", totalCleaned, - "duration_ms", cleanupDuration, + "duration_ms", cleanupDuration.Milliseconds(), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 3793 - 3797, The log currently passes cleanupDuration (a time.Duration) to workerLog.Info under the "duration_ms" key; change that to pass the actual millisecond value by calling cleanupDuration.Milliseconds() (or int64(cleanupDuration / time.Millisecond)) so "duration_ms" is a numeric millisecond count—update the workerLog.Info call that logs "job_id", targetJobID, "tasks_cleaned", totalCleaned, "duration_ms", cleanupDuration to use cleanupDuration.Milliseconds() instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/test_jobs/main.go`:
- Line 73: The log currently builds a dynamic message string in startupLog.Info
using strconv.Itoa(jobWorkers); change this to a static message and pass
jobWorkers as a structured field instead (e.g., use startupLog.Info or
startupLog.With that sets "worker_count" or "jobWorkers" to jobWorkers) so the
message text is constant while the count is emitted as a field; update the call
around startupLog.Info and references to jobWorkers in main (or the function
creating the worker pool) to use the structured field API rather than string
concatenation.
In `@internal/db/retry.go`:
- Around line 51-53: The log call in internal/db/retry.go uses the key
"attempts" which is inconsistent with other retry logs; update the
dbLog.Info/dbLog.Warn/dbLog.Error calls (the ones that include "attempts" at the
success path around the DB connection retry and the other occurrences around
lines 142-145) to use the singular key "attempt" instead so all retry events use
the same field name; keep the value (attempt variable) and the "elapsed" field
unchanged to preserve semantics.
In `@internal/db/slack.go`:
- Line 172: The dbLog.Error calls that only log the error on row-scan failures
should include identifying fields from the scanned entity so failures can be
correlated; at each dbLog.Error invocation in internal/db/slack.go (the failing
scan points where you call dbLog.Error("Failed to scan slack connection row",
"error", err)) attach available identifiers from the row/struct being scanned
(for example connection ID, slack team ID, workspace ID or user ID) as
additional key/value pairs so the log reads with both the error and entity
context; update all three occurrences (around the current lines shown) to
include those identifier fields when present.
- Line 242: Replace the non-canonical log key "slack_connection_id" with the
standardized "connection_id" in the Slack DB error log call so logging is
consistent; locate the dbLog.Error invocation in internal/db/slack.go (the call
that logs "Failed to create slack user link" and currently includes "user_id",
link.UserID, "slack_connection_id", link.SlackConnectionID) and change that key
to "connection_id" while keeping the same value (link.SlackConnectionID).
In `@internal/db/webflow_connections.go`:
- Line 162: The log call dbLog.Error("Failed to scan webflow connection row",
"error", err) omits tenant context; update that call in webflow_connections.go
(the block that scans rows / handles webflow connection rows) to include the
organisation identifier field (e.g., add "organisation_id", organisationID or
orgID) as an additional key/value in the dbLog.Error invocation so
tenant/organisation context is logged alongside the error.
In `@internal/logging/logging_test.go`:
- Around line 223-239: Update the test that builds a slog.Record and calls
componentConverter so it no longer asserts a high-cardinality identifier
(job_id) is in event.Tags; instead assert that event.Tags["component"] ==
"worker" but check that event.Extra["job_id"] == "abc-123" (and remove or change
the assertion that reads event.Tags for job_id). Locate the test using the
record variable and the call to componentConverter, and change the expectation
from event.Tags to event.Extra for the "job_id" key while leaving
low-cardinality tag checks (like "component") unchanged.
---
Outside diff comments:
In `@cmd/app/main.go`:
- Around line 407-425: The call to setupLogging(config) runs before sentry.Init
causing the slog fanout to wire Sentry handler before the client exists; move
the setupLogging(config) call so it executes after the Sentry branch has either
initialised the client or skipped it: keep the sentry.Init(...) block (using
logging.BeforeSend) and any error handling as-is, and only then call
setupLogging(config) (or call it inside both branches after successful
init/skip) so the logging stack is configured after sentry.Init has completed.
In `@internal/jobs/worker.go`:
- Around line 1533-1543: The log message in
markJobFailedDueToConsecutiveFailures must be made static so Sentry fingerprints
don't fragment: stop interpolating streak/lastErr into the message string and
instead pass them as structured fields to workerLog.Error; keep the message text
like "Job failed due to consecutive task failures" and add fields
"failure_streak": streak, "last_error": lastErr (or lastErr.Error() only if
non-nil), "job_id": jobID, and "threshold": wp.jobFailureThreshold when calling
workerLog.Error.
- Around line 2057-2069: The defer in processNextTask currently calls
hub.Recover and also sentry.CaptureException then logs via workerLog.Error,
causing duplicate Sentry events; remove the explicit sentry.CaptureException
branch and keep only hub.Recover when sentry.CurrentHub() != nil (or rely solely
on the logging fanout), leaving the workerLog.Error logging intact. Update the
defer so it calls sentry.CurrentHub() and invokes hub.Recover(r) if hub != nil
and do not call sentry.CaptureException anywhere in that block (symbols:
processNextTask, sentry.CurrentHub, hub.Recover, sentry.CaptureException,
workerLog.Error).
---
Duplicate comments:
In `@internal/db/webflow_connections.go`:
- Line 201: The delete-failure log call using dbLog.Error currently only
includes "connection_id"; update the dbLog.Error invocation in the webflow
connection delete routine (the function that performs the deletion and
references connectionID and organisationID) to also include "organisation_id",
passing the organisationID variable so the log line contains both
"connection_id" and "organisation_id" for tenant traceability; ensure you modify
the dbLog.Error call (the one that logs "Failed to delete webflow connection")
to add the "organisation_id", organisationID key/value pair.
In `@internal/jobs/worker.go`:
- Around line 4562-4564: The error logs currently pass full request diagnostics
and header payloads to workerLog.Error (see the workerLog.Error call that
includes "request_diagnostics", result.RequestDiagnostics and similar calls
around the other occurrences), which can leak sensitive data; change each of
those error logs to avoid including the raw objects and instead log a small,
non-sensitive summary (e.g., task.ID, length of the diagnostics payload, number
of headers, and a redacted flag for auth/PII) and the marshal error; update the
workerLog.Error invocations at the shown sites to replace
result.RequestDiagnostics / header objects with that concise summary string or
struct.
- Around line 3793-3797: The log currently passes cleanupDuration (a
time.Duration) to workerLog.Info under the "duration_ms" key; change that to
pass the actual millisecond value by calling cleanupDuration.Milliseconds() (or
int64(cleanupDuration / time.Millisecond)) so "duration_ms" is a numeric
millisecond count—update the workerLog.Info call that logs "job_id",
targetJobID, "tasks_cleaned", totalCleaned, "duration_ms", cleanupDuration to
use cleanupDuration.Milliseconds() instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9054fdfd-8f32-45f2-a2e7-35102206fed3
📒 Files selected for processing (11)
cmd/app/main.gocmd/test_jobs/main.gointernal/crawler/robots.gointernal/crawler/tls_aia.gointernal/db/retry.gointernal/db/schedulers.gointernal/db/slack.gointernal/db/webflow_connections.gointernal/jobs/domain_limiter.gointernal/jobs/worker.gointernal/logging/logging_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.husky/pre-commit:
- Around line 3-4: Modify the pre-commit hook so golangci-lint only runs when
staged files include Go files: detect staged filenames (use git diff --name-only
--cached) and check for entries matching *.go; if any exist, invoke the existing
golangci-lint command (golangci-lint run --config .golangci.yml --fast-only) as
before, otherwise skip it and exit successfully; update the .husky/pre-commit
logic around the golangci-lint invocation to gate execution on that staged-Go
check.
- Line 3: Replace the Bash-specific redirection in the pre-commit hook
conditional: update the line containing "if command -v golangci-lint &>
/dev/null; then" to use the POSIX-compatible redirection ">/dev/null 2>&1" so
the check for golangci-lint works in dash/other POSIX shells.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 48d7dcdb-d92c-41f7-b365-bdb830d911d5
📒 Files selected for processing (1)
.husky/pre-commit
| if command -v golangci-lint &> /dev/null; then | ||
| golangci-lint run --config .golangci.yml --fast-only |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Gate golangci-lint to staged Go changes for faster commits.
Line [4] currently runs golangci-lint on every commit when installed. Consider only running it when staged files include *.go, then let CI handle full coverage.
Proposed refactor
-if command -v golangci-lint >/dev/null 2>&1; then
- golangci-lint run --config .golangci.yml --fast-only
+if git diff --cached --name-only -- '*.go' | grep -q .; then
+ if command -v golangci-lint >/dev/null 2>&1; then
+ golangci-lint run --config .golangci.yml --fast-only
+ else
+ echo "Warning: golangci-lint not installed, skipping local lint (will still run in CI)"
+ fi
else
- echo "Warning: golangci-lint not installed, skipping local lint (will still run in CI)"
+ echo "No staged Go files; skipping golangci-lint"
fiBased on learnings: Applies to **/*.go : Prefer go test and targeted checks before broader validation on Go files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.husky/pre-commit around lines 3 - 4, Modify the pre-commit hook so
golangci-lint only runs when staged files include Go files: detect staged
filenames (use git diff --name-only --cached) and check for entries matching
*.go; if any exist, invoke the existing golangci-lint command (golangci-lint run
--config .golangci.yml --fast-only) as before, otherwise skip it and exit
successfully; update the .husky/pre-commit logic around the golangci-lint
invocation to gate execution on that staged-Go check.
| @@ -1 +1,7 @@ | |||
| npx lint-staged | |||
|
|
|||
| if command -v golangci-lint &> /dev/null; then | |||
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify POSIX shell compatibility for the hook
sh -n .husky/pre-commitRepository: Harvey-AU/hover
Length of output: 41
🏁 Script executed:
cat -n .husky/pre-commitRepository: Harvey-AU/hover
Length of output: 324
Use POSIX-compatible redirection for broader shell compatibility.
Line 3 contains &> /dev/null, which is Bash-specific syntax. While it passes syntax validation on this system, it is not POSIX-standard and may fail on stricter POSIX shells (e.g., dash). Use the portable equivalent >/dev/null 2>&1 to ensure the hook works reliably across different shell environments.
Suggested improvement
-if command -v golangci-lint &> /dev/null; then
+if command -v golangci-lint >/dev/null 2>&1; then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if command -v golangci-lint &> /dev/null; then | |
| if command -v golangci-lint >/dev/null 2>&1; then |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.husky/pre-commit at line 3, Replace the Bash-specific redirection in the
pre-commit hook conditional: update the line containing "if command -v
golangci-lint &> /dev/null; then" to use the POSIX-compatible redirection
">/dev/null 2>&1" so the check for golangci-lint works in dash/other POSIX
shells.
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
scripts/lint-go.js (1)
16-20: 🧹 Nitpick | 🔵 TrivialGate
golangci-lintto staged Go changes before running the full lint command.Line [16] runs linting whenever
golangci-lintis installed, even when no Go files are staged. That slows non-Go commits and widens local checks unnecessarily.Proposed refactor
const check = spawnSync("golangci-lint", ["--version"], { stdio: "ignore", shell: false, }); if (check.error) { console.log( "Warning: golangci-lint not installed, skipping local lint (will still run in CI)" ); process.exit(0); } +const stagedGo = spawnSync("git", ["diff", "--cached", "--name-only", "--", "*.go"], { + encoding: "utf8", + shell: false, +}); +if (stagedGo.status !== 0) { + console.error("Failed to inspect staged files; aborting Go lint step."); + process.exit(stagedGo.status ?? 1); +} +if (!stagedGo.stdout.trim()) { + console.log("No staged Go files; skipping golangci-lint"); + process.exit(0); +} + const result = spawnSync( "golangci-lint", ["run", "--config", ".golangci.yml", "--fast-only"], { stdio: "inherit", shell: false } ); process.exit(result.status ?? 1);Based on learnings: Applies to **/*.go : Prefer
go testand targeted checks before broader validation on Go files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/lint-go.js` around lines 16 - 20, The script currently always invokes spawnSync("golangci-lint", ...) which runs the full linter even when no Go files are staged; update scripts/lint-go.js to first collect staged file names (e.g., via git diff --name-only --staged), filter for files ending in .go, and only call spawnSync("golangci-lint", ["run", ...]) when that filtered list is non-empty; if there are staged Go files prefer running targeted checks (e.g., run golangci-lint only on those paths or run go test for packages touched) and otherwise skip the full golangci-lint invocation to avoid slowing non-Go commits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/lint-go.js`:
- Around line 16-20: The script currently always invokes
spawnSync("golangci-lint", ...) which runs the full linter even when no Go files
are staged; update scripts/lint-go.js to first collect staged file names (e.g.,
via git diff --name-only --staged), filter for files ending in .go, and only
call spawnSync("golangci-lint", ["run", ...]) when that filtered list is
non-empty; if there are staged Go files prefer running targeted checks (e.g.,
run golangci-lint only on those paths or run go test for packages touched) and
otherwise skip the full golangci-lint invocation to avoid slowing non-Go
commits.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 13d6144a-c125-4631-8da9-468455778309
📒 Files selected for processing (2)
.husky/pre-commitscripts/lint-go.js
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/jobs/worker.go (2)
1533-1543:⚠️ Potential issue | 🟠 MajorKeep the error message static for fingerprinting.
messageincludesstreakandlastErr, so each failure shape becomes a different log message and therefore a different Sentry fingerprint. That undoes the new[component, staticMessage]grouping.Suggested fix
- message := fmt.Sprintf("Job failed after %d consecutive task failures", streak) - if lastErr != nil { - message = fmt.Sprintf("%s (last error: %s)", message, lastErr.Error()) - } + message := fmt.Sprintf("Job failed after %d consecutive task failures", streak) + logArgs := []any{ + "job_id", jobID, + "failure_streak", streak, + "threshold", wp.jobFailureThreshold, + } + if lastErr != nil { + logArgs = append(logArgs, "last_error", lastErr) + } - workerLog.Error(message, - "job_id", jobID, - "failure_streak", streak, - "threshold", wp.jobFailureThreshold, - ) + workerLog.Error("Job failed after consecutive task failures", logArgs...)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 1533 - 1543, The log message in markJobFailedDueToConsecutiveFailures should be a static string to preserve Sentry grouping; change the interpolated message to a constant like "Job failed due to consecutive task failures" and stop including streak or lastErr in the formatted message, and instead pass the dynamic data as structured fields to workerLog.Error (e.g. keep "job_id", "failure_streak", "threshold" and add a field like "last_error" or "error" with lastErr) so the message text stays constant while details remain in metadata.
2057-2069:⚠️ Potential issue | 🟠 MajorAvoid double-capturing panics to Sentry.
This defer already reports the panic via
hub.Recover/sentry.CaptureException.workerLog.Errorauto-captures error-level logs as well, so one panic becomes two Sentry events.Suggested fix
- workerLog.Error("Recovered from panic in processNextTask", + workerLog.ErrorContext(logging.NoCapture(ctx), "Recovered from panic in processNextTask", "panic", fmt.Sprintf("%v", r), "stack", string(stack), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 2057 - 2069, The defer in processNextTask double-reports panics to Sentry because it calls hub.Recover / sentry.CaptureException and then logs with workerLog.Error (which auto-captures error-level logs); remove the explicit Sentry calls in that defer and keep only the formatted recoveredErr, stack and workerLog.Error call so the panic is reported once, or alternatively disable auto-capture on workerLog.Error and keep only hub.Recover/sentry.CaptureException—update the defer that references hub.Recover, sentry.CaptureException, workerLog.Error, recoveredErr and debug.Stack accordingly so a single Sentry event is emitted.
♻️ Duplicate comments (2)
internal/jobs/worker.go (2)
4562-4564:⚠️ Potential issue | 🟠 MajorDon't attach raw diagnostics and header payloads to error logs.
These
Errorcalls still emit fullrequest_diagnostics,headers,second_headers, andcache_attemptsobjects. With central error-level capture, that risks leaking sensitive values such as authorisation or cookie data and can blow up Sentry payload size. Keep only safe summaries here, or move the full payloads toDebug.Suggested fix pattern
- workerLog.Error("Failed to marshal request diagnostics", - "error", err, "task_id", task.ID, "request_diagnostics", result.RequestDiagnostics) + workerLog.Error("Failed to marshal request diagnostics", + "error", err, "task_id", task.ID) - workerLog.Error("Failed to marshal headers", - "error", err, "task_id", task.ID, "headers", result.Headers) + workerLog.Error("Failed to marshal headers", + "error", err, "task_id", task.ID, "header_keys", len(result.Headers)) - workerLog.Error("Failed to marshal second headers", - "error", err, "task_id", task.ID, "second_headers", result.SecondHeaders) + workerLog.Error("Failed to marshal second headers", + "error", err, "task_id", task.ID, "second_header_keys", len(result.SecondHeaders)) - workerLog.Error("Failed to marshal cache check attempts", - "error", err, "task_id", task.ID, "cache_attempts", result.CacheCheckAttempts) + workerLog.Error("Failed to marshal cache check attempts", + "error", err, "task_id", task.ID, "attempt_count", len(result.CacheCheckAttempts))Also applies to: 4923-4939, 4951-4953
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 4562 - 4564, The Error-level logging calls in worker.go (e.g., the workerLog.Error invocation that currently logs "Failed to marshal request diagnostics") must not include raw payloads like result.RequestDiagnostics, headers, second_headers, or cache_attempts; instead replace those fields with safe summaries (e.g., booleans, counts, and redacted/hashed indicators) and move the full objects to a Debug-level log if needed. Update the Error calls (search for workerLog.Error lines around the referenced sections and the uses of request_diagnostics, headers, second_headers, cache_attempts and task.ID) to emit only task.ID, a short summary (e.g., len(...) or "present"/"redacted"), and the error; add a workerLog.Debug that includes the full payloads for troubleshooting. Ensure no Error-level log contains raw header or cookie content.
3793-3797:⚠️ Potential issue | 🟡 Minor
duration_msis still logging the wrong unit.The field name says milliseconds, but Line 3796 still emits the raw
time.Durationvalue instead of an integer millisecond count.Suggested fix
workerLog.Info("Orphaned task cleanup completed for job", "job_id", targetJobID, "tasks_cleaned", totalCleaned, - "duration_ms", cleanupDuration, + "duration_ms", cleanupDuration.Milliseconds(), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/jobs/worker.go` around lines 3793 - 3797, The log call to workerLog.Info is emitting cleanupDuration (symbol cleanupDuration) which is a time.Duration but the field name "duration_ms" expects an integer millisecond count; update the call that logs the orphaned task cleanup (workerLog.Info with keys "job_id", "tasks_cleaned", "duration_ms") to pass cleanupDuration.Milliseconds() (or int64(cleanupDuration / time.Millisecond)) instead of the raw duration so the logged value is milliseconds as an integer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/jobs/worker.go`:
- Line 142: The package-level variable workerLog created with
logging.Component("worker") captures the slog.Default() too early; change it to
be resolved lazily so it uses the handler installed by logging.Setup() — for
example replace the package-scope workerLog with a function or getter (e.g.,
getWorkerLog or workerLogger()) that calls logging.Component("worker") on first
use, or initialize workerLog from that getter after logging.Setup() runs; update
all references to use the new getter so logging.Component("worker") is not
invoked at package init time.
---
Outside diff comments:
In `@internal/jobs/worker.go`:
- Around line 1533-1543: The log message in
markJobFailedDueToConsecutiveFailures should be a static string to preserve
Sentry grouping; change the interpolated message to a constant like "Job failed
due to consecutive task failures" and stop including streak or lastErr in the
formatted message, and instead pass the dynamic data as structured fields to
workerLog.Error (e.g. keep "job_id", "failure_streak", "threshold" and add a
field like "last_error" or "error" with lastErr) so the message text stays
constant while details remain in metadata.
- Around line 2057-2069: The defer in processNextTask double-reports panics to
Sentry because it calls hub.Recover / sentry.CaptureException and then logs with
workerLog.Error (which auto-captures error-level logs); remove the explicit
Sentry calls in that defer and keep only the formatted recoveredErr, stack and
workerLog.Error call so the panic is reported once, or alternatively disable
auto-capture on workerLog.Error and keep only
hub.Recover/sentry.CaptureException—update the defer that references
hub.Recover, sentry.CaptureException, workerLog.Error, recoveredErr and
debug.Stack accordingly so a single Sentry event is emitted.
---
Duplicate comments:
In `@internal/jobs/worker.go`:
- Around line 4562-4564: The Error-level logging calls in worker.go (e.g., the
workerLog.Error invocation that currently logs "Failed to marshal request
diagnostics") must not include raw payloads like result.RequestDiagnostics,
headers, second_headers, or cache_attempts; instead replace those fields with
safe summaries (e.g., booleans, counts, and redacted/hashed indicators) and move
the full objects to a Debug-level log if needed. Update the Error calls (search
for workerLog.Error lines around the referenced sections and the uses of
request_diagnostics, headers, second_headers, cache_attempts and task.ID) to
emit only task.ID, a short summary (e.g., len(...) or "present"/"redacted"), and
the error; add a workerLog.Debug that includes the full payloads for
troubleshooting. Ensure no Error-level log contains raw header or cookie
content.
- Around line 3793-3797: The log call to workerLog.Info is emitting
cleanupDuration (symbol cleanupDuration) which is a time.Duration but the field
name "duration_ms" expects an integer millisecond count; update the call that
logs the orphaned task cleanup (workerLog.Info with keys "job_id",
"tasks_cleaned", "duration_ms") to pass cleanupDuration.Milliseconds() (or
int64(cleanupDuration / time.Millisecond)) instead of the raw duration so the
logged value is milliseconds as an integer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cca6ecd6-035d-45f4-a307-4038ece26267
📒 Files selected for processing (1)
internal/jobs/worker.go
| // pick a different task rather than blocking on the delay. | ||
| var ErrDomainDelay = errors.New("domain rate limit delay") | ||
|
|
||
| var workerLog = logging.Component("worker") |
There was a problem hiding this comment.
Initialising workerLog here binds it to the wrong handler.
logging.Component() snapshots slog.Default() when this package is initialised. logging.Setup() swaps the default later, so this file can keep logging through the pre-setup handler and miss the new JSON/Sentry fan-out entirely. Please resolve the logger lazily, or only construct the component logger after setup has run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/jobs/worker.go` at line 142, The package-level variable workerLog
created with logging.Component("worker") captures the slog.Default() too early;
change it to be resolved lazily so it uses the handler installed by
logging.Setup() — for example replace the package-scope workerLog with a
function or getter (e.g., getWorkerLog or workerLogger()) that calls
logging.Component("worker") on first use, or initialize workerLog from that
getter after logging.Setup() runs; update all references to use the new getter
so logging.Component("worker") is not invoked at package init time.
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 193-205: The CSV writing in write_components_csv currently
concatenates strings and will break if component names contain commas or quotes;
change it to use Python's csv.writer: open the file with open(csv_path, "w",
newline="") and create a csv.writer(f), then write the header with
writer.writerow(["timestamp"] + all_components) and write each row using
writer.writerow([minute] + counts) (ensure counts are strings or ints) so the
csv module handles quoting/escaping for component names and values.
In `@scripts/process_logs.py`:
- Around line 22-24: The for-loop that checks timestamp keys currently uses the
verbose pattern `if key in record and record[key]:` — make it consistent with
`_normalise_full_timestamp` by using the cleaner `if record.get(key):` pattern;
update the loop that assigns `raw = str(record[key])` to retrieve the value via
`record.get(key)` (and keep assigning to `raw`) so the key-checking style
matches the `_normalise_full_timestamp` implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7848c7f7-095c-438d-a571-5264c69818ac
📒 Files selected for processing (2)
scripts/aggregate_logs.pyscripts/process_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
scripts/aggregate_logs.py (1)
172-190:⚠️ Potential issue | 🟠 MajorDon’t use the top-50 CSV as incremental state.
load_existing_data()restoresevent_countsfromevents_per_minute.csv, but this writer only persists the current top 50 events. After the next incremental run, every historical event outside that set is gone, sosummary.mdand future top-event selection undercount older rare events. Please keep the full per-minute event map in durable state and treat this CSV as presentation only.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/aggregate_logs.py` around lines 172 - 190, The write_events_csv function currently writes only the top_n events to events_per_minute.csv which is then read by load_existing_data, causing loss of older events; instead persist the full per-minute event_counts to durable storage and reserve write_events_csv for presentation: modify write_events_csv to write only a human-facing CSV and add a new durable writer (or change the existing file target) that serializes the complete by_minute[minute]["event_counts"] (e.g., JSON or full CSV with event keys) so load_existing_data reads the full per-minute map rather than the truncated top-N; update references to events_per_minute.csv, load_existing_data, write_events_csv, and summary.md to use the full durable store for state and keep the top-N CSV as a read-only report.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 132-136: The current merge only adds file-level totals when
file_minutes is non-empty, causing summaries with parsed==0 to be dropped;
always merge total_lines and failed_to_parse into the aggregate even if
file_minutes is empty by adding them to a dedicated bucket or global counters.
Update the block that references file_minutes, first_minute, by_minute,
total_lines and failed_to_parse so that if file_minutes is empty you still
record the file-level counts (e.g., create a sentinel minute key or increment
top-level aggregate counters) rather than skipping the merge. Ensure the change
preserves existing behavior when file_minutes is non-empty (still add to
by_minute[first_minute]).
- Around line 308-311: The summary currently reports attempted files because
write_summary(summary_path, by_minute, len(new_files)) uses len(new_files) even
though failed files are excluded from by_minute; change the call to pass the
actual successful list/count from process_json_file (use the success variable) —
e.g., call write_summary(summary_path, by_minute, len(success)) or pass success
directly if write_summary expects the list — so summary.md reflects processed
(successful) inputs; update any write_summary parameter expectation if needed.
In `@scripts/process_logs.py`:
- Around line 119-123: The current branch skips creating the flat CSV when
flat_rows is empty; always open flat_csv_path and write the CSV header so an
empty-run doesn't leave stale files—inside the block checking flat_csv_path is
not None, open flat_csv_path with the same newline/encoding settings, create
csv.DictWriter(...) and call writer.writeheader() unconditionally, then only
call writer.writerows(flat_rows) when flat_rows is truthy; update the code
around flat_csv_path, flat_rows and the writer/DictWriter usage in
scripts/process_logs.py accordingly.
---
Duplicate comments:
In `@scripts/aggregate_logs.py`:
- Around line 172-190: The write_events_csv function currently writes only the
top_n events to events_per_minute.csv which is then read by load_existing_data,
causing loss of older events; instead persist the full per-minute event_counts
to durable storage and reserve write_events_csv for presentation: modify
write_events_csv to write only a human-facing CSV and add a new durable writer
(or change the existing file target) that serializes the complete
by_minute[minute]["event_counts"] (e.g., JSON or full CSV with event keys) so
load_existing_data reads the full per-minute map rather than the truncated
top-N; update references to events_per_minute.csv, load_existing_data,
write_events_csv, and summary.md to use the full durable store for state and
keep the top-N CSV as a read-only report.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0b8b6514-3dc5-4633-bab0-212549cad631
📒 Files selected for processing (2)
scripts/aggregate_logs.pyscripts/process_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 132-136: The current code uses the sentinel anchor =
"_unparseable" and writes totals into by_minute[anchor], which later gets
treated as a timestamp key; instead, avoid injecting that sentinel into the
time-series map by moving those file-level totals out of by_minute: if
file_minutes is empty, accumulate total_lines and failed_to_parse into a
separate scalar or dict (e.g., unparseable_totals) or always add totals only to
the first minute when file_minutes exists; alternatively ensure any writer that
iterates by_minute filters out the "_unparseable" key before
sorting/formatting—update the logic around anchor, by_minute, total_lines, and
failed_to_parse (and any writer functions that consume by_minute) to implement
one of these fixes so the sentinel never becomes a timestamp row in outputs.
- Around line 58-109: The current load_existing_data function rebuilds by_minute
from lossy CSV reports (time_series.csv, events_per_minute.csv,
components_per_minute.csv) which drops low-volume events and doesn't persist
totals (total_lines, failed_to_parse); instead persist and load a lossless JSON
state file and stop reconstructing full state from the CSVs. Change
load_existing_data to first look for a single durable state file (e.g.,
aggregate_state.json) and load the complete by_minute structure and aggregate
fields (total_lines, failed_to_parse) from it; only fall back to CSV parsing as
a best-effort / partial fallback (or remove CSV-based reconstruction entirely),
and ensure the code paths that write state (where incremental runs update state)
write the full JSON snapshot so functions referencing by_minute, total_lines,
and failed_to_parse can trust restored data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a7f4bb65-ba9a-4a7c-b2c0-aebd9d4ddcbe
📒 Files selected for processing (2)
scripts/aggregate_logs.pyscripts/process_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 283-284: The current logic only calls save_data(log_path,
by_minute, totals, processed_set) when incremental is true, so a full rebuild
(when incremental is False) leaves the on-disk state (.aggregate_data.json)
stale; either call save_data unconditionally after rebuilding (remove the
incremental check so save_data(...) runs for both full and incremental paths),
or explicitly remove the stale state file when running with --full (delete the
.aggregate_data.json path referenced by log_path or the module-level state
filename) to ensure future incremental runs don't reload outdated
processed_set/by_minute/totals.
- Around line 104-125: The code mutates totals and by_minute while iterating
data (variables totals, by_minute, and data) so a partial failure (during
event_counts or component_counts) leaves partial updates; fix by building local
temporary accumulators (e.g., local_totals and local_by_minute) and apply the
changes to totals and by_minute only after all parsing of data["level_counts"],
data["event_counts"], and data["component_counts"] completes successfully;
ensure you copy the same update logic (incrementing samples, level_counts,
event_counts, component_counts) into the temp structures and only commit the
temp accumulators to totals and by_minute at the end of the function (or return
them) to make each file’s merge atomic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b98254f3-bdf0-4d4c-b7aa-22d2e8181c16
📒 Files selected for processing (1)
scripts/aggregate_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 238-245: The current warn_and_error grouping in
scripts/aggregate_logs.py (using event_totals and the warn_and_error variable)
relies on message keyword heuristics because scripts/process_logs.py serialises
a flattened event_counts without level information; update the pipeline so the
serialized summary includes per-level counts (e.g., event_counts_by_level or a
dedicated warn_error_map) in process_logs.py instead of dropping level, then
change aggregate_logs.py to build warn_and_error from those per-level counts
(reference event_counts/event_totals and the warn_and_error construction) so the
table is generated from true severity counts rather than message keyword
matching.
- Around line 93-95: The current write to data_file (constructed from log_dir /
DATA_FILE_NAME) truncates the target before json.dump finishes; change the write
to first serialize to a temp file in the same directory (e.g., create a temp
file alongside data_file using tempfile.NamedTemporaryFile or a unique suffix),
write json.dump(data, temp_f, separators=(",", ":")) and flush/fsync as needed,
then atomically replace the target with os.replace(temp_path, data_file) so the
aggregate state is only swapped in after a successful write.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1751b3f4-35d6-4f75-94de-a4fcfea989ac
📒 Files selected for processing (1)
scripts/aggregate_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/process_logs.py (1)
88-128:⚠️ Potential issue | 🟠 MajorStream the flat CSV instead of buffering every parsed row.
main()always passesflat_csv_path, so this path holds one dict per parsed log line inflat_rowsbefore anything is written. On large Fly.io dumps that adds O(n) memory for output that is fully streamable. Open the writer once and emit rows inside the read loop.🛠️ Proposed refactor
- flat_rows = [] + flat_file = None + writer = None + if flat_csv_path is not None: + flat_file = flat_csv_path.open("w", newline="", encoding="utf-8") + writer = csv.DictWriter( + flat_file, + fieldnames=["timestamp", "level", "component", "message", "extras"], + ) + writer.writeheader() - with raw_path.open("r", encoding="utf-8", errors="ignore") as handle: - for record, _original in _iter_records(handle): + try: + with raw_path.open("r", encoding="utf-8", errors="ignore") as handle: + for record, _original in _iter_records(handle): total += 1 if record is None: errors += 1 continue @@ - if flat_csv_path is not None: + if writer is not None: extras = {k: v for k, v in record.items() if k not in _CORE_FIELDS} - flat_rows.append({ + writer.writerow({ "timestamp": _normalise_full_timestamp(record), "level": level, "component": component, "message": message, "extras": json.dumps(extras, separators=(",", ":")) if extras else "", }) - - if flat_csv_path is not None: - with flat_csv_path.open("w", newline="", encoding="utf-8") as f: - writer = csv.DictWriter(f, fieldnames=["timestamp", "level", "component", "message", "extras"]) - writer.writeheader() - if flat_rows: - writer.writerows(flat_rows) + finally: + if flat_file is not None: + flat_file.close()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/process_logs.py` around lines 88 - 128, The code currently buffers every parsed row in flat_rows when flat_csv_path is set; change this to stream rows as they are parsed by opening the CSV file and creating a csv.DictWriter once (writeheader()) before iterating _iter_records, then inside the loop build the row dict (use _normalise_full_timestamp(record), level, component, message and json.dumps extras for keys not in _CORE_FIELDS) and call writer.writerow(row) immediately instead of appending to flat_rows; remove the flat_rows accumulation and keep the existing logic that only opens/writes the CSV when flat_csv_path is not None so the writer is only active when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/aggregate_logs.py`:
- Around line 134-138: The commit of warn_error_counts must be made atomic in
process_json_file: currently totals["total_lines"] and totals["failed_to_parse"]
are updated before merging warn_error_counts, so exceptions from a
missing/malformed warn_error_counts can leave scalars double-counted on retry;
instead, validate and stage warn_error_counts first (ensure
data.get("warn_error_counts") is a dict and all counts are numeric), build a
local staged_counts dict summing the file's warn_error_counts, and only after
successful validation update totals by adding file_total_lines, file_failed, and
merging staged_counts into totals["warn_error_counts"] so the three updates
occur together.
---
Outside diff comments:
In `@scripts/process_logs.py`:
- Around line 88-128: The code currently buffers every parsed row in flat_rows
when flat_csv_path is set; change this to stream rows as they are parsed by
opening the CSV file and creating a csv.DictWriter once (writeheader()) before
iterating _iter_records, then inside the loop build the row dict (use
_normalise_full_timestamp(record), level, component, message and json.dumps
extras for keys not in _CORE_FIELDS) and call writer.writerow(row) immediately
instead of appending to flat_rows; remove the flat_rows accumulation and keep
the existing logic that only opens/writes the CSV when flat_csv_path is not None
so the writer is only active when needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e10bcc74-b521-4fcb-8a84-aaf4571cc7bb
📒 Files selected for processing (2)
scripts/aggregate_logs.pyscripts/process_logs.py
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
|
🐝 Review App Deployed Homepage: https://hover-pr-333.fly.dev |
Migrate to slog + Sentry fingerprinting
Summary
internal/loggingpackage built on stdliblog/slog+sentry-go/slogv0.45.0worker,queue,batch,db,api,crawler,archive,notify,auth,startup,techdetect,util,jobs,pressure) — every log line carries its component as a structured field and Sentry tag[component, staticMessage];BeforeSendhook normalises UUID/numeric tokens as a safety netlogging.NoCapture(ctx)suppresses Sentry capture for expected errors (404s, validation failures) without silencing the loggithub.com/rs/zerologdependency entirelyscripts/process_logs.pyandscripts/aggregate_logs.pyto extract the newcomponentfield, producingcomponents_per_minute.csvand a component breakdown insummary.md.claude/skills/structured-logging/SKILL.mdto document the new patternsTest plan
go build ./...— clean compilego test ./...— all 21 packages passgo vet ./...— no issuescomponentfield and static message strings[component, message]in stagingscripts/process_logs.pyagainst a sample log file and confirmcomponent_countsin output JSONscripts/aggregate_logs.pyand confirmcomponents_per_minute.csvis written🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Chores
Tests