Refactor by Varun5711 · Pull Request #11 · Varun5711/tiny

Varun5711 · 2026-05-31T13:27:50Z

Summary by CodeRabbit

New Features
- Added Elasticsearch support for URL search and click event indexing
- Introduced distributed tracing with OpenTelemetry integration
- Added /api/search endpoint for querying URLs
- Implemented URL pagination
- Added Jaeger deployment for trace visualization
- Configurable CORS and JWT support
Infrastructure & Deployment
- Added GitHub Actions CI/CD pipeline with linting, testing, and Docker builds
- Introduced Kubernetes network policies for secure pod communication
- Added Elasticsearch and Jaeger Kubernetes deployments
- Enhanced container security (non-root execution, read-only filesystems, dropped capabilities)
- Pinned Alpine base images to version 3.21
Improvements
- Enhanced health check endpoint with database and Redis verification
- Improved rate limiter IP detection from proxied requests
- Added input size limits to HTTP handlers

Add uber fx, zap, opentelemetry, jaeger exporter, otelgrpc, otelhttp, and elasticsearch client libraries.

Rewrite internal/logger to wrap go.uber.org/zap. Outputs JSON in production (default) and colored text in dev (LOG_FORMAT=text). Same public API preserved so all callers work unchanged.

Update Storage interface and PostgresStorage to accept ctx from callers instead of creating context.Background() internally. Fix redirect handler to use r.Context(). Add timeout to ClickHouse startup ping.

DeleteURL now deletes from Postgres before clearing cache. Add ListPaginated/ListByUserIDPaginated with SQL LIMIT/OFFSET and COUNT(*) instead of fetching all rows and slicing in memory.

- CORS: configurable allowed origins via CORS_ALLOWED_ORIGINS env var - Rate limiter: parse X-Forwarded-For properly, validate IPs - JWT: remove hardcoded fallback secret, require JWT_SECRET env var - HTTP handlers: add 1MB request body size limits

Rewrite all 7 cmd/*/main.go files to use go.uber.org/fx for declarative dependency injection with proper lifecycle management. - Graceful shutdown via fx.Lifecycle OnStop hooks for all services - HTTP servers use configurable timeouts (Read/Write/Idle) - Health endpoints check actual DB and Redis connectivity - Redirect service gets /health endpoint (fixes K8s probe) - Workers use WaitGroup for clean goroutine drain on shutdown - Remove all manual signal handling (FX handles SIGINT/SIGTERM)

- URL search index (CRUD + full-text search across long_url/short_code) - Click event indexing with bulk API for pipeline-worker - Log shipping via zap WriteSyncer with batched bulk writes - ES client as optional dependency (nil-safe, services work without ES) - /api/search endpoint on api-gateway backed by ES - ElasticsearchConfig in config with ES_ENABLED toggle - Logger supports extra write syncer for multi-output (stdout + ES)

- OTel tracer provider with OTLP/HTTP exporter for Jaeger - Configurable sampling rate (TRACING_SAMPLE_RATE) and toggle (TRACING_ENABLED) - HTTP tracing middleware with X-Trace-ID response header - gRPC client and server instrumentation via otelgrpc StatsHandler - Tracer provider lifecycle management (init on start, flush on shutdown) - All 7 services instrumented with tracer provider + graceful shutdown

- Add .dockerignore to exclude .git, docs, deployments, test files - Pin alpine:latest to alpine:3.21 across all 8 Dockerfiles - Fix api-gateway HEALTHCHECK: replace broken binary flag with wget health probe

- Add container-level securityContext to all app deployments (runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities) - Change imagePullPolicy from Never to IfNotPresent across all deployments - Add dev-only warning comment to secrets.yaml - Add NetworkPolicy rules restricting inter-service communication - Add Elasticsearch single-node StatefulSet + Service manifest - Add Jaeger all-in-one Deployment + Service manifest - Update kustomization.yaml with new resources

- GitHub Actions workflow with lint (golangci-lint), test (race detector + coverage), vulnerability scanning (govulncheck), and Go build verification - Docker image builds for all 7 services using BuildKit with GHA cache - Push to GHCR on main branch merge, build-only on PRs

coderabbitai · 2026-05-31T13:28:01Z

Warning

Review limit reached

@Varun5711, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 14 minutes and 9 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 985d6372-9bdf-4062-a74f-b0286da64499

📥 Commits

Reviewing files that changed from the base of the PR and between bb3df81 and db1209a.

📒 Files selected for processing (26)

.github/workflows/ci.yaml
cmd/analytics-worker/main.go
cmd/api-gateway/main.go
cmd/cleanup-worker/main.go
cmd/pipeline-worker/main.go
cmd/redirect-service/main.go
cmd/tui/ui/analytics.go
cmd/url-service/main.go
cmd/user-service/main.go
internal/clickhouse/analytics.go
internal/clickhouse/client.go
internal/elasticsearch/click_index.go
internal/elasticsearch/client.go
internal/elasticsearch/log_index.go
internal/elasticsearch/url_index.go
internal/grpc/client.go
internal/handlers/analytics_handler.go
internal/handlers/auth_handler.go
internal/handlers/http_handler.go
internal/handlers/swagger_handler.go
internal/idgen/base62_test.go
internal/idgen/snowflake_test.go
internal/middleware/ratelimit.go
internal/redis/client.go
internal/service/url_service.go
test/integration/integration_test.go

📝 Walkthrough

Walkthrough

This PR refactors the URL shortener service to use Uber fx for dependency injection while introducing OpenTelemetry observability, Elasticsearch integration, and hardening infrastructure. Configuration system expands to support Elasticsearch, tracing, CORS, and JWT settings. Storage interface now accepts context throughout. All service entrypoints migrate from manual initialization to fx-managed lifecycles. Docker images are pinned to Alpine 3.21, Kubernetes deployments gain security contexts, and new observability infrastructure (Jaeger, Elasticsearch) is deployed alongside network policies.

Changes

Dependency Injection, Observability, and Elasticsearch

Layer / File(s)	Summary
Configuration, Logging, and Tracing Foundation `internal/config/config.go`, `internal/logger/logger.go`, `internal/tracing/tracer.go`	Config struct expanded with Elasticsearch, Tracing, CORS, JWT subsections. Logger replaced with Zap-backed implementation supporting structured fields and external syncer injection. OpenTelemetry tracer initialization added with OTLP HTTP exporter and dynamic sampler selection.
Storage Interface and Context Propagation `internal/storage/storage.go`, `internal/storage/postgres.go`, `internal/storage/postgres_user_filters.go`	Storage interface and PostgresStorage updated to accept caller-provided context for all CRUD operations. Removes internal timeout/cancel behavior. Adds Delete, ListPaginated, and ListByUserIDPaginated methods with total counts for pagination.
Elasticsearch Client and Indexing `internal/elasticsearch/client.go`, `internal/elasticsearch/click_index.go`, `internal/elasticsearch/url_index.go`, `internal/elasticsearch/log_index.go`	New Elasticsearch client wrapper with connectivity verification and index prefix configuration. Click event indexing (single/bulk) with filtering and search. URL document indexing/searching/deleting/updating. Log event batching/flushing with periodic sync and date-based index naming.
Service and Handler Integration with Elasticsearch `internal/service/url_service.go`, `internal/handlers/http_handler.go`, `internal/handlers/auth_handler.go`	URLService wired with optional Elasticsearch client for conditional indexing/deletion. HTTPHandler.SearchURLs endpoint added with query validation. All storage method calls updated to use request context. Request body size limits via MaxBytesReader added to auth handlers.
Middleware Updates and Observability `internal/middleware/cors.go`, `internal/middleware/ratelimit.go`, `internal/middleware/tracing.go`, `internal/handlers/redirect_handler.go`	CORS middleware replaced with origin-allowlist factory. Rate-limiting IP detection enhanced with validated header parsing. OpenTelemetry tracing middleware added extracting trace IDs to response headers. Redirect handler updated to use request context.
gRPC OpenTelemetry Instrumentation `internal/grpc/client.go`	gRPC client instrumented with OpenTelemetry stats handler for call tracing.
User Service FX Refactoring `cmd/user-service/main.go`	User service refactored to fx-based DI with provider functions for config, logger, DB manager, JWT, storage, service, and tracing. Lifecycle hook manages gRPC server startup and graceful shutdown.
URL Service FX Refactoring `cmd/url-service/main.go`	URL service migrated to fx with providers for config, logger, tracing, Redis, DB, ID generator, cache, storage, optional Elasticsearch, gRPC server, and listener. Lifecycle hook manages server and resource lifecycle.
Redirect Service FX Refactoring `cmd/redirect-service/main.go`	Redirect service converted to fx with providers for config, logger, Redis, cache, producer, handler, tracer, and rate limiter. HTTP server wired with routes, middleware (tracing, recovery, rate limiting), and health check.
Analytics Worker FX Refactoring `cmd/analytics-worker/main.go`	Analytics worker refactored with WorkerParams struct for Redis configuration. Lifecycle hook manages consumer group creation, processEvents execution with cancelable context, and shutdown coordination. processEvents updated to accept params/logger and use context-driven polling.
Cleanup Worker FX Refactoring `cmd/cleanup-worker/main.go`	Cleanup worker migrated to fx with providers for config, logger, Redis, DB, cache, tracer, and storage. Lifecycle hook manages 24-hour cleanup loop with immediate initial run and coordinated shutdown.
Pipeline Worker FX Refactoring `cmd/pipeline-worker/main.go`	Pipeline worker converted to fx with providers for config, logger, Redis, ClickHouse, tracing, optional Elasticsearch, and GeoIP enrichment. processBatch extended to conditionally index events into Elasticsearch after ClickHouse insertion. Timestamp parsing updated to inline json.Unmarshal.
API Gateway FX Refactoring `cmd/api-gateway/main.go`	API gateway refactored to fx-based DI with extensive providers for config, logger, Redis, DB, ClickHouse, optional Elasticsearch, gRPC connections, handlers, middleware, tracing, mux routing, and HTTP server. Routes expanded with /health (DB+Redis ping), /api/search, Swagger, and auth-protected analytics.
ClickHouse Client Context Timeout `internal/clickhouse/client.go`	ClickHouse client NewClient updated to use context with 10-second timeout for connectivity verification.
Docker Base Image Pinning `build/docker/Dockerfile.*`	All service Dockerfiles' runtime base images pinned from alpine:latest to alpine:3.21. API gateway healthcheck updated to use wget HTTP probe.
Kubernetes Security Hardening `deployments/k8s/base//deployment.yaml`, `deployments/k8s/base//cronjob.yaml`, `deployments/k8s/base/secrets.yaml`	All service deployments updated with securityContext enforcing non-root execution, read-only filesystem, no privilege escalation, and dropped capabilities. imagePullPolicy changed from Never to IfNotPresent. Secrets manifest annotated with dev-only warning.
Kubernetes Observability Deployments `deployments/k8s/base/elasticsearch/statefulset.yaml`, `deployments/k8s/base/jaeger/deployment.yaml`, `deployments/k8s/base/kustomization.yaml`	Elasticsearch StatefulSet with 50Gi persistent storage, health probes, JVM sizing, and ClusterIP Service. Jaeger all-in-one Deployment with UI (16686) and OTLP HTTP (4318) ports plus Service. Kustomization updated to include both resources.
Kubernetes Network Policies `deployments/k8s/base/network-policies.yaml`	Comprehensive NetworkPolicy resources for shorternit namespace: DNS egress for all pods, service-specific ingress/egress for api-gateway/url-service/redirect-service, and worker policies with explicit ingress denial and restricted egress to backends.
CI Workflow and Build Context Optimization `.github/workflows/ci.yaml`, `.dockerignore`	GitHub Actions CI workflow added with lint, test, vuln, build, and docker (matrix) jobs with caching. .dockerignore added to exclude dev/test/build artifacts from Docker context.
Go Module Dependencies `go.mod`	Direct requirements updated with OpenTelemetry packages, instrumentation modules (otelgrpc, otelhttp), Uber fx and Zap, and updated crypto/grpc/protobuf. Transitive dependencies refreshed with new indirect modules (OTLP proto, dig, multierr, grpc-gateway).

🎯 4 (Complex) | ⏱️ ~60 minutes

🐰 A refactoring tale with care so grand,
FX wires services across the land,
Observability traces each request's flight,
Elasticsearch searches URLs day and night,
Kubernetes guards with policies tight! 🔐✨

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor

- Pin golangci-lint to v2.1.6 (supports Go 1.25) - Make vulnerability check informational (continue-on-error) since current findings are Go stdlib issues requiring Go version upgrade

Pre-built binary was compiled with Go 1.24 and can't lint Go 1.25 code. Using goinstall mode builds it with the project's Go 1.25.3 toolchain.

No golangci-lint release supports Go 1.25 yet. Using go vet as the lint step until golangci-lint catches up with the Go 1.25 toolchain.

coderabbitai

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)

cmd/api-gateway/main.go (1)
215-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect the analytics catch-all route.

/api/analytics/clicks is authenticated, but /api/analytics/{id}/stats|timeline|geo|devices|referrers is registered naked here, so those analytics endpoints are public.
Suggested fix
-	mux.HandleFunc("/api/analytics/", func(w http.ResponseWriter, r *http.Request) {
+	mux.HandleFunc("/api/analytics/", authMiddleware.RequireAuth(func(w http.ResponseWriter, r *http.Request) {
 		path := r.URL.Path
 		switch {
 		case strings.HasSuffix(path, "/stats"):
 			analyticsHandler.GetStats(w, r)
@@
 		default:
 			http.NotFound(w, r)
 		}
-	})
+	}))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api-gateway/main.go` around lines 215 - 231, The analytics catch-all
handler in main.go is exposing dynamic analytics subroutes without auth. Update
the /api/analytics/ registration so the paths handled by
analyticsHandler.GetStats, GetTimeline, GetGeoStats, GetDeviceStats, and
GetReferrers are wrapped with the same authentication/authorization middleware
used by the protected analytics routes, or move them under the existing secured
route group to ensure /api/analytics/{id}/stats|timeline|geo|devices|referrers
are not publicly accessible.
internal/storage/postgres.go (2)
91-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a sentinel not found error instead of formatted text.

This PR already forces callers into strings.Contains(err.Error(), "not found"), and IncrementClicks now maps a missing row to codes.Internal because there is no stable error contract. Export a package-level ErrURLNotFound and wrap it with %w so callers can use errors.Is.

Also applies to: 143-145
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/postgres.go` around lines 91 - 93, Create and export a
package-level sentinel error named ErrURLNotFound and replace the plain
formatted error returns when a DB update/delete affects zero rows (the branches
using cmdTag.RowsAffected() == 0) with a wrapped error using %w so callers can
use errors.Is; e.g., return fmt.Errorf("URL with short code %s not found: %w",
shortCode, ErrURLNotFound). Apply the same change to the other occurrence around
lines 143-145 (and any similar “not found” branches such as in IncrementClicks)
so all missing-row cases return the sentinel ErrURLNotFound wrapped for reliable
detection.
25-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't drop the storage-level deadline without replacing it upstream.

These calls now pass the incoming context straight into pgx, but the request contexts shown in this PR are bare handler contexts with no local timeout. A slow or wedged database can now pin request goroutines indefinitely; keep a bounded timeout here or enforce one before every storage call.

Also applies to: 49-58, 78-86, 98-106, 137-139, 149-163, 183-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/storage/postgres.go` around lines 25 - 31, The storage methods
(e.g., PostgresStorage.Save and other PostgresStorage DB callers in this file)
currently pass the incoming ctx straight into pgx which can be a plain handler
context with no timeout; wrap the incoming ctx with a bounded timeout when there
is no deadline before calling s.db.Write().Exec / Query methods: check
ctx.Deadline(), and if absent create a context.WithTimeout (choose a sensible
default timeout used across storage, e.g., 2s–5s) and defer its cancel, then
pass that wrapped ctx into the DB calls; apply the same pattern to every DB call
in this file so storage never relies on an unbounded request context.
internal/service/url_service.go (2)
229-249: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the search index in sync when clicks change.

IncrementClicks updates PostgreSQL only. Search responses come from Elasticsearch, so clicks will stay at the original indexed value forever unless UpdateClicks is called on the ES document here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/url_service.go` around lines 229 - 249, IncrementClicks
currently updates PostgreSQL only (s.store.IncrementClicks) then reads the URL
(s.store.GetByShortCode) but never updates the search index; call the
search-index update method (e.g. s.search.UpdateClicks or
s.searchClient.UpdateClicks) after you obtain the updated url.Clicks and pass
the short code and new clicks count to keep Elasticsearch in sync, and handle
the update error (log and/or return a gRPC status error) before returning the
IncrementClicksResponse.
336-349: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Custom aliases never get indexed into Elasticsearch.

The normal create flow writes a URL document to Elasticsearch, but the custom-alias flow stops after PostgreSQL and cache. Those URLs won't show up in /api/search at all.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/url_service.go` around lines 336 - 349, The custom-alias
path stops after postgresStore.CreateCustomURL and cache.Set so those records
never get indexed into Elasticsearch; after successfully creating the custom URL
(the block that calls postgresStore.CreateCustomURL and sets s.cache.Set), call
the same indexing routine used by the normal create flow (the service's
Elasticsearch index method—e.g., the function used elsewhere to write URL
documents to ES) with the same document payload
(alias/shortURL/longURL/expiresAt/userID/createdAt) and handle/indexing errors
similarly before returning the CreateURLResult so custom aliases are searchable
via /api/search.
cmd/pipeline-worker/main.go (2)
156-168: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The outer ticker is throttling stream consumption.

processBatch already blocks in XReadGroup for w.blockTime, so waiting on pollInterval before every call delays the first read and adds idle time between batches while backlog exists. Loop directly on processBatch and keep pollInterval only as an error/backoff delay.
♻️ Suggested shape
 func (w *PipelineWorker) Start(ctx context.Context, log *logger.Logger) {
-	ticker := time.NewTicker(w.pollInterval)
-	defer ticker.Stop()
-
 	for {
-		select {
-		case <-ctx.Done():
-			return
-		case <-ticker.C:
-			if err := w.processBatch(ctx, log); err != nil {
-				log.Error("Failed to process batch: %v", err)
-			}
+		if ctx.Err() != nil {
+			return
+		}
+		if err := w.processBatch(ctx, log); err != nil {
+			log.Error("Failed to process batch: %v", err)
+			if ctx.Err() == nil {
+				time.Sleep(w.pollInterval)
+			}
 		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pipeline-worker/main.go` around lines 156 - 168, The current Start loop
uses a time.Ticker that waits pollInterval between calls, which throttles reads
even though processBatch already blocks in XReadGroup for w.blockTime; remove
the outer ticker and change PipelineWorker.Start to continuously call
w.processBatch(ctx, log) inside a for loop that checks ctx.Done(), and only
sleep for w.pollInterval (or apply backoff) when processBatch returns an error
(or when you need to pause), thereby preserving w.blockTime as the primary
blocking read and using pollInterval strictly as an error/backoff delay; keep
references to PipelineWorker.Start, w.processBatch, w.pollInterval and
w.blockTime when making the change.
198-209: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Bad/invalid events can stay in Redis stream PEL indefinitely

cmd/pipeline-worker/main.go skips messages that fail w.enrichEvent(...) (they never get appended to messageIDs), and if all messages in the batch fail it returns before any XAck; otherwise it only XAcks the successfully enriched msg.IDs (lines 198-209, 243-247).

Since the consumer reads with XREADGROUP ... Streams: []string{w.streamName, ">"}, it won’t redeliver already-pending entries from the PEL, and there’s no pending-entry recovery (XPENDING/XCLAIM/XAUTOCLAIM) or dead-letter/ack path in code—so malformed events can accumulate permanently unless handled externally. https://redis.io/docs/latest/commands/xreadgroup/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pipeline-worker/main.go` around lines 198 - 209, The loop in the consumer
(where w.enrichEvent(msg.Values) is called) currently skips and never
acknowledges malformed messages so they remain in the Redis stream PEL
indefinitely; modify the workflow in cmd/pipeline-worker/main.go so that every
processed msg.ID is acknowledged or moved to a dead-letter path regardless of
enrichEvent success: collect failed IDs as well as successful ones, call XACK
(or XCLAIM/XAUTOCLAIM handling for pending recovery) for all msg.IDs before
returning (including the branch where len(clickEvents)==0), and for failed
enrichments push the original message to a dead-letter stream or log/metric via
the same code paths that reference messageIDs, clickEvents, w.enrichEvent and
w.streamName so malformed events are removed from the PEL and not permanently
stuck.
cmd/cleanup-worker/main.go (1)
123-137: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Expired URLs are removed from Postgres but left live in cache.

runCleanup ignores urlCache, while the cache is configured with a fixed cfg.Cache.L2TTL. That means an already-expired short code can continue resolving from cache until TTL expiry even after the row is deleted from storage. This cleanup path needs the deleted codes so it can evict both cache tiers when rows are removed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cleanup-worker/main.go` around lines 123 - 137, runCleanup currently
ignores urlCache so rows deleted by store.DeleteExpiredURLs keep resolving from
cache; change the storage cleanup to return the keys removed (e.g. modify
DeleteExpiredURLs to return ([]string, error) or add a new
GetAndDeleteExpiredURLs that returns deleted short codes) and update runCleanup
to iterate those codes and evict them from the cache (use the cache API on
urlCache — e.g. urlCache.Delete/Remove/Evict for each short code and any
L1/L2-specific eviction method your cache exposes) before logging the
deletedCount; keep the existing logging but derive deletedCount from the
returned slice length.
cmd/analytics-worker/main.go (1)
151-165: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Unacknowledged analytics events are left in the Redis consumer-group pending list with no retry/recovery path.

In cmd/analytics-worker/main.go, malformed events (missing/incorrect msg.Values["short_code"] type) are continued before being appended to messageIDs, so they never reach XAck. If updateClickCounts(...) returns an error, the code continues and skips XAck for the whole batch (so those message IDs stay pending). Since XReadGroup uses Streams: []string{params.StreamName, ">"}, it only returns never-delivered messages, and there’s no XPENDING/XCLAIM/XAUTOCLAIM logic elsewhere to reclaim pending entries.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/analytics-worker/main.go` around lines 151 - 165, Malformed messages are
being skipped before their IDs are collected and batches are sometimes not
acknowledged when updateClickCounts fails, leaving entries in the Redis
consumer-group pending list; change the message loop to always append msg.ID to
messageIDs (move messageIDs = append(messageIDs, msg.ID) before the short_code
type-check) so every consumed message is tracked, and ensure XAck is invoked for
all messageIDs regardless of malformed events or updateClickCounts errors (e.g.,
call the XAck/acknowledgement path in a defer/finally or after handling errors,
and optionally route malformed events to a dead-letter/logging path instead of
skipping them) so no messages remain permanently pending.

🟠 Major comments (24)

internal/clickhouse/client.go-40-41 (1)
40-41: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close the ClickHouse connection on ping failure.

On Line 40 / Line 41, a failed Ping returns early without closing the previously opened connection, which can leak resources during startup retries.
Proposed fix
  if err := conn.Ping(pingCtx); err != nil {
+   _ = conn.Close()
    return nil, fmt.Errorf("failed to ping clickhouse: %w", err)
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/clickhouse/client.go` around lines 40 - 41, The Ping error path in
the ClickHouse connection setup (inside the function that calls
conn.Ping(pingCtx)) returns early without closing the opened connection
(variable conn), leaking resources; modify the error branch so you first close
the connection (e.g., call conn.Close(...) with an appropriate context) and
handle/ignore any close error, then return the wrapped ping error
(fmt.Errorf("failed to ping clickhouse: %w", err)); update the code around
conn.Ping(pingCtx) to ensure conn is closed on failure before returning.
.github/workflows/ci.yaml-18-18 (1)
18-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable credential persistence on all checkout steps.

Each checkout currently leaves credentials in local git config. Set persist-credentials: false to reduce token exposure risk.
Suggested change pattern
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v4
+        with:
+          persist-credentials: false
Also applies to: 30-30, 44-44, 55-55, 76-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yaml at line 18, Update every checkout step that uses
the actions/checkout@v4 action to explicitly set persist-credentials: false so
credentials are not left in the local git config; locate each job step with
uses: actions/checkout@v4 (multiple occurrences) and add the
persist-credentials: false input under that step's configuration, preserving any
existing settings.
.github/workflows/ci.yaml-9-11 (1)
9-11: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope packages: write to the Docker job only.

Line 11 grants package write permission to every job. That’s broader than necessary and weakens least-privilege posture.
Suggested change
 permissions:
   contents: read
-  packages: write

 jobs:
+  # ... lint/test/vuln/build unchanged ...
   docker:
     name: Docker Build
     runs-on: ubuntu-latest
     needs: [lint, test, build]
+    permissions:
+      contents: read
+      packages: write
Also applies to: 61-92
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yaml around lines 9 - 11, The workflow currently grants
packages: write under the top-level permissions block which applies to every
job; narrow this by removing or setting packages to read at the top-level
permissions and instead add packages: write only to the Docker-related job
(refer to the top-level permissions section and the Docker job definition, e.g.,
the job named "docker" or any job in the 61-92 range), ensuring other jobs keep
least-privilege (packages: read) while the Docker job retains packages: write.
.github/workflows/ci.yaml-18-22 (1)
18-22: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin GitHub Actions to immutable commit SHAs

.github/workflows/ci.yaml still uses mutable action tags (not full commit SHAs), e.g. actions/checkout@v4, actions/setup-go@v5, golangci/golangci-lint-action@v6, actions/upload-artifact@v4, docker/setup-buildx-action@v3, docker/login-action@v3, docker/build-push-action@v6 (lines 18-22, 30-31, 35, 44-45, 55-56, 76-78, 84).
Suggested change pattern
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@<full-commit-sha>
Apply similarly to:

actions/setup-go

golangci/golangci-lint-action

actions/upload-artifact

docker/setup-buildx-action

docker/login-action

docker/build-push-action
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yaml around lines 18 - 22, The workflow uses mutable
action tags (e.g., actions/checkout@v4, actions/setup-go@v5,
golangci/golangci-lint-action@v6 and others) which should be pinned to immutable
commit SHAs; update each step reference in .github/workflows/ci.yaml (the steps
using actions/checkout, actions/setup-go, golangci/golangci-lint-action,
actions/upload-artifact, docker/setup-buildx-action, docker/login-action,
docker/build-push-action) to the corresponding full commit SHA for the release
you want to follow, replacing the `@vX` tags with @<commit-sha> and verify the
actions still work in CI.
internal/middleware/tracing.go-13-18 (1)
13-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix Trace ID extraction + move X-Trace-ID header write before response is served (internal/middleware/tracing.go)

trace.SpanFromContext(r.Context()) runs after otelhttp.NewHandler(...).ServeHTTP; the OTel handler adds the created span to its local request context (via r.WithContext), so the outer r.Context() won’t contain that span.

Setting X-Trace-ID after ServeHTTP can be too late if the downstream handler has already written/committed the response.
Suggested fix
 func Tracing(serviceName string) func(http.Handler) http.Handler {
 	return func(next http.Handler) http.Handler {
-		handler := otelhttp.NewHandler(next, serviceName)
-		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-			handler.ServeHTTP(w, r)
-			span := trace.SpanFromContext(r.Context())
-			if span.SpanContext().HasTraceID() {
-				w.Header().Set("X-Trace-ID", span.SpanContext().TraceID().String())
-			}
-		})
+		instrumented := otelhttp.NewHandler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			span := trace.SpanFromContext(r.Context())
+			if span.SpanContext().HasTraceID() {
+				w.Header().Set("X-Trace-ID", span.SpanContext().TraceID().String())
+			}
+			next.ServeHTTP(w, r)
+		}), serviceName)
+		return instrumented
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/middleware/tracing.go` around lines 13 - 18, The middleware is
reading the span from the wrong context and writing the X-Trace-ID after the
response may already be committed; wrap the downstream handler so you extract
the span from the request context that the downstream actually receives and set
the "X-Trace-ID" header before that handler writes the response. Concretely,
replace the current inline call that does handler.ServeHTTP(w, r) then
trace.SpanFromContext(r.Context()) with a wrapper http.HandlerFunc that, given
the incoming *http.Request r, reads span := trace.SpanFromContext(r.Context())
and if span.SpanContext().HasTraceID() sets w.Header().Set("X-Trace-ID",
span.SpanContext().TraceID().String()) immediately, then calls the original
handler.ServeHTTP(w, r); update the code paths around
ServeHTTP/trace.SpanFromContext to use this wrapper so header is written from
the same request context the handler uses.
cmd/api-gateway/main.go-190-199 (1)
190-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep backend error details out of /health.

Returning raw err text here exposes internal DB and Redis details to any caller. Log the full error, but send a generic response body.
Suggested fix
-			http.Error(w, fmt.Sprintf("DB unavailable: %v", err), http.StatusServiceUnavailable)
+			http.Error(w, "DB unavailable", http.StatusServiceUnavailable)
@@
-			http.Error(w, fmt.Sprintf("Redis unavailable: %v", err), http.StatusServiceUnavailable)
+			http.Error(w, "Redis unavailable", http.StatusServiceUnavailable)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api-gateway/main.go` around lines 190 - 199, The health handler is
returning raw backend error text in HTTP responses; instead, keep logging the
full error via log.Error but replace the response body with a generic message.
In the block using dbManager.Primary().Ping(ctx) and the block using
redisClient.GetClient().Ping(ctx), stop including err in fmt.Sprintf when
calling http.Error — send a fixed, non-sensitive string like "DB unavailable" or
"Redis unavailable" (no error details) while still logging the original err with
log.Error for diagnostics.
cmd/url-service/main.go-114-143 (1)
114-143: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close the Elasticsearch client during shutdown.

provideESClient can return a live client, but OnStop only closes Redis and Postgres. That leaves transport resources open across restarts.
Suggested fix
 func registerLifecycle(
 	lc fx.Lifecycle,
 	grpcServer *grpc.Server,
 	urlService *service.URLService,
 	listener net.Listener,
 	tp *sdktrace.TracerProvider,
 	redisClient *redis.RedisClient,
 	dbManager *database.DBManager,
+	esClient *es.Client,
 	log *logger.Logger,
 ) {
@@
 		OnStop: func(ctx context.Context) error {
 			log.Info("Shutting down url-service...")
 			grpcServer.GracefulStop()
 			tracing.ShutdownTracer(ctx, tp)
 			redisClient.Close()
 			dbManager.Close()
+			if esClient != nil {
+				esClient.Close()
+			}
 			return nil
 		},
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/url-service/main.go` around lines 114 - 143, registerLifecycle currently
doesn't close the Elasticsearch client returned by provideESClient; add an
esClient parameter (the value returned from provideESClient) to
registerLifecycle and ensure OnStop closes its transport/resources—e.g., call
the client's Close() if available or at minimum invoke
esClient.Transport.CloseIdleConnections() (and any other client-specific
shutdown API) inside the OnStop block so Elasticsearch connections are released
during shutdown.
cmd/api-gateway/main.go-273-280 (1)
273-280: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on TCP bind errors during fx startup.

OnStart spawns server.ListenAndServe() in a goroutine and then returns nil, so a “port already in use” bind failure is only logged and fx may still report a healthy startup.
Suggested fix
 		OnStart: func(ctx context.Context) error {
 			log.Info("Listening on %s", server.Addr)
+			ln, err := net.Listen("tcp", server.Addr)
+			if err != nil {
+				return err
+			}
 			go func() {
-				if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+				if err := server.Serve(ln); err != nil && err != http.ErrServerClosed {
 					log.Error("Server error: %v", err)
 				}
 			}()
 			return nil
 		},
Add the missing net import to make the change compile.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api-gateway/main.go` around lines 273 - 280, OnStart currently launches
server.ListenAndServe() in a goroutine and returns nil so bind errors are only
logged; change it to perform a TCP bind first (use net.Listen on server.Addr),
return the bind error immediately if it fails, then start the HTTP server with
server.Serve(listener) in a goroutine and keep the listener reference for
shutdown in OnStop; update the OnStart/OnStop handlers around the existing
server variable and ensure you add the missing net import so the code compiles.
cmd/user-service/main.go-99-104 (1)
99-104: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound gRPC shutdown by the lifecycle context.

grpcServer.GracefulStop() does not accept a context and will block indefinitely on stuck RPCs, causing the fx shutdown to exceed its timeout and force-kill the application.
Suggested fix
 		OnStop: func(ctx context.Context) error {
 			log.Info("Shutting down user-service...")
-			grpcServer.GracefulStop()
+			timedOut := false
+			stopped := make(chan struct{})
+			go func() {
+				grpcServer.GracefulStop()
+				close(stopped)
+			}()
+			select {
+			case <-stopped:
+			case <-ctx.Done():
+				grpcServer.Stop()
+				timedOut = true
+			}
 			tracing.ShutdownTracer(ctx, tp)
 			dbManager.Close()
+			if timedOut {
+				return ctx.Err()
+			}
 			return nil
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/user-service/main.go` around lines 99 - 104, OnStop currently calls
grpcServer.GracefulStop() directly which can block forever; instead start
GracefulStop in a goroutine and wait for either that goroutine to finish or the
provided lifecycle context (ctx) to be done, and if ctx is done first call
grpcServer.Stop() to force close; keep tracing.ShutdownTracer(ctx, tp) and
dbManager.Close() after the server shutdown sequence so they still run during
teardown. Locate the OnStop func and replace the direct
grpcServer.GracefulStop() call with a goroutine + done channel + select on
ctx.Done() to call grpcServer.Stop() if needed, ensuring graceful finish
otherwise.
cmd/redirect-service/main.go-106-113 (1)
106-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate port binding errors to Fx startup.

ListenAndServe() binds the socket inside the goroutine, so OnStart returns nil before the bind completes. Port binding failures (already in use, permission denied) are only logged—Fx considers startup successful even when the server never accepts traffic.
Suggested fix
 		OnStart: func(ctx context.Context) error {
 			log.Info("Listening on %s", server.Addr)
+			ln, err := net.Listen("tcp", server.Addr)
+			if err != nil {
+				return err
+			}
 			go func() {
-				if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+				if err := server.Serve(ln); err != nil && err != http.ErrServerClosed {
 					log.Error("Server error: %v", err)
 				}
 			}()
 			return nil
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/redirect-service/main.go` around lines 106 - 113, OnStart currently
starts server.ListenAndServe inside a goroutine so port bind errors happen
asynchronously and OnStart returns nil; change it to perform the bind
synchronously and return any bind error. Specifically, replace the goroutine
call to server.ListenAndServe with: call net.Listen("tcp", server.Addr) and if
it returns an error return that error from OnStart, then start a goroutine that
calls server.Serve(listener) (checking for err != http.ErrServerClosed and
logging). Use listener.Addr() in your log instead of server.Addr so the actual
bound address is reported; keep the existing error handling for server.Serve.
cmd/url-service/main.go-136-142 (1)
136-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound gRPC shutdown by the lifecycle context.

grpcServer.GracefulStop() ignores the ctx parameter and blocks indefinitely if any in-flight RPC is stuck, causing the fx shutdown to hang beyond the configured StopTimeout. Implement a timeout safety net as recommended by the gRPC documentation to forcefully stop the server if graceful shutdown does not complete in time.
Suggested fix
 		OnStop: func(ctx context.Context) error {
 			log.Info("Shutting down url-service...")
-			grpcServer.GracefulStop()
+			timedOut := false
+			stopped := make(chan struct{})
+			go func() {
+				grpcServer.GracefulStop()
+				close(stopped)
+			}()
+			select {
+			case <-stopped:
+			case <-ctx.Done():
+				grpcServer.Stop()
+				timedOut = true
+			}
 			tracing.ShutdownTracer(ctx, tp)
 			redisClient.Close()
 			dbManager.Close()
+			if timedOut {
+				return ctx.Err()
+			}
 			return nil
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/url-service/main.go` around lines 136 - 142, OnStop's current call to
grpcServer.GracefulStop() can block indefinitely; wrap the graceful stop in a
goroutine and wait for it with the lifecycle context (ctx) as a timeout safety
net, selecting between a done channel and ctx.Done() (or a derived timeout) and
call grpcServer.Stop() to forcefully terminate if the graceful shutdown doesn't
finish in time. Modify the OnStop func to spawn the goroutine that runs
grpcServer.GracefulStop(), wait on a done signal vs ctx.Done(), and only after
either graceful completion or forced grpcServer.Stop() proceed to call
tracing.ShutdownTracer(ctx, tp), redisClient.Close(), and dbManager.Close() so
shutdown completes reliably within fx's StopTimeout.
deployments/k8s/base/secrets.yaml-1-2 (1)
1-2: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

A warning comment doesn't stop weak secrets from shipping.

Because this manifest still lives in base, anyone applying the base kustomization can roll out the placeholder DSNs and JWT secret by accident. Move dev defaults out of base and require an overlay-specific generator or externally managed Secret instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/secrets.yaml` around lines 1 - 2, The base secrets.yaml
contains dev placeholder secrets that can be accidentally applied; remove these
defaults from the base and instead require overlay-specific secrets or an
external secret manager. Concretely: delete the dev DSNs/JWT values from
deployments/k8s/base/secrets.yaml, update the base kustomization to no longer
include a Secret generator for these values, and add overlay kustomizations
(e.g., deployments/k8s/overlays/dev) that provide a secretGenerator or a
reference to an externally-managed Secret (Vault/SOPS/SealedSecret) for the DSNs
and JWT secret; ensure code paths/configs referencing the secret name remain
unchanged so overlays supply the actual values at apply time.
deployments/k8s/base/analytics-worker/deployment.yaml-29-29 (1)
29-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't pair image: ...:latest with imagePullPolicy: IfNotPresent

In deployments/k8s/base/analytics-worker/deployment.yaml, line 29 uses image: ghcr.io/varun5711/tiny-analytics-worker:latest with imagePullPolicy: IfNotPresent, which can leave nodes running a stale cached image after rebuilds/rollouts. The same :latest + IfNotPresent combo appears in:

deployments/k8s/base/api-gateway/deployment.yaml

deployments/k8s/base/cleanup-worker/cronjob.yaml

deployments/k8s/base/jaeger/deployment.yaml

deployments/k8s/base/pipeline-worker/deployment.yaml

deployments/k8s/base/redirect-service/deployment.yaml

deployments/k8s/base/url-service/deployment.yaml

deployments/k8s/base/user-service/deployment.yaml

Switch to immutable image tags/digests (preferred) or use imagePullPolicy: Always until tags are versioned.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/analytics-worker/deployment.yaml` at line 29, The
deployment uses image: ghcr.io/varun5711/tiny-analytics-worker:latest together
with imagePullPolicy: IfNotPresent (in
deployments/k8s/base/analytics-worker/deployment.yaml), which can serve stale
cached images; update the manifest so the container image is immutable
(preferred) by replacing the :latest tag with a specific version tag or digest,
or if you must keep non-versioned tags temporarily, change imagePullPolicy from
IfNotPresent to Always; apply the same fix pattern to the other manifests listed
(api-gateway, cleanup-worker, jaeger, pipeline-worker, redirect-service,
url-service, user-service).
deployments/k8s/base/elasticsearch/statefulset.yaml-20-61 (1)
20-61: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the missing pod/container security context.

This StatefulSet is still using the default root-capable security posture while the rest of this cohort was hardened. Please align it with the same runAsNonRoot, allowPrivilegeEscalation: false, and capabilities.drop: ["ALL"] pattern before shipping this.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/elasticsearch/statefulset.yaml` around lines 20 - 61,
The StatefulSet is missing hardened pod/container security context; add a
pod-level securityContext under spec.template.spec with runAsNonRoot: true (and
optionally runAsUser to a non-root uid) and then add a container-level
securityContext inside the container with name: elasticsearch setting
allowPrivilegeEscalation: false and capabilities.drop: ["ALL"] so the
elasticsearch container and pod follow the same non-root,
no-privilege-escalation policy as the rest of the cohort.
deployments/k8s/base/network-policies.yaml-40-64 (1)
40-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allow OTLP egress to Jaeger.

These egress rules never permit traffic to app: jaeger on port 4318, so the services/workers you just instrumented for tracing will be unable to export spans once these policies are enforced.

Also applies to: 88-97, 113-121, 137-153, 169-185, 201-217
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/network-policies.yaml` around lines 40 - 64, The egress
rules are missing an allowance for OTLP/Jaeger traffic, so add an egress entry
that permits traffic to podSelector matchLabels: app: jaeger on port 4318/TCP;
update each egress list (the blocks around the existing podSelector entries such
as the ones currently allowing url-service, user-service, redis, postgresql,
clickhouse, elasticsearch and the other similar blocks at ranges 88-97, 113-121,
137-153, 169-185, 201-217) to include a to: podSelector: matchLabels: app:
jaeger with a ports: - port: 4318 protocol: TCP so instrumented services can
export OTLP spans to Jaeger.
deployments/k8s/base/jaeger/deployment.yaml-19-56 (1)
19-56: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden the Jaeger pod like the other workloads.

This deployment still runs with the default security context, so it misses the non-root / no-privilege-escalation / dropped-capabilities controls added elsewhere in this PR.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/jaeger/deployment.yaml` around lines 19 - 56, The Jaeger
container spec (containers: - name: jaeger) is missing hardened securityContext
settings; update the pod and container security contexts near that containers
block so the pod runs non-root and cannot escalate privileges — add a pod-level
spec.securityContext with runAsNonRoot: true and runAsUser (e.g. 1000) and a
container-level securityContext for the jaeger container that sets
allowPrivilegeEscalation: false, capabilities.drop: ["ALL"],
readOnlyRootFilesystem: true (and optionally seccompProfile: { type:
RuntimeDefault }) so the jaeger container has the same non-root /
no-privilege-escalation / dropped-capabilities controls as the other workloads
referenced in this PR.
internal/elasticsearch/click_index.go-96-110 (1)
96-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bulk response is not inspected for per-item failures.

res.IsError() only catches HTTP-level errors. The _bulk API returns HTTP 200 even when individual documents fail (mapping conflicts, version conflicts, etc.). With the current check, partial failures are silently swallowed and the caller treats the batch as fully indexed, leading to silent data loss. Decode the response and inspect the top-level errors flag / per-item status.
🛠️ Sketch of partial-failure handling
 	if res.IsError() {
 		return fmt.Errorf("elasticsearch bulk index error: %s", res.String())
 	}
 
+	var bulkResp struct {
+		Errors bool `json:"errors"`
+		Items  []map[string]struct {
+			Status int             `json:"status"`
+			Error  json.RawMessage `json:"error,omitempty"`
+		} `json:"items"`
+	}
+	if err := json.NewDecoder(res.Body).Decode(&bulkResp); err != nil {
+		return fmt.Errorf("failed to decode bulk response: %w", err)
+	}
+	if bulkResp.Errors {
+		return fmt.Errorf("elasticsearch bulk index had item-level failures")
+	}
+
 	return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/elasticsearch/click_index.go` around lines 96 - 110, The bulk
response handling in the Bulk call using c.es.Bulk must be extended to decode
and inspect the Elasticsearch bulk response JSON (not just res.IsError()) — read
and json.Decode res.Body into a struct that includes the top-level "errors" bool
and the per-item "items" array (e.g., map[string]map[string]interface{}), check
if response.Errors is true and iterate c.items to collect any item-level
failures (non-2xx statuses or presence of an "error" object), and return a
descriptive error (or include in processLogger) that lists failed item statuses
and error reasons; adjust the existing function around c.es.Bulk /
c.clickIndex() to close the body, decode the JSON, and fail when any per-item
error is found instead of treating HTTP 200 as success.
internal/elasticsearch/client.go-34-42 (1)
34-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Timeout context isn’t applied to the Elasticsearch connectivity check

The 10s timeout context is discarded (_ = ctx), so es.Info() runs without it and won’t respect the intended bound. Pass the context into the call (es.Info(es.Info.WithContext(ctx))).
🛠️ Proposed fix
 	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 	defer cancel()
 
-	_ = ctx
-	res, err := es.Info()
+	res, err := es.Info(es.Info.WithContext(ctx))
 	if err != nil {
 		return nil, fmt.Errorf("failed to connect to elasticsearch: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/elasticsearch/client.go` around lines 34 - 42, The timeout context
created by context.WithTimeout is being discarded (_ = ctx) so the Elasticsearch
info check doesn't respect the 10s bound; pass the context into the call by
using the context created (ctx) with es.Info (e.g., call es.Info with the
WithContext option) and keep the defer cancel() so the request uses the timeout;
update the code around ctx, cancel, and es.Info to call
es.Info(es.Info.WithContext(ctx)) instead of es.Info().
internal/elasticsearch/url_index.go-57-71 (1)
57-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter expired URLs out of Elasticsearch search results.

Every SQL list/get path in this PR excludes expired URLs, but this query does not. Once a URL expires, /api/search can keep returning it until some separate cleanup removes the document.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/elasticsearch/url_index.go` around lines 57 - 71, The Elasticsearch
query in SearchURLs currently returns expired documents; update the searchBody
in Client.SearchURLs to wrap the existing "multi_match" under a "bool" query and
add a "filter" that only allows URLs whose "expires_at" is greater than "now" or
have no "expires_at" field (i.e. not expired). Concretely, replace the top-level
"query": {"multi_match": ...} with "query": {"bool": {"must": {the multi_match
map}, "filter":
[{"bool":{"should":[{"range":{"expires_at":{"gt":"now"}}},{"bool":{"must_not":{"exists":{"field":"expires_at"}}}}]}}]}}
so searches exclude expired documents.
internal/elasticsearch/url_index.go-13-25 (1)
13-25: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't expose user_id in the public search payload.

HTTPHandler.SearchURLs writes URLSearchResult straight to the response, so reusing URLDocument here leaks internal user identifiers to API callers. Split the index model from the response DTO or omit UserID before returning search results.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/elasticsearch/url_index.go` around lines 13 - 25, The URLDocument
struct currently contains UserID and is being serialized directly by
HTTPHandler.SearchURLs, leaking internal user identifiers; create a separate
response DTO (e.g., PublicURLDocument without the UserID field) or a sanitized
projection for URLSearchResult and map each URLDocument -> PublicURLDocument (or
strip UserID) before marshalling in HTTPHandler.SearchURLs so responses never
include user_id; update URLSearchResult to hold the public DTO (or ensure
omission) and adjust any consumers accordingly.
internal/service/url_service.go-88-97 (1)
88-97: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't silently ignore Elasticsearch write failures.

If PostgreSQL succeeds and the Elasticsearch side-effect fails here, /api/search drifts with no signal or retry path. At minimum this needs structured logging plus a retry/backfill path; otherwise the request should fail when search consistency matters.

Also applies to: 217-219
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/service/url_service.go` around lines 88 - 97, The Elasticsearch
IndexURL call (s.esClient.IndexURL in internal/service/url_service.go) currently
ignores errors; change this to capture the error result and handle it: at
minimum call structured logging (include shortCode, req.UserId, createdAt,
expiresAt and the error) instead of discarding, and implement a retry/backfill
strategy — e.g., push a retry task or enqueue a backfill record when IndexURL
returns an error so a background worker can reattempt, or propagate the error to
the API caller when search consistency is required; apply the same fix for the
other silent IndexURL invocation (around the second occurrence referenced in the
comment).
cmd/analytics-worker/main.go-83-115 (1)
83-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix fx lifecycle shutdown hook registration and Redis stream pending handling

cmd/analytics-worker/main.go registers OnStop by calling lc.Append(...) inside OnStart (nested lifecycle append). Fx expects all lifecycle hooks to be appended during initialization, so this OnStop cleanup isn’t reliably part of Fx’s shutdown sequence.

processEvents uses XReadGroup(..., Streams: []string{params.StreamName, ">"}) (new messages only). When updateClickCounts fails (or a message is malformed), the code continues without XACK, leaving those entries in the consumer-group PEL indefinitely—there’s no XPENDING/XCLAIM/pending-read path to re-process them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/analytics-worker/main.go` around lines 83 - 115, The OnStop hook is being
appended inside OnStart (lc.Append within the fx.Hook's OnStart) which breaks Fx
lifecycle expectations; move the OnStop registration out of the nested append by
registering a single fx.Hook (or appending both hooks at init) that declares
OnStart and OnStop together, capture cancel and wg in variables scoped so OnStop
can call cancel(), wg.Wait(), tracing.ShutdownTracer, redisClient.Close and
dbManager.Close; additionally, update processEvents to handle pending stream
entries: before reading ">" call XPending/XPendingExt to list pending IDs for
params.ConsumerGroup and use XClaim (or XReadGroup with specific IDs) to claim
and re-process pending messages, ensure updateClickCounts always XACKs on
success and on unrecoverable/malformed messages either XACK after moving to a
DLQ or increment a retry counter and XCLAIM/XPENDING-based retry policy so
entries aren’t left indefinitely in the PEL (refer to processEvents,
updateClickCounts, XReadGroup, XACK, XPENDING, XCLAIM).
cmd/pipeline-worker/main.go-108-140 (1)
108-140: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix Fx lifecycle hook registration + simplify batch polling + prevent stuck pending messages

cmd/pipeline-worker/main.go registers the OnStop hook by calling lc.Append(fx.Hook{OnStop: ...}) inside OnStart; register both callbacks in the same lc.Append(...) block to match Fx lifecycle expectations: https://uber-go.github.io/fx/lifecycle.html.

Start waits for pollInterval via time.NewTicker(...) before calling processBatch, even though processBatch already blocks on XReadGroup with Block: w.blockTime; the outer ticker adds avoidable latency/rate-limits effective blocking.

If enrichment fails for all messages, processBatch returns before XAck (because messageIDs is only populated for successfully enriched events), leaving those entries in the consumer-group PEL; there’s no XPENDING/XCLAIM-based reclaim logic in code to retry them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/pipeline-worker/main.go` around lines 108 - 140, The Fx hooks are being
registered incorrectly (OnStop is appended inside OnStart) and the worker loop
and acking logic can leave messages stuck; change the lifecycle registration to
append a single lc.Append(...) with both OnStart and OnStop hooks (use the
existing lc.Append call that currently wraps OnStart and add OnStop there),
remove the outer time.Ticker/pollInterval in worker.Start so it relies on
XReadGroup's Block (use w.blockTime in XReadGroup and call processBatch
immediately when XReadGroup returns), and ensure processBatch always XAck every
message ID read (populate messageIDs for all messages or ack before enrichment)
so failed enrichments don't leave entries in the PEL; additionally implement
reclaim logic using Redis XPENDING/XCLAIM (or a reclaimPending function invoked
periodically from Start) to claim and reprocess stale pending messages.
cmd/cleanup-worker/main.go-78-104 (1)
78-104: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix Fx lifecycle hook registration order and invalidate expired-URL cache entries (cmd/cleanup-worker/main.go: 78-104, 123-137)

OnStop is registered inside OnStart (lc.Append(fx.Hook{ ... OnStart ... lc.Append(fx.Hook{OnStop...}) ...})); register OnStop in the same (outer) lc.Append instead of mutating lifecycle hooks during OnStart. See https://uber-go.github.io/fx/lifecycle.html

runCleanup deletes expired URLs (store.DeleteExpiredURLs) but never invalidates urlCache keys; redirect_handler serves cache.Get("url:"+shortCode) directly on cache hit without any expires_at check, so expired redirects can remain until L1 eviction and/or L2 TTL.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cleanup-worker/main.go` around lines 78 - 104, The OnStop hook must not
be appended from inside the OnStart hook; move the
lc.Append(fx.Hook{OnStop:...}) registration out so both OnStart and OnStop are
registered on the same outer lc.Append and keep OnStart creating
workerCtx/cancel and starting runCleanupLoop(workerCtx, store, urlCache, log)
while OnStop calls cancel(), wg.Wait(), tracing.ShutdownTracer,
redisClient.Close(), dbManager.Close(). Also update runCleanupLoop (and the code
path that calls store.DeleteExpiredURLs) to remove expired entries from the
in-memory cache by deleting/invalidation using urlCache.Delete or
urlCache.Invalidate for keys built the same way redirect_handler reads them
(e.g., "url:"+shortCode) immediately after deleting from the store so expired
redirects are not served from cache.

🟡 Minor comments (3)

internal/middleware/cors.go-17-21 (1)
17-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Set Vary: Origin unconditionally.

Since the Access-Control-Allow-Origin value depends on the request Origin, Vary: Origin must be emitted on every response, not only when the origin matches. Otherwise a shared cache may serve a response containing an allowed origin's CORS headers to a disallowed origin.
🔧 Proposed fix
+			w.Header().Set("Vary", "Origin")
 			if origin != "" && originSet[origin] {
 				w.Header().Set("Access-Control-Allow-Origin", origin)
 				w.Header().Set("Access-Control-Allow-Credentials", "true")
-				w.Header().Set("Vary", "Origin")
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/middleware/cors.go` around lines 17 - 21, The Vary header must be
sent on every response; update the middleware so w.Header().Set("Vary",
"Origin") is executed unconditionally (move it out of the if-block) while
leaving the conditional setting of Access-Control-Allow-Origin and
Access-Control-Allow-Credentials that use origin and originSet intact; ensure
the change touches the block that currently checks if origin != "" &&
originSet[origin] so that Vary: Origin is always added regardless of that
condition.
deployments/k8s/base/secrets.yaml-1-2 (1)
1-2: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize this file to LF line endings.

YAMLlint is already failing here with wrong new line character: expected \n, so this file will keep breaking CI until the CRLFs are removed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/secrets.yaml` around lines 1 - 2, The file
deployments/k8s/base/secrets.yaml contains CRLF line endings causing yamllint
errors; convert the file to use LF only (e.g., run a dos2unix conversion or your
editor’s "normalize to LF" option), add a .gitattributes entry (e.g., *.yaml
text eol=lf) to enforce LF on commit, verify with git status and re-run
yamllint, then commit the normalized secrets.yaml so CI stops failing on "wrong
new line character".
internal/logger/logger.go-56-58 (1)
56-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the same JSON schema for the mirrored sink.

LogSyncer.Write only understands a string timestamp, but this encoder emits Zap's default ts numeric field. That makes every mirrored log fall back to ingestion time, so Elasticsearch ordering/filtering drifts from the real event time.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/logger/logger.go` around lines 56 - 58, The mirrored JSON encoder
(created via zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig())) emits
Zap's numeric "ts" field, which LogSyncer.Write doesn't understand; update the
extra sink encoder to use the same encoder config/schema as the primary logger
so it emits a string "timestamp" and the same time encoding. Concretely, reuse
the existing encoder config (or construct one with TimeKey set to "timestamp"
and TimeEncoder set to an ISO8601/string encoder) instead of calling
zap.NewProductionEncoderConfig() directly when creating jsonEnc for extraSyncer
(see jsonEnc, extraSyncer, zapcore.NewJSONEncoder, zapcore.NewCore).

🧹 Nitpick comments (3)

.github/workflows/ci.yaml (1)
24-24: ⚡ Quick win

Avoid floating latest versions for CI tooling.

Line 24 and Line 48 use latest, which makes CI non-reproducible and can introduce surprise failures. Pin explicit versions.
Suggested change
       - uses: golangci/golangci-lint-action@v6
         with:
-          version: latest
+          version: v1.59.1 # example: pin exact version in repo policy

-      - run: go install golang.org/x/vuln/cmd/govulncheck@latest
+      - run: go install golang.org/x/vuln/cmd/govulncheck@v1.1.3 # example: pin exact version
Also applies to: 48-48
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yaml at line 24, The workflow currently uses floating
"version: latest" pins (shown as the literal token version: latest) which makes
CI non-reproducible; locate every occurrence of the string "version: latest" in
the .github/workflows/ci.yaml (including the second occurrence) and replace it
with an explicit, tested version tag or SHA for the tool/action (e.g., a
specific semver or commit SHA) appropriate for that step; ensure the chosen
version is documented in the workflow comment or repo changelog so future
updates are intentional.
go.mod (1)
7-7: Dependency security posture looks OK for the pinned versions (no returned advisory ranges match).

GitHub advisory data flags vulnerabilities only in earlier version ranges that are below what this go.mod pins:

OpenTelemetry core at go.opentelemetry.io/otel + go.opentelemetry.io/otel/sdk v1.44.0 (shown issues were patched at 1.41.0/1.43.0, with vulnerable ranges ≤1.40/≤1.42)

OpenTelemetry contrib at otelgrpc v0.69.0 and otelhttp v0.49.0 (shown issues patched at 0.46.0/0.44.0; vulnerable ranges <0.46/<0.44)

golang.org/x/crypto v0.51.0 (shown issues were vulnerable only before 0.45.0)

google.golang.org/grpc v1.81.1 (shown issues were vulnerable only before 1.79.3 and before 1.58.3)

google.golang.org/protobuf v1.36.11 (shown issues were vulnerable only before 1.33.0 / in a narrow 1.29.0–<1.29.1 window)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go.mod` at line 7, Confirm and lock the verified safe dependency versions in
the module file by ensuring the listed modules (e.g.,
github.com/elastic/go-elasticsearch/v8 v8.19.6, go.opentelemetry.io/otel
v1.44.0, go.opentelemetry.io/otel/sdk v1.44.0, golang.org/x/crypto v0.51.0,
google.golang.org/grpc v1.81.1, google.golang.org/protobuf v1.36.11) are present
and up-to-date in go.mod, run go mod tidy to refresh go.sum, commit the updated
go.mod/go.sum, and add a brief note to the repo (SECURITY.md or a comment)
summarizing that these pinned versions were audited and no advisory ranges
matched.
deployments/k8s/base/jaeger/deployment.yaml (1)
21-23: ⚡ Quick win

Pin Jaeger to a tested version or digest.

jaegertracing/all-in-one:latest makes rollouts non-deterministic, and paired with IfNotPresent it can leave different nodes running different binaries after reschedules. Please pin an explicit tag or digest here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deployments/k8s/base/jaeger/deployment.yaml` around lines 21 - 23, The jaeger
container currently uses the non-deterministic image
"jaegertracing/all-in-one:latest" (container name: jaeger, image field:
jaegertracing/all-in-one:latest, imagePullPolicy: IfNotPresent); change the
image field to a specific, tested tag or an immutable digest (for example
replace ":latest" with a pinned tag like ":1.45.0" or "sha256:...") so
deployments are deterministic, and adjust imagePullPolicy if needed to match
your update strategy.

coderabbitai · 2026-05-31T13:44:34Z

GHSA-9jj7-4m8r-rfcm

+	github.com/elastic/go-elasticsearch/v8 v8.19.6
 	github.com/golang-jwt/jwt/v5 v5.3.0
 	github.com/google/uuid v1.6.0
 	github.com/jackc/pgx/v5 v5.7.6


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check GitHub security advisories for pgx and identify the latest secure version echo "=== Checking pgx security advisories ===" gh api graphql -f query=' { securityVulnerabilities(first: 10, ecosystem: GO, package: "github.com/jackc/pgx/v5") { nodes { advisory { summary severity publishedAt identifiers { type value } } vulnerableVersionRange firstPatchedVersion { identifier } } } }' echo "" echo "=== Checking latest pgx/v5 release ===" gh api repos/jackc/pgx/releases/latest --jq '.tag_name'

Repository: Varun5711/tiny

Length of output: 1654

go.mod: upgrade github.com/jackc/pgx/v5 (v5.7.6) to a patched release

Line 10: github.com/jackc/pgx/v5 v5.7.6 is affected by GHSA-9jj7-4m8r-rfcm / CVE-2026-33816 (CRITICAL, memory-safety), patched in 5.9.0.

Line 10: also affected by GHSA-j88v-2chj-qfwx (SQL injection via placeholder confusion), patched in 5.9.2.
Upgrade to github.com/jackc/pgx/v5 v5.9.2 or newer.

🧰 Tools

🪛 OSV Scanner (2.3.8)

[CRITICAL] 10-10: github.com/jackc/pgx/v5 5.7.6: CVE-2026-33815 in github.com/jackc/pgx

(GO-2026-4771)

[CRITICAL] 10-10: github.com/jackc/pgx/v5 5.7.6: CVE-2026-33816 in github.com/jackc/pgx

(GO-2026-4772)

[CRITICAL] 10-10: github.com/jackc/pgx/v5 5.7.6: Memory-safety vulnerability in github.com/jackc/pgx/v5.

(GHSA-9jj7-4m8r-rfcm)

[CRITICAL] 10-10: github.com/jackc/pgx/v5 5.7.6: pgx: SQL Injection via placeholder confusion with dollar quoted string literals

(GHSA-j88v-2chj-qfwx)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go.mod` at line 10, Update the pgx dependency version in go.mod to a patched release: replace the current module version "github.com/jackc/pgx/v5 v5.7.6" with "github.com/jackc/pgx/v5 v5.9.2" (or newer) to address GHSA-9jj7-4m8r-rfcm / CVE-2026-33816 and GHSA-j88v-2chj-qfwx; after updating the version string run `go mod tidy` to refresh go.sum and ensure the new release is used by the build.

coderabbitai · 2026-05-31T13:44:34Z

+func (ls *LogSyncer) Flush() {
+	ls.mu.Lock()
+	if len(ls.buffer) == 0 {
+		ls.mu.Unlock()
+		return
+	}
+	entries := ls.buffer
+	ls.buffer = make([]LogEntry, 0, ls.flushSize)
+	ls.mu.Unlock()
+
+	index := ls.logIndex()
+
+	var buf bytes.Buffer
+	for _, entry := range entries {
+		meta := map[string]interface{}{
+			"index": map[string]interface{}{
+				"_index": index,
+			},
+		}
+		metaLine, _ := json.Marshal(meta)
+		buf.Write(metaLine)
+		buf.WriteByte('\n')
+
+		dataLine, _ := json.Marshal(entry)
+		buf.Write(dataLine)
+		buf.WriteByte('\n')
+	}
+
+	ls.client.es.Bulk(bytes.NewReader(buf.Bytes()))
+}


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Close the bulk response body and stop swallowing transport failures.

Bulk returns an HTTP response that must be closed. Ignoring both err and res.Body leaks connections on every flush, and bulk indexing failures are currently silent.

Suggested fix

func (ls *LogSyncer) Flush() { ls.mu.Lock() if len(ls.buffer) == 0 { ls.mu.Unlock() return } @@ - ls.client.es.Bulk(bytes.NewReader(buf.Bytes())) + res, err := ls.client.es.Bulk(bytes.NewReader(buf.Bytes())) + if err != nil { + return + } + defer res.Body.Close() + if res.IsError() { + return + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (ls *LogSyncer) Flush() {

ls.mu.Lock()

if len(ls.buffer) == 0 {

ls.mu.Unlock()

return

}

entries := ls.buffer

ls.buffer = make([]LogEntry, 0, ls.flushSize)

ls.mu.Unlock()

index := ls.logIndex()

var buf bytes.Buffer

for _, entry := range entries {

meta := map[string]interface{}{

"index": map[string]interface{}{

"_index": index,

},

}

metaLine, _ := json.Marshal(meta)

buf.Write(metaLine)

buf.WriteByte('\n')

dataLine, _ := json.Marshal(entry)

buf.Write(dataLine)

buf.WriteByte('\n')

}

ls.client.es.Bulk(bytes.NewReader(buf.Bytes()))

}

func (ls *LogSyncer) Flush() {

ls.mu.Lock()

if len(ls.buffer) == 0 {

ls.mu.Unlock()

return

}

entries := ls.buffer

ls.buffer = make([]LogEntry, 0, ls.flushSize)

ls.mu.Unlock()

index := ls.logIndex()

var buf bytes.Buffer

for _, entry := range entries {

meta := map[string]interface{}{

"index": map[string]interface{}{

"_index": index,

},

}

metaLine, _ := json.Marshal(meta)

buf.Write(metaLine)

buf.WriteByte('\n')

dataLine, _ := json.Marshal(entry)

buf.Write(dataLine)

buf.WriteByte('\n')

}

res, err := ls.client.es.Bulk(bytes.NewReader(buf.Bytes()))

if err != nil {

return

}

defer res.Body.Close()

if res.IsError() {

return

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/elasticsearch/log_index.go` around lines 107 - 136, The Flush method on LogSyncer currently ignores the result of ls.client.es.Bulk, leaking response bodies and swallowing transport errors; update LogSyncer.Flush to capture the (res, err) returned by ls.client.es.Bulk, immediately handle non-nil err (log/emit/return), ensure res.Body is closed (defer res.Body.Close() right after checking res != nil), and check the HTTP status code or response body for errors so bulk failures are surfaced (e.g. treat non-2xx as an error and log the response body). Make these changes in the LogSyncer.Flush implementation around the ls.client.es.Bulk call so failures and resource cleanup are not ignored.

Handle unchecked error returns across 15 files, replace deprecated grpc.Dial with grpc.NewClient, fix nil pointer dereference in test, and restore golangci-lint in CI with v2.1.6 goinstall mode.

Fix all defer Close/Shutdown patterns and json.Encode calls that the linter flags for unchecked error returns.

GITHUB_REPOSITORY can contain uppercase characters which Docker rejects in image tags. Use bash lowercase expansion to fix.

Varun5711 added 11 commits May 31, 2026 17:53

chore: add production dependencies

80a3977

Add uber fx, zap, opentelemetry, jaeger exporter, otelgrpc, otelhttp, and elasticsearch client libraries.

refactor: replace custom logger with zap structured logging

d645b3c

Rewrite internal/logger to wrap go.uber.org/zap. Outputs JSON in production (default) and colored text in dev (LOG_FORMAT=text). Same public API preserved so all callers work unchanged.

fix: propagate request context through storage layer

bfb6724

Update Storage interface and PostgresStorage to accept ctx from callers instead of creating context.Background() internally. Fix redirect handler to use r.Context(). Add timeout to ClickHouse startup ping.

fix: implement actual DB deletion in DeleteURL and fix pagination

ef7b509

DeleteURL now deletes from Postgres before clearing cache. Add ListPaginated/ListByUserIDPaginated with SQL LIMIT/OFFSET and COUNT(*) instead of fetching all rows and slicing in memory.

fix: security hardening

0ff22b4

- CORS: configurable allowed origins via CORS_ALLOWED_ORIGINS env var - Rate limiter: parse X-Forwarded-For properly, validate IPs - JWT: remove hardcoded fallback secret, require JWT_SECRET env var - HTTP handlers: add 1MB request body size limits

chore: docker improvements

d1994df

- Add .dockerignore to exclude .git, docs, deployments, test files - Pin alpine:latest to alpine:3.21 across all 8 Dockerfiles - Fix api-gateway HEALTHCHECK: replace broken binary flag with wget health probe

Varun5711 added 4 commits May 31, 2026 19:06

fix: CI pipeline lint and vuln check failures

14ce3b4

- Pin golangci-lint to v2.1.6 (supports Go 1.25) - Make vulnerability check informational (continue-on-error) since current findings are Go stdlib issues requiring Go version upgrade

fix: use golangci-lint-action v7 for golangci-lint v2 compatibility

a2542b4

fix: build golangci-lint from source for Go 1.25 compatibility

cc5efe9

Pre-built binary was compiled with Go 1.24 and can't lint Go 1.25 code. Using goinstall mode builds it with the project's Go 1.25.3 toolchain.

fix: replace golangci-lint with go vet for Go 1.25 compatibility

7a99744

No golangci-lint release supports Go 1.25 yet. Using go vet as the lint step until golangci-lint catches up with the Go 1.25 toolchain.

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Varun5711 added 3 commits May 31, 2026 19:23

fix: resolve all golangci-lint findings (errcheck + staticcheck)

7d41c90

Handle unchecked error returns across 15 files, replace deprecated grpc.Dial with grpc.NewClient, fix nil pointer dereference in test, and restore golangci-lint in CI with v2.1.6 goinstall mode.

fix: handle remaining unchecked error returns across codebase

e15f2cf

Fix all defer Close/Shutdown patterns and json.Encode calls that the linter flags for unchecked error returns.

fix: lowercase Docker image tags for GHCR compatibility

db1209a

GITHUB_REPOSITORY can contain uppercase characters which Docker rejects in image tags. Use bash lowercase expansion to fix.

Varun5711 merged commit 02d693d into main May 31, 2026
22 of 24 checks passed

coderabbitai Bot mentioned this pull request Jun 1, 2026

Chore/comments #12

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor#11

Refactor#11
Varun5711 merged 18 commits into
mainfrom
refactor

Varun5711 commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 31, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 31, 2026

Uh oh!

coderabbitai Bot May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Varun5711 commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Varun5711 commented May 31, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 31, 2026 •

edited

Loading