feat: add ConnectWithRetry function for database connections with retry logic #178

Merged
phev8 merged 3 commits into main from db-connection-at-init-retry-or-panic
Jun 12, 2026
Conversation

phev8 (Contributor) commented Jun 11, 2026

Summary by CodeRabbit

  • New Features

    • Database connections now support automatic retry logic with configurable attempts and retry delays to improve startup resilience
    • Improved startup reliability when external databases are slow or temporarily unavailable
  • Refactor

    • Database initialization unified to use a single retry mechanism across services
    • Connection failures now consistently halt service startup for predictable behavior
    • Enhanced error logging with detailed context for connection diagnostics

greptile-apps Bot commented Jun 11, 2026

Greptile Summary

This PR introduces a generic ConnectWithRetry utility that retries each database connection up to 10 times with a fixed 10-second delay between attempts, improving startup resilience when databases are temporarily unavailable. It also fixes two pre-existing bugs: a mongo client resource leak when Ping failed (the client was abandoned without Disconnect), and silent startup success when ParticipantUserDB or GlobalInfosDB failed to connect (the old bare return left those service pointers nil, causing later panics with no context).

  • pkg/db/utils.go: New generic ConnectWithRetry[T] with a maxAttempts <= 0 guard, per-attempt structured error logging, and no sleep after the final attempt.
  • pkg/db/*/db.go: All five constructors now call dbClient.Disconnect(context.Background()) before returning a Ping error, preventing connection pool leaks on each retry.
  • services/*/init.go: Both services uniformly wrap every DB init with ConnectWithRetry and replace silent return with panic(err), so any connection failure halts startup with a clear message.

Confidence Score: 5/5

Safe to merge — the changes fix two real pre-existing bugs and add well-guarded retry logic with no new defects introduced.

The retry loop is correctly implemented, the mongo client leak on Ping failure is properly closed, and the silent nil-service bug in both init files is resolved by switching to panic. No new error paths are introduced that could leave the service in a broken state.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| pkg/db/utils.go | Adds ConnectWithRetry generic function with proper maxAttempts guard, per-attempt error logging, and skip-sleep-on-last-attempt logic. Clean implementation. |
| pkg/db/global-infos/db.go | Adds Disconnect call on Ping failure to fix a resource leak where the mongo client was abandoned without cleanup. |
| pkg/db/management-user/db.go | Same Disconnect-on-Ping-failure fix as the other DB constructors. |
| pkg/db/messaging/db.go | Same Disconnect-on-Ping-failure fix as the other DB constructors. |
| pkg/db/participant-user/db.go | Same Disconnect-on-Ping-failure fix as the other DB constructors. |
| pkg/db/study/db.go | Same Disconnect-on-Ping-failure fix as the other DB constructors. |
| services/management-api/init.go | Wraps all DB connections with ConnectWithRetry; changes silent return on ParticipantUserDB/GlobalInfosDB failure to panic, making startup consistently fail-fast on any DB connection error. |
| services/participant-api/init.go | Same ConnectWithRetry adoption and return-to-panic fix as management-api/init.go. |

Sequence Diagram

sequenceDiagram
    participant Init as initDBs()
    participant CWR as ConnectWithRetry
    participant Ctor as NewXxxDBService
    participant Mongo as MongoDB

    Init->>CWR: ConnectWithRetry(label, 10, 10s, func)
    loop attempt 1..maxAttempts
        CWR->>Ctor: connect()
        Ctor->>Mongo: mongo.Connect()
        Ctor->>Mongo: Ping()
        alt Ping success
            Mongo-->>Ctor: OK
            Ctor-->>CWR: (service, nil)
            CWR-->>Init: (service, nil)
        else Ping failure
            Mongo-->>Ctor: error
            Ctor->>Mongo: Disconnect(context.Background())
            Ctor-->>CWR: (nil, err)
            CWR->>CWR: slog.Error(attempt, maxAttempts, err)
            alt "attempt < maxAttempts"
                CWR->>CWR: time.Sleep(retryDelay)
            end
        end
    end
    alt all attempts exhausted
        CWR-->>Init: (nil, lastErr)
        Init->>Init: slog.Error + panic(err)
    end

Reviews (3): Last reviewed commit: "fix: ensure database disconnection on er..." | Re-trigger Greptile

Comment thread pkg/db/utils.go
Comment thread services/management-api/init.go Outdated
Comment on lines 126 to 132
	muDBService, err = db.ConnectWithRetry("Management User DB", dbConnectMaxAttempts, dbConnectRetryDelay, func() (*muDB.ManagementUserDBService, error) {
		return muDB.NewManagementUserDBService(db.DBConfigFromYamlObj(conf.DBConfigs.ManagementUserDB, conf.AllowedInstanceIDs))
	})
	if err != nil {
		slog.Error("Error connecting to Management User DB", slog.String("error", err.Error()))
		panic(err)
	}


P2 ConnectWithRetry already logs an error message (with dbLabel, attempt count, and the error string) for every failed attempt including the last one. The slog.Error call in the caller immediately before panic(err) duplicates that final log entry, producing two nearly-identical error lines for the same failure. The same pattern appears for all five connections in this file and the four in participant-api/init.go.

Suggested change

-	muDBService, err = db.ConnectWithRetry("Management User DB", dbConnectMaxAttempts, dbConnectRetryDelay, func() (*muDB.ManagementUserDBService, error) {
-		return muDB.NewManagementUserDBService(db.DBConfigFromYamlObj(conf.DBConfigs.ManagementUserDB, conf.AllowedInstanceIDs))
-	})
-	if err != nil {
-		slog.Error("Error connecting to Management User DB", slog.String("error", err.Error()))
-		panic(err)
-	}
+	muDBService, err = db.ConnectWithRetry("Management User DB", dbConnectMaxAttempts, dbConnectRetryDelay, func() (*muDB.ManagementUserDBService, error) {
+		return muDB.NewManagementUserDBService(db.DBConfigFromYamlObj(conf.DBConfigs.ManagementUserDB, conf.AllowedInstanceIDs))
+	})
+	if err != nil {
+		panic(err)
+	}
coderabbitai Bot commented Jun 11, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4a3ff6b5-43c7-4efd-893d-d8f879953ea6

📥 Commits

Reviewing files that changed from the base of the PR and between 2059ce1 and b1d3235.

📒 Files selected for processing (5)
  • pkg/db/global-infos/db.go
  • pkg/db/management-user/db.go
  • pkg/db/messaging/db.go
  • pkg/db/participant-user/db.go
  • pkg/db/study/db.go
Walkthrough

A generic retry helper ConnectWithRetry is added in pkg/db/utils.go and used by management-api and participant-api init routines to establish DB connections with configurable attempts and delays; init code now logs errors and panics on connection failure after retries.

Changes

Database Connection Retry

| Layer / File(s) | Summary |
| --- | --- |
| Retry utility function (pkg/db/utils.go) | Adds fmt, log/slog, and time imports; defines DefaultConnectMaxAttempts and DefaultConnectRetryDelay; introduces ConnectWithRetry[T any](dbLabel string, maxAttempts int, retryDelay time.Duration, connect func() (T, error)) (T, error) which validates maxAttempts, retries the connect callback up to maxAttempts with time.Sleep(retryDelay), logs structured errors (db, attempt, maxAttempts, error) and logs an info on success when retries occurred, and returns the final (result, err). |
| Management API initialization update (services/management-api/init.go) | Refactors initDBs() to obtain each DB service via db.ConnectWithRetry using db.DefaultConnectMaxAttempts and db.DefaultConnectRetryDelay; connection failures are logged with slog.Error and then panic(err) after retries. |
| Participant API initialization update (services/participant-api/init.go) | Refactors initDBs() to establish Study, Participant User, Global Infos, and Messaging DBs via db.ConnectWithRetry with default retry parameters; on connection errors each failure is logged and panic(err) is invoked after retries. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble logs beneath the moonlit sky,
Retrying softly when connections cry,
Sleep a beat, then try once more,
Till DB greets my hopping door,
A cheerful ping — we’re live, oh my!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: introducing a new ConnectWithRetry function with retry logic for database connections, which is the central feature added in pkg/db/utils.go and applied across multiple service initializers. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
services/management-api/init.go (1)

55-58: ⚡ Quick win

Centralize retry configuration constants.

The retry constants dbConnectMaxAttempts and dbConnectRetryDelay are duplicated across both service initialization files. Consider moving them to pkg/db/utils.go as package-level constants or exports to establish a single source of truth and prevent configuration drift.

♻️ Proposed refactor

In pkg/db/utils.go:

 package db

 import (
 	"context"
 	"errors"
 	"log/slog"
 	"time"

 	"go.mongodb.org/mongo-driver/bson"
 	"go.mongodb.org/mongo-driver/mongo"
 )

+const (
+	DefaultConnectMaxAttempts = 10
+	DefaultConnectRetryDelay  = 10 * time.Second
+)
+
 func ListCollectionIndexes(ctx context.Context, collection *mongo.Collection) ([]bson.M, error) {

Then in both init files, replace the local constants with db.DefaultConnectMaxAttempts and db.DefaultConnectRetryDelay.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/management-api/init.go` around lines 55 - 58, The two local retry
constants dbConnectMaxAttempts and dbConnectRetryDelay are duplicated; move them
into pkg/db/utils.go as exported package-level constants (e.g.,
DefaultConnectMaxAttempts int = 10 and DefaultConnectRetryDelay = 10 *
time.Second) and update the references in services/management-api init (and the
other init file) to use db.DefaultConnectMaxAttempts and
db.DefaultConnectRetryDelay; ensure the exported names and types match usages
(int and time.Duration) and update imports to include the pkg/db package.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 276fbe54-b660-4491-99db-6268dd8f39db

📥 Commits

Reviewing files that changed from the base of the PR and between 08cdbde and a70f3cb.

📒 Files selected for processing (3)
  • pkg/db/utils.go
  • services/management-api/init.go
  • services/participant-api/init.go

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
services/management-api/init.go (1)

121-159: 💤 Low value

Duplicate logging: slog.Error calls before panic are redundant (affects all DB connections in both services).

db.ConnectWithRetry already logs each connection failure with structured details (db name, attempt number, maxAttempts, error) at each retry. The additional slog.Error calls in services/management-api/init.go (lines 125, 133, 141, 149, 157) and services/participant-api/init.go (lines 315, 323, 331, 339) before each panic(err) create duplicate log entries for the same error.

Since the panic stack trace will also include the error message, these explicit logging calls could be removed without losing diagnostic information. This is a minor maintainability improvement to reduce log noise.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/management-api/init.go` around lines 121 - 159, Remove the redundant
slog.Error calls that immediately precede panics after db.ConnectWithRetry fails
for each DB init (muDBService, messagingDBService, studyDBService,
participantUserDBService, globalInfosDBService); db.ConnectWithRetry already
logs structured retry/failure details, and the panic(err) will surface the
error, so delete the slog.Error(...) lines in the initialization blocks that
call db.ConnectWithRetry to avoid duplicate log entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/management-api/init.go`:
- Around line 121-127: The New*DBService constructors (e.g.,
NewManagementUserDBService, and the equivalents in pkg/db/messaging/db.go,
pkg/db/study/db.go, pkg/db/participant-user/db.go, pkg/db/global-infos/db.go)
currently call mongo.Connect and then return an error when dbClient.Ping fails
but do not call dbClient.Disconnect, leaking clients across retry attempts;
update each constructor so that if Ping (or any subsequent client initialization
step) returns an error you call dbClient.Disconnect(context.Background()) before
returning the error, ensuring the partially-created mongo client is cleaned up
on all failure paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f06bcf0d-a4e9-49cc-8dc0-9dfe2a58add4

📥 Commits

Reviewing files that changed from the base of the PR and between a70f3cb and 2059ce1.

📒 Files selected for processing (3)
  • pkg/db/utils.go
  • services/management-api/init.go
  • services/participant-api/init.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/db/utils.go

Comment thread services/management-api/init.go
phev8 merged commit 7dc4685 into main Jun 12, 2026
1 check passed
phev8 deleted the db-connection-at-init-retry-or-panic branch June 12, 2026 06:54