adapter: properly retry when LD client initialization fails #32030

teskje · 2025-03-27T16:57:54Z

Previously, the code initializing the LD client correctly awaited `initialized_async` to check whether initialization succeeded. If it did not succeed, however, the code would simply wait a bit and then call `initialized_async` again. Reading the LD server SDK code, there is no reason to expect a repeated call to return a different result.

This PR changes the logic to call `start_with_default_executor` again when `initialized_async` reports failure, so that each retry attempts a new initialization. It also switches to the mz-ore `Retry` type instead of hand-rolled retry logic.
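To illustrate the difference between re-polling and re-starting, here is a minimal synchronous sketch. All names in it (`FlagClient`, `FlakyClient`, `init_with_retry`, `init_polling_only`) are hypothetical: `start` stands in for `start_with_default_executor`, `initialized` stands in for `initialized_async`, and the hand-written exponential backoff stands in for mz-ore's `Retry`, whose actual API is not reproduced here. The real SDK calls are async, and this sketch re-starts the same client object, which may differ from what the actual change does.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the LD client. `start` plays the role of
/// `start_with_default_executor`, `initialized` that of `initialized_async`.
trait FlagClient {
    /// Begin a fresh initialization attempt.
    fn start(&mut self);
    /// Report whether the most recent attempt succeeded.
    fn initialized(&self) -> bool;
}

/// Retry with exponential backoff. Every failed attempt is followed by a new
/// `start()`, not just another `initialized()` poll.
/// Returns the number of attempts it took.
fn init_with_retry<C: FlagClient>(
    client: &mut C,
    max_attempts: u32,
    initial_backoff: Duration,
    sleep: impl Fn(Duration),
) -> Result<u32, String> {
    let mut backoff = initial_backoff;
    for attempt in 1..=max_attempts {
        client.start();
        if client.initialized() {
            return Ok(attempt);
        }
        if attempt < max_attempts {
            sleep(backoff);
            backoff = backoff.saturating_mul(2);
        }
    }
    Err(format!("client failed to initialize after {max_attempts} attempts"))
}

/// The pattern described as buggy: start once, then only re-poll.
/// (Bounded here for demonstration; the original loop had no upper bound.)
fn init_polling_only<C: FlagClient>(client: &mut C, max_polls: u32) -> Result<u32, String> {
    client.start();
    for poll in 1..=max_polls {
        if client.initialized() {
            return Ok(poll);
        }
    }
    Err(format!("still uninitialized after {max_polls} polls"))
}

/// Test double: initialization fails until the `succeed_on`-th `start`.
struct FlakyClient {
    starts: u32,
    succeed_on: u32,
    ok: bool,
}

impl FlagClient for FlakyClient {
    fn start(&mut self) {
        self.starts += 1;
        self.ok = self.starts >= self.succeed_on;
    }
    fn initialized(&self) -> bool {
        self.ok
    }
}
```

With a client that only succeeds on its second start, `init_polling_only` never succeeds, while `init_with_retry` recovers. In real code the `sleep` argument would be an actual (async) sleep; injecting it here just keeps the sketch testable. The backoff values are arbitrary.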

Motivation

This PR fixes a previously unreported bug.

If initializing the LD client fails the first time, the code gets stuck in the retry loop forever, because every repeated `initialized_async` call keeps reporting the same failure.

Tips for reviewer

I only stumbled over this while reading the code, so I might be missing something. Please check my work!

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

teskje · 2025-03-28T11:42:07Z

src/dyncfg-launchdarkly/src/lib.rs

        if tokio::time::timeout(config_sync_timeout, init)
            .await
            .is_err()


The timeout error handling here also looks very suspicious to me. We just log an INFO event, but the LD sync might be broken. Shouldn't we at least return an error?

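A minimal sketch of treating the sync timeout as an error rather than an INFO log, using only `std`. It assumes, hypothetically, that sync completion is signalled over an `mpsc` channel; the actual code wraps an async future in `tokio::time::timeout`, and `SyncError` and `await_initial_sync` are illustrative names, not part of the codebase.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type for a failed initial config sync.
#[derive(Debug, PartialEq)]
enum SyncError {
    /// The sync did not complete within the configured timeout.
    Timeout(Duration),
    /// The sender driving the sync went away without signalling completion.
    Disconnected,
}

/// Wait for the initial config sync and propagate a timeout as an error
/// instead of only logging it. `done` is a hypothetical completion signal.
fn await_initial_sync(
    done: &mpsc::Receiver<()>,
    config_sync_timeout: Duration,
) -> Result<(), SyncError> {
    match done.recv_timeout(config_sync_timeout) {
        Ok(()) => Ok(()),
        Err(RecvTimeoutError::Timeout) => Err(SyncError::Timeout(config_sync_timeout)),
        Err(RecvTimeoutError::Disconnected) => Err(SyncError::Disconnected),
    }
}
```

Returning an error leaves the decision (fail startup, retry, or fall back to defaults) to the caller, instead of silently continuing with a possibly broken sync.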


ParkMyCar

Sorry for the delay on this one @teskje, thanks for making the change though!

ParkMyCar · 2025-04-02T19:10:01Z

+1 it should be an error

teskje marked this pull request as ready for review March 28, 2025 10:11

teskje requested a review from a team as a code owner March 28, 2025 10:11

teskje requested a review from ParkMyCar March 28, 2025 10:11

teskje force-pushed the retry-ld-init branch from 233700f to 2cd6000 March 28, 2025 11:40

teskje commented Mar 28, 2025

teskje force-pushed the retry-ld-init branch from 2cd6000 to 7519a4d March 28, 2025 16:01

ParkMyCar approved these changes Apr 2, 2025
