Skip to content

Redis *IS* a required part of our infra#2473

Merged
dobrac merged 1 commit intomainfrom
fix-redis
Apr 21, 2026
Merged

Redis *IS* a required part of our infra#2473
dobrac merged 1 commit intomainfrom
fix-redis

Conversation

@djeebus
Copy link
Copy Markdown
Contributor

@djeebus djeebus commented Apr 21, 2026

This reverts commit 760df97.

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 21, 2026

PR Summary

Medium Risk
Moderate risk because it changes startup/behavior when Redis is missing, potentially allowing misconfiguration to silently degrade multi-instance functionality and disable Redis-backed event delivery/peer discovery.

Overview
Redis initialization is now treated as optional in client-proxy and the orchestrator: when Redis is not configured a new ErrRedisDisabled sentinel is returned, client-proxy falls back to an in-memory sandboxes catalog (with a warning), and the orchestrator skips Redis-dependent peer registry and sandbox-event stream delivery while still failing fast on real connection errors.

Reviewed by Cursor Bugbot for commit f9b1447. Bugbot is set up for automated code reviews on this repo. Configure here.

@dobrac dobrac enabled auto-merge (squash) April 21, 2026 19:47
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9b14476ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

catalog = e2bcatalog.NewRedisSandboxCatalog(redisClient)

l.Warn(ctx, "Redis environment variable is not set, will fallback to in-memory sandboxes catalog that works only with one instance setup")
catalog = e2bcatalog.NewMemorySandboxesCatalog()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unusable in-memory catalog fallback

Falling back to NewMemorySandboxesCatalog() here makes routing fail when Redis is unset, because the client-proxy codepath only reads catalog entries (catalogResolution uses GetSandbox) and does not populate them in production (the only StoreSandbox usages under packages/client-proxy are tests). In practice this leaves the in-memory catalog empty, so requests constantly miss and return sandbox-not-found unless the gRPC auto-resume path happens to recover that sandbox.

Useful? React with 👍 / 👎.

Comment on lines +387 to 393
if err != nil && !errors.Is(err, sharedFactories.ErrRedisDisabled) {
logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
} else if err == nil {
closers = append(closers, closer{"redis client", func(context.Context) error {
return sharedFactories.CloseCleanly(redisClient)
}})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When NewRedisClient returns ErrRedisDisabled, neither branch of the if/else if at run.go:387–393 executes, so the orchestrator silently continues with redisClient == nil and zero log output. This means operators have no visibility that the orchestrator is running in a degraded mode (no peer-to-peer template resolution, no Redis sandbox event streaming), making cluster debugging significantly harder. A warning log similar to the one in client-proxy/main.go:133 should be emitted in an else branch.

Extended reasoning...

What the bug is and how it manifests

In packages/orchestrator/pkg/factories/run.go lines 387–393, after calling NewRedisClient, the error is handled with an if/else if pattern:

if err \!= nil && \!errors.Is(err, sharedFactories.ErrRedisDisabled) {
    logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
} else if err == nil {
    closers = append(closers, closer{"redis client", ...})
}

There is no else branch to handle the ErrRedisDisabled case, and no log statement is emitted for it.

The specific code path that triggers it

When neither RedisURL nor RedisClusterURL is set in the config, NewRedisClient hits the default case in redis.go and returns (nil, ErrRedisDisabled). Back in run.go: the first condition evaluates to err \!= nil (true) AND \!errors.Is(err, ErrRedisDisabled) (false) → overall false. The second condition evaluates to err == nil → false. Neither branch executes.

Why existing code doesn't prevent it

The conditional structure was written to handle three cases (fatal error, success, and disabled), but only two branches were written. The ErrRedisDisabled case falls through the entire if/else-if with no log, no action — the function simply continues with redisClient == nil.

What the impact would be

With redisClient == nil, two downstream code paths silently degrade:

  1. The peer registry/resolver stays as NopRegistry/NopResolver (the redisClient \!= nil guard at line ~400 prevents setting up Redis-backed peer discovery), meaning peer-to-peer template resolution is silently unavailable.
  2. The Redis sandbox events delivery target is never appended to sbxEventsDeliveryTargets (behind another redisClient \!= nil guard at line ~454). Operators expecting Redis event streaming will see no events.

Neither degradation produces any log output, making it impossible to diagnose without source-level inspection.

How to fix it

Add an else branch after the else if err == nil block that emits a warning log, mirroring the pattern already in client-proxy/main.go:133:

} else {
    // ErrRedisDisabled: no Redis URL configured
    logger.L().Warn(ctx, "Redis is not configured, running without peer registry and Redis event streaming")
}

Step-by-step proof

  1. Operator deploys orchestrator without setting REDIS_URL or REDIS_CLUSTER_URL.
  2. NewRedisClient returns (nil, ErrRedisDisabled).
  3. run.go checks err \!= nil && \!errors.Is(err, ErrRedisDisabled)true && falsefalse. Fatal not triggered.
  4. Checks err == nilfalse. Closer not appended, but also no log.
  5. redisClient is nil. Peer registry stays NopRegistry, Redis event delivery is skipped.
  6. Orchestrator starts cleanly with zero log lines indicating Redis is absent.
  7. Operator sees sandboxes not resolving templates from peers or events missing in Redis streams, with no log clue pointing to missing Redis config.

@dobrac dobrac merged commit 97b9269 into main Apr 21, 2026
47 checks passed
@dobrac dobrac deleted the fix-redis branch April 21, 2026 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants