Conversation
This reverts commit 760df97.
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit f9b1447. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9b14476ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| catalog = e2bcatalog.NewRedisSandboxCatalog(redisClient) | ||
|
|
||
| l.Warn(ctx, "Redis environment variable is not set, will fallback to in-memory sandboxes catalog that works only with one instance setup") | ||
| catalog = e2bcatalog.NewMemorySandboxesCatalog() |
There was a problem hiding this comment.
Remove unusable in-memory catalog fallback
Falling back to NewMemorySandboxesCatalog() here makes routing fail when Redis is unset, because the client-proxy codepath only reads catalog entries (catalogResolution uses GetSandbox) and does not populate them in production (the only StoreSandbox usages under packages/client-proxy are tests). In practice this leaves the in-memory catalog empty, so requests constantly miss and return sandbox-not-found unless the gRPC auto-resume path happens to recover that sandbox.
Useful? React with 👍 / 👎.
| if err != nil && !errors.Is(err, sharedFactories.ErrRedisDisabled) { | ||
| logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err)) | ||
| } else if err == nil { | ||
| closers = append(closers, closer{"redis client", func(context.Context) error { | ||
| return sharedFactories.CloseCleanly(redisClient) | ||
| }}) | ||
| } |
There was a problem hiding this comment.
🔴 When NewRedisClient returns ErrRedisDisabled, neither branch of the if/else if at run.go:387–393 executes, so the orchestrator silently continues with redisClient == nil and zero log output. This means operators have no visibility that the orchestrator is running in a degraded mode (no peer-to-peer template resolution, no Redis sandbox event streaming), making cluster debugging significantly harder. A warning log similar to the one in client-proxy/main.go:133 should be emitted in an else branch.
Extended reasoning...
What the bug is and how it manifests
In packages/orchestrator/pkg/factories/run.go lines 387–393, after calling NewRedisClient, the error is handled with an if/else if pattern:
if err \!= nil && \!errors.Is(err, sharedFactories.ErrRedisDisabled) {
logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
} else if err == nil {
closers = append(closers, closer{"redis client", ...})
}There is no else branch to handle the ErrRedisDisabled case, and no log statement is emitted for it.
The specific code path that triggers it
When neither RedisURL nor RedisClusterURL is set in the config, NewRedisClient hits the default case in redis.go and returns (nil, ErrRedisDisabled). Back in run.go: the first condition evaluates to err \!= nil (true) AND \!errors.Is(err, ErrRedisDisabled) (false) → overall false. The second condition evaluates to err == nil → false. Neither branch executes.
Why existing code doesn't prevent it
The conditional structure was written to handle three cases (fatal error, success, and disabled), but only two branches were written. The ErrRedisDisabled case falls through the entire if/else-if with no log, no action — the function simply continues with redisClient == nil.
What the impact would be
With redisClient == nil, two downstream code paths silently degrade:
- The peer registry/resolver stays as
NopRegistry/NopResolver(theredisClient \!= nilguard at line ~400 prevents setting up Redis-backed peer discovery), meaning peer-to-peer template resolution is silently unavailable. - The Redis sandbox events delivery target is never appended to
sbxEventsDeliveryTargets(behind anotherredisClient \!= nilguard at line ~454). Operators expecting Redis event streaming will see no events.
Neither degradation produces any log output, making it impossible to diagnose without source-level inspection.
How to fix it
Add an else branch after the else if err == nil block that emits a warning log, mirroring the pattern already in client-proxy/main.go:133:
} else {
// ErrRedisDisabled: no Redis URL configured
logger.L().Warn(ctx, "Redis is not configured, running without peer registry and Redis event streaming")
}Step-by-step proof
- Operator deploys orchestrator without setting
REDIS_URLorREDIS_CLUSTER_URL. NewRedisClientreturns(nil, ErrRedisDisabled).run.gocheckserr \!= nil && \!errors.Is(err, ErrRedisDisabled)→true && false→false. Fatal not triggered.- Checks
err == nil→false. Closer not appended, but also no log. redisClientisnil. Peer registry staysNopRegistry, Redis event delivery is skipped.- Orchestrator starts cleanly with zero log lines indicating Redis is absent.
- Operator sees sandboxes not resolving templates from peers or events missing in Redis streams, with no log clue pointing to missing Redis config.
This reverts commit 760df97.