- 
                Notifications
    You must be signed in to change notification settings 
- Fork 0
Redis failover detection on app launch #361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 💰 Infracost reportMonthly estimate generatedThis comment will be updated when code changes. | 
| WalkthroughReplaces synchronous Redis construction with an async factory that probes primary and fallback Redis URLs (classifying as read-write/read-only/down), selects the best client, adds  Changes
 Sequence Diagram(s)sequenceDiagram
    participant Caller as Caller
    participant Factory as createRedisModule
    participant Primary as Primary Redis
    participant Fallback as Fallback Redis
    participant Client as Redis Client
    Caller->>Factory: createRedisModule(primaryUrl, fallbackUrl, logger)
    Factory->>Primary: checkRedisMode(primaryUrl)
    Primary-->>Factory: mode (read-write|read-only|down)
    alt primary read-write
        Factory->>Client: connect to Primary (read-write)
    else primary read-only
        Factory->>Client: connect to Primary (read-only) and log warning
    else primary down
        Factory->>Fallback: checkRedisMode(fallbackUrl)
        Fallback-->>Factory: mode (read-write|read-only|down)
        alt fallback read-write
            Factory->>Client: connect to Fallback (read-write) and log warning
        else fallback read-only
            Factory->>Client: connect to Fallback (read-only) and log error
        else both down
            Factory->>Client: connect to Primary (best-effort) and log error
        end
    end
    Client-->>Caller: Redis client instance
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 
 Pre-merge checks and finishing touches❌ Failed checks (1 warning)
 ✅ Passed checks (2 passed)
 ✨ Finishing touches
 🧪 Generate unit tests (beta)
 Comment  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (5)
- src/api/index.ts(2 hunks)
- src/api/redis.ts(1 hunks)
- src/api/sqs/handlers/createOrgGithubTeam.ts(2 hunks)
- src/api/sqs/handlers/provisionNewMember.ts(2 hunks)
- src/common/config.ts(1 hunks)
🧰 Additional context used
🪛 ESLint
src/api/sqs/handlers/createOrgGithubTeam.ts
[error] 23-23: Unexpected use of file extension "js" for "api/redis.js"
(import/extensions)
src/api/sqs/handlers/provisionNewMember.ts
[error] 16-16: Unexpected use of file extension "js" for "api/redis.js"
(import/extensions)
src/api/redis.ts
[error] 1-1: Resolve error: EACCES: permission denied, open '/JVkhMHfHlX'
at Object.writeFileSync (node:fs:2409:20)
at l (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:13685)
at createFilesMatcher (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:14437)
at Object.resolve (/home/jailuser/git/node_modules/eslint-import-resolver-typescript/lib/index.cjs:298:107)
at withResolver (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:180:23)
at fullResolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:201:22)
at relative (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:217:10)
at resolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:233:12)
at checkFileExtension (/home/jailuser/git/node_modules/eslint-plugin-import/lib/rules/extensions.js:205:53)
at checkSourceValue (/home/jailuser/git/node_modules/eslint-module-utils/moduleVisitor.js:32:5)
(import/extensions)
[error] 2-2: Unexpected use of file extension "js" for "./types.js"
(import/extensions)
src/api/index.ts
[error] 63-63: Unexpected use of file extension "js" for "./redis.js"
(import/extensions)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Unit Tests
🔇 Additional comments (4)
src/common/config.ts (1)
192-192: Ensure secrets ship the fallback URLPlease double-check that every populated Secrets Manager payload now contains
fallback_redis_url;createRedisModuletreats this as required, and if a secret is missing the field ioredis will quietly connect to 127.0.0.1:6379 instead of the intended secondary endpoint.src/api/sqs/handlers/provisionNewMember.ts (1)
34-38: Async factory swap looks solidThe handler now runs through the shared failover path before touching Redis, which matches the PR intent.
src/api/sqs/handlers/createOrgGithubTeam.ts (1)
32-36: Good hook into the new Redis factoryThis keeps the lock acquisition logic aligned with the centralized failover checks.
src/api/index.ts (1)
312-316: Nice reuse during bootstrapHaving the Fastify instance go through
createRedisModuleensures the shared client respects the same health-check logic as the SQS handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/api/redis.ts (2)
15-21: Consider logging write errors for observability.The write error is silently caught when classifying a read-only instance. While the classification is correct, logging the error would help with debugging and understanding why an instance is read-only.
Apply this diff:
try { await testModule.set(testKey, "test", "EX", 1); return "read-write"; - } catch (writeError) { + } catch (writeError) { + // Instance is read-only (e.g., READONLY error from Redis) return "read-only"; }
37-43: Add error handling to the non-blocking health check.The non-blocking fallback health check on line 42 doesn't handle promise rejection, which could lead to unhandled promise rejection warnings in Node.js.
Apply this diff:
if (Math.random() < 0.01) { // Upstash will complain if we never hit the instance // Fire a hit every so often on the fallback // Don't block on it - // Given we create a client once ever hour, we should hit the node at least once every 4 days at least - checkRedisMode(fallbackUrl); + // Given we create a client once every hour, we should hit the node at least once every 4 days + checkRedisMode(fallbackUrl).catch(() => {}); }Note: Also fixed typo "ever hour" → "every hour" and removed redundant "at least" duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
- src/api/redis.ts(1 hunks)
🧰 Additional context used
🪛 ESLint
src/api/redis.ts
[error] 1-1: Resolve error: EACCES: permission denied, open '/bbjemSanEn'
at Object.writeFileSync (node:fs:2409:20)
at l (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:13685)
at createFilesMatcher (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:14437)
at Object.resolve (/home/jailuser/git/node_modules/eslint-import-resolver-typescript/lib/index.cjs:298:107)
at withResolver (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:180:23)
at fullResolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:201:22)
at relative (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:217:10)
at resolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:233:12)
at checkFileExtension (/home/jailuser/git/node_modules/eslint-plugin-import/lib/rules/extensions.js:205:53)
at checkSourceValue (/home/jailuser/git/node_modules/eslint-module-utils/moduleVisitor.js:32:5)
(import/extensions)
[error] 2-2: Unexpected use of file extension "js" for "./types.js"
(import/extensions)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Unit Tests
- GitHub Check: Build Application
🔇 Additional comments (2)
src/api/redis.ts (2)
1-4: LGTM!The imports and type definition are clean. The
RedisModetype clearly captures the three states of a Redis instance.
31-76: The failover logic correctly addresses the previous review concern.The control flow now properly fails with the primary when
primaryMode === "down"andfallbackMode === "read-only", rather than returning a read-only fallback client that would cause write operations to fail. The decision tree correctly implements the PR objectives:
- Primary read-write → use primary
- Primary impaired + fallback read-write → use fallback
- Both impaired → fail with primary
| if (fallbackMode === "read-only") { | ||
| logger.error( | ||
| "Primary Redis is down and Fallback Redis is down, using primary anyway", | ||
| ); | ||
| return new RedisModule.default(primaryUrl); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect log message.
The log message states "Fallback Redis is down" but this branch is entered when fallbackMode === "read-only", not "down". This will mislead engineers during incidents when they're trying to understand the Redis failover state.
Apply this diff:
   if (fallbackMode === "read-only") {
     logger.error(
-      "Primary Redis is down and Fallback Redis is down, using primary anyway",
+      "Primary Redis is down and fallback is read-only; failing with primary to surface write failures immediately",
     );
     return new RedisModule.default(primaryUrl);
   }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (fallbackMode === "read-only") { | |
| logger.error( | |
| "Primary Redis is down and Fallback Redis is down, using primary anyway", | |
| ); | |
| return new RedisModule.default(primaryUrl); | |
| } | |
| if (fallbackMode === "read-only") { | |
| logger.error( | |
| "Primary Redis is down and fallback is read-only; failing with primary to surface write failures immediately", | |
| ); | |
| return new RedisModule.default(primaryUrl); | |
| } | 
🤖 Prompt for AI Agents
In src/api/redis.ts around lines 65 to 70, the error message incorrectly says
"Fallback Redis is down" while the branch is taken when fallbackMode ===
"read-only"; update the log text to accurately reflect the state (e.g., "Primary
Redis is down and Fallback Redis is in read-only mode, using primary anyway") so
engineers are not misled during incidents; simply replace the logger.error
string with the corrected message and keep the existing return new
RedisModule.default(primaryUrl) behavior.
Check to see if the primary is available (writes in us-east-2). If primary is in read-only mode, move to secondary. If both are impaired, fail to primary.
Summary by CodeRabbit
Chores
Reliability