Skip to content

Conversation

@devksingh4
Copy link
Member

@devksingh4 devksingh4 commented Oct 29, 2025

Check to see if the primary is available (writes in us-east-2). If primary is in read-only mode, move to secondary. If both are impaired, fail to primary.

Summary by CodeRabbit

  • Chores

    • Added fallback Redis URL to configuration.
    • Switched to an asynchronous, logged Redis client creation used across services.
  • Reliability

    • Improved Redis connectivity with automatic selection between primary and fallback (read-write / read-only / down), better logging, periodic probes, and safer cleanup handling.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 29, 2025

💰 Infracost report

Monthly estimate generated

This comment will be updated when code changes.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 29, 2025

Walkthrough

Replaces synchronous Redis construction with an async factory that probes primary and fallback Redis URLs (classifying as read-write/read-only/down), selects the best client, adds fallback_redis_url to config, and updates call sites to await createRedisModule(...).

Changes

Cohort / File(s) Summary
Configuration Schema
src/common/config.ts
Added fallback_redis_url: string to the SecretConfig type
New Redis Module
src/api/redis.ts
Added createRedisModule(primaryUrl, fallbackUrl, logger): Promise<RedisModule.default> which probes instances via checkRedisMode, classifies modes (read-write/read-only/down), and implements fallback selection, logging, and cleanup
API Initialization
src/api/index.ts
Replaced direct new Redis(...) instantiation with await createRedisModule(app.secretConfig.redis_url, app.secretConfig.fallback_redis_url, app.log) and updated imports
SQS Handler Updates
src/api/sqs/handlers/createOrgGithubTeam.ts, src/api/sqs/handlers/provisionNewMember.ts
Replaced synchronous new Redis(...) with await createRedisModule(secretConfig.redis_url, secretConfig.fallback_redis_url, logger) and added createRedisModule import; maintained existing cleanup patterns

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Factory as createRedisModule
    participant Primary as Primary Redis
    participant Fallback as Fallback Redis
    participant Client as Redis Client

    Caller->>Factory: createRedisModule(primaryUrl, fallbackUrl, logger)
    Factory->>Primary: checkRedisMode(primaryUrl)
    Primary-->>Factory: mode (read-write|read-only|down)

    alt primary read-write
        Factory->>Client: connect to Primary (read-write)
    else primary read-only
        Factory->>Client: connect to Primary (read-only) and log warning
    else primary down
        Factory->>Fallback: checkRedisMode(fallbackUrl)
        Fallback-->>Factory: mode (read-write|read-only|down)
        alt fallback read-write
            Factory->>Client: connect to Fallback (read-write) and log warning
        else fallback read-only
            Factory->>Client: connect to Fallback (read-only) and log error
        else both down
            Factory->>Client: connect to Primary (best-effort) and log error
        end
    end

    Client-->>Caller: Redis client instance
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect src/api/redis.ts for correctness of probe logic, timeouts, error handling, and ensuring test connections are always closed.
  • Verify call sites (src/api/index.ts, src/api/sqs/handlers/*) properly await the factory and handle initialization errors.
  • Confirm configuration loading includes fallback_redis_url everywhere the app is instantiated.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Redis failover detection on app launch" accurately and directly describes the core purpose of this pull request. The changeset implements a new createRedisModule factory function that performs health checks on primary and fallback Redis instances at application startup, determining which instance to use based on their availability and capabilities. This is exactly what the title promises—a failover detection system activated on app launch. The title is concise, specific, and clearly communicates the main change without vague language or unnecessary detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dsingh14/redis-failover

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e4ce2f5 and 9b518b8.

📒 Files selected for processing (5)
  • src/api/index.ts (2 hunks)
  • src/api/redis.ts (1 hunks)
  • src/api/sqs/handlers/createOrgGithubTeam.ts (2 hunks)
  • src/api/sqs/handlers/provisionNewMember.ts (2 hunks)
  • src/common/config.ts (1 hunks)
🧰 Additional context used
🪛 ESLint
src/api/sqs/handlers/createOrgGithubTeam.ts

[error] 23-23: Unexpected use of file extension "js" for "api/redis.js"

(import/extensions)

src/api/sqs/handlers/provisionNewMember.ts

[error] 16-16: Unexpected use of file extension "js" for "api/redis.js"

(import/extensions)

src/api/redis.ts

[error] 1-1: Resolve error: EACCES: permission denied, open '/JVkhMHfHlX'
at Object.writeFileSync (node:fs:2409:20)
at l (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:13685)
at createFilesMatcher (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:14437)
at Object.resolve (/home/jailuser/git/node_modules/eslint-import-resolver-typescript/lib/index.cjs:298:107)
at withResolver (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:180:23)
at fullResolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:201:22)
at relative (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:217:10)
at resolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:233:12)
at checkFileExtension (/home/jailuser/git/node_modules/eslint-plugin-import/lib/rules/extensions.js:205:53)
at checkSourceValue (/home/jailuser/git/node_modules/eslint-module-utils/moduleVisitor.js:32:5)

(import/extensions)


[error] 2-2: Unexpected use of file extension "js" for "./types.js"

(import/extensions)

src/api/index.ts

[error] 63-63: Unexpected use of file extension "js" for "./redis.js"

(import/extensions)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Unit Tests
🔇 Additional comments (4)
src/common/config.ts (1)

192-192: Ensure secrets ship the fallback URL

Please double-check that every populated Secrets Manager payload now contains fallback_redis_url; createRedisModule treats this as required, and if a secret is missing the field ioredis will quietly connect to 127.0.0.1:6379 instead of the intended secondary endpoint.

src/api/sqs/handlers/provisionNewMember.ts (1)

34-38: Async factory swap looks solid

The handler now runs through the shared failover path before touching Redis, which matches the PR intent.

src/api/sqs/handlers/createOrgGithubTeam.ts (1)

32-36: Good hook into the new Redis factory

This keeps the lock acquisition logic aligned with the centralized failover checks.

src/api/index.ts (1)

312-316: Nice reuse during bootstrap

Having the Fastify instance go through createRedisModule ensures the shared client respects the same health-check logic as the SQS handlers.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/api/redis.ts (2)

15-21: Consider logging write errors for observability.

The write error is silently caught when classifying a read-only instance. While the classification is correct, logging the error would help with debugging and understanding why an instance is read-only.

Apply this diff:

     try {
       await testModule.set(testKey, "test", "EX", 1);
       return "read-write";
-    } catch (writeError) {
+    } catch (writeError) {
+      // Instance is read-only (e.g., READONLY error from Redis)
       return "read-only";
     }

37-43: Add error handling to the non-blocking health check.

The non-blocking fallback health check on line 42 doesn't handle promise rejection, which could lead to unhandled promise rejection warnings in Node.js.

Apply this diff:

   if (Math.random() < 0.01) {
     // Upstash will complain if we never hit the instance
     // Fire a hit every so often on the fallback
     // Don't block on it
-    // Given we create a client once ever hour, we should hit the node at least once every 4 days at least
-    checkRedisMode(fallbackUrl);
+    // Given we create a client once every hour, we should hit the node at least once every 4 days
+    checkRedisMode(fallbackUrl).catch(() => {});
   }

Note: Also fixed typo "ever hour" → "every hour" and removed redundant "at least" duplication.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8976c8e and 1de0be1.

📒 Files selected for processing (1)
  • src/api/redis.ts (1 hunks)
🧰 Additional context used
🪛 ESLint
src/api/redis.ts

[error] 1-1: Resolve error: EACCES: permission denied, open '/bbjemSanEn'
at Object.writeFileSync (node:fs:2409:20)
at l (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:13685)
at createFilesMatcher (/home/jailuser/git/node_modules/get-tsconfig/dist/index.cjs:7:14437)
at Object.resolve (/home/jailuser/git/node_modules/eslint-import-resolver-typescript/lib/index.cjs:298:107)
at withResolver (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:180:23)
at fullResolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:201:22)
at relative (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:217:10)
at resolve (/home/jailuser/git/node_modules/eslint-module-utils/resolve.js:233:12)
at checkFileExtension (/home/jailuser/git/node_modules/eslint-plugin-import/lib/rules/extensions.js:205:53)
at checkSourceValue (/home/jailuser/git/node_modules/eslint-module-utils/moduleVisitor.js:32:5)

(import/extensions)


[error] 2-2: Unexpected use of file extension "js" for "./types.js"

(import/extensions)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Unit Tests
  • GitHub Check: Build Application
🔇 Additional comments (2)
src/api/redis.ts (2)

1-4: LGTM!

The imports and type definition are clean. The RedisMode type clearly captures the three states of a Redis instance.


31-76: The failover logic correctly addresses the previous review concern.

The control flow now properly fails with the primary when primaryMode === "down" and fallbackMode === "read-only", rather than returning a read-only fallback client that would cause write operations to fail. The decision tree correctly implements the PR objectives:

  • Primary read-write → use primary
  • Primary impaired + fallback read-write → use fallback
  • Both impaired → fail with primary

Comment on lines +65 to +70
if (fallbackMode === "read-only") {
logger.error(
"Primary Redis is down and Fallback Redis is down, using primary anyway",
);
return new RedisModule.default(primaryUrl);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix incorrect log message.

The log message states "Fallback Redis is down" but this branch is entered when fallbackMode === "read-only", not "down". This will mislead engineers during incidents when they're trying to understand the Redis failover state.

Apply this diff:

   if (fallbackMode === "read-only") {
     logger.error(
-      "Primary Redis is down and Fallback Redis is down, using primary anyway",
+      "Primary Redis is down and fallback is read-only; failing with primary to surface write failures immediately",
     );
     return new RedisModule.default(primaryUrl);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (fallbackMode === "read-only") {
logger.error(
"Primary Redis is down and Fallback Redis is down, using primary anyway",
);
return new RedisModule.default(primaryUrl);
}
if (fallbackMode === "read-only") {
logger.error(
"Primary Redis is down and fallback is read-only; failing with primary to surface write failures immediately",
);
return new RedisModule.default(primaryUrl);
}
🤖 Prompt for AI Agents
In src/api/redis.ts around lines 65 to 70, the error message incorrectly says
"Fallback Redis is down" while the branch is taken when fallbackMode ===
"read-only"; update the log text to accurately reflect the state (e.g., "Primary
Redis is down and Fallback Redis is in read-only mode, using primary anyway") so
engineers are not misled during incidents; simply replace the logger.error
string with the corrected message and keep the existing return new
RedisModule.default(primaryUrl) behavior.

@devksingh4 devksingh4 merged commit 6ab8c03 into main Oct 29, 2025
12 of 13 checks passed
@devksingh4 devksingh4 deleted the dsingh14/redis-failover branch October 29, 2025 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants