ops(deploy): rolling-update via GitHub Actions over Tailscale #616
Conversation
Today's rolling-update flow is manual: operators SSH from a workstation with the required env vars and invoke `scripts/rolling-update.sh`. That has no audit trail, no approval gate, no dry-run, and relies on per-operator secret handling.

This change adds a `workflow_dispatch` workflow that joins the Tailnet via `tailscale/github-action` (ephemeral OAuth node, `tag:ci-deploy`), SSHes into each cluster node over MagicDNS, and invokes the existing `scripts/rolling-update.sh` — unchanged. Nodes stay as they are; the script's env-var contract is the integration boundary.

Highlights:

- `workflow_dispatch` only (no auto-deploy); the `production` GitHub environment gates non-dry-run runs on required reviewers.
- Inputs: `ref` (required), `image_tag` (rollback override), `nodes` (subset filter), `dry_run` (default true).
- Dry-run does everything up to the container touch: renders `NODES` + `SSH_TARGETS` from env variables, verifies the image exists on ghcr.io, and tailscale-pings every target. Catches typo'd inputs before any production effect.
- Concurrency group `rolling-update`, `cancel-in-progress: false`, so parallel invocations queue rather than stomp.
- No node-side changes required: nodes are assumed to already run `tailscaled` and expose `authorized_keys` for the deploy user. Plain SSH over Tailscale; Tailscale SSH (keyless) is called out as a follow-up.

Design doc: docs/design/2026_04_24_proposed_deploy_via_tailscale.md
Operator runbook: docs/deploy_via_tailscale_runbook.md

Validation:

- actionlint passes (0 errors)
- YAML parses
- No changes to `scripts/rolling-update.sh`; the workflow calls it with the same env-var contract already documented in the script's `usage()`.

Out of scope (follow-ups): post-deploy health verification, auto-rollback on script failure, Jepsen-gating, image-signature verification, Tailscale SSH (keyless), shared deploy user.
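For orientation, the dispatch inputs and concurrency settings described above could be sketched roughly like this (a sketch following the description, not the actual workflow file from the PR; input descriptions are assumptions):

```yaml
name: Rolling update
on:
  workflow_dispatch:
    inputs:
      ref:
        description: Git ref to deploy
        required: true
      image_tag:
        description: Image tag override (rollback)
        required: false
      nodes:
        description: Comma-separated raft IDs to roll (subset filter)
        required: false
      dry_run:
        description: Validate only; touch no containers
        type: boolean
        default: true

# Parallel invocations queue rather than stomp on each other.
concurrency:
  group: rolling-update
  cancel-in-progress: false
```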
Code Review
This pull request introduces a design proposal and an operational runbook for migrating the manual rolling update process to an automated GitHub Actions workflow utilizing Tailscale for secure connectivity. The review feedback identifies critical configuration errors in the documentation examples, specifically regarding the inclusion of ports in the NODES_RAFT_MAP which would cause deployment script failures, and a mismatch in the required Tailscale OAuth scopes.
```
entries. Prevents the first-connect TOFU prompt.

**Variables (non-secret):**
- `NODES_RAFT_MAP` — `n1=kv01:50051,n2=kv02:50051,...` (advertised addresses
```
The example for NODES_RAFT_MAP includes the port, which is incompatible with the scripts/rolling-update.sh script as it appends the RAFT_PORT automatically. It is better to use the hostnames only, preferably the MagicDNS FQDNs for reliable resolution. Furthermore, as this design document outlines a production-grade deployment and migration, it must detail strategies for live or zero-downtime cutovers and provide mitigation strategies for operational risks, such as potential failures during rolling upgrades.
Suggested change:

```diff
- - `NODES_RAFT_MAP` — `n1=kv01:50051,n2=kv02:50051,...` (advertised addresses
+ - NODES_RAFT_MAP — n1=kv01.<tailnet>.ts.net,n2=kv02.<tailnet>.ts.net,... (advertised addresses
```
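The failure mode the reviewer describes can be illustrated in a few lines (hypothetical values; the real script reads `NODES` and `RAFT_PORT` per its `usage()`): appending `RAFT_PORT` to a map entry that already carries a port yields a malformed address.

```shell
#!/usr/bin/env bash
# Sketch of the port-duplication bug: the script appends RAFT_PORT to
# whatever host value the map entry carries.
RAFT_PORT=50051

entry="n1=kv01:50051"                 # bad: port already in the map entry
host="${entry#*=}"                    # kv01:50051
bad_address="${host}:${RAFT_PORT}"

good_entry="n1=kv01.example.ts.net"   # good: hostname only
good_host="${good_entry#*=}"
good_address="${good_host}:${RAFT_PORT}"

echo "$bad_address"    # kv01:50051:50051 -- invalid --address value
echo "$good_address"   # kv01.example.ts.net:50051
```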
References
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
```
Admin console → Settings → OAuth clients → New client:

- Description: `elastickv GitHub Actions deploy`
- Scopes: `auth_keys` (write)
```
The OAuth client scope should be devices (write) instead of auth_keys (write). The tailscale/github-action requires the devices scope to register the ephemeral node in the tailnet using OAuth. This is correctly identified in the design document (§2.3) but misstated here.
Suggested change:

```diff
- - Scopes: `auth_keys` (write)
+ - Scopes: devices (write)
```
```
|------|-------|---------|
| `IMAGE_BASE` | Container image path (no tag) | `ghcr.io/bootjp/elastickv` |
| `SSH_USER` | SSH login on every node | `bootjp` |
| `NODES_RAFT_MAP` | Comma-separated `raftId=host:port` | `n1=kv01:50051,n2=kv02:50051,n3=kv03:50051,n4=kv04:50051,n5=kv05:50051` |
```
The example for NODES_RAFT_MAP includes the port (e.g., :50051), but the scripts/rolling-update.sh script automatically appends the RAFT_PORT to the host value. Providing the port here will result in an invalid address like host:50051:50051 when the script constructs the --address flag. The value should only contain the hostnames or IPs. Additionally, using the MagicDNS FQDN is recommended for consistency with the SSH_TARGETS_MAP example.
Addressed review feedback in 6322748:

- NODES_RAFT_MAP examples no longer include ":50051". `scripts/rolling-update.sh` auto-appends `RAFT_PORT` to each entry, so a port in `NODES` produces `host:port:port` when the script constructs `--address` (Gemini HIGH).
- OAuth client scope: design doc §2.3 said "devices:write" while the runbook §3 said "auth_keys (write)". Aligned both on "auth_keys (write)", which is what `tailscale/github-action` requires to mint the ephemeral join key. Added a note that newer action versions may additionally need "devices:core" (write) and to consult the action's README as the authoritative scope list (addresses the Gemini MEDIUM comment while hedging against action-version drift).

Note on the CI failure (test job, ubuntu-latest): the failing tests are

/gemini review
Code Review
This pull request proposes a design and runbook for automating rolling updates via GitHub Actions and Tailscale. The feedback highlights a missing workflow file, a contradiction in dry-run implementation, and the need for more detailed mitigation and cutover strategies. Additionally, improvements are suggested for the runbook regarding SSH configuration clarity, security hygiene for host scanning, and environment variable consistency.
```
- The script is invoked with `--help` + the rendered env is printed as a
  collapsed log group.
```
There is a contradiction regarding dry-run support; using --help is insufficient for validation. A native --dry-run flag should be implemented. Furthermore, as this design document covers a production-grade migration and rolling upgrades, it must include mitigation strategies for operational risks (e.g., 'bridge' or 'proxy' mode) and detail zero-downtime cutover strategies (e.g., blue/green or dual-write proxy) to comply with repository requirements.
References
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
```
## 5. Implementation plan

1. Write `.github/workflows/rolling-update.yml` implementing §2.1.
```
The implementation plan lists the creation of .github/workflows/rolling-update.yml as the first step, and the PR description indicates that the workflow has been verified with actionlint. However, the workflow file itself is missing from this pull request. Please include the workflow file to ensure the proposed deployment mechanism can be reviewed and tested.
```
sudo tailscale up \
  --ssh=false \
  --advertise-tags=tag:elastickv-node \
  --accept-routes=false
```
The instructions specify --ssh=false, which disables Tailscale SSH. This approach requires the standard system SSH daemon (sshd) to be running and correctly configured to accept connections on the Tailscale interface. It would be safer to explicitly mention this requirement to prevent operators from being locked out of the deployment flow if they rely solely on Tailscale SSH for access.
```
| `TS_OAUTH_CLIENT_ID` | Tailscale OAuth client ID from step 3 |
| `TS_OAUTH_SECRET` | Tailscale OAuth secret from step 3 |
| `DEPLOY_SSH_PRIVATE_KEY` | OpenSSH private key, authorized on every node under the deploy user |
| `DEPLOY_KNOWN_HOSTS` | `ssh-keyscan kv01.<tailnet>.ts.net kv02.<tailnet>.ts.net …` output (one host per line) |
```
```
| `NODES_RAFT_MAP` | Comma-separated `raftId=host` (no port — the script appends `RAFT_PORT`) | `n1=kv01,n2=kv02,n3=kv03,n4=kv04,n5=kv05` |
| `SSH_TARGETS_MAP` | Comma-separated `raftId=ssh-host` | `n1=kv01.<tailnet>.ts.net,n2=kv02.<tailnet>.ts.net,...` |
```
The environment variable names NODES_RAFT_MAP and SSH_TARGETS_MAP differ from the names expected by the scripts/rolling-update.sh script (NODES and SSH_TARGETS). While the workflow likely performs a mapping, this discrepancy can lead to confusion if an operator attempts to run the script manually using the variables defined in this runbook. Aligning the names or explicitly noting the mapping would improve maintainability and reduce friction during manual troubleshooting.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6322748fe4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```
permissions:
  contents: read
  id-token: write # required by tailscale/github-action OIDC flow
```
Add packages:read for GHCR manifest check
This workflow narrows GITHUB_TOKEN to contents and id-token, which implicitly removes package scope, but the Verify image exists on ghcr.io step authenticates to GHCR and inspects a manifest with that token. In environments where the image is private (or package auth is otherwise required), docker login/docker manifest inspect will fail with authorization errors before rollout begins, so deploys are blocked even when the image exists.
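A minimal fix along the lines Codex suggests (assuming the job-level `permissions` block quoted above; sketch only):

```yaml
permissions:
  contents: read
  id-token: write   # required by the tailscale/github-action OIDC flow
  packages: read    # lets docker manifest inspect see private ghcr.io images
```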
- workflow: add `packages: read` to the job permissions so the `Verify image exists on ghcr.io` step's `docker manifest inspect` call works against private ghcr.io images (Codex P1).
- runbook §1: explain that `--ssh=false` disables Tailscale SSH and the workflow relies on the system sshd — operators who use Tailscale SSH elsewhere need to keep that in mind (Gemini Medium).
- runbook §4: change the `ssh-keyscan` example + troubleshooting to `ssh-keyscan -H` so known_hosts entries are hashed and the secret does not leak tailnet topology in plaintext (Gemini Security Medium).
- runbook §4 variables: document that `NODES_RAFT_MAP` / `SSH_TARGETS_MAP` are workflow-side names the render step maps to the script's `NODES` / `SSH_TARGETS`; manual invocation from a workstation must use the script-side names (Gemini Medium).

Not addressed: Gemini HIGH claim that the workflow file is missing (line 187) — it IS included at `.github/workflows/rolling-update.yml` in this PR; the reviewer misread the file list.

Not addressed: Gemini HIGH re native `--dry-run` flag + zero-downtime strategy (line 128) — dry-run is deliberately a workflow-level input, not a script-level flag, so the script stays invokable from a workstation without CI-specific options; zero-downtime cutover is outside the scope of a CI wrapper and is tracked in the resilience-roadmap follow-ups.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad00bdc0ba
```
if [[ -z "$NODES_RAFT_MAP" ]]; then
  echo "::error::nodes filter '$NODES_FILTER' matches nothing in NODES_RAFT_MAP"
  exit 1
```
Reject partially invalid node filters
The filter logic silently drops unknown raft IDs and only errors when no IDs match, so an input like nodes: n1,n9 proceeds as a rollout of n1 only. That can leave operators believing multiple nodes were upgraded when one was skipped, which is risky for staged deploys/rollbacks and can leave the cluster in an unintended mixed-version state. Treat inputs.nodes as an exact set and fail if any requested ID is missing from the configured map(s).
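A hedged sketch of the exact-set validation the comment asks for (variable names mirror the quoted snippet; the `id=host,...` map format is assumed from elsewhere in the thread):

```shell
#!/usr/bin/env bash
# Treat the nodes filter as an exact set: report every requested raft ID
# that does not appear in NODES_RAFT_MAP instead of silently dropping it.
NODES_RAFT_MAP="n1=kv01,n2=kv02,n3=kv03"   # illustrative map

unknown_ids() {   # echoes the IDs from "$1" that are missing from the map
  local id missing=""
  for id in ${1//,/ }; do
    case ",$NODES_RAFT_MAP," in
      *",${id}="*) ;;                              # known ID, keep going
      *) missing="${missing:+$missing }$id" ;;     # unknown -> collect
    esac
  done
  echo "$missing"
}

bad=$(unknown_ids "n1,n9")   # "n9": the whole filter must be rejected
ok=$(unknown_ids "n1,n2")    # "":   every requested ID exists
echo "bad=[$bad] ok=[$ok]"
```

The workflow would fail fast (e.g. `::error::` + `exit 1`) whenever `unknown_ids` returns anything non-empty, rather than only when nothing matched.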
Addressed round-2 in ad00bdc:

Not addressed:
- HIGH claim that the workflow file is missing — it IS in this PR at `.github/workflows/rolling-update.yml` (reviewer misread file list).
- HIGH re native `--dry-run` flag + zero-downtime — dry-run is deliberately a workflow-level input to keep the script clean for workstation use; zero-downtime cutover is outside the CI wrapper's scope.

/gemini review
Code Review
This pull request introduces a design proposal and an operational runbook for migrating manual rolling updates to a GitHub Actions workflow using Tailscale. The feedback identifies that the workflow file itself is missing from the PR and points out technical issues regarding GitHub environment protection rules, DNS resolution for Raft nodes, and OAuth scope requirements. Additionally, it suggests adding mitigation strategies for risks associated with workflow cancellation.
```
## 5. Implementation plan

1. Write `.github/workflows/rolling-update.yml` implementing §2.1.
```
The pull request title, description, and test plan indicate that the GitHub Actions workflow has been implemented and verified (e.g., actionlint passes). However, the .github/workflows/rolling-update.yml file is missing from the changes in this pull request. If this PR is intended to implement the deployment flow, please include the workflow file.
```
reviewers. A non-dry-run deploy will pause until approved; the dry-run run
itself does not need approval (it only needs the tailnet join).
```
There is a contradiction between the workflow definition (line 54) and the dry-run semantics described here. If the job is associated with the production environment, GitHub will enforce the environment's protection rules (such as required reviewers) for every run, including dry-runs. To allow dry-runs to proceed without approval, the workflow should dynamically set the environment or use a separate job for the dry-run/preview phase.
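One way to implement the split the reviewer suggests is an expression-valued `environment` (sketch only; the `preview` environment name is an assumption, and it must exist, unprotected, for this to remove the approval gate from dry-runs):

```yaml
jobs:
  roll:
    runs-on: ubuntu-latest
    # dry-runs land on an unprotected environment; real deploys stay gated
    environment: ${{ inputs.dry_run == true && 'preview' || 'production' }}
```

The `&& ||` pattern is the usual GitHub Actions stand-in for a ternary; it works here because `'preview'` is a truthy string.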
```
**Gap:** there is no "stop mid-rollout" control today. If the workflow is
cancelled via GitHub UI during a roll, the in-flight node may be mid-recreate.
`rolling-update.sh` is supposed to be idempotent and crash-safe, but this
should be verified before we call the workflow production-ready.
```
The design identifies a significant operational risk regarding workflow cancellation during a rollout. Per the general rules, such risks should be accompanied by detailed mitigation strategies. Consider adding a section to the runbook for manual recovery or updating the deployment script to handle termination signals gracefully to ensure the cluster is not left in an inconsistent state.
References
- When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.
```
|------|-------|---------|
| `IMAGE_BASE` | Container image path (no tag) | `ghcr.io/bootjp/elastickv` |
| `SSH_USER` | SSH login on every node | `bootjp` |
| `NODES_RAFT_MAP` | Comma-separated `raftId=host` (no port — the script appends `RAFT_PORT`). The workflow renders this into the script's `NODES` env var. | `n1=kv01,n2=kv02,n3=kv03,n4=kv04,n5=kv05` |
```
The example for NODES_RAFT_MAP uses short hostnames (e.g., kv01). For reliable cluster communication over Tailscale, it is recommended to use the full MagicDNS names (e.g., kv01.<tailnet>.ts.net). Short names may not be resolvable by the nodes themselves depending on their local DNS search domains, which could prevent the Raft cluster from forming correctly.
Suggested change:

```
| `NODES_RAFT_MAP` | Comma-separated `raftId=host` (no port — the script appends `RAFT_PORT`). The workflow renders this into the script's `NODES` env var. | `n1=kv01.<tailnet>.ts.net,n2=kv02.<tailnet>.ts.net,n3=kv03.<tailnet>.ts.net,n4=kv04.<tailnet>.ts.net,n5=kv05.<tailnet>.ts.net` |
```
```
- Scopes: `auth_keys` (write). Recent `tailscale/github-action` versions
  may additionally require `devices:core` (write); enable that if the
  join step fails with an authorization error. The action's README is
  the definitive source for current scope requirements.
```
When using an OAuth client with the tailscale/github-action, the devices scope with write access is generally required to allow the action to register the ephemeral runner as a node. I recommend updating the instructions to include this scope as a requirement to avoid authorization errors during the tailnet join step.
Suggested change:

```
- Scopes: auth_keys (write) and devices (write). The devices scope (or devices:core in newer API versions) is required for the action to register the ephemeral runner as a node in your tailnet.
```
```
rules" to auto-approve if the workflow input `dry_run == true` (optional; cuts
friction for previews).
```
GitHub's native environment protection rules do not support conditional auto-approval based on workflow inputs. Implementing this would require a custom GitHub Action for Deployment Protection Rules. It would be more accurate to state that dry-runs will still require manual approval if they target the production environment, or suggest using a separate, unprotected environment for previews.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad00bdc0ba
```
if [[ -z "$NODES_RAFT_MAP" ]]; then
  echo "::error::nodes filter '$NODES_FILTER' matches nothing in NODES_RAFT_MAP"
  exit 1
```
Fail when requested node IDs are only partially matched
The nodes filter currently errors only when zero IDs match, so a typo like n1,n9 (or whitespace variant n1, n2) silently drops unmatched IDs and proceeds with a partial rollout. That means operators can believe they updated a specific subset while one or more intended nodes were skipped, which is risky during security or incident-driven deploys. Please validate that every requested ID is present in NODES_RAFT_MAP and fail if any are missing.
```
IFS=',' read -r -a entries <<< "$SSH_TARGETS"
failed=0
for e in "${entries[@]}"; do
```
Check reachability for all rollout nodes, not only SSH map entries
The reachability step iterates only over SSH_TARGETS, but rolling-update.sh resolves missing SSH mappings by falling back to each node host from NODES (ssh_target_by_id). If SSH_TARGETS_MAP is incomplete, dry-run can report success without probing some actual rollout targets, and the job can then fail mid-roll when it reaches an unvalidated node. Preflight should derive targets from NODES + SSH_TARGETS using the same fallback semantics or enforce one-to-one mapping coverage first.
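A sketch of the fallback-aware target derivation the comment asks for (the `id=value` comma format and the `ssh_target_by_id` fallback semantics are paraphrased from this thread, not copied from the script):

```shell
#!/usr/bin/env bash
# For every raft ID in NODES, prefer the SSH_TARGETS entry when present,
# otherwise fall back to the node host itself -- mirroring the rollout
# script's fallback so preflight probes every node it will actually touch.
NODES="n1=kv01,n2=kv02,n3=kv03"       # illustrative values
SSH_TARGETS="n1=kv01.ts.example"      # incomplete on purpose

target_for() {                        # $1 = raft ID
  local e
  for e in ${SSH_TARGETS//,/ }; do
    [ "${e%%=*}" = "$1" ] && { echo "${e#*=}"; return; }
  done
  for e in ${NODES//,/ }; do          # fallback: raft host from NODES
    [ "${e%%=*}" = "$1" ] && { echo "${e#*=}"; return; }
  done
}

targets=""
for e in ${NODES//,/ }; do
  targets="${targets:+$targets }$(target_for "${e%%=*}")"
done
echo "$targets"   # probe each of these, not just the SSH_TARGETS entries
```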
- workflow nodes filter (Codex P2): reject any raft ID in the `nodes` input that does not appear in NODES_RAFT_MAP. Previously a typo like `n1,n9` silently rolled n1 only; now the workflow fails fast with a list of known IDs so the operator sees the typo before touching prod.
- runbook section 4 (Gemini Medium ×2): GitHub's native environment protection rules cannot be made conditional on workflow inputs, so the previous "auto-approve dry-run" guidance was wrong. Documented the three workable options: accept the prompt for dry-runs too (v1 default), split into a second unprotected environment, or install a deployment-protection-rule GitHub App.
- runbook section 4 NODES_RAFT_MAP example (Gemini Medium): use full MagicDNS FQDNs instead of short hostnames so every node can resolve its peers regardless of local DNS search domains.
- runbook section 6 (Gemini Medium): added "If a running workflow is cancelled mid-rollout" recovery steps — find the in-flight node from logs, finish the recreate by hand, confirm leader, rerun scoped. Filed as a tracked gap to teach the workflow per-node start-markers in a follow-up.

Not addressed: Gemini HIGH line 187 claiming the workflow file is missing — the file IS present at `.github/workflows/rolling-update.yml` and has been since the first push of this PR. Third time the bot has flagged this (same finding in rounds 1 and 2); leaving as-is since responding further would just be repeating the same correction.
Round 3 in 894bce9:

Not addressed: Gemini HIGH line 187 (workflow file missing) — the file IS in this PR; third round the bot has flagged it despite the response; leaving as-is.

/gemini review
Code Review
This pull request introduces a design document and a comprehensive runbook for migrating the manual rolling-update process to an automated GitHub Actions workflow using Tailscale for secure connectivity. The documentation covers setup, security configurations, and operational procedures. Feedback focuses on correcting technical inaccuracies, such as fixing a YAML syntax error in the workflow snippet, updating Tailscale OAuth scope names, and ensuring that log patterns and command visibility descriptions in the runbook accurately reflect the behavior of the underlying deployment scripts. Additionally, an inconsistency regarding the use of FQDNs for node mapping was identified between the design and operational docs.
```
entries. Prevents the first-connect TOFU prompt.

**Variables (non-secret):**
- `NODES_RAFT_MAP` — `n1=kv01,n2=kv02,...` (advertised hostnames as seen
```
There is an inconsistency between the design document and the runbook. The runbook correctly recommends using full MagicDNS FQDNs for NODES_RAFT_MAP to ensure reliable resolution across nodes with different search domains, whereas this example uses short hostnames. Furthermore, as this design document proposes a production-grade system migration, it must detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
Suggested change:

```diff
- - `NODES_RAFT_MAP` — `n1=kv01,n2=kv02,...` (advertised hostnames as seen
+ - NODES_RAFT_MAP — n1=kv01.tailnet.ts.net,n2=kv02.tailnet.ts.net,... (advertised hostnames as seen
```
References
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
```
- Description: `elastickv GitHub Actions deploy`
- Scopes: `auth_keys` (write). Recent `tailscale/github-action` versions
  may additionally require `devices:core` (write); enable that if the
```
The scope devices:core is not a standard Tailscale OAuth scope. For the tailscale/github-action to successfully join the tailnet and manage ephemeral nodes, it typically requires the devices:write scope (along with auth_keys:write). Using an incorrect scope name will cause the authentication step to fail.
Suggested change:

```diff
- may additionally require `devices:core` (write); enable that if the
+ may additionally require devices (write); enable that if the
```
```
hazard that needs manual cleanup.

1. **Look at the last log line from the `Roll cluster` step.** The script
   logs `[rolling-update] rolling n<id>: docker stop/rm/run ...` before
```
The runbook describes a log line that does not exist in the current version of scripts/rolling-update.sh. The script actually logs ==> [$node_id@$node_host] start at the beginning of each node's update process. Providing the correct log pattern is important for operators to accurately identify the state of a cancelled rollout.
Suggested change:

```diff
- logs `[rolling-update] rolling n<id>: docker stop/rm/run ...` before
+ logs ==> [n<id>@<host>] start before each node recreate. Whatever
```
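Locating the in-flight node from a saved job log could look roughly like this (the `==> [<id>@<host>] start` marker is the pattern the reviewer cites; the sample log contents are made up):

```shell
#!/usr/bin/env bash
# Find the last node the script started on, from a downloaded job log.
# A here-doc stands in for the real "Roll cluster" step output.
log=$(cat <<'EOF'
==> [n1@kv01] start
==> [n1@kv01] done
==> [n2@kv02] start
EOF
)

# Keep only the "start" markers and take the most recent one.
last=$(printf '%s\n' "$log" | sed -n 's/^==> \[\(.*\)\] start$/\1/p' | tail -n1)
echo "in-flight: $last"   # the node that was mid-recreate at cancel time
```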
```
   flight when the cancel signal landed.
2. **SSH into that node** over Tailscale and run `docker ps`. If the
   container is absent or `Exited`, finish the recreate by hand with the
   docker run arguments the script emitted (which you can see in the
```
The script scripts/rolling-update.sh does not emit the full docker run command to the logs; it redirects the output of the docker run command to /dev/null. Operators will not be able to see the arguments in the workflow log for the Roll cluster step. They should instead refer to the environment variables rendered in the preceding workflow step to reconstruct the command.
Suggested change:

```diff
- docker run arguments the script emitted (which you can see in the
+ docker run arguments (which you can reconstruct from the rendered
```
````
```
name: Rolling update
on: workflow_dispatch:
```
````
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 894bce93d9
```
Expected for non-dry-run deploys — a reviewer from the `production` environment
must click Approve. Check the "Required reviewers" list in the environment
```
Fix approval troubleshooting text for dry-run workflows
This troubleshooting guidance is incorrect for the workflow as implemented: .github/workflows/rolling-update.yml sets environment: production unconditionally, so dry-run and non-dry-run executions both pause for environment approval in v1. Telling operators this is only expected for non-dry-runs can send them down the wrong debugging path when a dry-run is legitimately waiting for approval.
- Gemini HIGH (design line 82): switch the NODES_RAFT_MAP example to full MagicDNS FQDNs so it matches the runbook; bare hostnames resolve differently per node.
- Gemini Medium (design line 45): fix the YAML — on/workflow_dispatch/inputs must be nested, not on a single line, and the fence is labelled yaml.
- Gemini Medium (runbook §3, design §2.3): retract devices:core — not a valid Tailscale OAuth scope; note devices:write as the standard one.
- Gemini Medium (runbook §6, lines 153-156): correct the cancelled-job log pattern to what the script actually emits (`==> [<id>@<host>] start`, scripts/rolling-update.sh:398), not the fictitious `[rolling-update] rolling n<id>: ...`.
- Gemini Medium (runbook §6, lines 156-160): clarify that docker run stdout/stderr is redirected to /dev/null, so operators reconstruct the invocation from the step-level env log, not from the docker-run argv.
- Codex P2 (runbook §8 approval troubleshooting): clarify that both dry-run and non-dry-run runs pause for approval in v1 because `environment: production` is unconditional; reference §4 for the second-environment upgrade path.
Round 4 in 165116b:

Doc-only PR; no build needed. No other actionable feedback this round.

/gemini review
Code Review
This pull request introduces a design document and a runbook for automating rolling updates via Tailscale and GitHub Actions. The feedback highlights that the actual workflow file is missing from the PR, identifies a contradiction regarding dry-run approvals in the environment settings, and corrects inaccuracies in the documentation concerning manual recovery variables and script idempotency.
```
## 5. Implementation plan

1. Write `.github/workflows/rolling-update.yml` implementing §2.1.
```
```
reviewers. A non-dry-run deploy will pause until approved; the dry-run run
itself does not need approval (it only needs the tailnet join).
```
There is a contradiction between this design document and the runbook regarding dry-run approvals. The runbook (Line 76) correctly notes that GitHub environment protections are not conditional and will pause dry-runs for approval. This section should be updated to reflect that limitation.
Suggested change:

```diff
- reviewers. A non-dry-run deploy will pause until approved; the dry-run run
- itself does not need approval (it only needs the tailnet join).
+ reviewers. Note that in v1, both dry-run and non-dry-run executions will pause
+ for approval because GitHub's environment protection rules are not conditional.
```
```
the resolved env instead: the step logs `NODES_RAFT_MAP`,
`EXTRA_ENV`, `GOMEMLIMIT`, `CONTAINER_MEMORY_LIMIT`, `IMAGE`, and
`DATA_DIR` before invoking the script — those are sufficient to
```
The list of variables mentioned here is insufficient for manually reconstructing the docker run command. It misses critical networking and S3 configuration (e.g., RAFT_PORT, REDIS_PORT, S3_PORT, ENABLE_S3) required by the script. Also, GOMEMLIMIT and CONTAINER_MEMORY_LIMIT are mentioned but are not directly used by scripts/rolling-update.sh.
Suggested change:
- the resolved env instead: the step logs `NODES_RAFT_MAP`,
- `EXTRA_ENV`, `GOMEMLIMIT`, `CONTAINER_MEMORY_LIMIT`, `IMAGE`, and
- `DATA_DIR` before invoking the script — those are sufficient to
+ the resolved env instead: the step logs the rendered NODES,
+ SSH_TARGETS, IMAGE, DATA_DIR, EXTRA_ENV, and all PORT
+ settings before invoking the script — those are sufficient to
| The script is idempotent for the "container exists and is up" case, so |
| re-running the workflow with the same `ref` after confirming the |
| interrupted node is healthy is safe — the script will stop+recreate |
| each node in turn regardless of whether it was touched before. |
The description of the script's behavior on re-run is inaccurate. scripts/rolling-update.sh (lines 794-798) skips nodes where the image ID is unchanged and the gRPC endpoint is healthy. It does not "stop+recreate each node in turn regardless".
Suggested change:
- The script is idempotent for the "container exists and is up" case, so
- re-running the workflow with the same `ref` after confirming the
- interrupted node is healthy is safe — the script will stop+recreate
- each node in turn regardless of whether it was touched before.
+ The script is idempotent: if the image ID is unchanged and the gRPC
+ endpoint is healthy, it skips the node. Re-running the workflow with
+ the same ref after confirming the interrupted node is healthy is
+ safe — the script will skip completed nodes and only process those
+ requiring an update.
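The skip condition described in the review could be sketched as follows. Helper and variable names here are hypothetical; the real logic lives around scripts/rolling-update.sh:794-798 and may be structured differently.

```shell
# Sketch of the idempotency check described above (hypothetical names).
# A node is skipped only when BOTH conditions hold: the running image id
# already matches the target AND the gRPC endpoint reports healthy.
should_skip_node() {
  running_image_id="$1"   # e.g. from: docker inspect -f '{{.Image}}' <name>
  target_image_id="$2"    # e.g. from: docker image inspect -f '{{.Id}}' <tag>
  grpc_healthy="$3"       # "true"/"false", e.g. from a health-probe call
  [ "$running_image_id" = "$target_image_id" ] && [ "$grpc_healthy" = "true" ]
}

if should_skip_node "sha256:abc" "sha256:abc" "true"; then
  echo "skip"
else
  echo "roll"
fi
```

Because both checks must pass, a node running the right image but failing health checks is still rolled, which is what makes re-running after a partial deploy safe.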
Codex Review: Didn't find any major issues. Delightful!
…#617)

## Summary
Add two OOM-defense defaults to `scripts/rolling-update.sh`:
- `GOMEMLIMIT=1800MiB` (via new `DEFAULT_EXTRA_ENV`, merged into the existing `EXTRA_ENV` plumbing)
- `--memory=2500m` on the remote `docker run` (via new `CONTAINER_MEMORY_LIMIT`)

Both are env-var-controlled with empty-string opt-out (`${VAR-default}` so unset uses the default, but an explicit empty string disables it).

## Motivation
2026-04-24 incident: all 4 live nodes were kernel-OOM-SIGKILLed 22-169 times in 24h under a traffic spike. Each kill risked WAL-tail truncation and triggered election storms, cascading into p99 GET spikes to 6-8s. The runtime defense was applied by hand during the incident; this PR makes it the script default so future rollouts inherit it.
- `GOMEMLIMIT` — Go runtime GCs aggressively as heap approaches the limit, keeping RSS below the container ceiling.
- `--memory` (cgroup hard limit) — if Go can't keep up (e.g. non-heap growth), the kill is scoped to the container, not host processes like `qemu-guest-agent` or `systemd`.

## Behavior changes

| Variable | Default | Opt-out |
|----------|---------|---------|
| `DEFAULT_EXTRA_ENV` | `GOMEMLIMIT=1800MiB` | `DEFAULT_EXTRA_ENV=""` |
| `CONTAINER_MEMORY_LIMIT` | `2500m` | `CONTAINER_MEMORY_LIMIT=""` |

Operator-supplied `EXTRA_ENV` keys override matching keys in `DEFAULT_EXTRA_ENV` (e.g., `EXTRA_ENV="GOMEMLIMIT=3000MiB"` wins over the default).

## Related
Companion PRs (defense-in-depth):
- #612 `memwatch` — graceful shutdown before kernel OOM (prevents WAL corruption in the first place)
- #613 WAL auto-repair — recovers on startup when the above fails
- #616 rolling-update via GitHub Actions over Tailscale — consumes this script

## Test plan
- [x] `bash -n scripts/rolling-update.sh` passes
- [x] Deployed equivalents manually on all 4 live nodes during the incident (2026-04-24T07:44Z - 07:46Z); no OOM recurrence since
- [ ] Next rolling-update invocation should produce `docker run ... --memory=2500m ... -e GOMEMLIMIT=1800MiB ...` on each node

## Design doc reference
`docs/design/2026_04_24_proposed_resilience_roadmap.md` (item 1 — capacity/runtime defenses).
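The override semantics described above ("operator keys win over defaults") could be sketched like this. The helper name and the space-separated `KEY=VALUE` representation are assumptions; the real merge in scripts/rolling-update.sh may differ.

```shell
# Sketch of the DEFAULT_EXTRA_ENV / EXTRA_ENV merge described above.
# Assumption: both vars hold space-separated KEY=VALUE pairs, and an
# operator-supplied key must win over the same key in the defaults.
DEFAULT_EXTRA_ENV="${DEFAULT_EXTRA_ENV-GOMEMLIMIT=1800MiB}"

merge_extra_env() {
  defaults="$1" overrides="$2" merged=""
  # Keep a default pair only if no override shares its KEY.
  for pair in $defaults; do
    key="${pair%%=*}"
    case " $overrides " in
      *" $key="*) ;;                 # overridden: drop the default
      *) merged="$merged $pair" ;;
    esac
  done
  # Operator-supplied pairs always pass through.
  printf '%s\n' "$merged $overrides" | sed 's/^ *//;s/ *$//'
}

merge_extra_env "$DEFAULT_EXTRA_ENV" "GOMEMLIMIT=3000MiB"
```

Note the `${VAR-default}` expansion: leaving `DEFAULT_EXTRA_ENV` unset uses the default, while an explicit `DEFAULT_EXTRA_ENV=""` disables it, matching the opt-out table above.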
- Gemini Medium (design §2.6 line 146): the design doc contradicted the runbook by claiming dry-runs do NOT need approval. GitHub's environment-protection rules cannot be made conditional on workflow inputs, so `environment: production` pauses BOTH dry-run and non-dry-run executions in v1. Aligned the design-doc wording with the runbook and cross-referenced §4 for the second-environment upgrade path.
- Gemini Medium (runbook §6 line 164 env list): the list of reconstruction vars was incomplete. Listed every env var the workflow actually exports (IMAGE, DATA_DIR, RAFT_PORT, REDIS_PORT, S3_PORT, ENABLE_S3, NODES, SSH_TARGETS, EXTRA_ENV) and called out the script-level defaults for anything not overridden, plus noted GOMEMLIMIT / CONTAINER_MEMORY_LIMIT are propagated via EXTRA_ENV once PR #617 lands.
- Gemini Medium (runbook §6 line 178 idempotency): corrected the "stop+recreate every node regardless" claim. The script (scripts/rolling-update.sh:794-798) skips nodes whose running image id matches the target AND whose gRPC endpoint is healthy, so re-running after a partial roll is safe because already-rolled nodes are no-ops, not stops.

Declining again Gemini HIGH "workflow file missing" — the file IS in this PR at .github/workflows/rolling-update.yml; this is the fourth round the bot has flagged its own misread. See prior rounds for rationale; no change.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24383b2db1
| uses: actions/checkout@v6 |
| with: |
|   ref: ${{ inputs.ref }} |
Restrict deploy script checkout to trusted refs
The workflow checks out ${{ inputs.ref }} and later executes ./scripts/rolling-update.sh after loading production credentials (SSH key, Tailscale OAuth secret) and joining the tailnet, so ref is effectively arbitrary code execution, not just a deployment selector. If someone can dispatch runs and supply a branch/tag they control, a modified script in that ref can exfiltrate secrets or perform unintended actions once the environment is approved. Keep execution on a trusted protected revision (or strictly validate allowed refs) and use the input only for image selection.
Codex P1: the workflow hands the checked-out tree a Tailscale OAuth secret and an SSH key, then executes scripts/rolling-update.sh from that tree. Anyone who can dispatch runs could previously point inputs.ref at a branch containing a malicious script modification and exfiltrate the secrets. Mitigations:
- New 'Validate ref is default branch or a tag' step rejects any ref that is not the default branch (by name or HEAD sha) or an existing tag. A sha reachable from elsewhere is still accepted (the subsequent checkout does its own verification) but non-default branches fail closed with an operator-visible error.
- actions/checkout now pins persist-credentials: false so the GITHUB_TOKEN is not left in the runner's git config for the deploy script to harvest. The token is still explicitly exposed to the ghcr verification step via env:, which is the only place it needs to be readable.
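The validation gate described above could look roughly like this as pure logic. Function and parameter names are assumptions; the real step in .github/workflows/rolling-update.yml resolves the inputs from git (e.g. `git rev-parse` for the HEAD sha, `git tag --list` for tags) and may be structured differently.

```shell
# Sketch of the 'Validate ref is default branch or a tag' gate (names
# assumed). A ref is trusted iff it is the default branch (by name or
# by HEAD sha) or an existing tag; everything else fails closed.
validate_ref() {
  ref="$1" default_branch="$2" head_sha="$3" tags="$4"
  [ "$ref" = "$default_branch" ] && { echo "ok: default branch"; return 0; }
  if [ -n "$head_sha" ] && [ "$ref" = "$head_sha" ]; then
    echo "ok: default branch HEAD sha"; return 0
  fi
  for t in $tags; do
    [ "$ref" = "$t" ] && { echo "ok: tag"; return 0; }
  done
  echo "refusing to deploy untrusted ref: $ref" >&2
  return 1
}

# In the workflow the arguments would come from git; a hard-coded call
# here just demonstrates the accept path:
validate_ref "main" "main" "" "" && echo "dispatch allowed"
```

Failing closed on unknown refs means a typo'd branch name surfaces as a visible step failure before any secret is touched.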
Addressed in 3ddcd42: Codex P1 (arbitrary code execution via `inputs.ref`). Gemini HIGH / Medium comments already addressed in previous rounds (bot is reading stale state).
/gemini review
Summary
Move rolling deploys from "operator SSHes from workstation" to a GitHub-Actions
`workflow_dispatch` that joins the Tailnet and invokes the existing
`scripts/rolling-update.sh` unchanged. Adds audit trail, approval gate, dry-run
preview, and centralized secret handling.

Precondition (operator responsibility): Tailscale already installed + logged in
on every cluster node with `tag:elastickv-node`. Runbook documents this in §1.

Motivation
- … `NODES` takes out the cluster before anyone else sees it.

What the workflow does
- `workflow_dispatch` only (no auto-deploy on push/tag).
- Inputs: `ref` (required), `image_tag` (rollback override), `nodes` (raft-id subset filter), `dry_run` (bool, default `true`).
- Joins the tailnet via `tailscale/github-action@v3` (ephemeral OAuth, `tag:ci-deploy`).
- Renders `NODES` + `SSH_TARGETS` from `production` environment vars; filters to requested subset.
- `tailscale ping` on every target.
- Invokes `scripts/rolling-update.sh`.
- `environment: production` → required reviewers on non-dry-run.
- `concurrency: rolling-update` with `cancel-in-progress: false`.

Design and operator docs
- `docs/design/2026_04_24_proposed_deploy_via_tailscale.md`
- `docs/deploy_via_tailscale_runbook.md` — ACL config, OAuth client, required GitHub secrets/variables, first-run sequence, rollback, troubleshooting.

What's NOT changed
- `scripts/rolling-update.sh`. The script's env-var contract is the integration boundary; keeping the script stable means operators can still invoke it manually from a workstation during incidents.
- … `authorized_keys` entries. v1 is plain SSH over Tailscale; Tailscale SSH (keyless) is called out as a follow-up (§2.4 of design).

Test plan
- `actionlint` passes (0 errors)
- `dry_run: true`, `nodes: n1` — confirms tailnet join, SSH config, image availability, mapping render without touching prod
- `dry_run: false`, `nodes: n1` — single-node roll, verify cluster stays healthy and correct image lands
- `dry_run: false`, `nodes:` (empty) — full roll

Follow-ups (out of scope for v1, tracked in the design doc)
- … `cosign verify` on the image
- … `deploy` user with restricted sudo

Before merge
Operator setup is in the runbook (§3, §4). PR doesn't unblock the workflow by itself — it's dead until the secrets/variables in the `production` environment are populated. That's a deliberate choice so merging is harmless.
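Put together, the dispatch surface described under "What the workflow does" could look roughly like this sketch. Input names, the concurrency group, and the environment gate are from the PR description; the descriptions and everything else are assumed, and the actual .github/workflows/rolling-update.yml may differ.

```yaml
# Sketch only; not the merged workflow file.
on:
  workflow_dispatch:
    inputs:
      ref:
        description: Git ref to deploy
        required: true
      image_tag:
        description: Rollback override (optional)
        required: false
      nodes:
        description: Raft-id subset filter (empty = all nodes)
        required: false
      dry_run:
        description: Preview only; no container changes
        type: boolean
        default: true

concurrency:
  group: rolling-update
  cancel-in-progress: false   # parallel invocations queue rather than stomp

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production   # required reviewers gate the run
```

With `cancel-in-progress: false`, a second dispatch waits for the first to finish instead of interrupting a half-completed roll.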