CI: Add behavioral validation tests for topology, problems, and operations by renecannao · Pull Request #91 · ProxySQL/orchestrator

renecannao · 2026-04-09T06:23:51Z

Summary

The existing functional tests are mostly HTTP status code checks ("does the endpoint return 200?"). This PR adds real behavioral tests that verify orchestrator actually works correctly.

New test scripts

test-topology-discovery.sh (#88) — Cross-references orchestrator's API data against direct MySQL queries:

Correct instance count (3)
Master identified correctly (read_only=false, no upstream)
Replicas identified correctly (read_only=true, correct master, replication running)
GTID tracking active
Instance freshness (LastChecked recent)
Uptime detection

test-problem-detection.sh (#89) — Induces problems and verifies detection:

Stops replication → verifies orchestrator detects it → restarts → verifies it clears
Sets replica writable → verifies detection → restores read_only → verifies cleared
Injects errant GTID → verifies GtidErrant field populated

test-topology-operations.sh (#90) — Tests actual refactoring operations:

relocate: move mysql3 under mysql2, verify via API and direct MySQL
move-up: move mysql3 back under mysql1
repoint: GTID-based repoint to different master
set-read-only / set-writeable: toggle read_only via API

Depends on #85
Closes #88, closes #89, closes #90

Test plan

All 3 test scripts pass on MySQL 8.4
Tests run in the MySQL version matrix
Topology operations correctly modify replication and orchestrator detects the changes

Summary by CodeRabbit

Tests
- Added functional tests for replication topology discovery, instance state validation, problem detection/clearing, topology refactoring (relocate/move-up/repoint) and read-only management.
- Improved test reliability with polling, validation, and cleanup steps.
Chores
- Extended CI workflow to run the new functional test steps as part of MySQL job.

coderabbitai · 2026-04-09T06:23:59Z

Warning

Rate limit exceeded

@renecannao has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 20 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 51052a23-1f43-41b7-88ee-be2de97b33fe

📥 Commits

Reviewing files that changed from the base of the PR and between 47988d9 and cd02379.

📒 Files selected for processing (1)

tests/functional/test-topology-operations.sh

📝 Walkthrough

Walkthrough

Three new functional test scripts were added to validate topology discovery, problem detection, and topology operations. The CI workflow was updated to run these tests between smoke and regression stages. The shared test helper mysql_source_host() parsing was adjusted, and an existing test (test-named-channels.sh) gained extra validation and cleanup.

Changes

Cohort / File(s)	Summary
CI Workflow `.github/workflows/functional.yml`	Inserted three new functional test steps — `tests/functional/test-topology-discovery.sh`, `tests/functional/test-problem-detection.sh`, `tests/functional/test-topology-operations.sh` — into the MySQL workflow job between smoke and regression tests.
Test Helpers `tests/functional/lib.sh`	Modified `mysql_source_host()` to use `SHOW ... STATUS` without `\G` and parse tab-separated output via `awk -F'\t' '{print $2; exit}'` for replication source extraction.
Existing Test `tests/functional/test-named-channels.sh`	Added pre-check on `mysql2` for expected data before configuring named channel, replaced fixed sleep with a polling loop (up to 15s) and improved failure messaging; added cleanup step to drop `extra_db`.
New Tests `tests/functional/test-topology-discovery.sh`, `tests/functional/test-problem-detection.sh`, `tests/functional/test-topology-operations.sh`	Added three executable Bash functional tests: discovery (validates instance counts, master/replica identification, GTID, freshness, uptime), problem detection (stops/starts replication, detects/clears problems, read_only mismatch, errant GTIDs), and topology operations (relocate, move-up, repoint, read-only toggling) with polling-driven assertions and recovery steps.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner as Test Runner
    participant Orchestrator as Orchestrator API
    participant MySQL1 as mysql1 (master)
    participant MySQL2 as mysql2 (replica)
    participant MySQL3 as mysql3 (replica)

    TestRunner->>Orchestrator: wait for /api/cluster and /api/instance readiness
    TestRunner->>Orchestrator: POST /api/discover/mysql1/3306 (discover topology)
    Orchestrator->>MySQL1: query instance status, GTID, read_only, uptime
    Orchestrator->>MySQL2: query instance status, replication status
    Orchestrator->>MySQL3: query instance status, replication status
    Orchestrator-->>TestRunner: cluster JSON (instances, master/replica info)

    alt Problem detection test
        TestRunner->>MySQL2: execute SQL to stop replication
        TestRunner->>Orchestrator: POST /api/discover/mysql2/3306
        Orchestrator->>MySQL2: detect replication stopped
        Orchestrator-->>TestRunner: /api/problems includes mysql2 problem
        TestRunner->>MySQL2: restart replication
        TestRunner->>Orchestrator: POST /api/discover/mysql2/3306
    end

    alt Topology operations test
        TestRunner->>Orchestrator: POST /api/relocate/mysql3
        Orchestrator->>MySQL3: orchestrate move (exec SQL or GTID operations)
        Orchestrator-->>TestRunner: operation result
        TestRunner->>MySQL3: query @@read_only and replication source to verify
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add functional test CI: Docker MySQL topology + ProxySQL + failover tests #51: Modifies functional test tooling and adds/uses mysql_source_host parsing adjustments and new functional tests.
CI: MySQL version matrix (5.7-9.6) and named channels tests #85: Makes similar updates to the functional CI workflow and test helper parsing logic that overlap with this change set.

Poem

🐰
I hopped through logs and chased the thread—
Discovery, problems, operations fed.
With polls and checks I nibbled each clue,
Topology tidy, tests passing too! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly summarizes the main change: adding three behavioral validation tests for topology, problems, and operations, which is directly supported by all file summaries.
Linked Issues check	✅ Passed	All three linked issues (`#88`, `#89`, `#90`) are met: topology discovery validation (test-topology-discovery.sh), replication problem detection (test-problem-detection.sh), and topology refactoring operations (test-topology-operations.sh) are implemented with required test cases and cross-validation.
Out of Scope Changes check	✅ Passed	All changes are directly related to the PR objectives: workflow configuration changes to run the three tests, helper function updates for parsing reliability, and the three new test scripts with supporting modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch issue88-90-behavioral-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces three new functional test scripts to validate orchestrator's problem detection, topology discovery, and topology refactoring operations. My review identified a bug in the errant GTID test cleanup where the database drop operation would fail due to super_read_only being enabled, and I have provided a correction. Additionally, I recommended replacing fixed sleep intervals with polling loops for better test reliability, suggested consolidating redundant Python parsing logic, and noted a discrepancy between a comment and the actual implementation regarding timeout thresholds.

gemini-code-assist · 2026-04-09T06:33:45Z

+$COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e "
+    SET GLOBAL read_only=0;
+    DROP DATABASE IF EXISTS errant_detect_test;
+    SET GLOBAL read_only=1;
+" 2>/dev/null


The cleanup for the errant GTID test will fail silently. After the injection at lines 158-164, super_read_only is set to 1. The cleanup attempts to DROP DATABASE, which is a write operation that will be blocked by super_read_only=1. The error is suppressed by 2>/dev/null.

The cleanup needs to disable super_read_only to be able to drop the database and then restore it.

Suggested change

$COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e "

SET GLOBAL read_only=0;

DROP DATABASE IF EXISTS errant_detect_test;

SET GLOBAL read_only=1;

" 2>/dev/null

$COMPOSE exec -T mysql3 mysql -uroot -ptestpass -e "

SET GLOBAL read_only=0;

SET GLOBAL super_read_only=0;

DROP DATABASE IF EXISTS errant_detect_test;

SET GLOBAL read_only=1;

SET GLOBAL super_read_only=1;

" 2>/dev/null

gemini-code-assist · 2026-04-09T06:33:45Z

+echo "Setting mysql2 writeable via API..."
+RESULT=$(curl -s --max-time 10 "$ORC_URL/api/set-writeable/mysql2/3306" 2>/dev/null)
+
+sleep 2


Using a fixed sleep (here and on line 158) can make tests flaky if the operation takes longer than expected. A more robust approach is to poll for the expected read_only status in a loop with a timeout.

For example:

SUCCESS=false for i in $(seq 1 10); do # check condition if [ ... ]; then SUCCESS=true break fi sleep 1 done # assert SUCCESS is true

gemini-code-assist · 2026-04-09T06:33:45Z

+    HAS_MYSQL2=$(echo "$PROBLEMS" | python3 -c "
+import json, sys
+problems = json.load(sys.stdin)
+for p in problems:
+    h = p.get('Key', {}).get('Hostname', '')
+    if 'mysql2' in h:
+        sys.exit(0)
+sys.exit(1)
+" 2>/dev/null && echo "yes" || echo "no")


The python script used here to check for problems on mysql2 is identical to the one used in the first test (lines 30-38). This duplication should be avoided to improve maintainability. Consider creating a reusable shell function that takes the instance name and problem list as arguments.

gemini-code-assist · 2026-04-09T06:33:45Z

+ORC_MASTER_HOST=$(echo "$MASTER_INFO" | python3 -c "import json,sys; print(json.load(sys.stdin)['Key']['Hostname'])" 2>/dev/null || echo "")
+ORC_MASTER_RO=$(echo "$MASTER_INFO" | python3 -c "import json,sys; print(json.load(sys.stdin)['ReadOnly'])" 2>/dev/null || echo "")


These two commands parse the same JSON string ($MASTER_INFO) in two separate python3 processes. This is inefficient. Consider parsing the JSON once and extracting both values in a single python3 call. You could output them separated by a delimiter and use cut or read to assign them to variables.

gemini-code-assist · 2026-04-09T06:33:45Z

+echo ""
+echo "--- Test 5: Instance freshness ---"
+
+# Verify LastChecked is recent (within last 60 seconds)


This comment states the check is for the "last 60 seconds", but the implementation at line 193 uses a threshold of 120 seconds. The comment should be updated to match the code to avoid confusion.

Suggested change

# Verify LastChecked is recent (within last 60 seconds)

# Verify LastChecked is recent (within last 120 seconds)

gemini-code-assist · 2026-04-09T06:33:45Z

+# Use move-up to move mysql3 from under mysql2 back to mysql1
+echo "Moving mysql3 up via API..."
+RESULT=$(curl -s --max-time 30 "$ORC_URL/api/move-up/mysql3/3306" 2>/dev/null)
+CODE=$(echo "$RESULT" | python3 -c "import json,sys; print(json.load(sys.stdin).get('Code',''))" 2>/dev/null || echo "")


This line is identical to line 59 and line 118. To improve maintainability and reduce duplication, consider creating a small helper function to extract the Code field from the API JSON response.

For example:

get_api_code() { echo "$1" | python3 -c "import json,sys; print(json.load(sys.stdin).get('Code',''))" 2>/dev/null || echo "" } # Usage CODE=$(get_api_code "$RESULT")

…89, #90) Add three new test scripts that verify orchestrator actually works correctly, not just that API endpoints return 200: - test-topology-discovery.sh: validates orchestrator's view of the topology matches actual MySQL state (master/replica roles, read_only, GTID tracking, replication threads, instance freshness) - test-problem-detection.sh: verifies orchestrator detects and clears replication problems (stopped replication, writable replica, errant GTID) by inducing problems and checking the API response - test-topology-operations.sh: tests actual topology refactoring operations (relocate, move-up, repoint, set-read-only/writeable) and cross-references results with direct MySQL queries

The test created extra_db on mysql2 but only cleaned up the channel on mysql3. Add DROP DATABASE to prevent residual state.

The \G (vertical format) flag is unreliable when passed through docker exec pipes — it returns empty output on MySQL 8.4 and 9.x. Use tabular SHOW REPLICA STATUS instead; column 2 is always the source host.

awk splits on whitespace by default, picking up "for" from "Waiting for source..." (column 1 = Replica_IO_State). Use -F'\t' to split on tabs so column 2 = Source_Host.

Orchestrator caches instance state and only refreshes on its poll interval. After STOP REPLICA / SET GLOBAL read_only / injecting errant GTIDs, the tests need to force a re-discover call so orchestrator updates its cached data before the test assertions check the API.

- Verify data exists on mysql2 before setting up the extra channel - Replace single sleep+check with a 15s retry loop for data replication - Add better error messages for debugging if setup or replication fails

coderabbitai

🧹 Nitpick comments (3)

tests/functional/test-topology-discovery.sh (1)

5-5: Add error handling for cd command.

If the cd fails (e.g., directory doesn't exist), the script will continue with incorrect working directory, potentially causing confusing failures.

Proposed fix

-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/functional/test-topology-discovery.sh` at line 5, The script currently
runs cd "$(dirname "$0")/../.." without checking for failure; update the script
so the cd command's failure is detected and handled (for example, test the exit
status of cd or enable errexit) and exit with a clear error message;
specifically modify the invocation around cd "$(dirname "$0")/../.." (or the
script's top-of-file bootstrapping) to abort the script and log why (unable to
change to the expected working directory) when the cd fails.

tests/functional/test-problem-detection.sh (1)

5-5: Add error handling for cd command.

Consistent with the other test scripts, this should fail fast if the directory change fails.

Proposed fix

-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/functional/test-problem-detection.sh` at line 5, The cd command in
tests/functional/test-problem-detection.sh should fail fast like the other
scripts: update the cd "$(dirname "$0")/../.." invocation to check its exit
status and abort on failure (e.g., append an immediate failure handler such as
|| exit 1 or an equivalent error message + exit) so the script stops if the
directory change fails.

tests/functional/test-topology-operations.sh (1)

5-5: Add error handling for cd command.

Same issue as in other test scripts — the script should fail fast if the directory change fails.

Proposed fix

-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || { echo "Failed to cd to repository root"; exit 1; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/functional/test-topology-operations.sh` at line 5, The cd command cd
"$(dirname "$0")/../.." should fail fast if it cannot change directories; update
that line to check the exit status and abort on failure (e.g., append a
conditional OR that prints an error to stderr and exits non‑zero), or ensure the
script enables strict failure (set -e) before running the cd; target the exact
cd invocation (cd "$(dirname "$0")/../..") when applying the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/functional/test-problem-detection.sh`:
- Line 5: The cd command in tests/functional/test-problem-detection.sh should
fail fast like the other scripts: update the cd "$(dirname "$0")/../.."
invocation to check its exit status and abort on failure (e.g., append an
immediate failure handler such as || exit 1 or an equivalent error message +
exit) so the script stops if the directory change fails.

In `@tests/functional/test-topology-discovery.sh`:
- Line 5: The script currently runs cd "$(dirname "$0")/../.." without checking
for failure; update the script so the cd command's failure is detected and
handled (for example, test the exit status of cd or enable errexit) and exit
with a clear error message; specifically modify the invocation around cd
"$(dirname "$0")/../.." (or the script's top-of-file bootstrapping) to abort the
script and log why (unable to change to the expected working directory) when the
cd fails.

In `@tests/functional/test-topology-operations.sh`:
- Line 5: The cd command cd "$(dirname "$0")/../.." should fail fast if it
cannot change directories; update that line to check the exit status and abort
on failure (e.g., append a conditional OR that prints an error to stderr and
exits non‑zero), or ensure the script enables strict failure (set -e) before
running the cd; target the exact cd invocation (cd "$(dirname "$0")/../..") when
applying the change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f699f0c4-2392-4ad4-a3b1-fea6df43c768

📥 Commits

Reviewing files that changed from the base of the PR and between 5b13080 and 1d88f3f.

📒 Files selected for processing (6)

.github/workflows/functional.yml
tests/functional/lib.sh
tests/functional/test-named-channels.sh
tests/functional/test-problem-detection.sh
tests/functional/test-topology-discovery.sh
tests/functional/test-topology-operations.sh

After STOP+START REPLICA, mysql2 may still appear in /api/problems due to other issues (replication lag, errant GTID from stop/start cycle). Instead of checking /api/problems for any mysql2 entry, verify that the SQL thread is running again via the instance API. Also force re-discovery on each poll iteration to ensure fresh state.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

tests/functional/test-named-channels.sh (1)

65-65: Use _ for the intentionally unused loop variable.

Line 65 uses i but never reads it, which triggers SC2034 and adds noise.

#!/bin/bash
# Verify the loop variable is not referenced anywhere in this script.
rg -n --type=sh -C2 '\bfor\s+i\s+in\s+\$\(seq 1 15\)' tests/functional/test-named-channels.sh
rg -n --type=sh '\$i\b' tests/functional/test-named-channels.sh

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/functional/test-named-channels.sh` at line 65, The loop variable `i` in
the `for i in $(seq 1 15); do` line is intentionally unused and should be
renamed to `_` to avoid the SC2034 unused-variable warning; update the loop to
`for _ in $(seq 1 15); do` and verify there are no references to `$i` elsewhere
(if any exist, replace or remove them) so the change is safe.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/functional/test-named-channels.sh`:
- Around line 129-131: The DROP DATABASE cleanup currently suppresses stderr and
ignores failures for the command `$COMPOSE exec -T mysql2 mysql -uroot
-ptestpass -e "DROP DATABASE IF EXISTS extra_db;"`; change this so failures are
detected and cause the test run to fail: remove the `2>/dev/null` suppression,
capture and check the command exit status (or enable `set -e`/`set -o errexit`
for the script), and on non-zero exit print a clear error message and exit
non-zero so stale state cannot leak into later tests.

In `@tests/functional/test-problem-detection.sh`:
- Line 23: The mysql state-changing exec lines (e.g. the call using $COMPOSE
exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL") suppress stderr and
ignore exit codes; update each such mutation call (including analogous
START/INSERT/UPDATE/DELETE commands) to capture stdout/stderr and check the
command's exit status immediately, and if non-zero print a descriptive error
including the captured stderr and exit non-zero to fail the test fast. Locate
the commands by the pattern "$COMPOSE exec -T mysql* mysql -uroot -ptestpass -e
\"...\"" and modify them to redirect/collect stderr (or assign output to a
variable), test $? (or the command result), and abort with an explanatory
message when the command fails.
- Around line 100-105: The current check marks CLEARED when only SQL_RUNNING is
"True", which can pass while IO_RUNNING is "False"; update the conditional that
sets CLEARED (the block using SQL_RUNNING, IO_RUNNING, CLEARED and REPL_STATE)
to require both SQL_RUNNING = "True" and IO_RUNNING = "True" before setting
CLEARED and breaking the loop, and adjust the echo message to reflect both
thread states (e.g., "SQL=True, IO=True") so the test only passes when both
threads are healthy.
- Line 15: discover_topology is being treated as optional but must be a hard
precondition; update the test to fail immediately when discover_topology
"mysql1" returns non-zero by checking its exit status and exiting with a
non-zero code and an explanatory stderr message (e.g., run discover_topology
"mysql1" and if it fails print "Topology discovery failed for mysql1" to stderr
and exit 1), or alternatively enable strict shell error handling at the top of
the script (set -euo pipefail) to enforce failure; refer to the
discover_topology invocation to locate where to add the check.
- Around line 5-6: The script should fail fast if bootstrap steps fail: add a
strict shell flag (e.g., set -euo pipefail or at minimum set -e) at the top of
the script (immediately after the shebang) so that failures in the cd "$(dirname
"$0")/../.." or source tests/functional/lib.sh commands abort the run; ensure
the flags are applied before those two lines so any failure in cd or source
stops the script.

---

Nitpick comments:
In `@tests/functional/test-named-channels.sh`:
- Line 65: The loop variable `i` in the `for i in $(seq 1 15); do` line is
intentionally unused and should be renamed to `_` to avoid the SC2034
unused-variable warning; update the loop to `for _ in $(seq 1 15); do` and
verify there are no references to `$i` elsewhere (if any exist, replace or
remove them) so the change is safe.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e9671a70-a5c5-46e7-96f5-90bb0c864147

📥 Commits

Reviewing files that changed from the base of the PR and between 1d88f3f and 47988d9.

📒 Files selected for processing (2)

tests/functional/test-named-channels.sh
tests/functional/test-problem-detection.sh

coderabbitai · 2026-04-09T12:28:44Z

+# Clean up test database on mysql2
+$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null
+


⚠️ Potential issue | 🟠 Major

Handle cleanup failures explicitly to avoid cross-test contamination.

Line 130 suppresses stderr and does not check command success. If DROP DATABASE fails, stale state can leak into later functional tests in the same environment.

Suggested fix

# Clean up test database on mysql2 -$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null +if ! $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null; then + fail "Cleanup: failed to drop extra_db on mysql2" +else + pass "Cleanup: dropped extra_db on mysql2" +fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Clean up test database on mysql2

$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null

# Clean up test database on mysql2

if ! $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null; then

echo "Error: Failed to drop extra_db on mysql2"

exit 1

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-named-channels.sh` around lines 129 - 131, The DROP DATABASE cleanup currently suppresses stderr and ignores failures for the command `$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;"`; change this so failures are detected and cause the test run to fail: remove the `2>/dev/null` suppression, capture and check the command exit status (or enable `set -e`/`set -o errexit` for the script), and on non-zero exit print a clear error message and exit non-zero so stale state cannot leak into later tests.

coderabbitai · 2026-04-09T12:28:45Z

+cd "$(dirname "$0")/../.."
+source tests/functional/lib.sh


⚠️ Potential issue | 🟠 Major

Fail fast when bootstrap steps fail.

If cd or source fails, the script still continues (no set -e), which can cascade into misleading failures later.

Suggested fix

-cd "$(dirname "$0")/../.." -source tests/functional/lib.sh +cd "$(dirname "$0")/../.." || { echo "FATAL: unable to cd to repository root"; exit 1; } +source tests/functional/lib.sh || { echo "FATAL: unable to load tests/functional/lib.sh"; exit 1; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cd "$(dirname "$0")/../.."

source tests/functional/lib.sh

cd "$(dirname "$0")/../.." || { echo "FATAL: unable to cd to repository root"; exit 1; }

source tests/functional/lib.sh || { echo "FATAL: unable to load tests/functional/lib.sh"; exit 1; }

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 5-5: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[info] 6-6: Not following: tests/functional/lib.sh was not specified as input (see shellcheck -x).

(SC1091)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-problem-detection.sh` around lines 5 - 6, The script should fail fast if bootstrap steps fail: add a strict shell flag (e.g., set -euo pipefail or at minimum set -e) at the top of the script (immediately after the shebang) so that failures in the cd "$(dirname "$0")/../.." or source tests/functional/lib.sh commands abort the run; ensure the flags are applied before those two lines so any failure in cd or source stops the script.

coderabbitai · 2026-04-09T12:28:45Z

+START_SQL=$(mysql_start_replica_sql)
+
+wait_for_orchestrator || { echo "FATAL: Orchestrator not reachable"; exit 1; }
+discover_topology "mysql1"


⚠️ Potential issue | 🟠 Major

Treat topology discovery as a hard precondition.

Line 15 ignores discover_topology failure, so tests can run against partial/undiscovered state and produce noisy results.

Suggested fix

-discover_topology "mysql1" +discover_topology "mysql1" || { echo "FATAL: initial topology discovery failed"; exit 1; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

discover_topology "mysql1"

discover_topology "mysql1" || { echo "FATAL: initial topology discovery failed"; exit 1; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-problem-detection.sh` at line 15, discover_topology is being treated as optional but must be a hard precondition; update the test to fail immediately when discover_topology "mysql1" returns non-zero by checking its exit status and exiting with a non-zero code and an explanatory stderr message (e.g., run discover_topology "mysql1" and if it fails print "Topology discovery failed for mysql1" to stderr and exit 1), or alternatively enable strict shell error handling at the top of the script (set -euo pipefail) to enforce failure; refer to the discover_topology invocation to locate where to add the check.

coderabbitai · 2026-04-09T12:28:45Z

+
+# Stop replication on mysql2
+echo "Stopping replication on mysql2..."
+$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL" 2>/dev/null


⚠️ Potential issue | 🟠 Major

Check exit codes for state-changing MySQL commands.

These commands mutate DB state but suppress stderr and do not verify success. If any command fails, subsequent assertions may incorrectly blame orchestrator behavior.

Suggested pattern

+# helper +run_mysql() { + local host="$1" + local sql="$2" + if ! $COMPOSE exec -T "$host" mysql -uroot -ptestpass -e "$sql" >/dev/null; then + fail "MySQL command failed on ${host}" "$sql" + return 1 + fi + return 0 +}

-$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL" 2>/dev/null +run_mysql "mysql2" "$STOP_SQL" || exit 1

Apply the same pattern to the other mutation calls.

Also applies to: 81-81, 122-122, 150-150, 179-185, 216-220

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-problem-detection.sh` at line 23, The mysql state-changing exec lines (e.g. the call using $COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "$STOP_SQL") suppress stderr and ignore exit codes; update each such mutation call (including analogous START/INSERT/UPDATE/DELETE commands) to capture stdout/stderr and check the command's exit status immediately, and if non-zero print a descriptive error including the captured stderr and exit non-zero to fail the test fast. Locate the commands by the pattern "$COMPOSE exec -T mysql* mysql -uroot -ptestpass -e \"...\"" and modify them to redirect/collect stderr (or assign output to a variable), test $? (or the command result), and abort with an explanatory message when the command fails.

coderabbitai · 2026-04-09T12:28:45Z

+    SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1)
+    IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2)
+    if [ "$SQL_RUNNING" = "True" ]; then
+        CLEARED=true
+        echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})"
+        break


⚠️ Potential issue | 🟠 Major

Require both SQL and IO threads for “cleared” state.

Current clear condition only checks SQL=True; replication can still be unhealthy with IO=False, causing a false pass.

Suggested fix

- if [ "$SQL_RUNNING" = "True" ]; then + if [ "$SQL_RUNNING" = "True" ] && [ "$IO_RUNNING" = "True" ]; then CLEARED=true - echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})" + echo "Replication recovered after ${i}s (SQL=True, IO=True)" break fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1)

IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2)

if [ "$SQL_RUNNING" = "True" ]; then

CLEARED=true

echo "Replication recovered after ${i}s (SQL=True, IO=${IO_RUNNING})"

break

SQL_RUNNING=$(echo "$REPL_STATE" | cut -d: -f1)

IO_RUNNING=$(echo "$REPL_STATE" | cut -d: -f2)

if [ "$SQL_RUNNING" = "True" ] && [ "$IO_RUNNING" = "True" ]; then

CLEARED=true

echo "Replication recovered after ${i}s (SQL=True, IO=True)"

break

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-problem-detection.sh` around lines 100 - 105, The current check marks CLEARED when only SQL_RUNNING is "True", which can pass while IO_RUNNING is "False"; update the conditional that sets CLEARED (the block using SQL_RUNNING, IO_RUNNING, CLEARED and REPL_STATE) to require both SQL_RUNNING = "True" and IO_RUNNING = "True" before setting CLEARED and breaking the loop, and adjust the echo message to reflect both thread states (e.g., "SQL=True, IO=True") so the test only passes when both threads are healthy.

…e-up

gemini-code-assist bot reviewed Apr 9, 2026

View reviewed changes

renecannao force-pushed the issue80-81-mysql-matrix-channels branch 2 times, most recently from e81056f to fa42679 Compare April 9, 2026 07:33

renecannao added 2 commits April 9, 2026 09:42

Clean up extra_db on mysql2 in named channels test

cbda41b

The test created extra_db on mysql2 but only cleaned up the channel on mysql3. Add DROP DATABASE to prevent residual state.

renecannao force-pushed the issue88-90-behavioral-tests branch from 34f3c7c to cbda41b Compare April 9, 2026 09:44

renecannao changed the base branch from issue80-81-mysql-matrix-channels to master April 9, 2026 10:01

renecannao added 4 commits April 9, 2026 10:12

Fix mysql_source_host: use tabular SHOW STATUS instead of \G

a8afb3d

The \G (vertical format) flag is unreliable when passed through docker exec pipes — it returns empty output on MySQL 8.4 and 9.x. Use tabular SHOW REPLICA STATUS instead; column 2 is always the source host.

Fix mysql_source_host: use tab-delimited awk field split

62b814e

awk splits on whitespace by default, picking up "for" from "Waiting for source..." (column 1 = Replica_IO_State). Use -F'\t' to split on tabs so column 2 = Source_Host.

Fix named channels test: verify setup data, retry data replication

13ecfe5

- Verify data exists on mysql2 before setting up the extra channel - Replace single sleep+check with a 15s retry loop for data replication - Add better error messages for debugging if setup or replication fails

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

renecannao added 2 commits April 9, 2026 12:10

Fix unbound variable: extract IO_RUNNING from REPL_STATE

47988d9

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

Fix topology operations: wait for replication lag to clear before mov…

cd02379

…e-up

renecannao self-assigned this Apr 9, 2026

renecannao merged commit c3f54cc into master Apr 9, 2026
14 checks passed

coderabbitai bot mentioned this pull request Apr 16, 2026

ci: fix lint on push/schedule and bake orchestrator test runtime image #96

Merged

3 tasks

		ORC_MASTER_HOST=$(echo "$MASTER_INFO" \| python3 -c "import json,sys; print(json.load(sys.stdin)['Key']['Hostname'])" 2>/dev/null \|\| echo "")
		ORC_MASTER_RO=$(echo "$MASTER_INFO" \| python3 -c "import json,sys; print(json.load(sys.stdin)['ReadOnly'])" 2>/dev/null \|\| echo "")

	# Verify LastChecked is recent (within last 60 seconds)
	# Verify LastChecked is recent (within last 120 seconds)

		# Clean up test database on mysql2
		$COMPOSE exec -T mysql2 mysql -uroot -ptestpass -e "DROP DATABASE IF EXISTS extra_db;" 2>/dev/null

	discover_topology "mysql1"
	discover_topology "mysql1" \|\| { echo "FATAL: initial topology discovery failed"; exit 1; }

Conversation

renecannao commented Apr 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New test scripts

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

renecannao commented Apr 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 9, 2026 •

edited

Loading