Issue #5: CI test redesign — nested lifecycle, dynamic IDs, log collector, LLM judge #6
Merged
dogkeeper886 merged 13 commits into testlink_1_9_20_fixed on Apr 19, 2026
…e separation)
Canonical design reference for the CI test suite: four nested lifecycle scopes (session/suite/test/step) with guaranteed teardown, dynamic ID capture rules, layered test flow, unique-names convention, and the rationale for keeping docker-compose.ci.yml separate from the dev docker-compose.yml.
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
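The "four nested lifecycle scopes with guaranteed teardown" idea can be shown with a small TypeScript sketch. It models each scope as setup, body, and a `finally`-based teardown. `withScope` and `runNested` are hypothetical names, not the framework's API. The sketch only demonstrates two properties: teardown runs innermost-first, and it still runs when an inner step throws.

```typescript
// Hypothetical sketch: four nested lifecycle scopes (session > suite > test > step).
// Each scope records its setup, runs its body, and records its teardown in a
// `finally` block, so teardown happens on success and on exceptions alike.
type EventLog = string[];

function withScope<T>(name: string, log: EventLog, body: () => T): T {
  log.push(`setup:${name}`);
  try {
    return body();
  } finally {
    log.push(`teardown:${name}`);
  }
}

function runNested(log: EventLog, failInStep: boolean): void {
  withScope("session", log, () =>
    withScope("suite", log, () =>
      withScope("test", log, () =>
        withScope("step", log, () => {
          if (failInStep) {
            throw new Error("step failed");
          }
        }),
      ),
    ),
  );
}

const passing: EventLog = [];
runNested(passing, false);
console.log(passing.join(" "));
// setup:session setup:suite setup:test setup:step teardown:step teardown:test teardown:suite teardown:session
```

When `failInStep` is true, the error still propagates to the caller, but all four `teardown:` entries are recorded first.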
The log collector was pointed at `<projectRoot>/docker/`, a directory that contains no compose file. `docker compose logs` in that cwd failed to start and the collector silently disabled itself, so per-test log extraction never produced output.
Replace `RunConfig.dockerComposePath` (directory) with `composeFile` (absolute path to the compose file). Invoke `docker compose -f <composeFile> logs --follow --timestamps` with cwd set to the project root. Default target is the CI compose at `cicd/docker-compose.ci.yml`.
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
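Here is a minimal sketch of how the collector's invocation could be assembled from `RunConfig`. Only `composeFile` comes from the commit. The `projectRoot` field name and the `SpawnSpec` shape are assumptions for illustration. The resulting spec would typically be handed to Node's `child_process.spawn(command, args, { cwd })`.

```typescript
// Hypothetical sketch: build the `docker compose logs` invocation.
// The compose file is passed explicitly with -f, and cwd is the project root
// instead of a directory that may not contain a compose file.
interface RunConfig {
  projectRoot: string; // absolute repo root (assumed field name)
  composeFile: string; // absolute path to the compose file
}

interface SpawnSpec {
  command: string;
  args: string[];
  cwd: string;
}

function logCollectorSpawn(config: RunConfig): SpawnSpec {
  return {
    command: "docker",
    args: ["compose", "-f", config.composeFile, "logs", "--follow", "--timestamps"],
    cwd: config.projectRoot,
  };
}

// Default target: the CI compose file under the project root.
function defaultComposeFile(projectRoot: string): string {
  return `${projectRoot.replace(/\/+$/, "")}/cicd/docker-compose.ci.yml`;
}

const spec = logCollectorSpawn({ projectRoot: "/repo", composeFile: defaultComposeFile("/repo") });
console.log(spec.command, spec.args.join(" "));
// docker compose -f /repo/cicd/docker-compose.ci.yml logs --follow --timestamps
```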
Session setup (ci-up.sh) and teardown (ci-down.sh) used to live inside test-case steps (TC-INTEGRATION-001 started the stack; TC-E2E-001's last step tore it down). That made teardown conditional on the e2e test reaching its final step: any earlier failure left containers and volumes behind. Running only one suite was inconsistent too (build skipped lifecycle entirely; integration started the stack but never stopped it).
Introduce cicd/scripts/run-tests.sh as the single entry point. It:
- Runs ci-up.sh before tests (except for --suite build, which only exercises the image artifact).
- Traps EXIT/INT/TERM to invoke ci-down.sh regardless of outcome: pass, fail, Ctrl-C, or crash all produce a clean teardown.
- Passes remaining args through to the tsx CLI.
Call sites updated:
- TC-INTEGRATION-001 and TC-E2E-001 no longer invoke ci-up/ci-down.
- TC-INTEGRATION-001's spurious dependency on TC-BUILD-002 removed; the compose image is built by ci-up.sh, and TC-BUILD-002 builds a separate testlink-ci-test tag only used by TC-BUILD-001.
- package.json npm scripts route through the wrapper.
- .github/workflows/test-suite.yml calls the wrapper from repo root.
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
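run-tests.sh itself is a bash script. The TypeScript sketch below mirrors only its decision logic: skip the stack for `--suite build`, otherwise wrap the run so teardown always happens. `planRun` and `runWithLifecycle` are hypothetical names. `try/finally` covers thrown errors but not signals, which is why the real wrapper also traps INT and TERM.

```typescript
// Hypothetical sketch of the wrapper's control flow (the real one uses bash `trap`).
interface RunPlan {
  startStack: boolean; // run ci-up before tests and ci-down afterwards
  cliArgs: string[]; // forwarded unchanged to the test CLI
}

function planRun(args: string[]): RunPlan {
  const i = args.indexOf("--suite");
  const suite: string | undefined = i >= 0 ? args[i + 1] : undefined;
  // Only `--suite build` skips the stack: it exercises the image artifact alone.
  return { startStack: suite !== "build", cliArgs: [...args] };
}

function runWithLifecycle(
  args: string[],
  up: () => void,
  down: () => void,
  runCli: (cliArgs: string[]) => number,
): number {
  const plan = planRun(args);
  if (!plan.startStack) {
    return runCli(plan.cliArgs);
  }
  try {
    up(); // inside try, so a half-finished ci-up is still torn down
    return runCli(plan.cliArgs);
  } finally {
    down();
  }
}

console.log(planRun(["--suite", "build"]).startStack); // false
console.log(planRun(["--id", "TC-WORKFLOW-001"]).startStack); // true
```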
Three pieces of framework scaffolding for the test restructure:
1. Executor auto-populates `{{runId}}` (Date.now()) and `{{testId}}`
(test case ID) as captured variables at the start of each test.
Tests can use these to generate unique entity names without
bespoke bash.
2. Add cicd/scripts/xmlrpc-capture.sh — reads an XML-RPC methodCall
document from stdin, POSTs to the TestLink API, mirrors the
response to stderr for expectPatterns, and emits structured JSON
on stdout for the framework's `capture:` mechanism. Replaces ~6
lines of inline bash extraction in every CRUD step.
3. SUITES list updated to match the guidelines:
build, smoke, auth, crud, workflow, negative, regression.
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
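A minimal sketch of item 1 (per-test seeding of `{{runId}}` and `{{testId}}`) plus the `{{name}}` substitution it feeds. It assumes a flat string map and placeholders matching `{{\w+}}`. Leaving unknown placeholders untouched is this sketch's choice, not documented executor behavior.

```typescript
// Hypothetical sketch: per-test variable seeding plus {{name}} substitution.
type Vars = Record<string, string>;

// Seed the two auto-populated variables; `now` is injectable for testing.
function seedVariables(testId: string, now: number = Date.now()): Vars {
  return { runId: String(now), testId };
}

// Replace {{name}} with vars[name]; unknown placeholders are left as-is.
function substitute(template: string, vars: Vars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match,
  );
}

const seeded = seedVariables("TC-CRUD-003", 1713500000000);
console.log(substitute("case-{{testId}}-{{runId}}", seeded));
// case-TC-CRUD-003-1713500000000
```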
Reorganize cicd/tests/testcases/ per the guidelines: drop the
integration/e2e bucket names, organize by the test-flow pyramid
(smoke → auth → crud → workflow). Rewrite each test to follow the
three core rules:
- Unique names — every created entity embeds {{testId}} and {{runId}}
(auto-populated by the executor) so rerunning against the same DB
never collides with residue.
- Dynamic ID capture — the xmlrpc-capture.sh helper extracts the
created entity's id from the XML response and emits JSON; the
framework's `capture:` mechanism pipes it into {{projectId}},
{{suiteId}}, {{caseId}} for subsequent steps.
- Per-test data ownership — every test creates its parent entities
in setup steps and deletes them in reverse order as teardown steps.
No test leaks data that another test depends on.
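The dynamic-ID-capture rule can be sketched as follows. The sketch assumes the helper prints a JSON object like `{"ok": true, "id": "..."}`. The `ok` key appears in the source, while the `id` key and its type are assumptions. `captureId` is hypothetical. Throwing when no id is present is this sketch's choice. It keeps later steps from silently running with an empty `{{projectId}}`.

```typescript
// Hypothetical sketch: turn the capture helper's JSON stdout into a variable.
// Assumed output shape on success: {"ok": true, "id": "..."}.
interface CaptureOutput {
  ok: boolean;
  id?: string | number;
}

function captureId(stdout: string, varName: string, vars: Record<string, string>): void {
  const parsed = JSON.parse(stdout) as CaptureOutput;
  if (!parsed.ok || parsed.id === undefined || String(parsed.id) === "") {
    // Fail here rather than let later steps reference an empty {{varName}}.
    throw new Error(`capture of ${varName} failed: ${stdout.trim()}`);
  }
  vars[varName] = String(parsed.id);
}

const captured: Record<string, string> = {};
captureId('{"ok": true, "id": "17"}', "projectId", captured);
console.log(captured.projectId); // 17
```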
Layout:
  smoke/     TC-SMOKE-001     login page responds
             TC-SMOKE-002     XML-RPC tl.ping
  auth/      TC-AUTH-001      valid API key accepted
             TC-AUTH-002      invalid API key rejected
  crud/      TC-CRUD-001      project CRUD
             TC-CRUD-002      test suite CRUD
             TC-CRUD-003      test case CRUD
  workflow/  TC-WORKFLOW-001  full entity-graph round-trip
The old e2e/TC-E2E-001 is superseded by TC-WORKFLOW-001, which
exercises the same graph but with captured IDs (the original
hardcoded testprojectid=1, testsuiteid=2, external-id CIT-1).
test-pipeline.yml updated to run the new suites in dependency order:
build → smoke → auth → crud → workflow. Lower-layer failure
short-circuits the higher layers.
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
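The layered run order with short-circuiting can be pictured as a simple loop. The commit does not say how test-pipeline.yml encodes it. In GitHub Actions this is commonly done with `needs:` between jobs, but that is an assumption here. `runPipeline` is a hypothetical name.

```typescript
// Hypothetical sketch of the layered run order with short-circuiting.
const SUITE_ORDER = ["build", "smoke", "auth", "crud", "workflow"] as const;
type Suite = (typeof SUITE_ORDER)[number];

interface PipelineResult {
  ran: Suite[];
  passed: boolean;
}

// Run suites lowest layer first; the first failing suite stops the pipeline,
// so higher layers never run against a broken foundation.
function runPipeline(runSuite: (suite: Suite) => boolean): PipelineResult {
  const ran: Suite[] = [];
  for (const suite of SUITE_ORDER) {
    ran.push(suite);
    if (!runSuite(suite)) {
      return { ran, passed: false };
    }
  }
  return { ran, passed: true };
}

const result = runPipeline((suite) => suite !== "auth");
console.log(result.ran.join(" -> "), result.passed);
// build -> smoke -> auth false
```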
Two bugs surfaced when running the CRUD and workflow suites against a
live TestLink:
1. xmlrpc-capture.sh pulled the "first <int> in the response" as the
created id. But TestLink's XML-RPC emits the `id` field as
<string>N</string> for several entity types (createTestProject
among them), so the capture emitted `{"ok": true}` with no id and
every downstream step referenced an empty {{projectId}}. Match the
`id` member explicitly and accept both <int> and <string> wrappers;
do the same for faultCode.
2. The executor substituted {{runId}} / {{testId}} / captured vars into
step.command but not into step.expectPatterns. "Read back" steps
used expectPatterns like "case-{{testId}}-{{runId}}" to verify the
round-trip, which never matched because the regex was the literal
template string. Run the same substitution over expect/reject
patterns before checking.
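The fix in item 1 is to match the named `id` member and accept either `<int>` or `<string>`. Below is a TypeScript port of that matching rule for illustration; the real helper is bash. `extractMember` is hypothetical, and the sample response is abbreviated and illustrative. A regex is adequate for flat response structs like these but is not a general XML parser.

```typescript
// Hypothetical TypeScript port of the matching rule (the real helper is bash).
// Finds <member><name>NAME</name><value><int|string>V</...> and returns V.
// `memberName` is assumed to contain no regex metacharacters (e.g. "id", "faultCode").
function extractMember(xml: string, memberName: string): string | undefined {
  const re = new RegExp(
    `<member>\\s*<name>${memberName}</name>\\s*<value>\\s*<(int|string)>([^<]*)</\\1>`,
  );
  const match = re.exec(xml);
  return match ? match[2] : undefined;
}

// Illustrative, abbreviated response where the id arrives as <string>.
const createResponse =
  "<methodResponse><params><param><value><array><data><value><struct>" +
  "<member><name>status</name><value><boolean>1</boolean></value></member>" +
  "<member><name>id</name><value><string>42</string></value></member>" +
  "</struct></value></data></array></value></param></params></methodResponse>";

console.log(extractMember(createResponse, "id")); // 42
```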
Test expect-patterns for creation steps updated from "<int>" (never
going to match <string>) to "<name>id</name>", which is stable across
both id representations.
Verified against a fresh CI stack:
- smoke: 2/2 pass
- auth: 2/2 pass
- crud: 3/3 pass
- workflow: 1/1 pass
- workflow re-run in same session: 1/1 pass (idempotent)
Refs #5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
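Item 2 runs the same substitution over expect/reject patterns before checking them. A minimal sketch of that check follows. `checkPatterns` is hypothetical. Regex-escaping the substituted values is a precaution this sketch adds, and the commit does not say whether the real executor does the same.

```typescript
// Hypothetical sketch: substitute variables into expect/reject regexes, then check.
type Vars = Record<string, string>;

// Escape regex metacharacters so a substituted value matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function fillPattern(pattern: string, vars: Vars): string {
  return pattern.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? escapeRegExp(String(vars[name])) : match,
  );
}

// Returns a list of failures; an empty list means the step's patterns passed.
function checkPatterns(output: string, expect: string[], reject: string[], vars: Vars): string[] {
  const failures: string[] = [];
  for (const p of expect) {
    const re = new RegExp(fillPattern(p, vars));
    if (!re.test(output)) failures.push(`expected /${re.source}/ not found`);
  }
  for (const p of reject) {
    const re = new RegExp(fillPattern(p, vars));
    if (re.test(output)) failures.push(`rejected /${re.source}/ found`);
  }
  return failures;
}

const stepVars: Vars = { testId: "TC-CRUD-003", runId: "1713500000000" };
const readBack =
  "<member><name>id</name><value><string>5</string></value></member>" +
  "<member><name>name</name><value><string>case-TC-CRUD-003-1713500000000</string></value></member>";
console.log(checkPatterns(readBack, ["<name>id</name>", "case-{{testId}}-{{runId}}"], ["faultCode"], stepVars));
// []
```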
The CI compose file used to publish the app on host port 8090, the
same port as the dev compose, so the two stacks could not coexist.
Move the port behind ${TL_PORT:-8091} and have all CI scripts and
test helpers reference $TL_URL. run-tests.sh sources cicd/tests/.env
(gitignored) so a single override changes every consumer.
Why: a developer who left the dev stack running would see CI fail to
bind on every test invocation. Splitting the port lets both run side
by side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
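A sketch of the port/URL resolution, mirroring the `${TL_PORT:-8091}` default, which treats an unset or empty variable as missing. The commit only says consumers reference `$TL_URL`. Deriving it as `http://localhost:<TL_PORT>` when unset is an assumption of this sketch, as are the function names.

```typescript
// Hypothetical sketch of TL_PORT / TL_URL resolution.
type Env = Record<string, string | undefined>;

// Mirrors `${TL_PORT:-8091}`: unset or empty falls back to 8091.
function resolvePort(env: Env): number {
  const raw = env.TL_PORT;
  const port = raw === undefined || raw === "" ? 8091 : Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid TL_PORT: ${raw}`);
  }
  return port;
}

// Assumption: when TL_URL is not set, it is derived from the resolved port.
function resolveTlUrl(env: Env): string {
  const url = env.TL_URL;
  return url !== undefined && url !== "" ? url : `http://localhost:${resolvePort(env)}`;
}

console.log(resolveTlUrl({})); // http://localhost:8091
```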
…ning
Three changes to the judge:
1. Switch from Ollama /api/generate to /api/chat with a system + user
message split. Make LLM_JUDGE_URL and LLM_JUDGE_MODEL env-driven via
cicd/tests/.env so the host running tests can target any Ollama
instance and model without touching code.
2. Redesign the prompt along role/task/behavior/output lines. Drop the
hardcoded heuristics ("exit code 0 = pass", etc.) that were fighting
negative tests. Add two optional YAML fields the test author owns:
- objective — what the test proves and why it matters
- judgeContext — what evidence each step produces and what silent
failures look like in this domain
The judge reads OBJECTIVE → CONTEXT → CRITERIA → OBSERVATIONS in that
order, so the per-test situational framing reaches the model before
the raw evidence.
3. Tune Ollama options for small (~4B) models:
- num_ctx 8192: prompts can hit ~5-7k chars; the 4096 default was silently truncating OBSERVATIONS for multi-step tests, producing empty/garbage JSON
- num_predict 512: enough headroom for FAIL evidence quotes while still catching runaway generation
Why: gemma3:4b was returning empty {} or off-schema JSON on roughly
20-30% of tests with the old prompt and default options. The combo of
per-test framing and proper context size brings 11/11 testcases through
the judge cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
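A sketch of the request body the redesigned judge could send to Ollama's `/api/chat`. `model`, `messages` (with `role`/`content`), `stream`, and `options.num_ctx`/`options.num_predict` are Ollama request fields. The function names, the `JudgeInput` shape, and the section-header spelling are assumptions. The body would be POSTed as JSON to `<LLM_JUDGE_URL>/api/chat`, assuming `LLM_JUDGE_URL` holds the base URL.

```typescript
// Hypothetical sketch of the judge's /api/chat request body.
interface JudgeInput {
  objective?: string; // optional YAML field owned by the test author
  judgeContext?: string; // optional YAML field owned by the test author
  criteria: string;
  observations: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Sections are emitted as OBJECTIVE -> CONTEXT -> CRITERIA -> OBSERVATIONS,
// so the per-test framing reaches the model before the raw evidence.
function buildUserPrompt(input: JudgeInput): string {
  const sections: string[] = [];
  if (input.objective) sections.push(`OBJECTIVE:\n${input.objective}`);
  if (input.judgeContext) sections.push(`CONTEXT:\n${input.judgeContext}`);
  sections.push(`CRITERIA:\n${input.criteria}`);
  sections.push(`OBSERVATIONS:\n${input.observations}`);
  return sections.join("\n\n");
}

function buildChatRequest(model: string, systemPrompt: string, input: JudgeInput) {
  const messages: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: buildUserPrompt(input) },
  ];
  return {
    model,
    messages,
    stream: false, // one JSON response instead of a token stream
    options: { num_ctx: 8192, num_predict: 512 },
  };
}
```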
…-BUILD-003
Each YAML now declares:
- objective — the plain-English purpose of the test
- judgeContext — what each step does, what evidence it emits, and what
silent failures look like in that test's domain
This is the per-test framing the redesigned LLM judge consumes. Negative
tests (TC-AUTH-002) explicitly call out that they EXPECT a fault
response, which fixes the prior misjudgment where the judge labeled the
expected fault as a failure.
TC-BUILD-003 is also rewritten to validate composer.json from inside
the testlink-ci-test image (depending on TC-BUILD-002 to build it)
instead of shelling out to a host python3 that may not exist.
TC-SMOKE-001 picks up the $TL_URL change from the prior port-config
commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…enerate
- run-tests.sh header still listed --suite integration and TC-E2E-001 in the usage examples, both deleted in the suite restructure. Update to --suite crud and --id TC-WORKFLOW-001.
- LLMJudge.unloadModel was POSTing to /api/chat with messages: [], which is awkward: the unload doesn't need conversational semantics, and an empty messages array isn't well-defined for the chat endpoint. Use /api/generate with keep_alive: 0 (Ollama's documented unload pattern). /api/chat stays in place for the actual judging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
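A sketch of the unload request: Ollama unloads a model when `/api/generate` receives no prompt and `keep_alive: 0`. `buildUnloadRequest` is a hypothetical helper. Its result is shaped so it can be passed straight to `fetch(req.url, req.init)`.

```typescript
// Hypothetical sketch: the request LLMJudge.unloadModel could send.
interface UnloadRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildUnloadRequest(baseUrl: string, model: string): UnloadRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // No prompt plus keep_alive: 0 asks Ollama to evict the model.
      body: JSON.stringify({ model, keep_alive: 0 }),
    },
  };
}

console.log(buildUnloadRequest("http://localhost:11434/", "gemma3:4b").url);
// http://localhost:11434/api/generate
```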
Replace 24 hardcoded literals of the admin API key in YAML test steps
with the framework's {{devKey}} substitution. The key now flows from a
single source:
cicd/tests/.env (TL_DEV_KEY)
-> run-tests.sh / ci-up.sh source it
-> executor.ts injects it into TestExecutor.variables as {{devKey}}
-> ci-up.sh forwards it via `docker compose exec -e TL_DEV_KEY`
into init-db.sh, which seeds the matching value into the users
table
Each consumer falls back to the previous hardcoded default
(a1b2c3d4...) when TL_DEV_KEY is unset, so existing setups keep working
without an .env file.
Why: rotating the seeded admin key used to require editing init-db.sh,
ci-up.sh, and 5 YAML files. Now it's one .env line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
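A sketch of the single-source `devKey` flow: resolve `TL_DEV_KEY` with a fallback, expose it as `{{devKey}}`, and forward it into the container. The real fallback key is elided in the source (a1b2c3d4...), so the sketch takes it as a parameter. The compose service name and init command are placeholders, and the function names are hypothetical. It passes `-e KEY=value` to `docker compose exec`.

```typescript
// Hypothetical sketch of the single-source devKey flow.
type Env = Record<string, string | undefined>;

// TL_DEV_KEY wins when set; otherwise use the caller-supplied fallback.
// Treating "" like unset mirrors shell `${VAR:-default}` and is this sketch's choice.
function resolveDevKey(env: Env, fallback: string): string {
  const key = env.TL_DEV_KEY;
  return key !== undefined && key !== "" ? key : fallback;
}

// Executor side: expose the key to YAML steps as {{devKey}}.
function withDevKey(vars: Record<string, string>, env: Env, fallback: string): Record<string, string> {
  return { ...vars, devKey: resolveDevKey(env, fallback) };
}

// ci-up side: forward the key into the container that seeds the users table.
// `service` and `command` are placeholders, not the real service name or script path.
function composeExecArgs(service: string, key: string, command: string[]): string[] {
  return ["compose", "exec", "-e", `TL_DEV_KEY=${key}`, service, ...command];
}

const FALLBACK = "placeholder-default-key"; // stands in for the elided hardcoded default
console.log(withDevKey({ runId: "1" }, {}, FALLBACK).devKey); // placeholder-default-key
```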
- docker-publish.yml: switch to workflow_dispatch only, with a job-level guard restricting it to refs/heads/testlink_1_9_20_fixed, so accidental manual dispatches from a feature branch can't publish. Forks without GHCR write auth no longer fail on every branch push.
- test-pipeline.yml: drop the pull_request trigger; PRs no longer fire CI automatically. Use workflow_dispatch from the Actions tab to run against a feature branch when needed. Push to main still runs it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the push-to-main trigger. Both workflows (CI Pipeline and Docker) now only fire via workflow_dispatch. Run from the Actions tab against whichever ref you want to validate or release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #5.
Summary
End-to-end rework of the CI test framework along the lines of the issue and `cicd/TESTING_GUIDELINES.md`:
- The session lifecycle is wrapped in `cicd/scripts/run-tests.sh` with `trap EXIT`, so containers/volumes are torn down on any exit path (pass, fail, Ctrl-C, crash). Per-test/step lifecycle is enforced in YAML.
- Entity IDs are captured dynamically via `capture:` from XML-RPC responses. No numeric ID literal appears in test params (only in response-shape assertions).
- Every created entity name carries a `{{runId}}`/`{{testId}}` suffix, so the suite is idempotent across re-runs.
- The log collector targets `cicd/docker-compose.ci.yml` explicitly with `cwd=projectRoot`; per-test logs land in `cicd/results/<run>/<testId>.log`.
- `testcases/` is reorganized into `smoke/ → auth/ → crud/ → workflow/`. The build suite stays separate (no stack lifecycle).
- The CI compose publishes the app on `${TL_PORT:-8091}:80` (dev still uses 8090). All scripts and YAML steps reference `$TL_URL`, so a single env override changes every consumer.
- LLM judge: switched `/api/generate` → `/api/chat`. Made `LLM_JUDGE_URL`/`LLM_JUDGE_MODEL` env-driven via `cicd/tests/.env`. Replaced rigid heuristics with a role/task/behavior/output prompt structure. Added two YAML fields the test author owns (`objective`, `judgeContext`) so per-test situational framing reaches the model. Tuned Ollama options (`num_ctx 8192`, `num_predict 512`) so small models (gemma3:4b) don't silently truncate prompts or run away during generation.
Issue #5 was updated (with consent) to bring LLM-judge work in scope and to mark `TL_PORT` as a Must.
Acceptance criteria check
- … (`--suite build` skips ci-up; others use the wrapper).
- … `cicd/results/<run>/`.
- … `<int>1</int>` step-number assertions in createTestCase payloads).
- … `TESTING_GUIDELINES.md`.
Test plan
- `bash cicd/scripts/run-tests.sh` against a fresh checkout: should land 11/11 in ~50s with exit 0.
- `bash cicd/scripts/run-tests.sh --suite build`: should run 3/3 without bringing up the compose stack.
- `cat cicd/results/<run>/TC-CRUD-001.log`: should contain docker compose log lines for that test window.
- … `LLM_JUDGE_MODEL` in `cicd/tests/.env`.
Out of scope (followups)
- … `negative` suite (TC-AUTH-002 is the one negative test today; lives in `auth/`).
- … `/var/run/docker.sock`).
🤖 Generated with Claude Code