Context
An audit of the current CI test framework (cicd/tests/) identified structural issues that make the suite fragile and inconsistent:
- Setup/teardown bolted into test steps. TC-INTEGRATION-001 starts the CI stack as its first step; TC-E2E-001's last step tears it down. If any step before the teardown fails, containers and volumes are left behind. Running only the build or integration suite produces inconsistent lifecycle behavior.
- Hardcoded entity IDs. TC-E2E-001 assumes testprojectid=1 and testsuiteid=2. Tests break on reorder and cannot run twice in a row.
- Static entity names. Created entities use fixed strings like "CI Test Project" with prefix CIT, preventing idempotent reruns.
- Log collector broken. cicd/tests/src/cli.ts:48 sets dockerDir = projectRoot/docker/, but that directory contains no compose file. The log collector silently fails to start (executor.ts:336 catches the error), so per-test log extraction has never worked. Even pointing it at cicd/ would not work: docker compose logs defaults to docker-compose.yml/compose.yml, not docker-compose.ci.yml, so an explicit -f argument is needed.
- Port 8090 collision. Both docker-compose.yml (dev) and cicd/docker-compose.ci.yml (CI) publish host port 8090, so they cannot run concurrently.
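As a minimal sketch of the explicit -f invocation that the log collector item calls for, the command could be assembled as shown below. The names buildLogCollectorCommand and LogCollectorCommand are hypothetical and are not taken from cli.ts or executor.ts; --follow and --timestamps are standard docker compose logs options, and the paths in the usage note are placeholders.

```typescript
// Hypothetical shape for the command the log collector would spawn.
interface LogCollectorCommand {
  command: string;
  args: string[];
  cwd: string;
}

// Build the `docker compose logs` invocation with an explicit compose file,
// so the CI stack is found even though its file is not named
// docker-compose.yml or compose.yml.
function buildLogCollectorCommand(
  projectRoot: string,
  composeFile: string
): LogCollectorCommand {
  if (composeFile.trim() === "") {
    // Fail loudly instead of letting the collector die silently.
    throw new Error("composeFile must be set explicitly");
  }
  return {
    command: "docker",
    args: ["compose", "-f", composeFile, "logs", "--follow", "--timestamps"],
    cwd: projectRoot,
  };
}
```

Under these assumptions, buildLogCollectorCommand("/repo", "/repo/cicd/docker-compose.ci.yml") yields a command that can be handed to Node's child_process.spawn(command, args, { cwd }). Throwing on an empty path keeps a misconfiguration visible rather than swallowed.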
Proper design
Documented in cicd/TESTING_GUIDELINES.md (added alongside this issue). Summary:
- Four nested lifecycle scopes. Session / suite / test / step, each with setup and guaranteed teardown (trap EXIT / finally).
- Dynamic ID capture. No numeric ID literal appears in any step except in response-shape assertions. IDs flow from creation responses via capture: into later steps.
- Layered test flow. smoke → auth → crud → workflow → negative → regression, with short-circuit skip on lower-layer failure.
- Unique entity names. Every created entity includes a run ID or timestamp; never static strings.
- Per-test ownership of data. Each test creates what it needs and deletes it in reverse order in its own teardown.
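The scope and ownership rules above can be sketched as a small helper built on try/finally. This is an illustration only, not the contents of TESTING_GUIDELINES.md: Scope, withScope, defer, and uniqueName are hypothetical names, and the sketch is synchronous for brevity (a real runner would likely await each cleanup).

```typescript
type Cleanup = () => void;
type TeardownErrorHandler = (scopeName: string, err: unknown) => void;

// One lifecycle scope (session, suite, test, or step).
class Scope {
  readonly name: string;
  private readonly cleanups: Cleanup[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // Register a cleanup right after the resource it undoes is created.
  defer(cleanup: Cleanup): void {
    this.cleanups.push(cleanup);
  }

  // Run cleanups in reverse creation order; a failing cleanup is reported
  // but does not stop the remaining ones from running.
  teardown(onError: TeardownErrorHandler): void {
    while (this.cleanups.length > 0) {
      const cleanup = this.cleanups.pop() as Cleanup;
      try {
        cleanup();
      } catch (err) {
        onError(this.name, err);
      }
    }
  }
}

// Run `body` inside a scope; `finally` guarantees teardown even when a step throws.
function withScope<T>(
  name: string,
  body: (scope: Scope) => T,
  onTeardownError: TeardownErrorHandler = () => {}
): T {
  const scope = new Scope(name);
  try {
    return body(scope);
  } finally {
    scope.teardown(onTeardownError);
  }
}

// Unique entity name: a static base plus a per-run identifier.
function uniqueName(base: string, runId: string): string {
  return base + " " + runId;
}
```

Nesting withScope calls gives the session / suite / test / step layering: an inner scope finishes its teardown before the error reaches the outer scope, so a failed step still removes the test's entities and then stops the stack.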
Scope
Must
- … cicd/TESTING_GUIDELINES.md as the canonical design reference
- … dockerComposePath (directory) in RunConfig with an explicit composeFile (absolute path); invoke docker compose -f <composeFile> logs --follow --timestamps with cwd set to the project root
- … trap EXIT for guaranteed teardown
- … cicd/tests/testcases/ to match the flow layers (smoke/, auth/, crud/, workflow/, negative/, regression/)
- … capture: from creation responses
- … .github/workflows/test-pipeline.yml / test-suite.yml to call the new wrapper
- … TL_PORT / TL_URL in cicd/tests/.env) so dev (8090) and CI (8091) stacks coexist on one machine
- … /api/generate to /api/chat, make LLM_JUDGE_URL / LLM_JUDGE_MODEL env-driven via cicd/tests/.env
- … objective and judgeContext so each test author owns the situational framing the judge sees; backfill all existing testcases
Nice to have
- … negative suite (bad auth, missing fields, permission denied)
- … /var/run/docker.sock for build/exec steps)
Out of scope (deferred)
- Replacement of the YAML runner with Vitest or another standard framework
Acceptance criteria
- A fresh checkout can run the full suite and leaves no residual containers/volumes afterward
- Any single suite can run in isolation with its own setup and teardown
- Running the same suite twice in a row against a fresh env produces identical results
- Per-test logs in cicd/results/<run>/<testId>.log are non-empty for integration/e2e tests
- No numeric ID literal appears in any test step except in response-shape assertions
- Every new or rewritten test satisfies the checklist in cicd/TESTING_GUIDELINES.md §10
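The criterion that no numeric ID literal appears in a test step lends itself to an automated check. The sketch below rests on assumptions: the Step shape, the "${name}" placeholder syntax for captured values, and the rule that a parameter whose key ends in "id" is an entity ID are all hypothetical and may not match the YAML runner's actual schema. Response-shape assertions are assumed to live outside params and are therefore not scanned.

```typescript
// Hypothetical step shape; the real YAML schema may differ.
interface Step {
  name: string;
  params: { [key: string]: unknown };
}

type Captures = { [name: string]: string | number };

// Substitute whole-value "${name}" placeholders with previously captured values.
function resolveParams(
  params: { [key: string]: unknown },
  captures: Captures
): { [key: string]: unknown } {
  const resolved: { [key: string]: unknown } = {};
  for (const key of Object.keys(params)) {
    const value = params[key];
    const match = typeof value === "string" ? /^\$\{(\w+)\}$/.exec(value) : null;
    if (match === null) {
      resolved[key] = value; // not a placeholder: pass through unchanged
      continue;
    }
    const captureName = match[1];
    if (!Object.prototype.hasOwnProperty.call(captures, captureName)) {
      throw new Error("no captured value for " + captureName);
    }
    resolved[key] = captures[captureName];
  }
  return resolved;
}

// Report every ID-like parameter that is a numeric literal instead of a capture.
function findHardcodedIds(steps: Step[]): string[] {
  const findings: string[] = [];
  for (const step of steps) {
    for (const key of Object.keys(step.params)) {
      const value = step.params[key];
      const looksLikeId = /id$/i.test(key);
      const isNumericLiteral =
        typeof value === "number" ||
        (typeof value === "string" && /^\d+$/.test(value));
      if (looksLikeId && isNumericLiteral) {
        findings.push(step.name + ": " + key + "=" + String(value));
      }
    }
  }
  return findings;
}
```

A step with testprojectid: 1 would be reported, while testprojectid: "${projectId}" passes the check and is resolved at run time from the value captured when the project was created.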