[refactor] CI test suite: nested lifecycle, dynamic ID capture, log-collector fix #5

## Context

An audit of the current CI test framework (`cicd/tests/`) identified structural issues that make the suite fragile and inconsistent:

- **Setup/teardown bolted into test steps.** `TC-INTEGRATION-001` starts the CI stack as its first step; `TC-E2E-001`'s last step tears it down. If any step before the teardown fails, containers and volumes are left behind. Running only the `build` or `integration` suite produces inconsistent lifecycle behavior.
- **Hardcoded entity IDs.** `TC-E2E-001` assumes `testprojectid=1` and `testsuiteid=2`. Tests break on reorder and cannot run twice in a row.
- **Static entity names.** Created entities use fixed strings like `"CI Test Project"` with prefix `CIT`, preventing idempotent reruns.
- **Log collector broken.** `cicd/tests/src/cli.ts:48` sets `dockerDir = projectRoot/docker/`, but that directory contains no compose file. The log collector silently fails to start (`executor.ts:336` catches the error), so per-test log extraction has never worked. Pointing it at `cicd/` would not work either: `docker compose logs` looks for `docker-compose.yml`/`compose.yml` by default, not `docker-compose.ci.yml`, so an explicit `-f` argument is needed.
- **Port 8090 collision.** Both `docker-compose.yml` (dev) and `cicd/docker-compose.ci.yml` (CI) publish host port 8090, so they cannot run concurrently.
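
To make the log-collector point concrete, here is a minimal sketch of how the command line could be built from an explicit compose file instead of a directory. `composeLogsArgs` is a hypothetical helper, not something that exists in `cicd/tests/src/`; the flags are the ones proposed under Scope.

```typescript
// Hypothetical helper (illustrative name and signature, not existing code).
// Builds the argument list for `docker compose ... logs` from an explicit
// compose file path, so the default file lookup is never relied on.
function composeLogsArgs(composeFile: string): string[] {
  // Fail loudly on the original bug: an empty value or a directory path.
  if (composeFile.length === 0 || composeFile.endsWith("/")) {
    throw new Error(`composeFile must be a file path, got: "${composeFile}"`);
  }
  return ["compose", "-f", composeFile, "logs", "--follow", "--timestamps"];
}
```

The returned array would be handed to something like Node's `spawn("docker", args, { cwd: projectRoot })`, and a spawn error should be reported rather than swallowed, so a broken collector cannot go unnoticed again.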

## Proper design

Documented in `cicd/TESTING_GUIDELINES.md` (added alongside this issue). Summary:

- **Four nested lifecycle scopes** — session / suite / test / step — each with setup and guaranteed teardown (`trap EXIT` / `finally`).
- **Dynamic ID capture** — no numeric ID literal in any step except in response-shape assertions. IDs flow from creation responses via `capture:` into later steps.
- **Test flow layered** — smoke → auth → crud → workflow → negative → regression, with short-circuit skip on lower-layer failure.
- **Unique entity names** — every created entity includes a run ID or timestamp; never static strings.
- **Per-test ownership of data** — each test creates what it needs and deletes it in reverse order in its own teardown.
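
A rough sketch of how the test scope could combine three of the points above: unique names, captured IDs, and reverse-order teardown in a `finally`. Everything here (`TestScope`, `runTest`, `FakeApi`, the `"CI Test Suite"` name, the run ID format) is hypothetical and only illustrates the shape; it is synchronous for brevity, whereas real steps would be async HTTP calls.

```typescript
// Hypothetical sketch: none of these names exist in the current runner.
type Cleanup = () => void;

class TestScope {
  private readonly cleanups: Cleanup[] = [];

  constructor(readonly runId: string) {}

  // Suffix every entity name with the run ID so reruns never collide.
  uniqueName(base: string): string {
    return `${base} ${this.runId}`;
  }

  // Register a cleanup as soon as the entity exists.
  onTeardown(fn: Cleanup): void {
    this.cleanups.push(fn);
  }

  // Run cleanups in reverse creation order; a failing cleanup is recorded
  // but does not stop the remaining ones.
  teardown(): string[] {
    const errors: string[] = [];
    while (this.cleanups.length > 0) {
      const fn = this.cleanups.pop() as Cleanup;
      try {
        fn();
      } catch (e) {
        errors.push(String(e));
      }
    }
    return errors;
  }
}

interface TestResult {
  passed: boolean;
  error?: string;
  teardownErrors: string[];
}

function runTest(runId: string, body: (scope: TestScope) => void): TestResult {
  const scope = new TestScope(runId);
  const result: TestResult = { passed: true, teardownErrors: [] };
  try {
    body(scope);
  } catch (e) {
    result.passed = false;
    result.error = String(e);
  } finally {
    // Teardown runs whether or not a step threw.
    result.teardownErrors = scope.teardown();
  }
  return result;
}

// In-memory stand-in for the application under test; it assigns the IDs.
class FakeApi {
  private nextId = 100;
  readonly entities = new Map<number, string>();
  readonly deleted: string[] = [];

  create(name: string): number {
    const id = this.nextId++;
    this.entities.set(id, name);
    return id;
  }

  remove(id: number): void {
    const name = this.entities.get(id);
    if (name === undefined) throw new Error(`unknown id ${id}`);
    this.entities.delete(id);
    this.deleted.push(name);
  }
}

const api = new FakeApi();
const result = runTest("run-20240101-ab12", (scope) => {
  // IDs are captured from the creation response, never written as literals.
  const projectId = api.create(scope.uniqueName("CI Test Project"));
  scope.onTeardown(() => api.remove(projectId));

  const suiteId = api.create(scope.uniqueName("CI Test Suite"));
  scope.onTeardown(() => api.remove(suiteId));

  throw new Error("simulated step failure");
});
// result.passed is false, yet both entities were removed, suite first.
```

Registering the cleanup immediately after each creation is what keeps teardown correct when a later step fails: only entities that actually exist are ever deleted.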

## Scope

### Must
- [ ] Land `cicd/TESTING_GUIDELINES.md` as the canonical design reference
- [ ] Fix log collector: replace `dockerComposePath` (directory) in `RunConfig` with an explicit `composeFile` (absolute path); invoke `docker compose -f <composeFile> logs --follow --timestamps` with cwd set to the project root
- [ ] Move session setup/teardown out of test YAMLs into a wrapper script with `trap EXIT` for guaranteed teardown
- [ ] Restructure `cicd/tests/testcases/` to match the flow layers (`smoke/`, `auth/`, `crud/`, `workflow/`, `negative/`, `regression/`)
- [ ] Replace hardcoded IDs in all tests with `capture:` from creation responses
- [ ] Use unique entity names with run-ID / timestamp suffix everywhere
- [ ] Each test owns its data — create + delete in reverse order, teardown runs even on step failure
- [ ] Update `.github/workflows/test-pipeline.yml` / `test-suite.yml` to call the new wrapper
- [ ] Make CI host port configurable via env var (`TL_PORT`/`TL_URL` in `cicd/tests/.env`) so dev (8090) and CI (8091) stacks coexist on one machine
- [ ] LLM judge: switch from Ollama `/api/generate` to `/api/chat`, make `LLM_JUDGE_URL`/`LLM_JUDGE_MODEL` env-driven via `cicd/tests/.env`
- [ ] LLM judge prompt: redesign with role/task/behavior/output structure (drop hardcoded heuristics like "exit code 0 = pass"); add YAML fields `objective` and `judgeContext` so each test author owns the situational framing the judge sees; backfill all existing testcases
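
For the two LLM judge items, a sketch of what the env-driven config and the `/api/chat` request body might look like. The body shape (`model`, `messages`, `stream`) follows Ollama's chat endpoint. The prompt wording, the verdict JSON, the `evidence` field, and treating `LLM_JUDGE_URL` as a base URL are all assumptions for illustration, not decisions made in this issue.

```typescript
// Hypothetical sketch: function and type names are illustrative only.
interface JudgeInput {
  objective: string;    // from the testcase YAML field `objective`
  judgeContext: string; // from the testcase YAML field `judgeContext`
  evidence: string;     // assumed: step output / logs collected by the runner
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

// Read judge settings from an env-like map; fail loudly when unset
// instead of falling back to a hardcoded URL or model.
function judgeConfig(env: Record<string, string | undefined>): { url: string; model: string } {
  const base = env["LLM_JUDGE_URL"];
  const model = env["LLM_JUDGE_MODEL"];
  if (!base || !model) {
    throw new Error("LLM_JUDGE_URL and LLM_JUDGE_MODEL must be set");
  }
  // Assumption: LLM_JUDGE_URL is the server base URL without a path.
  return { url: `${base.replace(/\/+$/, "")}/api/chat`, model };
}

// Build a chat request with a role/task/behavior/output system prompt and
// the per-test framing in the user message.
function buildJudgeRequest(model: string, input: JudgeInput): ChatRequest {
  const system = [
    "Role: you review the results of CI test cases.",
    "Task: decide whether the evidence shows that the objective was met.",
    "Behavior: rely only on the evidence given; an exit code alone proves nothing.",
    'Output: a JSON object {"verdict": "pass" | "fail", "reason": string}.',
  ].join("\n");
  const user = [
    `Objective: ${input.objective}`,
    `Context: ${input.judgeContext}`,
    "Evidence:",
    input.evidence,
  ].join("\n");
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    stream: false, // ask for one complete response rather than a stream
  };
}
```

In the runner, `judgeConfig(process.env)` would supply the URL and model after `cicd/tests/.env` is loaded, and the request would be POSTed as JSON; with `stream: false` the reply text is expected under `message.content` in the response.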

### Nice to have
- [ ] Add a dedicated `negative` suite (bad auth, missing fields, permission denied)
- [ ] Run the test runner itself inside a container so host-side Node/tsx/Docker setup isn't required (mount `/var/run/docker.sock` for build/exec steps)

### Out of scope (deferred)
- Replacement of the YAML runner with Vitest or another standard framework

## Acceptance criteria

- A fresh checkout can run the full suite and leaves no residual containers/volumes afterward
- Any single suite can run in isolation with its own setup and teardown
- Starting from a fresh env, running the same suite twice in a row produces identical results
- Per-test logs in `cicd/results/<run>/<testId>.log` are non-empty for integration/e2e tests
- No numeric ID literal appears in any test step except in response-shape assertions
- Every new or rewritten test satisfies the checklist in `cicd/TESTING_GUIDELINES.md` §10

