[refactor] CI test suite: nested lifecycle, dynamic ID capture, log-collector fix #5

@dogkeeper886

Context

An audit of the current CI test framework (cicd/tests/) identified structural issues that make the suite fragile and inconsistent:

  • Setup/teardown bolted into test steps. TC-INTEGRATION-001 starts the CI stack as its first step; TC-E2E-001's last step tears it down. If any step before the teardown fails, containers and volumes are left behind. Running only the build or integration suite produces inconsistent lifecycle behavior.
  • Hardcoded entity IDs. TC-E2E-001 assumes testprojectid=1 and testsuiteid=2. Tests break on reorder and cannot run twice in a row.
  • Static entity names. Created entities use fixed strings like "CI Test Project" with prefix CIT, preventing idempotent reruns.
  • Log collector broken. cicd/tests/src/cli.ts:48 sets dockerDir = projectRoot/docker/, but that directory contains no compose file. The log collector silently fails to start (executor.ts:336 catches the error), so per-test log extraction has never worked. Pointing it at cicd/ would not work either: docker compose logs looks for docker-compose.yml or compose.yml by default, not docker-compose.ci.yml, so an explicit -f argument is needed.
  • Port 8090 collision. Both docker-compose.yml (dev) and cicd/docker-compose.ci.yml (CI) publish host port 8090, so they cannot run concurrently.
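
The log-collector failure described above comes down to how the command line is built. The following is a minimal sketch, not the runner's actual code: `buildLogCommand`, `LogCommand`, and the `projectRoot` field are hypothetical names, and the absolute-path check assumes POSIX-style paths. It shows the compose file being passed explicitly with `-f` instead of relying on a directory lookup.

```typescript
// Hypothetical shape: only `composeFile` is named in this issue; `projectRoot`
// is an assumed field name for the repository root.
interface RunConfig {
  projectRoot: string;
  composeFile: string; // absolute path to e.g. docker-compose.ci.yml
}

interface LogCommand {
  command: string;
  args: string[];
  cwd: string;
}

// Builds the log-follow command with an explicit compose file, so the
// default docker-compose.yml / compose.yml lookup is never relied on.
export function buildLogCommand(config: RunConfig): LogCommand {
  // Assumption: POSIX paths. Fail loudly instead of starting a collector
  // that would silently produce no logs.
  if (!config.composeFile.startsWith("/")) {
    throw new Error(`composeFile must be an absolute path, got: ${config.composeFile}`);
  }
  return {
    command: "docker",
    args: ["compose", "-f", config.composeFile, "logs", "--follow", "--timestamps"],
    cwd: config.projectRoot,
  };
}
```

The result can be handed to Node's `child_process.spawn(command, args, { cwd })`. Reporting the child's `error` event, rather than swallowing it, would have made the original misconfiguration visible.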

Proper design

Documented in cicd/TESTING_GUIDELINES.md (added alongside this issue). Summary:

  • Four nested lifecycle scopes — session / suite / test / step — each with setup and guaranteed teardown (trap EXIT / finally).
  • Dynamic ID capture — no numeric ID literal in any step except in response-shape assertions. IDs flow from creation responses via capture: into later steps.
  • Test flow layered — smoke → auth → crud → workflow → negative → regression, with short-circuit skip on lower-layer failure.
  • Unique entity names — every created entity includes a run ID or timestamp; never static strings.
  • Per-test ownership of data — each test creates what it needs and deletes it in reverse order in its own teardown.
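
The scope, capture, and ownership rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the design in cicd/TESTING_GUIDELINES.md: `withScope`, `uniqueName`, and the in-memory `store` are hypothetical stand-ins for the runner and the real API. The point is that IDs come from creation results, names carry a run ID, and cleanups run in reverse order even when a step throws.

```typescript
// Hypothetical helpers: names and shapes are illustrative only.
type Cleanup = () => Promise<void> | void;

// Appends a run ID so reruns never collide on entity names.
export function uniqueName(base: string, runId: string): string {
  return `${base} ${runId}`;
}

// Runs `body`, then every registered cleanup in reverse order,
// whether `body` resolved or threw.
export async function withScope<T>(
  label: string,
  body: (defer: (cleanup: Cleanup) => void) => Promise<T>,
): Promise<T> {
  const cleanups: Cleanup[] = [];
  try {
    return await body((cleanup) => cleanups.push(cleanup));
  } finally {
    // Reverse order: children are deleted before their parents.
    for (const cleanup of cleanups.reverse()) {
      try {
        await cleanup();
      } catch (err) {
        // One failing cleanup must not prevent the remaining ones.
        console.error(`[${label}] cleanup failed:`, err);
      }
    }
  }
}

// In-memory stand-in for the real API: the "server" assigns the IDs.
const store = new Map<number, string>();
let nextId = 100;
function create(name: string): number {
  const id = nextId++;
  store.set(id, name);
  return id;
}

export async function exampleTest(runId: string): Promise<string[]> {
  const log: string[] = [];
  try {
    await withScope("test", async (defer) => {
      // IDs are captured from the creation result, never hardcoded.
      const projectId = create(uniqueName("CI Test Project", runId));
      defer(() => { store.delete(projectId); log.push("delete project"); });
      const suiteId = create(uniqueName("CI Test Suite", runId));
      defer(() => { store.delete(suiteId); log.push("delete suite"); });
      throw new Error("simulated step failure");
    });
  } catch {
    log.push("test failed");
  }
  return log;
}
```

Session and suite scopes could nest the same helper, with each outer scope wrapping the inner ones so teardown unwinds from step to session.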

Scope

Must

  • Land cicd/TESTING_GUIDELINES.md as the canonical design reference
  • Fix log collector: replace dockerComposePath (directory) in RunConfig with an explicit composeFile (absolute path); invoke docker compose -f <composeFile> logs --follow --timestamps with cwd set to the project root
  • Move session setup/teardown out of test YAMLs into a wrapper script with trap EXIT for guaranteed teardown
  • Restructure cicd/tests/testcases/ to match the flow layers (smoke/, auth/, crud/, workflow/, negative/, regression/)
  • Replace hardcoded IDs in all tests with capture: from creation responses
  • Use unique entity names with run-ID / timestamp suffix everywhere
  • Each test owns its data — create + delete in reverse order, teardown runs even on step failure
  • Update .github/workflows/test-pipeline.yml / test-suite.yml to call the new wrapper
  • Make CI host port configurable via env var (TL_PORT/TL_URL in cicd/tests/.env) so dev (8090) and CI (8091) stacks coexist on one machine
  • LLM judge: switch from Ollama /api/generate to /api/chat, make LLM_JUDGE_URL/LLM_JUDGE_MODEL env-driven via cicd/tests/.env
  • LLM judge prompt: redesign with role/task/behavior/output structure (drop hardcoded heuristics like "exit code 0 = pass"); add YAML fields objective and judgeContext so each test author owns the situational framing the judge sees; backfill all existing testcases
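
As an illustration of the two LLM judge items above, here is a hedged sketch of building an Ollama `/api/chat` request from the environment. Only `LLM_JUDGE_URL`, `LLM_JUDGE_MODEL`, `objective`, and `judgeContext` come from this issue; `buildJudgeRequest`, the `output` field, the prompt wording, and the JSON verdict format are assumptions. It also assumes `LLM_JUDGE_URL` holds the server's base URL.

```typescript
// Hypothetical types for illustration.
interface JudgeInput {
  objective: string;
  judgeContext: string;
  output: string; // captured step output shown to the judge (assumed field)
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface JudgeRequest {
  url: string;
  body: { model: string; messages: ChatMessage[]; stream: boolean };
}

type Env = Record<string, string | undefined>;

// Reads a required variable and fails with a pointer to where it belongs.
function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`${key} is not set (expected in cicd/tests/.env)`);
  return value;
}

export function buildJudgeRequest(env: Env, input: JudgeInput): JudgeRequest {
  // Strip trailing slashes so the endpoint path is joined cleanly.
  const baseUrl = requireEnv(env, "LLM_JUDGE_URL").replace(/\/+$/, "");
  const model = requireEnv(env, "LLM_JUDGE_MODEL");

  // Role / task / behavior / output structure; the wording is illustrative.
  const system = [
    "Role: you are a reviewer of CI test results.",
    "Task: decide whether the output meets the stated objective.",
    "Behavior: rely only on the objective, context, and output provided.",
    'Output: reply with JSON of the form {"pass": boolean, "reason": string}.',
  ].join("\n");

  // The test author's framing reaches the judge through these two fields.
  const user = [
    `Objective: ${input.objective}`,
    `Context: ${input.judgeContext}`,
    `Output:\n${input.output}`,
  ].join("\n\n");

  return {
    url: `${baseUrl}/api/chat`,
    body: {
      model,
      messages: [
        { role: "system", content: system },
        { role: "user", content: user },
      ],
      stream: false, // ask for one JSON response instead of a stream
    },
  };
}
```

The request can be sent with `fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) })`. With `stream: false`, Ollama returns the reply text in `message.content` of the response JSON.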

Nice to have

  • Add a dedicated negative suite (bad auth, missing fields, permission denied)
  • Run the test runner itself inside a container so host-side Node/tsx/Docker setup isn't required (mount /var/run/docker.sock for build/exec steps)

Out of scope (deferred)

  • Replacement of the YAML runner with Vitest or another standard framework

Acceptance criteria

  • A fresh checkout can run the full suite and leaves no residual containers/volumes afterward
  • Any single suite can run in isolation with its own setup and teardown
  • Running the same suite twice in a row against a fresh env produces identical results
  • Per-test logs in cicd/results/<run>/<testId>.log are non-empty for integration/e2e tests
  • No numeric ID literal appears in any test step except in response-shape assertions
  • Every new or rewritten test satisfies the checklist in cicd/TESTING_GUIDELINES.md §10
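
The numeric-ID criterion above could be checked mechanically. The sketch below is a rough heuristic, not part of the proposed framework: `findHardcodedIds` and its pattern are hypothetical, it treats any key ending in "id" followed by a number as suspect, and it cannot tell response-shape assertions apart from requests, so findings would still need a manual look.

```typescript
interface Finding {
  line: number; // 1-based line number within the scanned text
  match: string;
}

// Heuristic: a name ending in "id", then "=" or ":", then a bare number.
// Known limitation: keys such as "valid: 1" are flagged as well.
const HARDCODED_ID = /\b\w*id\s*[=:]\s*\d+\b/gi;

export function findHardcodedIds(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((line, index) => {
    for (const match of line.match(HARDCODED_ID) ?? []) {
      findings.push({ line: index + 1, match });
    }
  });
  return findings;
}
```

For example, a step containing `testprojectid=1` is flagged, while one that interpolates a captured value such as `testprojectid=${projectId}` is not.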
