Context
All .github/workflows/*.yml files are currently workflow_dispatch only (set during issue #5). Running them on a GitHub-hosted runner today would mostly work, because TL_PORT, TL_URL, and TL_DEV_KEY all have correct defaults in cicd/scripts/ and cicd/tests/src/. The exception is the LLM judge:
LLM_JUDGE_URL defaults to http://localhost:11434, which has nothing listening on a clean runner. Depending on the code path, the judge either blocks for its 5-minute timeout on each test or falls back to "not available".
LLM_JUDGE_MODEL defaults to llama3:8b, which won't match whatever model we actually point at.
Other secrets (e.g., a rotated TL_DEV_KEY) are not currently plumbed.
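One way to avoid the 5-minute stall described above is a short preflight probe before the suite starts. The following is only a sketch under stated assumptions: the function name judge_flag is hypothetical, curl is assumed to be installed on the runner, the judge endpoint is assumed to answer a plain GET at LLM_JUDGE_URL, and --no-llm is assumed to be a flag accepted by run-tests.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): prints "--no-llm" when the
# judge endpoint does not answer within a few seconds, and prints nothing
# when it does.
judge_flag() {
  url="${LLM_JUDGE_URL:-http://localhost:11434}"
  # --fail treats HTTP error statuses as failure; --max-time bounds the wait
  # so an unreachable judge costs seconds instead of minutes.
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    return 0
  fi
  printf '%s\n' "--no-llm"
}
```

A workflow step could then call something like `run-tests.sh $(judge_flag)`, so the simple judge carries the run whenever the endpoint is unreachable. Whether a silent fallback is desirable, or whether CI should fail loudly instead, depends on the decision about where the judge lives.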
Locally, these values live in cicd/tests/.env (gitignored). On a runner, there's no .env. The current shell scripts source .env if present and fall back to env-var defaults otherwise, so the plumbing is in place — what's missing is a decision on how the runner gets the values.
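The "source .env if present, otherwise fall back to defaults" behaviour described above can be sketched roughly as follows. This is an illustration of the pattern, not a copy of the scripts in cicd/scripts/: the function name load_judge_env is hypothetical, and the sketch assumes that values in the file take precedence over values already in the process environment.

```shell
#!/bin/sh
# Hypothetical illustration of optional .env loading with env-var defaults.
load_judge_env() {
  env_file="${1:-cicd/tests/.env}"
  if [ -f "$env_file" ]; then
    set -a            # export every variable assigned while sourcing
    . "$env_file"
    set +a
  fi
  # Apply defaults only to values that are still unset or empty.
  : "${LLM_JUDGE_URL:=http://localhost:11434}"
  : "${LLM_JUDGE_MODEL:=llama3:8b}"
  export LLM_JUDGE_URL LLM_JUDGE_MODEL
}
```

On a runner with no .env, any value supplied through the workflow environment survives, and anything left unset gets the default. That is why the runner-side wiring is the only missing piece.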
Decisions needed
1. Where does the LLM judge live? Candidates:
   Self-hosted runner with network access to an Ollama instance
   A hosted model endpoint (Anthropic/OpenAI) via API key
   A cloud Ollama VM behind a secret-gated URL
   Skip LLM judging in CI (--no-llm) and rely on the simple judge
2. How are secrets passed to the runner? Two viable shapes:
   A. Materialize .env. Workflow step writes cicd/tests/.env from ${{ secrets.* }} / ${{ vars.* }} before run-tests.sh runs. Matches local dev exactly. Downside: secrets touch disk briefly.
   B. Export via workflow env:. Workflow sets env vars directly; scripts pick them up since .env sourcing is optional and defaults exist. No file written. Recommended starting point: simpler, and secrets stay in the process environment.
3. Trigger scope. Stay workflow_dispatch only, or add push/PR triggers to the main branch? Automated triggers make cost/runtime a live concern if a self-hosted runner isn't available.
4. Rotating TL_DEV_KEY. The hardcoded seed (a1b2c3d4...) is fine for ephemeral CI containers; a rotated value via secret would prove the env-var path works end-to-end. Low value unless paired with a separate test that uses the key.
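If option A (materializing .env) is chosen, the workflow step could be a small script along these lines. This is a sketch, not an agreed design: write_env_file is a hypothetical name, and it assumes the workflow has already mapped the relevant ${{ secrets.* }} / ${{ vars.* }} values into the step's environment under the variable names used in this issue. Option B needs no such step, since the scripts read the environment directly.

```shell
#!/bin/sh
# Hypothetical helper for option A: write the env file from variables that
# are already present in the step environment.
# The function body runs in a subshell so the umask change and any
# validation failure stay local to this call.
write_env_file() (
  target="${1:-cicd/tests/.env}"
  # Fail before touching the file if a required value is missing.
  : "${LLM_JUDGE_URL:?LLM_JUDGE_URL is required}"
  : "${LLM_JUDGE_MODEL:?LLM_JUDGE_MODEL is required}"
  # Make the file readable by the current user only.
  umask 077
  {
    # Values are written unquoted, so this assumes they contain no
    # whitespace or shell metacharacters.
    printf 'LLM_JUDGE_URL=%s\n' "$LLM_JUDGE_URL"
    printf 'LLM_JUDGE_MODEL=%s\n' "$LLM_JUDGE_MODEL"
    # Emit TL_DEV_KEY only when a rotated value was supplied.
    if [ -n "${TL_DEV_KEY:-}" ]; then
      printf 'TL_DEV_KEY=%s\n' "$TL_DEV_KEY"
    fi
  } > "$target"
)
```

The restrictive umask limits who can read the file, but the secrets still touch disk for the duration of the job, which is the downside noted for option A.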
Scope
Must
Record the decisions above in CLAUDE.md or a new doc so the runner setup is reproducible.
Update .github/workflows/test-pipeline.yml and test-suite.yml to set the required env vars (LLM_JUDGE_URL, LLM_JUDGE_MODEL, and TL_DEV_KEY if rotated) via the chosen transport.
Confirm the full suite runs green end-to-end on a runner, or that --no-llm is the deliberate default and the simple judge carries CI.
Document how a developer adds or rotates a secret: a single paragraph in CLAUDE.md or a new cicd/CI_SETUP.md.
Nice to have
Optional re-enablement of automatic triggers (push-to-main or PR-to-main), gated on runner cost/availability.
A smoke-only CI variant that skips the build+plan+execution suites (cheaper, runs on every PR) while the full suite stays manual.
Out of scope
Replacing the LLM judge. Separate FR if the decision is to move to a hosted API.
Refactoring the env-var plumbing inside the test framework. It already handles external env vars cleanly; the only gap is the runner-side wiring.
Acceptance criteria
The test-pipeline workflow on GitHub Actions completes end-to-end (simple judge at minimum; LLM judge if decision 1, where the LLM judge lives, provides a reachable endpoint).
cicd/tests/.env remains the local-dev source of truth and is not referenced by workflows directly.
Spec
FR doc to follow alongside this issue.