fix(onboarding): six defects in self-host deploy path (#443) #833

Merged
vybe merged 2 commits into dev from feature/443-onboarding-six-defects
May 13, 2026

@dolho dolho commented May 13, 2026

Summary

Six small fixes surfaced by the fresh-clone → start.sh → login walkthrough. Each fix is independent, and none changes the schema, API, or runtime behavior.

| # | What | Files |
|---|------|-------|
| 1 | Auto-generate `SECRET_KEY` + `INTERNAL_API_SECRET` if blank; fail fast on blank `ADMIN_PASSWORD` | `scripts/deploy/start.sh` |
| 2 | Document `FRONTEND_PORT=80` with a remap comment | `.env.example` |
| 3 | Replace the misleading `login: admin/password` hint with `admin / ADMIN_PASSWORD from .env` | `scripts/deploy/start.sh` |
| 4 | Drop the hardcoded `localhost:3000/api-keys` URL; replace with "your Trinity web UI → Settings → MCP Keys" | `src/mcp-server/src/index.ts`, `README.md` |
| 5 | New `clean.sh`: removes leftover `agent-*` containers + `trinity-agent-network`; preserves data volumes | `scripts/deploy/clean.sh` |
| 6 | Point the MCP Dockerfile healthcheck at `/health` instead of `/mcp` (the `/mcp` endpoint rejects HEAD with 400, and HEAD is what `wget --spider` sends) | `src/mcp-server/Dockerfile` |
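
As a rough illustration of fix 1, the blank-secret handling could be sketched as below. This is not the helper that ships in `start.sh`: the function names `env_value`, `ensure_secret`, and `require_admin_password` are hypothetical, the random source is an assumption, and the sketch assumes plain unquoted `KEY=value` lines in `.env`.

```shell
# Hypothetical sketch; the real helper in scripts/deploy/start.sh may differ.

# Print VAR's value from an env file (empty if the variable is missing or blank).
# Assumes plain, unquoted KEY=value lines.
env_value() {
  local var="$1" file="$2"
  grep -E "^${var}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Generate a 64-hex-character value for VAR only when it is missing or blank.
ensure_secret() {
  local var="$1" file="$2" new tmp
  if [ -n "$(env_value "$var" "$file")" ]; then
    return 0  # already set: leave it alone
  fi
  # 32 random bytes rendered as hex (assumed generator, not necessarily the project's).
  new="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  if grep -qE "^${var}=" "$file"; then
    # A blank entry exists: rewrite that line, keeping the file's mode and owner.
    tmp="$(mktemp)"
    sed "s|^${var}=.*|${var}=${new}|" "$file" > "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$var" "$new" >> "$file"
  fi
}

# Fail fast when ADMIN_PASSWORD is missing or blank.
require_admin_password() {
  local file="$1"
  if [ -z "$(env_value ADMIN_PASSWORD "$file")" ]; then
    echo "ERROR: ADMIN_PASSWORD is blank in $file; set it before starting." >&2
    return 1
  fi
}
```

A caller would run `ensure_secret SECRET_KEY .env` and `ensure_secret INTERNAL_API_SECRET .env`, then `require_admin_password .env || exit 1` before bringing the stack up.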

Verification (live)

  • #1 helper logic: extracted the helper and ran it against a dummy .env. Confirmed CREDENTIAL_ENCRYPTION_KEY is left alone when set, SECRET_KEY/INTERNAL_API_SECRET are generated when blank, and the ADMIN_PASSWORD guard trips when blank.
  • #5 syntax: bash -n passes on both clean.sh and start.sh.
  • #6 (the big one): rebuilt mcp-server against this PR's Dockerfile and watched docker compose ps mcp-server. It was unhealthy before and now reports Up (healthy) within 30s. docker inspect's State.Health.Status flipped to healthy with FailingStreak=0.
trinity-mcp-server  trinity-mcp-server  …  Up 51 seconds (healthy)
                                                       ^^^^^^^
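
The manual check for fix 6 could be scripted along these lines. This is a minimal sketch, not part of the PR: the function name `wait_healthy`, the attempt count, and the delay are illustrative assumptions, while the container name and the `State.Health.Status` field come from the verification notes above.

```shell
# Hypothetical sketch: poll a container's Docker health status until it is
# "healthy" or the attempts run out.
wait_healthy() {
  local container="$1" attempts="${2:-15}" delay="${3:-2}" status="" i=1
  while [ "$i" -le "$attempts" ]; do
    # Health status is one of: starting, healthy, unhealthy.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy (attempt $i)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$container did not become healthy (last status: ${status:-unknown})" >&2
  return 1
}
```

With the defaults this waits up to roughly 30 seconds, for example `wait_healthy trinity-mcp-server`.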

Out of scope (deferred / separate tracking)

  • Existing leftover state on running dev box: the running stack has 4 Exited agent-* zombies from previous test runs. The new clean.sh would remove them, but I'm not running it on the dev box as part of this PR — operator's call.
  • Doc-side updates (docs.ability.ai/getting-started) — issue notes these are tracked separately; in-repo docs/user-docs/guides/* was already corrected alongside the generate-user-docs playbook.
  • Branch name note: the original feature/443-onboarding-fixes branch already exists on remote with @vybe's WIP (refactor(skills): prevent duplicate validation issues, 2026-04-21, the same day the issue was filed; likely an abandoned start on this work). Didn't touch it. This PR ships under feature/443-onboarding-six-defects instead.
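
For operators weighing whether to run the cleanup on an existing box, the behavior described for fix 5 can be sketched roughly as follows. This is an assumption-laden outline, not the contents of `clean.sh`: the function name `clean_stack` is hypothetical, and it assumes the command is run from the directory holding the compose file. Only the `agent-*` name prefix and `trinity-agent-network` come from the PR.

```shell
# Hypothetical sketch of the cleanup flow; the shipped clean.sh may differ.
clean_stack() {
  local names

  # Stop the compose stack. No -v flag, so named data volumes are preserved.
  docker compose down || true

  # Collect leftover agent containers, running or exited, by name prefix.
  names="$(docker ps -a --format '{{.Names}}' | grep '^agent-' || true)"
  if [ -n "$names" ]; then
    # Word splitting on $names is intentional: one container name per argument.
    docker rm -f $names
  fi

  # Remove the agent network only if it still exists.
  if docker network inspect trinity-agent-network >/dev/null 2>&1; then
    docker network rm trinity-agent-network
  fi
}
```

Because `docker compose down` is called without `-v`, data volumes stay in place and have to be wiped manually if a full reset is wanted.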

Related to #443

🤖 Generated with Claude Code

dolho and others added 2 commits May 13, 2026 14:45
Friction surfaced on a fresh clone → start.sh → login walkthrough.
Each fix is small and independent; bundled here per the issue's
"all are onboarding hygiene" framing.

1. scripts/deploy/start.sh — auto-generate SECRET_KEY and
   INTERNAL_API_SECRET if blank (same pattern as the existing
   CREDENTIAL_ENCRYPTION_KEY block, extracted to a helper).
   Fail fast with a clear message if ADMIN_PASSWORD is blank rather
   than booting into a state the operator can't log into.

2. .env.example — document FRONTEND_PORT=80 with an inline comment
   explaining "remap if your host's port 80 is taken." The var was
   already read by docker-compose.yml and start.sh but wasn't
   reachable from the quickstart.

3. scripts/deploy/start.sh — replace "login: admin/password" hint
   with "login: admin / ADMIN_PASSWORD from .env". New users took
   "password" as the literal default.

4. src/mcp-server/src/index.ts + README.md — drop hardcoded
   http://localhost:3000/api-keys hint. Port was wrong (frontend
   default is 80) and ignored FRONTEND_PORT overrides. Replaced
   with "your Trinity web UI → Settings → MCP Keys".

5. scripts/deploy/clean.sh — new script. Stops compose + removes
   leftover agent-* containers + trinity-agent-network. Without
   this, a fresh install inherits Exited zombie agents from
   previous tests and the first new agent collides on
   AGENT_SSH_PORT_START (2222). Preserves data volumes
   intentionally — operators can wipe those manually.

6. src/mcp-server/Dockerfile — point healthcheck at /health, not
   /mcp. The /mcp endpoint rejects HEAD requests with 400 (which
   `wget --spider` sends), so `docker compose ps` reported the
   container as unhealthy despite serving traffic. /health returns
   200 to both HEAD and GET. Verified live: post-rebuild,
   trinity-mcp-server reports `healthy` within 30s.

No schema, API, or runtime-behavior changes. No migration needed.

Related to #443

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…form_default_model

The lint job on #443's PR went red on a file that is NOT in the PR's
diff — `tests/test_platform_default_model.py` carries 6 bare
`sys.modules` mutations that were merged to dev without an entry in
`tests/lint_sys_modules_baseline.txt`. Same #802 rebase-race shape as
the original #791 vs #783 incident: baseline-introducing PR (#791) and
violation-introducing PR landed disjointly, both green pre-merge, dev
went red post-merge, the next PR off dev (this one) inherits it.

Ratcheting the baseline here unblocks #443. Same precedent as #796
which baselined `test_cleanup_unreachable_orphan.py` for the same
reason. The proper fix is to convert the file's bare sys.modules
writes to `monkeypatch.setitem` / `monkeypatch.delitem`; that's
scope-creep relative to the onboarding work this PR is doing and
should land separately.

(That file's count is currently 0 against the lint, so a follow-up
that converts it can drop the baseline line altogether.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dolho dolho requested a review from vybe May 13, 2026 11:56

@vybe vybe left a comment

Clean, focused fix — all six defects verified. Dockerfile healthcheck and ADMIN_PASSWORD guard are the highest-value changes. Second commit correctly resolves the pre-existing sys_modules baseline rebase-race from #831. ✅ Approved to merge.

@vybe vybe merged commit ba4aeae into dev May 13, 2026
9 checks passed
AndriiPasternak31 added a commit that referenced this pull request May 13, 2026
Captures the full pytest run (non-unit + unit halves) at HEAD 1b8651e,
before the rebase that picked up #833's baseline fix. 2026 pass / 24 fail
/ 163 skip on the non-unit half in 41:49.

Refs #678