ops: harden deploy automation contracts #196

Merged
liujuanjuan1984 merged 2 commits into main from ops/issue-148-deploy-contracts
Mar 18, 2026

@liujuanjuan1984 liujuanjuan1984 commented Mar 18, 2026

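Change overview

This PR fills in the automation contracts on the main paths of deploy.sh and deploy_release.sh, so that the deploy scripts are easier to orchestrate in agent and non-interactive runs and their failures are easier to classify.

Related commits:

  • e900c22 ops: harden deploy automation contracts #148
  • da468c8 fix: preserve provider secret bootstrap flow #148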

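Module 1: deploy main entry point and sudo precheck

  • scripts/shell_helpers.sh adds a unified ensure_sudo_ready()
  • scripts/deploy.sh runs the sudo precheck up front on its main path
  • Non-interactive runs require sudo -n to work, so a deploy does not fail halfway through on a privilege prompt
  • New inputs deploy_healthcheck_timeout_seconds / deploy_healthcheck_interval_seconds, used by the deploy acceptance stage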
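
A minimal sketch of what a fail-fast sudo precheck could look like, not the actual ensure_sudo_ready() from scripts/shell_helpers.sh. The SUDO_BIN and DEPLOY_NONINTERACTIVE variables are hypothetical knobs introduced here for illustration; only the sudo -n requirement comes from this PR.

```sh
#!/bin/sh
# Hypothetical override so the check can be stubbed in tests.
SUDO_BIN="${SUDO_BIN:-sudo}"

# Return 0 when privileged commands can run without blocking on a prompt.
ensure_sudo_ready() {
  # 'sudo -n true' succeeds only if no password prompt would be needed.
  if "$SUDO_BIN" -n true 2>/dev/null; then
    return 0
  fi

  # Assumption: a run counts as non-interactive when the hypothetical
  # DEPLOY_NONINTERACTIVE flag is set or stdin is not a terminal.
  if [ "${DEPLOY_NONINTERACTIVE:-0}" = "1" ] || [ ! -t 0 ]; then
    echo "sudo precheck failed: 'sudo -n' is not usable in this session" >&2
    return 1
  fi

  # Interactive run: prompt once now and cache the credential.
  "$SUDO_BIN" -v
}
```

Calling such a function before any state-changing step means a missing passwordless sudo rule surfaces as an immediate, classifiable failure rather than a stalled deploy.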

Module 2: readiness / timeout / status contract after systemd start

  • scripts/deploy/enable_instance.sh adds /health polling
  • After start, the script checks not only systemctl enable --now but also requires GET /health to return {"status":"ok"}
  • Adds a machine-readable JSON status line and categorized exit codes, distinguishing:
    • systemd_reload_failed
    • systemd_start_failed
    • systemd_not_active
    • readiness_timeout
    • missing_dependency
    • invalid_argument
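
The readiness contract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in scripts/deploy/enable_instance.sh: the numeric exit codes, the JSON field names, the "ready" category, and the HEALTH_URL default are all invented here. Only the category names readiness_timeout and invalid_argument and the {"status":"ok"} body come from the PR.

```sh
#!/bin/sh
# Hypothetical numeric codes; the PR names the categories, not the numbers.
EXIT_READINESS_TIMEOUT=13
EXIT_INVALID_ARGUMENT=15

# Print one machine-readable JSON status line on stdout.
# $1 = status, $2 = category, $3 = exit code (field names are assumptions)
emit_status() {
  printf '{"status":"%s","category":"%s","exit_code":%s}\n' "$1" "$2" "$3"
}

# Default probe: print the /health response body. HEALTH_URL is an assumption.
probe_health() {
  curl -fsS --max-time 2 "${HEALTH_URL:-http://127.0.0.1:8000/health}"
}

# Poll until the probe output contains "status":"ok" or the timeout elapses.
# $1 = timeout seconds, $2 = interval seconds, $3 = probe command (optional)
wait_for_ready() {
  timeout_s=$1
  interval_s=$2
  probe=${3:-probe_health}

  # Both inputs must be non-negative integers, and the interval at least 1.
  for n in "$timeout_s" "$interval_s"; do
    case "$n" in
      ''|*[!0-9]*)
        emit_status error invalid_argument "$EXIT_INVALID_ARGUMENT"
        return "$EXIT_INVALID_ARGUMENT" ;;
    esac
  done
  if [ "$interval_s" -lt 1 ]; then
    emit_status error invalid_argument "$EXIT_INVALID_ARGUMENT"
    return "$EXIT_INVALID_ARGUMENT"
  fi

  # Elapsed time is approximated by summing the sleep intervals.
  elapsed=0
  while :; do
    body=$("$probe" 2>/dev/null) || body=""
    if printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; then
      emit_status ok ready 0
      return 0
    fi
    if [ "$elapsed" -ge "$timeout_s" ]; then
      emit_status error readiness_timeout "$EXIT_READINESS_TIMEOUT"
      return "$EXIT_READINESS_TIMEOUT"
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}
```

In this sketch, the deploy_healthcheck_timeout_seconds and deploy_healthcheck_interval_seconds inputs would map to the first two arguments. An orchestrating agent can then branch on the exit code alone and parse the single JSON line for detail.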

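Module 3: provider/model/secret combination validation

  • scripts/deploy/setup_instance.sh adds paired validation of OPENCODE_PROVIDER_ID / OPENCODE_MODEL_ID
  • For known providers (Google / OpenAI / Anthropic / Azure OpenAI / OpenRouter), checks up front that the corresponding secret has been provided
  • Keeps the two-step bootstrap flow: generate the template first, then check the provider secret in the formal validation stage, so the first-time secure deployment experience is not broken
  • scripts/deploy/run_opencode.sh keeps defense in depth by re-validating the required secret for known providers at runtime
  • scripts/deploy/install_release_runtime.sh also installs the provider helper, so the release runtime path behaves the same way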
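
One way the paired check and the per-provider secret check could be expressed is shown below. This is a sketch, not the provider helper shipped in this PR: the provider IDs and the secret variable names in the table are assumptions chosen for illustration, and the function names are hypothetical. Only OPENCODE_PROVIDER_ID and OPENCODE_MODEL_ID come from the PR.

```sh
#!/bin/sh
# Map a provider ID to the env var expected to hold its secret.
# Both the IDs and the variable names are assumptions for illustration.
required_secret_for_provider() {
  case "$1" in
    google)     echo GOOGLE_GENERATIVE_AI_API_KEY ;;
    openai)     echo OPENAI_API_KEY ;;
    anthropic)  echo ANTHROPIC_API_KEY ;;
    azure)      echo AZURE_OPENAI_API_KEY ;;
    openrouter) echo OPENROUTER_API_KEY ;;
    *)          echo "" ;;  # unknown provider: nothing enforced here
  esac
}

# Return 0 when the provider/model/secret combination is consistent.
validate_provider_config() {
  provider=${OPENCODE_PROVIDER_ID:-}
  model=${OPENCODE_MODEL_ID:-}

  # Paired check: either both are set or both are empty.
  if { [ -n "$provider" ] && [ -z "$model" ]; } ||
     { [ -z "$provider" ] && [ -n "$model" ]; }; then
    echo "OPENCODE_PROVIDER_ID and OPENCODE_MODEL_ID must be set together" >&2
    return 1
  fi

  # Nothing configured: nothing further to validate.
  if [ -z "$provider" ]; then
    return 0
  fi

  secret_var=$(required_secret_for_provider "$provider")
  if [ -z "$secret_var" ]; then
    return 0
  fi

  # Indirect lookup; secret_var only ever comes from the fixed table above.
  eval "secret_value=\${$secret_var:-}"
  if [ -z "$secret_value" ]; then
    echo "provider '$provider' requires $secret_var to be set" >&2
    return 1
  fi
  return 0
}
```

Keeping the table in a single helper that both the setup step and the runtime step source would let the two checks stay in agreement, which matches the defense-in-depth intent described above.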

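Module 4: documentation and tests

  • Updates scripts/deploy_readme.md to spell out the new deploy contract, inputs, and exit codes
  • Updates docs/agent_deploy_sop.md to spell out the non-interactive sudo precheck and JSON status output
  • Extends tests/test_deploy_security_contract.py to cover the new deploy contract text baseline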

Verification

Ran:

  • uv run pre-commit run --all-files
  • uv run pytest

Related

Closes #148
Relates to #145

@liujuanjuan1984 (Collaborator, Author) commented

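A follow-up note on scope boundaries:

This PR adds "minimal necessary validation for known LLM providers" at the deploy layer. The goal is to close the loop on the deploy automation contract from #148 first and avoid a half-successful state where the script succeeds but OpenCode is actually unusable.

From a long-term architectural standpoint, however, authoritative validation of the LLM provider/model/secret should not stay in the outer opencode-a2a-server layer for long. It should move down, as far as possible, into OpenCode's own serve startup path or its native config validation entry point.

Follow-up tracking is recorded separately in: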

@liujuanjuan1984 liujuanjuan1984 marked this pull request as ready for review March 18, 2026 05:51
@liujuanjuan1984 liujuanjuan1984 merged commit 10dc8e7 into main Mar 18, 2026
3 checks passed
@liujuanjuan1984 liujuanjuan1984 deleted the ops/issue-148-deploy-contracts branch March 18, 2026 05:51

Development

Successfully merging this pull request may close these issues.

[Priority: Med] [Ops] Fill in deploy automation contracts: non-interactive precheck, health acceptance, and parameter validation
