feat(anythingllm-docker): INT-P01 DOKS→Docker migration plan + ADR-005 #36
Merged
…migration plan
Generates anythingllm-docker/sites/ai.weown.agency/ from the existing copier template (project_name=int-p01, WeOwnLLM hardened image) and adds the tooling to migrate INT-P01 off DOKS via a parallel-build + DNS-cutover pattern:
- scripts/migrate-from-doks.sh - one-shot bridge that kubectl-execs into the live DOKS pod, streams /app/server/storage out as a tarball, and wraps it in the same skinny-backup layout the template's restore.sh already understands. Optional --upload-to-spaces stages the artifact at s3://weown-backups/int-p01/ for redundancy.
- MIGRATION_RUNBOOK.md - phased runbook: inventory/freeze, staging droplet provision (temporary hostname), DOKS extraction, restore, Jason/Yonks staging validation, production cutover, 7-day soak, rollback path.
- anythingllm-docker/sites/README.md - directory-level explainer matching the existing keycloak-docker/sites/ convention.
- anythingllm-docker/sites/.gitignore - blocks terraform state, real tfvars, backup tarballs, and stray .env files from being committed.
Source plan: D383 / Tuleap A174 (#1238). Trigger: Signal #WeOwn.Dev ask from Jason 2026-05-21 (SearXNG broken on DOKS for the Calhoun MetaAgent).
DOKS instance is never modified during the migration - rollback is a DNS flip until decommission (T+7 days post-cutover).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
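The extraction step described for scripts/migrate-from-doks.sh (kubectl-exec into the pod and stream /app/server/storage out as a tarball) might look roughly like the sketch below. It is not the script from this PR: the function name, the `KUBECTL` override, and the namespace and pod arguments are assumptions for illustration, and the skinny-backup wrapping is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "stream storage out of the pod" step only.
# KUBECTL is an assumed override hook so the function can be exercised
# without a cluster; it defaults to the real kubectl binary.
stream_storage_tarball() {
  local namespace="$1" pod="$2" out_file="$3"
  # tar runs inside the pod and writes the gzip stream to stdout ("-f -");
  # the redirect lands it on the operator workstation.
  "${KUBECTL:-kubectl}" exec -n "$namespace" "$pod" -- \
    tar -czf - -C /app/server storage > "$out_file"
}
```

A call would take the form `stream_storage_tarball <namespace> <pod> storage.tar.gz`, after which `tar -tzf storage.tar.gz` lists the archive as a quick sanity check.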
…le updates
Folds in feedback on the migration plan:
- Caddyfile + cloud-init: dual-hostname from first boot
('ai-stage.weown.agency, ai.weown.agency' in one site block) so the
production cutover is a DNS A-record swap on the same droplet -
no re-deploy of compose or Caddyfile required at cutover.
- Runbook: replaces the previous int-p01-new.ccc.bot staging hostname
with ai-stage.weown.agency (same parent zone), simplifies Phase 6
accordingly, and adds Phase 1.5 - an optional local-laptop dry-run
that round-trips the DOKS backup through restore.sh against a
throwaway docker container before any droplet exists.
- Image-path open question removed: always reg.mini.dev/anythingllm:latest,
with a note that 'mini_key' is an API key fragment that must come from
Infisical (A126) or DOCR (D341), never embedded in the URL.
- ADR-005 (Proposed): decision record for the retirement, the
parallel-build + DNS-cutover pattern, two human validation gates
(Phase 4 Jason/Yonks soak, Phase 6 CTO cutover approval), and
compliance mappings across NIST CSF 2.0, SOC 2, ISO/IEC 27001:2022,
ISO/IEC 42001:2023, CIS Controls v8. Status flips to Accepted at the
close of the 7-day post-cutover soak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
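Because the cutover is described as a DNS A-record swap onto a droplet that already serves both hostnames, an operator could confirm the swap by comparing what the two names resolve to. The following is a minimal sketch under that assumption; the function names and the `DIG` override are hypothetical, and names that resolve through CNAME chains would need extra handling.

```bash
#!/usr/bin/env bash
# Hypothetical pre/post-cutover check: both hostnames should resolve to the
# same A record set once the swap has propagated.
# DIG is an assumed override hook for testing; it defaults to dig.
resolve_a() {
  "${DIG:-dig}" +short A "$1" | sort
}

same_a_records() {
  local first second
  first="$(resolve_a "$1")"
  second="$(resolve_a "$2")"
  # An empty answer means "did not resolve", which must not count as a match.
  [ -n "$first" ] && [ "$first" = "$second" ]
}
```

For example, `same_a_records ai-stage.weown.agency ai.weown.agency && echo "both names point at the same address"`.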
Contributor
Pull request overview
This PR documents and scaffolds the INT-P01 (ai.weown.agency) migration plan from DOKS to a single DigitalOcean droplet running AnythingLLM via Docker Compose, including an ADR, a phased migration runbook, and site-specific infrastructure/scripts under anythingllm-docker/sites/.
Changes:
- Adds ADR-005 describing the DOKS retirement decision and the parallel-build + DNS-cutover approach.
- Introduces a new `anythingllm-docker/sites/ai.weown.agency/` deployment directory (OpenTofu, cloud-init bootstrap, Compose+Caddy config, and migration/backup/restore/deploy scripts).
- Adds `anythingllm-docker/sites/` documentation and gitignore rules to formalize the “sites/ per domain” convention.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| CHANGELOG.md | Records the addition of ADR-005 and the INT-P01 docker-site migration assets. |
| anythingllm-docker/sites/README.md | Documents the sites/ convention and lists current deployments. |
| anythingllm-docker/sites/.gitignore | Adds global ignore rules for generated site state/tfvars/backups. |
| anythingllm-docker/sites/ai.weown.agency/terraform/versions.tf | Defines OpenTofu/provider version constraints for the INT-P01 droplet. |
| anythingllm-docker/sites/ai.weown.agency/terraform/variables.tf | Declares infra/runtime variables for the INT-P01 droplet + AnythingLLM configuration. |
| anythingllm-docker/sites/ai.weown.agency/terraform/terraform.tfvars.example | Provides a safe example tfvars file for local operator use. |
| anythingllm-docker/sites/ai.weown.agency/terraform/templates/cloud-init.yaml | Cloud-init bootstrap that installs Docker/Infisical, writes Compose/Caddy/backup scripts, and starts the stack. |
| anythingllm-docker/sites/ai.weown.agency/terraform/outputs.tf | Exposes droplet identifiers/URLs after provisioning. |
| anythingllm-docker/sites/ai.weown.agency/terraform/monitoring.tf | Adds DO monitoring alerts for CPU/memory/disk. |
| anythingllm-docker/sites/ai.weown.agency/terraform/main.tf | Provisions droplet + reserved IP + firewall, and wires in cloud-init user-data. |
| anythingllm-docker/sites/ai.weown.agency/scripts/restore.sh | Restores a “skinny backup” into Docker volumes (with optional Spaces fetch). |
| anythingllm-docker/sites/ai.weown.agency/scripts/migrate-from-doks.sh | Bridges DOKS PVC contents into a restore-compatible skinny-backup tarball. |
| anythingllm-docker/sites/ai.weown.agency/scripts/deploy.sh | Uploads Compose/Caddy config and restarts the stack on the droplet. |
| anythingllm-docker/sites/ai.weown.agency/scripts/backup.sh | Creates skinny backups and applies retention policy. |
| anythingllm-docker/sites/ai.weown.agency/README.md | Site-level overview, prerequisites, and migration/ops guidance. |
| anythingllm-docker/sites/ai.weown.agency/MIGRATION_RUNBOOK.md | Phased operational runbook for the DOKS→Docker migration and validation gates. |
| anythingllm-docker/sites/ai.weown.agency/docker/compose.prod.yaml | Production Compose definition for AnythingLLM + Caddy. |
| anythingllm-docker/sites/ai.weown.agency/docker/Caddyfile | Dual-hostname Caddy config for staging soak + production cutover. |
| anythingllm-docker/sites/ai.weown.agency/CHANGELOG.md | Site-level changelog describing the migration additions. |
| anythingllm-docker/sites/ai.weown.agency/.gitignore | Site-specific ignore rules for state, tfvars, backups, logs, env files. |
| .github/ADR-005-int-p01-doks-retirement.md | ADR documenting the decision, consequences, and compliance mappings for INT-P01 DOKS retirement. |
Comment on lines +269 to +295
        #!/bin/bash
        set -euo pipefail
        export INFISICAL_CLIENT_ID=''
        export INFISICAL_CLIENT_SECRET=''
        infisical login --method=universal-auth \
          --clientId="$INFISICAL_CLIENT_ID" \
          --clientSecret="$INFISICAL_CLIENT_SECRET" \
          --silent

    # Cron wrapper: logs in with Machine Identity, then runs backup with secrets injected
    - path: /etc/cron.daily/intp01-backup
      permissions: "0750"
      content: |
        #!/bin/bash
        # Daily backup with Infisical runtime secret injection
        set -euo pipefail
        export INFISICAL_CLIENT_ID=''
        export INFISICAL_CLIENT_SECRET=''
        infisical login --method=universal-auth \
          --clientId="$INFISICAL_CLIENT_ID" \
          --clientSecret="$INFISICAL_CLIENT_SECRET" \
          --silent
        infisical run \
          --projectId= \
          --env=prod \
          -- /opt/intp01/backup.sh \
          >> /var/log/intp01-backup.log 2>&1
Comment on lines +323 to +324
    --projectId= \
    --env=prod \
Comment on lines +13 to +16
    anythingllm:
      image: reg.mini.dev/anythingllm:latest
      restart: unless-stopped
      environment:
Comment on lines +49 to +54
    ssh "$REMOTE" "cd $APP_DIR && \
      docker compose pull && \
      infisical run \
        --projectId= \
        --env=prod \
        -- docker compose up -d"
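The snippets above pass an empty `--projectId=` through to `infisical run`. A later commit in this PR makes the scripts require an `INFISICAL_PROJECT_ID` environment variable; one way such a guard could look is sketched below. The `require_env` helper is hypothetical and is not taken from the PR's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail fast when a required variable is missing or
# empty in the environment, instead of passing "--projectId=" through.
require_env() {
  local name value
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as infisical would inherit anyway.
    value="$(printenv "$name" || true)"
    if [ -z "$value" ]; then
      echo "error: $name must be set and non-empty" >&2
      return 1
    fi
  done
}
```

A script would call `require_env INFISICAL_PROJECT_ID || exit 1` before building the `infisical run --projectId="$INFISICAL_PROJECT_ID" --env=prod -- …` command line.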
Comment on lines +149 to +155
    if [[ -n "$host" ]]; then
      echo "==> Running restore on remote: ${host}"
      ssh "$host" "export INFISICAL_CLIENT_ID='' && export INFISICAL_CLIENT_SECRET='' && infisical login --method=universal-auth --clientId=\"\$INFISICAL_CLIENT_ID\" --clientSecret=\"\$INFISICAL_CLIENT_SECRET\" --silent && infisical run --projectId= --env=prod -- bash -c '$RESTORE_CMDS'"
    else
      echo "==> Running restore locally"
      eval "$RESTORE_CMDS"
    fi
Comment on lines +252 to +253
    read -rs "SPACES_ACCESS_KEY?Paste SPACES_ACCESS_KEY: "; echo
    read -rs "SPACES_SECRET_KEY?Paste SPACES_SECRET_KEY: "; echo

    *.tfstate.*
    *.tfstate.backup
    .terraform/
    .terraform.lock.hcl
Comment on lines +7 to +9
    # Terraform state and locks
    **/terraform/.terraform/
    **/terraform/.terraform.lock.hcl
Comment on lines +131 to +134
    if [[ -n "$host" ]]; then
      echo "==> Running backup on remote: ${host}"
      ssh "$host" "$BACKUP_CMDS"
Comment on lines +161 to +169
    ### Restore

    ```bash
    # Restore from local backup on droplet
    ./scripts/restore.sh root@your-droplet-ip anythingllm-ai_backup_20260115_120000

    # The restore script will automatically fetch from DO Spaces if the backup
    # is not found locally.
    ```
ncimino previously approved these changes May 26, 2026
# Conflicts: # CHANGELOG.md
…ot PR #36 findings
Re-bases the INT-P01 site on the s004 reference pattern (Path C slim cloud-init + Layer 2 bootstrap-secret rotation; see docs/INFRA_BOOTSTRAP_PATTERN.md), and closes every inline review comment left by copilot-pull-request-reviewer on PR #36 (15 comments, all fixed).
Path C + Layer 2 adoption (single biggest change):
- terraform/templates/cloud-init.yaml: now ONLY handles first-boot bootstrap - Docker, Infisical CLI (artifacts-cli.infisical.com apt repo, not the legacy install-cli.sh capped at v0.38), the v1 → v2 Machine Identity rotation via Infisical Universal Auth API (revokes the v1 secret embedded in terraform state + DO droplet metadata within minutes of provisioning), and a .bootstrap-complete marker. Compose.yaml + Caddyfile + backup.sh + cron NO LONGER ship with cloud-init - they live in ansible/deploy.yml.
- ansible/deploy.yml (new): owns all post-bootstrap state on the droplet. Asserts .bootstrap-complete + .infisical-auth.env exist, uploads compose+Caddyfile+backup.sh, installs daily backup cron with logrotate, runs docker compose up under infisical run, pulls images via community.docker.docker_image_pull (no SDK required on the droplet), updates DO tags (commit-<sha> + skinny-backup) via scripts/tag-droplet.sh, waits for /api/ping health. Idempotent - re-runnable any time without tofu taint.
- scripts/deploy.sh: now a thin `ansible-playbook` wrapper requiring INFISICAL_PROJECT_ID env var; installs community.docker:==3.13.0 collection if missing.
- terraform/backend.tf + init.sh (new): DO Spaces remote tofu state backend (SSE-C encrypted, S3-compatible). init.sh reads spaces_* credentials from terraform.tfvars and forwards them to `tofu init -backend-config=`.
- terraform/main.tf: adds lifecycle ignore_changes = [user_data, tags] so the runtime tag mutations from ansible + bootstrap scripts stick.
- terraform/variables.tf: adds spaces_access_key, spaces_secret_key, spaces_encryption_key, ssh_source_cidrs.
- docker/compose.prod.yaml: bind-mounts /var/log/caddy into Caddy so the otel-agent filelog receiver can ship logs and they survive container recreation.
Caddyfile dual-hostname preserved across the refactor:
ai-stage.weown.agency, ai.weown.agency { … }
Production cutover (Phase 6) is a pure DNS A-record swap on the same droplet - Caddy already has the cert for both names in one site block.
Copilot review findings (PR #36) all addressed:
- #1, #2, #4, #5 Empty Infisical values in cloud-init/deploy/restore → fixed by Path C; cloud-init now uses HCL templatefile substitution (${infisical_*}) that resolves at tofu apply time from var.* - no more pre-baked empty strings.
- #3, #6, #7 Floating image tags (reg.mini.dev/anythingllm:latest) → pinned to :1.7.2 (same as s004; documented as the WeOwnLLM hardened-image version Shahid verified on s004.ccc.bot).
- #8 ADR #WeOwnVer mis-computed → v3.5.5.1 → v3.4.5.1. May = month 4 of S3, ISO W22 - W18 + 1 = offset 5, iteration 1 → v3.4.5.1. Math shown inline in the Version line per VERSIONING_WEOWNVER.md.
- #9 Broken link to private notes/Perpetuator/... in a public repo → replaced with in-repo references (Tuleap A174 / #1238 + the in-repo runbook).
- #10 Path inconsistency /opt/int-p01/ vs /opt/intp01/ → unified on /opt/int_p01_anythingllm/ throughout README, runbook, scripts, cloud-init. project_name (terraform var) is `int-p01-anythingllm` (hyphenated); underscore form for paths/volumes is `int_p01_anythingllm`. Matches s004's convention.
- #11 Bash-incompatible `read -rs "VAR?prompt"` (zsh-only) → switched to the canonical zsh-first + bash-fallback pattern: `read -rs "VAR?prompt" 2>/dev/null || read -rsp …` (matches global CLAUDE.md secrets pattern).
- #12, #13 .terraform.lock.hcl gitignored → unignored at both the sites/.gitignore root and the per-site .gitignore. Lock file is now tracked for reproducible provider versions across machines + CI runs.
- #14 backup.sh remote mode not wrapped in infisical run → adopted s004's backup.sh which sources the droplet's .infisical-auth.env over SSH and re-execs itself under `infisical run` so SPACES_* are injected for the S3 upload step. Requires INFISICAL_PROJECT_ID env var in remote mode.
- #15 README restore example used `anythingllm-ai_backup_…` template placeholder → replaced the entire "Migration from Helm/Kubernetes" section with a pointer to MIGRATION_RUNBOOK.md (which uses the real `int-p01-anythingllm_backup_<TS>` naming).
Migration artifacts preserved + updated:
- MIGRATION_RUNBOOK.md: replaced ssh + manual infisical-run restore invocations with the new Path C flow (INFISICAL_PROJECT_ID=<id> ./scripts/deploy.sh for app layer, then ./scripts/restore.sh for the DOKS data swap). Added explicit Layer 2 rotation verification step. Phase 1.5 local-laptop dry-run pinned to :1.7.2 and renamed volumes/networks to match production (int_p01_anythingllm_*).
- scripts/migrate-from-doks.sh: PROJECT_NAME → int-p01-anythingllm so the produced tarball matches what restore.sh on the droplet expects. End-of-script "next step" instructions updated to call the new restore.sh wrapper instead of raw ssh + infisical run.
CHANGELOG resolution from rebase: merged the otel-agent additions from main with the INT-P01 + ADR-005 entries; ordered newest-first.
Rebased onto origin/main (commit 455be2a). The branch is now a linear 2-commit history: feat (site) + docs (ADR), with this third commit on top covering the full refactor + review-feedback round. User said "we will squash later" so commits are kept granular.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
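The version arithmetic quoted for finding #8 (month 4 of S3, ISO W22 - W18 + 1 = offset 5, iteration 1) can be reproduced with a few lines of shell. This is only a sketch of the quoted calculation: the function name and argument order are assumptions, W18 is assumed to be the first ISO week counted for the month, and the authoritative rules are in VERSIONING_WEOWNVER.md, which is not reproduced here.

```bash
#!/usr/bin/env bash
# Reproduces only the arithmetic quoted in finding #8. The argument order
# and function name are assumptions; the real rules are defined in
# VERSIONING_WEOWNVER.md, which is not shown here.
weownver() {
  local season="$1" month_in_season="$2" month_first_iso_week="$3"
  local current_iso_week="$4" iteration="$5"
  # Week offset is 1-based: the first ISO week of the month counts as 1.
  local week_offset=$(( current_iso_week - month_first_iso_week + 1 ))
  printf 'v%s.%s.%s.%s\n' "$season" "$month_in_season" "$week_offset" "$iteration"
}

weownver 3 4 18 22 1   # prints v3.4.5.1
```

The 1-based offset is what separates the corrected v3.4.5.1 from a naive `22 - 18 = 4` in the third position.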
…oks-to-docker-migration # Conflicts: # CHANGELOG.md
    final_message: |
      int-p01-anythingllm bootstrap complete (cloud-init done). Next step from the
      operator workstation:
        INFISICAL_PROJECT_ID=<id> ansible-playbook int-p01-anythingllm/ansible/deploy.yml -i 'root@<this-droplet-ip>,'

    get_tfvar() {
      local var_name="$1"
      grep "^${var_name}" terraform.tfvars | sed 's/.*= *"\(.*\)"/\1/' | tr -d ' '
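The `get_tfvar` helper quoted above matches on a key prefix (`grep "^${var_name}"`) and strips every space from the value (`tr -d ' '`), so a key that is a prefix of another key, or a value containing spaces, would be read incorrectly. A stricter variant might look like the sketch below. It is an illustration rather than the code in this PR; the optional second argument (tfvars path) is an assumption added so the function can be tried against a scratch file, and only double-quoted single-line values are handled.

```bash
#!/usr/bin/env bash
# Hypothetical hardened variant of get_tfvar; the optional second argument
# (tfvars path) is an assumption added for testability.
get_tfvar() {
  local var_name="$1"
  local tfvars_file="${2:-terraform.tfvars}"
  # Match the exact key followed by optional blanks and '=', so that
  # "spaces_access_key" does not also match "spaces_access_key_id".
  # Only the text between the first pair of double quotes is printed,
  # and spaces inside the value are preserved.
  awk -v key="$var_name" '
    $0 ~ "^[ \t]*" key "[ \t]*=" {
      if (match($0, /"[^"]*"/)) {
        print substr($0, RSTART + 1, RLENGTH - 2)
        exit
      }
    }
  ' "$tfvars_file"
}
```

Called as `get_tfvar spaces_access_key`, it reads `terraform.tfvars` from the current directory, mirroring the original helper.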
Comment on lines +47 to +49
    # CIDRs allowed to SSH (port 22). PRODUCTION: restrict to admin IP/32 or VPN range.
    ssh_source_cidrs = ["0.0.0.0/0", "::/0"]
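Since the example tfvars ship with SSH open to every address and only a comment asks for it to be restricted in production, a small pre-apply check could catch an unedited value. The sketch below is one possible guard, not part of the PR; the function name is hypothetical and it only inspects a single-line `ssh_source_cidrs = [...]` assignment.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply guard: succeeds (exit 0) when ssh_source_cidrs
# still contains a world-open CIDR. Only single-line lists are handled.
ssh_open_to_world() {
  local tfvars_file="${1:-terraform.tfvars}"
  grep -E '^[[:space:]]*ssh_source_cidrs[[:space:]]*=' "$tfvars_file" \
    | grep -Eq '"(0\.0\.0\.0/0|::/0)"'
}
```

An operator might run `ssh_open_to_world terraform.tfvars && echo "warning: SSH is open to the world" >&2` before `tofu apply`.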
    | Framework | Control | How this ADR satisfies it |
    |---|---|---|
    | **NIST CSF 2.0** | PR.IP-3 (config-change procedures) | Migration is fully scripted + documented in runbook, with hard human gates before each irreversible step |
    | | PR.DS-1 (data-at-rest) | Backup encrypted at rest via DO Spaces SSE; restore tarball never written to a host outside `/opt/intp01/backups/` |

    1. **Functional defect on DOKS.** Jason verified (2026-05-21) that **SearXNG web search does not work on INT-P01 for the Calhoun MetaAgent** under Claude Opus 4.7 - only Tavily works (Discovery #598). The same SearXNG configuration succeeds on every other droplet-based instance (#468) and on the dedicated SearXNG instance at `searxng.weown.app`. The fault is the DOKS network/ingress posture, not the application or the SearXNG droplet. Spending dev time isolating a DOKS-specific failure for a deployment pattern we have already chosen to retire is poor allocation.
    2. **Cost.** DOKS for this workload runs ≈ **$97/mo** vs ≈ **$48/mo** for a single droplet (#514) - **~$49/mo direct savings**, before any consolidation. Recurring, with no offsetting capability.
    3. **Image supply chain.** D381 made the **WeOwnLLM hardened AnythingLLM image** (`reg.mini.dev/anythingllm:latest`) the deployment standard, and it is validated working on `s004.ccc.bot` as of 2026-05-21 (A132 / `#1165`). The DOKS instance still runs upstream `mintplexlabs/anythingllm`, which has ~10× the CVE surface (D289). Retirement aligns INT-P01 with the supply-chain standard.
    ### 2. Provision infrastructure (terraform - first-boot bootstrap)

    ```bash
    cd ../int-p01-anythingllm/terraform
    - `infisical_client_secret` - the one-time-shown client secret
    - `infisical_project_id` - `weown-anythingllm` project ID
    - `domain` - leave as `ai.weown.agency` (the production URL). The Caddyfile is dual-hostname (`ai-stage.weown.agency, ai.weown.agency`) regardless of this value; `domain` only affects the `tofu output` URL and monitoring alert text. Caddy obtains certs for both names at first request after DNS resolves.
    - Optionally adjust `do_region` if you want the new droplet closer to users / closer to the DOKS source for migration bandwidth
Comment on lines +55 to +58
    committed. Once the template carries Path C + Layer 2 by default
    (currently being upstreamed from `s004/` and `ai.weown.agency/`), the
    generated site will already include `ansible/`, `backend.tf`, and
    `init.sh`. Until then, copy those from `s004/` and adjust paths.
Comment on lines +59 to +61
      # `tofu plan` fails with "Invalid character" on the single quotes.
      default = ["0.0.0.0/0", "::/0"]
    }
Comment on lines +57 to +59
      # `tojson` emits a valid JSON array (double-quoted strings) which HCL parses
      # as a list. Without it, Copier renders Python's list-repr ('a', 'b') and
      # `tofu plan` fails with "Invalid character" on the single quotes.
ncimino added a commit that referenced this pull request May 27, 2026
…agent
Self-contained brief for the IDE agent (Cursor / Claude Code / etc.) that will execute the actual migration after PR #36 merges. Companion to MIGRATION_RUNBOOK.md - runbook is the procedure; this prompt is the agent's session-start brief + extended phase plan.
Key addition vs. the runbook itself: a new **Phase 1.5 - LOCAL-LAPTOP validation** between source-backup extract (Phase 1) and staging droplet provisioning (Phase 2). Goal: confirm the restored backup boots AnythingLLM cleanly and SearXNG works on the operator's laptop BEFORE paying for a droplet. Catches data-format mismatches and the SearXNG configuration bug locally instead of on a $48/mo staging droplet.
Phases 0-6 covered with explicit commands and gate criteria:
- Phase 0: pre-flight (branch, secrets, DNS, DOKS health)
- Phase 1: extract source backup via migrate-from-doks.sh
- Phase 1.5: LOCAL validation (NEW - laptop docker-compose + restore)
- Phase 2: staging droplet provision + Layer 2 rotation verify
- Phase 3: deploy app layer + restore on staging
- Phase 4: Jason+Yonks soak (gate G1)
- Phase 5: DNS cutover (gate G2 - CTO approval)
- Phase 6: 7-day soak + DOKS decommission (gate G3)
Plus: validation-gates table, hard "what NOT to do" list (no DOKS mods pre-cutover, no committed secrets/IPs, no :latest, no skipping gates), output-expected description, and a confirmation prompt the agent pastes back to operator before starting.
References:
- D383 (source plan)
- A174 / Tuleap #1238 (cutover)
- ADR-005 (in-repo decision of record)
- MIGRATION_RUNBOOK.md (authoritative phase-by-phase)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
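Both the local Phase 1.5 validation and the deploy playbook's "waits for /api/ping health" step come down to polling the AnythingLLM health endpoint until it answers. A minimal polling loop is sketched below, assuming the endpoint returns a 2xx status when healthy; the function name, the `CURL` override, and the default attempt and delay values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical health poll: retries GET <base_url>/api/ping until it
# succeeds or the attempt budget is used up.
# CURL is an assumed override hook for testing; it defaults to curl.
wait_for_ping() {
  local base_url="$1" attempts="${2:-30}" delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -sS silences progress
    # output but keeps error messages.
    if "${CURL:-curl}" -fsS --max-time 5 "$base_url/api/ping" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

On a laptop this could be called as `wait_for_ping http://localhost:<port>` once the compose stack is up, with the port taken from the local compose file.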
🤖 Automated Pull Request - authored by weown-bot (ecosystem service account)
Opened by: @ncimino
Last pushed by: @ncimino
Branch: feature/nik-int-p01-doks-to-docker-migration → main
Contributors on this branch:
📋 Human Review Checklist - NIST CSF 2.0 Functions
Review per the 6 NIST CSF Functions. Frameworks referenced: NIST CSF 2.0, CIS Controls v8 IG1, CSA CCM v4, ISO/IEC 27001:2022, SOC 2, ISO/IEC 42001:2023. See docs/COMPLIANCE_ROADMAP.md.
🏛️ Govern (GV)
.github/CODEOWNERS)
🔍 Identify (ID)
.github/SECURITY_ASSESSMENT.md)
🛡️ Protect (PR)
--from-literal, never /tmp, always $(mktemp) - ISO A.8.24)
restricted (NIST PR.IP, CIS 4)
🕵️ Detect (DE)
livenessProbe + readinessProbe) configured
🚨 Respond (RS)
.github/INCIDENT_RESPONSE.md)
♻️ Recover (RC)
📚 Documentation & Versioning
CHANGELOG.md updated (per-directory or repo-level /CHANGELOG.md)
#WeOwnVer version bumped per docs/VERSIONING_WEOWNVER.md
📝 Recent Commits (full bodies for Copilot context)
b89ae55 refactor(int-p01): adopt Path C + Layer 2 standard; address all Copilot PR #36 findings
Author: Nik
Date: Tue May 26 15:10:35 2026 -0600
a609cac Merge branch 'main' into feature/nik-int-p01-doks-to-docker-migration
Author: Nik
Date: Tue May 26 15:47:51 2026 -0600
Conflicts:
CHANGELOG.md
a767e0e docs(adr): add ADR-005 for INT-P01 DOKS retirement; runbook + Caddyfile updates
Author: Nik
Date: Mon May 25 14:27:05 2026 -0600
96bce9e feat(anythingllm-docker): add INT-P01 (ai.weown.agency) DOKS->Docker migration plan
Author: Nik
Date: Mon May 25 13:55:31 2026 -0600
🔍 Copilot AI Review: Copilot is configured to auto-request review for bot-authored PRs. If an auto-created PR opens without an initial Copilot review, push a follow-up commit to the same open PR (review_on_push: true) to trigger review automatically.
👥 Required Reviewers: 1 human approval enforced by branch protection. requested automatically.
📚 Review Guidelines: .github/copilot-instructions.md (phase-aware compliance directives)
🛠️ Workflow Operations: .github/workflows/README.md
Auto-generated by .github/workflows/auto-pr-to-main.yml