feat(anythingllm-docker): INT-P01 DOKS→Docker migration plan + ADR-005 #36

Merged
ncimino merged 5 commits into main from feature/nik-int-p01-doks-to-docker-migration on May 27, 2026


@weown-bot weown-bot commented May 25, 2026

🤖 Automated Pull Request — authored by weown-bot (ecosystem service account)

Opened by: @ncimino
Last pushed by: @ncimino
Branch: feature/nik-int-p01-doks-to-docker-migration → main

Contributors on this branch:


📋 Human Review Checklist — NIST CSF 2.0 Functions

Review per the 6 NIST CSF Functions. Frameworks referenced: NIST CSF 2.0, CIS Controls v8 IG1, CSA CCM v4, ISO/IEC 27001:2022, SOC 2, ISO/IEC 42001:2023. See docs/COMPLIANCE_ROADMAP.md.

🏛️ Govern (GV)

  • CODEOWNERS correct for affected paths (.github/CODEOWNERS)
  • ADR required/updated if an architectural decision is introduced
  • Policy impact considered and documented
  • All Copilot AI review comments addressed or explicitly deferred with rationale

🔍 Identify (ID)

  • New assets inventoried (Helm values, container images, dependencies)
  • SBOM regenerated if dependencies changed
  • Risk register / threat model touched if threat surface changed (.github/SECURITY_ASSESSMENT.md)

🛡️ Protect (PR)

  • Least privilege: RBAC, ServiceAccounts, scoped PATs (NIST PR.AC, CIS 5/6, ISO A.5.15-A.5.18)
  • Secrets managed via Infisical (never --from-literal, never /tmp, always $(mktemp) — ISO A.8.24)
  • NetworkPolicy present for new deployments (NIST PR.AC-5, CIS 12, CSA IVS)
  • TLS 1.3 with strong cipher suites where applicable (NIST PR.DS-1, CIS 3)
  • Container security: non-root UID 1000+, Pod Security restricted (NIST PR.IP, CIS 4)

🕵️ Detect (DE)

  • Logs / metrics added for new components (NIST DE.CM, CIS 8/13)
  • Alert rules updated if thresholds change
  • Health checks (livenessProbe + readinessProbe) configured

🚨 Respond (RS)

  • Runbook updated if operational behavior changes (.github/INCIDENT_RESPONSE.md)
  • Incident response impact considered (escalation paths, on-call)

♻️ Recover (RC)

  • Backup strategy covers new persistent data (NIST RC.RP, CIS 11, ISO A.8.13)
  • Rollback procedure tested or documented
  • DR impact assessed for new critical components

📚 Documentation & Versioning

  • Relevant CHANGELOG.md updated (per-directory or repo-level /CHANGELOG.md)
  • #WeOwnVer version bumped per docs/VERSIONING_WEOWNVER.md
  • READMEs / ADRs / inline comments updated

📝 Recent Commits (full bodies for Copilot context)

b89ae55 refactor(int-p01): adopt Path C + Layer 2 standard; address all Copilot PR #36 findings

Author: Nik
Date: Tue May 26 15:10:35 2026 -0600

Re-bases the INT-P01 site on the s004 reference pattern (Path C slim
cloud-init + Layer 2 bootstrap-secret rotation; see
docs/INFRA_BOOTSTRAP_PATTERN.md), and closes every inline review comment
left by copilot-pull-request-reviewer on PR #36 (15 comments, all fixed).

Path C + Layer 2 adoption (single biggest change):

  • terraform/templates/cloud-init.yaml: now ONLY handles first-boot
    bootstrap — Docker, Infisical CLI (artifacts-cli.infisical.com apt
    repo, not the legacy install-cli.sh capped at v0.38), the v1 → v2
    Machine Identity rotation via Infisical Universal Auth API (revokes
    the v1 secret embedded in terraform state + DO droplet metadata
    within minutes of provisioning), and a .bootstrap-complete marker.
    Compose.yaml + Caddyfile + backup.sh + cron NO LONGER ship with
    cloud-init — they live in ansible/deploy.yml.
  • ansible/deploy.yml (new): owns all post-bootstrap state on the
    droplet. Asserts .bootstrap-complete + .infisical-auth.env exist,
    uploads compose+Caddyfile+backup.sh, installs daily backup cron
    with logrotate, runs docker compose up under infisical run,
    pulls images via community.docker.docker_image_pull (no SDK
    required on the droplet), updates DO tags (commit-<sha> +
    skinny-backup) via scripts/tag-droplet.sh, waits for /api/ping
    health. Idempotent — re-runnable any time without tofu taint.
  • scripts/deploy.sh: now a thin ansible-playbook wrapper requiring
    INFISICAL_PROJECT_ID env var; installs community.docker:==3.13.0
    collection if missing.
  • terraform/backend.tf + init.sh (new): DO Spaces remote tofu state
    backend (SSE-C encrypted, S3-compatible). init.sh reads spaces_*
    credentials from terraform.tfvars and forwards them to
    tofu init -backend-config=.
  • terraform/main.tf: adds lifecycle ignore_changes = [user_data, tags]
    so the runtime tag mutations from ansible + bootstrap scripts stick.
  • terraform/variables.tf: adds spaces_access_key, spaces_secret_key,
    spaces_encryption_key, ssh_source_cidrs.
  • docker/compose.prod.yaml: bind-mounts /var/log/caddy into Caddy
    so the otel-agent filelog receiver can ship logs and they survive
    container recreation.

Caddyfile dual-hostname preserved across the refactor:
ai-stage.weown.agency, ai.weown.agency { … }
Production cutover (Phase 6) is a pure DNS A-record swap on the same
droplet — Caddy already has the cert for both names in one site block.

Copilot review findings (PR #36) all addressed:
  • #1, #2, #4, #5: Empty Infisical values in cloud-init/deploy/restore
    → fixed by Path C; cloud-init now uses HCL templatefile
    substitution (${infisical_*}) that resolves at tofu apply time
    from var.*, so there are no more pre-baked empty strings.
  • #3, #6, #7: Floating image tags (reg.mini.dev/anythingllm:latest)
    → pinned to :1.7.2 (same as s004; documented as the WeOwnLLM
    hardened-image version Shahid verified on s004.ccc.bot).
  • #8: ADR #WeOwnVer mis-computed → corrected from v3.5.5.1 to
    v3.4.5.1. May = month 4 of S3, ISO W22 - W18 + 1 = offset 5,
    iteration 1 → v3.4.5.1. Math shown inline in the Version line
    per VERSIONING_WEOWNVER.md.
  • #9: Broken link to private notes/Perpetuator/... in a public repo
    → replaced with in-repo references (Tuleap A174 / #1238 + the
    in-repo runbook).
  • #10: Path inconsistency /opt/int-p01/ vs /opt/intp01/
    → unified on /opt/int_p01_anythingllm/ throughout README,
    runbook, scripts, cloud-init. project_name (terraform var) is
    int-p01-anythingllm (hyphenated); the underscore form for
    paths/volumes is int_p01_anythingllm. Matches s004's convention.
  • #11: Bash-incompatible read -rs "VAR?prompt" (zsh-only)
    → switched to the canonical zsh-first + bash-fallback pattern:
    read -rs "VAR?prompt" 2>/dev/null || read -rsp …
    (matches global CLAUDE.md secrets pattern).
  • #12, #13: .terraform.lock.hcl gitignored → unignored at both the
    sites/.gitignore root and the per-site .gitignore. The lock file
    is now tracked for reproducible provider versions across
    machines + CI runs.
  • #14: backup.sh remote mode not wrapped in infisical run
    → adopted s004's backup.sh, which sources the droplet's
    .infisical-auth.env over SSH and re-execs itself under
    infisical run so SPACES_* are injected for the S3 upload step.
    Requires INFISICAL_PROJECT_ID env var in remote mode.
  • #15: README restore example used the anythingllm-ai_backup_…
    template placeholder → replaced the entire "Migration from
    Helm/Kubernetes" section with a pointer to MIGRATION_RUNBOOK.md
    (which uses the real int-p01-anythingllm_backup_<TS> naming).
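The #8 arithmetic above can be written out as a small shell function. This is only a sketch: the authoritative rule lives in docs/VERSIONING_WEOWNVER.md, which is not reproduced here, so the season number, the month within the season, and the ISO week that opens the month are all passed in as inputs rather than derived. The function name `weownver` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo.
# weownver SEASON MONTH_IN_SEASON ISO_WEEK FIRST_ISO_WEEK_OF_MONTH ITERATION
# The week offset follows the worked example: ISO_WEEK - FIRST_WEEK + 1.
weownver() {
  local season="$1" month="$2" week="$3" first_week="$4" iteration="$5"
  local offset=$(( week - first_week + 1 ))
  printf 'v%s.%s.%s.%s\n' "$season" "$month" "$offset" "$iteration"
}

# Worked example from finding #8: S3, month 4, ISO W22, month opens at W18,
# first iteration.
weownver 3 4 22 18 1   # prints v3.4.5.1
```

The earlier value v3.5.5.1 differs only in the second field, which is consistent with the month-of-season having been taken as 5 instead of 4.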

Migration artifacts preserved + updated:

  • MIGRATION_RUNBOOK.md: replaced ssh + manual infisical-run restore
    invocations with the new Path C flow (INFISICAL_PROJECT_ID=<id>
    ./scripts/deploy.sh for app layer, then ./scripts/restore.sh for
    the DOKS data swap). Added explicit Layer 2 rotation verification
    step. Phase 1.5 local-laptop dry-run pinned to :1.7.2 and renamed
    volumes/networks to match production (int_p01_anythingllm_*).
  • scripts/migrate-from-doks.sh: PROJECT_NAME → int-p01-anythingllm
    so the produced tarball matches what restore.sh on the droplet
    expects. End-of-script "next step" instructions updated to call
    the new restore.sh wrapper instead of raw ssh + infisical run.

CHANGELOG resolution from rebase: merged the otel-agent additions
from main with the INT-P01 + ADR-005 entries; ordered newest-first.

Rebased onto origin/main (commit 455be2a). The branch is now a
linear 2-commit history: feat (site) + docs (ADR), with this third
commit on top covering the full refactor + review-feedback round.
User said "we will squash later" so commits are kept granular.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


a609cac Merge branch 'main' into feature/nik-int-p01-doks-to-docker-migration

Author: Nik
Date: Tue May 26 15:47:51 2026 -0600

Conflicts:

CHANGELOG.md


a767e0e docs(adr): add ADR-005 for INT-P01 DOKS retirement; runbook + Caddyfile updates

Author: Nik
Date: Mon May 25 14:27:05 2026 -0600

Folds in feedback on the migration plan:

  • Caddyfile + cloud-init: dual-hostname from first boot
    ('ai-stage.weown.agency, ai.weown.agency' in one site block) so the
    production cutover is a DNS A-record swap on the same droplet -
    no re-deploy of compose or Caddyfile required at cutover.
  • Runbook: replaces the previous int-p01-new.ccc.bot staging hostname
    with ai-stage.weown.agency (same parent zone), simplifies Phase 6
    accordingly, and adds Phase 1.5 - an optional local-laptop dry-run
    that round-trips the DOKS backup through restore.sh against a
    throwaway docker container before any droplet exists.
  • Image-path open question removed: always reg.mini.dev/anythingllm:latest,
    with a note that 'mini_key' is an API key fragment that must come from
    Infisical (A126) or DOCR (D341), never embedded in the URL.
  • ADR-005 (Proposed): decision record for the retirement, the
    parallel-build + DNS-cutover pattern, two human validation gates
    (Phase 4 Jason/Yonks soak, Phase 6 CTO cutover approval), and
    compliance mappings across NIST CSF 2.0, SOC 2, ISO/IEC 27001:2022,
    ISO/IEC 42001:2023, CIS Controls v8. Status flips to Accepted at the
    close of the 7-day post-cutover soak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


96bce9e feat(anythingllm-docker): add INT-P01 (ai.weown.agency) DOKS->Docker migration plan

Author: Nik
Date: Mon May 25 13:55:31 2026 -0600

Generates anythingllm-docker/sites/ai.weown.agency/ from the existing copier
template (project_name=int-p01, WeOwnLLM hardened image) and adds the tooling
to migrate INT-P01 off DOKS via a parallel-build + DNS-cutover pattern:

  • scripts/migrate-from-doks.sh - one-shot bridge that kubectl-execs into the
    live DOKS pod, streams /app/server/storage out as a tarball, and wraps it
    in the same skinny-backup layout the template's restore.sh already
    understands. Optional --upload-to-spaces stages the artifact at
    s3://weown-backups/int-p01/ for redundancy.
  • MIGRATION_RUNBOOK.md - phased runbook: inventory/freeze, staging droplet
    provision (temporary hostname), DOKS extraction, restore, Jason/Yonks
    staging validation, production cutover, 7-day soak, rollback path.
  • anythingllm-docker/sites/README.md - directory-level explainer matching
    the existing keycloak-docker/sites/ convention.
  • anythingllm-docker/sites/.gitignore - blocks terraform state, real
    tfvars, backup tarballs, and stray .env files from being committed.
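The extraction step in the first bullet (kubectl-exec into the pod and stream the storage directory out as a tarball) might look roughly like the sketch below. The actual scripts/migrate-from-doks.sh is not shown in this conversation, so everything here is an assumption: the function name `stream_storage`, the `KUBECTL` override, and the namespace and pod names in the usage comment are hypothetical, and the skinny-backup wrapping step is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repo's migrate-from-doks.sh.
# stream_storage NAMESPACE POD REMOTE_DIR OUT_TARBALL
# Runs tar inside the pod and streams the archive to a local file, so
# nothing is written inside the pod. KUBECTL may be overridden for testing.
stream_storage() {
  local namespace="$1" pod="$2" remote_dir="$3" out="$4"
  local parent base
  parent="$(dirname "$remote_dir")"
  base="$(basename "$remote_dir")"
  "${KUBECTL:-kubectl}" exec -n "$namespace" "$pod" -- \
    tar -C "$parent" -czf - "$base" > "$out" || return 1
  # Fail early if the resulting archive cannot be listed.
  tar -tzf "$out" > /dev/null || return 1
}

# Usage (namespace and pod name are placeholders):
#   stream_storage anythingllm anythingllm-0 /app/server/storage storage.tar.gz
```

Because the archive is produced by a read-only `tar` inside the pod, this matches the stated constraint that the DOKS instance is never modified during the migration.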

Source plan: D383 / Tuleap A174 (#1238). Trigger: Signal #WeOwn.Dev ask
from Jason 2026-05-21 (SearXNG broken on DOKS for the Calhoun MetaAgent).
DOKS instance is never modified during the migration - rollback is a DNS
flip until decommission (T+7 days post-cutover).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



🔍 Copilot AI Review: Copilot is configured to auto-request review for bot-authored PRs. If an auto-created PR opens without an initial Copilot review, push a follow-up commit to the same open PR (review_on_push: true) to trigger review automatically.

👥 Required Reviewers: 1 human approval enforced by branch protection. requested automatically.

📚 Review Guidelines: .github/copilot-instructions.md (phase-aware compliance directives)

🛠️ Workflow Operations: .github/workflows/README.md

Auto-generated by .github/workflows/auto-pr-to-main.yml

ncimino and others added 2 commits May 25, 2026 13:55
…migration plan

…le updates

Copilot AI review requested due to automatic review settings May 25, 2026 20:27
@weown-bot weown-bot requested a review from ncimino as a code owner May 25, 2026 20:27
@ncimino ncimino changed the title from "Auto-PR: docs(adr): add ADR-005 for INT-P01 DOKS retirement; runbook + Caddyfile updates" to "feat(anythingllm-docker): INT-P01 DOKS→Docker migration plan + ADR-005" May 25, 2026

Copilot AI left a comment

Pull request overview

This PR documents and scaffolds the INT-P01 (ai.weown.agency) migration plan from DOKS to a single DigitalOcean droplet running AnythingLLM via Docker Compose, including an ADR, a phased migration runbook, and site-specific infrastructure/scripts under anythingllm-docker/sites/.

Changes:

  • Adds ADR-005 describing the DOKS retirement decision and the parallel-build + DNS-cutover approach.
  • Introduces a new anythingllm-docker/sites/ai.weown.agency/ deployment directory (OpenTofu, cloud-init bootstrap, Compose+Caddy config, and migration/backup/restore/deploy scripts).
  • Adds anythingllm-docker/sites/ documentation and gitignore rules to formalize the “sites/ per domain” convention.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 15 comments.

| File | Description |
|---|---|
| CHANGELOG.md | Records the addition of ADR-005 and the INT-P01 docker-site migration assets. |
| anythingllm-docker/sites/README.md | Documents the sites/ convention and lists current deployments. |
| anythingllm-docker/sites/.gitignore | Adds global ignore rules for generated site state/tfvars/backups. |
| anythingllm-docker/sites/ai.weown.agency/terraform/versions.tf | Defines OpenTofu/provider version constraints for the INT-P01 droplet. |
| anythingllm-docker/sites/ai.weown.agency/terraform/variables.tf | Declares infra/runtime variables for the INT-P01 droplet + AnythingLLM configuration. |
| anythingllm-docker/sites/ai.weown.agency/terraform/terraform.tfvars.example | Provides a safe example tfvars file for local operator use. |
| anythingllm-docker/sites/ai.weown.agency/terraform/templates/cloud-init.yaml | Cloud-init bootstrap that installs Docker/Infisical, writes Compose/Caddy/backup scripts, and starts the stack. |
| anythingllm-docker/sites/ai.weown.agency/terraform/outputs.tf | Exposes droplet identifiers/URLs after provisioning. |
| anythingllm-docker/sites/ai.weown.agency/terraform/monitoring.tf | Adds DO monitoring alerts for CPU/memory/disk. |
| anythingllm-docker/sites/ai.weown.agency/terraform/main.tf | Provisions droplet + reserved IP + firewall, and wires in cloud-init user-data. |
| anythingllm-docker/sites/ai.weown.agency/scripts/restore.sh | Restores a “skinny backup” into Docker volumes (with optional Spaces fetch). |
| anythingllm-docker/sites/ai.weown.agency/scripts/migrate-from-doks.sh | Bridges DOKS PVC contents into a restore-compatible skinny-backup tarball. |
| anythingllm-docker/sites/ai.weown.agency/scripts/deploy.sh | Uploads Compose/Caddy config and restarts the stack on the droplet. |
| anythingllm-docker/sites/ai.weown.agency/scripts/backup.sh | Creates skinny backups and applies retention policy. |
| anythingllm-docker/sites/ai.weown.agency/README.md | Site-level overview, prerequisites, and migration/ops guidance. |
| anythingllm-docker/sites/ai.weown.agency/MIGRATION_RUNBOOK.md | Phased operational runbook for the DOKS→Docker migration and validation gates. |
| anythingllm-docker/sites/ai.weown.agency/docker/compose.prod.yaml | Production Compose definition for AnythingLLM + Caddy. |
| anythingllm-docker/sites/ai.weown.agency/docker/Caddyfile | Dual-hostname Caddy config for staging soak + production cutover. |
| anythingllm-docker/sites/ai.weown.agency/CHANGELOG.md | Site-level changelog describing the migration additions. |
| anythingllm-docker/sites/ai.weown.agency/.gitignore | Site-specific ignore rules for state, tfvars, backups, logs, env files. |
| .github/ADR-005-int-p01-doks-retirement.md | ADR documenting the decision, consequences, and compliance mappings for INT-P01 DOKS retirement. |

Comment on lines +269 to +295
#!/bin/bash
set -euo pipefail
export INFISICAL_CLIENT_ID=''
export INFISICAL_CLIENT_SECRET=''
infisical login --method=universal-auth \
--clientId="$INFISICAL_CLIENT_ID" \
--clientSecret="$INFISICAL_CLIENT_SECRET" \
--silent

# Cron wrapper: logs in with Machine Identity, then runs backup with secrets injected
- path: /etc/cron.daily/intp01-backup
permissions: "0750"
content: |
#!/bin/bash
# Daily backup with Infisical runtime secret injection
set -euo pipefail
export INFISICAL_CLIENT_ID=''
export INFISICAL_CLIENT_SECRET=''
infisical login --method=universal-auth \
--clientId="$INFISICAL_CLIENT_ID" \
--clientSecret="$INFISICAL_CLIENT_SECRET" \
--silent
infisical run \
--projectId= \
--env=prod \
-- /opt/intp01/backup.sh \
>> /var/log/intp01-backup.log 2>&1
Comment on lines +323 to +324
--projectId= \
--env=prod \
Comment on lines +13 to +16
anythingllm:
image: reg.mini.dev/anythingllm:latest
restart: unless-stopped
environment:
Comment on lines +49 to +54
ssh "$REMOTE" "cd $APP_DIR && \
docker compose pull && \
infisical run \
--projectId= \
--env=prod \
-- docker compose up -d"
Comment on lines +149 to +155
if [[ -n "$host" ]]; then
echo "==> Running restore on remote: ${host}"
ssh "$host" "export INFISICAL_CLIENT_ID='' && export INFISICAL_CLIENT_SECRET='' && infisical login --method=universal-auth --clientId=\"\$INFISICAL_CLIENT_ID\" --clientSecret=\"\$INFISICAL_CLIENT_SECRET\" --silent && infisical run --projectId= --env=prod -- bash -c '$RESTORE_CMDS'"
else
echo "==> Running restore locally"
eval "$RESTORE_CMDS"
fi
Comment on lines +252 to +253
read -rs "SPACES_ACCESS_KEY?Paste SPACES_ACCESS_KEY: "; echo
read -rs "SPACES_SECRET_KEY?Paste SPACES_SECRET_KEY: "; echo
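The two `read` lines above use zsh's `name?prompt` form, which bash rejects because `SPACES_ACCESS_KEY?…` is not a valid variable name. A later commit in this PR resolves this with a `… 2>/dev/null || read -rsp …` fallback. The sketch below is a variant of that idea, not the repo's code: it branches on `$ZSH_VERSION` explicitly rather than relying on the first `read` failing. The function name `read_secret` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; also loadable from zsh.
# read_secret VAR_NAME PROMPT
# Reads one line into the variable named VAR_NAME without echoing it.
read_secret() {
  local _name="$1" _prompt="$2" _rc=0
  if [ -n "${ZSH_VERSION:-}" ]; then
    # zsh: the prompt is attached to the variable name after a "?".
    read -rs "${_name}?${_prompt}" || _rc=$?
  else
    # bash: the prompt is passed with -p, the variable name comes last.
    read -rsp "$_prompt" "$_name" || _rc=$?
  fi
  echo >&2   # move to a new line after the hidden input
  return "$_rc"
}

# Usage:
#   read_secret SPACES_ACCESS_KEY "Paste SPACES_ACCESS_KEY: "
#   read_secret SPACES_SECRET_KEY "Paste SPACES_SECRET_KEY: "
```

Branching on the shell avoids suppressing stderr on the first attempt, which can hide unrelated errors; the trade-off is a few more lines than the one-line fallback.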
*.tfstate.*
*.tfstate.backup
.terraform/
.terraform.lock.hcl
Comment thread anythingllm-docker/sites/.gitignore Outdated
Comment on lines +7 to +9
# Terraform state and locks
**/terraform/.terraform/
**/terraform/.terraform.lock.hcl
Comment on lines +131 to +134
if [[ -n "$host" ]]; then
echo "==> Running backup on remote: ${host}"
ssh "$host" "$BACKUP_CMDS"
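In the snippet above, the remote backup commands run over plain SSH, so nothing injects the Spaces credentials. The refactor commit describes the fix as a script that re-execs itself under `infisical run`. A minimal sketch of that self-re-exec pattern follows. It is an assumption about the shape, not s004's actual backup.sh: the function name and the `BACKUP_REEXECED` guard variable are hypothetical, and authentication (for example sourcing `.infisical-auth.env` and logging in) is assumed to have happened already. The `infisical run --projectId=… --env=prod -- …` form is the one used elsewhere in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repo's backup.sh.
# ensure_spaces_secrets SCRIPT [ARGS...]
# Returns 0 when SPACES_* are already in the environment. Otherwise it
# replaces the current process with SCRIPT running under `infisical run`.
ensure_spaces_secrets() {
  if [ -n "${SPACES_ACCESS_KEY:-}" ] && [ -n "${SPACES_SECRET_KEY:-}" ]; then
    return 0
  fi
  # Guard against an infinite re-exec loop if injection did not work.
  if [ -n "${BACKUP_REEXECED:-}" ]; then
    echo "SPACES_* still missing after infisical run; aborting" >&2
    return 1
  fi
  : "${INFISICAL_PROJECT_ID:?INFISICAL_PROJECT_ID must be set}"
  export BACKUP_REEXECED=1
  exec infisical run --projectId="$INFISICAL_PROJECT_ID" --env=prod -- "$@"
}

# At the top of a backup script:
#   ensure_spaces_secrets "$0" "$@" || exit 1
```

The guard variable matters: without it, a misconfigured project that never supplies `SPACES_*` would make the script re-exec itself forever.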

Comment on lines +161 to +169
### Restore

```bash
# Restore from local backup on droplet
./scripts/restore.sh root@your-droplet-ip anythingllm-ai_backup_20260115_120000

# The restore script will automatically fetch from DO Spaces if the backup
# is not found locally.
```
ncimino previously approved these changes May 26, 2026
ncimino and others added 2 commits May 26, 2026 15:47
…ot PR #36 findings

@weown-bot weown-bot requested a review from ncimino May 26, 2026 21:48
…oks-to-docker-migration

# Conflicts:
#	CHANGELOG.md
Copilot AI review requested due to automatic review settings May 27, 2026 05:42

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 11 comments.

final_message: |
int-p01-anythingllm bootstrap complete (cloud-init done). Next step from the
operator workstation:
INFISICAL_PROJECT_ID=<id> ansible-playbook int-p01-anythingllm/ansible/deploy.yml -i 'root@<this-droplet-ip>,'

get_tfvar() {
local var_name="$1"
grep "^${var_name}" terraform.tfvars | sed 's/.*= *"\(.*\)"/\1/' | tr -d ' '
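The review comment attached to this helper is not reproduced here. Two properties of the code as shown are worth noting: `grep "^${var_name}"` also matches any longer variable that shares the prefix, and `tr -d ' '` removes spaces inside the value as well as around it. A stricter variant, offered only as a sketch, could look like this. The name `get_tfvar_strict` and the optional file argument are hypothetical, and it handles only single-line, double-quoted string values with no trailing comment.

```shell
#!/usr/bin/env bash
# Hypothetical variant of get_tfvar, not the repo's init.sh.
# get_tfvar_strict NAME [FILE]
# Prints the double-quoted string assigned to exactly NAME in FILE.
get_tfvar_strict() {
  local var_name="$1" file="${2:-terraform.tfvars}"
  # Require NAME to be followed by optional whitespace and "=", so a
  # longer variable sharing the prefix cannot match. Only the text
  # between the outer quotes is printed; inner spaces are preserved.
  sed -n "s/^[[:space:]]*${var_name}[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$file" \
    | head -n 1
}

# Usage:
#   access_key="$(get_tfvar_strict spaces_access_key)"
```

For anything beyond flat string assignments, a real HCL parser would be more reliable than line matching.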
Comment on lines +47 to +49
# CIDRs allowed to SSH (port 22). PRODUCTION: restrict to admin IP/32 or VPN range.
ssh_source_cidrs = ["0.0.0.0/0", "::/0"]
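The comment in this example already says production should restrict SSH to an admin IP or VPN range. One way an operator might enforce that before `tofu apply` is a small preflight check like the sketch below. It is not part of the PR: the function name `ssh_open_to_world` is hypothetical, and it assumes `ssh_source_cidrs` is written as a single-line list.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check, not part of the repo.
# ssh_open_to_world [FILE]
# Returns 0 if the ssh_source_cidrs line contains a catch-all CIDR
# (0.0.0.0/0 or ::/0), and non-zero otherwise.
ssh_open_to_world() {
  local file="${1:-terraform.tfvars}"
  grep -E '^[[:space:]]*ssh_source_cidrs[[:space:]]*=' "$file" \
    | grep -Eq '"(0\.0\.0\.0/0|::/0)"'
}

# Usage:
#   if ssh_open_to_world terraform.tfvars; then
#     echo "ssh_source_cidrs allows SSH from anywhere; restrict it first" >&2
#     exit 1
#   fi
```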

| Framework | Control | How this ADR satisfies it |
|---|---|---|
| **NIST CSF 2.0** | PR.IP-3 (config-change procedures) | Migration is fully scripted + documented in runbook, with hard human gates before each irreversible step |
| | PR.DS-1 (data-at-rest) | Backup encrypted at rest via DO Spaces SSE; restore tarball never written to a host outside `/opt/intp01/backups/` |

1. **Functional defect on DOKS.** Jason verified (2026-05-21) that **SearXNG web search does not work on INT-P01 for the Calhoun MetaAgent** under Claude Opus 4.7 — only Tavily works (Discovery #598). The same SearXNG configuration succeeds on every other droplet-based instance (#468) and on the dedicated SearXNG instance at `searxng.weown.app`. The fault is the DOKS network/ingress posture, not the application or the SearXNG droplet. Spending dev time isolating a DOKS-specific failure for a deployment pattern we have already chosen to retire is poor allocation.
2. **Cost.** DOKS for this workload runs ≈ **$97/mo** vs ≈ **$48/mo** for a single droplet (#514) — **~$49/mo direct savings**, before any consolidation. Recurring, with no offsetting capability.
3. **Image supply chain.** D381 made the **WeOwnLLM hardened AnythingLLM image** (`reg.mini.dev/anythingllm:latest`) the deployment standard, and it is validated working on `s004.ccc.bot` as of 2026-05-21 (A132 / `#1165`). The DOKS instance still runs upstream `mintplexlabs/anythingllm`, which has ~10× the CVE surface (D289). Retirement aligns INT-P01 with the supply-chain standard.
### 2. Provision infrastructure (terraform — first-boot bootstrap)

```bash
cd ../int-p01-anythingllm/terraform
- `infisical_client_secret` — the one-time-shown client secret
- `infisical_project_id` — `weown-anythingllm` project ID
- `domain` — leave as `ai.weown.agency` (the production URL). The Caddyfile is dual-hostname (`ai-stage.weown.agency, ai.weown.agency`) regardless of this value; `domain` only affects the `tofu output` URL and monitoring alert text. Caddy obtains certs for both names at first request after DNS resolves.
- Optionally adjust `do_region` if you want the new droplet closer to users / closer to the DOKS source for migration bandwidth
Comment on lines +55 to +58
committed. Once the template carries Path C + Layer 2 by default
(currently being upstreamed from `s004/` and `ai.weown.agency/`), the
generated site will already include `ansible/`, `backend.tf`, and
`init.sh`. Until then, copy those from `s004/` and adjust paths.
Comment on lines +59 to +61
# `tofu plan` fails with "Invalid character" on the single quotes.
default = ["0.0.0.0/0", "::/0"]
}
Comment on lines +57 to +59
# `tojson` emits a valid JSON array (double-quoted strings) which HCL parses
# as a list. Without it, Copier renders Python's list-repr ('a', 'b') and
# `tofu plan` fails with "Invalid character" on the single quotes.
@ncimino ncimino merged commit ff11a63 into main May 27, 2026
17 checks passed
@ncimino ncimino deleted the feature/nik-int-p01-doks-to-docker-migration branch May 27, 2026 19:10
@ncimino ncimino restored the feature/nik-int-p01-doks-to-docker-migration branch May 27, 2026 19:15
ncimino added a commit that referenced this pull request May 27, 2026
…agent

Self-contained brief for the IDE agent (Cursor / Claude Code / etc.) that
will execute the actual migration after PR #36 merges. Companion to
MIGRATION_RUNBOOK.md — runbook is the procedure; this prompt is the
agent's session-start brief + extended phase plan.

Key addition vs. the runbook itself: a new **Phase 1.5 — LOCAL-LAPTOP
validation** between source-backup extract (Phase 1) and staging droplet
provisioning (Phase 2). Goal: confirm the restored backup boots
AnythingLLM cleanly and SearXNG works on the operator's laptop BEFORE
paying for a droplet. Catches data-format mismatches and the SearXNG
configuration bug locally instead of on a $48/mo staging droplet.

Phases 0-6 covered with explicit commands and gate criteria:
- Phase 0: pre-flight (branch, secrets, DNS, DOKS health)
- Phase 1: extract source backup via migrate-from-doks.sh
- Phase 1.5: LOCAL validation (NEW — laptop docker-compose + restore)
- Phase 2: staging droplet provision + Layer 2 rotation verify
- Phase 3: deploy app layer + restore on staging
- Phase 4: Jason+Yonks soak (gate G1)
- Phase 5: DNS cutover (gate G2 — CTO approval)
- Phase 6: 7-day soak + DOKS decommission (gate G3)

Plus: validation-gates table, hard "what NOT to do" list (no DOKS mods
pre-cutover, no committed secrets/IPs, no :latest, no skipping gates),
output-expected description, and a confirmation prompt the agent pastes
back to operator before starting.

References:
- D383 (source plan)
- A174 / Tuleap #1238 (cutover)
- ADR-005 (in-repo decision of record)
- MIGRATION_RUNBOOK.md (authoritative phase-by-phase)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>