docs: finalize AnythingLLM docker-compose and deployment guide#11

Closed
mshahid538 wants to merge 1 commit into main from droplet-deployment-guide

@mshahid538
Contributor

No description provided.

@mshahid538 mshahid538 self-assigned this Feb 20, 2026
Member

@romandidomizio romandidomizio left a comment

═══════════════════════════════════════════════════════════════════════════════
🏐 #ContextVolley | @rmn@shd | 2026-03-16 | W12
═══════════════════════════════════════════════════════════════════════════════

📐 CCC Metadata

| Field | Value |
| --- | --- |
| CCC-ID | RMN_2026-W12_035 |
| CCC Version | 3.1.3.1 |
| Context Type | 📋 TASK-BLOCKER |
| Target | @shd (DevOps) |
| Priority | 🔴 P0 |
| Handoff Protocol | SEEK:CONFIRM → BUILD → PR-UPDATE |

🎯 ISSUE SUMMARY

Your PR in weownnetwork/ai/anythingllm/docker contains ethdenver-specific config with only basic documentation, an .env.example that doesn't match our target variables, and no deployment script. This blocks team replication: we need a repeatable template with generic, reusable deployment tooling for all future instances.


📦 REQUIRED FIXES

1. Remove Instance-Specific Naming

| Current | Required |
| --- | --- |
| anythingllm_ethdenver | <CONTAINER_NAME> (user input) |
| /root/ethdenver_storage | <HOST_PATH> (user input) |
| ETHDenver.CCC.bot | <DOMAIN> (user input) |
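
Because all three values now come from user input, the deploy script will need to reject malformed entries before they reach docker-compose.yml. A minimal sketch, assuming POSIX shell with `grep -E`; the function names are hypothetical, and the domain check is deliberately loose (it does not verify that DNS resolves):

```shell
# Hypothetical validators for the three user-supplied values in the table above.

# Docker accepts container names matching [a-zA-Z0-9][a-zA-Z0-9_.-]+
valid_container_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9][a-zA-Z0-9_.-]+$'
}

# Require an absolute host path so the bind mount is unambiguous.
valid_host_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Loose hostname check: dot-separated labels of letters, digits, and hyphens,
# ending in an alphabetic top-level label.
valid_domain() {
  printf '%s\n' "$1" | grep -Eq '^([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
}
```

Each function returns 0 for an acceptable value, so a prompt loop could re-ask until, for example, `valid_container_name "$name"` succeeds.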

2. Create Unified Deploy Script (deploy.sh)

Script must handle:

# Pre-flight Checks
✓ Docker Engine installed on target droplet
✓ Docker Compose available
✓ User authenticated to correct droplet (SSH/DO API)

# Interactive Prompts
✓ Container name (no defaults)
✓ Host storage path
✓ Domain for exposure (Caddy/Traefik)
✓ LLM Provider (default: openrouter, NOT openai)
✓ LLM Model (user selection)
✓ Embedding Model (user selection)
✓ API Key (secure injection via Infisical or prompt)
✓ Port mapping

# Secure Env Handling
✓ Generate .env from prompts (no hardcoded values)
✓ Sanitized .env.example for GitHub
✓ Inject secrets at runtime (not baked into compose)

# Deployment Execution
✓ docker compose up -d
✓ SSL verification (Caddy auto-HTTPS)
✓ Health check confirmation
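
The pre-flight and secure-env items above could be sketched roughly as follows. This is an illustration under stated assumptions, not the Helm chart script's logic: the function names are hypothetical, the variable names passed to `write_env_file` are placeholders, and Infisical injection is not shown.

```shell
# Pre-flight: fail early if Docker Engine or the Compose plugin is missing.
preflight() {
  command -v docker >/dev/null 2>&1 || { echo "Docker Engine not found" >&2; return 1; }
  docker compose version >/dev/null 2>&1 || { echo "Docker Compose not available" >&2; return 1; }
}

# Prompt for a secret without echoing it on a terminal.
# When stdin is not a terminal (e.g. piped from a secrets tool), read it as-is.
# $1 = prompt text; the secret is printed on stdout.
prompt_secret() {
  printf '%s: ' "$1" >&2
  if [ -t 0 ]; then
    stty -echo
    IFS= read -r secret
    stty echo
    printf '\n' >&2
  else
    IFS= read -r secret
  fi
  printf '%s' "$secret"
}

# Write KEY=VALUE pairs to an env file readable only by its owner.
# $1 = target path, remaining arguments = KEY=VALUE pairs.
write_env_file() {
  env_path=$1
  shift
  ( umask 077 && : > "$env_path" ) && chmod 600 "$env_path" || return 1
  for pair in "$@"; do
    printf '%s\n' "$pair" >> "$env_path"
  done
}
```

A deploy script might call `preflight`, collect answers, then run something like `write_env_file .env "LLM_PROVIDER=openrouter" "API_KEY=$(prompt_secret 'API key')"` before `docker compose up -d`, so that no secret is ever written into docker-compose.yml.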

3. Reference Pattern

Use our AnythingLLM Helm Chart deploy script as template logic. Adapt it for the Docker droplet context (no K8s; single-node Docker).

4. File Structure

anythingllm/docker/
├── docker-compose.yml      # Generic, no hardcoded names
├── .env.example            # Sanitized template
├── deploy.sh               # Unified deployment script
├── README.md               # Usage instructions + pre-reqs
└── .env                    # Runtime only (NOT committed)

🚫 WHAT TO REMOVE

| File | Issue | Action |
| --- | --- | --- |
| docker-compose.yml | Hardcoded anythingllm_ethdenver | Replace with variable injection |
| .env.example | LLM_PROVIDER=openai | Change default to openrouter |
| README.md | ethdenver-specific docs | Generalize for any instance |

✅ ACCEPTANCE CRITERIA

| # | Criteria | Status |
| --- | --- | --- |
| 1 | Script runs on any DO Docker droplet | |
| 2 | No hardcoded instance names in config | |
| 3 | Prompts for all required variables | |
| 4 | Secure API key handling (Infisical-ready) | |
| 5 | OpenRouter as default provider | |
| 6 | README covers pre-reqs + usage | |
| 7 | Single script (no fragmented commands) | |
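
Criterion 2 can be checked mechanically before the PR is updated. A minimal sketch, assuming a case-insensitive text search is enough to catch leftovers; the function name is hypothetical, and a file that does not exist is not reported as a failure:

```shell
# Print any lines that still mention an instance-specific name and return
# non-zero so the check can gate a commit or CI job.
# $1 = pattern (e.g. "ethdenver"), remaining arguments = files to scan.
check_no_instance_names() {
  pattern=$1
  shift
  if grep -inH -- "$pattern" "$@"; then
    echo "Instance-specific references found" >&2
    return 1
  fi
  return 0
}
```

For example, `check_no_instance_names ethdenver docker-compose.yml .env.example README.md` prints each offending line with its file name and line number, and exits cleanly once the config is generic.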

📝 README.md ADDENDUM — SYSTEM UPDATES & CONFIG

Purpose: Standardize update procedures, resource requirements, and extension configs (MCP/Env).
Location: Append to README.md or create docs/DEPLOYMENT.md.


🔄 System Updates & Configuration

1. Automated Update Script

To ensure all services are pulled and restarted cleanly, use the provided update script.

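#!/bin/bash
# scripts/update.sh: pull the latest repo changes and restart the stack.
set -euo pipefail  # stop at the first failed command
echo "🔄 Pulling latest changes..."
git pull origin main
echo "🐳 Restarting containers..."
docker compose down
docker compose up -d --pull always
echo "✅ Update complete."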

Usage:

chmod +x scripts/update.sh
./scripts/update.sh

2. AnythingLLM (Docker Self-Hosted)

We use AnythingLLM for document retrieval and agent context, with inference and embedding offloaded. Ensure your host meets the minimum requirements before deployment.

| Resource | Minimum | Recommended |
| --- | --- | --- |
| RAM | 2 GB | 4 GB+ |
| CPU | 1 core | 2+ cores |
| Storage | 10 GB | 50 GB+ (SSD) |

📖 Official Docs: AnythingLLM Docker Requirements
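
The minimums in the table above could be enforced as a pre-flight step. A rough sketch for a Linux host, with hypothetical function names and two assumptions worth flagging: the RAM threshold is set slightly below 2048 MB because a nominal 2 GB host reports less than that as total memory, and the storage figure is treated here as free space on the root filesystem.

```shell
# Thresholds derived from the "Minimum" column (see assumptions above).
MIN_RAM_MB=1800
MIN_CPUS=1
MIN_DISK_GB=10

# Pure comparison: $1 = RAM in MB, $2 = CPU cores, $3 = free disk in GB.
meets_minimum() {
  [ "$1" -ge "$MIN_RAM_MB" ] && [ "$2" -ge "$MIN_CPUS" ] && [ "$3" -ge "$MIN_DISK_GB" ]
}

# Detect the host's values (assumes /proc/meminfo, nproc, and df are available).
detect_and_check() {
  ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  cpus=$(nproc)
  disk_gb=$(df -Pk / | awk 'NR == 2 {print int($4 / 1024 / 1024)}')
  meets_minimum "$ram_mb" "$cpus" "$disk_gb"
}
```

Keeping `meets_minimum` separate from detection makes the thresholds easy to adjust and lets the comparison be exercised without a real droplet.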

3. Environment Variables

Configure core system behavior via .env. Key variables to start with include:

# Community Hub Configuration
# Enable agent skill imports from AnythingLLM Hub
# "1" = Allow verified/private items only (recommended for enterprise)
# "allow_all" = Allow all items including unverified (not recommended)
# Enterprise security: verified items only
COMMUNITY_HUB_BUNDLE_DOWNLOADS_ENABLED="1"

# MCP Configuration
MCP_SERVER_ENABLED=true
MCP_CONFIG_PATH=/etc/mcp/servers.json

4. Custom MCP Servers

To add custom Model Context Protocol (MCP) servers, edit the MCP configuration file, for example:

File: /etc/mcp/servers.json

{
  "mcpServers": {
    "custom-tool": {
      "command": "node",
      "args": ["/app/tools/custom-server.js"],
      "env": {
        "API_KEY": "${YOUR_API_KEY}"
      }
    }
  }
}

Restart Required: After modifying MCP configs, restart the agent service:

docker compose restart agent

📬 HANDOFF

Action Required:

  1. Review this volley
  2. Update PR with generic config + deploy.sh
  3. Remove ethdenver-specific references
  4. Test against fresh droplet (not ethdenver instance)
  5. Tag @rmn for review before merge

Blocker Status: 🔴 PR cannot merge until corrected

═══════════════════════════════════════════════════════════════════════════════

ncimino pushed a commit that referenced this pull request Apr 20, 2026
Resolved all remaining issues from PR #5 Copilot review:

Issue #1 - Workflow branch triggers:
- Added explicit branch patterns: maintenance, feature/*, fix/*, docs/*, hotfix/*
- Excluded experimental/* branches to prevent unintended PRs
- Maintains security while supporting defined branching strategy

Issue #2 - Dynamic repository values:
- Changed hardcoded 'WeOwnNetwork' to ${{ github.repository_owner }}
- Changed hardcoded 'ai' to ${{ github.event.repository.name }}
- Enables workflow portability across forks and repos

Issue #3 - Improved PR title fallback:
- Added commit count when available
- Uses latest commit subject as additional hint
- Provides context: 'Merge branch into main (X commits)'
- Falls back gracefully through multiple options

Issue #4 - Copilot date context:
- Updated to current date: January 26, 2026 (Sunday)
- Clarified Copilot cannot use web search during reviews
- Focus on format validation vs exact date calculation

Issue #5 & #9 - Version format clarity:
- Clarified 3.4.0 as SEASON.WEEK.DAY with DAY=0, VERSION omitted
- Updated special cases table with explicit component breakdowns
- Added note explaining shorthand format vs full 4-part format

Issue #6 - CI/CD dry-run validation:
- Removed '|| true' error suppression
- Allows failures to propagate and fail pipeline
- Aligns with quality gates (blocking on K8s failures)

Issue #7 - README absolute paths:
- Changed ../docs/ to /docs/ for HELM_VALUE_MANAGEMENT.md
- Ensures links work across all documentation contexts

Issue #11 - Example day inconsistency:
- Fixed Jan 25, 2026 from Saturday (6) to Sunday (7)
- Provided complete example version: 2.5.7.1

Issue #12 - CHANGELOG date:
- Updated from 2026-01-25 to 2026-01-26 (current date)

Issue #14 - WordPress version clarity:
- Clarified as 'WordPress application version 3.2.5'
- Distinguishes from WeOwnVer chart versioning

Issue #15 - Security consistency:
- Pinned all actions/checkout@v4 to specific SHA
- Added comment: # v4.1.5 for version tracking
- Consistent with other pinned actions in workflow

All paths now use absolute /docs/ references, all version format
ambiguities resolved, security controls enforced consistently.
@mshahid538 mshahid538 closed this Apr 29, 2026
ncimino added a commit that referenced this pull request May 26, 2026
…ot PR #36 findings

Re-bases the INT-P01 site on the s004 reference pattern (Path C slim
cloud-init + Layer 2 bootstrap-secret rotation; see
docs/INFRA_BOOTSTRAP_PATTERN.md), and closes every inline review comment
left by copilot-pull-request-reviewer on PR #36 (15 comments, all fixed).

Path C + Layer 2 adoption (single biggest change):
- terraform/templates/cloud-init.yaml: now ONLY handles first-boot
  bootstrap — Docker, Infisical CLI (artifacts-cli.infisical.com apt
  repo, not the legacy install-cli.sh capped at v0.38), the v1 → v2
  Machine Identity rotation via Infisical Universal Auth API (revokes
  the v1 secret embedded in terraform state + DO droplet metadata
  within minutes of provisioning), and a .bootstrap-complete marker.
  Compose.yaml + Caddyfile + backup.sh + cron NO LONGER ship with
  cloud-init — they live in ansible/deploy.yml.
- ansible/deploy.yml (new): owns all post-bootstrap state on the
  droplet. Asserts .bootstrap-complete + .infisical-auth.env exist,
  uploads compose+Caddyfile+backup.sh, installs daily backup cron
  with logrotate, runs docker compose up under infisical run,
  pulls images via community.docker.docker_image_pull (no SDK
  required on the droplet), updates DO tags (commit-<sha> +
  skinny-backup) via scripts/tag-droplet.sh, waits for /api/ping
  health. Idempotent — re-runnable any time without tofu taint.
- scripts/deploy.sh: now a thin `ansible-playbook` wrapper requiring
  INFISICAL_PROJECT_ID env var; installs community.docker:==3.13.0
  collection if missing.
- terraform/backend.tf + init.sh (new): DO Spaces remote tofu state
  backend (SSE-C encrypted, S3-compatible). init.sh reads spaces_*
  credentials from terraform.tfvars and forwards them to
  `tofu init -backend-config=`.
- terraform/main.tf: adds lifecycle ignore_changes = [user_data, tags]
  so the runtime tag mutations from ansible + bootstrap scripts stick.
- terraform/variables.tf: adds spaces_access_key, spaces_secret_key,
  spaces_encryption_key, ssh_source_cidrs.
- docker/compose.prod.yaml: bind-mounts /var/log/caddy into Caddy
  so the otel-agent filelog receiver can ship logs and they survive
  container recreation.

Caddyfile dual-hostname preserved across the refactor:
  ai-stage.weown.agency, ai.weown.agency { … }
Production cutover (Phase 6) is a pure DNS A-record swap on the same
droplet — Caddy already has the cert for both names in one site block.

Copilot review findings (PR #36) all addressed:
  #1, #2, #4, #5  Empty Infisical values in cloud-init/deploy/restore
                  → fixed by Path C; cloud-init now uses HCL templatefile
                  substitution (${infisical_*}) that resolves at tofu
                  apply time from var.* — no more pre-baked empty strings.
  #3, #6, #7      Floating image tags (reg.mini.dev/anythingllm:latest)
                  → pinned to :1.7.2 (same as s004; documented as the
                  WeOwnLLM hardened-image version Shahid verified on
                  s004.ccc.bot).
  #8              ADR #WeOwnVer mis-computed → v3.5.5.1 → v3.4.5.1.
                  May = month 4 of S3, ISO W22 - W18 + 1 = offset 5,
                  iteration 1 → v3.4.5.1. Math shown inline in the
                  Version line per VERSIONING_WEOWNVER.md.
  #9              Broken link to private notes/Perpetuator/... in a
                  public repo → replaced with in-repo references
                  (Tuleap A174 / #1238 + the in-repo runbook).
  #10             Path inconsistency /opt/int-p01/ vs /opt/intp01/
                  → unified on /opt/int_p01_anythingllm/ throughout
                  README, runbook, scripts, cloud-init. project_name
                  (terraform var) is `int-p01-anythingllm` (hyphenated);
                  underscore form for paths/volumes is
                  `int_p01_anythingllm`. Matches s004's convention.
  #11             Bash-incompatible `read -rs "VAR?prompt"` (zsh-only)
                  → switched to the canonical zsh-first + bash-fallback
                  pattern: `read -rs "VAR?prompt" 2>/dev/null || read -rsp …`
                  (matches global CLAUDE.md secrets pattern).
  #12, #13        .terraform.lock.hcl gitignored → unignored at both
                  the sites/.gitignore root and the per-site .gitignore.
                  Lock file is now tracked for reproducible provider
                  versions across machines + CI runs.
  #14             backup.sh remote mode not wrapped in infisical run
                  → adopted s004's backup.sh which sources the droplet's
                  .infisical-auth.env over SSH and re-execs itself under
                  `infisical run` so SPACES_* are injected for the S3
                  upload step. Requires INFISICAL_PROJECT_ID env var in
                  remote mode.
  #15             README restore example used `anythingllm-ai_backup_…`
                  template placeholder → replaced the entire
                  "Migration from Helm/Kubernetes" section with a
                  pointer to MIGRATION_RUNBOOK.md (which uses the real
                  `int-p01-anythingllm_backup_<TS>` naming).

Migration artifacts preserved + updated:
- MIGRATION_RUNBOOK.md: replaced ssh + manual infisical-run restore
  invocations with the new Path C flow (INFISICAL_PROJECT_ID=<id>
  ./scripts/deploy.sh for app layer, then ./scripts/restore.sh for
  the DOKS data swap). Added explicit Layer 2 rotation verification
  step. Phase 1.5 local-laptop dry-run pinned to :1.7.2 and renamed
  volumes/networks to match production (int_p01_anythingllm_*).
- scripts/migrate-from-doks.sh: PROJECT_NAME → int-p01-anythingllm
  so the produced tarball matches what restore.sh on the droplet
  expects. End-of-script "next step" instructions updated to call
  the new restore.sh wrapper instead of raw ssh + infisical run.

CHANGELOG resolution from rebase: merged the otel-agent additions
from main with the INT-P01 + ADR-005 entries; ordered newest-first.

Rebased onto origin/main (commit 455be2a). The branch is now a
linear 2-commit history: feat (site) + docs (ADR), with this third
commit on top covering the full refactor + review-feedback round.
User said "we will squash later" so commits are kept granular.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ncimino added a commit that referenced this pull request May 27, 2026
#36)

* feat(anythingllm-docker): add INT-P01 (ai.weown.agency) DOKS->Docker migration plan

Generates anythingllm-docker/sites/ai.weown.agency/ from the existing copier
template (project_name=int-p01, WeOwnLLM hardened image) and adds the tooling
to migrate INT-P01 off DOKS via a parallel-build + DNS-cutover pattern:

- scripts/migrate-from-doks.sh - one-shot bridge that kubectl-execs into the
  live DOKS pod, streams /app/server/storage out as a tarball, and wraps it
  in the same skinny-backup layout the template's restore.sh already
  understands. Optional --upload-to-spaces stages the artifact at
  s3://weown-backups/int-p01/ for redundancy.
- MIGRATION_RUNBOOK.md - phased runbook: inventory/freeze, staging droplet
  provision (temporary hostname), DOKS extraction, restore, Jason/Yonks
  staging validation, production cutover, 7-day soak, rollback path.
- anythingllm-docker/sites/README.md - directory-level explainer matching
  the existing keycloak-docker/sites/ convention.
- anythingllm-docker/sites/.gitignore - blocks terraform state, real
  tfvars, backup tarballs, and stray .env files from being committed.

Source plan: D383 / Tuleap A174 (#1238). Trigger: Signal #WeOwn.Dev ask
from Jason 2026-05-21 (SearXNG broken on DOKS for the Calhoun MetaAgent).
DOKS instance is never modified during the migration - rollback is a DNS
flip until decommission (T+7 days post-cutover).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(adr): add ADR-005 for INT-P01 DOKS retirement; runbook + Caddyfile updates

Folds in feedback on the migration plan:

- Caddyfile + cloud-init: dual-hostname from first boot
  ('ai-stage.weown.agency, ai.weown.agency' in one site block) so the
  production cutover is a DNS A-record swap on the same droplet -
  no re-deploy of compose or Caddyfile required at cutover.
- Runbook: replaces the previous int-p01-new.ccc.bot staging hostname
  with ai-stage.weown.agency (same parent zone), simplifies Phase 6
  accordingly, and adds Phase 1.5 - an optional local-laptop dry-run
  that round-trips the DOKS backup through restore.sh against a
  throwaway docker container before any droplet exists.
- Image-path open question removed: always reg.mini.dev/anythingllm:latest,
  with a note that 'mini_key' is an API key fragment that must come from
  Infisical (A126) or DOCR (D341), never embedded in the URL.
- ADR-005 (Proposed): decision record for the retirement, the
  parallel-build + DNS-cutover pattern, two human validation gates
  (Phase 4 Jason/Yonks soak, Phase 6 CTO cutover approval), and
  compliance mappings across NIST CSF 2.0, SOC 2, ISO/IEC 27001:2022,
  ISO/IEC 42001:2023, CIS Controls v8. Status flips to Accepted at the
  close of the 7-day post-cutover soak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(int-p01): adopt Path C + Layer 2 standard; address all Copilot PR #36 findings

Re-bases the INT-P01 site on the s004 reference pattern (Path C slim
cloud-init + Layer 2 bootstrap-secret rotation; see
docs/INFRA_BOOTSTRAP_PATTERN.md), and closes every inline review comment
left by copilot-pull-request-reviewer on PR #36 (15 comments, all fixed).

Path C + Layer 2 adoption (single biggest change):
- terraform/templates/cloud-init.yaml: now ONLY handles first-boot
  bootstrap — Docker, Infisical CLI (artifacts-cli.infisical.com apt
  repo, not the legacy install-cli.sh capped at v0.38), the v1 → v2
  Machine Identity rotation via Infisical Universal Auth API (revokes
  the v1 secret embedded in terraform state + DO droplet metadata
  within minutes of provisioning), and a .bootstrap-complete marker.
  Compose.yaml + Caddyfile + backup.sh + cron NO LONGER ship with
  cloud-init — they live in ansible/deploy.yml.
- ansible/deploy.yml (new): owns all post-bootstrap state on the
  droplet. Asserts .bootstrap-complete + .infisical-auth.env exist,
  uploads compose+Caddyfile+backup.sh, installs daily backup cron
  with logrotate, runs docker compose up under infisical run,
  pulls images via community.docker.docker_image_pull (no SDK
  required on the droplet), updates DO tags (commit-<sha> +
  skinny-backup) via scripts/tag-droplet.sh, waits for /api/ping
  health. Idempotent — re-runnable any time without tofu taint.
- scripts/deploy.sh: now a thin `ansible-playbook` wrapper requiring
  INFISICAL_PROJECT_ID env var; installs community.docker:==3.13.0
  collection if missing.
- terraform/backend.tf + init.sh (new): DO Spaces remote tofu state
  backend (SSE-C encrypted, S3-compatible). init.sh reads spaces_*
  credentials from terraform.tfvars and forwards them to
  `tofu init -backend-config=`.
- terraform/main.tf: adds lifecycle ignore_changes = [user_data, tags]
  so the runtime tag mutations from ansible + bootstrap scripts stick.
- terraform/variables.tf: adds spaces_access_key, spaces_secret_key,
  spaces_encryption_key, ssh_source_cidrs.
- docker/compose.prod.yaml: bind-mounts /var/log/caddy into Caddy
  so the otel-agent filelog receiver can ship logs and they survive
  container recreation.

Caddyfile dual-hostname preserved across the refactor:
  ai-stage.weown.agency, ai.weown.agency { … }
Production cutover (Phase 6) is a pure DNS A-record swap on the same
droplet — Caddy already has the cert for both names in one site block.

Copilot review findings (PR #36) all addressed:
  #1, #2, #4, #5  Empty Infisical values in cloud-init/deploy/restore
                  → fixed by Path C; cloud-init now uses HCL templatefile
                  substitution (${infisical_*}) that resolves at tofu
                  apply time from var.* — no more pre-baked empty strings.
  #3, #6, #7      Floating image tags (reg.mini.dev/anythingllm:latest)
                  → pinned to :1.7.2 (same as s004; documented as the
                  WeOwnLLM hardened-image version Shahid verified on
                  s004.ccc.bot).
  #8              ADR #WeOwnVer mis-computed → v3.5.5.1 → v3.4.5.1.
                  May = month 4 of S3, ISO W22 - W18 + 1 = offset 5,
                  iteration 1 → v3.4.5.1. Math shown inline in the
                  Version line per VERSIONING_WEOWNVER.md.
  #9              Broken link to private notes/Perpetuator/... in a
                  public repo → replaced with in-repo references
                  (Tuleap A174 / #1238 + the in-repo runbook).
  #10             Path inconsistency /opt/int-p01/ vs /opt/intp01/
                  → unified on /opt/int_p01_anythingllm/ throughout
                  README, runbook, scripts, cloud-init. project_name
                  (terraform var) is `int-p01-anythingllm` (hyphenated);
                  underscore form for paths/volumes is
                  `int_p01_anythingllm`. Matches s004's convention.
  #11             Bash-incompatible `read -rs "VAR?prompt"` (zsh-only)
                  → switched to the canonical zsh-first + bash-fallback
                  pattern: `read -rs "VAR?prompt" 2>/dev/null || read -rsp …`
                  (matches global CLAUDE.md secrets pattern).
  #12, #13        .terraform.lock.hcl gitignored → unignored at both
                  the sites/.gitignore root and the per-site .gitignore.
                  Lock file is now tracked for reproducible provider
                  versions across machines + CI runs.
  #14             backup.sh remote mode not wrapped in infisical run
                  → adopted s004's backup.sh which sources the droplet's
                  .infisical-auth.env over SSH and re-execs itself under
                  `infisical run` so SPACES_* are injected for the S3
                  upload step. Requires INFISICAL_PROJECT_ID env var in
                  remote mode.
  #15             README restore example used `anythingllm-ai_backup_…`
                  template placeholder → replaced the entire
                  "Migration from Helm/Kubernetes" section with a
                  pointer to MIGRATION_RUNBOOK.md (which uses the real
                  `int-p01-anythingllm_backup_<TS>` naming).

Migration artifacts preserved + updated:
- MIGRATION_RUNBOOK.md: replaced ssh + manual infisical-run restore
  invocations with the new Path C flow (INFISICAL_PROJECT_ID=<id>
  ./scripts/deploy.sh for app layer, then ./scripts/restore.sh for
  the DOKS data swap). Added explicit Layer 2 rotation verification
  step. Phase 1.5 local-laptop dry-run pinned to :1.7.2 and renamed
  volumes/networks to match production (int_p01_anythingllm_*).
- scripts/migrate-from-doks.sh: PROJECT_NAME → int-p01-anythingllm
  so the produced tarball matches what restore.sh on the droplet
  expects. End-of-script "next step" instructions updated to call
  the new restore.sh wrapper instead of raw ssh + infisical run.

CHANGELOG resolution from rebase: merged the otel-agent additions
from main with the INT-P01 + ADR-005 entries; ordered newest-first.

Rebased onto origin/main (commit 455be2a). The branch is now a
linear 2-commit history: feat (site) + docs (ADR), with this third
commit on top covering the full refactor + review-feedback round.
User said "we will squash later" so commits are kept granular.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Nik <nik.cimino@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>