Auto-PR: Merge branch 'feature/wordpress-docker-copier-template' into feature/wp-hardening-shd #19
@@ -1,3 +1,7 @@
# trivy:ignore:AVD-KSV-0109 -- PHP source code references $phpmailer->Password as a variable
@@ -1,3 +1,7 @@
# trivy:ignore:AVD-KSV-0109 -- PASSWORD_ITERATIONS is a numeric PBKDF2 iteration count

Closing governance spam PR. Bug fix in progress.
ncimino left a comment
WordPress Helm chart review
Overall: the intent is right (codified PHP limits + edge blocks for sensitive files + multi-site values strategy), but a few items are silently broken at render-time and should be addressed before merge. I verified everything below with helm template and helm lint against the actual chart.
Block on:
- CHANGELOG version regression: 3.2.7 (2026-05-06) is inserted ABOVE 3.3.8 (2025-11-07), which is older. Either the chart version should be bumped past 3.3.8 or the entry is in the wrong place.
- Per-site ingress.tls block is a no-op. helm lint -f values-burnedout.yaml warns "cannot overwrite table with non table for wordpress.ingress.tls". The chart defines ingress.tls as a map (protocols/ciphers); the per-site files override it with a K8s-style list. Helm rejects the merge, so the rendered secretName stays hardcoded to wordpress-tls. Both per-site files are affected.
- server-snippet is disabled by default in ingress-nginx ≥1.9, so the primary hardening claim (Task #264 edge blocks) may be silently ignored on any modern cluster. It needs controller.allowSnippetAnnotations: true AND controller.config.annotations-risk-level: Critical, or a different mechanism.
Should address:
- auto_prepend_file = '/var/www/html/wordfence-waf.php' on fresh deploys: the file doesn't exist until Wordfence is installed, and PHP loads it like require, so requests fail until it does.
- APACHE_HTTP_PORT changed from 8080 to 80: running as root to bind a privileged port goes the opposite direction from container hardening best practice.
- The # {{ ... }} yamllint workaround pattern: it works, but produces noisy rendered YAML (verified via helm template).
- The edge block regex is missing common sensitive paths (xmlrpc, wp-config backups, readme/license, debug.log, .svn).
Pre-existing (worth flagging): policy/v1beta1 PodDisruptionBudget is unavailable in K8s ≥1.25 (helm lint warning).
Inline comments below have specifics.
Verified rendering with helm template / helm lint (helm v4.1.4) on the head commit.
@@ -5,37 +5,51 @@ All notable changes to this WordPress deployment will be documented in this file
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.2.7] - 2026-05-06
Retracted (2026-05-13). My claim that this is a version regression was wrong. I assumed main's Chart.yaml was at 3.3.8 based on the latest CHANGELOG header, but main's actual Chart.yaml is version: 3.2.6 (last bump: 4294d03 fix(wordpress): CRITICAL - prevent password regeneration on upgrades (v3.2.6)). The PR's 3.2.6 → 3.2.7 is a correct forward bump.
What threw me off: this CHANGELOG already has [3.3.8], [3.3.7], … entries dating to Oct/Nov 2025, even though Chart.yaml was never bumped past 3.2.6. That's a pre-existing repo drift, not this PR's responsibility. CI's version-consistency check passes anyway because it only verifies the Chart version string appears somewhere in CHANGELOG.md, not that it matches the latest entry. I'll open a separate PR to tighten that check.
Apologies for the noise on this thread.
@@ -2,23 +2,23 @@ apiVersion: v2
name: wordpress
description: Enterprise-grade WordPress with enhanced security, zero-trust networking, and automated TLS
type: application
version: 3.2.6
version: 3.2.7
Retracted (2026-05-13). Same root mistake as my CHANGELOG comment — main's Chart.yaml is at 3.2.6, so the 3.2.6 → 3.2.7 bump here is correct. There's pre-existing drift between Chart.yaml and the CHANGELOG on main (top entry 3.3.8 vs. chart 3.2.6), but that's outside this PR's scope. A follow-up PR will tighten the CI check that should have caught the drift.
ingress:
  enabled: true
  className: "nginx"
  tls:
Critical: this tls: block is silently rejected by Helm.
helm lint -f wordpress/helm/values-burnedout.yaml emits:
warning: cannot overwrite table with non table for wordpress.ingress.tls
The chart's values.yaml defines ingress.tls as a map (with protocols and ciphers keys, used for nginx annotations). This file overrides it with a list (K8s-style secretName/hosts). Helm refuses to merge a list onto a map and keeps the original map — so secretName: burnedout-tls and the hosts: entry below it have no effect on the rendered output.
I confirmed by rendering: the resulting Ingress still uses secretName: wordpress-tls (hardcoded in templates/ingress.yaml:55), and the hosts come from .Values.wordpress.domain + includeWWW.
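This kind of type mismatch can be caught before deploy with a small pre-flight check. The Python sketch below is hypothetical (find_type_conflicts is not part of Helm or any existing tool), assumes both values files have already been parsed into dicts, and only reproduces the map-vs-non-map condition behind the warning, not Helm's full coalescing logic. Helm's own message also prefixes the chart name, which is why it reads wordpress.ingress.tls.

```python
def find_type_conflicts(defaults: dict, overrides: dict, prefix: str = "") -> list:
    """Return dotted keys where a chart default is a map but the override is not.

    Simplified illustration of the condition behind Helm's
    "cannot overwrite table with non table" warning.
    """
    conflicts = []
    for key, override in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in defaults:
            continue  # brand-new keys cannot clash with a default
        default = defaults[key]
        if isinstance(default, dict) and isinstance(override, dict):
            # Both sides are maps: recurse to compare nested keys.
            conflicts.extend(find_type_conflicts(default, override, path))
        elif isinstance(default, dict) and override is not None:
            # A list or scalar is trying to replace a map.
            conflicts.append(path)
    return conflicts


chart_defaults = {
    "ingress": {
        "enabled": True,
        "tls": {"protocols": "TLSv1.2 TLSv1.3", "ciphers": "ECDHE-ECDSA-AES128-GCM-SHA256"},
    }
}
site_overrides = {
    "ingress": {
        "enabled": True,
        "tls": [{"secretName": "burnedout-tls", "hosts": ["burnedout.xyz"]}],
    }
}

print(find_type_conflicts(chart_defaults, site_overrides))  # ['ingress.tls']
```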
Fix options:
- Easiest: delete this entire tls: block from both per-site files. It's dead code that just produces a helm warning. The cert secret will be named wordpress-tls in the site's namespace, which is fine for per-namespace installs.
- Cleaner: add a wordpress.tlsSecretName value (default wordpress-tls) and read it in templates/ingress.yaml:55, then set it per site.
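The "cleaner" option amounts to a lookup with a fallback. A Python sketch of that resolution logic (in the chart it would be a default/dig expression in the template); the function name is hypothetical, and the precedence order, including honouring a legacy K8s-style ingress.tls list as a second source, is an assumption for illustration rather than the chart's behaviour.

```python
DEFAULT_TLS_SECRET = "wordpress-tls"


def resolve_tls_secret(values: dict) -> str:
    """Return the secret name the Ingress spec.tls entry should use.

    Illustrative precedence (an assumption, not the chart's actual logic):
      1. wordpress.tlsSecretName, the proposed per-site value
      2. ingress.tls[0].secretName, only if ingress.tls is a K8s-style list
      3. the historical default, "wordpress-tls"
    """
    explicit = values.get("wordpress", {}).get("tlsSecretName")
    if explicit:
        return explicit
    tls = values.get("ingress", {}).get("tls")
    # ingress.tls may be the protocols/ciphers map; only a list carries secretName.
    if isinstance(tls, list) and tls and isinstance(tls[0], dict):
        legacy = tls[0].get("secretName")
        if legacy:
            return legacy
    return DEFAULT_TLS_SECRET


print(resolve_tls_secret({"wordpress": {"tlsSecretName": "burnedout-tls"}}))  # burnedout-tls
print(resolve_tls_secret({"ingress": {"tls": {"protocols": "TLSv1.3"}}}))    # wordpress-tls
```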
ingress:
  enabled: true
  className: "nginx"
  tls:
Same issue as values-burnedout.yaml:10 — the tls: list silently fails to override the chart's ingress.tls map. secretName: ptoken-tls is ignored; the rendered Ingress uses the hardcoded wordpress-tls. Delete this block or wire up a wordpress.tlsSecretName value.
{{- toYaml .Values.ingress.annotations | nindent 4 }}
# {{ toYaml .Values.ingress.annotations | nindent 4 }}
# Block sensitive files at the edge (Task #264)
nginx.ingress.kubernetes.io/server-snippet: |
Critical: this annotation is disabled by default in modern ingress-nginx.
After CVE-2022-4886 / CVE-2023-5043, ingress-nginx ≥1.9 ships with --allow-snippet-annotations=false as the default. Without explicit cluster-level opt-in, this entire server-snippet block is silently ignored, and the file-blocking that this PR centers on (Task #264) doesn't actually happen at the edge.
To make this work, the cluster's ingress-nginx installation needs both settings (shown as ingress-nginx Helm chart values; they end up in the controller ConfigMap as allow-snippet-annotations and annotations-risk-level):
    controller:
      allowSnippetAnnotations: true
      config:
        annotations-risk-level: "Critical"  # snippet annotations are classed as Critical risk
Better alternatives that don't require enabling raw nginx snippets:
- Block at the application layer (Apache <Files> directives baked into the image, or a small mu-plugin).
- Use nginx.ingress.kubernetes.io/whitelist-source-range plus a separate locked-down Ingress for management paths.
- If you stick with snippets, document the cluster prerequisite in wordpress/CHANGELOG.md and ideally add a values-driven toggle so charts that hit a strict cluster don't render dead annotations.
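Whether the annotation will take effect can be read off the controller's ConfigMap data. A minimal Python sketch; the function name is hypothetical, it assumes the ConfigMap's data section has already been fetched as a dict of strings, and the fallback defaults it uses (allow-snippet-annotations "false", annotations-risk-level "High") are assumptions matching recent ingress-nginx releases, so check them against the controller version you actually run.

```python
def snippets_honored(controller_config: dict) -> bool:
    """Guess whether snippet annotations will be applied by ingress-nginx.

    controller_config is the controller ConfigMap's data section (all values
    are strings). Snippet annotations are classed as Critical risk, so they
    need both the explicit opt-in and a Critical risk ceiling.
    """
    allowed = controller_config.get("allow-snippet-annotations", "false").strip().lower() == "true"
    risk_level = controller_config.get("annotations-risk-level", "High").strip()
    return allowed and risk_level == "Critical"


print(snippets_honored({}))  # False: default install, the server-snippet is dropped
print(snippets_honored({"allow-snippet-annotations": "true", "annotations-risk-level": "Critical"}))  # True
```

A values-driven toggle in the chart could then default to off and only be enabled for clusters where this check passes.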
# {{ toYaml .Values.ingress.annotations | nindent 4 }}
# Block sensitive files at the edge (Task #264)
nginx.ingress.kubernetes.io/server-snippet: |
  location ~* /\.(git|env|user\.ini|htaccess) {
High: edge block list is sparse for a hardening PR.
Common sensitive paths that are also worth 403'ing:
- xmlrpc.php: defense-in-depth alongside the disableXmlRpc mu-plugin
- wp-config.php~, wp-config.php.bak, wp-config.php.old, wp-config.php.swp: editor backups
- readme.html, license.txt: WordPress version fingerprinting
- \.(svn|hg|bzr): other VCS metadata
- debug.log, error_log: log files; WordPress writes debug.log into wp-content by default if WP_DEBUG_LOG flips on
- composer\.(json|lock), package(-lock)?\.json, \.env\.example: dependency / config leakage
- (phpinfo|info|test)\.php: common scanner targets
Also: the current regex /\.(git|env|user\.ini|htaccess) matches /foo/.git/HEAD etc. ✓ — but for the bare filenames like xmlrpc.php or readme.html you'll need a separate location block (or a combined regex with |^/(xmlrpc\.php|readme\.html|license\.txt)$).
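The combined rule set can be prototyped outside nginx before editing the snippet. A Python sketch using re with IGNORECASE as a stand-in for nginx's case-insensitive ~* matching (PCRE and Python's re agree for these simple patterns); the extra patterns are illustrative translations of the list above rather than the chart's code, and is_blocked is a hypothetical name. Note that the existing unanchored pattern already catches /.env.example and also any /.github/ path.

```python
import re

# Current edge pattern from the chart: unanchored, case-insensitive like nginx ~*.
CURRENT = re.compile(r"/\.(git|env|user\.ini|htaccess)", re.IGNORECASE)

# Illustrative additions based on the review list; not taken from the chart.
EXTRA = [
    re.compile(r"/\.(svn|hg|bzr)", re.IGNORECASE),
    re.compile(r"^/(xmlrpc\.php|readme\.html|license\.txt)$", re.IGNORECASE),
    re.compile(r"/wp-config\.php(~|\.bak|\.old|\.swp)$", re.IGNORECASE),
    re.compile(r"/(debug\.log|error_log)$", re.IGNORECASE),
    re.compile(r"/(composer\.(json|lock)|package(-lock)?\.json)$", re.IGNORECASE),
    re.compile(r"/(phpinfo|info|test)\.php$", re.IGNORECASE),
]


def is_blocked(path: str, extended: bool = True) -> bool:
    """Return True if a request path (without query string) would get a 403."""
    if CURRENT.search(path):
        return True
    return extended and any(pattern.search(path) for pattern in EXTRA)


for probe in ["/foo/.git/HEAD", "/xmlrpc.php", "/wp-content/debug.log", "/index.php"]:
    print(probe, is_blocked(probe, extended=False), is_blocked(probe))
```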
@@ -32,37 +43,37 @@ metadata:
nginx.ingress.kubernetes.io/rate-limit-window: "1m"
nginx.ingress.kubernetes.io/rate-limit-connections: "10"
High: missing standard security response headers.
If this is a hardening pass, the obvious additions for an HTTPS-terminating ingress are:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Strict-Transport-Security: max-age=63072000; includeSubDomains; preload";
      more_set_headers "X-Content-Type-Options: nosniff";
      more_set_headers "X-Frame-Options: SAMEORIGIN";
      more_set_headers "Referrer-Policy: strict-origin-when-cross-origin";
      more_set_headers "Permissions-Policy: geolocation=(), microphone=(), camera=()";
Same caveat as the server-snippet comment above: configuration-snippet is also gated by allow-snippet-annotations. If you go the non-snippet route, HSTS can still be enabled at the controller level through the ConfigMap keys hsts, hsts-max-age, hsts-include-subdomains and hsts-preload, and fixed response headers can be added globally through the ConfigMap's add-headers option.
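If these headers become per-site values, the snippet body can be generated from a map instead of written by hand. A hypothetical Python helper standing in for what a Helm range loop over a values map would do; the header set shown is copied from the suggestion above, and rejecting double quotes is a design choice so a value cannot terminate nginx's quoted argument early.

```python
def render_header_snippet(headers: dict) -> str:
    """Render one more_set_headers directive per header, in insertion order."""
    lines = []
    for name, value in headers.items():
        # A double quote would end nginx's quoted argument early.
        if '"' in name or '"' in value:
            raise ValueError(f"header {name!r} contains a double quote")
        lines.append(f'more_set_headers "{name}: {value}";')
    return "\n".join(lines) + "\n"


security_headers = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "SAMEORIGIN",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}
print(render_header_snippet(security_headers), end="")
```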
@@ -1,27 +1,38 @@
{{- if .Values.ingress.enabled -}}
# {{ if .Values.ingress.enabled }}
Updated (2026-05-13). I had the right diagnosis but suggested the wrong fix.
The repo already has .yamllint.yml with ignore: **/helm/templates/** (line 8–10), and .pre-commit-config.yaml excludes .*/templates/.*\.yaml$ (line 24). CI's YAML Lint step uses config_file: .yamllint.yml. yamllint does not lint these Helm templates anywhere in this repo's tooling.
That means the two fix yamllint commits in this PR (52bc0a2 fix: resolve yamllint line endings and comment indentation, 488ebfb fix: resolve yamllint line endings) are addressing a problem that doesn't exist in CI. The most likely scenario: the author ran raw yamllint locally without -c .yamllint.yml and "fixed" the false-positive warnings, damaging template syntax in the process.
Recommended actual fix: revert wordpress/helm/templates/ingress.yaml and wordpress/helm/templates/php-config-configmap.yaml to their pre-52bc0a2 state (and the yamllint-cosmetic changes to values.yaml). The templates already pass helm lint and helm template correctly — I verified locally. The other findings in this review (per-site tls: schema, server-snippet default-off, auto_prepend_file, APACHE_HTTP_PORT/root) stand and are independent of this point.
(My earlier suggestion of adding a yamllint config was wrong — it already exists.)
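Why the # {{ ... | nindent 4 }} lines still render real YAML: Helm evaluates template actions wherever they appear, including after a YAML #, and nindent starts its output with a newline. A Python simulation using simplified re-implementations of Sprig's indent and nindent (not Helm code); the annotation values and the four-space template indentation are illustrative assumptions. The comment marker ends up alone on its own line, and the annotations render as live YAML underneath it.

```python
def indent(spaces: int, text: str) -> str:
    """Simplified re-implementation of Sprig's indent: pad every line."""
    pad = " " * spaces
    return pad + text.replace("\n", "\n" + pad)


def nindent(spaces: int, text: str) -> str:
    """Simplified re-implementation of Sprig's nindent: a newline, then indent."""
    return "\n" + indent(spaces, text)


annotations_yaml = (
    "nginx.ingress.kubernetes.io/proxy-body-size: 64m\n"
    'nginx.ingress.kubernetes.io/ssl-redirect: "true"'
)

# Template line assumed to be: "    # {{ toYaml .Values.ingress.annotations | nindent 4 }}"
rendered = "    # " + nindent(4, annotations_yaml)
print(rendered)
```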
max_execution_time = 300
max_input_time = 300
memory_limit = 256M

; Wordfence WAF Hardening: Forces loading via PHP-FPM
auto_prepend_file = '/var/www/html/wordfence-waf.php'
High: auto_prepend_file points to a file that doesn't exist on fresh deploys.
/var/www/html/wordfence-waf.php is created by the Wordfence plugin's WAF installer. Until Wordfence is installed and its WAF mode is activated, this file is missing. PHP loads auto_prepend_file as if by require, so in that case every request stops with a fatal error (Failed opening required '/var/www/html/wordfence-waf.php') before the requested script runs.
Net effect on a fresh deploy:
- Every PHP request fails (typically an HTTP 500) until the file exists.
- error_log fills with the same error, one per request.
- Anything tailing the container logs for security signal gets a lot of noise.
- If display_errors is ever on in php.ini, the error text (including the file path) leaks to clients.
Options:
- Make this conditional on a values toggle, e.g. wordpress.security.wordfenceWafAutoPrepend (default false).
- Or include a stub wordfence-waf.php in the chart that's a <?php no-op until Wordfence overwrites it.
- Or use php_admin_value[auto_prepend_file] = ... in a separate config only mounted when Wordfence is enabled.
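The values-toggle option boils down to emitting the directive only when explicitly enabled. A Python sketch of that rendering logic; render_php_ini is hypothetical, and the flag corresponds to the wordpress.wordfence.enabled value (default false) adopted by a later commit in this thread, although in the chart it would be an if block in the ConfigMap template.

```python
def render_php_ini(settings: dict, wordfence_enabled: bool = False,
                   waf_path: str = "/var/www/html/wordfence-waf.php") -> str:
    """Render php.ini lines, adding auto_prepend_file only when opted in."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    if wordfence_enabled:
        # PHP loads this file like require before every script, so it must exist.
        lines.append("; Wordfence WAF: loaded before every PHP script")
        lines.append(f"auto_prepend_file = '{waf_path}'")
    return "\n".join(lines) + "\n"


base = {"max_execution_time": "300", "max_input_time": "300", "memory_limit": "256M"}
print(render_php_ini(base), end="")                          # no auto_prepend_file
print(render_php_ini(base, wordfence_enabled=True), end="")  # directive appended
```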
- name: WORDPRESS_ENABLE_REDIS
  value: "yes" # Enable Redis since we re-enabled it
- name: APACHE_HTTP_PORT
  value: "80"
High: bind to privileged port + root user is the opposite of hardening.
APACHE_HTTP_PORT was 8080 (non-privileged → can run as non-root); now 80 (privileged → needs root, which is set on line 148: runAsUser: 0).
For a PR titled wp-hardening-shd, this is a regression from container-security best practice. Apache can bind port 80 without root if it is granted CAP_NET_BIND_SERVICE (for example via setcap on the httpd binary), but the cleaner path is to keep Apache on 8080 and let the K8s Service map 80 → 8080. That means changing the chart's service.targetPort from 80 to 8080; service.port can stay 80.
Was this intentional? If yes, the PR description should justify the root requirement (e.g. a plugin that needs root). If no, revert: value: "8080" and set service.targetPort: 8080 in values.yaml:264.
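The trade-off can be expressed as a small review check. A hypothetical Python sketch; it assumes the Linux default of 1024 for net.ipv4.ip_unprivileged_port_start (some container runtimes lower it), and it reduces "the image lets a non-root process bind low ports" to a single boolean, because in practice that depends on file capabilities such as setcap on httpd as well as the container's capability set. UID 33 below is www-data, matching the chart's fsGroup.

```python
# Linux default for net.ipv4.ip_unprivileged_port_start; some runtimes lower it.
UNPRIVILEGED_PORT_START = 1024


def review_port_config(container_port: int, run_as_user: int,
                       can_bind_low_ports: bool = False) -> list:
    """Return human-readable findings for a container's port/user combination."""
    findings = []
    privileged_port = container_port < UNPRIVILEGED_PORT_START
    if run_as_user == 0:
        findings.append("container runs as root (runAsUser: 0)")
        if privileged_port:
            findings.append(
                f"root may only be needed to bind port {container_port}; consider "
                "listening on 8080 with Service port 80 -> targetPort 8080")
    elif privileged_port and not can_bind_low_ports:
        findings.append(
            f"non-root user cannot bind port {container_port} without NET_BIND_SERVICE")
    return findings


print(review_port_config(80, 0))     # the current chart: two findings
print(review_port_config(8080, 33))  # []: Apache on 8080 as www-data
```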
Follow-up: opened #21 to tighten the version-consistency CI check (so future drift like the one my retracted comments touched on gets caught) and to align main's drifted Chart.yaml files. Once #21 merges, this PR will conflict on
wordpress/helm/values.yaml — [Copilot] networkPolicy: is declared twice and the later block sets enabled: false, which will disable the NetworkPolicy resource. This negates the earlier zero-trust configuration. Remove the duplicate and ensure enabled: true is set in the single consolidated block.
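Duplicate keys like this are easy to miss by eye because many YAML loaders keep the last occurrence without complaint; yamllint's key-duplicates rule catches them when it actually runs on the file. A stdlib-only Python sketch of such a check. It is a rough line scanner, not a YAML parser: it assumes space-indented block style, skips | and > block scalars, treats each list item as a fresh mapping, ignores keys written on the dash line itself, and find_duplicate_keys is a hypothetical name.

```python
BLOCK_SCALARS = ("|", "|-", "|+", ">", ">-", ">+")


def find_duplicate_keys(text: str) -> list:
    """Return (dotted_key, line_number) for keys repeated under the same parent mapping."""
    duplicates = []
    stack = []           # open parents as (indent, key, line_number)
    seen = {}            # parent identity -> keys already defined under it
    block_indent = None  # indent of a key whose value is a | or > block scalar
    for lineno, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        indent = len(raw) - len(raw.lstrip(" "))
        if block_indent is not None:
            if not stripped or indent > block_indent:
                continue  # still inside the block scalar's text
            block_indent = None
        if not stripped or stripped.startswith("#"):
            continue
        while stack and stack[-1][0] >= indent:
            stack.pop()  # close parents at the same or deeper indentation
        if stripped.startswith("- ") or stripped == "-":
            stack.append((indent, "[]", lineno))  # each list item is a fresh mapping
            continue
        key, sep, rest = stripped.partition(":")
        if not sep:
            continue
        key = key.strip().strip("'\"")
        parent_id = tuple((k, n) for _, k, n in stack)
        children = seen.setdefault(parent_id, set())
        if key in children:
            duplicates.append((".".join([k for _, k, _ in stack] + [key]), lineno))
        children.add(key)
        stack.append((indent, key, lineno))
        if rest.split("#", 1)[0].strip() in BLOCK_SCALARS:
            block_indent = indent
    return duplicates


sample = (
    "networkPolicy:\n"
    "  enabled: true\n"
    "extraEnvVars:\n"
    "  - name: A\n"
    '    value: "1"\n'
    "  - name: B\n"
    "    value: |\n"
    "      value: not-a-key\n"
    "networkPolicy:\n"
    "  enabled: false\n"
)
print(find_duplicate_keys(sample))  # [('networkPolicy', 9)]
```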
wordpress/helm/values.yaml — [Copilot] The chart-level automountServiceAccountToken: false value appears unused by the templates (the Deployment uses .Values.wordpress.security.automountServiceAccountToken and the ServiceAccount uses .Values.serviceAccount.automount). Consider removing this top-level key or wiring it into the templates.
wordpress/helm/templates/ingress.yaml — [Copilot] The default TLS protocols include TLSv1.2 (ssl-protocols: "TLSv1.2 TLSv1.3"). Other Helm charts in this repo default to TLS 1.3 only. Change to "TLSv1.3" to match the repository hardening baseline.
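To confirm a TLSv1.3-only setting actually took effect on a live ingress, one option is a client that refuses anything except TLS 1.2 and checks whether the handshake completes. A Python sketch using only the standard ssl and socket modules; the function names are hypothetical, and a certificate-verification failure is re-raised rather than reported as "TLS 1.2 disabled", because it says nothing about protocol support.

```python
import socket
import ssl


def tls12_only_context() -> ssl.SSLContext:
    """Client context that can only negotiate TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def accepts_tls12(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if the server completes a TLS 1.2 handshake."""
    ctx = tls12_only_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        raise  # a certificate problem is not a protocol-version answer
    except ssl.SSLError:
        return False  # typically a protocol-version alert from a TLS 1.3-only server
```

Usage against a deployed site would look like accepts_tls12("example.com"), with the placeholder replaced by the real hostname; after switching ssl-protocols to "TLSv1.3" it should return False.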
wordpress/helm/values.yaml — [Copilot] ingress.tls is defined twice under ingress: (first as a map with protocols/ciphers, then again as a list with secretName/hosts). The second tls: overwrites the first, so .Values.ingress.tls.protocols/.ciphers won't exist at template render time. Rename one of the keys or consolidate.
wordpress/helm/Chart.yaml — [Copilot] Chart version is bumped to 3.2.7, but this PR doesn't include a corresponding entry in wordpress/CHANGELOG.md. Please add a changelog entry describing the changes included in this version.
wordpress/helm/templates/php-config-configmap.yaml — [Copilot] auto_prepend_file is set to /var/www/html/wordfence-waf.php, but this file won't exist unless Wordfence WAF is installed/configured. This can generate warnings on every request (and may break PHP execution). Make this conditional via a values flag (e.g., wordpress.wordfence.enabled).
wordpress/helm/templates/ingress.yaml — [Copilot] This template unconditionally adds nginx.ingress.kubernetes.io/server-snippet. Many security-hardened clusters disable snippet annotations (after CVE-2022-4886). Consider making this opt-in via values, or use an approach that doesn't rely on snippet annotations.
wordpress/helm/values.yaml — [Copilot] wordpress: is declared twice in this values file. In YAML, the later wordpress: block overwrites the earlier one, which drops settings like enableMultisite, wordpressTablePrefix, and wordpressExtraWpConfigContent. Consolidate into a single wordpress: map.
wordpress/helm/values-burnedout.yaml — [@ncimino] Critical: this tls: block is silently rejected by Helm. helm lint -f wordpress/helm/values-burnedout.yaml emits: warning: cannot overwrite table with non table for wordpress.ingress.tls. The secretName: burnedout-tls override is ignored; the rendered Ingress uses the hardcoded wordpress-tls. Fix the ingress template to iterate over .Values.ingress.tls (list) to resolve this.
wordpress/helm/values-ptoken.yaml — [@ncimino] Same issue as values-burnedout.yaml — the tls: list silently fails to override the chart's ingress.tls map. secretName: ptoken-tls is ignored; the rendered Ingress uses the hardcoded wordpress-tls.
wordpress/helm/templates/ingress.yaml — [@ncimino] Critical: server-snippet annotation is disabled by default in modern ingress-nginx. After CVE-2022-4886 / CVE-2023-5043, ingress-nginx ≥1.9 ships with --allow-snippet-annotations=false. The snippet will be silently ignored or the Ingress rejected outright.
wordpress/helm/templates/ingress.yaml — [@ncimino] High: edge block list is sparse for a hardening PR. Common sensitive paths worth 403'ing: xmlrpc.php (defense-in-depth alongside disableXmlRpc mu-plugin), wp-config.php, and wp-login.php rate limiting.
wordpress/helm/templates/ingress.yaml — [@ncimino] High: missing standard security response headers. For a hardening pass, add: X-Frame-Options: SAMEORIGIN, X-Content-Type-Options: nosniff, Strict-Transport-Security: max-age=31536000; includeSubDomains, and a Content-Security-Policy header via nginx.ingress.kubernetes.io/configuration-snippet.
wordpress/helm/templates/php-config-configmap.yaml - [@ncimino] High: auto_prepend_file points to a file that doesn't exist on fresh deploys. /var/www/html/wordfence-waf.php is created by the Wordfence plugin's WAF installer. Until Wordfence is installed and activated, PHP fails every request with a fatal "Failed opening required" error (shown to clients if display_errors=On), because auto_prepend_file is loaded like require. Gate this behind a values flag.
wordpress/helm/values.yaml — [@ncimino] High: bind to privileged port + root user is the opposite of hardening. APACHE_HTTP_PORT was 8080 (non-privileged → can run as non-root); it's now 80 (privileged → needs root, which is explicitly set). Consider keeping 8080 internally and having the Service map 80→8080, allowing non-root operation.
romandidomizio left a comment
PR 19 Review — WordPress Hardening (feature/wp-hardening-shd → main)
Main merged in. After the merge, 7 unique files remain in the diff vs. main — the WordPress-specific helm chart changes: wordpress/helm/values.yaml, wordpress/helm/templates/ingress.yaml, wordpress/helm/templates/php-config-configmap.yaml, wordpress/helm/values-ptoken.yaml, wordpress/helm/values-burnedout.yaml, wordpress/helm/Chart.yaml, wordpress/CHANGELOG.md.
Relationship to PR #15: PR #15 is feature/wp-hardening-shd → feature/wordpress-docker-copier-template (the intermediate staging PR). This PR (#19) is the canonical merge to main. All issues flagged below apply to both.
❌ Unresolved Copilot / Inline Review Comments
wordpress/helm/values.yaml
1. networkPolicy: defined twice — second block DISABLES NetworkPolicy (Copilot)
The first networkPolicy: block correctly defines zero-trust ingress/egress rules. A second networkPolicy: block later in the file sets enabled: false, silently overwriting the first and disabling NetworkPolicy entirely. Consolidate into a single block with enabled: true.
2. ingress.tls defined twice — protocols/ciphers silently dropped (Copilot)
ingress: contains two tls: keys: first defines protocols/ciphers, second is a list with secretName/hosts. YAML keeps only the second, so .Values.ingress.tls.protocols and .ciphers don't exist at render time. Rename one key (e.g., tlsConfig:) or merge into a single structure.
3. service: defined twice — earlier port definition ignored (Copilot)
Only the last service: takes effect. Merge into a single block.
4. backup: defined twice — resources block dropped (Copilot)
The last backup: definition wins, dropping fields from the first. Consolidate.
5. wordpress: defined twice — enableMultisite, wordpressTablePrefix, wordpressExtraWpConfigContent silently dropped (Copilot)
Second wordpress: overwrites the first. Merge into one block.
6. Top-level automountServiceAccountToken: false unused by templates (Copilot)
Templates reference .Values.wordpress.security.automountServiceAccountToken and .Values.serviceAccount.automount. The top-level key is dead config. Remove it or wire it into the appropriate template.
7. APACHE_HTTP_PORT: 8080 vs container port 80 (Copilot, @ncimino)
The container port, Service targetPort, and probes are all configured for port 80, and APACHE_HTTP_PORT is now 80 as well (changed from the safer 8080). Port 80 requires root (or CAP_NET_BIND_SERVICE), which is the opposite of hardening. @ncimino notes: "APACHE_HTTP_PORT was 8080 (non-privileged → can run as non-root); now 80 (privileged → needs root)." Either keep 8080 and adjust the Service/probes, or document the explicit decision to run as root for Apache port binding.
8. wordpress.security runs as root (Copilot)
runAsUser: 0, runAsGroup: 0, runAsNonRoot: false conflicts with the repo's PSS restricted hardening goal. This is compounded by item 7 above. Document the exception explicitly if root is intentionally required for Apache port 80 binding.
wordpress/helm/templates/ingress.yaml
9. TLS 1.2 included — should be TLSv1.3 only (Copilot)
ssl-protocols: "TLSv1.2 TLSv1.3" — every other chart in this repo enforces TLS 1.3 only. Change to "TLSv1.3".
10. server-snippet annotation disabled by default in ingress-nginx ≥1.9 (Copilot, @ncimino)
After CVE-2022-4886 / CVE-2023-5043, ingress-nginx ships with --allow-snippet-annotations=false as default. This annotation will be silently ignored or rejected by a security-hardened ingress. Make it opt-in via a values flag.
11. Block list sparse — missing common attack paths (@ncimino)
The server-snippet block protects some paths but is missing: xmlrpc.php (defense-in-depth alongside the disableXmlRpc mu-plugin), wp-config.php, and wp-login.php rate limiting. Add these for a complete hardening pass.
12. Missing standard security response headers (@ncimino)
A hardening PR should add X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security, and Content-Security-Policy headers. These are standard for HTTPS-terminating ingresses and align with SOC2 CC6.8.
13. TLS secret hardcoded as wordpress-tls (Copilot)
secretName: wordpress-tls is hardcoded. values-ptoken.yaml and values-burnedout.yaml set ingress.tls[0].secretName to site-specific values, but the template ignores them. Parameterize to iterate over .Values.ingress.tls.
wordpress/helm/templates/php-config-configmap.yaml
14. auto_prepend_file points to wordfence-waf.php — file won't exist on fresh deploys (Copilot, @ncimino)
/var/www/html/wordfence-waf.php is created by the Wordfence WAF installer. Until Wordfence is installed, PHP fails every request with a fatal "Failed opening required" error, because auto_prepend_file is loaded like require. Gate behind a values flag (e.g., wordpress.wordfence.enabled).
wordpress/helm/values-ptoken.yaml and values-burnedout.yaml
15. tls: list silently rejected by Helm (@ncimino)
helm lint -f values-burnedout.yaml (and ptoken equivalent) emits a warning: cannot overwrite table with non table for wordpress.ingress.tls. The tls: list override fails silently — secretName is ignored, rendered Ingress uses hardcoded wordpress-tls. Fix item 13 above to resolve this.
🔒 Trivy / Security Scan Findings
16. wordpress/helm/templates/mu-plugins-configmap.yaml — ConfigMap with sensitive content (Trivy AVD-KSV-01010, MEDIUM)
Trivy flagged potential sensitive content in the ConfigMap. Review what's stored in wordpress-mu-plugins and ensure no credentials, API keys, or secrets are embedded. Use Kubernetes Secrets or ExternalSecrets for sensitive values.
17. vaultwarden/helm/templates/configmap.yaml — ConfigMap with sensitive content (Trivy AVD-KSV-01010, MEDIUM)
Same finding for the vaultwarden ConfigMap. This file is in scope here because it's touched by this PR's branch. Verify no sensitive data is stored in plain ConfigMap fields.
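When triaging findings like these, a crude line-level scan of the ConfigMap data helps separate real secrets from variable references; the existing trivy:ignore notes for $phpmailer->Password and PASSWORD_ITERATIONS are exactly that kind of false positive. A Python sketch in the same spirit as (but not equivalent to) Trivy's rule; the pattern, the function name and the sample mu-plugin content are all illustrative assumptions, and every hit still needs a human decision.

```python
import re

# Illustrative pattern only; this is not Trivy's rule logic.
SENSITIVE = re.compile(r"pass(word)?|secret|token|api[_-]?key|private[_-]?key", re.IGNORECASE)


def scan_configmap(data: dict) -> list:
    """Return (data_key, line_number, line) for lines that mention credential-like words."""
    hits = []
    for key, value in data.items():
        for lineno, line in enumerate(value.splitlines(), start=1):
            if SENSITIVE.search(line):
                hits.append((key, lineno, line.strip()))
    return hits


# Hypothetical mu-plugin content: the credential comes from the environment, not the ConfigMap.
sample = {
    "smtp.php": "<?php\n$phpmailer->Password = getenv('SMTP_PASSWORD');\n",
    "cache.php": "<?php\ndefine('WP_CACHE', true);\n",
}
for hit in scan_configmap(sample):
    print(hit)  # flags the getenv() line, which a reviewer can then mark as a false positive
```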
📋 Documentation
18. Chart version 3.2.7 has no corresponding CHANGELOG entry
(Note: @ncimino retracted the version regression concern — 3.2.6 → 3.2.7 bump is correct. However, wordpress/CHANGELOG.md still needs a ## [3.2.7] entry describing the hardening changes for the documentation validation CI check to pass.)
Summary
The duplicate YAML key issues (items 1–5, plus the dead top-level key in item 6) are the most critical: they silently disable NetworkPolicy and drop configuration. Items 9–12 are security/hardening regressions in a PR that is explicitly a hardening pass. Item 14 will break PHP on fresh deploys, and item 15 leaves the per-site TLS secrets silently ignored. The Trivy findings and the CHANGELOG gap need resolution before merge to main. All Copilot and @ncimino inline comments remain unresolved. @mshahid538, please address these before re-requesting review.
- ADR-004: add blank line before bulleted list (MD032)
- workflows/README.md: switch *emphasis* to _emphasis_ for style consistency (MD049)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Extra environment variables
extraEnvVars:
  - name: WORDPRESS_ENABLE_REDIS
    value: "yes" # Enable Redis since we re-enabled it
  - name: APACHE_HTTP_PORT
    value: "8080"
  - name: WORDPRESS_CONFIG_EXTRA
    value: |
      define('WP_CACHE', true);
      define('DISABLE_FILE_EDITING', true);
      define('DISALLOW_FILE_MODS', true);
      define('FORCE_SSL_ADMIN', true);
      define('WP_AUTO_UPDATE_CORE', true);
  - name: WORDPRESS_ENABLE_REDIS
    value: "yes" # Enable Redis since we re-enabled it
  - name: APACHE_HTTP_PORT
    value: "80"
  - name: WORDPRESS_CONFIG_EXTRA
    value: |
      define('WP_CACHE', true);
      define('DISABLE_FILE_EDITING', true);
      define('DISALLOW_FILE_MODS', true);
      define('FORCE_SSL_ADMIN', true);
      define('WP_AUTO_UPDATE_CORE', true);
## Container / Pod Security Context (Production Ready)
automountServiceAccountToken: false
runAsUser: 0 # Required for Apache port 80 binding
runAsGroup: 0
runAsNonRoot: false
fsGroup: 33 # www-data for file permissions
readOnlyRootFilesystem: false
allowPrivilegeEscalation: false
capabilities: {} # Use default capabilities for WordPress
## TLS Security Configuration (Parameterized for flexibility)
tls:
  protocols: "TLSv1.2 TLSv1.3" # Both for maximum compatibility (99.9% devices)
  # Cipher suites: Mozilla "Intermediate" profile - balances security with compatibility
  # Supports: Perfect Forward Secrecy, mobile devices (CHACHA20), modern encryption (AES-GCM)
  ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305"
# {{ toYaml .Values.ingress.annotations | nindent 4 }}
# Block sensitive files at the edge (Task #264)
nginx.ingress.kubernetes.io/server-snippet: |
  location ~* /\.(git|env|user\.ini|htaccess) {
    deny all;
    return 403;
  }
tls:
  - hosts:
      - {{ .Values.wordpress.domain | quote }}
      {{- if .Values.wordpress.includeWWW }}
      - {{ printf "www.%s" .Values.wordpress.domain | quote }}
      {{- end }}
    secretName: wordpress-tls
  - hosts:
      - "{{ .Values.wordpress.domain }}"
      # {{ if .Values.wordpress.includeWWW }}
      - '{{ printf "www.%s" .Values.wordpress.domain }}'
      # {{ end }}
    secretName: wordpress-tls
rules:
memory_limit = 256M

; Wordfence WAF Hardening: Forces loading via PHP-FPM
auto_prepend_file = '/var/www/html/wordfence-waf.php'
- Chart version updated to 3.2.7
- Maintenance and stability improvements

## [3.3.8] - 2025-11-07
tls:
  - secretName: ptoken-tls
    hosts:
      - ptoken.agency
tls:
  - secretName: burnedout-tls
    hosts:
      - burnedout.xyz
- ingress.yaml: parameterize spec.tls[0].secretName from .Values.ingress.tls[0].secretName so per-site overrides (burnedout-tls, ptoken-tls) actually take effect. Falls back to "wordpress-tls" when .Values.ingress.tls is a map or unset (preserves the default and the TLS hardening map).
- php-config-configmap.yaml: gate "auto_prepend_file = wordfence-waf.php" behind .Values.wordpress.wordfence.enabled (default false) to avoid PHP warnings when the plugin is not installed.
- ingress.yaml: gate the file-blocking server-snippet annotation behind .Values.ingress.serverSnippet.enabled (default true) so the chart can deploy on hardened controllers that set allow-snippet-annotations: false.
- values.yaml: add wordpress.wordfence.enabled and ingress.serverSnippet.enabled.
- Chart.yaml: bump 3.2.7 -> 3.3.0 (SemVer + WeOwnVer valid).
- wordpress/CHANGELOG.md: document 3.3.0.

Verified: helm template renders correctly with default, values-burnedout, and values-ptoken; helm lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses PR #19 review items 11 and 12:
- Block xmlrpc.php, wp-config backups, readme.html, license.txt, debug.log, and .svn at the nginx edge (defense-in-depth)
- Add X-Frame-Options, X-Content-Type-Options, HSTS, Referrer-Policy, Permissions-Policy, and Content-Security-Policy via configuration-snippet
- CSP is configurable per site via ingress.securityHeaders.contentSecurityPolicy
- Document wp-login.php rate limiting as controller-level config

Chart version 3.3.0 → 3.3.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wordpressBlogName: "Enterprise WordPress Site"
wordpressScheme: https
domain: "" # Set via --set global.domain or deploy script
enableMultisite: false
wordpressTablePrefix: "wp_"
includeWWW: false # Automatically configure www subdomain for main domain
redirectToWWW: false # Redirect root domain to www subdomain (e.g., example.com → www.example.com)
redirectFromWWW: false # Redirect www subdomain to root domain (e.g., www.example.com → example.com)
automountServiceAccountToken: false
runAsUser: 0 # Required for Apache port 80 binding
runAsGroup: 0
runAsNonRoot: false
fsGroup: 33 # www-data for file permissions
readOnlyRootFilesystem: false
allowPrivilegeEscalation: false
capabilities: {} # Use default capabilities for WordPress
protocols: "TLSv1.2 TLSv1.3" # Both for maximum compatibility (99.9% devices)
# Cipher suites: Mozilla "Intermediate" profile - balances security with compatibility
# Supports: Perfect Forward Secrecy, mobile devices (CHACHA20), modern encryption (AES-GCM)
ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305"
upload_max_filesize = 128M
post_max_size = 128M
## Extra environment variables
extraEnvVars:
  - name: WORDPRESS_ENABLE_REDIS
    value: "yes" # Enable Redis since we re-enabled it
  - name: APACHE_HTTP_PORT
    value: "8080"
  - name: WORDPRESS_CONFIG_EXTRA
    value: |
      define('WP_CACHE', true);
      define('DISABLE_FILE_EDITING', true);
      define('DISALLOW_FILE_MODS', true);
      define('FORCE_SSL_ADMIN', true);
      define('WP_AUTO_UPDATE_CORE', true);
  - name: WORDPRESS_ENABLE_REDIS
    value: "yes" # Enable Redis since we re-enabled it
  - name: APACHE_HTTP_PORT
    value: "80"
  - name: WORDPRESS_CONFIG_EXTRA
    value: |
      define('WP_CACHE', true);
      define('DISABLE_FILE_EDITING', true);
      define('DISALLOW_FILE_MODS', true);
      define('FORCE_SSL_ADMIN', true);
      define('WP_AUTO_UPDATE_CORE', true);
| hosts: | ||
| - ptoken.agency |
| hosts: | ||
| - burnedout.xyz |
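These per-site overrides supply `ingress.tls` as a Kubernetes-style list, while the chart default is a map that holds `protocols` and `ciphers`. The `fix(wordpress/helm)` commit in this branch reads `spec.tls[0].secretName` from the list when it is present and otherwise falls back to `wordpress-tls`. The sketch below is a minimal Python model of that selection rule, intended as a reference for template tests. `resolve_tls_secret_name` is a hypothetical name, the real logic lives in the Helm template, and Helm's values-merge behavior is not modeled here.

```python
from collections.abc import Mapping
from typing import Any

DEFAULT_TLS_SECRET = "wordpress-tls"  # fallback name given in the commit message


def resolve_tls_secret_name(ingress: Mapping[str, Any]) -> str:
    """Hypothetical model of the described fallback for spec.tls[0].secretName.

    - list form (per-site override): use tls[0].secretName when it is set
    - map form (chart default with protocols/ciphers) or unset: fall back
    """
    tls = ingress.get("tls")
    if isinstance(tls, list) and tls and isinstance(tls[0], Mapping):
        name = tls[0].get("secretName")
        if name:
            return str(name)
    return DEFAULT_TLS_SECRET


if __name__ == "__main__":
    print(resolve_tls_secret_name({"tls": [{"secretName": "burnedout-tls"}]}))
    print(resolve_tls_secret_name({"tls": {"protocols": "TLSv1.2 TLSv1.3"}}))
```

A list entry that only carries `hosts` (as in the two overrides above) still resolves to the default, which is consistent with those `hosts` fields being dead configuration.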
- Move APACHE_HTTP_PORT from dead wordpress.extraEnvVars to top-level extraEnvVars (deployment.yaml only reads the top-level key)
- Remove duplicate WORDPRESS_ENABLE_REDIS (already injected by template when redis.enabled=true) and WORDPRESS_CONFIG_EXTRA (already generated from wordpressExtraWpConfigContent — duplicate silently overwrote the more complete template version)
- Align proxy-body-size 64m -> 128m to match PHP upload_max_filesize
- Fix comment: global.domain -> wordpress.domain
- Remove dead hosts field from per-site tls overrides (template builds hosts from wordpress.domain, not tls list)
- Document enableMultisite/redirectFromWWW as not yet wired

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
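Aligning `proxy-body-size` with the PHP limits keeps nginx from rejecting an upload that PHP would accept. The sketch below is a small consistency check along those lines. It assumes the three values are read as strings from the chart. `parse_size` and `check_upload_limits` are hypothetical helpers that nothing in the chart calls. The check only compares the configured ceilings. Multipart framing adds some overhead to the request body, so a file sitting exactly at the limit may still be rejected.

```python
import re

# nginx (client_max_body_size, which proxy-body-size sets) and PHP shorthand
# (upload_max_filesize, post_max_size) both treat k/m/g as powers of 1024.
_MULTIPLIER = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse sizes such as '128m' or '128M' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _MULTIPLIER[match.group(2).lower()]


def check_upload_limits(proxy_body_size: str, post_max_size: str,
                        upload_max_filesize: str) -> list[str]:
    """Return problems found; an empty list means the limits line up."""
    proxy = parse_size(proxy_body_size)
    post = parse_size(post_max_size)
    upload = parse_size(upload_max_filesize)
    problems = []
    if upload > post:
        problems.append("upload_max_filesize exceeds post_max_size")
    # A proxy-body-size of 0 disables nginx's body-size check entirely.
    if proxy != 0 and post > proxy:
        problems.append("post_max_size exceeds proxy-body-size (nginx answers 413 first)")
    return problems


if __name__ == "__main__":
    print(check_upload_limits("64m", "128M", "128M"))   # before this commit
    print(check_upload_limits("128m", "128M", "128M"))  # after this commit
```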
…check

The version-consistency check in .github/workflows/validation.yml used `grep -qi $chart_version $changelog_file` (presence-anywhere), which silently passed even when Chart.yaml drifted away from the latest CHANGELOG entry. Concrete case discovered during PR #19 review: wordpress/helm/Chart.yaml was 3.2.6 while wordpress/CHANGELOG.md's top entry was [3.3.8] — the older 3.2.6 string still appeared in past entries, so the check kept passing.

Tighten the check to compare Chart.yaml's `version:` against the first `## [X.Y.Z]` header in CHANGELOG.md (the Keep-a-Changelog newest-on-top convention).

Three charts had pre-existing drift and would fail the new check on main, so align them in this commit:

  nextcloud/helm/Chart.yaml    1.0.0 → 1.1.0
  vaultwarden/helm/Chart.yaml  1.0.0 → 1.3.1
  wordpress/helm/Chart.yaml    3.2.6 → 3.3.8

These are metadata-only bumps: Chart.yaml now matches the version each chart's own CHANGELOG already declares as latest. No template or values changes.

Not addressed in this commit (separate decisions needed):

- WeOwnVer format inconsistency between docs/VERSIONING_WEOWNVER.md (SEASON.MONTH.WEEK.ITERATION) and validation.yml's `versioning` job (SEASON.WEEK.DAY.VERSION). None of the current chart versions cleanly fit either schema.
- The `versioning` job only validates the first Chart.yaml found alphabetically (`*/helm/Chart.yaml | head -1` on line 212), not all of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
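The tightened check comes down to comparing two strings. The Python sketch below places the old presence-anywhere behavior next to the newest-entry comparison. It is not the workflow's actual shell, and the function names are hypothetical. The substring test only approximates `grep -qi`, which also treats each `.` in the version as a regex wildcard.

```python
import re
from typing import Optional

# First release header such as "## [3.3.8] - 2025-11-07"; "## [Unreleased]" is skipped.
_RELEASE_HEADER = re.compile(r"^## \[(\d+\.\d+\.\d+)\]", re.MULTILINE)
# Top-level "version:" key only; apiVersion/appVersion and indented keys do not match.
_CHART_VERSION = re.compile(r"^version:\s*[\"']?([0-9A-Za-z.+-]+)", re.MULTILINE)


def latest_changelog_version(changelog: str) -> Optional[str]:
    match = _RELEASE_HEADER.search(changelog)
    return match.group(1) if match else None


def chart_version(chart_yaml: str) -> Optional[str]:
    match = _CHART_VERSION.search(chart_yaml)
    return match.group(1) if match else None


def presence_anywhere(chart_yaml: str, changelog: str) -> bool:
    """Old behavior (approximated): passes if the version appears anywhere."""
    version = chart_version(chart_yaml)
    return version is not None and version.lower() in changelog.lower()


def matches_latest_entry(chart_yaml: str, changelog: str) -> bool:
    """Tightened behavior: Chart.yaml must equal the newest CHANGELOG entry."""
    version = chart_version(chart_yaml)
    return version is not None and version == latest_changelog_version(changelog)
```

Consider the drift case from this commit, where Chart.yaml is at 3.2.6 and the newest entry is [3.3.8]. `presence_anywhere` still returns True because 3.2.6 appears in an older entry, while `matches_latest_entry` returns False.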
… DO droplets (#27)

* feat(sandbox-docker): add copier template for AIO Sandbox on DO droplets

Adds a new copier project for deploying agent-infra/sandbox (browser, shell, filesystem, VSCode, Jupyter, MCP servers in one container) on a DigitalOcean droplet. Mirrors the keycloak-docker / anythingllm-docker pattern: Caddy reverse proxy, Infisical runtime secret injection, OpenTofu droplet + firewall + reserved IP, skinny volume backups with GFS retention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Auto-PR: Merge branch 'feature/wordpress-docker-copier-template' into feature/wp-hardening-shd (#19)

* feat: codify PHP limits and block .user.ini per Task D152 & #264

* chore(helm): implement multi-site values strategy for burnedout and ptoken

* chore: bump helm chart version to 3.2.7

* fix: resolve trivy security scan and linting issues

* udated version bump

* fix: resolve yamllint line endings and comment indentation

* fix: resolve yamllint line endings

* docs: apply markdownlint autofixes for PR #15 CI

- ADR-004: add blank line before bulleted list (MD032)
- workflows/README.md: switch `*emphasis*` to `_emphasis_` for style consistency (MD049)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(wordpress/helm): address PR #15 review items 10-12

- ingress.yaml: parameterize spec.tls[0].secretName from .Values.ingress.tls[0].secretName so per-site overrides (burnedout-tls, ptoken-tls) actually take effect. Falls back to "wordpress-tls" when .Values.ingress.tls is a map or unset (preserves default and TLS hardening map).
- php-config-configmap.yaml: gate "auto_prepend_file = wordfence-waf.php" behind .Values.wordpress.wordfence.enabled (default false) to avoid PHP warnings when the plugin is not installed.
- ingress.yaml: gate the file-blocking server-snippet annotation behind .Values.ingress.serverSnippet.enabled (default true) so the chart can deploy on hardened controllers that set allow-snippet-annotations: false.
- values.yaml: add wordpress.wordfence.enabled and ingress.serverSnippet.enabled.
- Chart.yaml: bump 3.2.7 -> 3.3.0 (SemVer + WeOwnVer valid).
- wordpress/CHANGELOG.md: document 3.3.0.

Verified: helm template renders correctly with default, values-burnedout, and values-ptoken; helm lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ingress): expand edge block list and add security response headers

Addresses PR #19 review items 11 and 12:

- Block xmlrpc.php, wp-config backups, readme.html, license.txt, debug.log, and .svn at the nginx edge (defense-in-depth)
- Add X-Frame-Options, X-Content-Type-Options, HSTS, Referrer-Policy, Permissions-Policy, and Content-Security-Policy via configuration-snippet
- CSP is configurable per site via ingress.securityHeaders.contentSecurityPolicy
- Document wp-login.php rate limiting as controller-level config

Chart version 3.3.0 → 3.3.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(values): resolve dead config, env var duplicates, and size mismatch

- Move APACHE_HTTP_PORT from dead wordpress.extraEnvVars to top-level extraEnvVars (deployment.yaml only reads the top-level key)
- Remove duplicate WORDPRESS_ENABLE_REDIS (already injected by template when redis.enabled=true) and WORDPRESS_CONFIG_EXTRA (already generated from wordpressExtraWpConfigContent — duplicate silently overwrote the more complete template version)
- Align proxy-body-size 64m -> 128m to match PHP upload_max_filesize
- Fix comment: global.domain -> wordpress.domain
- Remove dead hosts field from per-site tls overrides (template builds hosts from wordpress.domain, not tls list)
- Document enableMultisite/redirectFromWWW as not yet wired

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: m.shahid <m.shahid538@yahoo.com>
Co-authored-by: Nik <nik.cimino@gmail.com>
Co-authored-by: romandidomizio <rodi1364@colorado.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* - SearXNG Deployment to 'searxng.weown.app'.
- Add SearXNG deployment and team integration docs
- Adds the production deployment assets, ingress routing, browser integration guide, workstation browser policy playbook, and secret-safe SearXNG settings templating so the service can be reviewed without committing production secrets.

* refactor(searxng): restructure as copier template matching keycloak-docker pattern

Resolves all 15 review comments from PR #20 code review:

BLOCKING (4 resolved):
- Remove committed production IPs from inventory.ini (no inventory file; copier template uses dynamic doctl-based discovery)
- Remove committed RFC1918 IPs from k8s-external-ingress.yaml (k8s manifests removed; Caddy handles TLS directly on droplet)
- Fix TLS bypass: old docker-compose bound port 80 publicly; now Caddy handles all TLS termination with auto Let's Encrypt
- Pin container images: configurable image variables replace :latest tags

SECURITY (4 resolved):
- Secret key no longer world-readable on disk: Infisical Machine Identity injects SEARXNG_SECRET_KEY at runtime via `infisical run`
- Disable Google autocomplete (privacy leak): set autocomplete to ""
- TLS cipher restrictions: Caddy enforces modern TLS by default (no nginx ingress annotations needed)
- Remove StrictHostKeyChecking=accept-new: no inventory.ini

OPERATIONAL (5 resolved):
- Docker installation via get.docker.com (upstream docker-ce, not mismatched docker.io + docker-compose-plugin)
- Ansible deploy uses proper changed_when patterns
- Health checks defined for all 3 services (searxng, valkey, caddy)
- Valkey actually configured: settings.yml includes redis.url pointing to valkey:6379 for rate-limiting and bot detection
- Firefox policies.json: slurp-merge-write pattern preserves existing enterprise policies instead of overwriting

TEMPLATE STRUCTURE (new):
- copier.yaml with _subdirectory: template (copier >= 9.0.0)
- template/terraform/ — main.tf (droplet + reserved IP + firewall), monitoring.tf (CPU/mem/disk alerts), variables.tf, outputs.tf, backend.tf, versions.tf, cloud-init.yaml
- template/docker/ — compose.prod.yaml, Caddyfile, searxng/settings.yml
- template/scripts/ — deploy.sh, backup.sh (skinny backups with DO Spaces offload), restore.sh
- template/ansible/ — deploy.yml, configure-browser-search.yml
- template/ — .gitignore, README.md, CHANGELOG.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wordpress-docker): use Minimus private image with PDO extensions (D117)

Replace reg.mini.dev/wordpress:latest with reg.mini.dev/1923/wordpress-fluentsmtp:latest in the copier template default and all generated site configs.

The base Minimus WordPress image strips PDO (WordPress core uses mysqli). FluentSMTP requires PDO for email logging — without it, logins trigger a fatal error. The private image includes php-pdo-auto and php-pdo-mysql-auto plus 5 other PHP extensions.

Ref: burnedout.xyz PDO postmortem 2026-04-14, decision D117.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): align drifted chart versions; tighten CHANGELOG consistency check

The version-consistency check in .github/workflows/validation.yml used `grep -qi $chart_version $changelog_file` (presence-anywhere), which silently passed even when Chart.yaml drifted away from the latest CHANGELOG entry. Concrete case discovered during PR #19 review: wordpress/helm/Chart.yaml was 3.2.6 while wordpress/CHANGELOG.md's top entry was [3.3.8] — the older 3.2.6 string still appeared in past entries, so the check kept passing.

Tighten the check to compare Chart.yaml's `version:` against the first `## [X.Y.Z]` header in CHANGELOG.md (the Keep-a-Changelog newest-on-top convention).
Three charts had pre-existing drift and would fail the new check on main, so align them in this commit:

  nextcloud/helm/Chart.yaml    1.0.0 → 1.1.0
  vaultwarden/helm/Chart.yaml  1.0.0 → 1.3.1
  wordpress/helm/Chart.yaml    3.2.6 → 3.3.8

These are metadata-only bumps: Chart.yaml now matches the version each chart's own CHANGELOG already declares as latest. No template or values changes.

Not addressed in this commit (separate decisions needed):

- WeOwnVer format inconsistency between docs/VERSIONING_WEOWNVER.md (SEASON.MONTH.WEEK.ITERATION) and validation.yml's `versioning` job (SEASON.WEEK.DAY.VERSION). None of the current chart versions cleanly fit either schema.
- The `versioning` job only validates the first Chart.yaml found alphabetically (`*/helm/Chart.yaml | head -1` on line 212), not all of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): split long error message to resolve yamllint line-length warning

* DR initial commit with restic + velero

* fix(cluster-backup): address all reviewer + CI feedback from PR #14/#24

Resolves the 3 failing CI jobs and all 30 inline review comments on PR #24 (and the predecessor PR #14, whose comments were ported via the branch rename `dra` → `feature/shahid-velero-restic`).

CI failures fixed
-----------------

* Helm Template Validation — chart no longer declares an upstream `velero` subchart dependency (the chart provides its OWN Velero + Restic templates; the two would collide and `helm template` could not run because `helm dependency build` is not invoked in CI).
* Helm Lint — removed the duplicate `cluster-backup.restoreName` template definition in `_helpers.tpl`.
* Documentation Validation / Markdown Lint — `cluster-backup/README.md` now satisfies MD031/MD022/MD032 (blank lines around fences / headings / lists). Added `cluster-backup/CHANGELOG.md` with a `[1.0.0] - 2026-05-22` entry that matches `Chart.yaml`'s `version: 1.0.0` — required by the tightened version-consistency check landed in #21.
* trailing-whitespace / end-of-file-fixer — stripped trailing whitespace from every cluster-backup yaml + shell file and ensured every file ends with `\n`.

Security & correctness (per Copilot / @romandidomizio review)
------------------------------------------------------------

* `automountServiceAccountToken: false` flipped to `true` on BOTH the Velero server and the Restic node-agent ServiceAccounts. Both controllers MUST call the Kubernetes API (Velero for cross-namespace resource enumeration / Backup-Restore CRDs; Restic for PodVolumeBackup / PodVolumeRestore / ResticRepository / coordination leases). Disabling token automount without a projected-token workaround silently broke the controllers.
* Removed `helm/templates/velero-secret.yaml`. The chart was rendering a placeholder Secret named `cluster-backup-cloud-credentials` which collided with the real Secret that `deploy.sh` creates out-of-band (`helm upgrade` would fail with "resource already exists" on every re-install). The Secret is now solely owned by `deploy.sh`. Removed the corresponding `checksum/secret` annotation from the Velero Deployment and Restic DaemonSet.
* Added `helm/templates/velero-service.yaml`. The chart previously shipped a `ServiceMonitor` selecting a Service that did not exist, and `NOTES.txt` instructed users to `kubectl port-forward svc/<fullname>-velero` — also missing. Service is `ClusterIP`, metrics-only.
* `helm/templates/networkpolicy.yaml` egress no longer uses `to: []` for any rule. DNS is constrained to the kube-system namespace; external S3 traffic is allowed to `0.0.0.0/0` MINUS RFC1918, link-local, and loopback ranges (so restic / velero can never accidentally exfiltrate to in-cluster IPs over an S3-shaped path); Kubernetes API access is constrained to the `default` namespace ClusterIP.
* `helm/templates/restic-daemonset.yaml` no longer mounts host `/` via `hostPath`. Restic only needs read access to `/var/lib/kubelet/pods` (the standard kubelet pod-volume root), so that's the only host mount that remains. Mounting host-root was incompatible with Pod Security `restricted` and not required for restic operation.
* `helm/templates/backup-schedules.yaml` passes a full `dict "Chart" $.Chart "Release" $.Release "Values" $.Values "Template" $.Template ...` context into the schedule-naming and schedule-config helpers, removing any ambiguity about what `cluster-backup.fullname` and `cluster-backup.labels` resolve to when called from within a `range` loop body.
* `helm/templates/_helpers.tpl`: simplified `cluster-backup.serviceAccountName` and `cluster-backup.resticServiceAccountName` to return deterministic names. The previous form gated on `.Values.velero.{server,restic}.serviceAccount.create`, which is not a path that exists in `values.yaml` — it was a remnant of the now-removed `velero` subchart pattern. The helpers always returned `"default"`, which conflicted with the ClusterRoleBinding subject the chart actually installed; Velero would have been denied every API call.
* `helm/templates/NOTES.txt` rewritten: `kubectl create backup …` / `kubectl create restore …` are NOT valid kubectl subcommands for Velero CRDs and were misleading users into thinking the controller was broken when in reality they were running a non-existent kubectl verb. NOTES now uses the Velero CLI (`velero backup get`, `velero schedule get`, etc.) and points readers at the install docs. Same fix applied across README.md, deploy.sh, verify.sh, test-local.sh, and (post-deployment) usage messages.

Script hardening
----------------

* `deploy.sh`:
  - Switched to `#!/usr/bin/env bash` + `set -euo pipefail`.
  - All temporary files now live in a single `mktemp -d`-created directory with mode 0700, cleaned up via an `EXIT|INT|TERM` trap. The previous version wrote `/tmp/s3-credentials` and `/tmp/cluster-backup-values.yaml` at fixed paths — world-readable on some hosts and racy under concurrent runs.
  - S3 secret-key prompt uses `read -rs` (no echo). The credential is piped into `kubectl create secret … --from-file=cloud=/dev/stdin --dry-run=client -o yaml | kubectl apply -f -`, so the secret never lives on disk and never appears in argv (`ps`).
  - All configuration accepts env-var inputs (`TENANT`, `CLUSTER`, `ENVIRONMENT`, `S3_*`) so the script is usable in CI/automation without an interactive TTY.
* `test-local.sh`:
  - MinIO image pinned (`minio/minio:RELEASE.2024-08-17T01-24-54Z`). Was `minio/minio:latest`, which made local test runs non-reproducible and exposed them to silent upstream breaking changes.
  - MinIO credentials no longer hardcoded as `minioadmin / minioadmin` in the manifest. Access key defaults to `localdev-access`; secret key is a freshly-generated 24-char random value per run (overridable via `TEST_MINIO_{ACCESS,SECRET}_KEY` env vars). Rendered into the Secret over a stdin pipe — same no-argv-leak pattern as deploy.sh.
  - Service now correctly exposes BOTH the S3 API (NodePort 30000) AND the web console (NodePort 30001). The original Service mapped only 9000 to NodePort 30000 yet the "MinIO Console" line pointed at 30000 — that's the S3 API, not the console. Web console address is now `http://localhost:30001`.
  - Replaced fixed `sleep 30` waits with `wait_for_velero_phase` which polls `.status.phase` of the Backup / Restore CR and emits `kubectl describe` + `velero backup logs` diagnostics on timeout.
  - Uses the Velero CLI for backup / restore creation.
* `verify.sh`:
  - Same shebang + strict-mode + Velero-CLI updates as deploy.sh.
  - `test_backup_creation` polls phase with diagnostics instead of blind-sleeping; also skips gracefully if the Velero CLI isn't installed locally (verify is run on hosts that may not have it).
* `create-tenant-cluster.sh`:
  - All upstream components are version-pinned via env-var defaults: `DOKS_K8S_VERSION`, `INGRESS_NGINX_VERSION`, `CERT_MANAGER_VERSION`, `METRICS_SERVER_VERSION`. The previous version used `doctl kubernetes cluster create … --version latest` and `metrics-server/releases/latest/`, making provisioning silently non-reproducible.
  - Team-member namespace is normalized to RFC1123 (`AnnaF` → `annaf`, etc.) before being passed to `kubectl create namespace`. K8s rejects uppercase-containing namespace names with an opaque error, and the previous version of this script would silently fail at `create_team_member_namespace` for any contributor whose handle wasn't already lowercase-alphanumeric.
  - `create_team_member_access` no longer reads `.secrets[0].name` to harvest a long-lived ServiceAccount token (the BoundServiceAccountToken feature in K8s ≥1.24 means that field is empty on modern clusters, so the function was returning an empty kubeconfig). Switched to `kubectl create token` (TokenRequest API) with an explicit 24h duration. Kubeconfig is written with `umask 077` so the file containing the bearer token is mode 0600.

Not in scope for this commit
----------------------------

* The Restic DaemonSet still ships with a `SYS_ADMIN` capability add and a non-root securityContext (uid 65534). Real-world restic operation typically requires either root or a relaxed podSecurityContext to read pod volumes owned by arbitrary uids; tightening this further requires a per-environment decision about the Pod Security Standard for the `velero` namespace. Flagged in `cluster-backup/CHANGELOG.md` under "Notes for operators".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cluster-backup): repair Schedule rendering + dedupe component label

Two issues that surfaced once kubeconform actually parsed the rendered output (the previous CI run was already past my first commit but failed on Kubeconform Validation):

1. Duplicate `app.kubernetes.io/component` label

   The common `cluster-backup.labels` helper emitted `app.kubernetes.io/component: backup-system`, and every per-resource template also appended its own `app.kubernetes.io/component: <role>` right below the include. The K8s API server tolerates duplicate map keys (last write wins), but strict YAML parsers — including `kubeconform` — reject the manifest. Removed the component-label line from the shared helper; per-resource components remain.

2. Schedule CR `.spec.schedule` rendered as `"map[…]"` instead of cron

   The `cluster-backup.backupScheduleConfig` helper accessed `.schedule | quote` at the top level, but its caller passes the whole schedule object under `.schedule` — so `.schedule` was the FULL map, and Go's `fmt.Sprint` rendered it as `"map[enabled:true excludeNamespaces:[…] includeNamespaces:[] retention:30d schedule:0 2 * * *]"`. Velero would have rejected the Schedule as invalid syntax at apply time. Reworked the helper to extract fields explicitly via `.schedule.schedule`, `.schedule.retention`, `.schedule.includeNamespaces`, `.schedule.excludeNamespaces`. This is the same access pattern the caller already implied; the helper had just been written incorrectly from the start. The CR now renders with the expected `schedule: "0 2 * * *"`.

Also fixed an adjacent issue in `backup-schedules.yaml`: the `{{- $ctx := dict … -}}` line had its trailing newline stripped, which fused the previous Schedule's last YAML line directly to the next `---` document separator (e.g. `defaultVolumesToRestic: true---`). Replaced with non-stripped `{{ $ctx := merge … }}` on its own line.

Verified with `helm template … | kubeconform -strict -summary -ignore-missing-schemas -`: 21 resources, Valid: 11, Invalid: 0, Errors: 0, Skipped: 10 (skipped resources are Velero CRDs with no published JSON schema: BackupStorageLocation, VolumeSnapshotLocation, Schedule).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cluster-backup): resolve Trivy CRITICAL/HIGH findings

Local `trivy config --severity CRITICAL,HIGH cluster-backup/` is now clean (0 findings). Two categories:

Real code fixes
---------------

1. **KSV-0053 (HIGH) — restic ClusterRole granted `pods/exec`.** Restic backs up pod volumes by reading the kubelet bind-mount; it never execs into user pods. Removed `pods/exec` from the resource list and tightened the verb set on `[pods, pods/log, namespaces, nodes]` from `get list watch create update patch delete` down to just `get list watch` — restic does not own any of those resources. This also drops KSV-0042 (delete pod logs) + part of KSV-0048 (manage workloads/pods) from the restic role specifically.

2. **KSV-0005 (HIGH) — restic securityContext was internally inconsistent.** The previous values.yaml had `runAsNonRoot: true, runAsUser: 65534` + `capabilities.add: [SYS_ADMIN]`. uid 65534 cannot read pod volumes owned by arbitrary uids, so restic would have failed at runtime. Per Velero's documented restic deployment posture (https://velero.io/docs/v1.12/restic/), the node-agent must run as root with SYS_ADMIN. Changed to `runAsNonRoot: false, runAsUser: 0, runAsGroup: 0, fsGroup: 0`, keeping `readOnlyRootFilesystem: true` and the default seccomp profile. The PSS for the `velero` namespace must therefore be `baseline` or `privileged` (NOT `restricted`) — documented in cluster-backup/CHANGELOG.md.

Targeted .trivyignore additions
-------------------------------

The remaining findings are inherent to a cluster-backup tool and cannot be fixed without breaking its function. Each entry in .trivyignore is explained — Trivy ignore-rules are the right vehicle for "this is intentional, here's why":

- KSV-0041 (CRITICAL): Velero must manage Secrets to back them up.
- KSV-0056 (HIGH x3): Velero must manage Services/Endpoints/Ingresses/NetworkPolicies to back up and restore network topology.
- KSV-0005 / KSV-0022 (HIGH): SYS_ADMIN is required for restic's mount/setns ops; this is Velero's documented deployment posture.
- KSV-0012 (MEDIUM): companion of KSV-0005 (run-as-root).
- KSV-0023 (MEDIUM): hostPath on `/var/lib/kubelet/pods` is how restic reads pod volumes. The earlier broader mount of host `/` was already removed in d0c637b.
- KSV-0042 (MEDIUM): Velero must delete pods to restore them.
- KSV-0048 (MEDIUM): Velero must create/update Deployments, StatefulSets, DaemonSets, Jobs, CronJobs.
- KSV-0049 (MEDIUM): Velero must manage ConfigMaps.
- KSV-0125 (MEDIUM): `velero/velero:v1.12.2` is the official upstream image on docker.io. Mirroring to a private registry is tracked as a separate follow-up.

All ignored rules are scoped by .trivyignore comments to the cluster-backup chart only.

Roman's review verified
-----------------------

All 17 items from @romandidomizio's CHANGES_REQUESTED review have been addressed in this branch (d0c637b + ac2e9f6 + this commit). Spot-checked all of them; no outstanding source-level issues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(wordpress-docker): replace .env secrets with Infisical runtime injection

Removes all on-disk .env.prod secret management and replaces it with the Infisical Universal Auth (Machine Identity) pattern, matching the keycloak-docker reference implementation.
Changes:
- copier.yaml: enable_infisical defaults true; prompts for client_id and client_secret instead of legacy infisical_token
- variables.tf.jinja: replace infisical_token + enable_infisical bool with infisical_client_id + infisical_client_secret (sensitive); mysql_password / mysql_root_password gated on not enable_infisical
- main.tf.jinja: templatefile() passes correct vars per enable_infisical
- cloud-init.yaml.jinja: installs Infisical CLI, writes infisical-auth.sh (machine identity), starts stack via infisical run, cron backup authenticated via infisical-auth.sh + infisical run
- terraform.tfvars.example.jinja: removes mysql passwords from Required section; replaces infisical_token with client_id + client_secret
- ansible/deploy.yml.jinja: removes pre_tasks (.env.prod check), removes Upload .env task, converts docker_compose_v2 calls to shell + infisical run
- scripts/deploy.sh.jinja: removes .env.prod check + scp; uses infisical run
- scripts/backup.sh.jinja: removes source .env; uses infisical run for DB dump
- scripts/restore.sh.jinja: removes source .env + dot-env restore; uses infisical run for DB restore and compose up
- docker/.env.prod.example: replaced with Infisical pointer (no passwords)

Compliance: §3.10 Infisical checklist, §3.1 NIST PR.DS, §3.2 CIS 3.11, §3.4 ISO A.8.24, §3.5 SOC2 CC6.7, §3.8 Docker Compose checklist

* feat(wordpress-docker): add pull-prod workflow + fix backup/restore scripts

- Add scripts/pull-prod.sh: pulls production DB + wp-content to local dev stack in one command; reads DB password from docker inspect (not .env) to avoid silent dump failures when credentials diverge
- Fix backup.sh: use docker inspect for DB password instead of sourcing .env — same root cause that caused 0-byte dumps in burnedout-xyz
- Fix restore.sh: add COMPOSE_PROJECT_NAME=burnedout-local, auto URL replacement (siteurl/home) after prod DB import, use mariadb client
- Fix compose.local.yaml: add Caddy service (WP image is FPM-only; missing Caddy made the stack silently unreachable on port 8080)
- Add Caddyfile.local: HTTP-only local Caddy config for FPM proxy
- Propagate all fixes to template (pull-prod.sh.jinja, backup.sh.jinja, restore.sh.jinja, compose.local.yaml.jinja)
- Expand README.md (site + template) with Day-to-Day Operations section covering when/how to use each script

Fixes: silent 0-byte mysqldump when .env password diverges from running container; localhost:8080 returning empty due to missing Caddy

* chore: add Ansible template and documentation files

- RESTORE-RUNBOOK-PROMPT.md: runbook documentation for burnedout.xyz restore
- scripts/manage-droplets.sh: droplet management script (fixed SC2155 shellcheck)
- wordpress-docker/template/ansible/: Ansible configuration, inventory template, and requirements

* fix(wordpress-docker): address code review findings

Critical fixes:
- restore.sh: fix PROJECT_NAME ("burnedout" not "burnedoutxyz") and APP_DIR ("/opt/burnedout" not "/opt/burnedout-xyz") — remote restore was completely broken, targeting nonexistent containers and paths
- restore.sh: use docker inspect for DB credentials instead of sourcing .env — matches backup.sh pattern, works with or without Infisical
- RESTORE-RUNBOOK: replace broken base image (reg.mini.dev/wordpress:latest) with correct private image (reg.mini.dev/1923/wordpress-fluentsmtp:latest)

Medium fixes:
- backup.sh: stop including .env in backup archive — production secrets should not be stored in tarballs (credential exposure vector)
- RESTORE-RUNBOOK: fix false claim that compose.local.yaml sets WORDPRESS_CONFIG_EXTRA (only compose.prod.yaml does)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Auto-PR: feat: add observability infrastructure, SearXNG copier template, and fleet tooling (#26)

* feat: add observability infrastructure, SearXNG copier template, and fleet tooling

Add SigNoz self-hosted observability (signoz-docker/) and OTel Collector agent (otel-agent/) for per-container metrics, log aggregation, and traces across the fleet. Add SearXNG copier template (searxng-docker/) with full IaC, Infisical integration, skinny backups, and browser search playbook. Add fleet scripts for DO agent enablement and OTel deployment. Add AnythingLLM MCP configuration playbook. Fix cloud-init `runcmds:` typo to correct `runcmd:` across all templates. Fix manage-droplets.sh SSH key rotation parsing bug. Add project CLAUDE.md for AI guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(otel): SigNoz Cloud fleet agent Phase 1 (burnedout-xyz verified)

Pivot observability to SigNoz Cloud us2 with per-droplet OTel collector, Infisical runtime secret injection, bootstrap/deploy fleet scripts, and deprecation notes for self-hosted signoz-docker template.

* fix(pr26): address Copilot + reviewer findings on observability work

Resolves all blocking review feedback on PR #26 surfaced by Copilot (rounds 1 + 2) and an independent code review of the diff. Covers correctness bugs, security defaults, and operational gaps. Scope is limited to closing review items — no new features.

Correctness bugs (deploy-blocking):
- signoz scripts (deploy/backup/restore + ansible/deploy.yml.jinja): use `replace('-', '_')` to match `main.tf.jinja` -> `templatefile` -> cloud-init path/volume normalization. Previously the no-separator form (`replace('-', '')`) caused /opt/signozobservability vs /opt/signoz_observability divergence and silent volume-name mismatches in backup/restore.
- cloud-init OTel gateway DSN: switch `password=$${CLICKHOUSE_PASSWORD}` to `password=$${env:CLICKHOUSE_PASSWORD}` so the OTel collector actually expands the env var (bare `${VAR}` is not interpolated by the collector).
- README.md.jinja: fix broken code fence at line 92 ("```nished Phase 1...").
- README.md.jinja: replace undefined `monitoring_*_threshold` references with the actual copier-defined names (`cpu_alert_threshold`, etc.).
- monitoring.tf.jinja: hardcoded `value = 8` for load-5 alert is now driven by a new `load_alert_threshold` copier variable (default 8 = 2x default vCPUs).
- CHANGELOG.md.jinja: drop the stub `[{{ version }}] - {{ date }}` line that referenced undefined copier variables.
- anythingllm/ansible/configure-allm.yml: rewrite to use plain Ansible templating (`{{ var }}`) instead of broken copier-style escapes (`{{ '{{' }} var {{ '}}' }}`). Wrap Docker Go-template (`{{.State.Running}}`) in `{% raw %}...{% endraw %}` so Ansible passes it through. Add `no_log: true` to tasks that handle merged MCP server configs (which can include credentials from `extra_mcp_servers`).
- otel-agent OTLP transport docs: fix gRPC/HTTP mismatch in comments across config.yaml, compose.yaml, deploy-otel-fleet.sh, otel-agent/README.md (the actual exporter is otlphttp).
- scripts/enable-do-agent.sh: drop unused `ALREADY=0` counter.

Security + IaC fixes:
- terraform backend.tf (signoz + searxng): remove invalid `var.*` references in the s3 backend block (backend cannot interpolate variables — init runs before vars are evaluated). Add the working DO Spaces pattern from keycloak-docker/sites/sso.weown.dev/ (skip_* flags) + a new `init.sh.jinja` that bridges terraform.tfvars -> `tofu init -backend-config=`. Also drop the unsupported `lock`/`lock_timeout` block. Moves `spaces_*` variable declarations to variables.tf.jinja and adds placeholders to terraform.tfvars.example.jinja.
- New `ssh_source_cidrs` copier variable (signoz + searxng) drives the SSH firewall rule. Default keeps backward-compat `["0.0.0.0/0", "::/0"]` with a loud help warning to pin to admin IP/32 or VPN range in production.
- signoz cloud-init Infisical CLI install: switch from the deprecated install-cli.sh channel (capped at v0.38 — broken `infisical run`) to the current artifacts-cli.infisical.com apt repo, mirroring the idempotent block in scripts/bootstrap-otel-agent.sh.
- searxng copier.yaml: pin `searxng_image` to a dated release tag (was `:latest` — drift risk on every pull); help text directs operators to Docker Hub for newer tags.
- searxng .gitignore: rewrite the `searxng/settings.yml` ignore rule to the actual generated path `docker/searxng/settings.yml`.
- restore.sh.jinja: validate `BACKUP_NAME` against a strict allowlist regex `^[A-Za-z0-9._-]+$` before forwarding it into the remote ssh `bash -c` command, closing the shell-injection vector flagged by both reviewers.
- otel-agent deploy paths (deploy-otel-fleet.sh + otel-agent/deploy.yml): reject `http://` URLs explicitly in the OTEL_URL normalization case; only `https://` is accepted (config.yaml sets `insecure: false`).
- otel-agent/README.md: add a "Threat model — host-level access by design" section that explicitly documents the root+docker.sock+host-root-mount topology, what `:ro` mitigates vs doesn't, and the mitigations the design relies on.

Operational quality:
- copier.yaml (signoz + searxng): remove `s3` from the `backup_remote_storage` choices — the backup/restore scripts only implement `do-spaces`, so picking `s3` was a silent no-op.
- copier.yaml + variables.tf.jinja + README.md.jinja (signoz): reword `clickhouse_retention_days` description to mark it as informational (actual TTL is set via SigNoz UI), since the value was never wired into any ClickHouse config.
- compose.prod.yaml.jinja + cloud-init.yaml.jinja (signoz + searxng): add `/var/log/caddy:/var/log/caddy` bind mount so Caddy access logs survive container recreation and the otel-agent filelog/caddy receiver can pick them up from the host.

Repo hygiene:
- Add `.claude/` to top-level `.gitignore` (Claude Code local workspace).

Deferred (require design decisions; tracked for follow-up):
- Infisical Machine Identity persistence in user_data / TF state.
  Documented as known risk in CHANGELOG; full remediation requires architectural change (cloud-init bootcmd + tmpfs, systemd credentials, or external bootstrap delivery). Rotate the Machine Identity after destroy.
- `lifecycle { ignore_changes = [user_data] }` masks user_data updates (intentional, to avoid droplet recreation on every var bump). Future: document `tofu taint` workflow in signoz-docker README.
- Docker-socket-proxy in front of otel-agent (significant scope, follow-up PR).
- The legacy `install-cli.sh` Infisical install pattern in `anythingllm-docker/` and `keycloak-docker/` cloud-init.yaml.jinja (out of this PR's scope; same fix applies — separate PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr26): address Copilot round-3 + own review pass

Resolves the 22 new Copilot comments on merge commit 59a3d0a plus additional issues found in an own-review pass over the diff.

New Copilot findings (all addressed):
- signoz-docker/template/README.md.jinja: quick-start used `tofu init` directly, but the S3 backend requires `-backend-config` flags via the new `./init.sh`. Rewrote step 2 to use the init script + documented why (variables not allowed in backend blocks).
- signoz-docker/template/README.md.jinja: step 3 referenced the old positional-arg `deploy-otel-fleet.sh weown-ai <ip>` syntax which was removed in the SigNoz Cloud pivot. Rewrote to use the new flag-based selectors and added context that fleet OTel agents ship to SigNoz Cloud by default (not to this self-hosted gateway).
- signoz-docker/template/scripts/{deploy,restore}.sh.jinja: replaced `143.198.xxx.xxx` example IPs (real DO range) with RFC 5737 documentation ranges (`198.51.100.42`) so the public repo does not normalize references to real infrastructure.
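A minimal sketch of the `OTEL_URL` normalization rule described for deploy-otel-fleet.sh (accept `https://` or scheme-less, reject plain `http://`). The function name `normalize_otel_url`, promoting scheme-less input to `https://`, and rejecting every other scheme are assumptions for illustration, not the script's actual code.

```bash
#!/usr/bin/env bash
# Sketch of the OTEL_URL rule: accept https:// or scheme-less, reject http://.
# Assumptions: scheme-less input gets https:// prepended, and any other
# scheme is also rejected. normalize_otel_url is a hypothetical name.
normalize_otel_url() {
  local url="$1"
  case "$url" in
    '')
      echo "error: OTEL_URL is empty" >&2
      return 1 ;;
    https://*)
      printf '%s\n' "$url" ;;
    http://*)
      # The exporter runs with tls.insecure: false, so plain HTTP cannot work.
      echo "error: plain http:// is not accepted: $url" >&2
      return 1 ;;
    *://*)
      echo "error: unsupported scheme: $url" >&2
      return 1 ;;
    *)
      printf 'https://%s\n' "$url" ;;
  esac
}

# Usage: OTEL_URL=$(normalize_otel_url "$OTEL_URL") || exit 1
```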
- signoz-docker/template/terraform/templates/cloud-init.yaml.jinja: `docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'` would have been interpreted by Copier's Jinja pass as attribute lookups on the root context (likely failing rendering). Wrapped in `{% raw %}...{% endraw %}`.
- signoz-docker/template/terraform/templates/cloud-init.yaml.jinja: retention echo claimed "daily 30d / monthly 12mo / yearly forever" but the implementation only deletes local backups older than 30 days. Rewrote the message to match the actual behavior.
- scripts/bootstrap-otel-agent.sh: Infisical Machine Identity secrets were passed inline as env-var assignments in the ssh command string, which exposes them in the local `ps` listing during the SSH connection. Rewrote to pipe `export VAR=...` (via `printf %q` for safe quoting) through ssh stdin (process substitution) so the secrets never appear in argv.
- otel-agent/README.md: my prior "Threat model" section claimed that `:ro` on `/var/run/docker.sock` prevents Docker API writes. That is incorrect — `:ro` is a bind-mount flag affecting the socket inode, NOT the Docker API protocol. Any process that can `connect(2)` to the socket can issue write API calls (containers/create, exec, --privileged, etc.) regardless of mount mode. Rewrote the section to call out the socket access as effectively root-on-host, list what we actually rely on (image pinning, memory cap, no host-side secrets at rest), and document the docker-socket-proxy mitigation path for future work.
- otel-agent/compose.yaml + otel-agent/README.md "Safety" section: brought the "all mounts read-only ⇒ safe" claim in line with the corrected threat model — fs mounts are tamper-safe, but the docker socket isn't, and we say so explicitly.
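A sketch of the stdin pattern from the bootstrap-otel-agent.sh bullet, assuming bash on both ends. `build_env_preamble` and `remote-bootstrap.sh` are hypothetical names, and this version uses a pipe where the commit describes process substitution. `printf` is a bash builtin and `${!name}` is expanded by the shell itself, so the secret values never become arguments of a child process.

```bash
#!/usr/bin/env bash
# Sketch: emit `export NAME=<quoted value>` lines for the named variables.
# printf and ${!name} are handled inside bash itself, so the values never
# appear in the argv of any child process (and hence not in `ps`).
build_env_preamble() {
  local name
  for name in "$@"; do
    printf 'export %s=%q\n' "$name" "${!name}"
  done
}

# Hypothetical usage: host and script name are placeholders.
#   { build_env_preamble INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET
#     cat remote-bootstrap.sh
#   } | ssh root@198.51.100.42 'bash -s'
```

`%q` produces bash-specific quoting, which is why the remote side is started as `bash -s` rather than relying on the login shell.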
- signoz-docker/template/docker/compose.prod.yaml.jinja + signoz-docker/template/terraform/templates/cloud-init.yaml.jinja: added a comment block on the ZooKeeper service explaining that `ALLOW_ANONYMOUS_LOGIN: "yes"` is intentional for this deprecated self-hosted fallback (port unpublished, only ClickHouse reaches it on `signoznet`); enabling SASL would require synchronized ClickHouse credentials and a new Infisical secret. Operators adopting this for production should enable auth before exposing the stack.

Own-review pass:
- scripts/deploy-otel-fleet.sh: updated the OTEL_URL normalization comment to match the actual behavior (https:// or scheme-less; reject plain http:// because `tls.insecure: false`). The previous comment said "otlphttp requires https:// or http://" which contradicted the reject-http logic.

Not addressed (deferred — same call as round 2):
- Infisical Machine Identity persisted in user_data / Terraform state (architectural change required; tracked in CHANGELOG).
- Copilot's two "configure-allm.yml jinja escapes" comments at line 33 are stale — that file was fixed in a prior commit and currently uses plain `{{ var }}` form. No further action needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(anythingllm): move configure-allm.yml under anythingllm-docker/

The configure-allm.yml playbook uses `docker exec` + `docker inspect`, so it is Docker-specific. The `anythingllm/` directory is reserved for the helm/k8s deployment path; Docker-stack tooling belongs under `anythingllm-docker/` next to the copier template + cloud-init that provision the Docker-based instances this playbook configures.

Used `git mv` to preserve file history. The empty `anythingllm/ansible/` directory has been removed. Updated the CHANGELOG entry to reflect the new location and the rationale for the move. No content changes — only relocation.
Callers (currently none in-repo) referencing `anythingllm/ansible/configure-allm.yml` should switch to `anythingllm-docker/ansible/configure-allm.yml`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Nik <nik.cimino@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: m.shahid <m.shahid538@yahoo.com>
Co-authored-by: Nik Cimino <ncimino@users.noreply.github.com>

* Auto-PR: chore(s004): add baseline copier templates for monitoring and infrastructure scaffolding (#31)

* feat: add observability infrastructure, SearXNG copier template, and fleet tooling

Add SigNoz self-hosted observability (signoz-docker/) and OTel Collector agent (otel-agent/) for per-container metrics, log aggregation, and traces across the fleet. Add SearXNG copier template (searxng-docker/) with full IaC, Infisical integration, skinny backups, and browser search playbook. Add fleet scripts for DO agent enablement and OTel deployment. Add AnythingLLM MCP configuration playbook. Fix cloud-init `runcmds:` typo to correct `runcmd:` across all templates. Fix manage-droplets.sh SSH key rotation parsing bug. Add project CLAUDE.md for AI guidance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(s004): sync live caddy stdout configuration and private registry updates

* chore(s004): add baseline copier templates for monitoring and infrastructure scaffolding

* fix(s004-deployment): address Copilot + reviewer findings — make deploy actually work

Resolves 8 new Copilot comments on the merge commit + 12 findings from an independent reviewer pass on s004-deployment/. The directory was committed as a skeleton with several deployment-blocking bugs.

CRITICAL — deployment-blocking bugs:
- Volume name mismatch (backup.sh, restore.sh, cloud-init backup script): scripts referenced `${PROJECT_NAME}_storage` where `PROJECT_NAME=s004anythingllm` (no underscore), so the resolved name was `s004anythingllm_storage`.
  But `docker/compose.prod.yaml` (and cloud-init's embedded compose) explicitly name volumes `s004_anythingllm_storage`. Every backup tarred an empty anonymous volume; every restore wrote into a phantom volume the app never reads. Hardcoded the actual volume names (`s004_anythingllm_storage`, `s004_anythingllm_caddy_data`) in all three sites.
- Empty Infisical credentials in cloud-init: the `infisical-auth.sh` helper, daily cron wrapper, and runcmd `infisical run` all had literal `INFISICAL_CLIENT_ID=''` / `--projectId=`. terraform/main.tf already plumbs `infisical_client_id`, `infisical_client_secret`, `infisical_project_id`, `infisical_environment` into `templatefile()`, so just reference them as `${infisical_*}`.
- Empty `--projectId=` in scripts/deploy.sh + scripts/restore.sh: workstation scripts now read INFISICAL_PROJECT_ID (and optional INFISICAL_ENV, default `prod`) from the environment with a `:?` guard that prints a clear error.
- Hardcoded `s004.ccc.bot` domain in cloud-init compose env + Caddyfile: switched to `${domain}` so var.domain takes effect.
- Image registry inconsistency: cloud-init pulled `mintplexlabs/anythingllm:latest`, compose used `reg.mini.dev/anythingllm:latest`. Both now use `${anythingllm_image}` from var.anythingllm_image (default `reg.mini.dev/anythingllm:1.7.2`, the WeOwn mirror).
- 23 unescaped shell variables in cloud-init `templatefile()` body (`${JWT_SECRET}`, `${ADMIN_EMAIL}`, `${BACKUP_NAME}`, `${SPACES_BUCKET}`, `${BASH_REMATCH[1]}`, etc.) would have made `tofu plan` error before producing any output. Escaped every uppercase shell var as `$${VAR}`.

HIGH — lessons from PR #26 applied here:
- SSH 0.0.0.0/0 firewall rule → introduced `var.ssh_source_cidrs` (default `["0.0.0.0/0", "::/0"]` with help text directing production to pin to admin/VPN CIDR). Same fix as signoz-docker.
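A hypothetical guard that would have caught the volume-name mismatch above: check that the exact volume name exists before tarring it, so a wrong name fails loudly instead of producing an empty backup. The function name is illustrative; in practice its stdin would come from `docker volume ls --format '{{.Name}}'`.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the exact volume name appears in the
# list of volume names read from stdin (one per line).
require_volume() {
  local want="$1"
  if ! grep -Fxq -- "$want"; then
    echo "error: volume '$want' not found; refusing to back up a phantom volume" >&2
    return 1
  fi
}

# The bug in one line: PROJECT_NAME has no underscore, compose's name: does.
PROJECT_NAME=s004anythingllm
echo "${PROJECT_NAME}_storage"   # s004anythingllm_storage, not s004_anythingllm_storage

# Real use on the droplet (docker CLI assumed to be available):
#   docker volume ls --format '{{.Name}}' | require_volume s004_anythingllm_storage
```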
- Deprecated Infisical install channel (`infisical.com/install-cli.sh`, capped at v0.38 with broken `infisical run` session handling) → rewrote to use the current `artifacts-cli.infisical.com` apt repo with the same idempotent legacy-purge logic from `bootstrap-otel-agent.sh`.
- Caddy access logs written to `/var/log/caddy/` but no host bind mount → added `- /var/log/caddy:/var/log/caddy` to both `docker/compose.prod.yaml` and the cloud-init embedded compose, so otel-agent's filelog/caddy receiver can read them.
- `restore.sh` shell-injection via unvalidated `$BACKUP_NAME` → `[[ ! "$BACKUP_NAME" =~ ^[A-Za-z0-9._-]+$ ]]` guard before the heredoc.
- Real DigitalOcean IP `143.198.xxx.xxx` in usage examples → RFC 5737 `198.51.100.42`.

MEDIUM — code quality:
- `terraform.tfvars.example` had unrendered Copier jinja stubs (`{{ project_name }}`, `{{ enable_skinny_backups | lower }}`, etc.) — operators copying the file to terraform.tfvars would get literal jinja that terraform rejects. Replaced every stub with a real example value matching the variable's default in variables.tf.
- `.gitignore` ignored `.terraform.lock.hcl`, breaking provider reproducibility. Removed the ignore + added an explanatory comment.
- `versions.tf` header still said `# {{ project_name }}` → `s004-anythingllm`.
- `monitoring.tf` declared three alert resources that were always created regardless of `var.enable_monitoring`. Added `count = var.enable_monitoring ? 1 : 0`.
- `scripts/restore.sh` had dead REMOTE/BACKUP_NAME assignments at lines 20-21, overwritten by the arg-count block. Removed.

Not addressed (would expand scope):
- Infisical Machine Identity rendered into user_data → terraform state. Same trade-off as signoz-docker; needs a remote encrypted backend + ideally one-time-use bootstrap secret retrieval.
- `lifecycle { ignore_changes = [user_data] }` means cloud-init fixes require `tofu taint` to deploy. Documented behavior, not changed here.
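The `BACKUP_NAME` allowlist can be factored into a small function, sketched here under the assumption that it runs before the value reaches any ssh command line. Rejecting `.` and `..` is an extra check not described in this commit; the regex alone admits them because it allows dots.

```bash
#!/usr/bin/env bash
# Sketch: validate BACKUP_NAME before it is interpolated into a remote command.
# The regex is the allowlist from restore.sh; rejecting "." and ".." is an
# extra assumption added here because the allowlist permits dots.
validate_backup_name() {
  local name="$1"
  if [[ ! "$name" =~ ^[A-Za-z0-9._-]+$ ]]; then
    echo "error: invalid BACKUP_NAME: '$name'" >&2
    return 1
  fi
  if [[ "$name" == "." || "$name" == ".." ]]; then
    echo "error: BACKUP_NAME may not be '.' or '..'" >&2
    return 1
  fi
}

# Usage: validate_backup_name "$BACKUP_NAME" || exit 1
```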
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(infra): adopt Path C + Layer 2 in s004-deployment; document migration

Resolves the two architectural issues deferred from PR #26 / PR #31:

1. Infisical Machine Identity persisted in terraform state / DO metadata (Layer 2 bootstrap-secret rotation).
2. `lifecycle { ignore_changes = [user_data] }` silently swallowing cloud-init updates (Path C: thin cloud-init + ansible app layer).

Reference implementation lives in s004-deployment/. Repo-wide pattern and per-project migration checklist live in docs/INFRA_BOOTSTRAP_PATTERN.md.

Layer 2 — bootstrap-secret rotation (s004-deployment/):
- New rotate-bootstrap-secret.sh embedded in cloud-init `write_files`. Runs once via runcmd at first boot. Flow:
  1. Log in to Infisical with v1 (from terraform.tfvars → state → cloud-init).
  2. Decode JWT to extract identityId (base64url-safe, handles padding).
  3. POST /api/v1/auth/universal-auth/identities/{id}/client-secrets → v2.
  4. Atomically swap .infisical-auth.env to v2 after verifying that v2 authenticates.
  5. POST .../client-secrets/{v1Id}/revoke to disable v1.
  6. Touch .rotation-complete (idempotent on re-run).

  Net effect: v1 in terraform state + DO droplet metadata is revoked within minutes of provisioning. v2 only ever lives on the droplet filesystem. Best-effort with structured logging — if the Machine Identity lacks self-management permission, the script logs cleanly to /var/log/s004anythingllm-rotation.log and the operator follows a manual runbook in s004-deployment/README.md.

Path C — thin cloud-init + ansible (s004-deployment/):
- Slimmed terraform/templates/cloud-init.yaml. It now handles ONLY first-boot bootstrap: package + Docker install, Infisical CLI install (artifacts-cli apt repo), .infisical-auth.env write, Layer 2 rotation, .bootstrap-complete marker, unattended-upgrades. Removed: compose.yaml, Caddyfile, backup.sh, daily-backup cron, infisical helper, docker pulls, docker compose up.
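Step 2 of the rotation flow (decode the JWT payload with base64url handling and padding) could look roughly like this in bash. `jwt_payload` and `ACCESS_TOKEN` are hypothetical names; the function only decodes the middle segment and does not verify the signature, so it should only be used on a token the script itself just received from Infisical. Pulling `identityId` out of the printed JSON is left to `jq`.

```bash
#!/usr/bin/env bash
# Sketch of step 2: print the decoded JSON payload of a JWT.
# jwt_payload is a hypothetical name. It does NOT verify the signature, so
# only use it on a token this script just received from Infisical itself.
jwt_payload() {
  local seg="$1"
  seg="${seg#*.}"    # drop the header segment
  seg="${seg%%.*}"   # drop the signature segment
  # base64url -> standard base64 alphabet
  seg=$(printf '%s' "$seg" | tr '_-' '/+')
  # restore the padding that base64url strips
  case $(( ${#seg} % 4 )) in
    2) seg+="==" ;;
    3) seg+="=" ;;
    1) echo "error: malformed JWT payload" >&2; return 1 ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Hypothetical usage (jq required):
#   identity_id=$(jwt_payload "$ACCESS_TOKEN" | jq -r '.identityId')
```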
- New ansible/deploy.yml owns the app layer. Uploads compose + Caddyfile + backup.sh, renders the daily backup cron, pulls images, runs `docker compose up -d --remove-orphans`, waits for AnythingLLM health. Pre-tasks assert INFISICAL_PROJECT_ID is set and .bootstrap-complete exists.
- scripts/deploy.sh rewritten as a thin wrapper around `ansible-playbook ansible/deploy.yml -i 'host,'`. Idempotent — re-runnable any time compose/Caddy/scripts change, without touching terraform.

Layer 1 — DO Spaces remote state backend (already standard elsewhere):
- Added terraform/backend.tf + terraform/init.sh + spaces_* vars. State is no longer a plain local JSON file. Same pattern as keycloak-docker/sites/sso.weown.dev/ and signoz-docker (PR #26). Required for Layer 2 to actually reduce risk — without it the v1 in state lives on the operator's workstation indefinitely.

Documentation + migration plan:
- New docs/INFRA_BOOTSTRAP_PATTERN.md (cross-cutting). Explains both problems, both solutions, failure modes, compliance mapping (NIST CSF 2.0, CIS Controls v8, ISO/IEC 27001:2022), per-project migration checklist for signoz-docker / searxng-docker / anythingllm-docker / keycloak-docker / wordpress-docker, a "what Layer 2 does NOT solve" section, and a future-hardening section (Vault wrapping, cloud-IAM mediation, systemd-credentials + TPM).
- Each *-docker README has a MIGRATION PENDING / MIGRATION PARTIAL banner pointing at the pattern doc + s004-deployment reference impl. Created searxng-docker/README.md (was missing).
- s004-deployment/README.md: rewrote Quick Start to reflect Path C workflow, added "Updating the deployment" matrix, added manual rotation runbook.
- s004-deployment/CHANGELOG.md: Unreleased entry.

Migration sequencing recommendation (in INFRA_BOOTSTRAP_PATTERN.md): each *-docker project gets its own focused follow-up PR, ordered by deployment criticality (anythingllm → wordpress → keycloak → signoz → searxng).
Each is ~2-4h plus an operator-scheduled tofu taint + tofu apply against existing droplets.

Not changed in this commit: the *-docker template implementations themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(infra): per-project Migration Status sections + audited status table

Replaces the one-line "MIGRATION PENDING" banners with a per-project Migration Status section in each *-docker/README.md. Each section is a table showing the project's current Layer 1 / Layer 2 / Path C / Infisical CLI state with citations to the files that need work, plus project-specific notes.

The shared 6-step migration checklist + rationale stays in docs/INFRA_BOOTSTRAP_PATTERN.md (single source of truth).

Audit findings (2026-05-25):
- anythingllm-docker: Layer 1 missing, Layer 2 missing, Path C not adopted (no ansible playbook exists), legacy Infisical install. **Highest priority** — source of live deployments.
- wordpress-docker: Layer 1 missing, Layer 2 missing, Path C partial (ansible exists with compose + Caddy + Wordfence upload, cloud-init also embeds the app layer), legacy Infisical install.
- keycloak-docker: Layer 1 partial (template has backend.tf.jinja but no init.sh.jinja; the rendered sites/sso.weown.dev/ subdir has both), Layer 2 missing, Path C partial (site.yml.jinja with community.docker.docker_compose_v2 — most ansible-shaped of any template, but cloud-init still embeds app layer), legacy Infisical install.
- signoz-docker: Layer 1 done (PR #26), Layer 2 missing, Path C partial (ansible exists, cloud-init also embeds), legacy Infisical install (with open ZooKeeper anon-login accepted-risk note).
- searxng-docker: Layer 1 done (PR #26), Layer 2 missing, Path C partial, legacy Infisical install.

Updated docs/INFRA_BOOTSTRAP_PATTERN.md status table to reflect the audit (was less precise — said "Pending" without distinguishing Layer 1 vs Path C state).
The migration sequencing recommendation in the doc is unchanged: anythingllm → wordpress → keycloak → signoz → searxng.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(infra): migrate anythingllm-docker to Path C + Layer 2; relocate s004; add auto DO tagging

Big bundle: completes Option B (migrate the anythingllm-docker template to the canonical bootstrap pattern, then relocate s004 as a generated site of that template), AND adds the automatic DO droplet tagging system across feature scripts + ansible.

Template migration (anythingllm-docker/template/):
- terraform/backend.tf.jinja + terraform/init.sh.jinja added (Layer 1 — DO Spaces remote state with SSE-C, same pattern as signoz-docker).
- terraform/templates/cloud-init.yaml.jinja replaced with the slim Path C version + embedded rotate-bootstrap-secret.sh (Layer 2). Cloud-init now handles only: package install, Docker install, Infisical CLI install (artifacts-cli apt repo — current channel), .infisical-auth.env write, bootstrap-secret rotation, .bootstrap-complete marker, unattended-upgrades. Compose, Caddy, backup script, cron all removed — they live in ansible now.
- terraform/main.tf.jinja: `replace('-', '_')` (was `replace('-', '')` — caused the volume-name mismatch we hit in s004), SSH firewall rule uses `var.ssh_source_cidrs`, `lifecycle { ignore_changes = [user_data, tags] }` so runtime tag mutations stick across `tofu apply` cycles.
- terraform/variables.tf.jinja: adds ssh_source_cidrs + spaces_* trio.
- terraform/terraform.tfvars.example: adds the new vars with placeholders.
- terraform/monitoring.tf.jinja: `count = var.enable_monitoring ? 1 : 0` on each alert so the variable actually gates resources.
- docker/compose.prod.yaml.jinja: image pinned via copier vars, Caddy /var/log/caddy bind mount so otel-agent's filelog/caddy receiver can read access logs.
- docker/Caddyfile.jinja: domain substitution via copier var.
- scripts/deploy.sh.jinja: thin ansible-playbook wrapper.
  Requires INFISICAL_PROJECT_ID env var.
- scripts/backup.sh.jinja: hardcoded volume names use `{{ project_name | replace('-', '_') }}_storage` etc. Wraps the `docker ps --format '{{ .Names }}'` lines in {% raw %}{% endraw %} so copier doesn't try to interpret the Go template syntax.
- scripts/restore.sh.jinja: same volume fix + BACKUP_NAME regex validation before the SSH heredoc + INFISICAL_PROJECT_ID env requirement.
- ansible/deploy.yml.jinja added (Path C app layer): uploads compose + Caddyfile + backup.sh, renders daily cron + logrotate, pulls images, reconciles compose stack, waits for health, then updates DO tags (skinny-backup + commit-<sha>) via the shared helper.
- copier.yaml: adds ssh_source_cidrs variable (json type).
- README.md.jinja: rewrote Quick Start for Path C workflow.
- .gitignore: stop ignoring .terraform.lock.hcl.

s004-deployment/ → anythingllm-docker/sites/s004/:
- `git mv` preserves history per file. Then overlaid the freshly-rendered template output (via `copier copy --data-file s004-answers.yaml`) so the site reflects exactly what the migrated template produces today, with underscore-consistent paths (/opt/s004_anythingllm/, s004_anythingllm_* volumes) that were previously inconsistent.
- All cross-references in INFRA_BOOTSTRAP_PATTERN.md + the 5 *-docker READMEs updated from s004-deployment/... → anythingllm-docker/sites/s004/...

Auto DO tagging (new):
- scripts/tag-droplet.sh — shared helper for tag mutation. Subcommands: add, remove, replace-prefix, set-commit, list. Chainable in a single call. Resolves droplet name → ID via doctl, mutates tags via `doctl compute droplet-action tag/untag --wait`. Idempotent.
- scripts/bootstrap-otel-agent.sh: after each successful bootstrap, looks up the droplet name from the target IP and adds the `otel` tag.
- scripts/deploy-otel-fleet.sh: same — adds `otel` after each successful deploy.
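The effect of the `replace-prefix` subcommand on a tag list can be modelled as a pure function. This is a hypothetical sketch that works on tag names only; the real helper then applies the difference through doctl, which is not shown. Dropping every tag with the prefix before appending the new one is what keeps repeated deploys idempotent.

```bash
#!/usr/bin/env bash
# Hypothetical model of `replace-prefix PREFIX NEW_TAG`: read the current tags
# (one per line) on stdin and print the desired tag set. The real helper then
# applies the difference with doctl; that part is not shown here.
replace_prefix_tags() {
  local prefix="$1" new_tag="$2" tag
  while IFS= read -r tag; do
    # drop blanks and every existing tag carrying the prefix
    [[ -z "$tag" || "$tag" == "$prefix"* ]] && continue
    printf '%s\n' "$tag"
  done
  printf '%s\n' "$new_tag"
}

# Usage: printf '%s\n' s004 otel commit-abc1234 | replace_prefix_tags commit- commit-def5678
# prints s004, otel, commit-def5678 (one per line)
```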
- anythingllm-docker/ansible/configure-allm.yml: after MCP config changes successfully apply, looks up the droplet from inventory_hostname and adds the `searxng-mcp` tag.
- anythingllm-docker/template/ansible/deploy.yml.jinja: on every deploy, invokes tag-droplet.sh with `replace-prefix commit- commit-<sha> add skinny-backup`. Restoring "the last working deployment" then becomes: read the tag, `git checkout <sha>`, re-run deploy.sh.
- terraform/main.tf.jinja `ignore_changes = [tags]` so feature/state tags added at runtime aren't reverted by subsequent `tofu apply` runs.

Documentation (docs/INFRA_BOOTSTRAP_PATTERN.md):
- New "DO tag taxonomy" section documenting the three layers: project tags (terraform-set), feature tags (script-driven — otel, skinny-backup, searxng-mcp), state tags (commit-<sha>, replaced each deploy).
- Project migration status table updated: anythingllm-docker promoted to reference implementation; wordpress now priority 1 follow-up.

Verified the migrated template renders cleanly via `copier copy` — produces the 17 files that match anythingllm-docker/sites/s004/ in structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(anythingllm-docker): Copilot round-4 findings on PR #31 ab65db6

Fixes 7 real findings from Copilot's review of the migration commit (other ~15 were stale — looking at pre-overlay file state or pointing at the deferred Layer 2 limitation already documented). Each fix applied to the TEMPLATE; `sites/s004/` then re-rendered via `copier copy` to keep both in sync.

Real fixes:
- `template/docker/Caddyfile.jinja`: was `output stdout` while the compose service bind-mounts `/var/log/caddy:/var/log/caddy` — the mount was useless. Switched to `output file /var/log/caddy/{{ project_name }}.log` so the file lands on the host where otel-agent's `filelog/caddy` receiver can read it (and survives container recreation).
- `template/README.md.jinja`: backup-storage and migration-procedure examples used `/opt/{{ project_name }}/backups/` (hyphenated, e.g. `/opt/s004-anythingllm/backups/`) but the actual deployment path is `/opt/{{ project_name | replace('-', '_') }}/backups/` (underscore). Updated to underscore form everywhere paths appear.
- `template/README.md.jinja`: restore example was `anythingllm-ai_backup_20260115_120000` (stale placeholder from the pre-migration template). Updated to `{{ project_name }}_backup_…`.
- `template/scripts/backup.sh.jinja`: remote-mode ran `ssh "$host" "$BACKUP_CMDS"` directly — `SPACES_ACCESS_KEY` / `SPACES_SECRET_KEY` were NOT in the inner shell's env, so the `aws s3 cp` upload step would silently no-op. Wrapped the remote bash in `infisical run --projectId=… --env=…` (sources `/opt/<project>/.infisical-auth.env` the same way restore.sh does). Added `INFISICAL_PROJECT_ID:?` guard so remote mode fails fast.
- `template/terraform/templates/cloud-init.yaml.jinja`: re-added `awscli` to the packages list. The slim cloud-init had dropped it during the Path C refactor, but the daily backup cron uses `aws s3 cp` so without awscli the first cron tick fails.
- `template/scripts/deploy.sh.jinja`: `ansible-galaxy collection install community.docker` was unpinned — operators on different workstations could end up with different versions. Pinned to `==3.13.0` with a precise version check.

Re-rendered `anythingllm-docker/sites/s004/` via `copier copy anythingllm-docker/ <tmp> --data-file s004-answers.yaml --defaults --trust`, then overlaid the rendered output onto sites/s004/.

Stale/not-fixed comments (acknowledged):
- Several "volume name mismatch" comments were on the OLD code — current scripts use `s004_anythingllm_storage` matching compose.
- `terraform/main.tf` "Infisical secret in user_data → tfstate" is the deferred Layer 2 limitation (rotation runs at first boot; see docs/INFRA_BOOTSTRAP_PATTERN.md "What Layer 2 does NOT solve").
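The fail-fast guard mentioned for remote mode relies on bash's `${VAR:?message}` expansion. A sketch, with `require_infisical_env` as a hypothetical wrapper and `prod` as the default environment taken from the earlier deploy.sh/restore.sh change in this PR. Note that in a non-interactive shell a failed `:?` expansion exits the whole script, not just the function.

```bash
#!/usr/bin/env bash
# Sketch of the fail-fast guard. require_infisical_env is a hypothetical name.
# In a non-interactive shell a failed ${VAR:?msg} expansion prints msg and
# exits the whole script, which is the intended fail-fast behavior.
require_infisical_env() {
  : "${INFISICAL_PROJECT_ID:?INFISICAL_PROJECT_ID must be set to the Infisical project ID}"
  INFISICAL_ENV="${INFISICAL_ENV:-prod}"   # default per the deploy.sh change
  export INFISICAL_PROJECT_ID INFISICAL_ENV
}

# Usage at the top of a remote-mode code path:
#   require_infisical_env
#   infisical run --projectId="$INFISICAL_PROJECT_ID" --env="$INFISICAL_ENV" -- ...
```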
- `terraform/versions.tf` "no remote state backend" — backend IS in backend.tf (separate file). Terraform merges multiple `terraform {}` blocks across files. False positive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(anythingllm-docker): Copilot round-5 — fix HCL blocker + 8 other real issues

Resolves the 9 actual bugs from Copilot's review of `3e312a4` (other ~19 comments triaged as either stale or false positives — captured below).

CRITICAL — tofu blocker:
- `template/terraform/variables.tf.jinja:58` (and tfvars.example): the `ssh_source_cidrs` default was `{{ ssh_source_cidrs }}`, which Copier renders as Python's list-repr `['0.0.0.0/0', '::/0']` (single quotes) — NOT valid HCL. `tofu plan` would error on "Invalid character". Switched to `{{ ssh_source_cidrs | tojson }}` so Copier emits the JSON form with double quotes (which HCL parses as a list).

Files-not-being-rendered-by-Copier blocker:
- `_templates_suffix` defaults to `.jinja` in Copier, so files without that suffix in template/ were copied as-is — including their `{{ ... }}` template syntax. The rendered s004 site had unrendered placeholders in `.gitignore`, `terraform/versions.tf`, and `terraform/terraform.tfvars.example`. Renamed all three to add the `.jinja` suffix.

Real correctness bugs:
- `anythingllm-docker/ansible/configure-allm.yml`: the tag step did `ip="{{ inventory_hostname }}"` then looked up the droplet by PublicIPv4. With the documented `-i 'root@<ip>,'` inventory pattern, `inventory_hostname` is `root@<ip>`, not `<ip>`, so the doctl lookup always failed silently. Added `ip="${target##*@}"` to strip the prefix.
- `template/scripts/backup.sh.jinja` remote-mode: was `bash -c '$BACKUP_CMDS'` where BACKUP_CMDS contains the literal `'table {{.Names}}...'` directives. Outer single quotes break on the inner single quotes. Restructured to invoke the droplet's own `/opt/<project>/backup.sh` (uploaded by ansible) inside `infisical run`.
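A sketch of the inventory-host fix, plus a lookup over the output of `doctl compute droplet list --format ID,PublicIPv4 --no-header`, read from stdin so it can be exercised without doctl. Both function names are hypothetical; the awk filter prints every match, so a caller that needs exactly one droplet still has to count the results.

```bash
#!/usr/bin/env bash
# Sketch of the inventory fix: `-i 'root@<ip>,'` makes inventory_hostname
# "root@<ip>", so strip everything up to the last "@" before looking up the IP.
host_from_inventory() {
  local target="$1"
  printf '%s\n' "${target##*@}"
}

# Hypothetical lookup over `doctl compute droplet list --format ID,PublicIPv4 --no-header`
# output read from stdin. Prints every matching ID; the caller must still
# check that exactly one came back.
droplet_ids_for_ip() {
  local ip="$1"
  awk -v ip="$ip" '$2 == ip { print $1 }'
}

# Usage:
#   ip=$(host_from_inventory "root@198.51.100.42")
#   doctl compute droplet list --format ID,PublicIPv4 --no-header | droplet_ids_for_ip "$ip"
```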
- `template/scripts/restore.sh.jinja` remote-mode: same quoting issue + sourced `/opt/<project>/infisical-auth.sh` (a script that doesn't exist; cloud-init writes `.infisical-auth.env`). Restructured the same way: source `.infisical-auth.env`, run `infisical login`, then exec `/opt/<project>/restore.sh "$BACKUP_NAME"` under `infisical run`.
- `template/terraform/main.tf.jinja:36` tag-comment listed `anythingllm/ansible/configure-allm.yml` (no such file); the playbook lives at `anythingllm-docker/ansible/configure-allm.yml`. Fixed.
- `template/docker/compose.prod.yaml.jinja`: `LLM_PROVIDER` and `VECTOR_DB` were hardcoded to "openrouter"/"lancedb" while the template still prompts for `llm_provider` and `vector_db` Copier vars (and passes them through Terraform). Those Copier knobs were dead. Replaced with `{{ llm_provider }}` / `{{ vector_db }}`.

Stale/false-positive comments (verified, not fixed):
- Caddyfile/compose hardcoded `s004.ccc.bot` — correct for the SITE.
- main.tf:25 "Infisical secret in user_data" — deferred Layer 2.
- versions.tf:6 "no remote state backend" — false positive; backend IS in backend.tf (separate file).
- CHANGELOG anythingllm_image — already `reg.mini.dev/anythingllm:1.7.2`.
- cloud-init jq `$$V2` — terraform escape, becomes `$V2` post-templatefile inside single-quoted jq script (jq's `--arg V2` definition).
- Volume name mismatch comments — scripts already use the explicit `s004_anythingllm_storage` matching compose `name:` declarations.

Re-rendered sites/s004/ from the updated template via copier; all 9 fixes verified in the rendered output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr31): independent-review findings — restore auto-tagging + harden rotation script

Final pre-merge pass from an independent code-review agent caught one HIGH bug that broke the auto-tagging feature this PR was meant to add, plus three LOW hardenings. Applied to the template and re-rendered sites/s004/.
HIGH (auto-tagging was silently dead on arrival):
- `template/ansible/deploy.yml.jinja:202` used `playbook_dir | dirname | dirname | dirname` to find scripts/tag-droplet.sh. But the layout is `<repo-root>/anythingllm-docker/sites/<name>/ansible/deploy.yml`, so three dirname operations land at `<repo-root>/anythingllm-docker/`, not the repo root. With `failed_when: false`, the broken tagging step never failed, so the `skinny-backup` and `commit-<sha>` tags silently never appeared on droplets. Added one more `| dirname` to walk up to the repo root. The sibling task in anythingllm-docker/ansible/configure-allm.yml sits two levels shallower and was already correct (dirname × 2).

LOW (rotation-log hardening):
- cloud-init.yaml.jinja was logging the full Infisical API response on v2-mint failure. Replaced with `jq -r '.message // .error // .'` (truncated with `head -c 500`) so any future API change that echoes request fragments doesn't persist the bearer token.
- Pre-chmod the rotation log to 0600 BEFORE writing so it is never world-readable, even on failure paths.

LOW (config knob clarity):
- `INFISICAL_HOST="$${INFISICAL_HOST:-https://app.infisical.com}"` looked overridable, but cloud-init `runcmd:` runs with a minimal environment, so the override could never actually be set externally. Dropped the indirection and inlined the SaaS URL.

LOW (tag-droplet.sh duplicate-name safety):
- Previously did `awk … {print $2; exit}`, so the first match won on duplicate droplet names. Now counts matches and errors out cleanly if there are 0 or more than 1.

Reviewer findings explicitly NOT fixed (purely cosmetic):
- U+21BA arrow in usage text (Finding 6)
- Per-project README status drift risk (Finding 5)

Re-rendered sites/s004/ via copier; the HIGH fix was verified in the rendered ansible/deploy.yml and the LOW fixes in sites/s004/terraform/templates/cloud-init.yaml. This is the merge-ready state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pr31): Copilot round-6 — 3 real bugs + 1 cosmetic

Triaged 25 Copilot comments on `0993c43`.
Verified ~20 as stale (Copilot re-flagged them from an older state; the current code is correct). 4 actionable:

REAL (would break first deploy):
- `template/ansible/deploy.yml.jinja:156`: was using `community.docker.docker_image`, which requires the python3-docker Python SDK on the target host. The slim cloud-init only installs Docker itself, not the SDK. Switched to `community.docker.docker_image_pull` (CLI-based, no SDK needed). Available since community.docker 3.4; we pin 3.13.0.

REAL (silently-broken config knob):
- `template/scripts/backup.sh.jinja` + `restore.sh.jinja`: were hardcoding `REMOTE_STORAGE="do-spaces"` while `copier.yaml` offers …
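Several of the fixes in the squashed commit bodies above come down to small string and path manipulations. The Python sketch below reproduces those behaviors for illustration only; it is not the template's code, and every helper name is hypothetical. Assumptions: Copier's plain `{{ var }}` output equals Python's `str()` of the list, Jinja's `tojson` matches `json.dumps` for values without HTML-sensitive characters, the droplet listing is a hypothetical `<name> <id>` format, and `summarize_error` only approximates jq (jq pretty-prints non-string output). The quoting helper models how the remote shell tokenizes the command once `$BACKUP_CMDS` is substituted; the actual fix went a different way (calling the droplet's own backup.sh), so `shlex.quote` appears here only as a contrast.

```python
import json
import posixpath
import shlex


def render_cidrs(cidrs: list[str]) -> tuple[str, str]:
    """Return (plain `{{ var }}` output, `{{ var | tojson }}` output) for a list."""
    # Plain interpolation calls str(): Python's single-quoted repr, which HCL rejects.
    # tojson emits JSON with double quotes, which HCL accepts as a list literal.
    return str(cidrs), json.dumps(cidrs)


def host_from_inventory(target: str) -> str:
    """Mirror the shell expansion ${target##*@}: drop the longest prefix ending in '@'."""
    return target.rsplit("@", 1)[-1]


def walk_up(path: str, levels: int) -> str:
    """Apply Ansible's `dirname` filter `levels` times to a POSIX path."""
    for _ in range(levels):
        path = posixpath.dirname(path)
    return path


def summarize_error(raw: str, limit: int = 500) -> str:
    """Rough analogue of `jq -r '.message // .error // .'` followed by `head -c 500`."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return ""  # jq would fail on non-JSON input and print nothing on stdout
    if isinstance(body, dict):
        for key in ("message", "error"):
            value = body.get(key)
            if value is not None and value is not False:  # jq's `//` skips null and false
                body = value
                break
    text = body if isinstance(body, str) else json.dumps(body)
    # head -c counts bytes, so truncate bytes and drop any split multi-byte character
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def unique_droplet_id(listing: str, name: str) -> str:
    """Pick the ID for `name` from hypothetical "<name> <id>" lines; refuse 0 or >1 matches."""
    matches = [
        cols[1]
        for cols in (line.split() for line in listing.splitlines())
        if len(cols) >= 2 and cols[0] == name
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one droplet named {name!r}, found {len(matches)}")
    return matches[0]


def remote_argv(cmds: str, quote: bool) -> list[str]:
    """Tokenize `bash -c '<cmds>'` as a POSIX shell would once <cmds> is substituted."""
    arg = shlex.quote(cmds) if quote else "'" + cmds + "'"
    return shlex.split("bash -c " + arg)
```

For example, `remote_argv("docker ps --format 'table {{.Names}}'", quote=False)` splits into four words (`bash`, `-c`, `docker ps --format table`, `{{.Names}}`), so the format string never reaches `docker` intact; with `quote=True` the whole command survives as one argument. Likewise, `walk_up` shows why three `dirname` steps from `.../sites/<name>/ansible` stop at `anythingllm-docker` and four reach the repo root.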
🤖 Automated Pull Request, authored by `weown-bot` (ecosystem service account)

- Opened by: @mshahid538
- Last pushed by: @ncimino
- Branch: `feature/wp-hardening-shd` → `main`
- Contributors on this branch:
📋 Human Review Checklist: NIST CSF 2.0 Functions

Review per the 6 NIST CSF Functions. Frameworks referenced: NIST CSF 2.0, CIS Controls v8 IG1, CSA CCM v4, ISO/IEC 27001:2022, SOC 2, ISO/IEC 42001:2023. See `docs/COMPLIANCE_ROADMAP.md`.

🏛️ Govern (GV)
- … (`.github/CODEOWNERS`)

🔍 Identify (ID)
- … (`.github/SECURITY_ASSESSMENT.md`)

🛡️ Protect (PR)
- … `--from-literal`, never `/tmp`, always `$(mktemp)` (ISO A.8.24)
- … `restricted` (NIST PR.IP, CIS 4)

🕵️ Detect (DE)
- … (`livenessProbe` + `readinessProbe`) configured

🚨 Respond (RS)
- … (`.github/INCIDENT_RESPONSE.md`)

♻️ Recover (RC)

📚 Documentation & Versioning
- `CHANGELOG.md` updated (per-directory or repo-level `/CHANGELOG.md`)
- #WeOwnVer version bumped per `docs/VERSIONING_WEOWNVER.md`

📝 Recent Commits (full bodies for Copilot context)
cc891af fix(values): resolve dead config, env var duplicates, and size mismatch
Author: Nik
Date: Mon May 18 16:04:47 2026 -0600
- … `extraEnvVars` (`deployment.yaml` only reads the top-level key)
- … when `redis.enabled=true`) and `WORDPRESS_CONFIG_EXTRA` (already generated from `wordpressExtraWpConfigContent`; the duplicate silently overwrote the more complete template version)
- … hosts from `wordpress.domain`, not the tls list)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
63cf42a feat(ingress): expand edge block list and add security response headers
Author: Nik
Date: Fri May 15 22:51:47 2026 -0600
Addresses PR #19 review items 11 and 12:
- … `debug.log`, and `.svn` at the nginx edge (defense-in-depth)
- … `Permissions-Policy`, and `Content-Security-Policy` via `configuration-snippet`
Chart version 3.3.0 → 3.3.1.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
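The commit body above survives only in fragments, but the review at the top of this PR names the paths the edge block list should cover (xmlrpc, wp-config backups, readme/license, debug.log, .svn), and commit 1ece74f blocks `.user.ini`. As an illustration only, the Python sketch below checks request paths against a hypothetical deny pattern in that spirit. The pattern, the backup suffixes, and the `is_blocked` helper are assumptions, not the chart's actual `server-snippet`. nginx evaluates `location ~*` regexes case-insensitively, so the sketch uses `re.IGNORECASE`.

```python
import re

# Hypothetical deny pattern; NOT the chart's actual server-snippet regex.
BLOCKED = re.compile(
    r"""
    (?:^|/)(?:
        xmlrpc\.php
      | readme\.html
      | license\.txt
      | debug\.log
      | \.user\.ini
      | wp-config\.php(?:\.(?:bak|old|orig|save|swp)|~)   # backup copies only
    )$
    | /\.svn(?:/|$)                                     # anything under a .svn directory
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_blocked(request_uri: str) -> bool:
    """Return True if the path part of request_uri should be denied at the edge."""
    path = request_uri.split("?", 1)[0]  # nginx matches locations against the path, not the query
    return BLOCKED.search(path) is not None
```

Anchoring each name to a preceding `/` keeps the pattern from catching unrelated files such as `/blog/my-readme.html`, while `wp-config.php` itself is left to PHP in this sketch.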
8249072 fix(wordpress/helm): address PR #15 review items 10-12
Author: Nik
Date: Wed May 13 23:49:36 2026 -0600
- … `.Values.ingress.tls[0].secretName` so per-site overrides (`burnedout-tls`, `ptoken-tls`) actually take effect. Falls back to `wordpress-tls` when `.Values.ingress.tls` is a map or unset (preserves the default and the TLS hardening map).
- … `wordfence-waf.php` behind `.Values.wordpress.wordfence.enabled` (default false) to avoid PHP warnings when the plugin is not installed.
- … behind `.Values.ingress.serverSnippet.enabled` (default true) so the chart can deploy on hardened controllers that set `allow-snippet-annotations: false`.
- … `ingress.serverSnippet.enabled`.
Verified: helm template renders correctly with default,
values-burnedout, and values-ptoken; helm lint clean.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
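The three value-driven switches this commit describes can be modeled on plain dictionaries. The Python sketch below is an illustration only: the function names are hypothetical, and in the chart this logic lives in Helm templates (where a check such as Sprig's `kindIs "slice"` would typically tell the per-site list apart from the default map). It shows the intended behavior: take `ingress.tls[0].secretName` when `ingress.tls` is a list, fall back to `wordpress-tls` when it is a map or unset, emit the `auto_prepend_file` line only when `wordpress.wordfence.enabled` is true (default false), and keep the server snippet unless `ingress.serverSnippet.enabled` is false (default true).

```python
from typing import Any

DEFAULT_TLS_SECRET = "wordpress-tls"
WORDFENCE_PREPEND = "auto_prepend_file = '/var/www/html/wordfence-waf.php'"


def _section(values: dict[str, Any], key: str) -> dict[str, Any]:
    """Return a nested values section, treating a missing or null key as empty."""
    section = values.get(key)
    return section if isinstance(section, dict) else {}


def tls_secret_name(values: dict[str, Any]) -> str:
    tls = _section(values, "ingress").get("tls")
    # Per-site files: Kubernetes-style list, e.g. [{"secretName": "...", "hosts": [...]}]
    if isinstance(tls, list) and tls and isinstance(tls[0], dict) and tls[0].get("secretName"):
        return tls[0]["secretName"]
    # Chart default: a map of TLS hardening settings (protocols/ciphers), or unset
    return DEFAULT_TLS_SECRET


def php_ini_extra(values: dict[str, Any]) -> list[str]:
    wordfence = _section(_section(values, "wordpress"), "wordfence")
    return [WORDFENCE_PREPEND] if wordfence.get("enabled", False) else []


def server_snippet_enabled(values: dict[str, Any]) -> bool:
    return bool(_section(_section(values, "ingress"), "serverSnippet").get("enabled", True))
```

The list branch only helps if the per-site list actually reaches the template; the review at the top of this PR notes that Helm refuses to merge a list over the chart's default `ingress.tls` map.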
0318268 docs: apply markdownlint autofixes for PR #15 CI
Author: Nik
Date: Wed May 13 23:35:57 2026 -0600
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
d514af5 Merge remote-tracking branch 'origin/main' into feature/wp-hardening-shd
Author: romandidomizio
Date: Wed May 13 13:07:40 2026 -0600
488ebfb fix: resolve yamllint line endings
Author: m.shahid
Date: Wed May 6 23:03:36 2026 +0500
52bc0a2 fix: resolve yamllint line endings and comment indentation
Author: m.shahid
Date: Wed May 6 22:52:40 2026 +0500
c073eb0 udated version bump
Author: m.shahid
Date: Wed May 6 22:42:50 2026 +0500
994c8be fix: resolve trivy security scan and linting issues
Author: m.shahid
Date: Wed May 6 22:31:06 2026 +0500
30b5355 Merge branch 'feature/wordpress-docker-copier-template' into feature/wp-hardening-shd
Author: Nik
Date: Tue May 5 16:28:43 2026 -0600
b98ac28 chore: bump helm chart version to 3.2.7
Author: m.shahid
Date: Thu Apr 23 14:46:43 2026 +0500
8d44f58 chore(helm): implement multi-site values strategy for burnedout and ptoken
Author: m.shahid
Date: Thu Apr 23 14:43:00 2026 +0500
1ece74f feat: codify PHP limits and block .user.ini per Task D152 & #264
Author: m.shahid
Date: Thu Apr 23 14:38:47 2026 +0500
🔍 Copilot AI Review: Copilot is configured to auto-request review for bot-authored PRs. If an auto-created PR opens without an initial Copilot review, push a follow-up commit to the same open PR (`review_on_push: true`) to trigger review automatically.

👥 Required Reviewers: 1 human approval enforced by branch protection. … requested automatically.

📚 Review Guidelines: `.github/copilot-instructions.md` (phase-aware compliance directives)

🛠️ Workflow Operations: `.github/workflows/README.md`

Auto-generated by `.github/workflows/auto-pr-to-main.yml`