Update Documentation for Vault Cluster Setup with Security Considerations by Smana · Pull Request #76 · Smana/cloud-native-ref

Smana · 2024-01-08T18:57:49Z

Type

Documentation

Description

This PR primarily updates the documentation for the Vault cluster setup. The most significant changes include:

The 'Architecture' section has been removed from the README.
A new 'Security Considerations' section has been added, providing advice on how to enhance the security of the Vault setup.
The 'High Availability' section has been expanded with more detailed explanations of the architectural decisions.
The 'Getting Started' section has been slightly modified.
A minor change has been made to the instructions for building and importing the full chain bundle in the management README.

Changes walkthrough

Relevant files

Documentation

| File | Changes | Lines |
| --- | --- | --- |
| README.md (`terraform/vault/cluster/README.md`) | The changes in this file include the removal of the 'Architecture' section and the addition of a 'Security Considerations' section. The 'High Availability' section has been expanded with more detailed explanations of the architectural decisions. The 'Getting Started' section has also been slightly modified. | +17/-14 |
| README.md (`terraform/vault/management/README.md`) | A minor change has been made to the instructions for building and importing the full chain bundle. The user is now instructed to navigate to the 'terraform/vault/management' directory before executing the commands. | +1/-0 |

✨ Usage guide:

Overview:
The describe tool scans the PR code changes and generates a description for the PR: title, type, summary, walkthrough, and labels. The tool can be triggered automatically every time a new PR is opened, or invoked manually by commenting on a PR.

When commenting, to edit configurations related to the describe tool (pr_description section), use the following template:

/describe --pr_description.some_config1=... --pr_description.some_config2=...

With a configuration file, use the following template:

[pr_description]
some_config1=...
some_config2=...
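The two templates address the same settings. The hypothetical Python function below illustrates this as a minimal sketch. `parse_overrides` is illustrative only and is not PR-Agent's parser. It turns a comment such as `/describe --pr_description.some_config1=...` into the section/key/value layout of the configuration-file template. It assumes flags are whitespace-separated and keeps every value as a plain string.

```python
def parse_overrides(comment: str) -> dict[str, dict[str, str]]:
    """Hypothetical sketch: map '/tool --section.key=value ...' to {section: {key: value}}.

    Not PR-Agent's implementation; values are kept as plain strings.
    """
    tokens = comment.split()
    config: dict[str, dict[str, str]] = {}
    for token in tokens[1:]:  # tokens[0] is the command itself, e.g. '/describe'
        if not token.startswith("--") or "=" not in token:
            continue  # skip bare flags such as '--extended'
        dotted, value = token[2:].split("=", 1)  # only the first '=' separates key and value
        section, _, key = dotted.partition(".")
        if not key:
            continue  # not of the form --section.key=value
        config.setdefault(section, {})[key] = value
    return config


if __name__ == "__main__":
    print(parse_overrides("/describe --pr_description.some_config1=a --pr_description.some_config2=b"))
```

A flag that sets `some_config1` under `pr_description` corresponds to a `some_config1=...` line under the `[pr_description]` section of the configuration file.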

Enabling/disabling automation:
When you first install the app, the default mode for the describe tool is `pr_commands = ["/describe --pr_description.add_original_user_description=true --pr_description.keep_original_user_title=true", ...]`. This means the `describe` tool will run automatically on every PR, keep the original title, and add the original user description above the generated description.
Markers are an alternative way to control the generated description, giving maximal control to the user. If you set `pr_commands = ["/describe --pr_description.use_description_markers=true", ...]`, the tool will replace every marker of the form `pr_agent:marker_name` in the PR description with the relevant content, where `marker_name` is one of the following:
`type`: the PR type.
`summary`: the PR summary.
`walkthrough`: the PR walkthrough.
Note that when markers are enabled, if the original PR description does not contain any markers, the tool will not alter the description at all.
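The marker rule can be illustrated with a short, hypothetical Python sketch. `fill_markers` and its regular expression are assumptions for illustration, not PR-Agent's code. The function replaces each `pr_agent:type`, `pr_agent:summary`, or `pr_agent:walkthrough` marker with generated text. It returns the description untouched when no marker is present.

```python
import re

# Hypothetical pattern for the three documented marker names, e.g. "pr_agent:summary".
MARKER_RE = re.compile(r"pr_agent:(type|summary|walkthrough)\b")


def fill_markers(description: str, generated: dict[str, str]) -> str:
    """Replace each marker with its generated section (sketch, not PR-Agent's code).

    If the description contains no markers at all, it is returned unchanged.
    A marker with no generated content is left in place.
    """
    if not MARKER_RE.search(description):
        # Mirrors the documented rule: no markers means no changes.
        return description
    return MARKER_RE.sub(lambda m: generated.get(m.group(1), m.group(0)), description)


if __name__ == "__main__":
    desc = "Type: pr_agent:type\nSummary: pr_agent:summary"
    print(fill_markers(desc, {"type": "Documentation", "summary": "Adds security notes"}))
```

This sketch leaves an ungenerated marker in place rather than deleting it, so missing content stays visible in the description.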
Custom labels:
The default labels of the `describe` tool are quite generic: [`Bug fix`, `Tests`, `Enhancement`, `Documentation`, `Other`]. If you specify custom labels in the repo's labels page or via a configuration file, you can get labels tailored to your use cases. Examples of custom labels:
`Main topic:performance` - pr_agent:The main topic of this PR is performance
`New endpoint` - pr_agent:A new endpoint was added in this PR
`SQL query` - pr_agent:A new SQL query was added in this PR
`Dockerfile changes` - pr_agent:The PR contains changes in the Dockerfile
...
The list above is eclectic and aims to give an idea of different possibilities. Define custom labels that are relevant for your repo and use cases. Note that labels are not mutually exclusive, so you can add multiple label categories. Make sure to provide a proper title and a detailed, well-phrased description for each label, so the tool will know when to suggest it.
More PR-Agent commands:
To invoke the PR-Agent, add a comment using one of the following commands:
/review: Request a review of your Pull Request.
/describe: Update the PR title and description based on the contents of the PR.
/improve [--extended]: Suggest code improvements. Extended mode provides higher-quality feedback.
/ask <QUESTION>: Ask a question about the PR.
/update_changelog: Update the changelog based on the PR's contents.
/add_docs 💎: Generate docstrings for new components introduced in the PR.
/generate_labels 💎: Generate labels for the PR based on the PR's contents.
/analyze 💎: Automatically analyze the PR and present a changes walkthrough for each component.
See the tools guide for more details. To list the possible configuration parameters, add a /config comment.

See the describe usage page for a comprehensive guide on using this tool.

github-actions · 2024-01-08T18:58:57Z

PR Analysis

🎯 Main theme: Updating documentation for Vault cluster setup and adding a new section on security considerations.
📝 PR summary: This PR updates the documentation for setting up a Vault cluster, including changes to the architecture section and the addition of a new section on security considerations. The changes provide more detailed explanations of the architectural decisions and security measures implemented in the Vault cluster setup.
📌 Type of PR: Documentation
🧪 Relevant tests added: No
⏱️ Estimated effort to review [1-5]: 1, because the PR is focused on documentation updates and does not involve any changes to the codebase.
🔒 Security concerns: No

PR Feedback

💡 General suggestions: The PR is well-structured and provides valuable information on the architecture and security considerations for setting up a Vault cluster. It would be beneficial to provide more context or explanation for some of the security measures mentioned, especially for readers who may not be familiar with these concepts.

🤖 Code feedback:

relevant file	terraform/vault/cluster/README.md
suggestion	It would be helpful to provide more context or explanation for the security considerations listed. For example, explain why it's important to keep the Root CA offline, or what the potential risks are if SSM is not disabled once the cluster is operational. [medium]
relevant line	* Keep the Root CA offline.

relevant file	terraform/vault/cluster/README.md
suggestion	The link to the external project for hardened AMIs is a good resource. However, it would be beneficial to briefly describe what a hardened AMI is and why it's recommended for use. [medium]
relevant line	<a href=https://github.com/konstruktoid/hardened-images) from @konstruktoid. An Ubuntu AMI from Canonical is used by default.>* Use hardened AMIs, such as those built with [this project

✨ Usage guide:

Overview:
The review tool scans the PR code changes and generates a PR review. The tool can be triggered automatically every time a new PR is opened, or invoked manually by commenting on any PR.
When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...

With a configuration file, use the following template:

[pr_reviewer]
some_config1=...
some_config2=...

Utilizing extra instructions:
The `review` tool can be configured with extra instructions. These guide the model toward feedback tailored to the needs of your project. Be specific, clear, and concise in the instructions; with extra instructions, you are the prompter. Specify the relevant sub-tool and the aspects of the PR that you want to emphasize. An example of extra instructions:

[pr_reviewer] # /review #
extra_instructions="""
In the code feedback section, emphasize the following:
- Does the code logic cover relevant edge cases?
- Is the code logic clear and easy to understand?
- Is the code logic efficient?
...
"""

Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.
How to enable/disable automation:
When you first install the PR-Agent app, the default mode for the `review` tool is `pr_commands = ["/review", ...]`. This means the `review` tool will run automatically on every PR with the default configuration. Edit this field to enable or disable the tool, or to change the configuration it uses.
About the 'Code feedback' section:
The `review` tool provides several types of feedback, one of which is code suggestions. If you are only interested in code suggestions, use the `improve` feature instead: it is dedicated to code suggestions and usually gives better results. Use the `review` tool if you want more comprehensive feedback, which includes code suggestions as well.
Auto-labels:
The `review` tool can auto-generate two specific types of labels for a PR:
a `possible security issue` label, which detects possible security issues (`enable_review_labels_security` flag)
a `Review effort [1-5]: x` label, where x is the estimated effort to review the PR (`enable_review_labels_effort` flag)
Extra sub-tools:
The `review` tool provides a collection of possible types of feedback about a PR. Review the available options and choose the ones relevant to your use case. Some of the features that are disabled by default are quite useful and should be considered for enabling, for example `require_score_review`, `require_soc2_review`, `enable_review_labels_effort`, and more.

See the review usage page for a comprehensive guide on using this tool.

Smana · 2024-01-08T18:59:10Z

/describe

github-actions · 2024-01-08T19:00:13Z

PR Description updated to latest commit (a623eab)

…L composition, per-claim CNP)

Four bugs surfaced once OpenWebUI was wired into the AI gateway end-to-end (commit 556e7e1b). Each is individually small, but all four fixes were needed to make the chat UI return real completions instead of timeouts and 4xx errors:

1. **CEC backend Service port collision.** SR exposes 50051 (ext_proc gRPC) and 8080 (HTTP API). With `backendServices: [{name: vllm-semantic-router}]`, Cilium pushed BOTH endpoints into the EDS cluster. As a result, cilium-envoy would intermittently dial 8080 for ext_proc and get back HTTP/1.1 instead of an ext_proc gRPC stream. This surfaced as `upstream connect error or disconnect/reset before headers. reset reason: protocol error`. `failure_mode_allow: true` did not save the request: the ext_proc passthrough returned 403. Fix: pin the cluster to 50051 with `number: ["50051"]`. The four xplane-* clusters were also pinned to 8000 for symmetry and Hubble clarity.
2. **`llm-router` Service had no endpoints.** The selector targeted `app.kubernetes.io/name: vllm-semantic-router`, but the chart actually labels the SR pod `semantic-router`. CEC service-redirect itself doesn't need endpoints, but the egress-policy plane on the source side does. Clients with restrictive egress CNPs (default-deny + `toEndpoints`) couldn't resolve a destination identity, and Cilium's L7 filter returned a plain `Access denied` 403. Fix: switch the selector to `semantic-router` so endpoints exist and the policy decision becomes deterministic.
3. **OpenWebUI egress was scoped to a single label.** With CEC service-redirect, the actual upstream is whichever vLLM Service SR picks (any of 4 today, more in future), not SR. Listing each label by hand re-couples the UI to the model fleet. Fix: widen to `entities: cluster` on TCP 80 / 8000 / 8080. This is acceptable for a chat UI workload (restricted PSS, scoped IAM, `world` egress already capped at 443).
4. **`_defaultIngress` cross-namespace match in the InferenceService composition.** Cilium CNP `fromEndpoints.matchLabels` defaults to the policy's own namespace when no `io.kubernetes.pod.namespace` key is set. The default-deny + `xplane-openwebui` allow therefore only matched pods in the llm namespace (where SR + promptfoo live). Adding the `apps` namespace label makes the rule actually match the real OpenWebUI pod and unblocks chat completions through the gateway.

Plus the workaround for the still-using-the-old-composition issue (task #76): the four InferenceService claim manifests (phi4-mini, qwen3-8b, deepseek-r1-distill-qwen3-8b, llamaguard3-1b) get a `spec.networkPolicies.ingress` override that mirrors the new KCL composition's `_defaultIngress`. Drop the per-claim blocks once the next composition version is published.

Verified live with a UI prompt: POST /v1/chat/completions {"model":"auto","messages":[...]} -> HTTP 200, x-vsr-selected-model: xplane-phi4-mini, body: "2+2 equals 4."
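Items 2 and 4 are both selector-matching problems. The hypothetical Python model below is not Cilium or Kubernetes code. It captures `matchLabels` semantics, where every key/value pair must be present on the pod, and the namespace defaulting described in item 4. The `app: openwebui` label and the namespace names in the usage example are illustrative assumptions.

```python
NS_KEY = "io.kubernetes.pod.namespace"


def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value pair must appear on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def cnp_from_endpoints_matches(policy_ns: str, match_labels: dict[str, str],
                               pod_ns: str, pod_labels: dict[str, str]) -> bool:
    """Hypothetical model of item 4: with no explicit namespace key,
    the selector is implicitly scoped to the policy's own namespace."""
    effective = dict(match_labels)
    effective.setdefault(NS_KEY, policy_ns)
    # The pod's namespace is treated as just another label during matching.
    return matches(effective, {**pod_labels, NS_KEY: pod_ns})


if __name__ == "__main__":
    sr_pod = {"app.kubernetes.io/name": "semantic-router"}
    print(matches({"app.kubernetes.io/name": "vllm-semantic-router"}, sr_pod))  # False: no endpoints
    print(matches({"app.kubernetes.io/name": "semantic-router"}, sr_pod))  # True

    webui = {"app": "openwebui"}  # hypothetical OpenWebUI pod label
    print(cnp_from_endpoints_matches("llm", {"app": "openwebui"}, "apps", webui))  # False
    print(cnp_from_endpoints_matches("llm", {"app": "openwebui", NS_KEY: "apps"}, "apps", webui))  # True
```

In this model, the policy living in `llm` silently narrows the rule to `llm` pods. Only an explicit namespace key lets it admit the pod running in `apps`.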

Three documentation artifacts shipped together (all from the same 2026-05-04 brainstorming session):

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-design.md.** Design doc for swapping DeepSeek-R1-Distill-Qwen-7B out for Qwen2.5-Coder-7B-Instruct and adding Qwen2.5-Coder-1.5B-Base as an always-warm FIM model for inline tab-complete. It covers:
- fleet specification;
- two-path routing (client-deterministic for OpenCode + Continue, SR cascade for OpenWebUI MoM);
- GPU budget vs the 4-GPU NodePool cap;
- the ~$700-900/mo cost envelope;
- the operational concerns (Karpenter consolidation, ext_proc cold-connect, CNP overrides).

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-plan.md.** Companion implementation plan with concrete ordered steps:
- rename DeepSeek claim to Qwen2.5-Coder;
- add Qwen2.5-Coder-1.5B-Base FIM claim;
- rewrite SR decisions[] to route code prompts to xplane-qwen-coder;
- expose individual models in /v1/models;
- write client config docs for OpenCode and Continue;
- smoke-test cascade vs direct-pick;
- drop old DeepSeek weights from S3.

A status table at the top tracks which phases shipped pre-redeploy vs which are deferred (smoke tests + S3 cleanup need a deployed cluster; composition republish is task #76).

**clusters/mycluster-0-llm-platform/README.md.** Adds:
- the new fleet shape table (5 claims);
- a whole-cluster `terramate script run --reverse destroy` procedure (one y/n prompt, walks every stack in reverse);
- an explicit list of data buckets preserved by the Orphan managementPolicies on the Bucket MRs.

Phased TDD-shaped plan to deliver the design at 2026-05-05-ai-gateway-redesign-design.md (commit 060c02e). 5 phases, each independently mergeable: P1 smoke (dedicated Envoy + qwen3-8b route), P2 SR ext_proc via EnvoyExtensionPolicy with filter-ordering verification (Lua fallback documented), P3 full fleet routing (Service-backed), P4 InferencePool + EPP per claim (folds task #76), P5 demolition (delete CEC + llm-router-proxy + GHCR workflow, repoint Tailscale, close #78). Cross-cutting verification table + per-phase rollback playbook + open items for implementation-time discovery.

Bumps the InferenceService composition from 0.3.2 → 0.3.3 with two additions to _defaultIngress, so the post-P4 traffic flow lands cleanly on vLLM pods without per-claim CNP overrides:

1. AI Gateway data plane → vLLM TCP 8000. The dedicated Envoy AI Gateway data plane (envoy-ai-gateway-system, gateway.envoyproxy.io/owning-gateway-name=ai-gateway) routes traffic to vLLM via InferencePool selection: EPP returns a pod IP, and the gateway connects directly. Without this allow rule, the gateway would silently drop traffic on the first inference request.
2. Endpoint Picker Plugin (EPP) → vLLM TCP 8000. Each EPP scrapes /metrics for queue depth + KV-cache pressure to score endpoints. All 5 EPP pods (one per InferencePool) share the `inferencepool` pod-template label set by the upstream chart. They are matched via a matchExpressions Exists rule scoped to the llm namespace.

The composition source URL is bumped to 0.3.3-pr1434. CI will publish it via the crossplane-modules.yml workflow on this PR's next run, per the kcl.mod version-rewrite path.

Drops the per-claim networkPolicies overrides on all 5 model claims (phi4-mini, qwen3-8b, llamaguard3-1b, qwen-coder, qwen-coder-fim). Each was a verbatim copy of the composition's _defaultIngress. They were added when an earlier composition version omitted the apps namespace label on the OpenWebUI selector. The default now covers everything those overrides did, plus the new AI Gateway + EPP sources. Closes task #76.
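The EPP allow rule relies on `Exists` semantics: a pod matches as long as it carries the `inferencepool` key, whatever its value. That is why one rule can cover all 5 EPP pods. The hypothetical Python model below (not Kubernetes or Cilium code) shows this behaviour with ANDed requirements. Two things in it are assumptions for illustration: expressing the llm-namespace scoping as an `In` requirement on `io.kubernetes.pod.namespace`, and the per-pool label values.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One matchExpressions entry; only the Exists and In operators are modelled."""
    key: str
    operator: str  # "Exists" or "In"
    values: list[str] = field(default_factory=list)

    def satisfied_by(self, labels: dict[str, str]) -> bool:
        if self.operator == "Exists":
            # Only the key must be present; its value is ignored.
            return self.key in labels
        if self.operator == "In":
            return labels.get(self.key) in self.values
        raise ValueError(f"unsupported operator: {self.operator}")


def selects(requirements: list[Requirement], labels: dict[str, str]) -> bool:
    """Requirements are ANDed together, as in a Kubernetes label selector."""
    return all(req.satisfied_by(labels) for req in requirements)


if __name__ == "__main__":
    epp_selector = [
        Requirement("inferencepool", "Exists"),
        Requirement("io.kubernetes.pod.namespace", "In", ["llm"]),  # assumed scoping
    ]
    # Hypothetical label values: one EPP pod per InferencePool, each with a different value.
    for pool in ["phi4-mini", "qwen3-8b", "llamaguard3-1b", "qwen-coder", "qwen-coder-fim"]:
        pod = {"inferencepool": pool, "io.kubernetes.pod.namespace": "llm"}
        print(pool, selects(epp_selector, pod))  # True for every pool
    print(selects(epp_selector, {"io.kubernetes.pod.namespace": "llm"}))  # False: key missing
```

Matching on key presence rather than a fixed value means a future InferencePool is covered without touching the composition, provided its EPP pod carries the same label key.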


docs(vault): add security considerations

a623eab

github-actions Bot changed the title ~~docs(vault): add security considerations~~ Update Documentation for Vault Cluster Setup with Security Considerations Jan 8, 2024

github-actions Bot added the documentation Improvements or additions to documentation label Jan 8, 2024

Smana merged commit c682e8d into main Jan 8, 2024

Smana deleted the docs_update_vault_install_hardened branch January 8, 2024 19:00


Smana merged 1 commit into main from docs_update_vault_install_hardened
