Skip to content

Update Documentation for Vault Cluster Setup with Security Considerations#76

Merged
Smana merged 1 commit into
mainfrom
docs_update_vault_install_hardened
Jan 8, 2024
Merged

Update Documentation for Vault Cluster Setup with Security Considerations#76
Smana merged 1 commit into
mainfrom
docs_update_vault_install_hardened

Conversation

@Smana
Copy link
Copy Markdown
Owner

@Smana Smana commented Jan 8, 2024

Type

Documentation


Description

This PR primarily updates the documentation for the Vault cluster setup. The most significant changes include:

  • The 'Architecture' section has been removed from the README.
  • A new 'Security Considerations' section has been added, providing advice on how to enhance the security of the Vault setup.
  • The 'High Availability' section has been expanded with more detailed explanations of the architectural decisions.
  • The 'Getting Started' section has been slightly modified.
  • A minor change has been made to the instructions for building and importing the full chain bundle in the management README.

Changes walkthrough

Relevant files                                                                                                                                 
Documentation
README.md                                                                                                     
    terraform/vault/cluster/README.md

    The changes in this file include the removal of the
    'Architecture' section and the addition of a 'Security
    Considerations' section. The 'High Availability' section has
    been expanded with more detailed explanations of the
    architectural decisions. The 'Getting Started' section has
    also been slightly modified.

+17/-14
README.md                                                                                                     
    terraform/vault/management/README.md

    A minor change has been made to the instructions for
    building and importing the full chain bundle. The user is
    now instructed to navigate to the
    'terraform/vault/management' directory before executing the
    commands.

+1/-0

✨ Usage guide:

Overview:
The describe tool scans the PR code changes, and generates a description for the PR - title, type, summary, walkthrough and labels. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.

When commenting, to edit configurations related to the describe tool (pr_description section), use the following template:

/describe --pr_description.some_config1=... --pr_description.some_config2=...

With a configuration file, use the following template:

[pr_description]
some_config1=...
some_config2=...
Enabling\disabling automation
  • When you first install the app, the default mode for the describe tool is:
pr_commands = ["/describe --pr_description.add_original_user_description=true" 
                         "--pr_description.keep_original_user_title=true", ...]

meaning the describe tool will run automatically on every PR, will keep the original title, and will add the original user description above the generated description.

  • Markers are an alternative way to control the generated description, to give maximal control to the user. If you set:
pr_commands = ["/describe --pr_description.use_description_markers=true", ...]

the tool will replace every marker of the form pr_agent:marker_name in the PR description with the relevant content, where marker_name is one of the following:

  • type: the PR type.
  • summary: the PR summary.
  • walkthrough: the PR walkthrough.

Note that when markers are enabled, if the original PR description does not contain any markers, the tool will not alter the description at all.

Custom labels

The default labels of the describe tool are quite generic: [Bug fix, Tests, Enhancement, Documentation, Other].

If you specify custom labels in the repo's labels page or via configuration file, you can get tailored labels for your use cases.
Examples for custom labels:

  • Main topic:performance - pr_agent:The main topic of this PR is performance
  • New endpoint - pr_agent:A new endpoint was added in this PR
  • SQL query - pr_agent:A new SQL query was added in this PR
  • Dockerfile changes - pr_agent:The PR contains changes in the Dockerfile
  • ...

The list above is eclectic, and aims to give an idea of different possibilities. Define custom labels that are relevant for your repo and use cases.
Note that Labels are not mutually exclusive, so you can add multiple label categories.
Make sure to provide proper title, and a detailed and well-phrased description for each label, so the tool will know when to suggest it.

More PR-Agent commands

To invoke the PR-Agent, add a comment using one of the following commands:

  • /review: Request a review of your Pull Request.
  • /describe: Update the PR title and description based on the contents of the PR.
  • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
  • /ask <QUESTION>: Ask a question about the PR.
  • /update_changelog: Update the changelog based on the PR's contents.
  • /add_docs 💎: Generate docstring for new components introduced in the PR.
  • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
  • /analyze 💎: Automatically analyzes the PR, and presents changes walkthrough for each component.

See the tools guide for more details.
To list the possible configuration parameters, add a /config comment.

See the describe usage page for a comprehensive guide on using this tool.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jan 8, 2024

PR Analysis

  • 🎯 Main theme: Updating documentation for Vault cluster setup and adding a new section on security considerations.
  • 📝 PR summary: This PR updates the documentation for setting up a Vault cluster, including changes to the architecture section and the addition of a new section on security considerations. The changes provide more detailed explanations of the architectural decisions and security measures implemented in the Vault cluster setup.
  • 📌 Type of PR: Documentation
  • 🧪 Relevant tests added: No
  • ⏱️ Estimated effort to review [1-5]: 1, because the PR is focused on documentation updates and does not involve any changes to the codebase.
  • 🔒 Security concerns: No

PR Feedback

💡 General suggestions: The PR is well-structured and provides valuable information on the architecture and security considerations for setting up a Vault cluster. It would be beneficial to provide more context or explanation for some of the security measures mentioned, especially for readers who may not be familiar with these concepts.

🤖 Code feedback:
relevant fileterraform/vault/cluster/README.md
suggestion      

It would be helpful to provide more context or explanation for the security considerations listed. For example, explain why it's important to keep the Root CA offline, or what the potential risks are if SSM is not disabled once the cluster is operational. [medium]

relevant line* Keep the Root CA offline.

relevant fileterraform/vault/cluster/README.md
suggestion      

The link to the external project for hardened AMIs is a good resource. However, it would be beneficial to briefly describe what a hardened AMI is and why it's recommended for use. [medium]

relevant line<a href=https://github.com/konstruktoid/hardened-images) from @konstruktoid. An Ubuntu AMI from Canonical is used by default.>* Use hardened AMIs, such as those built with [this project


✨ Usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.
When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...

With a configuration file, use the following template:

[pr_reviewer]
some_config1=...
some_config2=...
Utilizing extra instructions

The review tool can be configured with extra instructions, which can be used to guide the model to a feedback tailored to the needs of your project.

Be specific, clear, and concise in the instructions. With extra instructions, you are the prompter. Specify the relevant sub-tool, and the relevant aspects of the PR that you want to emphasize.

Examples for extra instructions:

[pr_reviewer] # /review #
extra_instructions="""
In the code feedback section, emphasize the following:
- Does the code logic cover relevant edge cases?
- Is the code logic clear and easy to understand?
- Is the code logic efficient?
...
"""

Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.

How to enable\disable automation
  • When you first install PR-Agent app, the default mode for the review tool is:
pr_commands = ["/review", ...]

meaning the review tool will run automatically on every PR, with the default configuration.
Edit this field to enable/disable the tool, or to change the used configurations

About the 'Code feedback' section

The review tool provides several type of feedbacks, one of them is code suggestions.
If you are interested only in the code suggestions, it is recommended to use the improve feature instead, since it dedicated only to code suggestions, and usually gives better results.
Use the review tool if you want to get a more comprehensive feedback, which includes code suggestions as well.

Auto-labels

The review tool can auto-generate two specific types of labels for a PR:

  • a possible security issue label, that detects possible security issues (enable_review_labels_security flag)
  • a Review effort [1-5]: x label, where x is the estimated effort to review the PR (enable_review_labels_effort flag)
Extra sub-tools

The review tool provides a collection of possible feedbacks about a PR.
It is recommended to review the possible options, and choose the ones relevant for your use case.
Some of the feature that are disabled by default are quite useful, and should be considered for enabling. For example:
require_score_review, require_soc2_review, enable_review_labels_effort, and more.

More PR-Agent commands

To invoke the PR-Agent, add a comment using one of the following commands:

  • /review: Request a review of your Pull Request.
  • /describe: Update the PR title and description based on the contents of the PR.
  • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
  • /ask <QUESTION>: Ask a question about the PR.
  • /update_changelog: Update the changelog based on the PR's contents.
  • /add_docs 💎: Generate docstring for new components introduced in the PR.
  • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
  • /analyze 💎: Automatically analyzes the PR, and presents changes walkthrough for each component.

See the tools guide for more details.
To list the possible configuration parameters, add a /config comment.

See the review usage page for a comprehensive guide on using this tool.

@Smana
Copy link
Copy Markdown
Owner Author

Smana commented Jan 8, 2024

/describe

@github-actions github-actions Bot changed the title docs(vault): add security considerations Update Documentation for Vault Cluster Setup with Security Considerations Jan 8, 2024
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jan 8, 2024
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jan 8, 2024

PR Description updated to latest commit (a623eab)

@Smana Smana merged commit c682e8d into main Jan 8, 2024
@Smana Smana deleted the docs_update_vault_install_hardened branch January 8, 2024 19:00
Smana added a commit that referenced this pull request May 4, 2026
…L composition, per-claim CNP)

Four bugs surfaced once OpenWebUI was wired into the AI gateway
end-to-end (commit 556e7e1b) — each individually small but the
combination was needed to make the chat UI return real completions
instead of timeouts and 4xx:

1. **CEC backend Service port collision.** SR exposes 50051
   (ext_proc gRPC) and 8080 (HTTP API). With
   `backendServices: [{name: vllm-semantic-router}]` Cilium pushed
   BOTH endpoints into the EDS cluster, so cilium-envoy would
   intermittently dial 8080 for ext_proc and get back HTTP/1.1
   instead of an ext_proc gRPC stream — surfaced as
   `upstream connect error or disconnect/reset before headers.
   reset reason: protocol error`. `failure_mode_allow: true` did
   not save the request: the ext_proc passthrough returned 403.
   Fix: pin the cluster to 50051 with `number: ["50051"]`. Pinned
   the four xplane-* clusters to 8000 too for symmetry / Hubble
   clarity.

2. **`llm-router` Service had no endpoints.** Selector targeted
   `app.kubernetes.io/name: vllm-semantic-router`, but the chart
   actually labels the SR pod `semantic-router`. CEC service-
   redirect itself doesn't need endpoints, but the egress-policy
   plane on the source side does — clients with restrictive egress
   CNPs (default-deny + `toEndpoints`) couldn't resolve a
   destination identity and Cilium's L7 filter returned plain
   `Access denied` 403. Fix: switch the selector to
   `semantic-router` so endpoints exist and the policy decision
   becomes deterministic.

3. **OpenWebUI egress was scoped to a single label.** With CEC
   service-redirect the actual upstream is whichever vLLM Service
   SR picks (any of 4 today, future-N tomorrow), not SR. Listing
   each label by hand re-couples the UI to the model fleet. Fix:
   widen to `entities: cluster` on TCP 80 / 8000 / 8080 —
   acceptable for a chat UI workload (restricted PSS, scoped IAM,
   `world` egress already capped at 443).

4. **`_defaultIngress` cross-namespace match in the
   InferenceService composition.** Cilium CNP
   `fromEndpoints.matchLabels` defaults to the policy's own
   namespace when no `io.kubernetes.pod.namespace` key is set.
   The default-deny + `xplane-openwebui` allow only matched pods
   in the llm namespace (where SR + promptfoo live). Adding the
   `apps` namespace label makes the rule actually match the real
   OpenWebUI pod and unblocks chat completions through the
   gateway.

Plus the workaround for the still-using-the-old-composition issue
(task #76): the four InferenceService claim manifests
(phi4-mini, qwen3-8b, deepseek-r1-distill-qwen3-8b, llamaguard3-1b)
get a `spec.networkPolicies.ingress` override that mirrors the new
KCL composition's `_defaultIngress`. Drop the per-claim blocks
once the next composition version is published.

Verified live with a UI prompt:
  POST /v1/chat/completions {"model":"auto","messages":[...]}
  -> HTTP 200, x-vsr-selected-model: xplane-phi4-mini,
     body: "2+2 equals 4."
Smana added a commit that referenced this pull request May 4, 2026
Three documentation artifacts shipped together (all from the same
2026-05-04 brainstorming session):

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-design.md.**
Design doc for swapping DeepSeek-R1-Distill-Qwen-7B out for
Qwen2.5-Coder-7B-Instruct and adding Qwen2.5-Coder-1.5B-Base as
an always-warm FIM model for inline tab-complete. Covers fleet
specification, two-path routing (client-deterministic for OpenCode +
Continue, SR cascade for OpenWebUI MoM), GPU budget vs the 4-GPU
NodePool cap, ~$700-900/mo cost envelope, and the operational
concerns (Karpenter consolidation, ext_proc cold-connect, CNP
overrides).

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-plan.md.**
Companion implementation plan with concrete ordered steps:
rename DeepSeek claim to Qwen2.5-Coder, add Qwen2.5-Coder-1.5B-Base
FIM claim, rewrite SR decisions[] to route code prompts to
xplane-qwen-coder, expose individual models in /v1/models, write
client config docs for OpenCode and Continue, smoke-test cascade vs
direct-pick, drop old DeepSeek weights from S3. Status table at the
top tracks which phases shipped pre-redeploy vs which are deferred
(smoke tests + S3 cleanup need a deployed cluster; composition
republish is task #76).

**clusters/mycluster-0-llm-platform/README.md.** Adds the new
fleet shape table (5 claims) + whole-cluster
`terramate script run --reverse destroy` procedure (one y/n
prompt, walks every stack in reverse) + explicit list of data
buckets preserved by the Orphan managementPolicies on the
Bucket MRs.
Smana added a commit that referenced this pull request May 5, 2026
Phased TDD-shaped plan to deliver the design at
2026-05-05-ai-gateway-redesign-design.md (commit 060c02e). 5 phases,
each independently mergeable: P1 smoke (dedicated Envoy + qwen3-8b
route), P2 SR ext_proc via EnvoyExtensionPolicy with filter-ordering
verification (Lua fallback documented), P3 full fleet routing
(Service-backed), P4 InferencePool + EPP per claim (folds task #76),
P5 demolition (delete CEC + llm-router-proxy + GHCR workflow, repoint
Tailscale, close #78). Cross-cutting verification table + per-phase
rollback playbook + open items for implementation-time discovery.
Smana added a commit that referenced this pull request May 5, 2026
Bumps the InferenceService composition from 0.3.2 → 0.3.3 with two
additions to _defaultIngress so the post-P4 traffic flow lands cleanly
on vLLM pods without per-claim CNP overrides:

1. AI Gateway data plane → vLLM TCP 8000.
   The dedicated Envoy AI Gateway data plane (envoy-ai-gateway-system,
   gateway.envoyproxy.io/owning-gateway-name=ai-gateway) routes
   traffic to vLLM via InferencePool selection — EPP returns a pod
   IP, gateway connects directly. Without this allow, the gateway
   would silently drop on the first inference request.

2. Endpoint Picker Plugin (EPP) → vLLM TCP 8000.
   Each EPP scrapes /metrics for queue depth + KV-cache pressure to
   score endpoints. All 5 EPP pods (one per InferencePool) share the
   `inferencepool` pod-template label set by the upstream chart;
   matched via matchExpressions Exists scoped to the llm namespace.

Composition source URL bumped to 0.3.3-pr1434 (CI will publish via the
crossplane-modules.yml workflow on this PR's next run, per the kcl.mod
version-rewrite path).

Drops the per-claim networkPolicies overrides on all 5 model claims
(phi4-mini, qwen3-8b, llamaguard3-1b, qwen-coder, qwen-coder-fim).
Each was a verbatim copy of the composition's _defaultIngress, added
when an earlier composition version omitted the apps namespace label
on the OpenWebUI selector. The default now covers everything those
overrides did, plus the new AI Gateway + EPP sources. Closes task #76.
Smana added a commit that referenced this pull request May 9, 2026
…L composition, per-claim CNP)

Four bugs surfaced once OpenWebUI was wired into the AI gateway
end-to-end (commit 556e7e1b) — each individually small but the
combination was needed to make the chat UI return real completions
instead of timeouts and 4xx:

1. **CEC backend Service port collision.** SR exposes 50051
   (ext_proc gRPC) and 8080 (HTTP API). With
   `backendServices: [{name: vllm-semantic-router}]` Cilium pushed
   BOTH endpoints into the EDS cluster, so cilium-envoy would
   intermittently dial 8080 for ext_proc and get back HTTP/1.1
   instead of an ext_proc gRPC stream — surfaced as
   `upstream connect error or disconnect/reset before headers.
   reset reason: protocol error`. `failure_mode_allow: true` did
   not save the request: the ext_proc passthrough returned 403.
   Fix: pin the cluster to 50051 with `number: ["50051"]`. Pinned
   the four xplane-* clusters to 8000 too for symmetry / Hubble
   clarity.

2. **`llm-router` Service had no endpoints.** Selector targeted
   `app.kubernetes.io/name: vllm-semantic-router`, but the chart
   actually labels the SR pod `semantic-router`. CEC service-
   redirect itself doesn't need endpoints, but the egress-policy
   plane on the source side does — clients with restrictive egress
   CNPs (default-deny + `toEndpoints`) couldn't resolve a
   destination identity and Cilium's L7 filter returned plain
   `Access denied` 403. Fix: switch the selector to
   `semantic-router` so endpoints exist and the policy decision
   becomes deterministic.

3. **OpenWebUI egress was scoped to a single label.** With CEC
   service-redirect the actual upstream is whichever vLLM Service
   SR picks (any of 4 today, future-N tomorrow), not SR. Listing
   each label by hand re-couples the UI to the model fleet. Fix:
   widen to `entities: cluster` on TCP 80 / 8000 / 8080 —
   acceptable for a chat UI workload (restricted PSS, scoped IAM,
   `world` egress already capped at 443).

4. **`_defaultIngress` cross-namespace match in the
   InferenceService composition.** Cilium CNP
   `fromEndpoints.matchLabels` defaults to the policy's own
   namespace when no `io.kubernetes.pod.namespace` key is set.
   The default-deny + `xplane-openwebui` allow only matched pods
   in the llm namespace (where SR + promptfoo live). Adding the
   `apps` namespace label makes the rule actually match the real
   OpenWebUI pod and unblocks chat completions through the
   gateway.

Plus the workaround for the still-using-the-old-composition issue
(task #76): the four InferenceService claim manifests
(phi4-mini, qwen3-8b, deepseek-r1-distill-qwen3-8b, llamaguard3-1b)
get a `spec.networkPolicies.ingress` override that mirrors the new
KCL composition's `_defaultIngress`. Drop the per-claim blocks
once the next composition version is published.

Verified live with a UI prompt:
  POST /v1/chat/completions {"model":"auto","messages":[...]}
  -> HTTP 200, x-vsr-selected-model: xplane-phi4-mini,
     body: "2+2 equals 4."
Smana added a commit that referenced this pull request May 9, 2026
Three documentation artifacts shipped together (all from the same
2026-05-04 brainstorming session):

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-design.md.**
Design doc for swapping DeepSeek-R1-Distill-Qwen-7B out for
Qwen2.5-Coder-7B-Instruct and adding Qwen2.5-Coder-1.5B-Base as
an always-warm FIM model for inline tab-complete. Covers fleet
specification, two-path routing (client-deterministic for OpenCode +
Continue, SR cascade for OpenWebUI MoM), GPU budget vs the 4-GPU
NodePool cap, ~$700-900/mo cost envelope, and the operational
concerns (Karpenter consolidation, ext_proc cold-connect, CNP
overrides).

**docs/superpowers/specs/2026-05-04-coding-llm-fleet-plan.md.**
Companion implementation plan with concrete ordered steps:
rename DeepSeek claim to Qwen2.5-Coder, add Qwen2.5-Coder-1.5B-Base
FIM claim, rewrite SR decisions[] to route code prompts to
xplane-qwen-coder, expose individual models in /v1/models, write
client config docs for OpenCode and Continue, smoke-test cascade vs
direct-pick, drop old DeepSeek weights from S3. Status table at the
top tracks which phases shipped pre-redeploy vs which are deferred
(smoke tests + S3 cleanup need a deployed cluster; composition
republish is task #76).

**clusters/mycluster-0-llm-platform/README.md.** Adds the new
fleet shape table (5 claims) + whole-cluster
`terramate script run --reverse destroy` procedure (one y/n
prompt, walks every stack in reverse) + explicit list of data
buckets preserved by the Orphan managementPolicies on the
Bucket MRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant