feat: Enrich AKS node health tools #531
nilo19 commented Jun 17, 2025
- Add vmss_run_command tool for executing commands on VMSS instances
- Add get_node_resource_group and get_api_server_public_ip tools to AKS toolset
- Add comprehensive LLM workflow instructions for node-level troubleshooting
- Include prerequisite discovery pattern for VMSS operations
Walkthrough
This update enhances the AKS node health and core toolsets by adding new tools for retrieving VM Scale Set (VMSS) names, running commands on VMSS VM instances, obtaining the node resource group, and resolving the API server public IP. It also adds a prerequisite command.
Sequence Diagram(s)
sequenceDiagram
participant User
participant AKSToolset
participant AzureCLI
participant Kubeconfig
User->>AKSToolset: Start troubleshooting
AKSToolset->>AzureCLI: az account list (prerequisite)
AKSToolset->>AzureCLI: get_node_resource_group (for AKS cluster)
AKSToolset->>AzureCLI: list_vmss_names (in node resource group)
User->>AKSToolset: Provide VMSS and VM instance ID
AKSToolset->>AzureCLI: vmss_run_command (execute shell command on VMSS VM)
AKSToolset->>Kubeconfig: get_api_server_public_ip (resolve API server IP)
Actionable comments posted: 2
🧹 Nitpick comments (5)
holmes/plugins/toolsets/aks.yaml (2)
72-76: Ensure consistency and error handling for node RG discovery
The new get_node_resource_group tool mirrors the core RG lookup but focuses on nodeResourceGroup. Consider aligning flags/output with the existing get_cluster_resource_group (e.g., include -o tsv, --only-show-errors) and adding a fallback or error if the query returns empty.
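The empty-result check suggested above could look roughly like the sketch below. The helper name require_value is hypothetical and not part of the PR; it reads a lookup result from stdin and fails when nothing came back, so an empty node resource group does not silently flow into later commands.

```shell
# Hypothetical helper (not in the PR): fail loudly when a lookup returns nothing.
# Usage: some_lookup_command | require_value "node resource group"
require_value() {
  label="$1"
  # Take the first line of stdin and strip all whitespace from it.
  value="$(head -n 1 | tr -d '[:space:]')"
  if [ -z "$value" ]; then
    echo "error: $label lookup returned an empty result" >&2
    return 1
  fi
  printf '%s\n' "$value"
}
```

Assuming the tool queries nodeResourceGroup via az aks show with -o tsv, its output could be piped through this helper; the exact command in the toolset may differ.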
77-81: Fix typo in user_description and normalize output
The user_description reads “public AS cluster” instead of “public AKS cluster.” Also, piping dig +short may produce multiple lines; consider appending -o tsv or filtering the result to ensure a single IP is returned.
holmes/plugins/toolsets/aks-node-health.yaml (3)
7-7: Add explicit login prerequisite
az account list checks subscriptions but doesn’t guarantee you’re logged in. Consider adding az login (or documenting it) to surface authentication errors upfront.
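One way to surface authentication errors upfront is a small pre-flight check, sketched below. It relies on az account show exiting non-zero when no account is logged in. The function name check_az_login and the AZ_BIN override are assumptions for illustration and do not appear in the PR.

```shell
# Hypothetical pre-flight check (not in the PR).
# AZ_BIN lets the check point at a stub for testing; it defaults to the real az CLI.
check_az_login() {
  az_bin="${AZ_BIN:-az}"
  # "az account show" fails when there is no logged-in account.
  if ! "$az_bin" account show --only-show-errors >/dev/null 2>&1; then
    echo "error: not logged in to Azure; run 'az login' first" >&2
    return 1
  fi
}
```

Running the check before any other tool call turns a confusing downstream failure into a single clear message.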
10-42: Clean up llm_instructions and clarify dependencies
The inserted llm_instructions block is thorough, but:
- It contains trailing spaces on blank lines (YAML-lint errors).
- References get_cluster_resource_group which isn’t defined in this toolset (it lives in aks/core).
Remove trailing spaces, standardize indentation, and note the cross-toolset requirement or include an explicit call to the core toolset.
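The trailing-space cleanup can be done mechanically. The sketch below is one possible approach, using a hypothetical filter named strip_trailing_ws that removes whitespace at the end of every line it reads from stdin.

```shell
# Hypothetical filter (not in the PR): drop trailing spaces and tabs from each line.
strip_trailing_ws() {
  sed -e 's/[[:space:]]*$//'
}
```

For example, the YAML file could be piped through the filter into a temporary file and then moved back into place, after which yamllint should no longer report trailing-spaces errors for those lines.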
94-99: Quote shell command and add error flags
Wrap {{ SHELL_COMMAND }} in quotes to handle complex commands, and include --only-show-errors on the run-command invocation. E.g.:
- az vmss run-command invoke --resource-group {{ NODE_RESOURCE_GROUP }} --name {{ VMSS_NAME }} --instance-id {{ VM_ID }} --command-id RunShellScript --scripts {{ SHELL_COMMAND }}
+ az vmss run-command invoke --resource-group {{ NODE_RESOURCE_GROUP }} --name {{ VMSS_NAME }} --instance-id {{ VM_ID }} --command-id RunShellScript --scripts "{{ SHELL_COMMAND }}" --only-show-errors
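To see why the quotes matter, the sketch below mimics template rendering with plain string substitution and counts how many arguments the shell passes along. The names count_args, cmd, unquoted, and quoted are illustrative only, and the example command is an assumption rather than something taken from the PR.

```shell
# Illustrative only: count the arguments a command actually receives.
count_args() { echo "$#"; }

# An example multi-word command, standing in for the rendered {{ SHELL_COMMAND }}.
cmd='journalctl -u kubelet --no-pager'

# Rendered without quotes: the shell splits the command into separate words.
unquoted="count_args --scripts $cmd"
# Rendered with quotes: the command stays a single argument after --scripts.
quoted="count_args --scripts \"$cmd\""

eval "$unquoted"   # prints 5
eval "$quoted"     # prints 2
```

Double quotes in the template do not protect a command that itself contains double quotes, so such input would still need escaping before rendering.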
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
holmes/plugins/toolsets/aks-node-health.yaml (2 hunks)
holmes/plugins/toolsets/aks.yaml (1 hunk)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
holmes/plugins/toolsets/aks-node-health.yaml
[error] 12-12: trailing spaces (trailing-spaces)
[error] 14-14: trailing spaces (trailing-spaces)
[error] 17-17: trailing spaces (trailing-spaces)
[error] 22-22: trailing spaces (trailing-spaces)
[error] 27-27: trailing spaces (trailing-spaces)
[error] 30-30: trailing spaces (trailing-spaces)
[error] 34-34: trailing spaces (trailing-spaces)
[error] 37-37: trailing spaces (trailing-spaces)
a2e29b0 to 59e3648
Thanks for the PR @nilo19, nice work!