fix(bootstrap): surface Helm install failure on namespace timeout (#211) #462

Closed
Manoj-engineer wants to merge 1 commit into NVIDIA:main from Manoj-engineer:fix/helm-error-diagnosis-211

@Manoj-engineer
Vouched at: #420

Summary

When gateway start times out waiting for the openshell namespace, the timeout
path now checks for failed helm-install-* jobs in kube-system and surfaces
the actual Helm error and the last 30 log lines instead of the generic "namespace not ready" message.
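
As a rough illustration of the behaviour described above, here is a minimal sketch of how a timeout message might carry the Helm hint. It is not the code from this PR: `tail_lines`, `namespace_timeout_message`, and the message wording are all hypothetical names and text chosen for the example; only the 30-line tail and the fallback to a generic message come from the summary.

```rust
/// Number of trailing log lines to surface, as stated in the PR summary.
const TAIL_LINES: usize = 30;

/// Return the last `n` lines of `logs`, joined with newlines.
/// (Hypothetical helper, not taken from the PR.)
fn tail_lines(logs: &str, n: usize) -> String {
    let lines: Vec<&str> = logs.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}

/// Build the namespace-timeout message, appending the Helm diagnosis when
/// one is available. The wording here is illustrative only.
fn namespace_timeout_message(namespace: &str, helm_hint: Option<&str>) -> String {
    let base = format!("timed out waiting for namespace '{namespace}' to be ready");
    match helm_hint {
        // A failed helm-install-* job was found: show its tail instead of
        // leaving the user with only the generic message.
        Some(hint) => format!(
            "{base}\n\nHelm install failure detected:\n{}",
            tail_lines(hint, TAIL_LINES)
        ),
        // No diagnosis available: fall back to the generic message.
        None => base,
    }
}
```

Passing the hint as an `Option` keeps the generic path unchanged when no failed job is found, which is presumably the kind of property the new unit test checks.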

Related Issue

Fixes #211

Changes

  • Add diagnose_helm_failure() in openshell-bootstrap/src/lib.rs, which queries
    helm-install-* jobs in kube-system for failed pods and returns the job conditions
    plus the last 30 log lines
  • Wire into wait_for_namespace() final timeout branch
  • Fix awk filter: status.failed stays <none> during the backoff retry window, so
    filter on != "0" alone instead of != "<none>" && != "0" to catch actively failing jobs
  • Add unit test helm_failure_hint_is_included_in_namespace_timeout_message
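
The filter change in the list above can be sketched in Rust for clarity. This assumes the job listing arrives as a two-column `NAME FAILED` table with a header row (for example from a kubectl custom-columns query); the exact command and the awk script used in the PR are not shown here, and `failing_helm_jobs` is a hypothetical name.

```rust
/// Return the names of helm-install-* jobs whose FAILED column is anything
/// other than "0". A value of "<none>" is deliberately kept, since the PR
/// notes that status.failed stays unset during the backoff retry window.
/// (Hypothetical helper; the table layout is an assumption.)
fn failing_helm_jobs(table: &str) -> Vec<String> {
    table
        .lines()
        .skip(1) // skip the assumed header row
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let name = cols.next()?;
            let failed = cols.next()?;
            // Old filter: failed != "<none>" && failed != "0"
            // New filter: failed != "0"
            (name.starts_with("helm-install-") && failed != "0")
                .then(|| name.to_string())
        })
        .collect()
}
```

With the old condition, a job still inside its retry window would report `<none>` and be skipped, which is the case the PR says it wants to catch.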

Testing

  • mise run pre-commit passes
  • Unit tests added/updated — 78/78 pass
  • E2E tests added/updated (not applicable — error path only)
  • Live-tested end-to-end: built a gateway image with corrupted serviceaccount.yaml,
    confirmed the Helm error appears in the terminal output on timeout

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (not applicable)

…IDIA#211)

Signed-off-by: Manoj-engineer <194872717+Manoj-engineer@users.noreply.github.com>
@Manoj-engineer Manoj-engineer requested a review from a team as a code owner March 18, 2026 23:03
@github-actions
Thank you for your interest in contributing to OpenShell, @Manoj-engineer.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 18, 2026
@github-actions
github-actions bot commented Mar 18, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@Manoj-engineer
Author

I have read the DCO document and I hereby sign the DCO.

@Manoj-engineer
Author

recheck

Development

Successfully merging this pull request may close these issues.

Improve error message when Helm chart has malformed YAML
