
@Vishvajeet590
Contributor

Description

This PR improves error visibility for model deployments by propagating detailed Kubernetes pod errors (for example OOMKilled, CrashLoopBackOff, and ImagePullBackOff) to users. Previously, the CaraML dashboard showed only generic messages such as "predictor is not ready" or "CrashLoopBackOff", which made deployment failures hard to diagnose. With this change, the dashboard shows the specific pod failure reason, exit code, and message, so users can troubleshoot failures faster.
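One detail behind the "only saw CrashLoopBackOff" problem: in Kubernetes, CrashLoopBackOff and ImagePullBackOff are *waiting* reasons, while the actual cause of a crash (for example OOMKilled with exit code 137) is recorded in the container's last termination state. The Go sketch below shows one way such a status could be turned into a readable message. It is not the code from this PR: the struct types are minimal stand-ins that mirror the field names of `k8s.io/api/core/v1` (`ContainerStatus`, `State`, `LastTerminationState`, `Waiting`, `Terminated`) so the example runs without the Kubernetes client libraries, and `describeContainerError` and the container name are hypothetical.

```go
package main

import "fmt"

// Minimal stand-ins mirroring field names from k8s.io/api/core/v1.
// Only the fields used in this sketch are included.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStateTerminated struct {
	ExitCode int32
	Reason   string
	Message  string
}

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	State                ContainerState
	LastTerminationState ContainerState
}

// Waiting reasons that are part of normal startup rather than failures.
var benignWaitingReasons = map[string]bool{
	"ContainerCreating": true,
	"PodInitializing":   true,
}

// describeContainerError (hypothetical helper) returns a readable error
// string for a failing container, or "" if nothing looks wrong.
func describeContainerError(cs ContainerStatus) string {
	// A container that has terminated with a non-zero exit code reports
	// its own reason, exit code, and message.
	if t := cs.State.Terminated; t != nil && t.ExitCode != 0 {
		return fmt.Sprintf("container %q terminated: reason=%s exitCode=%d message=%q",
			cs.Name, t.Reason, t.ExitCode, t.Message)
	}
	if w := cs.State.Waiting; w != nil && w.Reason != "" && !benignWaitingReasons[w.Reason] {
		msg := fmt.Sprintf("container %q waiting: reason=%s", cs.Name, w.Reason)
		if w.Message != "" {
			msg += fmt.Sprintf(" message=%q", w.Message)
		}
		// CrashLoopBackOff only says the container keeps restarting; the
		// underlying cause (e.g. OOMKilled) lives in LastTerminationState.
		if last := cs.LastTerminationState.Terminated; last != nil {
			msg += fmt.Sprintf(" (last termination: reason=%s exitCode=%d)", last.Reason, last.ExitCode)
		}
		return msg
	}
	return ""
}

func main() {
	// Example: a container restarting because it was OOM-killed.
	status := ContainerStatus{
		Name:  "kserve-container",
		State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}},
		LastTerminationState: ContainerState{
			Terminated: &ContainerStateTerminated{ExitCode: 137, Reason: "OOMKilled"},
		},
	}
	fmt.Println(describeContainerError(status))
}
```

For the OOM-killed example this yields `container "kserve-container" waiting: reason=CrashLoopBackOff (last termination: reason=OOMKilled exitCode=137)`, which names the real cause instead of only the back-off state.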

Modifications

  • Enhanced error handling in the deployment flow to include the pod's termination reason, exit code, and message in the returned error.
  • Updated the deployment logic to propagate these detailed Kubernetes errors to the VersionEndpoint.Message field.
  • Ensured that the CaraML dashboard displays these detailed errors to users for any pod failure during deployment.

@Vishvajeet590 Vishvajeet590 added the enhancement New feature or request label Dec 15, 2025
@Vishvajeet590 Vishvajeet590 merged commit 8646e1f into main Dec 17, 2025
32 of 33 checks passed
@Vishvajeet590 Vishvajeet590 deleted the Propagate-Kubernetes-Error branch December 17, 2025 03:35
3 participants