Skip to content

Conversation

zouyee
Copy link
Contributor

@zouyee zouyee commented Jun 2, 2020

closes #955


checklist:

  • run make test and make lint
  • test manually (i.e. build/push all images, restart operator, and re-deploy APIs)
  • update examples
  • update docs and add any new files to summary.md (view in gitbook after merging)
  • cherry-pick into release branches if applicable
  • alert the dev team if the dev environment changed

@deliahu
Copy link
Member

deliahu commented Jun 3, 2020

@zouyee thanks for looking into this!

Since we don't require the CLI user to have kubectl configured, it would be better to implement this in the Cortex operator.

A better approach would be to add additional status code(s) in the Cortex operator to capture additional error cases. All of the status logic is implemented in operator/status.go and GetPodStatus() in k8s/pod.go. It should be possible to add new statuses here, add them to SubReplicaCounts, and then update addPodToReplicaCounts() and getStatusCode() accordingly.

Does that make sense? Let us know if you have any questions!

@zouyee
Copy link
Contributor Author

zouyee commented Jun 11, 2020

According to the design concept of k8s, it is not recommended to summarize the status when pull image or oom into the state, but the reason why the pod presents the pending state

@deliahu
Copy link
Member

deliahu commented Jun 16, 2020

I see what you're saying... however there is a nuance here, which is that the status we show in cortex get is a combination of the statuses of all the API replicas. This is computed in getStatusCode(), and is why we took the approach of wrapping the k8s statuses (since we eventually have to end up with a single descriptive status for the entire k8s deployment). Does that make sense?

@deliahu
Copy link
Member

deliahu commented Jun 26, 2020

@zouyee I have incorporated your change into #1167 (and 239316a), so I will go ahead and close this one. I mostly just moved the logic you added into the operator, and added a new API status for it. Thanks for your contribution!

@deliahu deliahu closed this Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surface pod creation errors
2 participants