Check status of all the core pods for microshift #4009

Open · wants to merge 1 commit into main
29 changes: 29 additions & 0 deletions pkg/crc/cluster/cluster.go
@@ -510,3 +510,32 @@ func DeleteMCOLeaderLease(ctx context.Context, ocConfig oc.Config) error {
    _, _, err := ocConfig.RunOcCommand("delete", "-A", "lease", "--all")
    return err
}

func CheckCorePodsRunning(ctx context.Context, ocConfig oc.Config) error {
praveenkumar marked this conversation as resolved.
    if err := WaitForOpenshiftResource(ctx, ocConfig, "pod"); err != nil {
        return err
    }
    coreNameSpaces := []string{"kube-system", "openshift-dns", "openshift-ingress", "openshift-ovn-kubernetes", "openshift-service-ca"}
    waitForPods := func() error {
        for _, namespace := range coreNameSpaces {
            if !podRunningForNamespace(ocConfig, namespace) {
                logging.Debugf("Pods in %s namespace are not running", namespace)
                return &errors.RetriableError{Err: fmt.Errorf("pods in %s namespace are not running", namespace)}
            }
Contributor:
FWIW, this is a bit wasteful, as we'll try the same namespaces again and again even when we have already found running pods in them. Maybe this can be done with a map? The map keys are the namespaces; iterate over the keys, and when a namespace has running pods, remove it from the map.
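
For illustration, a minimal sketch of the map-based variant suggested here. It is not part of the PR; it reuses the PR's coreNameSpaces, podRunningForNamespace, logging, and errors helpers, and would replace the waitForPods closure below:

// Sketch: keep the namespaces still waiting for running pods in a set and
// drop each one once it has been seen healthy, so later retries only
// re-check the remaining namespaces.
pending := make(map[string]struct{}, len(coreNameSpaces))
for _, namespace := range coreNameSpaces {
    pending[namespace] = struct{}{}
}
waitForPods := func() error {
    for namespace := range pending {
        if podRunningForNamespace(ocConfig, namespace) {
            // Deleting during range is well-defined in Go; this namespace
            // will not be re-checked on the next retry.
            delete(pending, namespace)
            continue
        }
        logging.Debugf("Pods in %s namespace are not running", namespace)
    }
    if len(pending) > 0 {
        return &errors.RetriableError{Err: fmt.Errorf("pods in %d core namespaces are not running", len(pending))}
    }
    return nil
}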

Member Author:

Yes, but it has a bit of a benefit in case some pod goes into a reconciliation state (like in one iteration it is running but in the second it is pending). It is not a foolproof solution (with k8s it never will be), but it should be good enough for initial feedback on whether the core pods are running.

Contributor:

In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times, and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle "in one iteration it is running but in the second it is pending", it would be nice to have a consistent approach.

Member Author:

> In the OpenShift case, once the oc get co check succeeds once, we retry 2 more times, and we only decide the cluster is good when the oc get co check succeeds 3 times in a row. If you want to handle "in one iteration it is running but in the second it is pending", it would be nice to have a consistent approach.

Yes, in the OpenShift case we can check all the clusteroperators at once because those are not namespace-specific resources. Here we don't have a single call which gives us the status of all pods in the core namespaces, otherwise I would have used the same logic. So we iterate namespace by namespace and check the pod status.
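
Purely for illustration, and not what the PR proposes: a hypothetical single-call variant would list non-running pods cluster-wide with one oc call and filter client-side to the core namespaces. -A, --field-selector, and jsonpath output are standard oc/kubectl flags; the helpers are the PR's, and a strings import is assumed:

// Hypothetical sketch: one cluster-wide query, filtered client-side.
stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand(
    "get", "pods", "-A", "--field-selector=status.phase!=Running",
    "-o", `jsonpath={range .items[*]}{.metadata.namespace}{"\n"}{end}`)
if err != nil {
    logging.Debugf("Failed to list non-running pods, stderr: %s", stderr)
    return false
}
core := map[string]bool{
    "kube-system": true, "openshift-dns": true, "openshift-ingress": true,
    "openshift-ovn-kubernetes": true, "openshift-service-ca": true,
}
for _, namespace := range strings.Fields(stdout) {
    if core[namespace] {
        return false // a core namespace still has a non-running pod
    }
}
return true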

Contributor:

I'm not questioning the way the iterations are done, I was reacting to:

> it has a bit of a benefit in case some pod goes into a reconciliation state (like in one iteration it is running but in the second it is pending)

For an OpenShift cluster, we roughly iterate over an isClusterReady() function until it returns true. Once it returns true, we still run it 2 more times in case the cluster looked ready but was in a transient/reconciliation state.
If reconciliation is something you want to try to handle better, I would use the same approach as for OpenShift for consistency: the cluster is not ready until isClusterReady() has succeeded 3 times in a row.
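
For illustration, a minimal sketch of that 3-in-a-row approach, assuming the PR's waitForPods closure and the same errors.Retry helper; the threshold of 3 mirrors the OpenShift oc get co handling described above:

// Sketch: only report success after waitForPods has passed 3 times in a
// row; any failure resets the streak.
consecutiveSuccesses := 0
waitForStablePods := func() error {
    if err := waitForPods(); err != nil {
        consecutiveSuccesses = 0
        return err // retriable, so errors.Retry keeps polling
    }
    consecutiveSuccesses++
    if consecutiveSuccesses < 3 {
        return &errors.RetriableError{Err: fmt.Errorf("core pods running in %d/3 consecutive checks", consecutiveSuccesses)}
    }
    return nil
}
return errors.Retry(ctx, 2*time.Minute, waitForStablePods, 2*time.Second)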

        }
        return nil
    }
    return errors.Retry(ctx, 2*time.Minute, waitForPods, 2*time.Second)
}

func podRunningForNamespace(ocConfig oc.Config, namespace string) bool {
Contributor:

allPodsRunning(ocConfig oc.Config, namespace string) bool or checkAllPodsRunning would be more descriptive/accurate.

Member Author:

It should be in a namespace context, so checkAllPodsRunningInNamespace or allPodsRunningForNamespace?

Contributor:

There is a namespace argument, and we don't have a non-namespaced function this could be confused with, so I don't think it's really useful to mention Namespace in the function name. That's more something for an API doc comment, if you think it's important to inform API users that it will only iterate over a single namespace.

    stdout, stderr, err := ocConfig.WithFailFast().RunOcCommand("get", "pods", "-n", namespace, "--field-selector=status.phase!=Running")
    if err != nil {
        logging.Debugf("Failed to get pods in %s namespace, stderr: %s", namespace, stderr)
        return false
    }
    if len(stdout) != 0 {
        return false
    }
    return true
}
5 changes: 4 additions & 1 deletion pkg/crc/machine/start.go
@@ -1064,7 +1064,10 @@ func startMicroshift(ctx context.Context, sshRunner *crcssh.Runner, ocConfig oc.
        return err
    }

-   return cluster.WaitForAPIServer(ctx, ocConfig)
+   if err := cluster.WaitForAPIServer(ctx, ocConfig); err != nil {
+       return err
+   }
+   return cluster.CheckCorePodsRunning(ctx, ocConfig)
}

func ensurePullSecretPresentInVM(sshRunner *crcssh.Runner, pullSec cluster.PullSecretLoader) error {