[enhancement] check dns pod state as part of crc status operation #3852

Open
adrianriobo opened this issue Sep 29, 2023 · 3 comments

@adrianriobo (Contributor)

This seems to be the root cause for #3851

When we initialize a MicroShift cluster we first ensure that the cluster state is running, then we deploy and expose a service to test connectivity. When we do this manually it works as expected, but when we run it through automation it fails with:

time="2023-09-28T13:44:03Z" level=error msg="Post \"http://gateway/hosts/add\": dial tcp: lookup gateway on 10.43.0.10:53: read udp 10.42.0.6:35808->10.43.0.10:53: read: connection refused"

The code for crc status on MicroShift checks oc get node but does not take into account the state of the pods running within the cluster.

In this scenario we expected DNS to be working, since crc status reports the cluster as running, but that is not the case, so it may be worthwhile to check the state of the DNS pods to ensure the cluster is actually in a running state.
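A minimal sketch of what such a check could look like with client-go; the openshift-dns namespace and the daemonset label selector below are assumptions based on default MicroShift manifests, not something verified against the crc codebase:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// podsReady reports whether at least one pod matches the selector in the
// given namespace and every matching pod is Running with Ready=true.
func podsReady(ctx context.Context, client kubernetes.Interface, namespace, selector string) (bool, error) {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return false, err
	}
	if len(pods.Items) == 0 {
		return false, nil // nothing scheduled yet
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase != corev1.PodRunning {
			return false, nil
		}
		ready := false
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = true
			}
		}
		if !ready {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Assumption: MicroShift runs its DNS daemonset in openshift-dns with this label.
	ok, err := podsReady(context.Background(), client, "openshift-dns",
		"dns.operator.openshift.io/daemonset-dns=default")
	if err != nil {
		panic(err)
	}
	fmt.Println("dns ready:", ok)
}
```

From a shell, the equivalent quick check is simply listing the pods in openshift-dns and verifying they are Running and Ready before declaring the cluster healthy.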

@adrianriobo (Contributor, Author)

After adding a DNS check to the e2e tests, the route is now always added as expected, but we still end up having issues with the deployed test service, which ends in CreateContainerError due to:

Warning  FailedCreatePodSandBox  34m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_httpd-example-6bf9c787d7-d4mnd_testproj_f468e5fe-b4e9-45b6-b6f6-856426565406_0(6e1a579fdb65e06338b7e1e68f09866369562a4888142d363fb377c770b9bdf2): error adding pod testproj_httpd-example-6bf9c787d7-d4mnd to CNI network "ovn-kubernetes": plugin type="ovn-k8s-cni-overlay" name="ovn-kubernetes" failed (add): failed to send CNI request: Post "http://dummy/": dial unix /var/run/ovn-kubernetes/cni//ovn-cni-server.sock: connect: no such file or directory

So we will probably need to check the state of the ovn pods as well; a sketch extending the previous check follows below.
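Following the same approach, the readiness gate could cover the ovn-kubernetes pods too. This sketch reuses podsReady from above; the openshift-ovn-kubernetes namespace and the app=ovnkube-node selector are assumptions taken from typical ovn-kubernetes manifests:

```go
// networkReady combines the DNS and OVN readiness checks (reusing podsReady
// from the sketch above). Namespaces and selectors are assumptions based on
// typical MicroShift/ovn-kubernetes manifests, not verified against crc.
func networkReady(ctx context.Context, client kubernetes.Interface) (bool, error) {
	dnsOK, err := podsReady(ctx, client, "openshift-dns",
		"dns.operator.openshift.io/daemonset-dns=default")
	if err != nil || !dnsOK {
		return false, err
	}
	return podsReady(ctx, client, "openshift-ovn-kubernetes", "app=ovnkube-node")
}
```

Gating crc status on both checks would have surfaced the missing ovn-cni-server.sock condition above instead of reporting the cluster as running.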

@gbraad (Contributor)

gbraad commented Apr 10, 2024

@adrianriobo is this still relevant?

@adrianriobo (Contributor, Author)

adrianriobo commented Apr 10, 2024

It is still relevant from the automation point of view. In the e2e tests we try to emulate the user experience, and with the delays between operations this is unlikely to happen (ending up with an unhealthy state that goes unnoticed because we do not check).

But if we explore the CI use case, which implies automation, this check should be in place.

Also, as we do for OCP, I guess it would be good to have the check from #4009 at the start to ensure that what we are delivering is functional.
