Nodes unable to get K8s info #105

Closed
bilalshaikh42 opened this issue Sep 24, 2021 · 8 comments

Comments

@bilalshaikh42
Member

Hello,
I am unsure what change on our end caused this problem to start appearing, but the DN and SN nodes seem to be unable to query the K8s API in order to connect to each other. Here is the error we are getting:

INFO> k8s_update_dn_info
DEBUG> getting pods for namespace: dev
ERROR> Unexpected MaxRetryError exception in doHealthCheck: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /api/v1/namespaces/dev/pods (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3c35c3d250>: Failed to establish a new connection: [Errno 111] Connection refused'))

We have the deployment scoped to a particular namespace using a Role/RoleBinding instead of a ClusterRole/ClusterRoleBinding, but this was working just fine before. I am not sure if this could be caused by any recent updates.

@jreadey
Member

jreadey commented Sep 24, 2021

Not sure - did you update your Kubernetes version?
This got reported a few days ago in the HDF Forum:

"dn and sn nodes were resolving to localhost. It might be related to this kubernetes-client issue 3 (kubernetes-client/python#1284). We unsuccessfully tried pinning against a couple of different kubernetes-client versions as described in that link. What did work was an explicit call to k8s_client.Configuration().get_default_copy() in util/k8sclient.py"

I'm looking at dropping the Kubernetes package and just making an http request to the kubernetes server as a simpler approach.

@bilalshaikh42
Member Author

That does sound like the same issue. GKE might have gotten a security update recently.

Another possible cause is that the version of HDF/hsds was not pinned earlier, so the latest pull may have picked up a new version. I can try pinning the deployment to some of the recent releases and see if that makes a difference, unless the change to k8sclient.py is something that can be made and released directly from this repo.

@jreadey
Member

jreadey commented Sep 27, 2021

I've replaced the kubernetes package code in master with plain HTTP GETs to the Kubernetes API endpoint.
Please give it a try and see if that works.
You'll need to change the deployment yaml to set the head_port to null. See: https://github.com/HDFGroup/hsds/blob/master/admin/kubernetes/k8s_deployment_aws.yml.
The idea is that if a head container is present, the deployment will work with each pod functioning independently (useful if, say, you only need read functionality). If the head port is null, the SN will gather all the pod IPs and dispatch DN requests across all pods. In that case you should see the node count that hsinfo reports go up as you scale the number of pods.
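To illustrate the approach (this is not the actual hsds code; the namespace and variable names are assumptions), querying the pods endpoint directly from inside a pod looks roughly like this:

```python
import json
import ssl
import urllib.request

# service account credentials that Kubernetes mounts into every pod
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"

with open(f"{SA_DIR}/token") as f:
    token = f.read()

# trust the cluster CA so the in-cluster API server cert verifies
ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")

namespace = "dev"  # illustrative; hsds would use its own namespace setting
url = f"https://kubernetes.default.svc/api/v1/namespaces/{namespace}/pods"
req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

with urllib.request.urlopen(req, context=ctx) as rsp:
    pods = json.load(rsp)

# gather the IP of every pod that has one assigned
pod_ips = [item["status"]["podIP"]
           for item in pods.get("items", [])
           if item.get("status", {}).get("podIP")]
print(pod_ips)
```

With the namespace-scoped Role mentioned earlier, this request only needs list permission on pods in that namespace.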

@bilalshaikh42
Member Author

bilalshaikh42 commented Sep 27, 2021 via email

@jreadey
Member

jreadey commented Sep 27, 2021

I just put out an image with tag: v0.7.0beta7

@bilalshaikh42
Member Author

The service is working, and the pods are able to get the IPs successfully. I am getting the following warning:

WARN> expected to find f:status key but got: dict_keys(['f:metadata', 'f:spec'])

@jreadey
Member

jreadey commented Sep 27, 2021

Great!
You can ignore the warning. I pushed out an updated image (same tag) with the logging cleaned up.
It was quite a chore spelunking through the k8s metadata json, so I had a ton of log output initially.
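For context, the f:status warning above most likely comes from walking metadata.managedFields in the pod JSON: each field manager (e.g. the kubelet versus the controller that created the pod) owns a different subset of fields, so entries containing only f:metadata and f:spec are normal. A hypothetical tolerant lookup might look like this (not hsds's actual parsing code):

```python
def find_status_owner(pod: dict):
    """Return the managedFields entry that owns f:status, if any.

    Illustrative helper only: entries without an f:status key are expected,
    since each field manager owns only part of the object.
    """
    for entry in pod.get("metadata", {}).get("managedFields", []):
        if "f:status" in entry.get("fieldsV1", {}):
            return entry
    return None
```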

@bilalshaikh42
Member Author

Got it. Thank you very much!
