clusterapi: HasInstance function tries to find machine with wrong key #6774

Open
mweibel opened this issue Apr 29, 2024 · 1 comment · May be fixed by #6776
Labels
area/provider/cluster-api Issues or PRs related to Cluster API provider kind/bug Categorizes issue or PR as related to a bug.

Comments

@mweibel
Contributor

mweibel commented Apr 29, 2024

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
latest master

What k8s version are you using (kubectl version)?:
1.28.5

What environment is this in?:
clusterapi provider running in a CAPZ cluster

What did you expect to happen?:
The HasInstance method finds the Machine that exists for the node and does not report "machine not found for node: X".

What happened instead?:
The Machine lookup uses the wrong key.

The store contains items using {{cluster-namespace}}/{{machine-name}} but the lookup uses only {{machine-name}} as the key.
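
To illustrate the mismatch described above, here is a minimal, self-contained sketch. It does not use the autoscaler's or client-go's real types: `machineStore`, `namespacedKey`, and `getByKey` are hypothetical stand-ins, and the namespace `capz-system` and name `machine-a` are made-up values. It only assumes that items are stored under `namespace/name` keys, as observed in the debugger.

```go
package main

import "fmt"

// machineStore is a hypothetical stand-in for the informer cache.
// It maps a store key to a description of the cached Machine.
type machineStore map[string]string

// namespacedKey builds a "<namespace>/<name>" key, the shape of the
// keys observed in the store. An empty namespace yields just the name.
func namespacedKey(namespace, name string) string {
	if namespace == "" {
		return name
	}
	return namespace + "/" + name
}

// getByKey looks an item up by its exact key.
func (s machineStore) getByKey(key string) (string, bool) {
	m, ok := s[key]
	return m, ok
}

func main() {
	store := machineStore{}
	store[namespacedKey("capz-system", "machine-a")] = "Machine machine-a"

	// Lookup with the machine name only: misses, although the Machine exists.
	_, found := store.getByKey("machine-a")
	fmt.Println("lookup by name only:", found)

	// Lookup with the namespace-qualified key: hits.
	_, found = store.getByKey(namespacedKey("capz-system", "machine-a"))
	fmt.Println("lookup by namespace/name:", found)
}
```

The first lookup misses because the key lacks the namespace prefix; the second one hits. This would explain a "machine not found" report for a Machine that is actually in the cache.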

How to reproduce it (as minimally and precisely as possible):

  1. Set up a CAPZ cluster
  2. Start cluster-autoscaler
  3. Set a breakpoint in clusterapi_controller.go#findMachine()
  4. Step into the store's find/lookup call and compare the lookup key with the keys that exist in the cache/informer

Anything else we need to know?:
The autoscaler was run in debug mode using the following flags:

--cloud-provider=clusterapi 
--namespace=kube-system 
--node-group-auto-discovery=clusterapi:clusterName={{clusterName}},namespace={{managementClusterNamespace}}
--kubeconfig=~/.kube/{{clusterName}}.yaml 
--cloud-config=~/.kube/{{management-cluster}} 
--ignore-daemonsets-utilization=true 
--logtostderr=true 
--max-node-provision-time=0m30s 
--scale-down-delay-after-add=2m
--scale-down-unneeded-time=2m
--scan-interval=10s
--stderrthreshold=info
--v=4
--leader-elect=false

I'm not sure whether specific flags cause the store keys to differ. I'm going to look into this.
Ping @MaxFedotov - the HasInstance method was introduced in #6708 and maybe we could figure out what is different in our envs?

@mweibel mweibel added the kind/bug Categorizes issue or PR as related to a bug. label Apr 29, 2024
mweibel added a commit to helio/autoscaler that referenced this issue Apr 29, 2024
@mweibel mweibel linked a pull request Apr 29, 2024 that will close this issue
@elmiko
Contributor

elmiko commented May 13, 2024

/area provider/cluster-api

@k8s-ci-robot k8s-ci-robot added the area/provider/cluster-api Issues or PRs related to Cluster API provider label May 13, 2024