Installation of node-exporter often fails #483
Comments
@torumakabe - Thanks for filing the issue. Node exporter is installed through the VM images on AKS nodes (not through the Prometheus collector); the collector just scrapes it. Can you please tell us the VM image version? I am also assuming these are AKS nodes?
@vishiy Thank you for your comment. All nodes are in AKS. The node image is "AKSCBLMariner-V2gen2-202304.10.0".
This issue is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 5 days.
@torumakabe Apologies for the delay. I am following up with the AKS folks on this.
@torumakabe - Can you please confirm that the corresponding node pool(s) in the cluster are not in
@vishiy I have tried with the latest image (AKSCBLMariner-V2gen2-202304.20.0) several times. The installation failed in three out of five attempts, so it is still flaky. Even when Node Exporter fails to deploy, provisioning succeeds for all node pools.
When the deployment of Node Exporter fails, it fails on all nodes. There is no partial success. The following are the /usr/local/bin directories for the nodes where installation was successful and the nodes where it failed.
successful
failed
Is there any possible cause that you can think of? Thanks.
@torumakabe - Could you please send the log files below and also share the AKS cluster ID?
This issue was closed because it has been stalled for 12 days with no activity.
@torumakabe - Is this issue resolved now? I remember you were following up with AKS.
@vishiy Thanks for your concern. I talked to the AKS team and found out why node exporter was not installed. Node exporter is installed by AKS-Operator, which tries to install it several times during cluster creation, but at low priority. If a higher-priority task takes a long time, the installation is retried only after a sufficient wait. That wait can be up to 24 hours, and I have confirmed that node exporter does get installed if I wait. I would like the wait time, that is, the retry interval, to be shorter, but I am satisfied at this point that I have found the cause of the problem.
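For anyone hitting the same delay, here is a minimal sketch of one way to notice when the deferred installation has finally landed: poll a node's exporter port until it accepts TCP connections. This is not how AKS-Operator works internally. The function names, the polling interval, and the choice of a plain TCP probe are all assumptions for illustration, and the host and port must be supplied for your own cluster.

```python
import socket
import time


def port_is_open(host: str, port: int, connect_timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout: treat as "not up yet".
        return False


def wait_for_exporter(host: str, port: int, max_wait_s: float,
                      interval_s: float = 60.0) -> bool:
    """Poll host:port until it accepts connections or max_wait_s elapses.

    Hypothetical helper: it only checks that something is listening,
    not that the listener is actually node exporter.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        if port_is_open(host, port):
            return True
        # Stop if another full interval would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

A call such as `wait_for_exporter(node_ip, port, max_wait_s=24 * 3600)` would cover the 24-hour worst case mentioned above. Upstream node_exporter listens on 9100 by default, but the port used on AKS nodes may differ, so check it rather than assuming.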
OK, thank you. I will close this issue.
Installing node-exporter under the following conditions often results in failure.
monitor_metrics of the azurerm_kubernetes_cluster resource
Installation succeeds about half the time and fails the other half. When it fails, node-exporter is not installed on the nodes, as follows.
Additionally, in cases where Cilium was also enabled, all installations failed. I mention the network plugin for reference, although it is unclear whether it has any impact.
It should be noted that all ama-metrics-node-* DaemonSets are running, and metrics can be collected from kubelet and cAdvisor.
What probable causes can you think of? Any advice would be appreciated.