Platform I'm building on:
BOTTLEROCKET_x86_64_NVIDIA w/ p2.xlarge through AWS console
Kubernetes 1.29
What I expected to happen:
When creating a node group in the AWS EKS cluster console under the Compute tab with BOTTLEROCKET_x86_64_NVIDIA and p2.xlarge, the nodes should join the EKS cluster once creation completes. I have not changed anything else, and the other OS images such as AL2 with GPU join the cluster fine.
What actually happened:
Got the message: NodeCreationFailure | Instances failed to join the kubernetes cluster
How to reproduce the problem:
I am doing this through the AWS console under the EKS cluster Compute tab. I am using this security group role for the nodes:
These are my policies for the cluster node group role
I also added IAM permissions for my user that include ssm:*, ssmmessages:*, and ec2messages:*
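For reference, those user permissions look roughly like this when attached as an inline policy from the CLI (a sketch; the user name and policy name are placeholders):

```
aws iam put-user-policy \
  --user-name <my-user> \
  --policy-name eks-ssm-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["ssm:*", "ssmmessages:*", "ec2messages:*"],
      "Resource": "*"
    }]
  }'
```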
These are my policies for the cluster role itself
I am not sure what I am missing. I am using a VPC with public/private subnets, and everything was working fine until I tried using Bottlerocket.
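For completeness, the console steps above are roughly equivalent to this AWS CLI call (a sketch; the cluster name, role ARN, and subnet IDs are placeholders):

```
aws eks create-nodegroup \
  --cluster-name <my-cluster> \
  --nodegroup-name bottlerocket-gpu \
  --ami-type BOTTLEROCKET_x86_64_NVIDIA \
  --instance-types p2.xlarge \
  --node-role arn:aws:iam::<account-id>:role/<node-group-role> \
  --subnets <subnet-id-1> <subnet-id-2> \
  --scaling-config minSize=1,maxSize=1,desiredSize=1
```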
Hi @asluborski! Thanks for reaching out with this issue.
Have you referred to our Quickstart guide when launching nodes for EKS?
What user-data are you passing to the Bottlerocket instance when you try to launch it?
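For reference, a self-managed Bottlerocket node typically needs TOML user-data along these lines (a minimal sketch; the cluster name, endpoint, and CA bundle are placeholders, and managed node groups normally generate this for you):

```
[settings.kubernetes]
cluster-name = "<cluster-name>"
api-server = "<cluster-endpoint-url>"
cluster-certificate = "<base64-encoded-cluster-CA>"
```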
You can check the console logs via the Get System Logs option under Actions -> Monitor and Troubleshoot on the instance. This will help determine whether the node is booting fully; if it is not, the logs can help diagnose this further.
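The same logs can also be pulled from the CLI if that is easier (the instance ID is a placeholder):

```
aws ec2 get-console-output --instance-id <instance-id> --output text
```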
I was able to find the issue. The conflict was between the Kubernetes version I was using for the cluster and the NVIDIA driver being installed for the p2 instances. P2 instances require the legacy NVIDIA 470.x drivers, but the newer NVIDIA drivers (500.x series) were being installed, hence the failure. I had to downgrade to Kubernetes 1.23 for Bottlerocket to install the legacy NVIDIA drivers.
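For anyone hitting the same thing, the driver branch can be confirmed from inside a pod that has a GPU allocated (a sketch, assuming the NVIDIA device plugin is running on the node):

```
# Query just the driver version; p2 (Kepler/K80) instances need the 470.x legacy branch,
# so a 5xx driver here explains the failure.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```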