- Ensure that all cluster nodes are up and running.
- Verify that the inventory file is updated as mentioned in the inventory sample file.
  - For Slurm, the applicable inventory groups are `slurm_control_node`, `slurm_node`, and `login`.
  - For Kubernetes, the applicable inventory groups are `kube_control_plane`, `kube_node`, and `etcd`.
  - The centralized authentication server inventory group, `auth_server`, is common to both Slurm and Kubernetes.
- Verify that all nodes are assigned a group. The inventory file is case-sensitive. Follow the format provided in the sample file link.
Note
* The inventory file accepts both IP addresses and FQDNs, as long as they can be resolved by DNS.
* In a multi-node setup, the same IP address cannot be listed as both a control plane and a cluster node. That is, do not include the `kube_control_plane` IP address in the compute group. In a single-node setup, the compute node and the `kube_control_plane` must be the same.
- Ensure that all repositories are available on the cluster nodes.
- If the cluster requires more than 10 Kubernetes nodes, use a Docker enterprise account to avoid Docker pull limits.
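
The inventory rules above (every node in a group, case-sensitive group names, no shared IP between control plane and cluster node in a multi-node setup) can be checked before deployment with a small script. The following is a minimal sketch, not part of the product: the sample inventory, its IP addresses, and the helper names `parse_inventory` and `check_inventory` are hypothetical, and it assumes that the "compute group" mentioned in the note corresponds to `kube_node`.

```python
# Minimal pre-flight check for an INI-style inventory file.
# Assumptions (not from the product docs): the sample hosts below are
# made up, and the "compute group" is taken to mean kube_node.

KNOWN_GROUPS = {
    "slurm_control_node", "slurm_node", "login",   # Slurm
    "kube_control_plane", "kube_node", "etcd",     # Kubernetes
    "auth_server",                                 # common to both
}

# Hypothetical inventory used only for illustration.
SAMPLE_INVENTORY = """
[kube_control_plane]
10.5.0.101

[kube_node]
10.5.0.102
10.5.0.103

[etcd]
10.5.0.101

[auth_server]
10.5.0.104
"""


def parse_inventory(text):
    """Return ({group: [hosts]}, [hosts listed before any group header])."""
    groups, ungrouped = {}, []
    current, skipping = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            # Sections such as [group:vars] hold settings, not hosts.
            skipping = ":" in name
            if not skipping:
                current = name
                groups.setdefault(name, [])
            continue
        if skipping:
            continue
        host = line.split()[0]  # ignore any per-host variables
        if current is None:
            ungrouped.append(host)
        else:
            groups[current].append(host)
    return groups, ungrouped


def check_inventory(text):
    """Return a list of human-readable problems (empty if none found)."""
    groups, ungrouped = parse_inventory(text)
    problems = [f"{host}: not assigned to a group" for host in ungrouped]
    for name in groups:
        if name not in KNOWN_GROUPS:
            problems.append(f"[{name}]: unknown group (names are case-sensitive)")
    control = set(groups.get("kube_control_plane", []))
    nodes = set(groups.get("kube_node", []))
    # More than one distinct host means a multi-node setup, where the
    # control plane must not also be listed as a cluster node.
    if len(control | nodes) > 1:
        for host in sorted(control & nodes):
            problems.append(f"{host}: listed in both kube_control_plane and kube_node")
    return problems


if __name__ == "__main__":
    for problem in check_inventory(SAMPLE_INVENTORY):
        print(problem)
```

The script only compares entries as written, so a node listed once by IP address and once by FQDN would not be detected as a duplicate; resolving names through DNS first would be needed for that case.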