Before you run the provision tool

  • (Recommended) Run prereq.sh to get the system ready to deploy Omnia. Alternatively, ensure that Ansible 2.12.10 and Python 3.8 are installed on the system. SELinux should also be disabled.
  • Set the IP address of the control plane with a /16 subnet mask. In a shared LOM or hybrid setup, the control plane NIC connected to remote servers (through the switch) should be configured with two IPs (BMC IP and admin IP). In a dedicated network topology, only a single IP (admin IP) is required.
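The checks in the first bullet can be scripted. The sketch below is not part of Omnia. It assumes that `ansible --version` prints `ansible [core 2.12.10]` on its first line, as ansible-core 2.12 does, and that Python 3.8 is reachable as `python3` unless the hypothetical `PYTHON` variable points elsewhere (for example `PYTHON=python3.8`).

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check: Ansible 2.12.10, Python 3.8, SELinux disabled.
# Assumption: the first line of `ansible --version` looks like "ansible [core X.Y.Z]".
PYTHON=${PYTHON:-python3}

# Extract the ansible-core version from `ansible --version` output on stdin.
ansible_version() {
  head -n 1 | sed -n 's/.*core \([0-9][0-9.]*\).*/\1/p'
}

# Extract "major.minor" from `python3 --version` output such as "Python 3.8.16".
python_minor() {
  awk '{ split($2, v, "."); print v[1] "." v[2]; exit }'
}

check_prereqs() {
  local status=0 av pv se
  av=$(ansible --version 2>/dev/null | ansible_version)
  pv=$("$PYTHON" --version 2>/dev/null | python_minor)
  se=$(getenforce 2>/dev/null)
  if [ "$av" != "2.12.10" ]; then
    echo "Ansible: expected 2.12.10, found '${av:-unrecognized}'"
    status=1
  fi
  if [ "$pv" != "3.8" ]; then
    echo "Python: expected 3.8, found '${pv:-none}'"
    status=1
  fi
  if [ "$se" != "Disabled" ]; then
    echo "SELinux: expected Disabled, found '${se:-unknown}'"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "Prerequisites look OK"
  fi
  return "$status"
}
```

Source the file and run `check_prereqs`; a nonzero return means at least one requirement is not met.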

[Figure: Control plane NIC IP configuration in a LOM setup]

[Figure: Control plane NIC IP configuration in a dedicated setup]
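On a control plane where NetworkManager manages the NIC, both addresses can be assigned to one connection profile. This is a minimal sketch, not an Omnia script: `configure_cp_nic` is a hypothetical helper, and the connection name and addresses in the usage note are placeholders. It only accepts /16 addresses, matching the subnet requirement above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign the admin IP (and, for LOM/hybrid setups, the BMC IP)
# to a NetworkManager connection. NMCLI can be overridden (e.g. NMCLI=echo) for a dry run.
NMCLI=${NMCLI:-nmcli}

# Usage: configure_cp_nic <connection> <admin_ip/16> [bmc_ip/16]
configure_cp_nic() {
  if [ $# -lt 2 ] || [ $# -gt 3 ]; then
    echo "usage: configure_cp_nic <connection> <admin_ip/16> [bmc_ip/16]" >&2
    return 2
  fi
  local con="$1" addrs="" ip
  shift
  for ip in "$@"; do
    # Enforce the /16 subnet mask required for the control plane.
    case "$ip" in
      */16) ;;
      *) echo "not a /16 address: $ip" >&2; return 1 ;;
    esac
    addrs="${addrs:+$addrs,}$ip"     # build a comma-separated address list
  done
  # Switch the profile to static addressing, then re-activate it.
  "$NMCLI" connection modify "$con" ipv4.method manual ipv4.addresses "$addrs" &&
    "$NMCLI" connection up "$con"
}
```

For example, `configure_cp_nic eno1 10.5.255.254/16 172.17.0.254/16` assigns two hypothetical /16 addresses to a connection named eno1 (LOM or hybrid setup); in a dedicated setup, pass only the admin IP.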
  • Set the hostname of the control plane in the <hostname>.<domain name> format.

    For example, controlplane.omnia.test is acceptable.

Note

The domain name specified for the control plane should be the same as the one specified under domain_name in input/provision_config.yml.
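The hostname can be set with `hostnamectl set-hostname controlplane.omnia.test`. To catch a mismatch with input/provision_config.yml early, the sketch below compares the domain part of the FQDN with domain_name. It is a hypothetical helper, not part of Omnia, and it assumes domain_name appears on a single flat `domain_name: value` line in the YAML file.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the control plane FQDN use the configured domain_name?
# Assumes a flat "domain_name: value" line (optionally quoted, optionally commented).

# Print the domain_name value from a provision_config.yml-style file.
config_domain() {
  awk -F: '/^[ \t]*domain_name[ \t]*:/ {
    v = $2
    sub(/#.*/, "", v)      # drop trailing comments
    gsub(/"/, "", v)       # drop double quotes
    gsub(/[ \t]/, "", v)   # drop whitespace
    print v
    exit
  }' "$1"
}

# Usage: check_fqdn <fqdn> <provision_config.yml>
check_fqdn() {
  local fqdn="$1" have want
  have="${fqdn#*.}"                  # everything after the first dot
  want=$(config_domain "$2")
  if [ "$have" = "$fqdn" ] || [ -z "$want" ]; then
    echo "cannot compare: fqdn='$fqdn' domain_name='$want'" >&2
    return 2
  fi
  if [ "$have" = "$want" ]; then
    echo "OK: $fqdn matches domain_name $want"
  else
    echo "MISMATCH: $fqdn has domain '$have', config has '$want'" >&2
    return 1
  fi
}
```

Run it as `check_fqdn "$(hostname)" input/provision_config.yml`.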

  • To provision the bare metal servers, download one of the following ISOs for deployment:

    1. Rocky 8
    2. RHEL 8.x

Caution

The Rocky Linux OS version on the cluster will be upgraded to the latest 8.x version available, irrespective of the provision_os_version provided in provision_config.yml.

Note

Ensure that the ISO downloaded completely and is not corrupted. Verify the SHA checksum and download size of the ISO file before provisioning to avoid failures later.
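A minimal sketch of that check, assuming you have the SHA-256 value published alongside the ISO (Rocky Linux and Red Hat both publish checksums for their images). `verify_iso` is a hypothetical helper; it also prints the file size so it can be compared with the size listed on the download page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an ISO's SHA-256 with the published value.

# Usage: verify_iso <path-to-iso> <expected-sha256>
verify_iso() {
  local iso="$1" expected="$2" actual size
  if [ ! -f "$iso" ]; then
    echo "missing: $iso" >&2
    return 2
  fi
  size=$(stat -c %s "$iso")                       # compare with the download page
  actual=$(sha256sum "$iso" | awk '{print $1}')   # sha256sum prints lowercase hex
  # Normalise the expected value to lowercase before comparing.
  if [ "$actual" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]; then
    echo "OK: $iso ($size bytes)"
  else
    echo "MISMATCH: $iso ($size bytes): expected $expected, got $actual" >&2
    return 1
  fi
}
```

For example, `verify_iso Rocky-8-x86_64-dvd.iso <published-sha256>`, where both arguments are placeholders for your actual file and checksum.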

Note the compatibility between cluster OS and control plane OS below:

Control Plane OS    Compute Node OS    Compatibility
RHEL [1]            RHEL               Yes
RHEL [2]            Rocky              Yes
Rocky               Rocky              Yes
  • To set up CUDA and OFED using the provisioning tool, download the required repositories:

    1. CUDA
    2. OFED
  • To control the IP address/MAC mapping, you can provide a host mapping file. Use pxe_mapping_file.csv as a template to create your own mapping file.
  • Ensure that all NetworkManager connection names match their corresponding device names. To list the connections, run:

    nmcli connection

In the event of a mismatch, edit the file /etc/sysconfig/network-scripts/ifcfg-<nic name> using a text editor such as vi.
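The name/device comparison can be automated with nmcli's terse output. This sketch is hypothetical and only checks active profiles, since inactive profiles have no device bound. It also assumes connection names contain no colons, because `nmcli -t` escapes those as `\:`.

```shell
#!/usr/bin/env bash
# Hypothetical check: report NetworkManager connections whose name differs from
# the device they are bound to. Reads "NAME:DEVICE" lines on stdin, as printed by
#   nmcli -t -f NAME,DEVICE connection show

find_name_mismatches() {
  local name dev bad=0
  while IFS=: read -r name dev; do
    # Skip inactive profiles (no device bound).
    case "$dev" in ""|--) continue ;; esac
    if [ "$name" != "$dev" ]; then
      echo "connection '$name' is bound to device '$dev'"
      bad=1
    fi
  done
  return "$bad"
}
```

`nmcli -t -f NAME,DEVICE connection show | find_name_mismatches` prints one line per mismatch and returns nonzero if any exist.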
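Before using a custom host mapping file, a quick sanity check can catch typos. This sketch assumes the file follows a three-column `MAC,Hostname,IP` layout with a header row; confirm the columns against the pxe_mapping_file.csv template before relying on it. `validate_mapping` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical validator for a "MAC,Hostname,IP" mapping file with a header row.
# Reports malformed MACs, wrong field counts, and duplicate MACs, hostnames or IPs.

validate_mapping() {
  awk -F, '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    NR == 1 { next }                   # skip the header row
    NF == 0 { next }                   # ignore blank lines
    {
      if (NF != 3) { printf "line %d: expected 3 fields, got %d\n", NR, NF; bad = 1; next }
      mac = tolower($1); host = $2; ip = $3
      # A MAC must be six colon-separated pairs of hex digits.
      n = split(mac, p, ":"); ok = (n == 6)
      for (i = 1; i <= n; i++) if (p[i] !~ /^[0-9a-f][0-9a-f]$/) ok = 0
      if (!ok) { printf "line %d: malformed MAC %s\n", NR, $1; bad = 1 }
      if (mac in seen_mac)   { printf "line %d: duplicate MAC %s\n", NR, $1; bad = 1 }
      if (host in seen_host) { printf "line %d: duplicate hostname %s\n", NR, host; bad = 1 }
      if (ip in seen_ip)     { printf "line %d: duplicate IP %s\n", NR, ip; bad = 1 }
      seen_mac[mac] = 1; seen_host[host] = 1; seen_ip[ip] = 1
    }
    END { exit bad }
  ' "$1"
}
```

`validate_mapping my_mapping.csv` (placeholder file name) prints nothing and returns 0 when no problems are found.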

  • When discovering nodes via snmpwalk or a mapping file, all target nodes should be set up in PXE mode before running the playbook.
  • Nodes provisioned using the Omnia provision tool do not require a Red Hat subscription to run provision.yml on RHEL target nodes.
  • For RHEL target nodes not provisioned by Omnia, ensure that a Red Hat subscription is enabled on every target node.
  • Users should also ensure that all repos (AppStream, BaseOS and CRB) are available on the RHEL control plane.
  • If epel-release is installed on the control plane, uninstall it, because Omnia configures epel-release on the control plane itself. To uninstall epel-release, run the following command:

    dnf remove epel-release -y
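When discovery relies on target nodes being in PXE mode, nodes with reachable BMCs can be switched with ipmitool before running the playbook. This is a sketch under several assumptions: ipmitool is installed on the control plane, all BMCs accept IPMI over LAN (lanplus) with the same credentials, and BMC IPs are listed one per line in a file. `set_pxe_boot` is hypothetical; `chassis bootdev pxe` affects the next boot only unless `options=persistent` is added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: request PXE as the next boot device on each listed BMC.
# IPMITOOL can be overridden (e.g. IPMITOOL=echo) for a dry run.
IPMITOOL=${IPMITOOL:-ipmitool}

# Usage: export IPMI_USER=... IPMI_PASSWORD=...; set_pxe_boot <bmc_ip_file>
set_pxe_boot() {
  local bmc failed=0
  if [ -z "$IPMI_USER" ] || [ -z "$IPMI_PASSWORD" ]; then
    echo "set IPMI_USER and IPMI_PASSWORD first" >&2
    return 2
  fi
  while read -r bmc; do
    case "$bmc" in ""|\#*) continue ;; esac   # skip blank lines and comments
    # -E reads the password from IPMI_PASSWORD, keeping it off the command line;
    # </dev/null stops ipmitool from consuming the rest of the BMC list.
    if ! "$IPMITOOL" -I lanplus -H "$bmc" -U "$IPMI_USER" -E chassis bootdev pxe </dev/null; then
      echo "failed: $bmc" >&2
      failed=1
    fi
  done < "$1"
  return "$failed"
}
```

IPMI_PASSWORD must be exported so that ipmitool can read it.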

Note

To enable the repositories, run the following commands:

    subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
    subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
    subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms

Verify your changes by running:

    yum repolist enabled
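The `yum repolist enabled` check can be narrowed to the three repositories above. `missing_repos` is a hypothetical helper that reads that output on stdin and prints any required repo ID that is not listed. The IDs assume the Red Hat CDN, so control planes using local mirrors will need their own IDs in REQUIRED_REPOS.

```shell
#!/usr/bin/env bash
# Hypothetical check: which required repositories are not enabled?
# Reads `yum repolist enabled` output on stdin and matches the repo id column exactly.
REQUIRED_REPOS=(
  codeready-builder-for-rhel-8-x86_64-rpms
  rhel-8-for-x86_64-appstream-rpms
  rhel-8-for-x86_64-baseos-rpms
)

missing_repos() {
  local ids repo missing=0
  ids=$(awk '{ print $1 }')          # first column is the repo id
  for repo in "${REQUIRED_REPOS[@]}"; do
    if ! printf '%s\n' "$ids" | grep -qxF "$repo"; then
      echo "$repo"
      missing=1
    fi
  done
  return "$missing"
}
```

`yum repolist enabled | missing_repos` returns nonzero if any ID is missing; each printed ID can be passed to `subscription-manager repos --enable=<id>`.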
  • Ensure that the NICs specified as pxe_nic and public_nic are in the firewalld public zone.
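To confirm the zone assignment, firewall-cmd can report the zone of each interface. This sketch takes the NIC names configured as pxe_nic and public_nic as arguments; `check_public_zone` and the FIREWALL_CMD override (used for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: are the given interfaces in the firewalld "public" zone?
FIREWALL_CMD=${FIREWALL_CMD:-firewall-cmd}

# Usage: check_public_zone <nic> [<nic> ...]
check_public_zone() {
  local nic zone bad=0
  for nic in "$@"; do
    zone=$("$FIREWALL_CMD" --get-zone-of-interface="$nic" 2>/dev/null)
    if [ "$zone" = "public" ]; then
      echo "$nic: public"
    else
      echo "$nic: '${zone:-no zone}' (expected public)"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, `check_public_zone eno1 eno2` (placeholder NIC names). An interface in another zone can be moved with `firewall-cmd --permanent --zone=public --change-interface=<nic>` followed by `firewall-cmd --reload`; for NetworkManager-managed interfaces, setting connection.zone on the profile is an alternative.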

Note

  • After configuration and installation of the cluster, changing the control plane is not supported. If you need to change the control plane, you must redeploy the entire cluster.
  • If there are errors while executing any of the Ansible playbook commands, then re-run the playbook.
  • For servers with an existing OS that are discovered via BMC, ensure that the first PXE device on each target node is the designated active NIC for PXE booting.

  [1], [2] Ensure that control planes running RHEL have an active subscription or are configured to access local repositories. The following repositories should be enabled on the control plane: AppStream, CodeReady Builder (CRB), and BaseOS. For RHEL control planes running 8.5 and earlier, also ensure that sshpass is available to install or download to the control plane (from any local repository).