
(3.1.1) Issue with clusters in isolated networks


The issue

ParallelCluster 3.1.1 launched support for clusters in subnets with no internet access. To deploy a cluster in this mode, the user has to configure the following settings:

...
Scheduling:
  ...
  SlurmSettings:
    Dns:
      DisableManagedDns: true
      UseEc2Hostnames: true

In this scenario a Slurm prolog script is configured to manage updates to the /etc/hosts file on compute nodes. The script is responsible for adding entries for all hosts involved in the distributed job.

When multiple parallel jobs share the same node, a bug in the prolog script causes the prolog to fail as soon as the second job is allocated, and the affected nodes enter a DRAIN state. In this state no further jobs are scheduled on those nodes, but the currently running jobs keep running. If a running job is not using the full capacity of the node, the remaining idle capacity stays unusable. Once the running job completes, the nodes are replaced with new ones.
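
To confirm that you are hitting this issue, you can check for nodes in a DRAIN state and inspect the prolog log on an affected compute node. A minimal sketch (the node name below is only an example; use the names reported by sinfo):

sinfo -N -t drain -o "%N %T %E"                                # list drained nodes with state and reason
scontrol show node queue-0-st-compute-resource-0-1             # inspect one of the reported nodes
tail -n 20 /var/log/parallelcluster/slurm_prolog_epilog.log    # prolog failure details, on the compute node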

Affected versions

The only affected version is ParallelCluster 3.1.1.

The workaround

You can fix the issue by executing the script below as root on the head node of the cluster:

#!/bin/bash
set -e

cat << 'EOF' > /opt/slurm/etc/pcluster/prolog
#!/bin/bash
#
# Cookbook Name:: aws-parallelcluster
#
# Copyright 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the
# License. A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
# limitations under the License.
#
# This script writes content with the following format to /etc/hosts file
# #HOSTS_JOB_<JOBID>
# 192.168.1.2 queue-0-st-compute-resource-0-1 ip-192-168-1-2
# 192.168.1.10 queue-0-st-compute-resource-0-2 ip-192-168-1-10
# #END_JOB_<JOBID>
# This content contains the DNS information for all nodes that the job <JOBID> is allocated to.
# The node information of a newer job is always inserted before the node information of older jobs
# to ensure the latest DNS information is used.
#
# SLURM_NODE_ALIASES and SLURM_JOB_ID are env vars provided by Slurm.

LOG_FILE_PATH="/var/log/parallelcluster/slurm_prolog_epilog.log"

_log() {
    text=$1
    level="${2:-INFO}" # If the second argument is not provided, "INFO" is the default log level
    log_time=$(date "+%Y-%m-%d %H:%M:%S")
    echo "${log_time} - ${level} - Job ${SLURM_JOB_ID} - ${text}" >> "${LOG_FILE_PATH}"
}

_log "Adding nodes to /etc/hosts"
# SLURM_NODE_ALIASES has value like "queue-0-dy-compute-resource-0-1:[192.168.1.2]:ip-192-168-1-2"
# The following line transforms it to "192.168.1.2 queue-0-dy-compute-resource-0-1 ip-192-168-1-2"
hosts=$(echo -n "${SLURM_NODE_ALIASES}" | awk 'BEGIN{RS=","; FS=":";ORS="\\n"}; {gsub(/\[|\]/,"",$2); print $2,$1,$3}' )
lines='#HOSTS_JOB_'"${SLURM_JOB_ID}\n${hosts}"'#END_JOB_'"${SLURM_JOB_ID}"
if grep -q '^#HOSTS_JOB_.*' /etc/hosts
then
    # If there is node information from other jobs in the file, the newest node information is inserted before the older entries.
    if ! sed_output=$(sed -i '0,/^#HOSTS_JOB_.*/s//'"${lines}\n&/" /etc/hosts 2>&1); then
        # If the sed command failed, log its output. Note that stderr is redirected to stdout when executing the command.
        _log "Failed to add nodes: ${sed_output}" "ERROR"
        exit 1
    fi
else
    # If there is no node information from other jobs in the file, the node information is appended to the file.
    echo -e "${lines}" >> /etc/hosts
fi
_log "Finished adding nodes to /etc/hosts"
exit 0

EOF
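
After the prolog has been replaced, nodes that are already drained because of this bug can be returned to service without waiting for them to be replaced. A minimal sketch, assuming the drained node reported by sinfo is queue-0-st-compute-resource-0-1:

head -n 1 /opt/slurm/etc/pcluster/prolog                                 # should print the bash shebang of the new prolog
scontrol update nodename=queue-0-st-compute-resource-0-1 state=resume    # return the drained node to service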

The replacement can be automated with a custom bootstrap action configured to run the script above on the head node:

HeadNode:
  [...]
  CustomActions:
    OnNodeConfigured:
      # Script URL. This is run after all the bootstrap scripts are run
      Script: s3://bucket-name/on-node-configured.sh  # s3:// | https://
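
The script has to be uploaded to an S3 location that the cluster can read. A minimal sketch, assuming the script is saved locally as on-node-configured.sh and the bucket name matches the configuration above (depending on your setup, you may also need to grant the cluster read access to the bucket, for example through the S3Access setting under the head node Iam configuration):

aws s3 cp on-node-configured.sh s3://bucket-name/on-node-configured.sh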