
AWS ParallelCluster v3.7.0

@dreambeyondorange dreambeyondorange released this 30 Aug 12:11
9917f01

We're excited to announce the release of AWS ParallelCluster Cookbook 3.7.0.

This release is associated with AWS ParallelCluster v3.7.0.

ENHANCEMENTS

  • Add support for Ubuntu 22.04. RSA keys are not supported by default. See this page.
  • Add support for login nodes.
  • Add support to mount existing Amazon File Cache as shared storage.
  • Allow configuration of static and dynamic node priorities in Slurm compute resources via the ParallelCluster configuration YAML file.
  • Add a queue-level parameter (JobExclusiveAllocation) to ensure nodes in the partition are exclusively allocated to a single job at any given time.
  • Allow overriding the aws-parallelcluster-node package at cluster creation and update time (only on the head node during update). Useful for development purposes only.
  • Allow memory-based scheduling when multiple instance types are specified for a Slurm Compute Resource.
  • Avoid starting the NFS server on compute nodes.
  • Forward SLURM_RESUME_FILE to ParallelCluster resume program.

CHANGES

  • Deprecate Ubuntu 18.04.
  • Upgrade Slurm to version 23.02.4.
  • Update the default root volume size to 40 GB to account for limits on CentOS 7.
  • Upgrade NVIDIA driver to version 535.54.03.
  • Upgrade CUDA library to version 12.2.0.
  • Upgrade NVIDIA Fabric Manager to nvidia-fabricmanager-535.
  • Upgrade NICE DCV to version 2023.0-15487.
    • server: 2023.0.15487-1
    • xdcv: 2023.0.551-1
    • gl: 2023.0.1039-1
    • web_viewer: 2023.0.15487-1
  • Upgrade EFA installer to 1.25.1.
    • Efa-driver: efa-2.5.0-1
    • Efa-config: efa-config-1.15-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.18.1-1
    • Rdma-core: rdma-core-46.0-1
    • Open MPI: openmpi40-aws-4.1.5-4
  • Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.
  • Upgrade third-party cookbook dependencies:
    • apt-7.5.14 (from apt-7.4.0)
    • line-4.5.13 (from line-4.5.2)
    • openssh-2.11.3 (from openssh-2.10.3)
    • pyenv-4.2.3 (from pyenv-3.5.1)
    • selinux-6.1.12 (from selinux-6.0.5)
    • yum-7.4.13 (from yum-7.4.0)
    • yum-epel-5.0.2 (from yum-epel-4.5.0)
  • Assign Slurm dynamic nodes a priority (weight) of 1000 by default. This allows Slurm to prioritize idle static nodes over idle dynamic ones.
  • Change the default value of Imds/ImdsSupport from v1.0 to v2.0.
  • Make aws-parallelcluster-node daemons handle only ParallelCluster-managed Slurm partitions.
  • Restrict permission on file /tmp/wait_condition_handle.txt within the head node so that only root can read it.
  • Create a Slurm partition-nodelist mapping JSON file that the node package daemons use to recognize ParallelCluster-managed Slurm partitions and nodelists.
  • Increase EFS-utils watchdog poll interval to 10 seconds. Note: This change is meaningful only if EncryptionInTransit is set to true, because watchdog does not run otherwise.
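
The effect of the new default weight of 1000 for dynamic nodes can be illustrated with a small model. This is only a sketch, not the scheduler's implementation: it assumes that, all else being equal, Slurm allocates idle nodes with the lowest weight first, and that static nodes carry a lower weight than dynamic ones. The `Node` class, the `pick_nodes` helper, the node names, and the static weight of 1 are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Assumption: static nodes have a lower weight than the dynamic default.
STATIC_WEIGHT = 1      # hypothetical value, for illustration only
DYNAMIC_WEIGHT = 1000  # default for dynamic nodes, per the release notes


@dataclass(frozen=True)
class Node:
    name: str
    weight: int
    idle: bool


def pick_nodes(nodes, count):
    """Return up to `count` idle nodes, lowest weight first (ties broken by name)."""
    idle = [n for n in nodes if n.idle]
    return sorted(idle, key=lambda n: (n.weight, n.name))[:count]


nodes = [
    Node("queue1-dy-cr1-1", DYNAMIC_WEIGHT, True),
    Node("queue1-st-cr1-1", STATIC_WEIGHT, True),
    Node("queue1-st-cr1-2", STATIC_WEIGHT, False),  # busy, so never selected
    Node("queue1-dy-cr1-2", DYNAMIC_WEIGHT, True),
]

chosen = [n.name for n in pick_nodes(nodes, 2)]
print(chosen)  # ['queue1-st-cr1-1', 'queue1-dy-cr1-1']
```

In this model the idle static node is always selected before any idle dynamic node, which is the behavior the default weight is meant to produce.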

BUG FIXES

  • Add validation of the ScaledownIdletime value to prevent setting a value lower than -1.
  • Fix issue causing dangling IAM policies to be created when creating ParallelCluster CloudFormation custom resource provider with CustomLambdaRole.
  • Fix an issue that caused misalignment of compute node DNS names on instances with multiple network interfaces when SlurmSettings/Dns/UseEc2Hostnames is set to true.
  • Fix cluster creation failure with Ubuntu Deep Learning AMI on GPU instances and DCV enabled.
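
The ScaledownIdletime rule listed above (no value lower than -1) could be checked along the following lines. This is a minimal sketch, not the ParallelCluster validator: the function name `validate_scaledown_idletime` and its error messages are hypothetical, and the only rule taken from the release notes is the lower bound of -1.

```python
def validate_scaledown_idletime(value):
    """Return `value` unchanged if it is an integer >= -1, otherwise raise.

    Hypothetical helper: only the lower bound of -1 comes from the release notes.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("ScaledownIdletime must be an integer")
    if value < -1:
        raise ValueError("ScaledownIdletime must not be lower than -1")
    return value


for candidate in (10, -1, -2):
    try:
        validate_scaledown_idletime(candidate)
        print(candidate, "accepted")
    except ValueError as err:
        print(candidate, "rejected:", err)
```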