himani2411 (Contributor) commented Sep 19, 2025

Description of changes

[GB200] Support IMEX configuration local to each node

  • We remove the creation of the /opt/parallelcluster/shared/nvidia-imex directory.
  • We keep the default paths /etc/nvidia-imex/nodes_config.cfg and /etc/nvidia-imex/config.cfg for the IMEX configuration.
  • We write /etc/nvidia-imex/nodes_config.cfg only if it is missing, to avoid IMEX start failures.
  • We update the unit tests.
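A minimal plain-Ruby sketch of the "write only if missing" rule from the third bullet. It is not the cookbook's code: in a Chef recipe the same behaviour comes from the template resource's `:create_if_missing` action. The helper `write_if_missing` and the file contents are hypothetical, and the sketch works in a temporary directory instead of /etc/nvidia-imex.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: write `content` to `path` only when the file does not exist yet.
# Returns true if the file was written, false if an existing file was left untouched.
def write_if_missing(path, content)
  return false if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, content)
  true
end

Dir.mktmpdir do |dir|
  # Stand-in for /etc/nvidia-imex/nodes_config.cfg, placed in a temp dir for the sketch.
  nodes_config = File.join(dir, "nvidia-imex", "nodes_config.cfg")

  # First run: the file is missing, so a placeholder is written.
  puts write_if_missing(nodes_config, "# placeholder node list\n") # prints "true"

  # Simulate an operator filling in the node list after installation (illustrative IPs).
  File.write(nodes_config, "10.0.0.1\n10.0.0.2\n")

  # Second run: the file exists, so the operator's content is preserved.
  puts write_if_missing(nodes_config, "# placeholder node list\n") # prints "false"
  puts File.read(nodes_config)
end
```

Because an existing file is never touched, a node list customised after the AMI build survives later cookbook runs, and IMEX always finds a file at the expected path.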

Tests

  • [Success] DLAMI AMI build - AL2023
  • [Success] Vanilla AMI build - Rocky 9

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (i.e. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

end

action :create_configuration_files do
# We create or update IMEX configuration files if ParallelCluster is installing IMEX
Contributor

[minor] This comment is correct, but it would make more sense to put it where create_configuration_files is called.

Contributor Author

It's "create or update" because, per the Chef template resource documentation (https://docs.chef.io/resources/template/), the :create action is:

(default) Create a file. If a file already exists (but does not match), update that file to match.
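To make the quoted `:create` semantics concrete, here is a plain-Ruby sketch of "create or update": write the file when it is missing or its content differs, and do nothing otherwise. The helper `create_or_update` and the `A=1` contents are hypothetical, and this is not Chef's implementation (Chef also manages owner, mode, and backups, which the sketch ignores).

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mimicking the documented :create behaviour of Chef's template
# resource: create the file if missing, update it if its content differs, otherwise
# leave it alone. Returns :created, :updated, or :unchanged.
def create_or_update(path, content)
  unless File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, content)
    return :created
  end
  return :unchanged if File.read(path) == content

  File.write(path, content)
  :updated
end

Dir.mktmpdir do |dir|
  config = File.join(dir, "config.cfg") # stand-in for /etc/nvidia-imex/config.cfg
  p create_or_update(config, "A=1\n") # prints :created
  p create_or_update(config, "A=1\n") # prints :unchanged
  p create_or_update(config, "A=2\n") # prints :updated
end
```

With `:create_if_missing` instead, the third call would leave the file unchanged, which is the behaviour the PR uses for nodes_config.cfg.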

node_attributes 'dump node attributes'
end

action :create_configuration_files do
Contributor

[minor] What about naming the action install_configuration_files? I think it would be good practice to name actions that are part of the install phase install_SOMETHING, and actions called in the configuration phase configure_SOMETHING.

Contributor Author

I prefer keeping "create" in the action name, since the underlying template action is :create. Even though we call it in the install phase of our recipes, what it does is create these configuration files.

himani2411 merged commit 2b959bc into aws:release-3.14 on Sep 19, 2025
28 of 30 checks passed
himani2411 added a commit to himani2411/aws-parallelcluster-cookbook that referenced this pull request Oct 2, 2025

Co-authored-by: Himani Anil Deshpande <himanidp@amazon.com>
himani2411 added a commit that referenced this pull request Oct 2, 2025
2 participants