-
Notifications
You must be signed in to change notification settings - Fork 107
[Gb200] Support IMEX configuration to be local to a node #3029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
end | ||
|
||
action :create_configuration_files do | ||
# We create or update IMEX configuration files if ParallelCluster is installing IMEX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] this comment is correct, but it makes more sense to write it where create_configuration_files
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its Create or update because as per chef template the action create is https://docs.chef.io/resources/template/
(default) Create a file. If a file already exists (but does not match), update that file to match.
node_attributes 'dump node attributes' | ||
end | ||
|
||
action :create_configuration_files do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] What about calling any the action install_configuration_files? I think could be a good practice to call actions that are part of the install phase as install_SOMETHING and actions called in configuration phase configure_SOMETHING.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer keeping the action name as creation
as the action of the template is create. Even though we use it in Install phase of our recipes, we are creating these configuration files.
1cf4cc4
to
bda67af
Compare
* we remove /opt/parallelcluster/shared/nvidia-imex directory creation * We keep default path of `/etc/nvidia-imex/nodes_config.cfg` and `/etc/nvidia-imex/config.cfg` for IMEX configuration * We override `/etc/nvidia-imex/nodes_config.cfg` only if it is missing to avoid Imex start failures. * Update unit test
bda67af
to
831fdae
Compare
* we remove /opt/parallelcluster/shared/nvidia-imex directory creation * We keep default path of `/etc/nvidia-imex/nodes_config.cfg` and `/etc/nvidia-imex/config.cfg` for IMEX configuration * We override `/etc/nvidia-imex/nodes_config.cfg` only if it is missing to avoid Imex start failures. * Update unit test Co-authored-by: Himani Anil Deshpande <himanidp@amazon.com>
* we remove /opt/parallelcluster/shared/nvidia-imex directory creation * We keep default path of `/etc/nvidia-imex/nodes_config.cfg` and `/etc/nvidia-imex/config.cfg` for IMEX configuration * We override `/etc/nvidia-imex/nodes_config.cfg` only if it is missing to avoid Imex start failures. * Update unit test Co-authored-by: Himani Anil Deshpande <himanidp@amazon.com>
Description of changes
[Gb200] Support IMEX configuration to be local to a node
/etc/nvidia-imex/nodes_config.cfg
and/etc/nvidia-imex/config.cfg
for IMEX configuration/etc/nvidia-imex/nodes_config.cfg
only if it is missing to avoid Imex start failures.Tests
sbatch --segment=2 -N 2 /opt/parallelcluster/shared/nvidia-imex-status.job
https://github.com/aws/aws-parallelcluster/pull/7013/filessbatch --segment=2 -N 2 /opt/parallelcluster/shared/nvidia-imex-status.job
https://github.com/aws/aws-parallelcluster/pull/7013/filesReferences
Checklist
develop
add the branch name as prefix in the PR title (e.g.[release-3.6]
).Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.