
Conversation

Contributor

@rexcsn rexcsn commented Feb 23, 2021

  • Add gpu_type to the internal config to specify the instance GPU model as the GRES GPU Type in gres.conf (see the illustrative gres.conf entry below)
  • Update the integration test to submit a job requesting a specific GPU model
  • Add a test checking Slurm health check behavior for CLOUD nodes. Previously Slurm would not perform a node health check before ResumeTimeout expired; this is fixed in 20.11.4
  • Modify the status check test to avoid the Slurm health check and use static capacity
  • Add an additional test to make sure nodeaddr and nodehostname are reset on power_down. This verifies the behavior of relying on the cloud_reg_addrs option
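
For context, the GRES Type ends up in gres.conf roughly like the entry below. This is only an illustrative sketch (the node name, count, and device file are hypothetical), not a line taken from this PR:

# gres.conf sketch: expose one NVIDIA V100 GPU with an explicit Type
NodeName=queue1-dy-p32xlarge-1 Name=gpu Type=v100 File=/dev/nvidia0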

Signed-off-by: Rex <shuningc@amazon.com>

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


codecov bot commented Feb 23, 2021

Codecov Report

Merging #2459 (e69cda7) into develop (0fdcbc4) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop    #2459      +/-   ##
===========================================
+ Coverage    62.07%   62.09%   +0.02%     
===========================================
  Files           40       40              
  Lines         6220     6224       +4     
===========================================
+ Hits          3861     3865       +4     
  Misses        2359     2359              
Impacted Files                                  Coverage Δ
cli/src/pcluster/config/mappings.py             100.00% <ø> (ø)
cli/src/pcluster/config/json_param_types.py     99.36% <100.00%> (+<0.01%) ⬆️
cli/src/pcluster/utils.py                       67.79% <100.00%> (+0.15%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fdcbc4...e69cda7.

@rexcsn rexcsn force-pushed the slurm/gpu branch 2 times, most recently from b31ae8d to 354162a on February 23, 2021 20:10
@rexcsn rexcsn changed the title Specify instance GPU model as GRES GPU Type in gres.conf to Upgrade to Slurm 20.11.4 on Feb 23, 2021
@rexcsn rexcsn force-pushed the slurm/gpu branch 2 times, most recently from b377731 to 55143f1 on February 24, 2021 00:29
@rexcsn rexcsn marked this pull request as ready for review February 24, 2021 01:56
CHANGELOG.md Outdated
- Make `key_name` parameter optional to support cluster configurations without a key pair.
- Remove support for Python 3.4
- Root volume size increased from 25GB to 35GB on all AMIs. Minimum root volume size is now 35GB.
- Upgrade Slurm to version 20.11.4.
Contributor

nit: can you move this up in the list of changes?

Contributor

Also, I would only report items #2 and #4 from the list below in this changelog; the other changes are not really meaningful for the user.

Contributor Author

Done

# Set gpus and gpu_type according to instance features
gpus = instance_type_info.gpu_count()
compute_resource_section.get_param("gpus").value = gpus
compute_resource_section.get_param("gpu_type").value = instance_type_info.gpu_type()
Contributor

What if we just skip adding the gpu_type entry if there is no GPU, rather than using no_gpu_type?

Contributor Author

The parameter will still be there in the JSON because it is defined in mappings.py. Are you saying we should use an empty string "" instead of "no_gpu_type" as the default?

Contributor

Either an empty "" or None, so that it corresponds to a falsy value in Python. But feel free to leave it as is if you prefer.
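
For illustration only (not code from this PR), a falsy default makes it easy for the gres.conf rendering to drop the Type field with a plain truthiness check; the function and parameter names below are hypothetical:

# Hypothetical sketch: emit a gres.conf line, omitting Type when gpu_type is falsy.
def build_gres_line(node_name, gpu_count, gpu_type=None):
    if gpu_count == 0:
        return None  # no GRES entry at all for instances without GPUs
    line = f"NodeName={node_name} Name=gpu Count={gpu_count}"
    if gpu_type:  # "" or None means the GPU model is unknown, so skip Type
        line += f" Type={gpu_type}"
    return line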

_wait_for_node_reset(scheduler_commands, static_nodes=static_nodes, dynamic_nodes=[])
assert_num_instances_in_cluster(cluster_name, region, len(static_nodes))
# Reset SlurmdTimeout to 180s
_set_slurmd_timeout(remote_command_executor, timeout=180)
Contributor

If we change this value in the cookbook, the test will be misaligned. Can we restore the previous value without hardcoding it?

Contributor Author

I think that hardcoding the value is clearer. Since we are changing the Slurm configuration across test cases, the test could fail if the value is not reset correctly, or is set to some value we are not aware of. The hardcoded value will make updating the value in the cookbook a bit more complicated, but IMO it might help us spot issues with our tests more easily.
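
As a point of comparison, a non-hardcoded variant could read the current timeout before modifying it and restore that saved value afterwards. This is only a sketch, assuming remote_command_executor.run_remote_command returns a result object with stdout as it does elsewhere in the integration tests:

# Hypothetical sketch: capture SlurmdTimeout so it can be restored later.
def _get_slurmd_timeout(remote_command_executor):
    output = remote_command_executor.run_remote_command(
        "scontrol show config | grep SlurmdTimeout"
    ).stdout
    # Example output line: "SlurmdTimeout           = 180 sec"
    return int(output.split("=")[1].split()[0])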

# Can take up to 15 mins for status check to show


def _test_status_check_replacement(
Contributor

nit: test_ec2_status_check_replacement?

Contributor Author

Done

_wait_for_node_reset(scheduler_commands, static_nodes=[], dynamic_nodes=dynamic_nodes)
# Assert static nodes are reset
_wait_for_node_reset(scheduler_commands, static_nodes=static_nodes, dynamic_nodes=[])
assert_num_instances_in_cluster(cluster_name, region, len(static_nodes))
Contributor

Do we need to check the logs to make sure the replacement happened due to failing health checks?

Contributor Author

Yes, above we check that `Setting nodes failing health check type ec2_health_check to DRAIN` is in the clustermgtd log.
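
For illustration, such a log assertion could look like the sketch below; the log path and helper usage are assumptions for this example, not lines quoted from the PR:

# Hypothetical sketch: assert clustermgtd logged the health-check DRAIN action.
from assertpy import assert_that

clustermgtd_log = remote_command_executor.run_remote_command(
    "sudo cat /var/log/parallelcluster/clustermgtd"
).stdout
assert_that(clustermgtd_log).contains(
    "Setting nodes failing health check type ec2_health_check to DRAIN"
)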

@rexcsn rexcsn force-pushed the slurm/gpu branch 4 times, most recently from 2862f2f to 366300b on February 24, 2021 19:54