Skip to content

Conversation

@hanwen-pcluste
Copy link
Contributor

@hanwen-pcluste hanwen-pcluste commented Aug 25, 2022

Description of changes

Add ODCR configuration under queue and compute resource levels: See commits descriptions for details

To Do
In a separate PR, we will create a dedicated integration test to verify target ODCR

Tests

A unit test is created for the new validator. Integration test (test_efa) is changed to use ODCR from configuration file. Manually verified the run_instances override has higher priority than the parameters in cluster config

Checklist

  • Make sure you are pointing to the right branch and add a label in the PR title (i.e. 2.x vs 3.x)
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@hanwen-pcluste hanwen-pcluste requested a review from a team as a code owner August 25, 2022 20:32
@hanwen-pcluste hanwen-pcluste force-pushed the release-3.2origin branch 2 times, most recently from 06633b1 to bbd702b Compare August 25, 2022 21:06
@codecov
Copy link

codecov bot commented Aug 25, 2022

Codecov Report

Merging #4295 (d4888ee) into develop (62c8286) will increase coverage by 0.07%.
The diff coverage is 96.74%.

@@             Coverage Diff             @@
##           develop    #4295      +/-   ##
===========================================
+ Coverage    88.29%   88.37%   +0.07%     
===========================================
  Files          158      159       +1     
  Lines        13349    13472     +123     
===========================================
+ Hits         11787    11906     +119     
- Misses        1562     1566       +4     
Impacted Files Coverage Δ
cli/src/pcluster/templates/cdk_builder_utils.py 87.08% <71.42%> (-0.55%) ⬇️
cli/src/pcluster/aws/aws_api.py 91.26% <85.71%> (-0.33%) ⬇️
cli/src/pcluster/schemas/cluster_schema.py 98.78% <91.66%> (-0.11%) ⬇️
cli/src/pcluster/aws/ec2.py 79.36% <100.00%> (+1.39%) ⬆️
cli/src/pcluster/aws/resource_groups.py 100.00% <100.00%> (ø)
cli/src/pcluster/config/cluster_config.py 95.33% <100.00%> (+0.17%) ⬆️
cli/src/pcluster/templates/cluster_stack.py 97.36% <100.00%> (+0.02%) ⬆️
cli/src/pcluster/validators/ec2_validators.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

instance_type=compute_resource.instance_type,
instance_type_data=instance_types_data[compute_resource.instance_type],
)
# The validation below has to be in cluster config class instead of queue class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can line 2382 and these two validators to be added under CommonSchedulerClusterConfig? The reason is that from the schema, it seems that it also support for scheduler plugin schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which validators? Sorry, I changed the line number with some rebase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CapacityReservationValidator and CapacityReservationResourceGroupValidator.
I think the field capacity_reservation_target is both for slurm and scheduler plugin in the schema.
Why are the validators added for only class SlurmClusterConfig instaed of its parent class?
class SlurmClusterConfig(CommonSchedulerClusterConfig)

Copy link
Contributor Author

@hanwen-pcluste hanwen-pcluste Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with chenwany, ODCR should be supported by both Slurm and Scheduler Plugin. I made relevant changes. Thank you!

@hanwen-pcluste hanwen-pcluste changed the base branch from release-3.2 to develop September 7, 2022 15:01
@hanwen-pcluste hanwen-pcluste requested a review from a team as a code owner September 7, 2022 15:01
resources = self._client.list_group_resources(Group=group)["Resources"]
for resource in resources:
if resource["Identifier"]["ResourceType"] == "AWS::EC2::CapacityReservation":
capacity_reservation_ids.append(re.match("(.*)(cr-.*)", resource["Identifier"]["ResourceArn"]).group(2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to match exactly so there is only one result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be good to clarify what the matches are using the name syntax ?P<match_name>

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to match exactly so there is only one result?

I made the regex stricter, but not sure if there is a exact match

The second comment is addressed

@hanwen-pcluste hanwen-pcluste force-pushed the release-3.2origin branch 3 times, most recently from 24897bd to dd4a866 Compare September 12, 2022 20:53
@hanwen-pcluste hanwen-pcluste changed the title [3.2] Add Support for On-Demand Capacity Reservations (ODCRs) in Cluster Configuration File Add Support for On-Demand Capacity Reservations (ODCRs) in Cluster Configuration File Sep 12, 2022
@hanwen-pcluste hanwen-pcluste force-pushed the release-3.2origin branch 3 times, most recently from e8a8225 to f148263 Compare September 13, 2022 15:40
conditional_template_properties.update({"instance_type": compute_resource.instance_type})

capacity_reservation_specification = None
if isinstance(queue, (SlurmQueue, SchedulerPluginQueue)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check f isinstance(queue, (SlurmQueue, SchedulerPluginQueue)) necessary?
def _add_compute_resource_launch_template function is under class ComputeFleetConstruct, ComputeFleetConstruct is not for awsbatch scheduler.
See code:

        if not self._condition_is_batch():
            self.compute_fleet_resources = ComputeFleetConstruct(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

]
)

if self._config.scheduling.scheduler == "slurm":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self._config.scheduling.scheduler != "awsbatch"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

1. The section can appear both on queue level and compute resource level. The section on compute resource level takes precedence over the queue level.

2. The capacity reservation specification is passed into the launch templates used by compute nodes. This approach is forward compatible if ParallelCluster uses EC2 fleet

Signed-off-by: Hanwen <hanwenli@amazon.com>
This commit also adds a validation to prevent the two parameters coexist in ODCR section

Signed-off-by: Hanwen <hanwenli@amazon.com>
…availability zone

Signed-off-by: Hanwen <hanwenli@amazon.com>
This policy is required to run validators for `CapacityReservationResourceGroupArn`.

The policy of describing capacity reservation is added in this commit. Because it is covered by existing policy: `ec2:Describe*`

The same policy is added to integration tests

Signed-off-by: Hanwen <hanwenli@amazon.com>
@hanwen-pcluste hanwen-pcluste merged commit 08a86fb into aws:develop Sep 14, 2022
hanwen-pcluste pushed a commit to hanwen-pcluste/aws-parallelcluster that referenced this pull request Sep 22, 2022
test_scheduler_plugin checks cluster configuration file to be the same as expected. We added ODCR section in aws#4295 and Placement Group section in Networking aws#4330 in cluster configuration file. Therefore, the expected cluster configuration file should be changed accordingly

Signed-off-by: Hanwen <hanwenli@amazon.com>
hanwen-pcluste pushed a commit to hanwen-pcluste/aws-parallelcluster that referenced this pull request Sep 22, 2022
test_scheduler_plugin checks cluster configuration file to be the same as expected. We added ODCR section in aws#4295 and Placement Group section in Compute Resource aws#4330 in cluster configuration file. Therefore, the expected cluster configuration file should be changed accordingly

Signed-off-by: Hanwen <hanwenli@amazon.com>
lukeseawalker pushed a commit that referenced this pull request Sep 23, 2022
test_scheduler_plugin checks cluster configuration file to be the same as expected. We added ODCR section in #4295 and Placement Group section in Compute Resource #4330 in cluster configuration file. Therefore, the expected cluster configuration file should be changed accordingly

Signed-off-by: Hanwen <hanwenli@amazon.com>
hanwen-pcluste pushed a commit to hanwen-pcluste/aws-parallelcluster that referenced this pull request Sep 26, 2022
…fied on compute resource or queue

This was a bug from aws#4295. The validation was only executed if capacity reservation target is specified on compute resource

Signed-off-by: Hanwen <hanwenli@amazon.com>
hanwen-pcluste pushed a commit that referenced this pull request Sep 26, 2022
…fied on compute resource or queue

This was a bug from #4295. The validation was only executed if capacity reservation target is specified on compute resource

Signed-off-by: Hanwen <hanwenli@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants