Skip to content

[Validators] Relax validator PlacementGroupCapacityReservationValidator by accepting PG-ODCR when PG is omitted/disabled in cluster config.#7252

Merged
gmarciani merged 2 commits intoaws:developfrom
gmarciani:wip/mgiacomo/3150/pgodcr-0226-1
Mar 2, 2026
Merged

[Validators] Relax validator PlacementGroupCapacityReservationValidator by accepting PG-ODCR when PG is omitted/disabled in cluster config.#7252
gmarciani merged 2 commits intoaws:developfrom
gmarciani:wip/mgiacomo/3150/pgodcr-0226-1

Conversation

@gmarciani
Copy link
Contributor

@gmarciani gmarciani commented Feb 27, 2026

Description of changes

Relax validator PlacementGroupCapacityReservationValidator by accepting PG-ODCR when PG is omitted/disabled in cluster config.

This relaxation is safe because, when PG in cluster config is omitted/disabled, the ODCR decides the placement using its own PG. See doc: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cr-cpg.html

This relaxation allows users to consume PG-ODCR having cross-account PG.

UX

Given a cluster config with a PG-ODCR and omitted/disabled PG, the CLI now succeeds with more helpful warning:

{
  "validationMessages": [
    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue 'q2'. You can ignore this warning if the compute resources in the queue use a capacity reservation that provides its own placement group."
    },
    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue 'q3'. You can ignore this warning if the compute resources in the queue use a capacity reservation that provides its own placement group."
    }
  ],
  "message": "Request would have succeeded, but DryRun flag is set."
}

while it used to fail with less helpful messages:

{
  "configurationValidationErrors": [
    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue."
    },
    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue."
    },
    {
      "level": "ERROR",
      "type": "PlacementGroupCapacityReservationValidator",
      "message": "There are no open or targeted ODCRs that match the instance_type 'c5n.9xlarge' in 'us-east-1b' and no placement group provided. Please either provide a placement group or add an ODCR that does not target a placement group and targets the instance type."
    },
    {
      "level": "ERROR",
      "type": "PlacementGroupCapacityReservationValidator",
      "message": "There are no open or targeted ODCRs that match the instance_type 'c5n.9xlarge' in 'us-east-1b' and no placement group provided. Please either provide a placement group or add an ODCR that does not target a placement group and targets the instance type."
    }
  ],
  "message": "Invalid cluster configuration."
}

Now:

    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue 'q2'. You can ignore this warning if the compute resources in the queue use a capacity reservation that provides its own placement group."
    },
    {
      "level": "WARNING",
      "type": "EfaPlacementGroupValidator",
      "message": "You may see better performance using a placement group for the queue 'q3'. You can ignore this warning if the compute resources in the queue use a capacity reservation that provides its own placement group."
    },

Changes

Case PG/Enabled ODCR has PG? PG is cross-account? Current Proposed Notes
2 omitted Yes No ❌ ERROR ✅ Pass Validation skipped
3 omitted Yes Yes ❌ ERROR ✅ Pass Validation skipped
5 false Yes No ❌ ERROR ✅ Pass Validation skipped
6 false Yes Yes ❌ ERROR ✅ Pass Validation skipped

Tests

  • Unit Tests
  • Cluster creation consuming cross-account PG ODCR. Verified that capacity can launch.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/pgodcr-0226-1 branch from f9bcc53 to 779cbe3 Compare February 27, 2026 22:18
@gmarciani gmarciani changed the title [Test] Add unit test to cover the behavior of PlacementGroupCapacityR… [Validators] Relax validator PlacementGroupCapacityReservationValidator by accepting PG-ODCR when PG is omitted/disabled in cluster config. Feb 27, 2026
@gmarciani gmarciani added the 3.x label Feb 27, 2026
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/pgodcr-0226-1 branch from 779cbe3 to 5f4e320 Compare February 27, 2026 22:21
@gmarciani gmarciani marked this pull request as ready for review February 27, 2026 22:22
@gmarciani gmarciani requested review from a team as code owners February 27, 2026 22:22
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/pgodcr-0226-1 branch 3 times, most recently from 5a01896 to 1e7fe92 Compare March 2, 2026 18:31
subnet=queue.networking.subnet_ids[0],
instance_types=compute_resource.instance_types,
multi_az_enabled=queue.multi_az_enabled,
subnet_id_az_mapping=queue.networking.subnet_id_az_mapping,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why deleting this parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was injected only to provide the AZ in the error message:

There are no open or targeted ODCRs that match the instance_type 'c5n.9xlarge' in 'us-east-1b' and no placement group provided. Please either provide a placement group or add an ODCR that does not target a placement group and targets the instance type.

Not required anymore because we are removing that validation logic as part of the relaxation.

hanwen-cluster
hanwen-cluster previously approved these changes Mar 2, 2026
…or by accepting PG-ODCR when PG is omitted/disabled in cluster config.

This relaxation is safe because, when PG in cluster config is omitted/disabled, the ODCR decides the placement using its own PG.

This relaxation allows users to consume PG-ODCR having cross-account PG.
…ator` to better guide the user in addressing the failure.

In particular:

 1. we now specify the queue that contains the problem.

 2. we now specify that you can set PlacementGroup/Enabled=false even if you are using a PG-ODCR.
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/pgodcr-0226-1 branch from 3a8655a to 7dae62a Compare March 2, 2026 22:06
@gmarciani gmarciani merged commit 40d7e0d into aws:develop Mar 2, 2026
24 checks passed
@gmarciani gmarciani deleted the wip/mgiacomo/3150/pgodcr-0226-1 branch March 2, 2026 22:41
@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.06%. Comparing base (d4d705d) to head (7dae62a).
⚠️ Report is 11 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7252   +/-   ##
========================================
  Coverage    90.06%   90.06%           
========================================
  Files          182      182           
  Lines        16646    16620   -26     
========================================
- Hits         14992    14969   -23     
+ Misses        1654     1651    -3     
Flag Coverage Δ
unittests 90.06% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants