feat(ecs): nvidia support to BottlerocketEcsVariant enum for gpu-accelerated tasks #28488

Merged
merged 11 commits into aws:main on Dec 27, 2023

Conversation

badmintoncryer
Contributor

This pull request introduces a new variant, AWS_ECS_1_NVIDIA, to the BottlerocketEcsVariant enum. This addition caters to the increasing demand for GPU-accelerated computing in containerized environments, particularly for tasks that require intensive computing power, such as machine learning and 3D rendering.

Closes #25980


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions bot added the labels effort/small (Small work item – less than a day of effort), feature-request (A feature should be added or improved.), and p2 on Dec 25, 2023
@aws-cdk-automation requested a review from a team on December 25, 2023 13:31
@github-actions bot added the valued-contributor label ([Pilot] contributed between 6-12 PRs to the CDK) on Dec 25, 2023
Collaborator

@aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation dismissed their stale review on December 25, 2023 14:13

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@badmintoncryer marked this pull request as ready for review on December 25, 2023 16:28
@aws-cdk-automation added the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) on Dec 26, 2023
@badmintoncryer changed the title from "feat(ecs): NVIDIA Support to Bottlerocket ECS Variants" to "feat(ecs): nvidia support to BottlerocketEcsVariant enum for gpu-accelerated tasks" on Dec 26, 2023
Contributor

@lpizzinidev left a comment

Thanks for the contribution 👍
I left some comments and suggestions.

packages/aws-cdk-lib/aws-ecs/lib/amis.ts (resolved)

});
cluster.addCapacity('bottlerocket-asg', {
  minCapacity: 2,
  instanceType: new ec2.InstanceType('c5.large'),
Contributor

Don't Nvidia-based images require a GPU-based instance type?

Contributor Author

Certainly! I'll try it later.

Contributor

I was asking because from the Console you can deploy only with certain GPU-based instance types and it seemed strange that c5.large would deploy. Not sure if it's possible/worth validating this.

Contributor Author

As you mentioned, this behavior does seem odd. However, since the integration test itself was successful, does this mean that deployment is possible via CloudFormation? Just to be sure, I have switched to integration testing with instances that have GPUs. Please let me know if further verification is needed.

Contributor

I guess that CloudFormation correctly deploys the ASG but the instances will fail to launch if the instance type is incompatible with the provided AMI.
We may want to leave it to the user to provide a valid configuration (unless maintainers think otherwise).
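
The point above (the ASG deploys, but instances may fail to launch when the instance type does not match the AMI) could be caught earlier by a user-side check. The following is a minimal sketch only. It is not part of aws-cdk-lib and is not something this PR adds. The names `checkVariantInstancePairing`, `instanceFamily`, and `GPU_FAMILIES` are hypothetical, and the family list is an illustrative, non-exhaustive assumption. The variant strings mirror the Bottlerocket ECS variant names.

```typescript
// Hypothetical user-side check; NOT part of aws-cdk-lib or of this PR.

// Bottlerocket ECS variant names; 'aws-ecs-1-nvidia' is the one this PR adds.
type BottlerocketVariant = 'aws-ecs-1' | 'aws-ecs-1-nvidia';

// Assumption: illustrative, non-exhaustive list of NVIDIA GPU instance families.
const GPU_FAMILIES: string[] = ['p2', 'p3', 'p4d', 'g3', 'g4dn', 'g5'];

// Extracts the family prefix, e.g. 'p3.2xlarge' -> 'p3'.
function instanceFamily(instanceType: string): string {
  const dot = instanceType.indexOf('.');
  const family = dot === -1 ? instanceType : instanceType.slice(0, dot);
  return family.toLowerCase();
}

// Returns a warning message when an NVIDIA variant is paired with an instance
// type outside GPU_FAMILIES, or undefined when the pairing looks plausible.
function checkVariantInstancePairing(
  variant: BottlerocketVariant,
  instanceType: string
): string | undefined {
  if (!/-nvidia$/.test(variant)) {
    return undefined; // non-GPU variants are not checked here
  }
  if (GPU_FAMILIES.indexOf(instanceFamily(instanceType)) !== -1) {
    return undefined; // instance family is in the assumed GPU list
  }
  return `Variant '${variant}' expects a GPU instance type, got '${instanceType}'`;
}

// The c5.large pairing questioned in the review is flagged; a p3 instance
// (chosen here purely as an example GPU type) is not.
console.log(checkVariantInstancePairing('aws-ecs-1-nvidia', 'c5.large'));
console.log(checkVariantInstancePairing('aws-ecs-1-nvidia', 'p3.2xlarge'));
```

In a CDK app, a check like this could run before calling `cluster.addCapacity(...)`. Whether to throw or only warn stays with the user, in line with the suggestion above to leave valid configuration to them.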

packages/aws-cdk-lib/aws-ecs/test/amis.test.ts (outdated, resolved)
@aws-cdk-automation removed the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) on Dec 26, 2023
Collaborator

@aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation dismissed their stale review on December 27, 2023 15:57

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@badmintoncryer
Contributor Author

@lpizzinidev Thank you for your review.
I have incorporated the feedback you provided. I haven't delved deeply into the issue of being able to deploy to instances without GPUs. Please let me know if anything additional is required.

@aws-cdk-automation added the pr/needs-maintainer-review label (This PR needs a review from a Core Team Member) on Dec 27, 2023
Contributor

mergify bot commented Dec 27, 2023

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: a03a2d7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify bot merged commit 832e29a into aws:main on Dec 27, 2023
9 checks passed

paulhcsun pushed a commit to paulhcsun/aws-cdk that referenced this pull request Jan 5, 2024
…lerated tasks (aws#28488)
@badmintoncryer deleted the 25980-add1Nvidia branch on January 31, 2024 14:52
Labels
  • effort/small: Small work item – less than a day of effort
  • feature-request: A feature should be added or improved.
  • p2
  • pr/needs-maintainer-review: This PR needs a review from a Core Team Member
  • valued-contributor: [Pilot] contributed between 6-12 PRs to the CDK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(ecs): Add aws-ecs-1-nvidia to BottlerocketEcsVariant
4 participants