Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ecs): add support for elastic inference accelerators in ECS task defintions #13950

Merged

Conversation

@upparekh
Copy link
Contributor

@upparekh upparekh commented Apr 1, 2021

This PR would enable users to attach inference accelerators to their ECS tasks (currently not supported by Fargate). It adds an optional inferenceAccelerators property to the EC2 Task Definition. It also adds resourceRequirement property to the Container definition to enable containers to refer to the inference accelerators provided in the task definition.

Closes #12460

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

@gitpod-io gitpod-io bot commented Apr 1, 2021

@upparekh upparekh requested a review from SoManyHs Apr 1, 2021
@upparekh upparekh self-assigned this Apr 2, 2021
@SoManyHs SoManyHs added this to Needs review in ECS and ECS Patterns Issues via automation Apr 2, 2021
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/README.md Outdated Show resolved Hide resolved
readonly deviceName?: string;

/**
* The Elastic Inference accelerator type to use.

This comment has been minimized.

@SoManyHs

SoManyHs Apr 2, 2021
Contributor

It might be helpful to include what values are valid here in the docstring (i.e., eia2.medium, eia2.large, and eia2.xlarge, according to the docs).
Normally, having the different valid values as an enum might be nice so that an IDE can pick up the valid values, but the downside is that we'd have to manually add any new types that get added in the future at the risk of always being behind what's available, so I think a string is fine for now.

* The type and amount of a resource to assign to a container.
* @default - No resources assigned.
*/
readonly resourceRequirements?: ResourceRequirement[];

This comment has been minimized.

@SoManyHs

SoManyHs Apr 2, 2021
Contributor

I'm wondering if we should have the new field be something that is more specific to EIA, since resourceRequirements in CFN technically encompasses both GPU and EIA. GPU currently is specified through its own field, and it may be confusing for this field to exist when we have GPU specified elsewhere.

This comment has been minimized.

@upparekh

upparekh Apr 2, 2021
Author Contributor

I agree, it might get confusing this way. I will make the required changes.

This comment has been minimized.

@upparekh

upparekh Apr 6, 2021
Author Contributor

I renamed the resourceRequirements prop to inferenceAcceleratorResources (as inferenceAccelerators was already a task definiton prop) and changed the type to be a list of device names (strings).

piradeepk and others added 2 commits Apr 2, 2021
Updates to documentation and allocation of props in tests

Co-authored-by: Hsing-Hui Hsu <hsinghui@amazon.com>
@upparekh upparekh requested a review from iamhopaul123 Apr 2, 2021
Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

Very awesome PR (even without considering it is the first PR that you have) 🎉

upparekh and others added 2 commits Apr 5, 2021
Updated docs and sanity check condition

Co-authored-by: Penghao He <penghaoh@amazon.com>
@upparekh upparekh requested review from SoManyHs and iamhopaul123 Apr 6, 2021
Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

Looks good just some nits. Just a reminder but make sure every error you throw is covered in the unit test!

packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
packages/@aws-cdk/aws-ecs/lib/container-definition.ts Outdated Show resolved Hide resolved
function renderResourceRequirements(gpuCount: number = 0, inferenceAcceleratorResources: string[] = []):
CfnTaskDefinition.ResourceRequirementProperty[] | undefined {
if (inferenceAcceleratorResources.length > 0 && gpuCount > 0) {
throw new Error('Both inference accelerator and gpu count defined in the container definition.');

This comment has been minimized.

@iamhopaul123

iamhopaul123 Apr 8, 2021
Contributor

Suggested change
throw new Error('Both inference accelerator and gpu count defined in the container definition.');
throw new Error('Cannot define both inference accelerator and gpu count in the container definition.');

We might need to be more specific for the reason why this condition fails https://uxdworld.com/2018/05/30/how-to-write-good-error-messages/
we also need unit test for this.

This comment has been minimized.

@upparekh

upparekh Apr 14, 2021
Author Contributor

Discussed about this offline but posting here for the record, we decided to render both gpuCount and inferenceAcceleratorResources properties and let the validators in other stages handle this check.

Updating error messages

Co-authored-by: Penghao He <penghaoh@amazon.com>
@gitpod-io
Copy link

@gitpod-io gitpod-io bot commented Apr 9, 2021

Copy link
Contributor

@iamhopaul123 iamhopaul123 left a comment

LGTM with just two nits. Feel free to take off the do-not-merge label once those are addressed!

function renderResourceRequirements(gpuCount: number = 0, inferenceAcceleratorResources: string[] = []):
CfnTaskDefinition.ResourceRequirementProperty[] | undefined {
const ret = [];
if (inferenceAcceleratorResources.length > 0) {

This comment has been minimized.

@iamhopaul123

iamhopaul123 Apr 14, 2021
Contributor

Just a nit but this check might not be required?

@gitpod-io
Copy link

@gitpod-io gitpod-io bot commented Apr 14, 2021

@mergify mergify bot dismissed iamhopaul123’s stale review Apr 14, 2021

Pull request has been modified.

@mergify
Copy link
Contributor

@mergify mergify bot commented Apr 14, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot dismissed iamhopaul123’s stale review Apr 14, 2021

Pull request has been modified.

@mergify
Copy link
Contributor

@mergify mergify bot commented Apr 14, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation commented Apr 15, 2021

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 790a118
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

@mergify mergify bot commented Apr 15, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 23986d7 into aws:master Apr 15, 2021
7 checks passed
7 checks passed
@github-actions
auto-approve
Details
@github-actions
validate-pr
Details
AWS CodeBuild us-east-1 (AutoBuildProject89A8053A-LhjRyN9kxr8o) Build succeeded for project AutoBuildProject89A8053A-LhjRyN9kxr8o
Details
@gitpod-io
Gitpod Open an online workspace in Gitpod
Details
@mergify
Rule: automatic merge (merge) The pull request has been merged automatically
Details
@semantic-pull-requests
Semantic Pull Request ready to be squashed
Details
@mergify
Summary 5 potential rules
Details
ECS and ECS Patterns Issues automation moved this from Needs review to Closed Apr 15, 2021
@upparekh upparekh deleted the upparekh:upparekh/support-for-elastic-inference-accelerators branch Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked issues

Successfully merging this pull request may close these issues.

5 participants