(aws-eks): Neuron device plugin is not installed when instance type is Trainium #29131

Closed
freschri opened this issue Feb 16, 2024 · 2 comments · Fixed by #29155
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@freschri
Contributor

Describe the bug

If the instance type is Trainium, the Neuron device plugin is not installed, although it should be.

Expected Behavior

If the instance type is Trainium, the Neuron device plugin is installed.

Current Behavior

If the instance type is Trainium, the Neuron device plugin is NOT installed.

Reproduction Steps

Add node group capacity to a cluster using a Trainium instance type.

Possible Solution

No response

Additional Information/Context

Instance types of the Trainium family have recently been added here: https://github.com/aws/aws-cdk/blame/main/packages/aws-cdk-lib/aws-ec2/lib/instance-types.ts

However, packages/aws-cdk-lib/aws-eks/lib/instance-types.ts does not include them:
export const INSTANCE_TYPES = {
  gpu: ['p2', 'p3', 'g2', 'g3', 'g4'],
  inferentia: ['inf1', 'inf2'],
  graviton: ['a1'],
  graviton2: ['c6g', 'm6g', 'r6g', 't4g'],
  graviton3: ['c7g'],
};

As a result, the check in packages/aws-cdk-lib/aws-eks/lib/cluster.ts never matches a Trainium instance type, and the plugin is not installed:

function nodeTypeForInstanceType(instanceType: ec2.InstanceType) {
  return INSTANCE_TYPES.gpu.includes(instanceType.toString().substring(0, 2)) ? NodeType.GPU :
    INSTANCE_TYPES.inferentia.includes(instanceType.toString().substring(0, 4)) ? NodeType.INFERENTIA :
      NodeType.STANDARD;
}

public addNodegroupCapacity(id: string, options?: NodegroupOptions): Nodegroup {
  const hasInferentiaInstanceType = [
    options?.instanceType,
    ...options?.instanceTypes ?? [],
  ].some(i => i && nodeTypeForInstanceType(i) === NodeType.INFERENTIA);
  if (hasInferentiaInstanceType) {
    this.addNeuronDevicePlugin();
  }
  ...
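
One possible direction for a fix, sketched outside of the CDK so it can be run on its own. This is not the change that was merged in #29155. It assumes a new `trainium` entry in `INSTANCE_TYPES` and a new `NodeType.TRAINIUM` member, both of which are hypothetical names chosen for illustration. Instance types are plain strings here rather than `ec2.InstanceType`, the enum values are illustrative, and `needsNeuronDevicePlugin` is a made-up helper standing in for the check inside `addNodegroupCapacity`. The prefix matching mirrors the snippet above.

```typescript
// Standalone sketch: instance types are plain strings, not ec2.InstanceType.
enum NodeType {
  STANDARD = 'Standard',
  GPU = 'GPU',
  INFERENTIA = 'INFERENTIA',
  TRAINIUM = 'TRAINIUM', // hypothetical new member
}

const INSTANCE_TYPES = {
  gpu: ['p2', 'p3', 'g2', 'g3', 'g4'],
  inferentia: ['inf1', 'inf2'],
  trainium: ['trn1', 'trn1n'], // assumed new entry for the Trainium family
};

// Same prefix-based lookup as the original, extended with a Trainium branch.
// A 4-character prefix of 'trn1n.32xlarge' is 'trn1', so both families match.
function nodeTypeForInstanceType(instanceType: string): NodeType {
  if (INSTANCE_TYPES.gpu.includes(instanceType.substring(0, 2))) {
    return NodeType.GPU;
  }
  if (INSTANCE_TYPES.inferentia.includes(instanceType.substring(0, 4))) {
    return NodeType.INFERENTIA;
  }
  if (INSTANCE_TYPES.trainium.includes(instanceType.substring(0, 4))) {
    return NodeType.TRAINIUM;
  }
  return NodeType.STANDARD;
}

// Hypothetical helper: true when any instance type runs on Neuron hardware
// (Inferentia or Trainium) and would therefore need the device plugin.
function needsNeuronDevicePlugin(instanceTypes: string[]): boolean {
  return instanceTypes.some((t) => {
    const nodeType = nodeTypeForInstanceType(t);
    return nodeType === NodeType.INFERENTIA || nodeType === NodeType.TRAINIUM;
  });
}
```

With a check along these lines, `addNodegroupCapacity` could call `this.addNeuronDevicePlugin()` for both Inferentia and Trainium node groups instead of Inferentia only.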

CDK CLI Version

2.128.0

Framework Version

No response

Node.js Version

v21.6.1

OS

macOS Sonoma 14.3

Language

TypeScript

Language Version

No response

Other information

No response

@freschri freschri added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 16, 2024
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Feb 16, 2024
@pahud
Contributor

pahud commented Feb 16, 2024

Yeah we could add it in the instance types. We welcome any PRs for this.

@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Feb 16, 2024
@mergify mergify bot closed this as completed in #29155 Mar 15, 2024
mergify bot pushed a commit that referenced this issue Mar 15, 2024
@freschri – It's a little hard to find docs on this but I think this is what you're after?

Closes #29131.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
