Additional kubernetes GPU docs #2078

Merged
2 commits merged into bottlerocket-os:develop from k8s-gpu-docs on Apr 25, 2022

arnaldo2792
Contributor

Issue number:
N / A

Description of changes:

QUICKSTART-EKS: add NVIDIA GPUs sample configuration

Now the documentation explicitly says that it is possible to use a GPU
per orchestrated container, and references the official kubernetes
documentation to schedule NVIDIA GPUs.
README: add NVIDIA GPUs section

This adds a new section for NVIDIA GPUs and lists what EC2 instance
types are supported by the official Bottlerocket `nvidia` k8s AMIs.

Testing done:

  • Links work as expected

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

README.md Outdated
@@ -740,6 +741,11 @@ There are a few important caveats about the provided kdump support:
* The system kernel will reserve 256MB for the crash kernel, only when the host has at least 2GB of memory; the reserved space won't be available for processes running in the host
* The crash kernel will only be loaded when the `crashkernel` parameter is present in the kernel's cmdline and if there is memory reserved for it

### NVIDIA GPUs Support
Bottlerocket's `nvidia` kubernetes variants include the required packages and configurations to leverage NVIDIA GPUs.
The official AMIs for these variants can be used with the following EC2 instance types: `p2`, `p3`, `p4`, `g4dn`, `g5` and `g5g`.
Member

Suggestion: Guard against having an incorrect list in the future.

Suggested change
- The official AMIs for these variants can be used with the following EC2 instance types: `p2`, `p3`, `p4`, `g4dn`, `g5` and `g5g`.
+ The official AMIs for these variants can be used with EC2 GPU-equipped instance types such as: `p2`, `p3`, `p4`, `g4dn`, `g5` and `g5g`.

@@ -383,3 +383,20 @@ You can install them in your cluster by following the `helm install` instruction
The [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-nvidia-gpu-operator) can also be used to install these tools.
However, it is cumbersome to select the right subset of features to avoid conflicts with the software included in the variant.
Therefore we recommend installing the tools individually if they are required.

In hosts with multiple GPUs (i.e. EC2 `g4dn` instances) you can assign a GPU per container by specifying the resource in the containers' spec as described in the [official kubernetes documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/):
Member

Suggested change
- In hosts with multiple GPUs (i.e. EC2 `g4dn` instances) you can assign a GPU per container by specifying the resource in the containers' spec as described in the [official kubernetes documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/):
+ In hosts with multiple GPUs (ex. EC2 `g4dn` instances) you can assign a GPU per container by specifying the resource in the containers' spec as described in the [official kubernetes documentation](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/):

e.g. would also work.
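
For context, here is a minimal sketch of the kind of pod spec the sentence under review points to, following the pattern in the linked Kubernetes documentation. The pod name, container names, and image are hypothetical placeholders and are not taken from this PR; only the `nvidia.com/gpu` resource name comes from the Kubernetes docs. Each container asks for one GPU through `resources.limits`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-per-container-example        # hypothetical pod name
spec:
  restartPolicy: OnFailure
  containers:
    - name: worker-a                      # hypothetical container name
      image: example.com/cuda-workload:latest   # placeholder; substitute a real CUDA image
      resources:
        limits:
          nvidia.com/gpu: 1               # one GPU assigned to this container
    - name: worker-b                      # hypothetical container name
      image: example.com/cuda-workload:latest   # placeholder; substitute a real CUDA image
      resources:
        limits:
          nvidia.com/gpu: 1               # a second, separate GPU for this container
```

GPUs are declared under `limits` only; Kubernetes uses the limit as the request when no request is given. A pod like this one would only be scheduled on a node that exposes at least two GPUs.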

@arnaldo2792
Contributor Author

Forced push addresses the comments above

@arnaldo2792 arnaldo2792 requested a review from jpculp April 20, 2022 23:43
README.md Outdated
@@ -740,6 +741,11 @@ There are a few important caveats about the provided kdump support:
* The system kernel will reserve 256MB for the crash kernel, only when the host has at least 2GB of memory; the reserved space won't be available for processes running in the host
* The crash kernel will only be loaded when the `crashkernel` parameter is present in the kernel's cmdline and if there is memory reserved for it

### NVIDIA GPUs Support
Bottlerocket's `nvidia` kubernetes variants include the required packages and configurations to leverage NVIDIA GPUs.
Contributor

nit: maybe omit Kubernetes since we'll have ECS soon:

Suggested change
- Bottlerocket's `nvidia` kubernetes variants include the required packages and configurations to leverage NVIDIA GPUs.
+ Bottlerocket's `nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.

README.md Outdated
### NVIDIA GPUs Support
Bottlerocket's `nvidia` kubernetes variants include the required packages and configurations to leverage NVIDIA GPUs.
The official AMIs for these variants can be used with EC2 GPU-equipped instance types such as: `p2`, `p3`, `p4`, `g4dn`, `g5` and `g5g`.
Please refer to the [Amazon EKS quickstart](QUICKSTART-EKS.md#aws-k8s--nvidia-variants) for further details about these variants.
Contributor
@bcressey bcressey Apr 21, 2022

Elsewhere in this doc, we refer to this doc as QUICKSTART-EKS so I'd like to continue using that name for consistency.

This adds a new section for NVIDIA GPUs and lists what EC2 instance
types are supported by the official Bottlerocket `nvidia` k8s AMIs.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792
Contributor Author

Forced push addresses comments above

README.md Outdated
### NVIDIA GPUs Support
Bottlerocket's `nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
The official AMIs for these variants can be used with EC2 GPU-equipped instance types such as: `p2`, `p3`, `p4`, `g4dn`, `g5` and `g5g`.
Please see [QUICKSTART-EKS](QUICKSTART-EKS.md#aws-k8s--nvidia-variants) for further details about kubernetes variants.
Contributor

Suggested change
- Please see [QUICKSTART-EKS](QUICKSTART-EKS.md#aws-k8s--nvidia-variants) for further details about kubernetes variants.
+ Please see [QUICKSTART-EKS](QUICKSTART-EKS.md#aws-k8s--nvidia-variants) for further details about Kubernetes variants.

Now the documentation explicitly says that it is possible to use a GPU
per orchestrated container, and references the official kubernetes
documentation to schedule NVIDIA GPUs.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
@arnaldo2792
Contributor Author

(Forced push fixes comment above)

@arnaldo2792 arnaldo2792 merged commit cf82440 into bottlerocket-os:develop Apr 25, 2022
@arnaldo2792 arnaldo2792 deleted the k8s-gpu-docs branch June 21, 2022 02:07
4 participants