Support for GPU resources #577

Closed
dvavili opened this Issue Dec 11, 2017 · 4 comments

dvavili commented Dec 11, 2017

For machine learning workflows, we need to schedule jobs/pods on nodes with GPU resources. Please add support for specifying GPU resources in the workflow spec.

edlee2121 commented Dec 11, 2017

@jessesuen Need to document how to schedule pods on GPU nodes.

@pratulw pratulw added this to the M13 milestone Dec 11, 2017

@pratulw pratulw added the docs label Dec 11, 2017

jessesuen commented Dec 11, 2017

Hi @dvavili, are you referring to Kubernetes' Accelerators feature, which allows GPU limits and requests? https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

If so, this should already be supported. You simply need to add a request and/or limit under the resources field in the container spec. See the example below:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
      resources:
        limits: 
          alpha.kubernetes.io/nvidia-gpu: 2

The container and its fields are passed through when Argo constructs the pod spec, so these resource requests should be preserved.

jessesuen commented Dec 11, 2017

Otherwise, if you are asking about the ability to schedule a workflow (or part of a workflow) to run on a node with a specific label (e.g. gpu), this is supported through the use of nodeSelectors. Please see the node-selector.yaml example:
https://github.com/argoproj/argo/blob/master/examples/node-selector.yaml

We support nodeSelector at both the container level and at workflow level. If specified at the workflow level, then all containers in the workflow will use the nodeSelector (but can be overridden at the container level).

The assumption is that the nodes are labeled in such a way that they can be matched using standard Kubernetes node selectors.
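As a sketch of what this might look like (the label key/value `accelerator: nvidia-gpu` is an assumption for illustration; use whatever labels your nodes actually carry):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-node-selector-
spec:
  entrypoint: train
  # Workflow-level nodeSelector: applies to every template unless overridden.
  # The label key/value below is hypothetical; match your cluster's node labels.
  nodeSelector:
    accelerator: nvidia-gpu
  templates:
  - name: train
    # A nodeSelector specified here at the template level would override
    # the workflow-level one above.
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["scheduled on a GPU node"]
```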

jessesuen commented Dec 13, 2017

Confirmed with @dvavili that they wanted to specify resources.limits and resources.requests which we already support. Closing bug.
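For completeness, a resources stanza with both requests and limits might look like the following sketch (using the alpha-era resource name from the example above; note that Kubernetes requires GPU requests to equal limits when both are given):

```yaml
resources:
  requests:
    alpha.kubernetes.io/nvidia-gpu: 1
  limits:
    alpha.kubernetes.io/nvidia-gpu: 1
```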

@jessesuen jessesuen closed this Dec 13, 2017
