
[feat]: enabling schedule with requests #352

Closed
JasonHe-WQ opened this issue Jun 13, 2024 · 1 comment

Comments

@JasonHe-WQ
Contributor

JasonHe-WQ commented Jun 13, 2024

1. Issue or feature description

Support `requests` for GPU compute resources, to enable more scheduling strategies and better utilization of GPU compute.

2. Steps to reproduce the issue

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        requests:
          nvidia.com/gpucores: 20 # Each vGPU uses 20% of the entire GPU (optional, integer)
        limits:
          nvidia.com/gpu: 1 # requesting 1 vGPU
          nvidia.com/gpumem: 3000 # Each vGPU contains 3000 MB of device memory (optional, integer)
          nvidia.com/gpucores: 25 # Each vGPU uses 25% of the entire GPU (optional, integer)

Only the scheduler code needs to be edited, and the change will remain compatible with the HAMi DRA.

Attention: This feature will change QoS behavior. A Pod that originally had guaranteed QoS will no longer have guaranteed compute once any Pod that specifies only requests is scheduled onto the same GPU. If this feature is disabled, nothing changes.
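The trade-off above can be illustrated with a minimal sketch (hypothetical names and logic, not HAMi's actual scheduler code): if the scheduler packs Pods onto a GPU by summing their `gpucores` requests, the sum of their limits can exceed 100% of the device, which is exactly why guaranteed-QoS Pods lose their compute guarantee.

```python
# Hypothetical illustration of scheduling by requests rather than limits.
# Names (VGPUPod, can_schedule, etc.) are invented for this sketch.
from dataclasses import dataclass, field

@dataclass
class VGPUPod:
    name: str
    cores_request: int  # % of the GPU's compute requested (soft, used for placement)
    cores_limit: int    # % of the GPU's compute allowed (hard cap at runtime)

@dataclass
class GPU:
    id: str
    pods: list = field(default_factory=list)

    def requested_cores(self) -> int:
        return sum(p.cores_request for p in self.pods)

    def limit_cores(self) -> int:
        return sum(p.cores_limit for p in self.pods)

def can_schedule(gpu: GPU, pod: VGPUPod) -> bool:
    # Place by requests: the Pod fits as long as the sum of requests stays
    # within 100% of the GPU, even if the sum of limits exceeds 100%.
    return gpu.requested_cores() + pod.cores_request <= 100

gpu = GPU("gpu-0")
a = VGPUPod("pod-a", cores_request=20, cores_limit=25)
b = VGPUPod("pod-b", cores_request=70, cores_limit=90)

for pod in (a, b):
    if can_schedule(gpu, pod):
        gpu.pods.append(pod)

print([p.name for p in gpu.pods])  # both fit by requests: 20 + 70 <= 100
print(gpu.limit_cores())           # 115: combined limits oversubscribe the GPU
```

Scheduling by limits instead would reject `pod-b` (25 + 90 > 100) and preserve the guarantee, at the cost of lower packing density.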

3. Information to attach (optional if deemed irrelevant)

@JasonHe-WQ
Contributor Author

Closing due to significant uncertainties in the CUDA driver, such as QoS issues, memory allocation for request-only tasks, and more.

For a closed-source commercial implementation, consider run.ai. Their relevant blog posts are provided below:

Maximize the Potential of Your GPUs: A Guide to Dynamic GPU Fractions & Node-Level Scheduler
Dynamic GPU Memory: Solving the Problem of Inefficient Resource Allocation in Inference Servers

@JasonHe-WQ closed this as not planned on Aug 17, 2024.