feat: explicit GPU runner mappings #2862
Codecov Report
@@            Coverage Diff             @@
##             main    #2862      +/-   ##
==========================================
+ Coverage   70.10%   70.27%   +0.16%
==========================================
  Files         114      120       +6
  Lines        9811     9782      -29
==========================================
- Hits         6878     6874       -4
+ Misses       2933     2908      -25
bentoml/_internal/resource.py
Outdated
    def from_spec(cls, spec: t.Union[int, str, t.List[int | str]]) -> t.List[int]:
        if not isinstance(spec, (int, str, t.List)):
            raise TypeError(
                "NVidia GPU resource limit must be int, str or a list specifying the exact GPUs to use."
maybe "GPU device IDs".
This area is my only concern with this PR. Is this a bit too confusing?
If the user gives `nvidia.com/gpu: 3`, then it is the number of resources, whereas `nvidia.com/gpu: [3]` is the specific GPU number.
I feel like it should be fine if it is documented properly. What do you think?
I can add more info to the docs section, and this TypeError can point to that section of our docs.
Let me know if the change to the resource config is okay. If it is, I'll update the configuration guide too, covering how to configure each runner specifically.
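The count-versus-list distinction discussed in this thread can be sketched as follows. This is only an illustration, not the code in this PR: `gpu_ids_from_spec` is a hypothetical stand-in for `NvidiaGpuResource.from_spec`, and the rule that a count of N selects devices 0 through N-1 is an assumption made for the example.

```python
import typing as t


def gpu_ids_from_spec(
    spec: t.Union[int, str, t.List[t.Union[int, str]]]
) -> t.List[int]:
    """Hypothetical stand-in for NvidiaGpuResource.from_spec."""
    if not isinstance(spec, (int, str, list)):
        raise TypeError(
            "NVIDIA GPU resource limit must be an int, a str, "
            "or a list of GPU device IDs."
        )
    if isinstance(spec, list):
        # A list names the exact devices: [3] means "use GPU 3".
        return [int(device_id) for device_id in spec]
    count = int(spec)
    if count < 0:
        raise ValueError("GPU count must be non-negative.")
    # Assumption for this sketch: a count of N selects devices 0..N-1.
    return list(range(count))
```

Under these assumptions, `gpu_ids_from_spec(3)` yields three device IDs, while `gpu_ids_from_spec([3])` yields only device 3, which is the ambiguity the documentation would need to spell out.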
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
I think this pretty much looks good to me minus the one comment.
Hello @jjmachan, Thanks for updating this PR. There are currently no PEP 8 issues detected in this PR. Cheers! 🍻 Comment last updated at 2022-08-10 19:00:29 UTC
def test_default_gpu_strategy(monkeypatch):
    monkeypatch.setattr(strategy, "get_resource", unvalidated_get_resource)
    assert DefaultStrategy.get_worker_count(GPURunnable, {"nvidia.com/gpu": 2}) == 2
    assert DefaultStrategy.get_worker_count(GPURunnable, {"nvidia.com/gpu": 0}) == 1
It was working as expected, @sauyon. Now, if resources are not specified, it defaults to a worker count of 1 and emits the warning "no resource found, falling back to using a single worker", which was the earlier logic.
Yeah, that behavior is caused by the logic in DefaultStrategy. I wonder if it's correct, but we can address that in another PR: #2894
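For readers following the thread, the fallback described above can be sketched roughly like this. `worker_count_for_gpus` is a hypothetical helper, not `DefaultStrategy.get_worker_count` itself, and treating an explicit device list as "one worker per listed GPU" is an assumption of this sketch.

```python
import logging
import typing as t

logger = logging.getLogger(__name__)


def worker_count_for_gpus(gpu_spec: t.Union[int, t.List[int], None]) -> int:
    """Hypothetical helper mirroring the behavior the test asserts."""
    if isinstance(gpu_spec, list):
        # Assumption: one worker per explicitly listed GPU.
        gpu_count = len(gpu_spec)
    else:
        gpu_count = gpu_spec or 0
    if gpu_count > 0:
        return gpu_count
    # No usable GPU resource: warn and fall back to a single worker.
    logger.warning("No resource found, falling back to using a single worker.")
    return 1
```

With this sketch, a GPU count of 2 gives two workers and a count of 0 gives one worker plus a warning, matching the two assertions in the test.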
Some very, very minor style details. 😄
    return resource_get_resource(x, y, validate=False)


def test_default_gpu_strategy(monkeypatch):
The type of monkeypatch is MonkeyPatch:
if TYPE_CHECKING:
    from _pytest.monkeypatch import MonkeyPatch
Done!
One question: is the `if TYPE_CHECKING` guard there to improve performance by making the import visible only to the type checker?
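As background for the question above: `typing.TYPE_CHECKING` is a constant that is `False` at runtime and treated as `True` by static type checkers, so imports under the guard are skipped when the code actually runs. A minimal sketch, where `patch_answer` and its arguments are hypothetical names used only for illustration:

```python
import typing as t

if t.TYPE_CHECKING:
    # Seen only by static type checkers. At runtime TYPE_CHECKING is
    # False, so this import is never executed.
    from _pytest.monkeypatch import MonkeyPatch


def patch_answer(monkeypatch: "MonkeyPatch", target: object) -> None:
    # The annotation is a string, so the name does not need to exist
    # at runtime. Hypothetical helper for illustration only.
    monkeypatch.setattr(target, "answer", 42)
```

Besides skipping the import cost, the guard is commonly used to avoid circular imports and runtime dependencies on private modules that are needed only for annotations.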
What does this PR address?
Fixes #2770
With this change, `NvidiaGpuResource` will accept and return a list that specifies the exact GPUs to use. This enables the user to configure exactly which GPU to map to each runner worker.
Runner currently supports:
This PR introduces:
Before submitting:
guide on how to create a pull request.
make format and make lint scripts have passed (instructions)?
those accordingly? Here are documentation guidelines and tips on writing docs.
Who can help review?
Feel free to tag members/contributors who can help review your PR.