feat: explicit GPU runner mappings #2862

jjmachan · 2022-08-04T19:01:54Z

What does this PR address?

Fixes #2770
With this change, the NvidiaGpuResource will accept and return a list that specifies the exact GPUs to use. This enables the user to configure exactly which GPU to map to each runner worker.

Runners currently support:

runners:
  resources:
    nvidia.com/gpu: 4 => still valid, but automatically converted internally to use GPUs [0, 1, 2, 3].

This PR introduces:

runners:
  resources:
    nvidia.com/gpu: [2, 4] => use only GPUs 2 and 4, for runner workers 1 and 2 respectively
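
The two forms can be read roughly as follows. This is a minimal sketch, not the code in this PR: `normalize_gpu_spec` is a hypothetical helper, and it assumes only what the description states, namely that a bare integer N is a count that maps to GPUs 0 through N-1, while a list names exact GPU device IDs.

```python
from typing import List, Union


def normalize_gpu_spec(spec: Union[int, List[int]]) -> List[int]:
    """Hypothetical helper: turn a `nvidia.com/gpu` value into GPU device IDs."""
    if isinstance(spec, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("GPU spec must be an int or a list of GPU device IDs")
    if isinstance(spec, int):
        # A bare integer is a count: 4 means "use GPUs 0, 1, 2, 3".
        return list(range(spec))
    if isinstance(spec, list):
        # A list names the exact device IDs, one per runner worker.
        return [int(gpu_id) for gpu_id in spec]
    raise TypeError("GPU spec must be an int or a list of GPU device IDs")


print(normalize_gpu_spec(4))       # [0, 1, 2, 3]
print(normalize_gpu_spec([2, 4]))  # [2, 4]
```

Under this reading, `3` and `[3]` are different requests: the first asks for three GPUs, the second for the single GPU with device ID 3.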

Before submitting:

Does the Pull Request follow Conventional Commits specification naming? Here is GitHub's
guide on how to create a pull request.
Does the code follow BentoML's code style, and have both the make format and make lint scripts passed (instructions)?
Did you read through the contribution guidelines and follow the development guidelines?
Did your changes require updates to the documentation? Have you updated
those accordingly? Here are the documentation guidelines and tips on writing docs.
Did you write tests to cover your changes?

Who can help review?

Feel free to tag members/contributors who can help review your PR.

codecov · 2022-08-04T19:05:34Z

Codecov Report

Merging #2862 (9439bc3) into main (9305132) will increase coverage by 0.16%.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##             main    #2862      +/-   ##
==========================================
+ Coverage   70.10%   70.27%   +0.16%     
==========================================
  Files         114      120       +6     
  Lines        9811     9782      -29     
==========================================
- Hits         6878     6874       -4     
+ Misses       2933     2908      -25

Impacted Files	Coverage Δ
bentoml/_internal/resource.py	`76.96% <95.45%> (+0.86%)`	⬆️
bentoml/_internal/runner/strategy.py	`93.10% <100.00%> (+18.96%)`	⬆️
bentoml/_internal/utils/formparser.py	`20.00% <0.00%> (-57.94%)`	⬇️
bentoml/_internal/runner/utils.py	`90.16% <0.00%> (-1.70%)`	⬇️
bentoml/_internal/bento/build_config.py	`67.62% <0.00%> (-0.60%)`	⬇️
bentoml/_internal/runner/runner_handle/remote.py	`88.17% <0.00%> (-0.13%)`	⬇️
bentoml/_internal/server/service_app.py	`87.94% <0.00%> (-0.09%)`	⬇️
bentoml/_internal/cli/bento_management.py
bentoml/_internal/server/cli/dev_api_server.py
bentoml/_internal/service/openapi.py
... and 50 more

ssheng · 2022-08-05T07:31:08Z

bentoml/_internal/resource.py

+    def from_spec(cls, spec: t.Union[int, str, t.List[int | str]]) -> t.List[int]:
+        if not isinstance(spec, (int, str, t.List)):
+            raise TypeError(
+                "NVidia GPU resource limit must be int, str or a list specifing the exact GPUs to use."


maybe "GPU device IDs".

This area is my only concern with this PR. Is this a bit too confusing? If the user gives
`nvidia.com/gpu: 3`, it is the number of GPUs to use, whereas
`nvidia.com/gpu: [3]` is the specific GPU device ID.

I feel like it should be fine if it is documented properly. What do you think?

I can add more info to the docs, and this TypeError can point to that section.

jjmachan · 2022-08-05T07:40:35Z

Let me know if the change to the resource config is okay. If it is, I'll update the configuration guide too, covering how to configure each runner specifically.

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

sauyon

I think this pretty much looks good to me minus the one comment.

pep8speaks · 2022-08-08T10:14:26Z

Hello @jjmachan, Thanks for updating this PR.

There are currently no PEP 8 issues detected in this PR. Cheers! 🍻

Comment last updated at 2022-08-10 19:00:29 UTC

tests/unit/_internal/test_strategy.py

jjmachan · 2022-08-10T06:57:17Z

tests/unit/_internal/runner/test_strategy.py

+def test_default_gpu_strategy(monkeypatch):
+    monkeypatch.setattr(strategy, "get_resource", unvalidated_get_resource)
+    assert DefaultStrategy.get_worker_count(GPURunnable, {"nvidia.com/gpu": 2}) == 2
+    assert DefaultStrategy.get_worker_count(GPURunnable, {"nvidia.com/gpu": 0}) == 1


It was working as expected, @sauyon. Now, if resources are not specified, it defaults to a worker count of 1 and gives a warning saying "no resource found, falling back to using a single worker", which was the earlier logic.

Yeah, it is the behavior caused by the logic in DefaultStrategy. I wonder if it's correct, but we may solve that in another PR: #2894
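
The fallback described here can be sketched as below. This is an illustration under stated assumptions, not the actual `DefaultStrategy` code: `worker_count_for_gpus` is a hypothetical function, and it assumes one worker per listed GPU, which matches the two assertions in the test above.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def worker_count_for_gpus(gpu_ids: List[int]) -> int:
    """Hypothetical stand-in for the GPU branch of a scheduling strategy."""
    if gpu_ids:
        # One runner worker per GPU device ID.
        return len(gpu_ids)
    # No GPUs resolved: warn and fall back to a single worker.
    logger.warning("no resource found, falling back to using a single worker")
    return 1


print(worker_count_for_gpus([0, 1]))  # 2
print(worker_count_for_gpus([]))      # 1
```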

aarnphm

Some very, very minor style details. 😄

aarnphm · 2022-08-10T07:48:50Z

tests/unit/_internal/runner/test_strategy.py

+    return resource_get_resource(x, y, validate=False)
+
+
+def test_default_gpu_strategy(monkeypatch):


The type of monkeypatch is MonkeyPatch, which can be imported like this:

if TYPE_CHECKING:
    from _pytest.monkeypatch import MonkeyPatch

Done!
One question: the if TYPE_CHECKING check is there to improve the performance of the type checker, right?
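
On the question above: `typing.TYPE_CHECKING` is `False` at runtime and is treated as `True` only by static type checkers, so the guarded import is skipped when the tests actually run. The benefit is avoiding a runtime import that is needed only for annotations, rather than speeding up the type checker itself. A minimal sketch follows; the test name and the `EXAMPLE_FLAG` variable are made up for illustration and are not part of this PR.

```python
import os
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static type checkers only; never executed at runtime.
    from _pytest.monkeypatch import MonkeyPatch


def test_example_flag(monkeypatch: "MonkeyPatch") -> None:
    # The quoted annotation is never evaluated at runtime, so the missing
    # runtime import is not a problem. EXAMPLE_FLAG is a hypothetical name.
    monkeypatch.setenv("EXAMPLE_FLAG", "1")
    assert os.environ["EXAMPLE_FLAG"] == "1"
```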

sauyon and others added 5 commits August 4, 2022 23:56

make validation optional for get_resource

3c63ab4

--wip--

ee4f51c

Nvidaresource now returns a list of GPUs

2a4bde8

added tests

55cf1ff

patched tests

3123afa

jjmachan requested a review from a team as a code owner August 4, 2022 19:01

jjmachan requested review from sauyon and bojiang and removed request for a team August 4, 2022 19:01

jjmachan mentioned this pull request Aug 4, 2022

feat: explicit GPU runner mappings #2850

Closed

5 tasks

validate no gpu

cb52c18

aarnphm reviewed Aug 5, 2022

bentoml/_internal/resource.py Outdated

bentoml/_internal/resource.py Outdated

bentoml/_internal/resource.py Outdated

ssheng reviewed Aug 5, 2022

jjmachan and others added 4 commits August 7, 2022 16:43

Update bentoml/_internal/resource.py

f042b4c

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

Update bentoml/_internal/resource.py

d59f953

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

Update bentoml/_internal/resource.py

b0be69e

Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

review fixes

bcda7bc

sauyon previously approved these changes Aug 7, 2022

bentoml/_internal/resource.py Outdated

negative indexes in gpus are ignored

84776c4

jjmachan dismissed sauyon’s stale review via 84776c4 August 8, 2022 10:11

check if exception is raised

607689d

pep8

3b030b9

sauyon reviewed Aug 8, 2022

tests/unit/_internal/test_strategy.py Outdated

sauyon reviewed Aug 8, 2022

tests/unit/_internal/test_strategy.py Outdated

jjmachan added 2 commits August 9, 2022 08:51

mv test_strategy file

64bd5db

add an assert if you wannt check

206b84c

jjmachan commented Aug 10, 2022

aarnphm reviewed Aug 10, 2022

feedback changes

9439bc3

bojiang approved these changes Aug 11, 2022

ssheng merged commit 5860906 into bentoml:main Aug 12, 2022
