Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrated Generic resource in service create #429

Merged
merged 7 commits into from
Nov 29, 2017

Conversation

RenaudWasTaken
Copy link
Contributor

Notes for reviewers

First contribution submitted to the repo, please advise if something's not right with the format or procedure, etc.

- What I did
I integrated Generic Resources in docker service create.

  • See here for tracking
  • See here for design document
  • See here for an example of how it was integrated in swarmctl

- How I did it
This work is based on #424 so, will need to be updated once it is merged.

- How to verify it

docker service create --name nginx --image nginx:latest --generic-resources "banana=2;apple=2"

This command should not schedule the task if no resources are advertised on any nodes.
If you want to advertise resources, these should be declared in the daemon.json or on the dockerd command line (i.e.: dockerd ... --generic-resources "banana=6;apple=4).

Note: Some tests will be added to the current.

- Description for the changelog

Added support of Generic Resources in the CLI

- A picture of a cute animal (not mandatory but encouraged)
ot1ynyu

@codecov-io
Copy link

codecov-io commented Aug 9, 2017

Codecov Report

Merging #429 into master will increase coverage by 0.21%.
The diff coverage is 72.18%.

@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
+ Coverage   51.42%   51.64%   +0.21%     
==========================================
  Files         215      216       +1     
  Lines       17716    17860     +144     
==========================================
+ Hits         9111     9223     +112     
- Misses       8138     8161      +23     
- Partials      467      476       +9

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for using a new microformat (key=value;key=value) instead of using either the CSV format (as we use for --mount), or allowing the flag to be set multiple times (--generic-resources foo=bar ----generic-resources bar=baz)? The latter would also allow, for example to --generic-resources-add foo=bar / --generic-resources-rm foo=bar.

I noticed this PR only handles DiscreteResourceSpec, whereas the API also supports NamedResourceSpec; could these be ambiguous? If so, perhaps the flag should take that into account ----generic-resources type=discrete,foo=bar.

If they are not abigious; does the API actually need two separate types?

@RenaudWasTaken
Copy link
Contributor Author

I noticed this PR only handles DiscreteResourceSpec, whereas the API also supports NamedResourceSpec; could these be ambiguous? If so, perhaps the flag should take that into account ----generic-resources type=discrete,foo=bar.
If they are not abigious; does the API actually need two separate types?

Generic Resources are used in two ways for this new feature:

  • Advertising GR at the node level:
    • This is done in the daemon.json / command line
    • You can advertise DiscreteResources (e.g.: FPGA=2)
    • You can advertise NamedResources (e.g.: GPU={UUID1,UUID2,UUID3};SSD={sda2,sda3})
  • Requesting a GR at the service level
    • This is done in this PR through the command line
    • A service can only request a DiscreteResource
    • Requesting NamedResources is something we chose not to address in the original PR mostly because we believe it might not be the right model

Is there a reason for using a new microformat (key=value;key=value) instead of using either the CSV format (as we use for --mount), or allowing the flag to be set multiple times (--generic-resources foo=bar ----generic-resources bar=baz)? The latter would also allow, for example to --generic-resources-add foo=bar / --generic-resources-rm foo=bar.

You are raising a pretty good point here. I actually considered using this in the original smarmkit PR.
However, there are two reasons that made us chose this new format:
- swarmkit was using an old version of the pflags that did not allow for that feature at the time.
- We didn't want to bloat the CLI with too many flags. E.g:
`docker service create --name cuda --image nvidia/cuda --generic-resources "GPU=2" --generic-resources "SSD=1" --generic-resources "InfiniBand=1" --generic-resources "InfiniBand=1" --label GPU_FAMILY="PASCAL"

Though @thaJeztah I'm not sure what --generic-resources-add and --generic-resources-rm are supposed to do ?
Also since Generic Resources are advertised in the daemon.json how would that work in the json file ?
i.e:

"genericResources": ["InfiniBand=1", "SSD=1", "GPU={UUID1,UUID2"}]

That would be a different format right ?

@thaJeztah
Copy link
Member

thaJeztah commented Aug 9, 2017

The --<something>-add / --<something>-rm flags allow you to "diff" options that are currently present in the service specs. For example (simplified);

$ docker service create --name myservice --label hello=world --label foo=bar busybox top

$ docker service inspect --format '{{ json .Spec.Labels}}' myservice | jq .
{
  "foo": "bar",
  "hello": "world"
}

For example, remove the foo label:

$ docker service update --label-rm foo myservice
$ docker service inspect --format '{{ json .Spec.Labels}}' myservice | jq .
{
  "hello": "world"
}

Or replace the hello label with a new label:

$ docker service update --label-rm hello --label-add new=sparkly myservice
$ docker service inspect --format '{{ json .Spec.Labels}}' myservice | jq .
{
  "new": "sparkly"
}

swarmkit was using an old version of the pflags that did not allow for that feature at the time.

Ah, yes. The swarmkit CLI is primarily for testing / development; I don't think we have to make the same choice here; basically, trying to keep a consistent UX with other options (but I'm not too familiar with this feature, so we can adjust where needed). Main thing that stood out to me is the use of a semicolon as separator (which is not used in any other flag currently to my knowledge)

To get a better understanding of the functionality; would:

GPU={UUID1,UUID2"}

Be any different than:

GPU=UUID1, GPU=UUID2

We didn't want to bloat the CLI with too many flags

It's a good intent 😄 Overall there's already a lot of flags, and when defining a service it's likely people already have to set many options. Having a docker-compose file to define the service would probably help for that.

Using the csv notation would still be an option possibly (thinking out loud), so:

docker service create --generic-resources GPU=UUID1, GPU=UUID2

Followed by

docker service update --generic-resources-rm GPU

Would remove both GPU=UUID1 and GPU=UUID2

Whereas:

docker service update --generic-resources-rm GPU=UUID2

Would only remove GPU=UUID2

Also since Generic Resources are advertised in the daemon.json how would that work in the json file?

If GPU={UUID1,UUID2"} is not different than GPU=UUID1, GPU=UUID2, it could look like;

"genericResources": ["InfiniBand=1", "SSD=1", "GPU=UUID1", "GPU=UUID2"]

Given that I was not in the review of the pull request for the daemon (moby/moby#33440), let me ping @stevvooe @aaronlehmann @cpuguy83 as well, in case there's things I'm overlooking.

Also, if it's decided to change the format, I think it would be good to update the format as used for the daemon flag consistent.

@RenaudWasTaken
Copy link
Contributor Author

RenaudWasTaken commented Aug 9, 2017

Ok I understand the The ---add / ---rm flags :)
When designing this feature we didn't have update in mind at all. This is a pretty good point

And in general I like the idea of being able to remove/add Generic Resources.
However I would need to look how this works internally (namely does it re-create the container) to make sure that it doesn't break the model that we designed.

Using the csv notation would still be an option possibly (thinking out loud), so:

Overall I do like the usage of the CSV without having to repeat the flag.
Back to the swarmkit and moby PR I guess :D

docker service create --generic-resources GPU=UUID1, GPU=UUID2

In the case of service create, we don't expect people to use NamedResources.
So it wouldn't be --generic-resources GPU=UUID1 but --generic-resources GPU=1.

@thaJeztah
Copy link
Member

However I would need to look how this works internally (namely does it re-create the container) to make sure that it doesn't break the model that we designed.

The "diff-ing" is a pure client-side implementation; the client takes the existing service-spec, applies the add / rm, and sends a full, updated spec over the API.

Aany change in the service spec would trigger a re-create of the service's tasks (containers), so when removing a generic resource, the existing containers are stopped and new containers created to take their place.

Back to the swarmkit and moby PR I guess :D

For moby; yes, if we agree the CSV notation is what we think is best and we want to keep the daemon flag the consistent. For Swarm, that's a "nice to have", but also probably good to stay consistent.

All of course if we agree the CSV notation works 😅

@thaJeztah
Copy link
Member

Bumping moby/moby in #679, which should bring in the required changes

@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "generic-resource" git@github.com:RenaudWasTaken/cli.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842353949704
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@RenaudWasTaken RenaudWasTaken force-pushed the generic-resource branch 3 times, most recently from 8daf7e8 to 740b0ab Compare November 21, 2017 17:28
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple doc fixes

### Create services requesting Generic Resources

You can narrow the kind of nodes your task can land on through the using the
`--generic-resources` flag (if the nodes advertise these resources):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/generic-resources/generic-resource/

also 2 instances in the example below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, also fixed in --generic-resources-add and --generic-resources-rm to --generic-resource-add` and --generic-resource-rm`

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RenaudWasTaken
Copy link
Contributor Author

Thanks @dnephin !
Who else do I need approval from / what is the process left to get this PR merged?

Copy link
Collaborator

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐸

Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
Signed-off-by: Renaud Gaubert <renaud.gaubert@gmail.com>
@RenaudWasTaken
Copy link
Contributor Author

Rebased!
@thaJeztah PTAL :)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@thaJeztah
Copy link
Member

We'll probably need a follow-up for the compose-file docs; https://github.com/docker/docker.github.io/blob/master/compose/compose-file/index.md

Also @RenaudWasTaken this only annotates what's available on the host, but it will still require the nvidia tooling to setup those resources inside the container, correct? What steps would be needed to support this without?

@3XX0
Copy link

3XX0 commented Nov 29, 2017

@thaJeztah Correct, it should work out of the box with nvidia-docker 2.0 set as the default runtime on your swarm.
There is a long standing issue to improve this a little bit, which probably means having something like this merged.

@RenaudWasTaken
Copy link
Contributor Author

Thanks a lot @thaJeztah !
Just opened a PR docker/docs#5416 for that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants