Provide an ability to request libnetwork contact the network/IPAM plugin in a future time #843

Open
deitch opened this Issue Dec 28, 2015 · 9 comments

deitch commented Dec 28, 2015

The primary driver of this issue is DHCP. The libnetwork remote API (as far as I understood it) has it contacting a network/IPAM remote plugin when a container is created and when it is destroyed. But there are times when an IPAM driver may want to be contacted again between those events.

The only example I have right now is DHCP lease renewal, although I can imagine there would be others.

There are several possibilities:

  1. Provide a remote API to libnetwork. When an IPAM driver gives an address, it will be able to contact docker engine / libnetwork in the future and tell it to renew/expire/etc. an address. This seems very messy to me, and breaks the cleanliness of a plugin responding to requests. It also creates lots of security questions, not to mention tracking how it was contacted, etc.
  2. Have libnetwork poll a plugin to check if anything has changed or needs to be changed. This seems burdensome, and in most cases it will simply be unnecessary.
  3. Enable libnetwork to be told, "check back with me in X seconds."

The third option seems cleanest. It remains a libnetwork->plugin API, plugins do not need to keep track of where and how to contact the libnetwork instance to which they gave an address, and communication occurs only when necessary.

The flow might look like this:

  1. Container starts (unchanged)
  2. libnetwork contacts plugin requesting IP (unchanged)
  3. plugin returns IP information, along with a "check address validity in X seconds" field, e.g. "check": 3600 (NEW). Note: if the "check" field does not exist, then libnetwork works exactly as today: assign the IP and let it go.
  4. In X seconds (3600 in the example above), libnetwork contacts IPAM again and asks to revalidate the assigned address.
  5. The plugin returns one of the following statuses:
    • valid: libnetwork does nothing, since all is fine. May also include "check again in X seconds" field.
    • invalid: libnetwork removes the address
    • address: a new address, in exactly the same structural format as the one the plugin returned on creation, which libnetwork should assign in place of the old one.
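
The flow above can be sketched roughly as follows. This is only an illustration of the proposal, not an existing libnetwork API: the "check" field comes from the example above, while the "status" and "address" keys, the `Lease` type, and both function names are assumptions made for the sketch. It also assumes an "address" reply may carry its own "check" interval, which the proposal does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    address: Optional[str]     # currently assigned address, None once removed
    recheck_in: Optional[int]  # seconds until the next revalidation, None = never


def initial_lease(response: dict) -> Lease:
    """Step 3: record the address and the optional "check" interval."""
    # Without a "check" field, behaviour is as today: assign and let it go.
    return Lease(response["address"], response.get("check"))


def apply_revalidation(lease: Lease, response: dict) -> Lease:
    """Step 5: apply the plugin's answer to a revalidation request."""
    status = response["status"]  # hypothetical key name
    if status == "valid":
        # Keep the address; the plugin may ask to be checked again later.
        return Lease(lease.address, response.get("check"))
    if status == "invalid":
        # The address is removed and nothing is left to revalidate.
        return Lease(None, None)
    if status == "address":
        # A replacement address, in the same format as on creation.
        return Lease(response["address"], response.get("check"))
    raise ValueError(f"unknown status: {status!r}")
```

A caller would schedule the next revalidation only while `recheck_in` is not `None`, so a plugin that never returns "check" is never contacted again until the container is destroyed.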

This is my first attempt. Thoughts?

jainvipin Dec 29, 2015

@deitch
+1 for bringing up the DHCP use case

Approach-3 doesn't work IMO because a DHCP server can request that clients refresh their leases at any time (e.g. https://tools.ietf.org/html/rfc3203).

IMHO, polling is a bit cumbersome (too much noise for scalable setups) and inaccurate (the poll interval can't be made that low).

deitch Dec 29, 2015

@jainvipin good point! I wasn't thinking of DHCP server-driven changes. That might kill approach 3 (such a pity, I liked it :-) ), and approach 2 (polling) is cumbersome and non-scalable.

jainvipin Dec 30, 2015

@deitch :-)
The bigger problem with polling is that it may be inaccurate, i.e. a container may keep using an IP address between a lease renewal failing and the next poll; and that is assuming the DHCP server doesn't allocate the IP to some other container in the meantime.

Async notification, on the other hand, is less noisy, more accurate, and puts the burden of DHCP subtleties on the IPAM driver (which is where it belongs). However, async notification defies the convention...
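
To put a rough number on the polling window mentioned above, here is a small sketch. It assumes, purely for illustration, that polls happen at fixed multiples of the interval starting from t = 0; the function name and that schedule are not part of any libnetwork behaviour.

```python
import math


def stale_seconds(failure_time: float, poll_interval: float) -> float:
    """Seconds a container keeps a dead address before the next poll.

    Assumes polls occur at t = 0, poll_interval, 2 * poll_interval, ...
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    # First poll at or after the moment the lease stopped being valid.
    next_poll = math.ceil(failure_time / poll_interval) * poll_interval
    return next_poll - failure_time


# A renewal that fails one second after an hourly poll goes unnoticed
# for almost the whole interval.
print(stale_seconds(3601, 3600))
```

Shrinking the interval narrows the window but multiplies the number of requests, which is the noise problem for large setups.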

deitch Dec 30, 2015

Given that, as you said, the DHCP (or any other address management) server can act asynchronously (the server tells the client "renew now even though it is not yet time"), async may be the best way. But it requires the docker engine in which libnetwork runs to have a new external API, along with the authentication/authorization questions involved, and greatly complicates the IPAM driver.

For a local driver, it is not quite as big a deal; but a remote driver defined by a .spec or .json file, where the entire interaction is over http, would need to keep track of how to reach the engine in order to update it. The simplest solution appears to be for the engine to add a "my API" field, with its full URL (more changes to the spec).

But even that isn't so clear-cut. A docker engine might be open only via UNIX socket /var/run/docker.sock but communicating outwards to a Web-based remote plugin. How would it communicate back?
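
A minimal sketch of that problem, under stated assumptions: "CallbackURL" stands in for the "my API" field and does not exist in the plugin spec, the request shape is simplified, and both function names are made up for illustration.

```python
from urllib.parse import urlparse


def callback_reachable_remotely(engine_endpoint: str) -> bool:
    """True if a plugin on another host could in principle call this endpoint.

    A unix:// socket only exists on the engine's own host. Loopback TCP
    addresses are not handled here, although they have the same limitation.
    """
    return urlparse(engine_endpoint).scheme in ("http", "https", "tcp")


def build_request(pool_id: str, engine_endpoint: str) -> dict:
    """Build an address request, advertising a callback only when usable."""
    request = {"PoolID": pool_id}
    if callback_reachable_remotely(engine_endpoint):
        # Hypothetical field: the "my API" URL the plugin would call back.
        request["CallbackURL"] = engine_endpoint
    return request


# An engine listening only on its UNIX socket has nothing to advertise.
print(build_request("pool-1", "unix:///var/run/docker.sock"))
```

In the socket-only case the plugin receives no callback address at all, so it has no way to deliver an async notification.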

jainvipin Dec 30, 2015

Something along the lines you mentioned would work for me, but these approaches have bigger implications that the libnetwork maintainers must weigh against the benefits:

  • Use the standard channel (/var/run/docker.sock) with an API that does the equivalent of docker network release-ip <pool-id> <ip_address>; this mixes up the UX with the driver APIs.
  • During registration, a driver (or spec file) can specify the URL for libnetwork to listen on; this requires that libnetwork/docker listens on another socket for libnetwork driver callbacks.

I am thinking it would also be a good idea to enumerate whether there are other cases where a remote driver would need to communicate back to libnetwork asynchronously. If yes, an async method may be a good long-term solution.

deitch Dec 30, 2015

I think I agree. Over the long term, the idea that some activities (start container, stop container, inspect container, list images, remove images, etc.) can all be done async via a remote API, but changing/releasing a container's IP cannot, is an artificial separation. The engine already recognizes that you need to be able to do async activities to containers, so why not network/IPAM activities too?

I think it is time for the libnetwork maintainers to jump in?

GordonTheTurtle Aug 30, 2017

@deitch It has been detected that this issue has not received any activity in over 6 months. Can you please let us know if it is still relevant:

  • For a bug: do you still experience the issue with the latest version?
  • For a feature request: was your request appropriately answered in a later version?

Thank you!
This issue will be automatically closed in 1 week unless it is commented on.
For more information please refer to #1926

deitch Aug 30, 2017

Cannot believe this issue is a year and 8 months old. Did this ever get addressed?

Routhinator Oct 16, 2017

Is there any work being done on this? This is pretty critical for some services and something that most other container systems support.
