Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service discovery hardening #1796

Merged
merged 2 commits into from
Jun 12, 2017
Merged

Conversation

fcrisciani
Copy link

@fcrisciani fcrisciani commented Jun 8, 2017

This patch addresses several issues found on the Service Discovery feature.

SetMatrix is a simple matrix of sets.
Added tests

This data structure will be used in following commit to handle
transient states where the same key can momentarely be associated
to more than a value

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
@fcrisciani fcrisciani changed the title Service discovery rewrite Service discovery hardening Jun 8, 2017
endpoint.go Outdated
@@ -515,6 +515,11 @@ func (ep *endpoint) sbJoin(sb *sandbox, options ...EndpointOption) (err error) {
return err
}

if err = ep.addServiceInfoToCluster(); err != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it back to EnableService (introduce back the sb lock)

sandbox.go Outdated
@@ -669,24 +669,14 @@ func (sb *sandbox) SetKey(basePath string) error {

func (sb *sandbox) EnableService() error {
for _, ep := range sb.getConnectedEndpoints() {
if ep.enableService(true) {
if err := ep.addServiceInfoToCluster(); err != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore Enable service logic (introducing again the sb lock)

@fcrisciani fcrisciani force-pushed the name-resolution-race branch 7 times, most recently from 0a8bf0c to 94372a6 Compare June 9, 2017 17:21
Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushing the initial set of review comments. I have a few more areas to cover.

// but in any case the deleteServiceInfoToCluster will follow doing the cleanup if needed.
// In case the deleteServiceInfoToCluster arrives first, this one is happening after the endpoint is
// removed from the list, in this situation the delete will bail out not finding any data to cleanup
// and the add will bail out not finding the endpoint on the sandbox.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the excellent summary of the problem.

service.go Outdated
@@ -46,19 +48,51 @@ type service struct {
ingressPorts portConfigs

// Service aliases
aliases []string
serviceAliases []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its okay to be called aliases since it belongs to the service object.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will restore

network.go Outdated
ipInfo.extResolver = true
if ok, _ := sr.ipMap.Contains(ipStr, ipInfo{name: name, extResolver: true}); !ok {
sr.ipMap.Remove(ipStr, ipInfo{name: name})
sr.ipMap.Insert(ipStr, ipInfo{name: name, extResolver: true})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcrisciani I dont quite understand this logic. Can you pls explain ?
@sanimej could you PTAL this change ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the original logic what the code does is to fetch the element from the map and set the extResolver to true.
The new logic simply checks if there is already an element with extResolver to true, if there is not then removes the entry that has extResolver to false and inserts one to true.

Looking better the logic is slightly different from the original, in the sense that I insert all the time an entry if it is not there, instead the original logic if the entry is not there does not change it. I will add the check to see if the entry is there with extResolver to false

network.go Outdated
ipInfo, ok := elemSet[0].(ipInfo)
if !ok {
setStr, b := sr.ipMap.String(ip)
logrus.Warnf("Something is wrong with the ipInfo set for key %s set:%t %s", ip, b, setStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better warning message maybe ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion? It will be !ok if the element inserted is not of type ipInfo that is strange and major bug

network.go Outdated
// network db notifications)
// In such cases the resolution will be based on the first element of the set, and can vary
// during the system stabilitation
ipInfo, ok := elemSet[0].(ipInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the check for extResolver intentionally ignored here ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, will add that also

@@ -285,7 +285,6 @@ func (nDB *NetworkDB) CreateEntry(tname, nid, key string, value []byte) error {
nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry)
nDB.Unlock()

nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls explain why it is okay to remove the broadcast event ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is triggering a notification of the handleEpTableEvent for local endpoints. This is not necessary considering that local endpoints got already configured by logic before. Also that can be risky because now the timing with remote events can arrive out of sync due to the fact that remote events will take more time than local dispatched ones

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am not against this change. But I like to understand this a bit more.

broadcaster.Write ultimately calls appropriate Watch routines such as handleEpTableEvent or handleNodeTableEvent or other tables managed by other components (such as overlay driver).

CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?

Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the review on all the current usage of CreateEntry and Watch, there is no such case that will break and hence am fine with not having to broadcast this event from networkDB.

@fcrisciani
Copy link
Author

fcrisciani commented Jun 10, 2017 via email

// In such cases the resolution will be based on the first element of the set, and can vary
// during the system stabilitation
elem, ok := elemSet[0].(ipInfo)
if !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of adding this defensive check ?
ipMap is completely controlled by libnetwork core and what is the purpose of checking for ipInfo type ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the SetMatrix is a generic data structure that store potentially any kind of type. This, as far as I know, is the only way to cast back to the original type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. This is a defensive check and am okay to keep it in and am hoping to never see the Error message ever :)

network.go Outdated
elem, ok := elemSet[0].(ipInfo)
if !ok {
setStr, b := sr.ipMap.String(ip)
logrus.Warnf("BUG expected set of ipInfo type for key %s set:%t %s", ip, b, setStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning with a BUG seems odd. It certainly will raise an alarm with the users and as mentioned above, when do we expect such a Bug and if this is a concern, shouldnt this be resolved rather than logging it as a BUG ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has to raise the alarm, It will mean that in that map got inserted a wrong type.

@@ -285,7 +285,6 @@ func (nDB *NetworkDB) CreateEntry(tname, nid, key string, value []byte) error {
nDB.indexes[byNetwork].Insert(fmt.Sprintf("/%s/%s/%s", nid, tname, key), entry)
nDB.Unlock()

nDB.broadcaster.Write(makeEvent(opCreate, tname, nid, key, value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am not against this change. But I like to understand this a bit more.

broadcaster.Write ultimately calls appropriate Watch routines such as handleEpTableEvent or handleNodeTableEvent or other tables managed by other components (such as overlay driver).

CreateEntry in networkdb is a generic API to create an event of any type. But changing this generic function for a specific case of Endpoint management seems risky without considering how it impacts other events.
What if there is another table (not endpoint), with an event created by one component and consumed not just by remote peers, but also by another local component which networkdb as the mechanism for communication ?

Also, if we remove this call, can you point me to another code path for which handleEpTableEvent is called for local endpoints ?

for _, ep := range sb.getConnectedEndpoints() {
if ep.enableService(false) {
if err := ep.deleteServiceInfoFromCluster(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, Isnt it better to have deleteServiceInfoFromCluster called from DisableService, but remove it from sbLeave in order to be consistent with the way EnableService and DisableService are handled. Also it brings in consistency between Join and Leave. WDYT ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great as a future addition, but we have to investigate some corner cases. For example I validated that if you do the docker kill <container> no DisableService is called. This of course will leave stale entries behind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

type loadBalancerBackend struct {
ip net.IP
containerName string
taskAliases []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is weird to see taskAliases and containerName carried in loadBalancerBackend, where this structure is used so far for the VIP based IPVS programming. Are we changing this for other purposes now (such as DNS-RR as well) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why we have to carry these additional details. This is required especially so that cleanupServiceBindings can call the lower level SD functions that expects to see containerName and taskAliases. But still, this seems out of place.

Copy link
Contributor

@mavenugo mavenugo Jun 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcrisciani Please consider picking up this change mavenugo@92820b9

This is a minor rework of your changes that enables us to avoid the need to change this structure and also helps scope this PR for fixing the race issues and minimize the code-reorg for a later activity if there is a need.

@fcrisciani fcrisciani force-pushed the name-resolution-race branch 2 times, most recently from bbd63a7 to f2da09f Compare June 10, 2017 22:19
agent.go Outdated
n.addSvcRecords(name, ip, nil, true)
for _, alias := range taskaliases {
n.addSvcRecords(alias, ip, nil, true)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how you have removed this here and consolidated all the addSvcRecords under addEndpointNameResolution.

But removing this will result in unmanaged containers in attachable networks losing SD.

Hence, we have to call addEndpointNameResolution from here and addEndpointNameResolution must be changed to also handle unmanaged container scenario.

agent.go Outdated
n.deleteSvcRecords(name, ip, nil, true)
for _, alias := range taskaliases {
n.deleteSvcRecords(alias, ip, nil, true)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

agent.go Outdated
@@ -602,8 +622,7 @@ func (ep *endpoint) addServiceInfoToCluster() error {
if n.ingress {
ingressPorts = ep.ingressPorts
}

if err := c.addServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.Iface().Address().IP); err != nil {
if err := c.addServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "addServiceInfoToCluster"); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also call addEndpointNameResolution from here for the case of unmanaged container and its name resolution.

agent.go Outdated
@@ -655,8 +680,7 @@ func (ep *endpoint) deleteServiceInfoFromCluster() error {
if n.ingress {
ingressPorts = ep.ingressPorts
}

if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.Iface().Address().IP); err != nil {
if err := c.rmServiceBinding(ep.svcName, ep.svcID, n.ID(), ep.ID(), ep.Name(), ep.virtualIP, ingressPorts, ep.svcAliases, ep.myAliases, ep.Iface().Address().IP, "deleteServiceInfoFromCluster"); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also call deleteEndpointNameResolution from here for the case of unmanaged container and its name resolution.

}
}

if addService && len(vip) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its hard for me to judge if addService variable is introduced here just as an optimization to avoid calling addSvcRecords multiple times for the same svcName <-> vip combination ?
or Is there a real problem if we call addSvcRecords multiple times for the same svcName <-> vip combination ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no real problem is only an optimization and also more symmetric towards the delSvcRecords that has the rmService (the case of the removal the logic is mandatory of course)

c := n.getController()
agent := c.getAgent()

name := ep.Name()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavenugo I guess to handle properly the case of anonymous container this has to be brought up

if n.ingress {
ingressPorts = ep.ingressPorts
}
name := ep.Name()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavenugo same here for the delete this name is needed

changed the ipMap to SetMatrix to allow transient states
Compacted the addSvc and deleteSvc into a one single method
Updated the datastructure for backends to allow storing all the information needed
to cleanup properly during the cleanupServiceBindings
Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic
Add some debug logs to track further race conditions

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcrisciani Thanks for addressing all the comments and fixing the hard to reproduce race test-cases.

LGTM

@mavenugo mavenugo merged commit 86ae3cf into moby:master Jun 12, 2017
@fcrisciani fcrisciani deleted the name-resolution-race branch June 12, 2017 05:34
mavenugo added a commit to mavenugo/docker that referenced this pull request Jun 12, 2017
Contains Service Discovery hardening fixes via
moby/libnetwork#1796

Fixes multiple issues such as moby#32830

Signed-off-by: Madhu Venugopal <madhu@docker.com>
andrewhsu pushed a commit to docker-archive/docker-ce that referenced this pull request Jun 13, 2017
This is a cherry-pick of moby/moby#33634
that brings in moby/libnetwork#1796.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
andrewhsu pushed a commit to docker-archive/docker-ce that referenced this pull request Jun 24, 2017
Contains Service Discovery hardening fixes via
moby/libnetwork#1796

Fixes multiple issues such as #32830

Signed-off-by: Madhu Venugopal <madhu@docker.com>
Upstream-commit: 6868b8e
Component: engine
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
This is a cherry-pick of moby/moby#33634
that brings in moby/libnetwork#1796.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Feb 3, 2020
This is a cherry-pick of moby/moby#33634
that brings in moby/libnetwork#1796.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 10, 2020
This is a cherry-pick of moby/moby#33634
that brings in moby/libnetwork#1796.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 16, 2020
Contains Service Discovery hardening fixes via
moby/libnetwork#1796

Fixes multiple issues such as #32830

Signed-off-by: Madhu Venugopal <madhu@docker.com>
Upstream-commit: 6868b8e
Component: engine
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 23, 2020
This is a cherry-pick of moby/moby#33634
that brings in moby/libnetwork#1796.

Signed-off-by: Madhu Venugopal <madhu@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants