Fix route deletion when replacing route in hostgw backend #803

Merged
tomdee merged 2 commits into coreos:master from julia-stripe:set-link-index on Sep 6, 2017

Contributor

julia-stripe commented Aug 31, 2017

Description

Bug fix for the hostgw backend.

We noticed (#801) that Flannel was not replacing routes when new nodes were added to the cluster. We saw these error messages in our production cluster:

[network.go:83] Subnet added: 10.32.10.0/24 via 10.68.29.72\n","stream":"stderr","time":"2017-08-29T17:00:21.968055987Z"}
[network.go:106] Replacing existing route to 10.32.10.0/24 via 10.68.26.131 with 10.32.10.0/24 via 10.68.29.72.\n","stream":"stderr","time":"2017-08-29T17:00:21.968211104Z"}
[network.go:108] Error deleting route to 10.32.10.0/24: no such process\n","stream":"stderr","time":"2017-08-29T17:00:21.96826321Z"}

This happened whenever an existing route was being replaced with a new route.

It turned out to have 2 causes:

  1. The wrong LinkIndex was being set on the netlink messages: the LinkIndex was always set to 0, when it should be set to the index of the external network interface (extIface.Iface.Index).
  2. Instead of deleting the old route, this code path was trying to delete the new route. Since the new route doesn't exist, the deletion fails.

I tested Flannel with this patch in our cluster, and replacing routes now works correctly.

Fixes #801.

Member

tomdee commented Sep 6, 2017

Thanks for the fix, @julia-stripe. It looks great to me, so merging.

@tomdee tomdee merged commit e1b22bc into coreos:master Sep 6, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@julia-stripe julia-stripe deleted the julia-stripe:set-link-index branch Sep 6, 2017
