Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.7] routing: Fix route collisions in AWS ENI #14337

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Dec 9, 2020

Problem

To recap, the ENI IPAM mode allocates a per-ENI (interface) routing table. Each endpoint in this IPAM mode has two ip rules, "ingress" and "egress", and 2 routes. These 4 items will be created under same table ID. This table ID is the ifindex of the interface. It is possible that the ifindex of the interface is equal to 253, 254, or 255 [1], which are reserved for the main routing table. Therefore, it is possible for us to inject routes (and rules) reserved for the main routing of the system, which can obviously cause connectivity disruption.

For more on this issue, see #14336.

Proposed Solution

The solution is to compute the proper table ID by taking the "interface number" and adding an offset of 10, to avoid the possibility of colliding with the main routing table.

The "interface number" is given to us by the AWS API for the interface (CiliumNode.Status.ENI.ENIs[*].Number). This number is sequential meaning the first ENI has "interface number" of 1, and so on. The max number of ENIs on a given node is 15. So, the max possible table ID is 25. Therefore, we avoid the possibility of colliding with the main routing table. An offset of 10 was chosen because it avoids colliding with other system components which may allocate lower numbered table IDs and because it will be smaller than the main routing table ID range.

In order to rollout this solution, we must migrate the old ENI datapath on upgrade. Meaning, we must fixup all the old rules and routes which use the old table ID scheme, to the new table ID scheme. This includes creating new rules, routes, then deleting the old rules and routes only if all previous steps succeeded. This is a sensitive operation, as any failure could render the node's connectivity useless. If we encounter any errors during the migration process, we will attempt to rollback to the previous table ID scheme, to hopefully restore the node connectivity.

If the user wants to downgrade Cilium, the user must first set --egress-multi-home-ip-rule-compat=true and restart the agent. Cilium will see that the ENI datapath has been migrated and with the flag true, it will revert the migration. After this is complete, the user may downgrade their version of Cilium. The docs have been updated to mention this.

Note, while this does affect Azure IPAM mode as well, v1.7 does not support Azure mode, therefore support for it is missing in this PR. In the forward-ported version of this PR, it will be included.

[1]: See ip-route(8)


This is a direct backport to the v1.7 branch.

See commit msgs.

Related: #14336

@christarazi christarazi added area/cni Impacts the Container Networking Interface between Cilium and the orchestrator. area/eni Impacts ENI based IPAM. sig/ipam IP address management, including cloud IPAM kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Dec 9, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.7 kind/backports This PR provides functionality previously merged into master. labels Dec 9, 2020
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from aeeec7a to 33f56ae Compare December 9, 2020 20:22
Copy link
Member

@tgraf tgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits that I think would make the code more readable below.

The solution itself LGTM.

Two other remaining avenues for followup:

  • I'd guess Azure could be affected by the same issue?
  • Upgrade/downgrade. We had discussed some solutions out-of-band but my understanding is that this current implementation is sufficient assuming that you cordon the nodes prior to upgrade.

On the latter point, I'm not entirely sure how cleanup of previous rules/tables would happen during upgrade. Even if you cordon the node and vacate all workloads, will the old route tables / rules be removed? What about the ones for the cilium-health endpoint?

api/v1/openapi.yaml Show resolved Hide resolved
pkg/datapath/linux/routing/routing.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/routing.go Outdated Show resolved Hide resolved
@christarazi
Copy link
Member Author

christarazi commented Dec 9, 2020

I'd guess Azure could be affected by the same issue?

Yes Azure is affected and I plan to address this in a future PR. I didn't want to pollute the v1.7 tree with references to an IPAM mode that we don't support.

Upgrade/downgrade. We had discussed some solutions out-of-band but my understanding is that this current implementation is sufficient assuming that you cordon the nodes prior to upgrade.

Using stale-rules could be a stopgap to cleanup in most cases. The only case where it won't help is if a second ENI created with ifindex between 10-25 (assuming the first ENI has ifindex of 1) for it to cause trouble. This is because the new table ID scheme adds an offset of 10. An example:

  • Two ENI devices on the node
    • First ENI has number 1 and ifindex 1
    • Second ENI has number 2 and ifindex 11
  • New table ID scheme would store the first ENI's rules and routes under table ID 11 (because ENI.Number + offset of 10 = 11)
  • There exists a rule / route table with ID 11 from a previous Cilium because the second ENI has ifindex 11 which is the table ID in the old scheme.
  • Therefore a collision

stale-rules won't delete a rule like this because it would be considered active (meaning valid EP IPs). This will require a manual fixup. For now, we rely on the second ENI to not have an ifindex between the range of 10-25.

Edit: This comment is out-of-date. Please refer to the PR body.

@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch 4 times, most recently from 2c0af08 to dac0da6 Compare December 9, 2020 22:58
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch 3 times, most recently from 003a0ce to 06086f4 Compare December 10, 2020 21:03
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look over the test as well now but I already have a few preliminary comments on the new migration code below.

daemon/daemon_main.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from 06086f4 to 1c22be6 Compare December 10, 2020 22:51
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I wondered whether there's some more filtering we could do to reduce the number of rules being dumped out of the kernel (eg by requesting with a more specific filter of matching rules, ie "give me only the rules with priority X"), but I didn't look so closely at the libraries/kernel API to figure out whether that's possible.

daemon/daemon_main.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch 2 times, most recently from 3d57827 to 1b28823 Compare December 11, 2020 02:16
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate_test.go Outdated Show resolved Hide resolved
This is useful to aggregate the items to revert in one stack, so that it
can all be done at once.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from ce6161a to f6480ab Compare December 23, 2020 00:17
@christarazi christarazi removed the dont-merge/preview-only Only for preview or testing, don't merge it. label Jan 4, 2021
@christarazi
Copy link
Member Author

test-backport-1.7

@christarazi
Copy link
Member Author

For reviewers:

The latest push includes a major change in the approach to migrate the ENI datapath.

Previous approach treated the migration as all-or-nothing. This meant that any error encountered during the migration would cause us to abort and revert all successful operations leading to the failure.

The new approach is not all-or-nothing, but rather per-rule (or per-endpoint). This means that any rule (or endpoint) can fail the migration, but we will still carry on and attempt to migrate the rest. We can do as such because we are able to guarantee that a failure at any point in the migration process for the rule will not disrupt connectivity (see d100429#diff-4d5ad11dc62beee3c82e4eaee46e08a8edc1232423564923d5bd0e97864714f7R90-R137 for details). Either the "newer" datapath or the "older" datapath will be in-place. In some cases, it is possible to have both datapaths in-place, but connectivity is not disrupted in that case. We take measures to warn the user of such cases if it occurs.

The main reason for the new approach is that it is less risky due to being iterative, rather than all-or-nothing. In addition, if a failure were to occur in the very late stages of the previous approach, the number of operations to revert would be quite large. The more operations we do, the more likely a failure occurs. The new approach performs the minimum required operations, by relying on the atomic nature of the migration.

scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
}

return nil, fmt.Errorf("failed to delete old rule: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should actually

Suggested change
return nil, fmt.Errorf("failed to delete old rule: %w", err)
return nil, nil

Since the rule insertion above actually made the downgrade effective. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^^ Something I missed in my original post here, if so we should also remove the RevertStack handling logic a few lines above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. Will fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, would it be useful for the user to be informed of this, so that they can cleanup the leftover state themselves?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should log like we do for the other "failed cleanup" steps.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 Just a few minor nits but this is looking good. Main function feedback is just on adjusting the downgrade path so we call it successful after the higher priority rule is put in place, regardless of any subsequent failure in operations.

Documentation/install/upgrade.rst Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
pkg/datapath/linux/routing/migrate.go Outdated Show resolved Hide resolved
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from f6480ab to 8f1ec3a Compare January 5, 2021 01:47
@christarazi
Copy link
Member Author

Incremental diff implementing Joe's feedback from above.
diff --git a/Documentation/install/upgrade.rst b/Documentation/install/upgrade.rst
index 274f5c4b8..718bcda01 100644
--- a/Documentation/install/upgrade.rst
+++ b/Documentation/install/upgrade.rst
@@ -329,9 +329,7 @@ node. This version fixes the issue and introduces a new flag to handle
 compatibility. This flag is ``--egress-multi-home-ip-rule-compat``.
 
 The flag defaults to ``false`` meaning it will instruct Cilium to perform a
-migration to the new ENI datapath upon Cilium upgrade. This operation will
-modify the routing table to fix a collision that in rare cases can cause
-connectivity loss to the node.
+migration to the new ENI datapath upon Cilium upgrade.
 
 If the flag is ``true`` upon upgrade, Cilium will **not** perform a migration.
 
diff --git a/pkg/datapath/linux/routing/migrate.go b/pkg/datapath/linux/routing/migrate.go
index add344b5e..47caaa0d1 100644
--- a/pkg/datapath/linux/routing/migrate.go
+++ b/pkg/datapath/linux/routing/migrate.go
@@ -26,8 +26,8 @@ import (
 // MigrateENIDatapath migrates the egress rules inside the Linux routing policy
 // database (RPDB) for ENI IPAM mode. It will return the number of rules that
 // were successfully migrated and the number of rules we've failed to migrated.
-// A -1 is returned for the failed number of rules indicating we couldn't even
-// start the migration.
+// A -1 is returned for the failed number of rules if we couldn't even start
+// the migration.
 //
 // The compat flag will control what Cilium will do in the migration process.
 // If the flag is false, this instructs Cilium to ensure the datapath is newer
@@ -119,12 +119,22 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
 	// If a failure occurs at (3), we have already succeeded in getting the new
 	// state in-place to direct traffic for the endpoint. In any case of
 	// upgrade or downgrade, it is possible for both states to be in-place if
-	// there are any failures, including any failures in reverting. The
-	// datapath selected will depend on the rule priority. Concretely, for
-	// upgrades, the newer rule will have a lower priority, so the original
-	// datapath will be selected. For downgrades, the newer rule will have a
-	// higher priority, so the newer datapath will be selected. In either case,
-	// no connectivity is affected for the endpoint.
+	// there are any failures, especially if there were any failures in
+	// reverting. The datapath selected will depend on the rule priority.
+	//
+	// Concretely, for upgrades, the newer rule will have a lower priority, so
+	// the original datapath will be selected. The migration is deemed a
+	// failure because the original datapath (with a rule that has a higher
+	// priority) is being selected for the endpoint. It is necessary to attempt
+	// reverting the failed migration work [(1) & (2)], as leaving the state
+	// could block a user's from retrying this upgrade again.
+	//
+	// For downgrades, the newer rule will have a higher priority, so the newer
+	// datapath will be selected. The migration is deemed a success and we
+	// explicitly avoid reverting, because it's not necessary to revert this
+	// work merely because we failed to cleanup old, ineffectual state.
+	//
+	// In either case, no connectivity is affected for the endpoint.
 	//
 	// If we fail at (4), then the old rule will have been deleted and the new
 	// state is in-place, which would be servicing the endpoint. The old routes
@@ -156,17 +166,6 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
 
 	if isUpgrade {
 		for _, r := range v1Rules {
-			// Let's say we have an ENI device attached to the node with
-			// ifindex 3 and interface number 2. The following rule will exist
-			// on the node _before_ migration.
-			//   110:    from 192.168.11.171 to 192.168.0.0/16 lookup 3
-			// After the migration, this rule will become:
-			//   111:    from 192.168.11.171 to 192.168.0.0/16 lookup 12
-			// The priority has been updated to 111 and the table ID is 12
-			// because the interface number is 2 plus the routing table offset
-			// (linux_defaults.RouteTableInterfacesOffset). See copyRoutes()
-			// for what happens with routes.
-
 			if routes, err := m.upgradeRule(r); err != nil {
 				log.WithError(err).WithField("rule", r).Warn("Failed to migrate endpoint to new ENI datapath. " +
 					"Previous datapath is still intact and endpoint connectivity is not affected.")
@@ -184,16 +183,6 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
 		}
 	} else if isDowngrade {
 		for _, r := range v2Rules {
-			// Let's say we have an ENI device attached to the node with
-			// ifindex 9 and interface number 3. The following rule will exist
-			// on the node _before_ migration.
-			//   111:    from 192.168.11.171 to 192.168.0.0/16 lookup 13
-			// After the migration, this rule will become:
-			//   110:    from 192.168.11.171 to 192.168.0.0/16 lookup 9
-			// The priority has been reverted back to 110 and the table ID back
-			// to 9 because the ifindex is 9. See copyRoutes() for what happens
-			// with routes.
-
 			if routes, err := m.downgradeRule(r); err != nil {
 				log.WithError(err).WithField("rule", r).Warn("Failed to downgrade endpoint to original ENI datapath. " +
 					"Previous datapath is still intact and endpoint connectivity is not affected.")
@@ -280,6 +269,17 @@ func NewMigrator(getter interfaceDB) *migrator {
 // returns the old routes that the caller should remove at a later time, along
 // with an error.
 func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
+	// Let's say we have an ENI device attached to the node with ifindex 3 and
+	// interface number 2. The following rule will exist on the node _before_
+	// migration.
+	//   110:    from 192.168.11.171 to 192.168.0.0/16 lookup 3
+	// After the migration, this rule will become:
+	//   111:    from 192.168.11.171 to 192.168.0.0/16 lookup 12
+	// The priority has been updated to 111 and the table ID is 12 because the
+	// interface number is 2 plus the routing table offset
+	// (linux_defaults.RouteTableInterfacesOffset). See copyRoutes() for what
+	// happens with routes.
+
 	scopedLog := log.WithField("rule", rule)
 
 	routes, err := m.rpdb.RouteListFiltered(netlink.FAMILY_V4, &netlink.Route{
@@ -335,9 +335,10 @@ func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 		return nil, fmt.Errorf("failed to create new rule: %w", err)
 	}
 
-	if err := m.deleteOldRule(rule); err != nil {
-		// We revert here because we want to ensure that the new state
-		// that we just created above is reverted.
+	if err := m.rpdb.RuleDel(&rule); err != nil {
+		// We revert here because we want to ensure that the new state that we
+		// just created above is reverted. See long comment describing the
+		// migration in MigrateENIDatapath().
 		if revErr := stack.Revert(); revErr != nil {
 			scopedLog.WithError(err).WithField("revertError", revErr).Warn(upgradeRevertWarning)
 		}
@@ -353,6 +354,15 @@ func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 // attached ENI device on the node. It returns the "old" routes (new datapath)
 // that the caller should remove at a later time, along with an error.
 func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
+	// Let's say we have an ENI device attached to the node with ifindex 9 and
+	// interface number 3. The following rule will exist on the node _before_
+	// migration.
+	//   111:    from 192.168.11.171 to 192.168.0.0/16 lookup 13
+	// After the migration, this rule will become:
+	//   110:    from 192.168.11.171 to 192.168.0.0/16 lookup 9
+	// The priority has been reverted back to 110 and the table ID back to 9
+	// because the ifindex is 9. See copyRoutes() for what happens with routes.
+
 	scopedLog := log.WithField("rule", rule)
 
 	oldTable := rule.Table
@@ -379,12 +389,13 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 		return nil, fmt.Errorf("failed to create new routes: %w", err)
 	}
 
-	revert, err = m.createNewRule(
+	// We don't need the revert stack return value because the next opertion to
+	// delete the rule will not revert the stack. See below comment on why.
+	_, err = m.createNewRule(
 		rule,
 		linux_defaults.RulePriorityEgress,
 		newTable,
 	)
-	stack.Extend(revert)
 	if err != nil {
 		if revErr := stack.Revert(); revErr != nil {
 			scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
@@ -393,12 +404,11 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 		return nil, fmt.Errorf("failed to create new rule: %w", err)
 	}
 
-	if err := m.deleteOldRule(rule); err != nil {
-		if revErr := stack.Revert(); revErr != nil {
-			scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
-		}
-
-		return nil, fmt.Errorf("failed to delete old rule: %w", err)
+	if err := m.rpdb.RuleDel(&rule); err != nil {
+		// We avoid reverting and returning an error here because the newer
+		// datapath is already in-place. See long comment describing the
+		// migration in MigrateENIDatapath().
+		return nil, nil
 	}
 
 	return routes, nil
@@ -472,8 +482,8 @@ func (m *migrator) retrieveTableIDFromInterfaceNumber(ifaceNum int) (int, error)
 // copyRoutes upserts `routes` under the `from` table ID to `to` table ID. It
 // returns a RevertStack and an error. The RevertStack contains functions that
 // would revert all the successful operations that occurred in this function.
-// The caller of this function should be aware to ensure the stack is reverted
-// upon an error.
+// The caller of this function MUST revert the stack when this function returns
+// an error.
 func (m *migrator) copyRoutes(routes []netlink.Route, from, to int) (revert.RevertStack, error) {
 	var revertStack revert.RevertStack
 
@@ -527,8 +537,8 @@ func (m *migrator) copyRoutes(routes []netlink.Route, from, to int) (revert.Reve
 // createNewRule inserts `rule` with the table ID of `newTable` and a priority
 // of `toPrio`. It returns a RevertStack and an error. The RevertStack contains
 // functions that would revert all the successful operations that occurred in
-// this function. The caller of this function should be aware to ensure the
-// stack is reverted upon an error.
+// this function. The caller of this function MUST revert the stack when this
+// function returns an error.
 func (m *migrator) createNewRule(rule netlink.Rule, toPrio, newTable int) (revert.RevertStack, error) {
 	var revertStack revert.RevertStack
 
@@ -549,14 +559,6 @@ func (m *migrator) createNewRule(rule netlink.Rule, toPrio, newTable int) (rever
 	return revertStack, nil
 }
 
-func (m *migrator) deleteOldRule(rule netlink.Rule) error {
-	if err := m.rpdb.RuleDel(&rule); err != nil {
-		return fmt.Errorf("unable to delete old rule: %w", err)
-	}
-
-	return nil
-}
-
 func (m *migrator) deleteOldRoutes(routes []netlink.Route) error {
 	for _, r := range routes {
 		if err := m.rpdb.RouteDel(&r); err != nil {

@joestringer
Copy link
Member

Two more pieces of feedback:

  • Spelling: opertion -> operation
  • Would be good to log in the downgrade "succeed even though rule deletion failed" case.

@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from 8f1ec3a to 702ed0d Compare January 5, 2021 19:44
@christarazi
Copy link
Member Author

Done, incremental diff
commit baed31723b277de6f4bddea4dbd1b51b5336f44d
Author: Chris Tarazi <chris@isovalent.com>
Date:   Tue Jan 5 11:42:16 2021 -0800

    fixup! routing: Add ENI route table migration logic
    
    Signed-off-by: Chris Tarazi <chris@isovalent.com>

diff --git a/pkg/datapath/linux/routing/migrate.go b/pkg/datapath/linux/routing/migrate.go
index 47caaa0d1..2740903fd 100644
--- a/pkg/datapath/linux/routing/migrate.go
+++ b/pkg/datapath/linux/routing/migrate.go
@@ -389,8 +389,8 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 		return nil, fmt.Errorf("failed to create new routes: %w", err)
 	}
 
-	// We don't need the revert stack return value because the next opertion to
-	// delete the rule will not revert the stack. See below comment on why.
+	// We don't need the revert stack return value because the next operation
+	// to delete the rule will not revert the stack. See below comment on why.
 	_, err = m.createNewRule(
 		rule,
 		linux_defaults.RulePriorityEgress,
@@ -408,6 +408,7 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
 		// We avoid reverting and returning an error here because the newer
 		// datapath is already in-place. See long comment describing the
 		// migration in MigrateENIDatapath().
+		scopedLog.WithError(err).Warn(downgradeFailedRuleDeleteWarning)
 		return nil, nil
 	}
 
@@ -419,6 +420,8 @@ const (
 		"Endpoint connectivity should not be affected. It is advised to retry the migration."
 	downgradeRevertWarning = "Reverting the new ENI datapath failed. However, both the new and previous datapaths are still intact. " +
 		"Endpoint connectivity should not be affected. It is advised to retry the migration."
+	downgradeFailedRuleDeleteWarning = "Downgrading the datapath has succeeded, but failed to cleanup the original datapath. " +
+		"It is advised to manually remove the old rule."
 )
 
 // retrieveTableIDFromIfIndex computes the correct table ID based on the

@christarazi
Copy link
Member Author

christarazi commented Jan 5, 2021

test-backport-1.7

Edit: Cilium-Ginkgo-Tests failed to provision, everything else passed

@christarazi
Copy link
Member Author

test-backport-1.7

@joestringer
Copy link
Member

@christarazi seems like there's a test breakage of the new tests, maybe related to the most recent changes? https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/20049/testReport/junit/(root)/Suite-runtime/RuntimePrivilegedUnitTests_Run_Tests/

This commit will fixup the ENI datapath depending on the
egress-multi-home-ip-rule-compat flag (see previous commits for
context). The migration logic supports both upgrading and downgrading
the ENI datapath.

This logic must run on startup before the API is served and before the
health endpoint is created, so that no endpoints are prematurely
crreated before Cilium has had the chance to migrate the entire
datapath.

See cilium#14336.

Suggested-by: Joe Stringer <joe@cilium.io>
Suggested-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit fixes a potential route collision in AWS ENI IPAM modes,
where the ifindex could equal the main routing table ID (from 253-255)
[1], causing traffic to be subject to these routes incorrectly. This is
admittedly rare, but we've seen this from a user report. The impact is
that most traffic on the node is suddenly blackholed.

To fix this, we say that each device or interface (ENI) will have their
own dedicated routing table. The table ID will start with an offset of
10 because it is highly unlikely to collide with the main routing table
ID (from 253-255). We grab the number associated with the ENI device
(`Number`) and add the offset. For example, if we have an ENI device
"eni-0" which has a `Number` of 5, then the table ID will be 10 + 5 =
15.

Another important piece to note is that only the egress rule will reside
inside the per-device tables, whereas the ingress rule will stay in the
main routing table. This is because we want the main routing table to
hold the routes to the endpoint.

Moving forward, the ENI datapath will now create rules under a new
egress priority value (RulePriorityEgressv2), as long as the
egress-multi-home-ip-rule-compat flag is false. If it's true, then the
datapath will create rules under the original egress priority value
(RulePriorityEgress). This helps disambiguate when running with the
older or newer ENI datapath.

See cilium#14336.

[1]: See ip-route(8)

Reported-by: Vlad Ungureanu <vladu@palantir.com>
Suggested-by: Joe Stringer <joe@cilium.io>
Suggested-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/v1.7-fix-routing-collisions-eni branch from 702ed0d to 0124757 Compare January 6, 2021 04:36
@christarazi
Copy link
Member Author

christarazi commented Jan 6, 2021

Ah, I see what went wrong @joestringer. I simply forgot to update the unit test which covers the failing downgrade case. This is because we changed the logic to not fail the migration if we aren't able to delete the old rule (item 3 from the long comment, if you recall).

Here's the diff for the fix:

diff --git a/pkg/datapath/linux/routing/migrate_test.go b/pkg/datapath/linux/routing/migrate_test.go
index d1f775144..9b7fc3aee 100644
--- a/pkg/datapath/linux/routing/migrate_test.go
+++ b/pkg/datapath/linux/routing/migrate_test.go
@@ -262,14 +262,13 @@ func (m *MigrateSuite) TestMigrateENIDatapathDowngradeSuccess(c *C) {
 }
 
 func (m *MigrateSuite) TestMigrateENIDatapathDowngradeFailure(c *C) {
-	// This test case will cover one downgrade failure path where we
-	// successfully migrate all the new rules and routes back to the original
-	// scheme, but fail to delete the new rule. This test case will be set up
+	// This test case will cover one downgrade failure path where we failed to
+	// migrate the rule to the old scheme. This test case will be set up
 	// identically to the successful case. "New" meaning the rules and routes
 	// using the new datapath scheme, hence downgrading. After we call
 	// MigrateENIDatapath(), we assert that we failed to migrate 1 rule. We
 	// assert that the revert of the downgrade was successfully as well,
-	// meaning we expect the "new" rules and routes to be reinstated.
+	// meaning we expect the "newer" rules and routes to be reinstated.
 
 	runFuncInNetNS(c, func() {
 		index := 5
@@ -278,16 +277,16 @@ func (m *MigrateSuite) TestMigrateENIDatapathDowngradeFailure(c *C) {
 
 		m.defaultNetlinkMock()
 
-		// Here we inject the error on deleting a rule. The first call we want to
+		// Here we inject the error on adding a rule. The first call we want to
 		// fail, but the second we want to succeed, because that will be the
 		// revert.
-		var onRuleDelCount int
-		m.OnRuleDel = func(r *netlink.Rule) error {
-			if onRuleDelCount == 0 {
-				onRuleDelCount++
+		var onRuleAddCount int
+		m.OnRuleAdd = func(r *netlink.Rule) error {
+			if onRuleAddCount == 0 {
+				onRuleAddCount++
 				return errors.New("fake error")
 			}
-			return netlink.RuleDel(r)
+			return netlink.RuleAdd(r)
 		}
 
 		// Set up the interfaceDB mock. The MAC address returned is coming from the

@christarazi
Copy link
Member Author

test-backport-1.7

Copy link
Contributor

@twpayne twpayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A few nits mainly around UX, but they're minor so feel free to ignore them.

@@ -54,6 +54,7 @@ cilium-agent [flags]
--disable-k8s-services Disable east-west K8s load balancing by cilium
--dns-max-ips-per-restored-rule int Maximum number of IPs to maintain for each restored DNS rule (default 1000)
--egress-masquerade-interfaces string Limit egress masquerading to interface selector
--egress-multi-home-ip-rule-compat Use a new scheme to store rules and routes under ENI and Azure IPAM modes, if false. Otherwise, it will use the old scheme.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The words "new" and "old" will likely only make sense to people who have read this PR. Is it possible to find a more descriptive description. Maybe something like "Offset routing table IDs under ENI and Azure IPAM modes to avoid collisions with reserved table IDs."

@@ -770,6 +773,10 @@ func init() {
flags.Bool(option.EnableAPIRateLimit, false, "Enables the use of the API rate limiting configuration")
option.BindEnv(option.EnableAPIRateLimit)

flags.Bool(option.EgressMultiHomeIPRuleCompat, false,
"Use a new scheme to store rules and routes under ENI and Azure IPAM modes, if false. Otherwise, it will use the old scheme.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: see nit about flag description in generated Documentation/cmdref/cilium-agent.md above.

// RouteTableInterfacesOffset is the offset for the per-ENI routing tables.
// Each ENI interface will have its own table starting with this offset. It
// is 10 because it is highly unlikely to collide with the main routing
// table which is between 253-255. See ip-route(8).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: table id 0 is also reserved (as well as 253-255).

pkg/datapath/linux/routing/migrate.go Show resolved Hide resolved
scopedLog.WithError(err).WithField("routes", routes).
Warnf("Failed to cleanup after successfully migrating endpoint to %s ENI datapath. "+
"It is recommended that theses routes are cleaned up, as it is possible in the future "+
"to collide with another endpoint with the same IP.", version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the user actually perform the cleanup? By running ip route del manually to remove routes with priority 110 for example? It's maybe worth giving the user a hint here.

downgradeRevertWarning = "Reverting the new ENI datapath failed. However, both the new and previous datapaths are still intact. " +
"Endpoint connectivity should not be affected. It is advised to retry the migration."
downgradeFailedRuleDeleteWarning = "Downgrading the datapath has succeeded, but failed to cleanup the original datapath. " +
"It is advised to manually remove the old rule."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe note that old rules have priority 110.

pkg/datapath/linux/routing/migrate.go Show resolved Hide resolved
@@ -0,0 +1,575 @@
// Copyright 2016-2020 Authors of Cilium
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Copyright year needs to be updated to 2021 I think. Same with other files touched in this PR.

return filterRulesByPriority(rules, prio), nil
}

func (m *MigrateSuite) defaultNetlinkMock() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but there are a number of projects that generate Go mocks, avoiding having to manually create them. One example is gomock.

@joestringer joestringer merged commit c4933f1 into cilium:v1.7 Jan 6, 2021
@joestringer
Copy link
Member

Good feedback from @twpayne , mostly cosmetic / future-looking improvements that we can address in a follow-up PR.

@christarazi
Copy link
Member Author

@twpayne Thanks for the review. I will address these items in a followup and I'll ping you on that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cni Impacts the Container Networking Interface between Cilium and the orchestrator. area/eni Impacts ENI based IPAM. kind/backports This PR provides functionality previously merged into master. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam IP address management, including cloud IPAM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants