Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set static MAC address when creating ecs-bridge bridge #111

Merged
merged 1 commit into from
May 31, 2023

Conversation

sparrc
Copy link
Contributor

@sparrc sparrc commented May 31, 2023

The ecs-bridge bridge currently is created without setting a static MAC address. This means that the bridge MAC address inherits the 'lowest' MAC address of all of it's connected veth interfaces.

What this means for ECS tasks using awsvpc networking mode is that the MAC address of ecs-bridge can be changing as tasks are killed an added to an instance (on the ec2 launch type).

This presents a race condition where some tasks might have an arpcache entry pointing to the old MAC address when a task is killed and the ecs-bridge MAC changes. In rare cases this can lead to brief periods of lost network connectivity within the local network on the host (such as requests to the ECS Task Metadata Endpoint or the ECS credentials endpoint).

By setting a static MAC address at creation we disable this behavior of inheriting the 'lowest' MAC address. This means that the ecs-bridge MAC stays constant through the entire lifetime of the ECS container instance.

Testing

Tested on a container instance using awsvpc network mode. Verified that MAC address is set and stays constant through many task starts/stops.

Ran ECS agent functional testing suite.

Description for the changelog

bugfix: Set static MAC address when creating ecs-bridge bridge

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ofiliz ofiliz self-assigned this May 31, 2023
plugins/ecs-bridge/engine/engine.go Outdated Show resolved Hide resolved
"bridge create: unable to generate mac addr %s", err)
}

log.Infof("Setting ecs-bridge hardware addr (MAC) %v %v", bridge, mac)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at command.go and engine.go, it seems the logging is done by command.go before calling the implementation of engine in engine.go. We may want to stick with that convention and remove the log statement here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep this log statement. Is there any drawback to logging this? Historically we've generally had issues with these plugins under-logging and not telling us what they are doing, rather than the opposite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's virtually no risk of this spamming the logfile, as this function is generally only called once per instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still log the error if the operation fails, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but issues where there's an explicit failure are always the easiest to debug :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this log statement provides any additional value, but don't have a strong opinion. Let's go with your approach.

@@ -14,10 +14,13 @@
package engine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add an integration test which creates a bridge using this engine interface, attaches a dummy interface with a lower MAC address than the bridge MAC address. and confirms that the bridge MAC address does not change. The current mock tests won't work (since we need a real interface). However given the urgency of this fix, I don't want us to delay this PR. I can also add such a test as part of our unified networking library program.

The ecs-bridge bridge currently is created without setting a static MAC
address. This means that the bridge MAC address inherits the 'lowest'
MAC address of all of it's connected veth interfaces.

What this means for ECS tasks using awsvpc networking mode is that the
MAC address of ecs-bridge can be changing as tasks are killed an added
to an instance (on the ec2 launch type).

This presents a race condition where some tasks might have an arpcache
entry pointing to the old MAC address when a task is killed and the
ecs-bridge MAC changes. In rare cases this can lead to brief periods of
lost network connectivity within the local network on the host (such as
requests to the ECS Task Metadata Endpoint or the ECS credentials
endpoint).

By setting a static MAC address at creation we disable this behavior of
inheriting the 'lowest' MAC address. This means that the ecs-bridge MAC
stays constant through the entire lifetime of the ECS container
instance.
Copy link
Contributor

@ofiliz ofiliz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@sparrc sparrc merged commit 52308d3 into master May 31, 2023
sparrc added a commit to sparrc/amazon-ecs-agent that referenced this pull request Jun 2, 2023
This pulls in a fix for an error that can occur when using awsvpc
networking mode where a stale MAC address can cause temporary loss of
network connectivity in a container.

Pulls in the following PRs with the fix and comments:

aws/amazon-ecs-cni-plugins#111
aws/amazon-ecs-cni-plugins#112
sparrc added a commit to sparrc/amazon-ecs-agent that referenced this pull request Jun 2, 2023
This pulls in a fix for an error that can occur when using awsvpc
networking mode where a stale MAC address can cause temporary loss of
network connectivity in a container.

Pulls in the following PRs with the fix and comments:

aws/amazon-ecs-cni-plugins#111
aws/amazon-ecs-cni-plugins#112
sparrc added a commit to aws/amazon-ecs-agent that referenced this pull request Jun 5, 2023
This pulls in a fix for an error that can occur when using awsvpc
networking mode where a stale MAC address can cause temporary loss of
network connectivity in a container.

Pulls in the following PRs with the fix and comments:

aws/amazon-ecs-cni-plugins#111
aws/amazon-ecs-cni-plugins#112
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants