Adding a proposal for "ENIs for Tasks" #701

Merged Aug 25, 2017 (1 commit)

Conversation

aaithal (Contributor) commented Feb 8, 2017

Summary

Adds a proposal document for "ENIs for Tasks", for addressing #702.

Implementation details

N/A

Testing

N/A

Description for the changelog

None

Licensing

This contribution is under the terms of the Apache 2.0 License: Yes

CpuID commented Feb 8, 2017

Very interesting :)

aaithal changed the title from `Adding a proposal for ENIs for Tasks` to `Adding a proposal for "ENIs for Tasks"` on Feb 8, 2017

samuelkarp (Contributor) left a comment

Looks pretty good so far, just nits and questions.

Also: Can we call out Windows as out of scope for this proposal? Windows networking is sufficiently different that much of the technical approach in this proposal does not apply, so it'd be better for us to treat it separately.

comments in [this github issue](https://github.com/aws/amazon-ecs-agent/issues/185),
the `bridge` networking mode results in a number of issues for ECS users:

1. All of the containers launched with this mode share the `docker0` bridge,
Contributor

Might be worthwhile adding an appendix with some data about the performance penalty of using the docker0 bridge.

proposals/eni.md Outdated
permissions and limitations under the License.
-->
### Introduction
ECS relies on the networking capability provided by docker to set up the
Contributor

nit: Docker is a proper noun, except when referring to the docker CLI or package (literal name).

proposals/eni.md Outdated
using the IP address allocated to them by docker

Providing EC2-esque network capabilities for containers, where they get their
own Network Interface that's routable within the VPC can alleviate many of
Contributor

grammar nit: There should be a comma after "routable within the VPC".

proposals/eni.md Outdated
|default(global) namespace | |
| | |
| +-------+-------+ +-------------+ | +-----------+
| | ve_br_en1 | | | | |net="bridge|
Contributor

This ASCII-art is top-notch, but you're missing a closing " after "bridge".

Contributor Author

We can't all be @petderek :)

proposals/eni.md Outdated
Tasks (containers in tasks) launched into these namespaces will be addressable
by the primary IP Address of the respective secondary network interfaces. For
example, all containers of the task using ENI `ENI-1` will be addressable within
the VPC using the IP Address `172.31.13.14/20`.
Contributor

Two questions:

  • I assume that /20 is only indicating the size of the subnet, but that the actual IP address is 172.31.13.14/32 in this example. Is that correct? Might be worth calling out the subnet separately to avoid confusion that the ENI might be assigned a whole /20 itself.
  • Do we need to do anything special to enable IPv6 here?

Great question. Some clarification about how we want addressing to work here would help: can each container have a specific /32 in that block? Can you control the size of the block allocated? Can I specify the IPs for each container in the task definition?

Contributor Author (aaithal, Feb 10, 2017)

The /20 here indicates the size of the subnet in which the ENI has been created. The IP address is whatever the VPC allocated when the ENI was created. On the host, it's still scoped to the /20 address range so that traffic from this ENI can reach other network interfaces in the VPC using the gateway.

> Do we need to do anything special to enable IPv6 here?

If the ENI has been configured with an IPv6 address, we probably should set that address on the interface as well. I haven't played with that yet, but it would be along the same lines as setting the IPv4 address.

@bchav Each container in a task doesn't get its own IP Address. All containers in the task share the network interface and hence are addressable by the IP Address of the ENI.

Having said that, I can see how the /20 there is being talked about without much context. I'll modify that.
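To make the address-versus-subnet distinction concrete, here is a small sketch using Python's standard `ipaddress` module. It only uses the example address quoted above; the peer address `172.31.2.200` is made up for illustration, and none of this is agent code.

```python
import ipaddress

# The ENI's address as it would be configured on the host: the single
# address allocated by the VPC, plus the prefix length of the subnet
# the ENI was created in.
eni = ipaddress.ip_interface("172.31.13.14/20")

print(eni.ip)                     # 172.31.13.14 (the one address the ENI owns)
print(eni.network)                # 172.31.0.0/20 (the enclosing subnet)
print(eni.network.num_addresses)  # 4096 (size of the subnet, not of the ENI)

# A hypothetical peer in the same subnet is on-link for this interface.
peer = ipaddress.ip_address("172.31.2.200")
print(peer in eni.network)        # True
```

The `/20` therefore only tells the kernel which destinations are on-link; the ENI itself still holds a single address.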

proposals/eni.md Outdated
1. Plugin to assign ENI to a network namespace:
1. Get MAC Address for the ENI from EC2 Instance Metadata Service
1. Get ENI device name on default namespace
1. Get n/w gateway mask
Contributor

nit: Can we spell out "network"?
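For readers following the lookup steps quoted above, a rough sketch of where that data could come from. It assumes the per-interface paths under `network/interfaces/macs/` in the EC2 Instance Metadata Service; the function names, the MAC address, and the device table are hypothetical, and no request is actually sent.

```python
import ipaddress

IMDS_BASE = "http://169.254.169.254/latest/meta-data"


def imds_urls(mac):
    """Build the IMDS URLs describing the interface with this MAC address."""
    prefix = f"{IMDS_BASE}/network/interfaces/macs/{mac}"
    return {
        "interface_id": f"{prefix}/interface-id",        # which ENI this MAC belongs to
        "local_ipv4s": f"{prefix}/local-ipv4s",          # addresses assigned to the ENI
        "subnet_cidr": f"{prefix}/subnet-ipv4-cidr-block",
    }


def device_for_mac(mac, devices):
    """Find the device name for a MAC.

    `devices` maps device name -> MAC; on Linux it could be filled from
    /sys/class/net/<device>/address (assumption, not shown here).
    """
    for name, address in devices.items():
        if address.lower() == mac.lower():
            return name
    raise LookupError(f"no device with MAC {mac}")


def prefix_length(subnet_cidr):
    """Derive the network mask length from the subnet CIDR block."""
    return ipaddress.ip_network(subnet_cidr).prefixlen


# Hypothetical values for illustration only.
mac = "0a:1b:2c:3d:4e:5f"
devices = {"eth0": "0a:00:00:00:00:01", "eth1": "0A:1B:2C:3D:4E:5F"}
print(imds_urls(mac)["local_ipv4s"])
print(device_for_mac(mac, devices))
print(prefix_length("172.31.0.0/20"))
```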

proposals/eni.md Outdated
1. Get n/w gateway mask
1. Get Primary IP Address of the ENI
1. Move the ENI to container's namespace
1. Assign the primary IP Address to interface
Contributor

I'm not sure I understand why we need to assign the primary IP address to the interface; is it not already assigned given that we're able to retrieve it from IMDS?

Contributor Author

When the ENI is moved to the pause container's network namespace, it needs to be brought up, and it doesn't come up with any IP address assigned to it since there's no dhclient running. Hence, we assign the IP address to it. I'm not sure if we can avoid this by just bringing it up and running dhclient in the pause container's namespace.

Contributor

Should we run dhclient in the foreground of the pause container then? Or through an init system like tini or dumb-init?

Contributor Author

dhclient by nature runs in the background. I haven't seen any implementation where it's monitored by an init system. Unless @nmeyerhans disagrees, I don't think we need to monitor it via an init script. The current plan is for the plugin executable to invoke it in the container's namespace and exit (when dhclient daemonizes itself and returns).
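The sequence discussed in this thread (move the ENI, assign the address, bring it up, add the route, start `dhclient`) can be sketched as a dry run that only builds the commands. This is an illustration, not the plugin: it assumes `ip` from iproute2 and `nsenter` from util-linux, identifies the namespace by the pause container's PID, and uses a made-up PID, device name, and gateway address.

```python
def eni_setup_commands(pid, device, address, prefix_len, gateway):
    """Return the setup commands, in order, as argument lists.

    Nothing is executed; a caller could pass each list to subprocess.run().
    """
    # Prefix that runs a command inside the network namespace of `pid`.
    in_ns = ["nsenter", "--target", str(pid), "--net"]
    return [
        # Move the device out of the default namespace.
        ["ip", "link", "set", "dev", device, "netns", str(pid)],
        # The device arrives down and without an address, so assign one.
        in_ns + ["ip", "addr", "add", f"{address}/{prefix_len}", "dev", device],
        in_ns + ["ip", "link", "set", "dev", device, "up"],
        # Default route via the subnet gateway.
        in_ns + ["ip", "route", "add", "default", "via", gateway, "dev", device],
        # dhclient daemonizes itself and keeps renewing the lease.
        in_ns + ["dhclient", device],
    ]


# Hypothetical inputs: PID 4242, device eth1, gateway 172.31.0.1.
for cmd in eni_setup_commands(4242, "eth1", "172.31.13.14", 20, "172.31.0.1"):
    print(" ".join(cmd))
```

Returning argument lists rather than running them keeps the sketch safe to execute anywhere and makes the ordering easy to review.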

1. Assign the primary IP Address to interface
1. Setup route to internet via the gateway
1. Delete entries from the routing table if needed for the ENI device
1. Start `dhclient` to renew leases
Contributor

We need to renew leases of the ENI's IP address? Does dhclient need to run inside the pause container, inside the agent container, or elsewhere?

Contributor Author

Yes, it'd be running in the pause container's network namespace. It'd be started by the plugin as stated here.

proposals/eni.md Outdated
1. Start `dhclient` to renew leases

1. Plugin to establish route to ECS Agent for Credentials
1. ECS Agent determines an available local IP Address from `169.254.76.0/24`
Contributor

How do we plan to determine whether an IP address in that range is available?

Contributor Author

The IPAM plugin would use a datastore to do this. There are precedents for doing this sort of thing in both CNI and Docker codebases.

Contributor

Ah, so are we just planning to take the whole /24 for use as IP addresses on the ecs-eni bridge?

Contributor Author

Yeah. It might not be as narrow as /24 though. We might decide to go with /22 or /23 at least. The IP range has not yet been decided. I'll modify this to be a bit more generic to reflect this.
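As a rough illustration of the datastore-backed approach mentioned in this thread, here is a minimal allocator sketch. The class name and the in-memory dict standing in for the datastore are assumptions; a real IPAM plugin would persist its state, and the final range had not been decided at this point.

```python
import ipaddress


class LocalIPAllocator:
    """Hand out free host addresses from a link-local block."""

    def __init__(self, cidr, store=None):
        self.network = ipaddress.ip_network(cidr)
        # Maps allocated IP (str) -> owner id. A plain dict stands in for
        # whatever persistent datastore the plugin would actually use.
        self.store = store if store is not None else {}

    def allocate(self, owner):
        """Return the lowest free address and record who owns it."""
        for host in self.network.hosts():
            ip = str(host)
            if ip not in self.store:
                self.store[ip] = owner
                return ip
        raise RuntimeError(f"no free address left in {self.network}")

    def release(self, ip):
        """Mark an address as available again."""
        self.store.pop(ip, None)


allocator = LocalIPAllocator("169.254.76.0/24")
first = allocator.allocate("task-a")
second = allocator.allocate("task-b")
print(first, second)  # 169.254.76.1 169.254.76.2
```

Availability is then simply "not present in the store", which is why the store has to survive plugin invocations.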

1. Delete entries from the routing table if needed for the ENI device
1. Start `dhclient` to renew leases

1. Plugin to establish route to ECS Agent for Credentials
Contributor

How does the ECS agent invoke the plugin?

Contributor Author

By using the methods exposed by the libcontainernetworking package. Probably somewhere in task_manager.go (I did not want to post more granular/implementation details in the proposal itself, which is why it's missing here).

Contributor

Can you link to that package? I suppose I wasn't clear in my question, what I really want to know is:

  • Does the plugin binary exist inside the ECS agent container and we just exec it?
  • Is the plugin binary inside a separate container that we have to start?
  • Do we have specific plans yet on how we're going to distribute/install the plugin?

Contributor Author

The plugin binaries would be distributed via an RPM package. They would be mounted into the ECS Agent container via the host volume mechanism.
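For context on what "just exec it" would involve, here is a sketch of how a CNI plugin binary is invoked per the CNI spec: parameters go in `CNI_*` environment variables and the network configuration is passed as JSON on stdin. The plugin directory, plugin name (`ecs-eni`), network name, and container details below are hypothetical, and nothing is executed.

```python
import json
import posixpath


def build_cni_invocation(plugin_dir, plugin_type, command,
                         container_id, netns_path, ifname):
    """Return (binary path, environment, stdin JSON) for one plugin call."""
    config = {
        "cniVersion": "0.3.1",
        "name": "ecs-task-network",  # hypothetical network name
        "type": plugin_type,         # also the binary's file name
    }
    env = {
        "CNI_COMMAND": command,      # ADD or DEL
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,
        "CNI_IFNAME": ifname,
        "CNI_PATH": plugin_dir,
    }
    binary = posixpath.join(plugin_dir, plugin_type)
    return binary, env, json.dumps(config)


binary, env, stdin_json = build_cni_invocation(
    "/opt/example/cni/bin", "ecs-eni", "ADD",
    "pause-container-id", "/proc/4242/ns/net", "eth1",
)
print(binary)
print(stdin_json)
# A caller could then run something like:
#   subprocess.run([binary], env=env, input=stdin_json, text=True)
```

With the binaries mounted from the host, `plugin_dir` would be wherever that volume appears inside the agent container.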

enable the same.

### Overview of the solution
When an ENI is attached to the container instance for the purpose of being used

Just noting that there are limits on the maximum number of ENIs per host, which bound the maximum number of Tasks that can run on a single host. The numbers are relatively small, which might lead to low utilization of a host.

How are you going to address the limitations?

Contributor Author

@brb Thank you for bringing that up. We are aware of these limits and are working towards addressing these internally.
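The bound raised in this thread is simple arithmetic. A tiny sketch, assuming one ENI per task and that the instance's primary interface stays with the host; the limits passed in below are illustrative inputs, not quoted instance-type limits.

```python
def max_eni_tasks(eni_limit, reserved_for_host=1):
    """Upper bound on tasks with their own ENI on one container instance."""
    return max(eni_limit - reserved_for_host, 0)


# Illustrative inputs only.
for limit in (2, 3, 8):
    print(limit, "ENIs ->", max_eni_tasks(limit), "tasks")
```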

example, all containers of the task using ENI `ENI-1` will be addressable within
the VPC using the IP Address `172.31.13.14`.

The `ecs-eni-br` bridge is used by the containers to communicate with the

Have you considered any alternative approach? I believe that Task containers could communicate with the ECS Agent via a Unix domain socket. Such an approach would be far less complex (no bridge, no IP allocations).

Contributor Author

We considered it initially when building the Task IAM Roles feature, but decided against it because it has implications for the AWS SDK as well. An IP-address-based mechanism lets us and the SDK be platform agnostic (irrespective of Linux/Windows), whereas a Unix domain socket would mean significant cruft in the AWS SDK to deal with this.
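To show why an IP-based endpoint is easy for an SDK to consume on any platform, here is a sketch of how a client might resolve the Task IAM Roles credentials URL. It assumes the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable and the `169.254.170.2` endpoint used by that feature; the function name and the relative URI value are made up.

```python
import os

CREDENTIALS_HOST = "169.254.170.2"


def credentials_url(environ=None):
    """Return the credentials URL for this container, or None if unset."""
    environ = os.environ if environ is None else environ
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    # Plain HTTP to a fixed address: no socket path or volume mount needed.
    return f"http://{CREDENTIALS_HOST}{relative_uri}"


# Hypothetical environment for illustration.
example_env = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example-id"}
print(credentials_url(example_env))
```

The container only needs a route to that address, which is what the `ecs-eni-br` bridge in the proposal provides.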

that they use the `pause` container's network namespace.

#### CNI Plugin Sequence
1. Plugin to assign ENI to a network namespace:

AFAIK, the CNI spec does not define such fine-grained entry points (there is only "Add container to a network"), so the two proposed plugins might need to be merged into a single one.

Contributor Author

We are planning to chain the execution from the ECS Agent, which would be the party responsible for invoking these plugins. This could change depending on the implementation details, though.
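One way to picture chaining from the agent side: call each plugin's ADD in order, and if one fails, call DEL on the ones that already succeeded. This is a sketch of the idea only; the `invoke` callable and the plugin names are hypothetical, and the real agent may handle failures differently.

```python
def run_chain(plugins, invoke):
    """Invoke ADD on each plugin in order, rolling back on failure.

    `invoke(plugin, command)` is a caller-supplied function that runs one
    plugin and raises an exception if the plugin reports an error.
    """
    completed = []
    try:
        for plugin in plugins:
            invoke(plugin, "ADD")
            completed.append(plugin)
    except Exception:
        # Undo in reverse order so later setup is torn down first.
        for plugin in reversed(completed):
            invoke(plugin, "DEL")
        raise


def log_only(plugin, command):
    """Stand-in for a real plugin call; just prints what would run."""
    print(command, plugin)


run_chain(["ecs-eni", "ecs-bridge"], log_only)
```

Keeping the ordering in the caller means each plugin still only has to implement the standard ADD and DEL commands.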

containers of the task with `--net=container:pause-container-id`, thus ensuring
that they use the `pause` container's network namespace.

#### CNI Plugin Sequence

What is the motivation for providing the functionality below in the form of a CNI plugin? To let ECS be powered by, or run on, any CNI-compatible orchestrator (e.g. Kubernetes)?

Contributor Author

The main motivation is to avoid the following challenges of Docker's remote libnetwork plugins:

  1. Life-cycle maintenance of the plugin process itself (No need for an init script to monitor the plugin)
  2. Maintaining versions of the plugin (Maintaining compatibility between agent version and plugins)
  3. Vending updates to plugin
  4. Simplicity

aaithal changed the base branch from dev to enis on July 25, 2017 21:27
aaithal merged commit bb9727e into aws:enis on Aug 25, 2017
6 participants