This document describes the network/cni
isolator, a network isolator
for the MesosContainerizer that implements
the Container Network Interface
(CNI) specification. The
network/cni
isolator allows containers launched using the
MesosContainerizer
to be attached to several different types of IP
networks. The network technologies on which containers can possibly
be launched range from traditional layer 3/layer 2 networks such as
VLAN, ipvlan, macvlan, to the new class of networks designed for
container orchestration such as
Calico,
Weave and
Flannel. The
MesosContainerizer
has the network/cni
isolator enabled by
default.
Having a separate network namespace for each container is attractive for orchestration engines such as Mesos, since it provides containers with network isolation and allows users to operate on containers as if they were operating on an end-host. Without network isolation users have to deal with managing network resources such as TCP/UDP ports on an end host, complicating the design of their application.
The challenge is in implementing the ability in the orchestration
engine to communicate with the underlying network in order to
configure IP connectivity to the container. This problem arises due
to the diversity in terms of the choices of IPAM (IP address
management system) and networking technologies available for enabling
IP connectivity. To solve this problem we need to adopt a driver-based
network orchestration model, where the MesosContainerizer
can
offload the business logic of configuring IP connectivity for a
container to network-specific drivers.
The Container Network Interface (CNI) is a specification proposed by CoreOS that provides such a driver-based model. The specification defines a JSON schema for the inputs and outputs expected of a CNI plugin (network driver). It also provides a clear separation of concerns between the container runtime and the CNI plugin. As per the specification, the container runtime is expected to provide the network namespace for the container, a unique identifier for the container (container ID), and a JSON-formatted input to the plugin that defines the configuration parameters for a given network. The responsibility of the plugin is to create a veth pair, attach one end of the pair to the network namespace of the container, and attach the other end to a network understood by the plugin. The CNI specification also allows multiple networks to exist simultaneously, with each network represented by a canonical name and associated with a unique CNI configuration. There are already CNI plugins for a variety of networks such as bridge, ipvlan, macvlan, Calico, Weave and Flannel.
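The division of labour between runtime and plugin can be sketched as follows. This is a minimal illustration, not how the MesosContainerizer is implemented: the environment variable names (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME, CNI_PATH) come from the CNI specification, while the helper names and the choice of eth0 as the interface name are assumptions made for this example.

```python
import json
import os
import subprocess


def build_cni_request(command, container_id, netns_path, plugin_dirs, network_conf):
    """Return (env, stdin_bytes) for one CNI plugin invocation.

    The environment variable names come from the CNI specification; the
    function itself is a hypothetical helper, not Mesos code.
    """
    env = {
        "CNI_COMMAND": command,           # e.g. "ADD" or "DEL"
        "CNI_CONTAINERID": container_id,  # unique ID chosen by the runtime
        "CNI_NETNS": netns_path,          # path to the container's network namespace
        "CNI_IFNAME": "eth0",             # interface name in the container (assumed)
        "CNI_PATH": os.pathsep.join(plugin_dirs),
    }
    # The network configuration is handed to the plugin on standard input.
    return env, json.dumps(network_conf).encode("utf-8")


def invoke_plugin(plugin_binary, env, stdin_bytes):
    """Run the plugin binary and parse the JSON result it prints on stdout."""
    result = subprocess.run(
        [plugin_binary],
        input=stdin_bytes,
        env={**os.environ, **env},
        capture_output=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Only build_cni_request is side-effect free; invoke_plugin needs a real plugin binary and sufficient privileges to modify network namespaces.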
Thus, introducing support for CNI in Mesos through the network/cni
isolator provides Mesos with tremendous flexibility to orchestrate
containers on a wide variety of network technologies.
The network/cni
isolator is enabled by default. However, to use the
isolator there are certain actions required by the operator and the
frameworks. In this section we specify the steps required by the
operator to configure CNI networks on Mesos and the steps required by
frameworks to attach containers to a CNI network.
In order to configure the network/cni
isolator the operator
specifies two flags at Agent startup as follows:
sudo mesos-slave --master=<master IP> --ip=<Agent IP> \
  --work_dir=/var/lib/mesos \
  --network_cni_config_dir=<location of CNI configs> \
  --network_cni_plugins_dir=<search path for CNI plugins>
Note that the network/cni
isolator learns all the available networks
by looking at the CNI configuration in the --network_cni_config_dir
at startup. This implies that if a new CNI network needs to be added
after Agent startup, the Agent needs to be restarted. The
network/cni
isolator has been designed with recovery
capabilities
and hence restarting the Agent (and therefore the network/cni
isolator) will not affect container orchestration.
Optionally, the operator could specify the
--network_cni_root_dir_persist
flag. This flag allows the
network/cni
isolator to persist network-related information
across reboots and to carry out network
cleanup after a reboot. This is useful for CNI networks that depend
on the isolator to clean up their network state.
The network/cni
isolator learns about all the CNI networks by
reading the CNI configuration specified in --network_cni_config_dir.
Hence, if the operator wants to add a CNI network, the corresponding
configuration needs to be added to --network_cni_config_dir.
While the network/cni
isolator learns the CNI networks by reading
the CNI configuration files in --network_cni_config_dir
, it does not
keep an in-memory copy of the CNI configurations. The network/cni
isolator only stores a mapping of the CNI network names to the
corresponding CNI configuration files. Whenever the network/cni
isolator needs to attach a container to a CNI network it reads the
corresponding configuration from the disk and invokes the appropriate
plugin with the specified JSON configuration. Though the network/cni
isolator does not keep an in-memory copy of the JSON configuration, it
checkpoints the CNI configuration used to launch a container.
Checkpointing the CNI configuration protects the resources, associated
with the container, by freeing them correctly when the container is
destroyed, even if the CNI configuration is deleted.
The fact that the network/cni
isolator always reads the CNI
configurations from the disk allows the operator to dynamically add,
modify and delete CNI configurations without the need to restart the
agent. Whenever the operator adds a new CNI configuration, or modifies
an existing CNI configuration, the agent will pick up this new CNI
configuration when the next container is launched on that specific CNI
network. Similarly, when the operator deletes a CNI configuration, the
network/cni
isolator will "unlearn" the corresponding CNI network (to which it has
held a reference since it started) the next time a
framework tries to launch a container on the deleted CNI network.
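The behavior described above (keep only a name-to-file mapping, re-read the file at attach time, forget networks whose file is gone) can be sketched as follows. This is an illustrative model, not the isolator's code: the function names are hypothetical, and rescanning the directory on an unknown name is an assumption about how newly added configurations might be picked up.

```python
import json
import os


def scan_networks(config_dir):
    """Map each CNI network name to the path of its configuration file."""
    networks = {}
    for entry in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, entry)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                conf = json.load(f)
        except (OSError, ValueError):
            continue  # skip unreadable or non-JSON files
        if isinstance(conf, dict) and "name" in conf:
            networks[conf["name"]] = path
    return networks


def load_network(networks, config_dir, name):
    """Read the configuration for `name` from disk at attach time.

    Returns None, and forgets the network, if its file has been deleted.
    """
    if name not in networks:
        # Assumption: newly added configurations are found by rescanning.
        networks.update(scan_networks(config_dir))
    path = networks.get(name)
    if path is None:
        return None
    try:
        with open(path) as f:
            return json.load(f)  # always the current on-disk contents
    except FileNotFoundError:
        del networks[name]  # "unlearn" the deleted network
        return None
```

Because no parsed configuration is cached, an edit to a file takes effect the next time load_network is called for that network.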
Frameworks can specify the CNI network to which they want their
containers to be attached by setting the name
field in the
NetworkInfo
protobuf. The name
field was introduced in the
NetworkInfo
protobuf as part of
MESOS-4758. Also,
by specifying multiple instances of the NetworkInfo
protobuf with a
different name
in each, the MesosContainerizer
will attach the container to all the different CNI networks specified.
The default behavior for containers is to join the host network,
i.e., if the framework does not specify a name
in the
NetworkInfo
protobuf, the network/cni
isolator will be a no-op for
that container and will not associate a new network namespace with the
container. This would effectively make the container use the host
network namespace, attaching
it to the host network.
**NOTE**: While specifying multiple `NetworkInfo` protobufs allows a
container to be attached to different CNI networks, if one of the
`NetworkInfo` protobufs has no `name` field the `network/cni`
isolator simply "skips" that protobuf, attaching the container to all
the specified CNI networks but not to the host network. To attach a
container to the host network as well as other CNI networks you
will need to attach the container to a CNI network (such as
bridge/macvlan) that, in turn, is attached to the host network.
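The selection rule described above can be summarised in a few lines. This sketch assumes each NetworkInfo is available as a dictionary (its JSON form); the function name is hypothetical.

```python
def networks_to_join(network_infos):
    """Return the CNI network names a container should be attached to.

    An empty result means the isolator is a no-op and the container stays
    in the host network namespace.
    """
    names = []
    for info in network_infos:
        name = info.get("name")
        if name:  # NetworkInfo entries without a name are skipped
            names.append(name)
    return names
```

For example, a container launched with one named and one unnamed NetworkInfo is attached only to the named CNI network, not to the host network.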
When invoking CNI plugins (e.g., with command ADD), the isolator will
pass on some Mesos meta-data to the plugins by specifying the args
field in the network configuration
JSON
according to the CNI spec. Currently, the isolator only passes on
NetworkInfo
of the corresponding network to the plugin. This is
simply the JSON representation of the NetworkInfo
protobuf. For
instance:
{
  "name" : "mynet",
  "type" : "bridge",
  "args" : {
    "org.apache.mesos" : {
      "network_info" : {
        "name" : "mynet",
        "labels" : {
          "labels" : [
            { "key" : "app", "value" : "myapp" },
            { "key" : "env", "value" : "prod" }
          ]
        },
        "port_mappings" : [
          { "host_port" : 8080, "container_port" : 80 },
          { "host_port" : 8081, "container_port" : 443 }
        ]
      }
    }
  }
}
It is important to note that labels
and port_mappings
within the
NetworkInfo
are set by the frameworks launching the container, and the
isolator passes this information on to the CNI plugins. As per the
spec, it is the prerogative of the CNI plugins to use this meta-data
information as they see fit while attaching/detaching containers to a CNI
network. E.g., CNI plugins could use labels
to enforce domain
specific policies, or port_mappings
to implement NAT rules.
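As an illustration of the plugin side, the following sketch extracts the labels and port mappings from a network configuration shaped like the example above. The function name is hypothetical, and a real plugin would obtain the text from its standard input.

```python
import json


def read_mesos_network_info(stdin_text):
    """Extract labels and port mappings from the network configuration JSON."""
    conf = json.loads(stdin_text)
    info = conf.get("args", {}).get("org.apache.mesos", {}).get("network_info", {})
    # Labels are nested as {"labels": {"labels": [{"key": ..., "value": ...}]}}.
    labels = {
        label["key"]: label.get("value")
        for label in info.get("labels", {}).get("labels", [])
    }
    ports = [
        (mapping["host_port"], mapping["container_port"])
        for mapping in info.get("port_mappings", [])
    ]
    return labels, ports
```

A configuration without an args field yields an empty label dictionary and an empty port list, so the plugin can treat the Mesos meta-data as optional.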
The network/cni
isolator allocates a network namespace to a
container when it needs to attach the container to a CNI network. The
network namespace is checkpointed on the host file system and can be
useful to debug network connectivity to the network namespace. For a
given container the network/cni
isolator checkpoints its network
namespace at:
/var/run/mesos/isolators/network/cni/<container ID>/ns
The network namespace can be used with the ip
command from the
iproute2
package by creating a symbolic link to the network namespace. Assuming
the container ID is 5baff64c-d028-47ba-864e-a5ee679fc069
you can
create the symlink as follows:
ln -s /var/run/mesos/isolators/network/cni/5baff64c-d028-47ba-864e-a5ee679fc069/ns /var/run/netns/5baff64c
Now we can use the network namespace identifier 5baff64c
to run
commands in the new network namespace using the
iproute2 package.
E.g. you can view all the links in the container network namespace by
running the command:
ip netns exec 5baff64c ip link
Similarly you can view the container's route table by running:
ip netns exec 5baff64c ip route show
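The debugging steps above can be collected into a small helper. This is only a convenience sketch: the function name is hypothetical, and using the first segment of the container ID as the namespace name is a convention borrowed from the example, not a requirement.

```python
import os

CNI_ROOT = "/var/run/mesos/isolators/network/cni"  # checkpoint location quoted above
NETNS_DIR = "/var/run/netns"                       # where `ip netns` looks for names


def netns_debug_plan(container_id, short_name=None):
    """Return (namespace_path, link_path, commands) for inspecting a container."""
    short_name = short_name or container_id.split("-")[0]
    ns_path = os.path.join(CNI_ROOT, container_id, "ns")
    link_path = os.path.join(NETNS_DIR, short_name)
    commands = [
        ["ln", "-s", ns_path, link_path],
        ["ip", "netns", "exec", short_name, "ip", "link"],
        ["ip", "netns", "exec", short_name, "ip", "route", "show"],
    ]
    return ns_path, link_path, commands
```

Each returned command is an argument list that can be run as root, for example with subprocess.run(cmd, check=True).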
NOTE: Once
MESOS-5278 is
completed, executing commands within the container network namespace
will be simpler and debugging Mesos container networking will no
longer depend on the
iproute2
package.
This section presents examples for launching containers on different
CNI networks. For each of the examples the assumption is that the CNI
configurations are present at /var/lib/mesos/cni/config, and the
plugins are present at /var/lib/mesos/cni/plugins. The Agents
therefore need to be started with the following command:
sudo mesos-slave --master=<master IP> --ip=<Agent IP> \
  --work_dir=/var/lib/mesos \
  --network_cni_config_dir=/var/lib/mesos/cni/config \
  --network_cni_plugins_dir=/var/lib/mesos/cni/plugins \
  --isolation=filesystem/linux,docker/runtime \
  --image_providers=docker
Apart from the CNI configuration parameters, we are also starting the
Agent with the ability to launch Docker images with the
MesosContainerizer. We enable this ability in the
MesosContainerizer
by enabling the filesystem/linux
and
docker/runtime
isolators and setting the image provider to
docker.
To present an example of a framework launching containers on a
specific CNI network, the mesos-execute
CLI framework has been
modified to take a --networks
flag which will allow this example
framework to launch containers on the specified network. You can find
the mesos-execute
framework in your Mesos installation directory at
<mesos installation>/bin/mesos-execute.
The
bridge
plugin attaches containers to a Linux bridge. Linux bridges can be
configured to attach to VLANs and VXLANs, allowing containers to be
plugged into existing layer 2 networks. We present an example below,
where the CNI configuration instructs the MesosContainerizer
to
invoke a bridge plugin to connect a container to a Linux bridge. The
configuration also instructs the bridge plugin to assign an IP address
to the container by invoking a
host-local
IPAM.
First, build the CNI plugin according to the instructions in the CNI repository, then copy the bridge binary to the plugins directory on each agent.
Next, create the configuration file and copy it to the CNI configuration directory on each agent.
{
  "name": "cni-test",
  "type": "bridge",
  "bridge": "mesos-cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
The CNI configuration tells the bridge plugin to attach the
container to a bridge called mesos-cni0. If the bridge does not
exist, the bridge plugin will create one.
It is important to note the routes
section in the ipam
dictionary.
For Mesos, the executors
launched as containers need to register
with the Agent in order for a task to be successfully launched.
Hence, it is imperative that the Agent IP is reachable from the
container IP and vice versa. In this specific instance we specified a
default route for the container, allowing containers to reach any
network that will be routeable by the gateway, which for this CNI
configuration is the bridge itself.
Another interesting attribute in the CNI configuration is the ipMasq
option. Setting this to true installs an iptables
rule in the
host network namespace that SNATs all traffic originating from
the container and egressing the Agent. This allows containers to talk
to the outside world even when they are in an address space that is
not routable from outside the Agent.
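The two points above (the container must have a route to the Agent, and a private subnet needs ipMasq for egress) can be checked mechanically before a configuration is deployed. The following is a heuristic sketch using Python's standard ipaddress module; the function name and the wording of the messages are assumptions, and it only checks the container-to-Agent direction.

```python
import ipaddress


def check_bridge_conf(conf, agent_ip):
    """Return a list of potential problems in a bridge CNI configuration."""
    problems = []
    ipam = conf.get("ipam", {})
    subnet = ipaddress.ip_network(ipam["subnet"])
    routes = [ipaddress.ip_network(r["dst"]) for r in ipam.get("routes", [])]
    agent = ipaddress.ip_address(agent_ip)
    # The container reaches the Agent directly (same subnet) or through a route.
    if agent not in subnet and not any(agent in route for route in routes):
        problems.append("agent IP %s is not covered by the subnet or any route" % agent_ip)
    # Heuristic: a private subnet without SNAT cannot reach outside networks.
    if subnet.is_private and not conf.get("ipMasq"):
        problems.append("private subnet without ipMasq: egress traffic is not SNATed")
    return problems
```

With the cni-test configuration shown earlier, the default route 0.0.0.0/0 covers any Agent address, so the check reports no problems.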
Below we give an example of launching an Ubuntu
container and
attaching it to the mesos-cni0
bridge. You can launch the Ubuntu
container using the mesos-execute
framework as follows:
sudo mesos-execute --command=/bin/bash \
  --docker_image=ubuntu:latest --master=<master IP>:5050 --name=ubuntu \
  --networks=cni-test --no-shell
The above command pulls the Ubuntu
image from Docker Hub,
launches it using the MesosContainerizer,
and attaches it to the
mesos-cni0
bridge.
You can verify the network settings of the Ubuntu
container by
creating a symlink to the network namespace and running the ip
command as described in the section "Accessing container network
namespace".
Assuming we created a reference to the network namespace at
/var/run/netns/5baff64c, the IP address and route table
in the container network namespace would look as follows:
$ sudo ip netns exec 5baff64c ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 8a:2c:f9:41:0a:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::882c:f9ff:fe41:a54/64 scope link
       valid_lft forever preferred_lft forever
$ sudo ip netns exec 5baff64c ip route show
default via 192.168.0.1 dev eth0
192.168.0.0/16 dev eth0 proto kernel scope link src 192.168.0.2
For private, isolated networks, such as a bridge network, where the IP address of a container is not routable from outside the host, it becomes imperative to provide containers with DNAT capabilities so that services running in the container can be exposed outside the host on which the container is running.
Unfortunately, there is no CNI plugin available in the
containernetworking/cni
repository that provides port-mapping functionality.
Hence, we have developed a port-mapper CNI plugin, called
mesos-cni-port-mapper, that resides within the Mesos code base. The
mesos-cni-port-mapper
is designed to work with any other CNI plugin
that requires DNAT capabilities, the most obvious being the
bridge
CNI plugin.
We explain the operational semantics of the mesos-cni-port-mapper
plugin by taking an example CNI configuration that allows the
mesos-cni-port-mapper
to provide DNAT functionality to the bridge
plugin.
{
  "name" : "port-mapper-test",
  "type" : "mesos-cni-port-mapper",
  "excludeDevices" : ["mesos-cni0"],
  "chain": "MESOS-TEST-PORT-MAPPER",
  "delegate": {
    "type": "bridge",
    "bridge": "mesos-cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.0.0/16",
      "routes": [
        { "dst": "0.0.0.0/0" }
      ]
    }
  }
}
For the CNI configuration above, apart from the parameters that the
mesos-cni-port-mapper
plugin accepts, the important point to note in
the CNI configuration of the plugin is the "delegate" field. The
"delegate" field allows the mesos-cni-port-mapper
to wrap the CNI
configuration of any other CNI plugin, and allows the plugin to
provide DNAT capabilities to any CNI network. In this specific case
the mesos-cni-port-mapper
is providing DNAT capabilities to
containers running on the bridge network mesos-cni0. The parameters
that the mesos-cni-port-mapper
accepts are listed below:
- name : Name of the CNI network.
- type : Name of the port-mapper CNI plugin.
- chain : The chain in which the iptables DNAT rule will be added in the NAT table. This allows the operator to group DNAT rules for a given CNI network under its own chain, allowing for better management of the iptables rules.
- excludeDevices : A list of ingress devices on which the DNAT rule should not be applied.
- delegate : A JSON dict that holds the CNI JSON configuration of the CNI plugin that the port-mapper plugin is expected to invoke.
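To illustrate how the parameters fit together, the sketch below wraps an existing CNI configuration in a port-mapper configuration. The function name is hypothetical. Dropping the name from the delegate simply mirrors the example configuration above, where the network name appears only at the top level.

```python
import copy


def wrap_with_port_mapper(delegate_conf, name, chain, exclude_devices):
    """Wrap a CNI configuration in a mesos-cni-port-mapper configuration."""
    delegate = copy.deepcopy(delegate_conf)  # leave the caller's dict untouched
    delegate.pop("name", None)  # the name is carried at the top level only
    return {
        "name": name,
        "type": "mesos-cni-port-mapper",
        "excludeDevices": list(exclude_devices),
        "chain": chain,
        "delegate": delegate,
    }
```

Applied to the cni-test bridge configuration with the name port-mapper-test, the chain MESOS-TEST-PORT-MAPPER and the excluded device mesos-cni0, this produces the same structure as the example above.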
The mesos-cni-port-mapper
relies heavily on iptables
to provide
the DNAT capabilities to a CNI network. In order for the port-mapper
plugin to function properly we have certain minimum
requirements for iptables as listed below:
- iptables 1.4.20 or higher: This is because we need to use the -w option of iptables (wait for the xtables lock) in order to allow atomic writes to iptables.
- The xt_comment module of iptables: We use the comment module to tag iptables rules belonging to a container. These tags are used as a key while deleting iptables rules when the specific container is deleted.
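The version requirement can be verified on an agent before enabling the plugin. The sketch below parses a version string of the form printed by `iptables --version` (for example "iptables v1.4.21") and compares it with the minimum stated above; the function names are hypothetical.

```python
import re


def parse_iptables_version(output):
    """Parse a string such as 'iptables v1.4.21' into a tuple of integers."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised iptables version string: %r" % output)
    return tuple(int(part) for part in match.groups())


def meets_minimum(output, minimum=(1, 4, 20)):
    """Return True if the reported version is at least `minimum`."""
    return parse_iptables_version(output) >= minimum
```

The string to check can be obtained with subprocess.run(["iptables", "--version"], capture_output=True, text=True).stdout on the agent.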
Finally, while the CNI configuration of the port-mapper plugin tells
the plugin how and where to install the iptables
rules, and to which
CNI plugin to "delegate" the attachment/detachment of the container, the
port-mapping information itself is learned by looking at the
NetworkInfo
set in the args
field of the CNI configuration passed
by Mesos to the port-mapper plugin. Please refer to the "Passing
network labels and port-mapping information to CNI
plugins" section for more details.
Calico provides a third-party CNI plugin that works out-of-the-box with Mesos CNI.
Calico takes a pure Layer-3 approach to networking, allocating a unique, routable IP address to each Mesos task. Task routes are distributed by a BGP vRouter run on each Agent, which leverages the existing Linux kernel forwarding engine without needing tunnels, NAT, or overlays. Additionally, Calico supports rich and flexible network policy, which it enforces using bookended ACLs on each compute node to provide tenant isolation, security groups, and external reachability constraints.
For information on setting up and using Calico-CNI, see Calico's guide on adding Calico-CNI to Mesos.
Weave provides a CNI implementation that works out-of-the-box with Mesos.
Weave provides hassle-free configuration by assigning an IP address per container and providing a fast DNS on each node. Weave is fast because it automatically chooses the fastest path between hosts. Multicast addressing and routing are fully supported. It has built-in NAT traversal and encryption and continues to work even during a network partition. Finally, multi-cloud deployments are easy to set up and maintain, even when there are multiple hops.
For more information on setting up and using Weave CNI, see Weave's CNI documentation.