
[WIP, READY FOR REVIEW] Integration of policies with services and the Internet access #609

Merged
brecode merged 44 commits into contiv:master from milanlenco:integration
Mar 1, 2018

Conversation

milanlenco (Collaborator) commented Feb 23, 2018

WORK IN PROGRESS: PLEASE DO NOT MERGE YET

TO-BE-DONE:

  • end-to-end testing (none done yet, just via UTs; proof of concept done manually via VPP CLI)
  • documentation (algorithm explanation, diagrams)

This pull request primarily includes a refactor of the policy rendering code, necessary to adapt to the limitations of the VPP/NAT plugin. For policies we always need to evaluate rules against the local IP addresses, not the NATed addresses of services or of the node itself. Inbound ACLs, however, see the NATed addresses, which makes them unusable for this purpose. Previously we were using both directions; now we combine ingress with egress and install all rules into outbound ACLs. Furthermore, to apply access control to inter-node and pod-to-internet traffic, we reflect the ingress policies into a "global" ACL, installed on the node's output interfaces, again on the outbound side.
A detailed algorithm description + diagrams depicting the order of VPP nodes will be part of the documentation.
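As a rough illustration of the direction flip (a hypothetical, simplified helper, not the actual renderer code): a pod's ingress rules, which would naturally map to an inbound ACL, are instead merged with its egress rules into a single outbound table, so every rule is still evaluated against un-NATed addresses.

```go
package main

import "fmt"

// ContivRuleStub is a simplified stand-in for the renderer's rule type
// (hypothetical; the real ContivRule carries full 5-tuple match fields).
type ContivRuleStub struct {
	Action string // "ALLOW" or "DENY"
	Src    string // source network
	Dst    string // destination network
}

// combineOutbound merges a pod's egress rules with its ingress rules
// into one outbound table: the match fields stay unchanged, only the
// side on which the ACL is installed flips, so rules are evaluated
// before any NAT translation can rewrite the addresses.
func combineOutbound(egress, ingress []ContivRuleStub) []ContivRuleStub {
	combined := make([]ContivRuleStub, 0, len(egress)+len(ingress))
	combined = append(combined, egress...)
	combined = append(combined, ingress...)
	return combined
}

func main() {
	ingress := []ContivRuleStub{{"ALLOW", "10.1.1.0/24", "10.1.2.5/32"}}
	egress := []ContivRuleStub{{"DENY", "10.1.2.5/32", "0.0.0.0/0"}}
	fmt.Println(len(combineOutbound(egress, ingress)))
}
```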

Similar restrictions are also present in the VPPTCP stack - each pod has only a single "local" table of rules assigned (evaluated in the ingress direction) and the stack additionally provides a single "global" table, evaluated in the ingress direction for traffic entering the node.
The equivalent limitations of VPPTCP stack and ACL+NAT (just different orientation of tables) have allowed us to unify the cache and the rendering algorithm to a large degree between the two renderers. That's the second contribution of the pull request.

The third contribution is in the service plugin: the plugin now also installs SNAT configuration which allows Internet access from pods. The SNAT is configured on the physical interface which acts as the default GW (host-VPP interconnect is not supported for Internet access). The implementation is DHCP-aware.
The only issue is that we cannot SNAT inter-node traffic, otherwise policies on the destination node would be evaluated against the NATed address of the source node rather than the source pod. The solution is to split inter-node traffic from pod-to-internet traffic. This is possible with VXLANs (inter-node traffic is encapsulated, whereas pod-to-internet traffic is not), or by having an additional physical interface which acts as the default GW. With VXLANs disabled and only one physical interface available, SNAT is therefore disabled (and needs to be performed by an external NAT device). This is a limitation for which we don't have a workaround at the moment.
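The constraint above reduces to a small decision: SNAT for Internet access can only be enabled when inter-node traffic is kept separable from pod-to-internet traffic. A minimal sketch (hypothetical helper, not the actual plugin API):

```go
package main

import "fmt"

// snatPossible captures the constraint described above: inter-node
// traffic must not be SNATed, and it can only be kept separate from
// pod-to-internet traffic when it is VXLAN-encapsulated or leaves
// through a second, dedicated physical interface.
func snatPossible(useVXLAN bool, numPhysIfaces int) bool {
	return useVXLAN || numPhysIfaces >= 2
}

func main() {
	fmt.Println(snatPossible(true, 1))  // VXLAN separates inter-node traffic
	fmt.Println(snatPossible(false, 1)) // L2-only, single NIC: SNAT disabled
}
```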

Both policies and services (+SNAT) have resync fully implemented, i.e. restart scenarios are supported.

Milan Lenco added 30 commits February 13, 2018 09:32
Pods should be able to access kubernetes services (e.g. DNS)
even if they are isolated from the kube-system namespace by the
installed K8s network policies.
However, this is not the case in the opposite direction:
a policy may disallow a kube-system pod to contact a pod from another
namespace.
This commit implements source NATing for all traffic leaving the cluster
network, which in effect opens up the Internet access for all pods.
The SNAT was included into the Service plugin in order to keep the NAT-related
configuration all in one place.

The solution is to add the IP address of the default GW interface into
the pool of VPP/NAT44 addresses and to enable postrouting on that
interface.
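Roughly, and subject to differences between VPP versions (the plugin configures this programmatically through the VPP/NAT plugin, not via CLI; the interface name below is an assumption), the equivalent VPP CLI configuration would be:

```
nat44 add interface address GigabitEthernet0/8/0
set interface nat44 out GigabitEthernet0/8/0 output-feature
```

The first command keeps the NAT44 address pool in sync with the interface's address (hence DHCP-awareness); the second enables SNAT in the post-routing phase on the default-GW interface.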

The traffic going between cluster nodes should not be NATed,
otherwise the ACLs on the destination node would no longer match
against pod IPs, but rather against node IPs, which breaks the semantics.
It is possible to separate external traffic from the internal one
only with the assistance of VXLANs, therefore the SNAT is not supported
and gets disabled in the L2-only mode.
RendererCache combines the capabilities of the VPPTCP and ACL caches
under a unified interface.

The rules are grouped into tables (ContivRuleTable type) and the
configuration is represented as a list of local tables, applied
on the ingress or the egress side of pods, and a single global table,
applied on the interfaces connecting the node with the rest
of the cluster.

The list of local tables is minimalistic in the sense that pods with
the same set of rules will share the same local table. Whether shared
tables are installed in one instance or as separate copies for each
associated pod is up to the renderer (usually determined by
the capabilities of the destination network stack).

All tables match only one side of the traffic - either ingress or
egress, depending on the cache orientation as selected in the Init method.
The cache combines the received ingress and egress Contiv rules
into the single chosen direction in a way that maintains the original
semantic (the global table is introduced to accomplish the task).

The rules are ordered in tables such that if rule *r1* matches a subset
of the traffic matched by *r2*, then r1 precedes r2 in the list.
This is the order in which the rules should be applied by the
rule-matching algorithm in the destination network stack (otherwise the
more specific rules could be shadowed and never matched).
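The ordering invariant can be sketched as follows (a simplified demonstration matching on destination subnets only; since subset containment is a partial order, the real implementation needs an insertion-based ordering rather than a plain comparison sort):

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// destRule is a hypothetical, simplified stand-in for a full ContivRule,
// matching traffic by destination subnet only.
type destRule struct {
	Dst    *net.IPNet
	Action string
}

// subsetOf reports whether a matches a subset of the traffic matched by b.
func subsetOf(a, b *net.IPNet) bool {
	onesA, _ := a.Mask.Size()
	onesB, _ := b.Mask.Size()
	return b.Contains(a.IP) && onesA >= onesB
}

// orderRules places more specific rules before broader ones, so that
// a first-match algorithm cannot shadow them.
func orderRules(rules []destRule) {
	sort.SliceStable(rules, func(i, j int) bool {
		return subsetOf(rules[i].Dst, rules[j].Dst) &&
			!subsetOf(rules[j].Dst, rules[i].Dst)
	})
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	rules := []destRule{
		{mustCIDR("10.0.0.0/8"), "DENY"},
		{mustCIDR("10.1.0.0/16"), "ALLOW"},
	}
	orderRules(rules)
	fmt.Println(rules[0].Dst, rules[0].Action) // prints: 10.1.0.0/16 ALLOW
}
```

Without the reordering, the broader `10.0.0.0/8 DENY` rule would match first and the `10.1.0.0/16 ALLOW` rule would never fire.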

Two types of tables are distinguished:
  1. Local table: should be applied to match against traffic leaving
                  (IngressOrientation) or entering (EgressOrientation)
                  a selected subset of pods.
                  Every pod has at most one local table installed at
                  any given time. For a given local table, the set
                  of rules is immutable. Different content is treated
                  as a new local table (and the original table may
                  get unassigned from some or all originally
                  associated pods).
                  A local table always has at least one rule;
                  otherwise it is simply not tracked or returned
                  by the cache.
  2. Global table: should be applied to match against traffic entering
                   (IngressOrientation) or leaving (EgressOrientation)
                   the node. There is always exactly one global table
                   installed (per node).
                   The global table may contain an empty set of rules
                   (meaning ALLOW-ALL).
During resync we are not able to *easily* reconstruct the full policy
configuration, most notably the IP addresses of pods.
For pods that no longer exist after the resync the IP address should
not be needed anyway, therefore it can be nil.

coveralls commented Feb 23, 2018


Coverage decreased (-0.2%) to 75.711% when pulling 5b13c9d on milanlenco:integration into 4d3ee29 on contiv:master.


brecode (Member) commented Mar 1, 2018

LGTM

@brecode brecode merged commit 7221ca1 into contiv:master Mar 1, 2018