Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support policy matching against kube-apiserver #17714

Closed

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Oct 26, 2021

Superseded by #17823

See commit msgs.

This PR implements support for automated policy matching against the
kube-apiserver. It does so by detecting where the kube-apiserver is running,
whether it's inside the cluster or not.

TODO:

Fixes: #14724

@christarazi christarazi added kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Oct 26, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 26, 2021
@christarazi christarazi changed the title pr/christarazi/apiserver policy Support policy matching against kube-apiserver Oct 26, 2021
@joestringer joestringer added this to the 1.11.0-rc1 milestone Oct 26, 2021
@christarazi christarazi force-pushed the pr/christarazi/apiserver-policy branch from f01da3e to 5928021 Compare November 5, 2021 00:14
joestringer and others added 19 commits November 4, 2021 17:44
Create dedicated functions for checking whether an identity represents
any node in the cluster or a remote node in the cluster. This will be
useful for an upcoming commit where a remote node may have the
REMOTE_NODE_ID identity or alternatively another hardcoded identity.

Signed-off-by: Joe Stringer <joe@cilium.io>
This identity will be used to identify remote nodes which also have the
kube-apiserver colocated, and allows policy at the higher layer to
differentiate nodes with this identity vs. other nodes in the cluster.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Now that cilium#12544 has been fixed, remove the code that the comment
indicated to.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Fix it for
(*EgressRule).GetDestinationEndpointSelectorsWithRequirements(). Seems
like it was case of copy-pasta from the corresponding ingress function.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit expands on the mock identity allocator implementation to
actually perform some real work, rather than a complete no-op. It also
renames the identity allocator from "Fake" to "Mock" to better represent
its behavior.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Has() is a new helper that will be used by subsequent commits where we
need to search if a Labels object contains a specific Label.

Additionally, this commit implements the ability to remove Labels,
providing the inverse logic to MergeLabels().

Signed-off-by: Chris Tarazi <chris@isovalent.com>
These will useful in future commits where we need to reference these
labels individual via Has() or similar.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
In a subsequent commit, the reserved ID cache will be updated from
concurrent users. This commit prepares the code for that.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
With the advent of the kube-apiserver reserved identity, it is now
possible for the host to have the kube-apiserver label. This means that
we need to provide a way to update the reserved ID cache within Cilium
for IDs with multiple labels. This breaks the long-held assumption that
reserved IDs can only have one label.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Previously, the function prototype that IterateReservedIdentities() took
in only considered reserved identities to have a single label. Now with
the introduction of the kube-apiserver reserved identity, that is no
longer the case, as it contains two labels. This commits changes the
function prototype to match the newly introduced
AddReservedIdentityWithLabels(), which allow a full set of labels to be
passed.

In addition, we introduce a new map which holds the numeric identity to
the full set of labels, which is the map that will be iterated over in
IterateReservedIdentities().

As the previous commit mentioned, this code transitions from the
assumption of one label per reserved identity, to potentially multiple
labels per reserved identity.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit adds the new kube-apiserver label which enables traffic to
be identified as kube-apiserver traffic. Along with the new label, we
create a new reserved ID which will be associated with traffic going to
the kube-apiserver.

There are a couple nuances to note here. The kube-apiserver label will
used in three scenarios:
  * Attached along with the local host label if the kube-apiserver is
    running on the local host. The identity of the local host will now
    be used for traffic to the local host + to the local kube-apiserver.
    The reserved ID of the local host *does not* change however. It is
    still 1, to prevent breaking changes with connectivity.
  * Attached along with the remote-node label as part of the new
    kube-apiserver reserved identity. This identity is only used if the
    kube-apiserver is running on remote nodes.
  * Attached along with the CIDR labels if the kube-apiserver is running
    outside of the cluster. This will result in a regular CIDR identity.

In subsequent commits, the above three cases are what will be
implemented to allow policy matching on the kube-apiserver. This commit
setting the groundwork for that.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This will be useful in subsequent commits where it will be used to
Upsert() into the ipcache.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit introduces a userspace map in the ipcache package called
IdentityMetadata (IDMD) that assoicates CIDRs to specific labels. This
is useful for figuring out the corresponding identity of the IP as the
CIDR -> label mappings will be used to generate identities.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit refactors the function for allocating CIDR identities for
prefixes to accept labels from the caller, as well as making it generic
for any identity allocation (not exclusive to CIDR). The lower-level
function now operates on a single prefix and its assoicated labels,
while the higher-level functions perform the iteration over a slice of
prefixes, passing in the default CIDR labels for each prefix.

Now that the lower-level function is no longer specific to allocating
CIDR identities, rename it to reflect its generic nature. The CIDR label
is only attached to the identity if the prefix is associated with world.

This refactor is useful in subsequent commits where the caller of the
lower-level function will want to control the labels associated with a
prefix.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit reduces the regeneration.Owner interface by removing the
GetPolicyRepository() method from it. This is to remove the import of
package policy from the regeneration package. This will ease the
subsequent commit by avoiding an import cycle within the regeneration
package.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the policy updater trigger from the Daemon to the
specific policy package. This allows us to reuse the new policy.Updater
object in the future, without having to abstract away the entire Daemon
object.

In other words, triggering policy updates was only possible from the
Daemon object through (*Daemon).TriggerPolicyUpdates(). Now, we can
trigger policy updates by using the policy.Updater object instead.

The Daemon object also contains too much logic for its own good, so this
is a good start to start moving code and delegating to other packages.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
The KubeAPIServer source will be used only when an IP is associated with
the kube-apiserver. The primary user of this source will be
ipcache.InjectLabels() (introduced in a subsequent commit), which will
perform this check and overwrite the already passed-in source with the
KubeAPIServer source if there's a kube-apiserver label.

The aforementioned overwrite can occur as follows:
  * CN watcher associates the local node IPs with its corresponding
    reservered:host label and upserts into the ipcache with Local source
  * EP watcher associates a kube-apiserver backend IP with its
    corresponding reserved:kube-apiserver label and upserts into the
    ipcache with CustomResource source (as the default source)
  * InjectLabels() sees that there has been an upsert with the
    reservered:kube-apiserver label so it upgrades the source of that
    request to KubeAPIServer source

This source is now the strongest source throughout Cilium (greater than
Local) because of the above.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This will be used in the subsequent commit to ensure
ipcache.InjectLabels() can retry upon failure to allocate CIDR
identities if the allocator is not yet initialized.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/apiserver-policy branch from 5928021 to c6b2ee6 Compare November 5, 2021 06:40
Any update to the IdentityMetadata (IDMD) map must trigger an update to
the ipcache. Since the IdentityMetadata maps IPS / CIDRs to labels, if
an update was made, we must to check if that update was related to the
labels. If so, that means it might affect identities. Hence, we must to
allocate a new identity, trigger policy updating, and push it into the
ipcache. This commit provides the mechanism to do so (InjectLabels()).
In future commits, this will be triggered by other subsystems as they
encounter IPs / CIDRs for which we need to assoicate labels with.

Corresponding functions to find and delete IPs / CIDRs with specific
labels is also provided, which include releasing any allocated
identities and deletion from the ipcache.

Here's a diagram of how this works for the kube-apiserver policy
matching:

  +------------+  (1)        (1)  +------------+
  | EP Watcher +-----+      +-----+ CN Watcher |
  +-----+------+   W |      | W   +------+-----+
        |            |      |            |
        |            v      v            |
        |            +------+            |
        |            | IDMD |            |
        |            +------+            |
        |               ^                |
        |               |                |
        |           (3) |R               |
        | (2)    +------+--------+   (2) |
        +------> |Label Injector |<------+
       Trigger   +-------+-------+ Trigger
                     (4) |W
                         |
                         v
                       +---+
                       |IPC|
                       +---+
  legend:
  * W means write
  * R means read

Signed-off-by: Chris Tarazi <chris@isovalent.com>
By leveraging the special "kubernetes" Service / Endpoint object, we can
learn about which IPs the kube-apiserver is behind. These IPs will be
upserted into the ipcache with an identity that is to-be-allocated.  In
order to allocate an identity for these IPs, the
kube-apiserver-associated IPs are inserted into the IDMD map with the
"reserved:kube-apiserver" label. Then the watcher will trigger the label
injection via ipcache.TriggerLabelInjection().

We do not make a distinction on where the kube-apiserver is running,
whether it's on the local node, on a remote node in the cluster, or if
it's running outside of the cluster. All kube-apiserver-associated IPs
will have the "reserved:kube-apiserver" label.

Note that we are not hooking into the delete event. We only subscribe to
the add / update events. This is because we perform a diff on the
currently associated kube-apiserver IPs inside the IDMD and what the add
/ update event contains (desired IPs), then remove the current IPs that
are not in the desired set. This operation is equivalent to subscribing
to the delete event and removing the "desired" IPs from the IDMD.
However, we don't need to subscribe to the delete event because this
special Service is automatically recreated / reconciled by the apiserver
itself if the object is removed or edited.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
It is common for the kube-apiserver to be running as host-network on a
node in the cluster. In order to correctly classify traffic to the
kube-apiserver with the kube-apiserver identity, we need to know which
node IPs are for local node or remote nodes. For these node IPs, they
will be associated with the "reserved:host" or "reserved:remote-node"
label, respectively, in the IDMD map.

Once the node manager has made these associations, we need to trigger
label injection via ipcache.TriggerLabelInjection() in order to figure
out to if a new identity must be allocated and if we need to trigger
policy calculation.

After this, if
* the local host has the kube-apiserver running, it will have an IP with
  the following labels: "reserved:host,reserved:kube-apiserver". This
  traffic will be classified as "host" traffic with ID 1. This does not
  trigger a new identity allocation, but the ID 1 will now contan the
  "reserved:kube-apiserver" label. This requires updating the selector
  cache, as ID 1 itself has changed, which is handled in
  ipcache.TriggerLabelInjection().
* a remote node(s) has the kube-apiserver running, it will have an IP
  with the following labels
  "reserved:remote-node,reserved:kube-apiserver". This traffic will be
  classified as "kube-apiserver" traffic with the newly created ID of 7.
  No further policy recalculation is needed in this case, as ID 7
  already exists.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Now that the kube-apiserver policy matching has been implemented, expose
the user-facing policy API by allowing the user to express:

```
egress:
  toEntities:
  - kube-apiserver
```

While we're at it, bump the K8s CRD schema version because this is a
user-facing change to the CNP / CCNP CRDs.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/apiserver-policy branch from c6b2ee6 to 6cd39d5 Compare November 5, 2021 22:10
@joestringer joestringer self-assigned this Nov 5, 2021
@christarazi
Copy link
Member Author

/test

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found these issues during an initial pass. I'll push a fresh PR with some of these fixes + rebase against master, then start iterating further on these items & finish up the review.

pkg/node/manager/manager.go Show resolved Hide resolved
pkg/node/manager/manager.go Show resolved Hide resolved
pkg/ipcache/cidr.go Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this completely now. For anything I thought was crucial to fix, I prepared a fix as a squash/fixup commit at https://github.com/joestringer/cilium/tree/pr/christarazi/apiserver-policy. I will fold those changes into #17823. I can't modify this PR anyway so will need to follow up on that PR.

I'll go back through my own comments and mark "resolved" the ones that I addressed.

pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/ipcache/metadata.go Show resolved Hide resolved
pkg/k8s/watchers/endpoint.go Show resolved Hide resolved
pkg/k8s/watchers/cilium_node.go Show resolved Hide resolved
if !ok {
id = f.localID
f.ipToIdentity[ip.String()] = id
f.idRefCount[id]++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This testutils refcount increase should happen unconditionally rather than only if the IP is getting a new identity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ Using pkg/counter could also help avoid issues like this.

Comment on lines +313 to +315
n := labels.NewLabelsFromModel(nil)
n.MergeLabels(l)
return n // copy of leftover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The identity handling here is a bit fishy - if the IdentityMetadata labels set has changed and identities are generally tied to the lifetime of the entry in the IdentityMetadata map, then this case should deallocate the identity for the previous set of labels and allocate the identity for the new set of labels.

@joestringer
Copy link
Member

Superseded by #17823, which is now merged. Closing. @christarazi I've left some comments open here and will be happy to discuss with you about any remaining items from here or the other PR when you're back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Better policy construct to match on apiserver connectivity
2 participants