
Support policy matching against kube-apiserver #17823

Merged

Conversation


@joestringer joestringer commented Nov 8, 2021

See the commit messages; implementation by @christarazi.

This PR implements support for automated policy matching against the
kube-apiserver. It does so by detecting where the kube-apiserver is running,
whether it's inside the cluster or not.


Based on PR #17714; I reviewed that PR and addressed all of the important comments I had left directly on it.
Fixes: #14724

@joestringer joestringer added kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Nov 8, 2021
@joestringer joestringer force-pushed the pr/christarazi/apiserver-policy+rebase branch 2 times, most recently from 867f71d to 1105e37 Compare November 9, 2021 01:40
@joestringer (Member Author) commented:

/test

@joestringer joestringer marked this pull request as ready for review November 9, 2021 01:57
@joestringer joestringer requested reviews from multiple teams as code owners November 9, 2021 01:57
christarazi and others added 11 commits November 16, 2021 12:30
The KubeAPIServer source will be used only when an IP is associated with
the kube-apiserver. The primary user of this source will be
ipcache.InjectLabels() (introduced in a subsequent commit), which will
perform this check and overwrite the already passed-in source with the
KubeAPIServer source if there's a kube-apiserver label.

The aforementioned overwrite can occur as follows:
  * CN watcher associates the local node IPs with its corresponding
    reserved:host label and upserts into the ipcache with Local source.
  * EP watcher associates a kube-apiserver backend IP with its
    corresponding reserved:kube-apiserver label and upserts into the
    ipcache with CustomResource source (as the default source).
  * InjectLabels() sees that there has been an upsert with the
    reserved:kube-apiserver label so it upgrades the source of that
    request to KubeAPIServer source.

This source is now the strongest source throughout Cilium (greater than
Local) because of the above.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
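
Editor's note: below is a minimal, self-contained sketch of the source-precedence behaviour this commit describes. The type, constants, and numeric ordering are illustrative assumptions for this sketch, not Cilium's actual pkg/source API.

```
package main

import "fmt"

// Source mimics the idea of ipcache entry sources; the ordering below is an
// assumption made for this sketch.
type Source int

const (
	Unspec Source = iota
	CustomResource
	Local
	KubeAPIServer // strongest source after this PR: outranks even Local
)

// allowOverwrite reports whether an entry from src may be replaced by an
// entry from newSrc.
func allowOverwrite(src, newSrc Source) bool {
	return newSrc >= src
}

func main() {
	// EP watcher upserts a kube-apiserver backend IP with CustomResource;
	// InjectLabels() then upgrades the entry to the KubeAPIServer source:
	fmt.Println(allowOverwrite(CustomResource, KubeAPIServer)) // true
	// A later Local upsert (e.g. from the CN watcher) cannot downgrade it:
	fmt.Println(allowOverwrite(KubeAPIServer, Local)) // false
}
```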
This will be used in the subsequent commit to ensure
ipcache.InjectLabels() can retry upon failure to allocate CIDR
identities if the allocator is not yet initialized.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
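
Editor's note: a rough sketch of the retry pattern this commit implies, under the assumption that label injection runs inside a controller that retries on error. The function and error names are hypothetical.

```
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAllocatorNotReady is a hypothetical sentinel for "allocator not yet
// initialized".
var errAllocatorNotReady = errors.New("CIDR identity allocator not initialized")

// injectLabels fails until the allocator is ready; the caller retries.
func injectLabels(allocatorReady bool) error {
	if !allocatorReady {
		return errAllocatorNotReady
	}
	return nil // allocate CIDR identities, upsert into ipcache, ...
}

func main() {
	ready := false
	for i := 0; i < 3; i++ {
		if err := injectLabels(ready); err != nil {
			fmt.Println("retrying:", err)
			time.Sleep(10 * time.Millisecond) // a controller would back off here
			ready = true                      // simulate the allocator coming up
			continue
		}
		fmt.Println("labels injected")
		break
	}
}
```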
Any update to the IdentityMetadata (IDMD) map must trigger an update to
the ipcache. Since the IdentityMetadata map associates IPs / CIDRs with
labels, if an update was made, we must check whether that update changed
the labels. If so, it might affect identities, and hence we must
allocate a new identity, trigger a policy update, and push it into the
ipcache. This commit provides the mechanism to do so (InjectLabels()).
In future commits, this will be triggered by other subsystems as they
encounter IPs / CIDRs with which labels need to be associated.

Corresponding functions to find and delete IPs / CIDRs with specific
labels are also provided, which includes releasing any allocated
identities and deleting the entries from the ipcache.

Here's a diagram of how this works for the kube-apiserver policy
matching:

  +------------+  (1)        (1)  +------------+
  | EP Watcher +-----+      +-----+ CN Watcher |
  +-----+------+   W |      | W   +------+-----+
        |            |      |            |
        |            v      v            |
        |            +------+            |
        |            | IDMD |            |
        |            +------+            |
        |               ^                |
        |               |                |
        |           (3) |R               |
        | (2)    +------+--------+   (2) |
        +------> |Label Injector |<------+
       Trigger   +-------+-------+ Trigger
                     (4) |W
                         |
                         v
                       +---+
                       |IPC|
                       +---+
  legend:
  * W means write
  * R means read

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
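
Editor's note: a self-contained sketch of the flow in the diagram above: watchers write labels into a metadata map (IDMD), and the label injector reads it and resolves entries into identities plus ipcache (IPC) upserts. All names here are illustrative, not the actual Cilium ipcache API.

```
package main

import (
	"fmt"
	"sync"
)

var (
	idMDMU           sync.Mutex
	identityMetadata = map[string]map[string]struct{}{} // prefix -> labels
)

// upsertMetadata is what the EP / CN watchers do (the "W" arrows, step 1).
func upsertMetadata(prefix string, labels ...string) {
	idMDMU.Lock()
	defer idMDMU.Unlock()
	if identityMetadata[prefix] == nil {
		identityMetadata[prefix] = map[string]struct{}{}
	}
	for _, l := range labels {
		identityMetadata[prefix][l] = struct{}{}
	}
}

// injectLabels plays the label injector (steps 3 "R" and 4 "W"): it reads
// the IDMD and would allocate identities and upsert them into the ipcache.
func injectLabels() {
	idMDMU.Lock()
	defer idMDMU.Unlock()
	for prefix, lbls := range identityMetadata {
		fmt.Printf("prefix %s: allocate identity for %d label(s), upsert into ipcache\n",
			prefix, len(lbls))
	}
}

func main() {
	upsertMetadata("192.0.2.10/32", "reserved:kube-apiserver") // EP watcher (step 2: Trigger)
	upsertMetadata("192.0.2.1/32", "reserved:host")            // CN watcher (step 2: Trigger)
	injectLabels()
}
```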
By leveraging the special "kubernetes" Service / Endpoint object, we can
learn which IPs the kube-apiserver is behind. These IPs will be upserted
into the ipcache with an identity that is yet to be allocated. In order
to allocate an identity for these IPs, the kube-apiserver-associated IPs
are inserted into the IDMD map with the "reserved:kube-apiserver" label.
Then the watcher will trigger the label injection via
ipcache.TriggerLabelInjection().

We do not make a distinction on where the kube-apiserver is running,
whether it's on the local node, on a remote node in the cluster, or if
it's running outside of the cluster. All kube-apiserver-associated IPs
will have the "reserved:kube-apiserver" label.

Note that we do not hook into the delete event; we only subscribe to the
add / update events. This is because we diff the kube-apiserver IPs
currently recorded in the IDMD against the IPs contained in the add /
update event (the desired set), then remove the current IPs that are not
in the desired set. This operation is equivalent to subscribing to the
delete event and removing the "desired" IPs from the IDMD. However, we
don't need to subscribe to the delete event because this special Service
is automatically recreated / reconciled by the apiserver itself if the
object is removed or edited.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
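
Editor's note: a sketch of the diff described above. The helper is hypothetical, not the actual watcher code: on every add / update event, compare the kube-apiserver IPs currently recorded in the IDMD against the desired set from the event, and remove the stale ones; no delete handler is needed.

```
package main

import "fmt"

// staleIPs returns the current IPs that are absent from the desired set.
func staleIPs(current, desired map[string]struct{}) []string {
	var stale []string
	for ip := range current {
		if _, ok := desired[ip]; !ok {
			stale = append(stale, ip)
		}
	}
	return stale
}

func main() {
	current := map[string]struct{}{"10.0.0.1": {}, "10.0.0.2": {}}
	desired := map[string]struct{}{"10.0.0.2": {}, "10.0.0.3": {}}
	fmt.Println(staleIPs(current, desired)) // [10.0.0.1] -> remove from IDMD
}
```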
It is common for the kube-apiserver to be running in the host network of
a node in the cluster. In order to correctly classify traffic to the
kube-apiserver with the kube-apiserver identity, we need to know whether
a given node IP belongs to the local node or to a remote node. These
node IPs will be associated with the "reserved:host" or
"reserved:remote-node" label, respectively, in the IDMD map.

Once the node manager has made these associations, we need to trigger
label injection via ipcache.TriggerLabelInjection() in order to figure
out whether a new identity must be allocated and whether we need to
trigger a policy calculation.

After this, if
* the local host has the kube-apiserver running, it will have an IP with
  the following labels: "reserved:host,reserved:kube-apiserver". This
  traffic will be classified as "host" traffic with ID 1. This does not
  trigger a new identity allocation, but ID 1 will now contain the
  "reserved:kube-apiserver" label. This requires updating the selector
  cache, as ID 1 itself has changed, which is handled in
  ipcache.TriggerLabelInjection().
* a remote node has the kube-apiserver running, it will have an IP with
  the following labels: "reserved:remote-node,reserved:kube-apiserver".
  This traffic will be classified as "kube-apiserver" traffic with the
  newly created ID of 7. No further policy recalculation is needed in
  this case, as ID 7 already exists.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
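
Editor's note: a toy classification of the two cases from the commit message above. IDs 1 (host) and 7 (kube-apiserver) come from the text; ID 6 for plain remote nodes is an assumption of this sketch, as is the function itself.

```
package main

import "fmt"

func classify(labels map[string]bool) (int, string) {
	switch {
	case labels["reserved:host"]:
		// kube-apiserver on the local host: keep ID 1 (host), but the
		// label set of ID 1 changed, so the selector cache is updated.
		return 1, "host"
	case labels["reserved:remote-node"] && labels["reserved:kube-apiserver"]:
		// kube-apiserver on a remote node: the new reserved ID 7.
		return 7, "kube-apiserver"
	case labels["reserved:remote-node"]:
		return 6, "remote-node"
	}
	return 0, "unknown"
}

func main() {
	fmt.Println(classify(map[string]bool{"reserved:host": true, "reserved:kube-apiserver": true}))
	fmt.Println(classify(map[string]bool{"reserved:remote-node": true, "reserved:kube-apiserver": true}))
}
```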
Create dedicated functions for checking whether an identity represents
any node in the cluster or a remote node in the cluster. This will be
useful for an upcoming commit where a remote node may have the
REMOTE_NODE_ID identity or alternatively another hardcoded identity.

Signed-off-by: Joe Stringer <joe@cilium.io>
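
Editor's note: hypothetical stand-ins for the dedicated checks this commit describes, using the reserved numeric identities mentioned in this PR (1 = host, 7 = kube-apiserver on a remote node; 6 = remote-node is an assumption). The actual function names and signatures in the tree may differ.

```
package main

import "fmt"

// isNodeIdentity reports whether id represents any node in the cluster.
func isNodeIdentity(id uint32) bool {
	return id == 1 || id == 6 || id == 7
}

// isRemoteNodeIdentity reports whether id represents a remote node,
// including one with the kube-apiserver co-located.
func isRemoteNodeIdentity(id uint32) bool {
	return id == 6 || id == 7
}

func main() {
	fmt.Println(isNodeIdentity(1), isRemoteNodeIdentity(1)) // true false
	fmt.Println(isNodeIdentity(7), isRemoteNodeIdentity(7)) // true true
}
```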
This identity will be used to identify remote nodes which also have the
kube-apiserver co-located, and allows policy at the higher layer to
differentiate nodes with this identity vs. other nodes in the cluster.

Signed-off-by: Joe Stringer <joe@cilium.io>
Now that the kube-apiserver policy matching has been implemented, expose
the user-facing policy API by allowing the user to express:

```
egress:
  toEntities:
  - kube-apiserver
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This is to minimize the likelihood of confusion between 'prg' and 'prog'.

Signed-off-by: Joe Stringer <joe@cilium.io>
This should not ultimately be a global, but until issue 17849 is
resolved it needs to be. At least ensure that it's not accessed outside
the package, since it must be handled while holding the 'idMDMU' lock
immediately above.

Signed-off-by: Joe Stringer <joe@cilium.io>
Add a short blurb to describe the relationship between the
IdentityMetadata map and the rest of the Cilium agent. For a more
detailed design, see earlier commits, in particular:
"ipcache, policy: Inject labels from identity metadata".

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the pr/christarazi/apiserver-policy+rebase branch from 4697c3b to 8b5ae5e Compare November 16, 2021 20:31

joestringer commented Nov 16, 2021

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Test Name: K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output: FAIL: Timed out after 30.001s.

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.9 so I can create a new GitHub issue to track it.

@joestringer (Member Author) commented:

The ci-eks job was hit by an ongoing infrastructure issue that prevents EKS clusters from being provisioned. Given previous successful runs and the remaining jobs succeeding, this failure can be safely ignored.
The travis job failed with #11560.

@joestringer joestringer added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 16, 2021
@ti-mo ti-mo merged commit c781955 into cilium:master Nov 17, 2021
@joestringer joestringer deleted the pr/christarazi/apiserver-policy+rebase branch November 17, 2021 16:13
Comment on lines +303 to +325
```
	if l := RemoveLabels(prefix, lbls, src); l != nil {
		la = l.LabelArray()
	}
	idsToDelete[id.ID] = la
	if len(lbls) > 0 {
		// If for example kube-apiserver label is removed from
		// a remote-node, then RemoveLabels() will return a
		// non-empty set representing the new full set of
		// labels to associate with the node. In order to
		// propagate the new identity, we must emit a delete
		// event for the old identity and then an add event for
		// the new identity.
		idsToAdd[id.ID] = la
	}
}
if len(idsToDelete) > 0 {
	var wg sync.WaitGroup
	// SelectorCache.UpdateIdentities() asks for callers to avoid
	// handing the same identity in both 'adds' and 'deletes'
	// parameters here, so make two calls. These changes will not
	// be propagated to the datapath until later.
	updater.UpdateIdentities(nil, idsToDelete, &wg)
	updater.UpdateIdentities(idsToAdd, nil, &wg)
```
@joestringer (Member Author) commented on the diff:

Looking back through my git stash, I think there's a corner case here that is maybe not currently possible, but could hit someone really weirdly in future:

id above this block is some identity allocated based on a series of labels, like reserved:kube-apiserver,cidr:w.x.y.z/n,.... RemoveLabels will potentially then subtract one or more of those labels out of the list, so l ends up with, say, only cidr:w.x.y.z/n. There's no guarantee that there is any identity associated with that remaining set of labels at this point. Not least, the identity id does not correspond to the new l set of labels.

From a quick peruse of the code, I think that SelectorCache.UpdateIdentities() will warn in this case, but I can't tell what the behaviour would actually be. Maybe one case to consider is if you had the apiserver on a remote node in the cluster, then you removed the kube-apiserver from that node (eg shift it to the current node). What happens to the selectorcache in that case? Do we suddenly change the cached labels for this identity?

cc @christarazi

jrajahalme added a commit to jrajahalme/cilium that referenced this pull request May 19, 2022
The K8s cache is also not fully synced when k8sSyncedChecker is still nil.

Found during code inspection, not sure if this can manifest as a bug on
runtime.

Fixes: cilium#17823
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
joestringer pushed a commit that referenced this pull request May 25, 2022
The K8s cache is also not fully synced when k8sSyncedChecker is still nil.

Found during code inspection, not sure if this can manifest as a bug on
runtime.

Fixes: #17823
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
jibi pushed a commit that referenced this pull request May 26, 2022
[ upstream commit 078c052 ]

The K8s cache is also not fully synced when k8sSyncedChecker is still nil.

Found during code inspection, not sure if this can manifest as a bug on
runtime.

Fixes: #17823
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
jibi pushed a commit that referenced this pull request May 31, 2022
[ upstream commit 078c052 ]

The K8s cache is also not fully synced when k8sSyncedChecker is still nil.

Found during code inspection, not sure if this can manifest as a bug on
runtime.

Fixes: #17823
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>