Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework ipcache to handle metadata from multiple sources via a dedicated worker goroutine #21142

Open
15 of 29 tasks
joestringer opened this issue Aug 30, 2022 · 0 comments
Open
15 of 29 tasks
Assignees
Labels
area/daemon Impacts operation of the Cilium daemon. kind/enhancement This would improve or streamline existing functionality. kind/meta Meta-task for co-ordination. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@joestringer
Copy link
Member

joestringer commented Aug 30, 2022

This meta issue tracks a variety of enhancements being made to pkg/ipcache in order to make it more resilient and able to consistently merge information gathered about IPs from various sources (kvstore, k8s resources, local node, etc.).

Broadly speaking, these changes define a new "desired state" structure plus a resolution loop in order to realize that desired state. The desired state is synchronously updated by callers to define information such as a list of labels that one resource wants to associate with an IP, or an encryption key to associate with it. Then the upsert/remove call (typically from a k8s watcher) would trigger a separate goroutine to then realize the state internally. The realization of the state will combine all of the information from different sources together, potentially allocate identities to associate with the IP, trigger policy calculation for any new identities, and ensure that the datapath is consistently updated with any new policy state as well as any new bpf ipcache map state. For more details on the high level API, see #19765.

Tasks:

  1. Remove static package-level variables from pkg/ipcache
  2. Create new asynchronous API for handling insertion of labels / metadata /auxiliary data into the IPCache
  3. Convert the existing callers of IPCache to the new API
  4. Remove old synchronous ipcache.Upsert() APIs
    • Support natively injecting security identities into the ipcache API (possibly related to NodeManager changes) ipcache: Add ability to override identity via UpsertMetadata #21667
    • Refactor the AllocateCIDRsForIPs to use the new ipcache API
      • Mitigate issues where a CIDRSet labels refer to the same IP prefix as the IP/Identity backing an FQDN
    • Re-evaluate the use of pkg.Source for resolving ipcache upsert conflicts
    • Merge the main IPCache and IPCache.metadata structures together to reduce duplication? (TBD)
  5. Expand testing of these features

Related: #17962
Related: #18301

@joestringer joestringer added kind/enhancement This would improve or streamline existing functionality. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/daemon Impacts operation of the Cilium daemon. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. labels Aug 30, 2022
@joestringer joestringer changed the title Rework ipcache to handle metadata from multiple sources via a dedicated worker thread Rework ipcache to handle metadata from multiple sources via a dedicated worker goroutine Aug 31, 2022
@joestringer joestringer added this to Needs triage in Agent Project Tracking via automation Aug 31, 2022
christarazi added a commit to christarazi/cilium that referenced this issue Jan 20, 2023
This new function is a convenience and safe wrapper for creating a copy
of a Labels object from another.

This is especially useful going forward with the IPcache rework
(cilium#21142), where many call sites
are being converted from using raw identities over to labels. Being able
to create copies of labels without accidentally copying the reference to
the backing slice is crucial to avoid very subtle and hard to find bugs.

For example:

```
// Node 1 comes online.
node1Labels := labels.LabelRemoteNote
// Detect that node1 also has the kube-apiserver deployed on it, so add
// its label.
node1Labels.MergeLabels(labels.LabelKubeAPIServer)
...

// Node 2 comes onlne.
node2Labels := labels.LabelRemoteNote
// Now node2Labels contains the kube-apiserver label as well because
// node1Labels was set to the reference of labels.LabelRemoteNote.
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue Jan 23, 2023
This new function is a convenience and safe wrapper for creating a copy
of a Labels object from another.

This is especially useful going forward with the IPcache rework
(#21142), where many call sites
are being converted from using raw identities over to labels. Being able
to create copies of labels without accidentally copying the reference to
the backing slice is crucial to avoid very subtle and hard to find bugs.

For example:

```
// Node 1 comes online.
node1Labels := labels.LabelRemoteNote
// Detect that node1 also has the kube-apiserver deployed on it, so add
// its label.
node1Labels.MergeLabels(labels.LabelKubeAPIServer)
...

// Node 2 comes onlne.
node2Labels := labels.LabelRemoteNote
// Now node2Labels contains the kube-apiserver label as well because
// node1Labels was set to the reference of labels.LabelRemoteNote.
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
ldelossa pushed a commit that referenced this issue Jan 24, 2023
This new function is a convenience and safe wrapper for creating a copy
of a Labels object from another.

This is especially useful going forward with the IPcache rework
(#21142), where many call sites
are being converted from using raw identities over to labels. Being able
to create copies of labels without accidentally copying the reference to
the backing slice is crucial to avoid very subtle and hard to find bugs.

For example:

```
// Node 1 comes online.
node1Labels := labels.LabelRemoteNote
// Detect that node1 also has the kube-apiserver deployed on it, so add
// its label.
node1Labels.MergeLabels(labels.LabelKubeAPIServer)
...

// Node 2 comes onlne.
node2Labels := labels.LabelRemoteNote
// Now node2Labels contains the kube-apiserver label as well because
// node1Labels was set to the reference of labels.LabelRemoteNote.
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi moved this from Needs triage to High priority features in Agent Project Tracking Jan 27, 2023
@joestringer joestringer added the kind/meta Meta-task for co-ordination. label Mar 16, 2023
christarazi added a commit that referenced this issue Apr 14, 2023
Similar to the health reserved identity, the ingress identity shouldn't
have CIDR labels associated with it, so exclude it from the identity
resolution logic. The ingress IPs comes from the CiliumNode object.

Without this commit, ingress IPs will have a CIDR labels and therefore a
CIDR identity, instead of having the reserved ingress identity.

Related: #21142

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue May 16, 2023
Similar to the health reserved identity, the ingress identity shouldn't
have CIDR labels associated with it, so exclude it from the identity
resolution logic. The ingress IPs comes from the CiliumNode object.

Without this commit, ingress IPs will have a CIDR labels and therefore a
CIDR identity, instead of having the reserved ingress identity.

Related: #21142

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue May 17, 2023
Similar to the health reserved identity, the ingress identity shouldn't
have CIDR labels associated with it, so exclude it from the identity
resolution logic. The ingress IPs comes from the CiliumNode object.

Without this commit, ingress IPs will have a CIDR labels and therefore a
CIDR identity, instead of having the reserved ingress identity.

Related: #21142

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue May 30, 2023
Switch over from using direct ipcache insertion with raw identities to a
label-based model, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue May 30, 2023
Move to the new async ipcache API, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue Jun 7, 2023
Switch over from using direct ipcache insertion with raw identities to a
label-based model, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit that referenced this issue Jun 7, 2023
Move to the new async ipcache API, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
dylandreimerink pushed a commit that referenced this issue Jun 8, 2023
Switch over from using direct ipcache insertion with raw identities to a
label-based model, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
dylandreimerink pushed a commit that referenced this issue Jun 8, 2023
Move to the new async ipcache API, see GH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
romanspb80 pushed a commit to romanspb80/cilium that referenced this issue Jun 22, 2023
Switch over from using direct ipcache insertion with raw identities to a
label-based model, see ciliumGH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
romanspb80 pushed a commit to romanspb80/cilium that referenced this issue Jun 22, 2023
Move to the new async ipcache API, see ciliumGH-21142.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
joestringer added a commit that referenced this issue Aug 17, 2023
The following commits have introduced a new API for interacting with the
ipcache, which is now preferred for all future interactions:

- Commit e637814 ("ipcache: Introduce new asynchronous API")
- Commit b6cef92 ("ipcache: Add new {Upsert,Remove}Prefixes() APIs")
- Commit f9173b7 ("ipcache: Add ability to override identity via UpsertIdentity")

Mark the older APIs as deprecated in order to help to slowly phase out
future usage of those APIs. For more details, see GitHub issue #21142
("Rework ipcache to handle metadata from multiple sources via a
dedicated worker goroutine").

Note that there are still various paths using the older APIs to insert
ipcache entries. If an entry is inserted using the older API, then the
corresponding delete event should likewise use the same older API to
delete the entry. Future users should both insert and delete using the
newer API. Do not mix and match the upserts and deletes from deprecated
APIs and non-deprecated APIs for the same event type / resource update
into ipcache.

Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
joestringer added a commit that referenced this issue Aug 29, 2023
The following commits have introduced a new API for interacting with the
ipcache, which is now preferred for all future interactions:

- Commit e637814 ("ipcache: Introduce new asynchronous API")
- Commit b6cef92 ("ipcache: Add new {Upsert,Remove}Prefixes() APIs")
- Commit f9173b7 ("ipcache: Add ability to override identity via UpsertIdentity")

Mark the older APIs as deprecated in order to help to slowly phase out
future usage of those APIs. For more details, see GitHub issue #21142
("Rework ipcache to handle metadata from multiple sources via a
dedicated worker goroutine").

Note that there are still various paths using the older APIs to insert
ipcache entries. If an entry is inserted using the older API, then the
corresponding delete event should likewise use the same older API to
delete the entry. Future users should both insert and delete using the
newer API. Do not mix and match the upserts and deletes from deprecated
APIs and non-deprecated APIs for the same event type / resource update
into ipcache.

Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
aditighag pushed a commit that referenced this issue Aug 31, 2023
The following commits have introduced a new API for interacting with the
ipcache, which is now preferred for all future interactions:

- Commit e637814 ("ipcache: Introduce new asynchronous API")
- Commit b6cef92 ("ipcache: Add new {Upsert,Remove}Prefixes() APIs")
- Commit f9173b7 ("ipcache: Add ability to override identity via UpsertIdentity")

Mark the older APIs as deprecated in order to help to slowly phase out
future usage of those APIs. For more details, see GitHub issue #21142
("Rework ipcache to handle metadata from multiple sources via a
dedicated worker goroutine").

Note that there are still various paths using the older APIs to insert
ipcache entries. If an entry is inserted using the older API, then the
corresponding delete event should likewise use the same older API to
delete the entry. Future users should both insert and delete using the
newer API. Do not mix and match the upserts and deletes from deprecated
APIs and non-deprecated APIs for the same event type / resource update
into ipcache.

Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@squeed squeed self-assigned this Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. kind/enhancement This would improve or streamline existing functionality. kind/meta Meta-task for co-ordination. pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
Agent Project Tracking
High priority features
Development

No branches or pull requests

4 participants