Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor k8s identities GC into a cell.Module #22892

Merged
merged 15 commits into from Jan 31, 2023

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Dec 29, 2022

Port the current implementation of the CiliumIdentity GC to a separate cell.Module.

This makes possible to improve the testing of the GC code through unit tests, without relying on more complicated (and slow) controlplane tests.
Besides, the modularization of the GC cleans up the operator bootstrap code, since all the GC specific parts are now moved to their cell.

The new cell contains the code for both GC modes: CRD allocated identities and kvstore allocated identities.
The GC is moved inside the identity package and has the following structure:

operator/identity/
├── cell.go
├── crd_gc.go
├── gc.go
├── heartbeat.go
└── kvstore_gc.go

where:

  • cell.go contains all the exported API for the new cell, alongside its dependencies and configuration.
  • gc.go contains the common part of both gc modes.
  • crd_gc.go and heartbeat.go contain the CRD allocation mode specific GC part.
  • kvstorecontains the kvstore allocation mode specific GC part.

The new cell relies on resource.Resource[*v2.CiliumIdentity] to listen for CiliumIdentity related events and react accordingly.

Note for the reviewers

This is probably better to review commit by commit. The most impactful one is 1a482f8d3a08147410e65d91adbc6e49515a49b4, but despite the number of files modified, it aims to add the minimum amount of changes needed to modularize the identities GC code in a separate cell.
Further cleanups are left in the subsequent commits.

@pippolo84 pippolo84 added area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact. labels Dec 29, 2022
@pippolo84 pippolo84 changed the title Refactor k8s identities GC into a cell.Module [WIP] Refactor k8s identities GC into a cell.Module Dec 29, 2022
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Dec 29, 2022
@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch 3 times, most recently from dea7575 to 8e99618 Compare December 30, 2022 17:15
operator/cmd/k8s_identity_gc_test.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity_gc_test.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity_gc_test.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity_gc_test.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity.go Outdated Show resolved Hide resolved
operator/cmd/k8s_identity.go Outdated Show resolved Hide resolved
}

// signal that endpoints are sync-ed, otherwise identities gc won't start
close(watchers.CiliumEndpointsSynced)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add Resource[*CiliumEndpoints] and get rid of this. The test also doesn't seem to check whether we correctly keep identities that have a CiliumEndpoint associated with them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll also need to have the CiliumEndpoints indexed by identity ID (e.g. with something like: https://github.com/joamaki/cilium/blob/pr/joamaki/operator-resource-trials/pkg/k8s/resource/index.go).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll make this the priority for the next architecture-related PR.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch from 8e99618 to 484ae47 Compare January 11, 2023 14:56
@pippolo84
Copy link
Member Author

I've merged the kvstore mode gc into the same cell and reworked the code. The changes regarding the unit test are not there yet.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch 2 times, most recently from 070876d to d44050e Compare January 12, 2023 16:38
operator/identity/cell.go Outdated Show resolved Hide resolved
@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch from d44050e to e890124 Compare January 13, 2023 14:12
@pippolo84
Copy link
Member Author

The unit test has been extended to check that identities used by an endpoint are not gc-ed.
Unfortunately, the identities gc is tightly coupled to the global watchers.CiliumEndpointStore and to manage that in the unit test I had to add some ugly code.
That should eventually go away once the Resource[*CiliumEndpoints] will be introduced.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch 4 times, most recently from 716d684 to b5839ff Compare January 15, 2023 11:26
@pippolo84 pippolo84 marked this pull request as ready for review January 16, 2023 08:38
@pippolo84 pippolo84 requested review from a team as code owners January 16, 2023 08:38
@pippolo84 pippolo84 added the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Jan 27, 2023
@aanm aanm removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 27, 2023
@pippolo84
Copy link
Member Author

pippolo84 commented Jan 27, 2023

Added this commit to prevent modification of identity objects returned (and owned) by the resource informer.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch from 4dfee60 to c4fab44 Compare January 30, 2023 10:14
Add Cilium Identities as a shared resource in the proper cell.  This
will enable the modularization of all subsystems that depend on
CiliumIdentities events streaming.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add a new cell.Module to implement the CiliumIdentity GC.

This improves the testability of the GC code through unit tests, without
relying on more complicated (and slow) controlplane tests.
Besides, the modularization of the GC cleans up the operator bootstrap
code, since all the GC specific parts are now moved to their cell.

The cell supports both GC modes: CRD allocated identities or kvstore
allocated identities.
The code is moved inside the `identity` package and has the following
structure:

operator/identity/
├── cell.go
├── crd_gc.go
├── gc.go
├── heartbeat.go
└── kvstore_gc.go

where:

- `cell.go` contains all the exported API for the new cell, alongside
  its dependencies and configuration.
- `gc.go` contains the common part of both gc modes.
- `crd_gc.go` and `heartbeat.go` contain the CRD allocation mode
  specific GC part.
- `kvstore`contains the kvstore allocation mode specific GC part.

The new cell relies on `resource.Resource[*v2.CiliumIdentity]` to listen
for CiliumIdentity related events and react accordingly.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The heartbeat store used to keep track of the last insert/update
timestamps of each identities is an implementation detail related only
to the inner workings of the identities GC.

The commit changes all the related APIs to make them unexported.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Unlike global configuration options, cell-specific configuration options
(i.e. the ones defined through cell.Config(...)) will not be loaded from
`agentOption` or `operatorOption`, but from the `*viper.Viper` object
bound to the test hive.

`modCellConfig` function exposes the operator hive viper struct to each
controlplane test, so to allow changing those options as needed.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add a unit test for the CiliumIdentities GC operating in CRD allocation
mode.

The new unit test verifies that an Identity is correctly gc-ed when it
has neither been updated in the last heartbeat cycle nor used by a
CiliumEndpoint.
Besides, the test adds a check for leaked goroutines using goleak.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The global node watcher relies on an unexported rate limiting queue that
starts a goroutine to listen for related events.
Being unexported, there are no means to stop that goroutine when running
unit tests like the identities gc one in operator/identity.
To avoid leaking goroutines, an exported API is added so to stop the
node queue before the test termination.

This is a temporary solution before refactoring the watchers package to
use the new "hive and cells" architecture.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Controlplane test identities GC has been superseded by the identities GC
unit test in operator/identity. The two test are functionally
equivalent, so the controlplane one is removed.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Do not start the heartbeat updater for CRD mode identities GC when the
related interval is set to 0.

The heartbeat updater single purpose is to mark all identities that have
been just created, in order to avoid the GC to collect them immediately
after creation.  When the GC is disabled, there is no point in starting
it.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
To avoid repeating log entries, inner logging of updates and deletions
ops on identities is removed.

Also, the log entry related to the collection of an identity is lowered
to debug level to avoid flooding the log too much.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
In order to gracefully shutdown the identities GC, all critical errors
are reported to the upper layer instead of relying on a call to Fatal().

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
All the code in operator/identity is there only to implement the Cilium
identities gc. Renaming it makes its purpose clearer and also avoids the
name clashing with pkg/identity package.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Since the GC does not export any API, it is possible not to return the
internal object and just register its start and stop hooks to
hive.Lifecycle directly.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Even if (prometheus.Gauge).Set requires a float64 value, declaring the
counters for successful and unsuccessful gc runs as such might feel
confusing to the casual reader.  The type is then changed to int and a
type conversion to float64 is added at each Set call site.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The identities returned by the underlying identity resource informer are
owned by the informer itself, thus they should be deep copied before
being modified.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
This commit regenerates the cmdref documentation files following the
refactor of the identites gc into a cell.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/k8s-identity-gc-module branch from c4fab44 to 45485f3 Compare January 30, 2023 11:02
@pippolo84
Copy link
Member Author

Rebased onto master to fix build-commits test failure and updated operator cmdref.

@pippolo84
Copy link
Member Author

/test

@pippolo84 pippolo84 removed the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Jan 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 31, 2023
@qmonnet qmonnet merged commit abdf7ac into cilium:master Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator Impacts the cilium-operator component ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants