Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource API refactoring and shared resources #21744

Merged
merged 8 commits into from Dec 22, 2022

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Oct 15, 2022

First commit simplifies the Resource[T] API and event type and fixes an issue with coalescing events.
Rest of the commits add k8s.SharedResourceCell with resources for node, services and namespaces and refactor the the relevant pkg/k8s/watchers to use the resources instead of their own informers and stores.

See individual commits for details.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 15, 2022
@joamaki joamaki force-pushed the pr/joamaki/shared-resources branch 3 times, most recently from fa03a9b to e8cacdb Compare November 17, 2022 13:10
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Nov 18, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 18, 2022
@joamaki
Copy link
Contributor Author

joamaki commented Nov 18, 2022

/test

@joamaki joamaki force-pushed the pr/joamaki/shared-resources branch 4 times, most recently from 4e06d6b to 85938a3 Compare November 21, 2022 16:33
@joamaki
Copy link
Contributor Author

joamaki commented Nov 22, 2022

/test

@joamaki joamaki changed the title DRAFT: Shared resources Resource API refactoring and shared resources Nov 22, 2022
@joamaki joamaki force-pushed the pr/joamaki/shared-resources branch 2 times, most recently from 195ac70 to 7e19e07 Compare November 24, 2022 10:39
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks alright, a few questions around the actual shared resources cell. Also, would be good to get someone else to take an additional look at the watchers rework in particular since I don't know enough about the existing code.

pkg/k8s/shared_resources.go Show resolved Hide resolved
pkg/k8s/shared_resources.go Show resolved Hide resolved
@joamaki joamaki force-pushed the pr/joamaki/shared-resources branch 2 times, most recently from 3923eae to 1825938 Compare November 29, 2022 09:54
@joamaki joamaki force-pushed the pr/joamaki/shared-resources branch 2 times, most recently from 885f24f to 6d338e1 Compare December 20, 2022 08:09
@joamaki
Copy link
Contributor Author

joamaki commented Dec 20, 2022

/test

@joamaki joamaki marked this pull request as ready for review December 20, 2022 09:43
@joamaki joamaki requested review from a team as code owners December 20, 2022 09:43
@joamaki
Copy link
Contributor Author

joamaki commented Dec 20, 2022

/test

As Resource[T] was used more it became apparent that making it an
stream.Observable and having Event[T] be an interface makes it inconvenient
to use:

- Almost always we're subscribing to it for the whole duration of the
  application and do not need to handle completion errors. Consuming the events
  via a channel is also often the preferred way as it always "for-select"
  over multiple resources.

- Event[T] contains almost the same payload for all event types, we might
  as well keep it simple and just use a single struct.

This changes Observe() into `Events(ctx, opts) <-chan Event[T]` and moves
rate limiting and error handling to be per-subscriber options. Event[T]
becomes a single struct with a kind tag to specific event type.

In order to handle large datasets at startup the events are now emitted
immediately rather than waiting for informer to synchronize the store
first.

The BenchmarkResource in resource_test.go was also ported to current master
and the difference between the unbuffered channel approach to callback was
about 2x (2600ns/op for channels, 1350ns/op for callbacks), which is negligble
enough. Benchmark was also ran with buffered event channel with no difference
to unbuffered.  Unbuffered is preferable as allows better chances for coalescing
away the intermediate updates for objects with the same key.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This introduces a set of centrally managed resource.Resource[T] objects. These aim
to replace the current k8s/watchers and k8s/watchers/subscriber, with a single
interface (Resource[T]) that one can be observed and its store queried from any cell.
With this we'll gradually remove K8sWatcher by moving code from k8s/watchers/ into feature
specific packages that subscribe to Resource[T]. See pkg/k8s/resource/example.go for usage
example of Resource[T].

To start things off, in this patch the Resource[*Service] and Resource[*Node] are introduced
into the object graph and the existing service and node watchers are refactored to be
implemented on top of it. This keeps the existing code working, but allows new cells to
use these objects directly without dependency on K8sWatcher.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add Resource[*slim_corev1.Namespace] to the shared resources and refactor
the namespace watcher to use it.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The initial implementation was broken as it checked if the line length exceeds width
after printing. Tested manually with long list of objects.

Fixes: 58c6dbf ("hive: Detect terminal width for use in InfoLeaf line wrapping")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
Use the k8s.SharedResourcesCell in operator and clustermesh-apiserver.
Since resources are lazy, this does not cause unnecessary informers
to run.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
This adds a finalizer to check that the Event[T] has been marked Done()
before it's garbage collected. If Done() is not called we'll end up permanently
blocking events for that key, which can lead to very hard to debug bugs.

The finalizer is set on a heap allocated bool instead of Event[T] itself to
keep that struct as simple as possible with all fields exported, so it can be used in
tests without being exposed to implementation details. The reference chain is:

  Event[T] -> Done() closure -> *bool.

The panic looks like this:

  panic: *resource.resource[*k8s.io/api/core/v1.Node].Events() called from
    /home/jussi/go/src/github.com/cilium/cilium/pkg/k8s/resource/resource_test.go:515 has a
     broken event handler that did not call Done() before event was garbage collected

  goroutine 18 [running]:
  github.com/cilium/cilium/pkg/k8s/resource.(*resource[...]).Events.func1.1()
          /home/jussi/go/src/github.com/cilium/cilium/pkg/k8s/resource/resource.go:343 +0x7d

The BenchmarkResource results with the finalizer is 3900 ns/op versus ~3000 ns/op.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
- Add goleak wrapper to the tests
- Add stopping of hive to all the tests to make the goleak check pass.
- Simplify the pre-init/init phases.
- Increase timeouts of few flaky test cases

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Instead of checking for the value of the sentinel, it's much
more efficient to just unset the finalizer. Checking the value
was also not correct since there was no memory barrier that would've
prevented reading a stale value.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki
Copy link
Contributor Author

joamaki commented Dec 21, 2022

/test

Copy link
Member

@nebril nebril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator changes LGTM.

@joamaki joamaki added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 22, 2022
@joamaki joamaki merged commit 817b52d into cilium:master Dec 22, 2022
@dylandreimerink dylandreimerink mentioned this pull request Jan 23, 2023
19 tasks
@aanm aanm mentioned this pull request Jan 23, 2023
3 tasks
@christarazi christarazi added the sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. label Mar 29, 2023
christarazi added a commit to christarazi/cilium that referenced this pull request Mar 29, 2023
Previously, the error was not passed down to the Done() which would
never attempt a retry if there was an error in the node handler.

Fixes: b4335ee ("k8s: Introduce shared resources")
Fixes: cilium#21744
Signed-off-by: Chris Tarazi <chris@isovalent.com>
borkmann pushed a commit that referenced this pull request Mar 30, 2023
Previously, the error was not passed down to the Done() which would
never attempt a retry if there was an error in the node handler.

Fixes: b4335ee ("k8s: Introduce shared resources")
Fixes: #21744
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants