New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource API refactoring and shared resources #21744
Conversation
249e3c0
to
8b26dc2
Compare
8b26dc2
to
50a7589
Compare
50a7589
to
9d3e4d3
Compare
fa03a9b
to
e8cacdb
Compare
8e6b86f
to
d7f1f84
Compare
/test |
4e06d6b
to
85938a3
Compare
/test |
195ac70
to
7e19e07
Compare
7e19e07
to
912d9ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks alright, a few questions around the actual shared resources cell. Also, would be good to get someone else to take an additional look at the watchers rework in particular since I don't know enough about the existing code.
3923eae
to
1825938
Compare
885f24f
to
6d338e1
Compare
/test |
6d338e1
to
ee2ced8
Compare
/test |
As Resource[T] was used more it became apparent that making it an stream.Observable and having Event[T] be an interface makes it inconvenient to use: - Almost always we're subscribing to it for the whole duration of the application and do not need to handle completion errors. Consuming the events via a channel is also often the preferred way as it always "for-select" over multiple resources. - Event[T] contains almost the same payload for all event types, we might as well keep it simple and just use a single struct. This changes Observe() into `Events(ctx, opts) <-chan Event[T]` and moves rate limiting and error handling to be per-subscriber options. Event[T] becomes a single struct with a kind tag to specific event type. In order to handle large datasets at startup the events are now emitted immediately rather than waiting for informer to synchronize the store first. The BenchmarkResource in resource_test.go was also ported to current master and the difference between the unbuffered channel approach to callback was about 2x (2600ns/op for channels, 1350ns/op for callbacks), which is negligble enough. Benchmark was also ran with buffered event channel with no difference to unbuffered. Unbuffered is preferable as allows better chances for coalescing away the intermediate updates for objects with the same key. Signed-off-by: Jussi Maki <jussi@isovalent.com>
This introduces a set of centrally managed resource.Resource[T] objects. These aim to replace the current k8s/watchers and k8s/watchers/subscriber, with a single interface (Resource[T]) that one can be observed and its store queried from any cell. With this we'll gradually remove K8sWatcher by moving code from k8s/watchers/ into feature specific packages that subscribe to Resource[T]. See pkg/k8s/resource/example.go for usage example of Resource[T]. To start things off, in this patch the Resource[*Service] and Resource[*Node] are introduced into the object graph and the existing service and node watchers are refactored to be implemented on top of it. This keeps the existing code working, but allows new cells to use these objects directly without dependency on K8sWatcher. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add Resource[*slim_corev1.Namespace] to the shared resources and refactor the namespace watcher to use it. Signed-off-by: Jussi Maki <jussi@isovalent.com>
The initial implementation was broken as it checked if the line length exceeds width after printing. Tested manually with long list of objects. Fixes: 58c6dbf ("hive: Detect terminal width for use in InfoLeaf line wrapping") Signed-off-by: Jussi Maki <jussi@isovalent.com>
Use the k8s.SharedResourcesCell in operator and clustermesh-apiserver. Since resources are lazy, this does not cause unnecessary informers to run. Signed-off-by: Jussi Maki <jussi@isovalent.com>
This adds a finalizer to check that the Event[T] has been marked Done() before it's garbage collected. If Done() is not called we'll end up permanently blocking events for that key, which can lead to very hard to debug bugs. The finalizer is set on a heap allocated bool instead of Event[T] itself to keep that struct as simple as possible with all fields exported, so it can be used in tests without being exposed to implementation details. The reference chain is: Event[T] -> Done() closure -> *bool. The panic looks like this: panic: *resource.resource[*k8s.io/api/core/v1.Node].Events() called from /home/jussi/go/src/github.com/cilium/cilium/pkg/k8s/resource/resource_test.go:515 has a broken event handler that did not call Done() before event was garbage collected goroutine 18 [running]: github.com/cilium/cilium/pkg/k8s/resource.(*resource[...]).Events.func1.1() /home/jussi/go/src/github.com/cilium/cilium/pkg/k8s/resource/resource.go:343 +0x7d The BenchmarkResource results with the finalizer is 3900 ns/op versus ~3000 ns/op. Signed-off-by: Jussi Maki <jussi@isovalent.com>
- Add goleak wrapper to the tests - Add stopping of hive to all the tests to make the goleak check pass. - Simplify the pre-init/init phases. - Increase timeouts of few flaky test cases Signed-off-by: Jussi Maki <jussi@isovalent.com>
Instead of checking for the value of the sentinel, it's much more efficient to just unset the finalizer. Checking the value was also not correct since there was no memory barrier that would've prevented reading a stale value. Signed-off-by: Jussi Maki <jussi@isovalent.com>
ee2ced8
to
cacf0e6
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Operator changes LGTM.
Previously, the error was not passed down to the Done() which would never attempt a retry if there was an error in the node handler. Fixes: b4335ee ("k8s: Introduce shared resources") Fixes: cilium#21744 Signed-off-by: Chris Tarazi <chris@isovalent.com>
First commit simplifies the Resource[T] API and event type and fixes an issue with coalescing events.
Rest of the commits add k8s.SharedResourceCell with resources for node, services and namespaces and refactor the the relevant pkg/k8s/watchers to use the resources instead of their own informers and stores.
See individual commits for details.