Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identity: Make identity allocations observable #26373

Merged

Conversation

mhofstetter
Copy link
Member

@mhofstetter mhofstetter commented Jun 20, 2023

This makes the identity allocator changes observable in order to provide cells backend-agnostic access to identity allocations.

The identity allocation changes are now provided in the hive as stream.Observable[cache.IdentityChange]. When observing the initial listing is first waited for, then the current state is replayed followed by sync event and updates to state:

[ (wait for OnListDone()), Upsert, Upsert, Sync, Upsert, Delete, ... ]

Related to #25898 as this lays the foundation to replace the use of Resource[CiliumIdentity] with the observable identity allocator. This comes with the advantage of getting rid of the duplicated K8s Watcher for CiliumIdentities and the additional support for KVStore identity backend. A follow up PR will refactor the authentication related part (garbage collection).

Original PR by @joamaki: #26229

@mhofstetter mhofstetter added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. release-note/misc This PR makes changes that have no direct user impact. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. labels Jun 20, 2023
@mhofstetter mhofstetter requested review from a team as code owners June 20, 2023 11:28
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from 1a077f6 to 9b74b74 Compare June 20, 2023 11:32
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I've left a few comments inline.

pkg/allocator/cache.go Outdated Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
pkg/allocator/allocator_test.go Outdated Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
pkg/allocator/cache.go Outdated Show resolved Hide resolved
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from 9b74b74 to 4a44720 Compare June 20, 2023 15:38
@mhofstetter
Copy link
Member Author

@giorio94 thanks for your fast feedback 🚀 i applied your suggestions - PTAL!

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and concerns around type reuse and lifecycle questions. I'm clearly not yet qualified enough to review code that uses streams.

pkg/identity/cache/allocator.go Outdated Show resolved Hide resolved
pkg/identity/cache/allocator.go Outdated Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from 4a44720 to f9eb988 Compare June 21, 2023 08:43
@mhofstetter
Copy link
Member Author

Some minor comments and concerns around type reuse and lifecycle questions. I'm clearly not yet qualified enough to review code that uses streams.

@joestringer thanks for your feedback and questions. I included your feedback (labels.Labels) in the latest force push and tried to answer your questions as good as possible 🙏

@mhofstetter
Copy link
Member Author

Added @dylandreimerink as reviewer to this PR to have an additional pair of 👀 focussing on the "Observable" aspects.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor nits remaining: mostly just documenting each go ... call to declare why it's necessary and how we know it will complete.

pkg/identity/cache/allocator.go Outdated Show resolved Hide resolved
pkg/identity/cache/allocator.go Outdated Show resolved Hide resolved
pkg/allocator/cache.go Show resolved Hide resolved
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from f9eb988 to f6c1a02 Compare June 22, 2023 06:21
@mhofstetter
Copy link
Member Author

@joestringer @christarazi thanks for your input. i addressed your suggestions in the latest force push.

@mhofstetter
Copy link
Member Author

/test

Comment on lines 530 to 531
// Calling complete from a new go routine, the same way as it would happen if the IdentityAllocator would
// be set. This is how it should be expected according to stream.Observable and prevents faulty assumptions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused, bear with me:

the same way as it would happen if the IdentityAllocator would be set

What is that way? Is there a function reference or other docs description that the reader can compare with? Given that complete() is an opaque function, it's hard to understand what complete does here and why that means it needs to be run in a goroutine. This comment assumes that the reader already knows this code very well, and then explains to them that "yes it works the way you expect". But for a fresh reader, they will not have the context, so then how will they build up the context to validate this assumption?

This is how it should be expected according to stream.Observable

Observable interface says:

      // Observe a stream of values as long as the given context is valid.                                                                                                                              
      // 'next' is called for each item, and finally 'complete' is called                                                                                                                               
      // when the stream is complete, or an error has occurred.                                                                                                                                         
      //                                                                                                                                                                                                
      // Observable implementations are allowed to call 'next' and 'complete'                                                                                                                           
      // from any goroutine, but never concurrently.  

Nothing in the above tells me that complete must be run in a new goroutine.

prevents faulty assumptions.

What are those faulty assumptions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no specific contract with regards to calling complete from the same or a different routine as Observe. But looking at existing examples, we always call complete from a goroutine spawned inside Observe. Due to that behavior, we might that users of stream.Observables which for some reason don't expect complete to be called before the call to Observe returns. I assume those are the "prevents faulty assumptions." referred to.

The faulty assumption could lead to the following which would deadlock if complete is called synchronously:

mu.Lock()
someStream.Observe(ctx, func(){...}, func(err error){
  mu.Lock()
  defer mu.Unlock()
})
mu.Unlock()

So by calling complete from a goroutine we avoid deadlocks by uninformed users which can be called "resilience". But its also implementing an implicit contract, so perhaps if we want to allow this we should document it.

TL;DR calling complete without a goroutine is valid as long as users of the API don't use it wrongly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dylandreimerink I see what you're saying, but it seems odd to me that a caller would grab a mutex outside the Observe and also in the complete function that they provide to Observe(). I would assume that the functions that are called in Observe() can be called from the current thread, so that pattern would just be unsafe. In fact, from the Observable documentation, it sounds like Observable implementations could call it even from the same goroutine, which says that the locking example above is not respecting the Observe() API. ie grabbing a mutex then calling Observe(...) where the Observable can choose the same goroutine to call complete() could easily trigger a deadlock. Therefore we shouldn't assume that people will write such code, we'll catch it in review and reject it because it doesn't use the Observe() function correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more bit of background: We should have reasons for goroutines, and avoid spawning goroutines if they're not necessary. They do take a little bit of resources, and they also add complexity due to (1) lifecycle management, ensuring they're completed and (2) by making things more asynchronous. I'd imagine there are good reasons for a lot of the existing complete() goroutine cases today, but it makes me nervous to see "well run it in a goroutine to avoid bad assumptions", because we can still make other bad assumptions around the goroutines and then we have bad assumptions and additional async complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... could you rephrase exactly what the concrete problem is with having complete() run synchronously? I'm clearly still missing something here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping back through, I think the argument is that since CachingIdentityAllocator -> Allocator -> cache -> newCache() initializes the stream with a stream.Multicast[...], and since the implementation of stream.Multicast runs complete() from a goroutine, this code must also do the same. Broadly that seems like a consistency argument, which seems fine at face value. Based on the Observable() interface API, it should be equally fine to just run the completion function directly without the goroutine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that my latest comment is correct, my suggestion would be either:

(a) drop the goroutine since it's not needed, or
(b) document that assumption more explicitly:

Suggested change
// Calling complete from a new go routine, the same way as it would happen if the IdentityAllocator would
// be set. This is how it should be expected according to stream.Observable and prevents faulty assumptions.
// Calling complete from a new go routine, the same way as it would happen from stream.Multicast() in the Observe() function above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joestringer @dylandreimerink

i came up with c) : removing the actual go-routine from the id allocation business logic and using the existing stream API stream.Empty[allocator.AllocatorChange] in case where m.IdentityAllocator is nil. I refactored the logic a little bit - so the only difference is the observable (empty or m.IdentityAllocator). IMO this way the code should be readable and understandable from a id allocation point of view - if someone is understanding the observable patterns (there are enough pointers in the go doc of package stream).

even though the called stream.Empty is doing the same thing (immediately calling complete from a new go-routine), it's up to the stream package to provide the necessary (and allowed) observable patterns as API and document them well (so not every business logic need to document this again and again).

It's really a good discussion and i think there are valid arguments to keep or remove the go-routine. It's just that i think the context to discuss this in this "id allocation business related" PR isn't the right one. It should be part of the stream API and is relevant for all usages of it. -> can we discuss this in a follow up? will bring this up once @joamaki is back.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next update:

Trying to understand the initial reason to differentiate between whether m.IdentityAllocator is initialized or not triggered me to dive into this a little bit more. I came to the conclusion that it's necessary to wrap the whole (m *CachingIdentityAllocator) Observe in an additional short lived go-routine (:see_no_evil:) which waits until the m.IdentityAllocator is initialized before starting to observe it.

The reason for this is that the global identity allocator is initialized asynchronously - and might not be properly intialized once starting to observe. This would result in the immediate completion. IMO in these situations its expected that the observation waits until the identity allocator is ready and initialized anyway - otherwise the depending functionality doesn't work as expected (e.g. auth gc job would stop listening for these events)

This comes with the "advantage" that we don't need to discuss whether complete is called from a separate go-routine by itself. (because the whole functionality runs in its own go-routine)

I documented this additional short-lived go-routine! (same situation here - locking the mutex (necessary to ensure that the identitAllocator is really set) shouldn't be a problem - because the inner observer starts a go-routine too)

PTAL @joestringer @dylandreimerink 🙏

PS: yes at best we would start to properly modularize the global identity allocator and register proper lifecycle hooks for the initialization (including the backends) - so dependent components can expect a initialized component.

pkg/allocator/cache.go Show resolved Hide resolved
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall!

pkg/identity/cache/allocator.go Outdated Show resolved Hide resolved
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from f6c1a02 to e6e26c0 Compare June 26, 2023 07:22
@mhofstetter
Copy link
Member Author

@dylandreimerink thanks a lot for your review: i removed the TODO, refactored towards using stream.Map(:pray:) and stream.Empty (discovered after reading through the API of stream)

@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch 2 times, most recently from e1cfa29 to 5afd28a Compare June 26, 2023 10:30
@mhofstetter
Copy link
Member Author

mhofstetter commented Jun 26, 2023

The latest force push changes that the global identity allocator needs to be initialized before starting to observe the identity allocator for changes.

This is necessasry, because the initialization of the global identity allocator is performed asynchronously when the daemon gets started.

@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from 5afd28a to 941eccf Compare June 26, 2023 10:58
Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joestringer
Copy link
Member

/test

This makes the identity allocator changes observable in order
to provide cells backend-agnostic access to identity allocations.

The identity allocation changes are now provided in the hive as
stream.Observable[cache.IdentityChange]. When observing
the initial listing is first waited for, then the current state
is replayed followed by sync event and updates to state:

`[ (wait for OnListDone()), Upsert, Upsert, Sync, Upsert, Deletem, ... ]`

Co-authored-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
@mhofstetter mhofstetter force-pushed the pr/mhofstetter/observable-allocator branch from 941eccf to 8bbafb6 Compare June 27, 2023 06:54
@mhofstetter
Copy link
Member Author

rebased to main (failed tests related to #26496)

@mhofstetter
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 27, 2023
@borkmann borkmann merged commit 591ee68 into cilium:main Jun 27, 2023
65 checks passed
@mhofstetter mhofstetter deleted the pr/mhofstetter/observable-allocator branch June 27, 2023 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants