New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monitor: Initialize agent in deamon early #17407
monitor: Initialize agent in deamon early #17407
Conversation
test-me-please Edit: ConfirmanceGKE hit known flake #17337 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one comment.
a.Lock() | ||
defer a.Unlock() | ||
|
||
if a.events != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this check be before we call LoadPinnedMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch, we should. Especially since the code doesn't close the map fd if this case is hit in the current code. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the call LoadPinnedMap
below the check. Unfortunately this means that we need to keep the lock while loading the map (to avoid a TOCTOW issue with a.events
), but I think that should be fine.
The `pkg/monitor/agent` package mainly performs three tasks: - Managing subscribers of monitor events (such as `cilium monitor` clients or the Hubble observer) - Receiving and broadcasting user-space notifications ("agent events") to the above subscribers. - Polling the perf event ring buffer for datapath notifications and broadcasting them to subscribers. The third task can only be performed once the perf event ring buffer map has been set up by the daemon. However, some subsystems will try to send user-space notifications before the events map has been set up. This often causes an unnecessary log warning to be emitted, which regularly confuses users (see cilium#14146). Fundamentally however, there is no reason to drop user-space events if the datapath is not set up yet. Therefore, this commit splits the monitor agent constructor into two functions: The first one, `NewAgent` initializes the subscriber management and allows user-space notifications to be received early, the second function `AttachToEventsMap` opens the perf event ring buffer to poll datapath events. By splitting up the initialization phase and invoking the constructor earlier, we can avoid the data race in cilium#17167 and we also avoid emitting a unhelpful warning if user-space events are created early. Note: Because subscribers (i.e. Hubble or `cilium monitor` clients) are still attached relatively late, we will still drop early events if there is no one listening for them. This commit does not address that - but instead of emitting a confusing warning that the user cannot do anything about, it will drop the event the same way Cilium up to version 1.8 did. Fixes: cilium#17167 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
4994e40
to
8cd9c6e
Compare
test-me-please Job 'Cilium-PR-K8s-GKE' has 1 failure but they might be new flake since it also hit 1 known flake: #17403 (79.85) |
Marking this ready to merge. Both GKE failures are known flakes and this PR fixes some other unit test flakes, so it should be exempt from the zero flake merge policy. |
This can mitigate issues on v1.10 where Cilium can erroneously print warnings in the log on startup like |
The
pkg/monitor/agent
package mainly performs three tasks:cilium monitor
clients or the Hubble observer)
to the above subscribers.
broadcasting them to subscribers.
The third task can only be performed once the perf event ring buffer map
has been set up by the daemon. However, some subsystems will try to send
user-space notifications before the events map has been set up. This
often causes an unnecessary log warning to be emitted, which regularly
confuses users (see #14146). Fundamentally however, there is no reason
to drop user-space events if the datapath is not set up yet.
Therefore, this commit splits the monitor agent constructor into two
functions: The first one,
NewAgent
initializes the subscribermanagement and allows user-space notifications to be received early, the
second function
AttachToEventsMap
opens the perf event ring buffer topoll datapath events.
By splitting up the initialization phase and invoking the constructor
earlier, we can avoid the data race in #17167 and we also avoid emitting
a unhelpful warning if user-space events are created early.
Note: Because subscribers (i.e. Hubble or
cilium monitor
clients) arestill attached relatively late, we will still drop early events if there
is no one listening for them. This commit does not address that - but
instead of emitting a confusing warning that the user cannot do anything
about, it will drop the event the same way Cilium up to version 1.8 did.
Fixes: #17167