Feature/sort events by timestamp #1103
Can you please open an issue explaining what this is and what problem you were trying to solve?
The current solution does not seem to solve the problem of disordered events from all sources, only the disordering caused by merging the per-CPU perf buffers. From tests I ran, there seems to be another cause of disordering, so this PR should stay a draft until I have a global solution.
Hi Alon, great work so far! Maybe the 'disordering' between events is related to multi-core CPUs? That would be logical, since libbpf implements a buffer per core. If that's true, we shouldn't expect disordering on a single-core CPU. What do you think?
Hi Michael!
c703184 to 692e071
After testing the performance of tracee-ebpf with my new feature, these are the results:
The standard deviation of the results is about ±2% of CPU usage. The complete feature increases CPU usage by 27%, and the perf-buffer-specific solution increases it by 16%.
0af0c36 to 18b5cf9
70a46b9 to 98fac21
245cb39 to 390802c
390802c to bed37dd
9625664 to 754343c
754343c to 515dd0e
Also, let's not make this feature the default for now, as it brings some overhead and needs to be experimented with first.
6627ee2 to 08fe4de
Related: #1360. Maybe we can fix it there?
I will mark this PR as a draft until I am sure the bug is solved (currently working on fixing it).
bbaa54a to 2bf42f9
@AlonZivony is this ready for a new review?
It's great.
Okay @AlonZivony I think the PR could have the following commits:
And, as shown in the comments after the suggested titles, most of them are single-file commits (or small changes to existing files), so I think that with these commits your rebasing work will be minimal while still allowing good maintainability for cherry-picks and further fixes. Also, are you addressing the things I mentioned in comments? Like adding the algorithm description to the source code, some changes to error messages, and a missing link to next/previous for empty pool nodes (iirc). Looking forward to the final PR so I can +1 it officially.
c5c47f1 to 746924e
746924e to 8f06cb7
defer eq.mutex.Unlock()
if eq.head == nil {
	if eq.tail != nil {
		return nil, fmt.Errorf("BUG: TAIL without a HEAD")
I would say "discrepancy: tail without head"
LGTM
LGTM @AlonZivony, awesome job.
Thanks!
Added a method that reorders events and passes them forward in chronological order.
For more information about the issue, see #1113.
Algorithm
Events from the kernel are received one by one from the data channel.
We can rely on the fact that, for each CPU, the events received from it are ordered by timestamp (except for syscall events, which are not ordered because they are created in two steps).
So, by attaching the source CPU to each event, we can put the event in that CPU's own queue, and each queue ends up almost ordered.
To handle the syscall ordering problem, we search for the event's place in the queue starting from the tail; within at most 3 iterations we should find the chronologically matching position.
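The backward search described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `event`, `node` and `cpuQueue` types and the `insertFromTail` method are hypothetical names, and the sketch does not enforce the 3-iteration bound; it only reports how many nodes it stepped over.

```go
package main

// event is a hypothetical stand-in for a decoded kernel event.
type event struct {
	Timestamp uint64 // nanoseconds
	Name      string
}

// node is one element of a doubly linked per-CPU queue.
type node struct {
	ev         event
	prev, next *node
}

// cpuQueue keeps events ordered from oldest (head) to newest (tail).
type cpuQueue struct {
	head, tail *node
}

// insertFromTail walks backward from the tail until it reaches a node that is
// not newer than ev, links ev right after that node, and returns the number
// of nodes it stepped over.
func (q *cpuQueue) insertFromTail(ev event) int {
	n := &node{ev: ev}
	steps := 0
	cur := q.tail
	for cur != nil && cur.ev.Timestamp > ev.Timestamp {
		cur = cur.prev
		steps++
	}
	if cur == nil {
		// ev is older than everything queued (or the queue is empty):
		// it becomes the new head.
		n.next = q.head
		if q.head != nil {
			q.head.prev = n
		} else {
			q.tail = n
		}
		q.head = n
		return steps
	}
	// Link n directly after cur.
	n.prev = cur
	n.next = cur.next
	if cur.next != nil {
		cur.next.prev = n
	} else {
		q.tail = n
	}
	cur.next = n
	return steps
}

// timestamps lists the queued timestamps from head to tail.
func (q *cpuQueue) timestamps() []uint64 {
	var out []uint64
	for cur := q.head; cur != nil; cur = cur.next {
		out = append(out, cur.ev.Timestamp)
	}
	return out
}
```

In the common case the new event is the newest one, so the loop exits immediately and the insert is a plain append at the tail.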
Once events are placed in queues by source CPU, we can repeatedly extract the oldest first-in-queue event across all CPUs and send it forward.
To guarantee that all events older than the oldest first-in-queue event have already arrived in the other CPUs' queues, we use the fact that each CPU's queue is ordered: if every queue has received an event after a given timestamp, all events prior to that timestamp have arrived. So, for each CPU that sent new events, we check its most recent event. Among those, we pick the one with the oldest timestamp. We can then be sure that all events older than that timestamp have been received, and send them.
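As a rough sketch of that selection step, assume the most recent timestamp of each recently updated CPU is kept in a map. The `safeTimestamp` function and the map layout are assumptions made for illustration, not names from this PR.

```go
package main

// safeTimestamp takes, for every CPU that delivered new events, the timestamp
// of its most recent event, and returns the oldest of those timestamps.
// ok is false when no CPU was updated, in which case nothing can be sent yet.
func safeTimestamp(latestPerCPU map[int]uint64) (ts uint64, ok bool) {
	for _, latest := range latestPerCPU {
		if !ok || latest < ts {
			ts, ok = latest, true
		}
	}
	return ts, ok
}
```

For example, with latest timestamps 500, 320 and 410 on three CPUs, the function returns 320: every CPU has already delivered something at or after that point.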
However, because of the other sorting problem cases (the syscall case and the vCPU case), we cannot send them right away; we need a time buffer. This part is a bit complicated. The vCPU case makes us wait at least 100ms to be sure that a vCPU didn't send a new event right after the previous one (the alternative being that it has no events to send). The syscall case makes us wait about 3ms to be sure that no new events arrive that are older than the last event received from the CPU. Because a delayed vCPU event can itself be a syscall event (with an older timestamp than the newest event in the CPU's queue), this makes things even more complicated.
The solution to both sorting problem cases is to wait at least 100ms before sending the events up to the chosen timestamp. This way, we can be sure that no new event will arrive with a timestamp older than the chosen one.
To summarize the algorithm: we keep a queue for each CPU. We insert each new event into the matching CPU's queue and track which CPUs were updated with new events. At each interval, we look at the most recent event of each CPU and find which has the oldest timestamp. After a delay of at least 100ms, we send all events from the queues up to that timestamp, in order. This way, we can be sure that all sent events are sorted.
Concepts Used
Queues
I implemented the queue structure myself for this PR because I need access to the queue's internals in order to insert new events somewhere other than the tail (for syscall events, which arrive unsorted).
Pools
To reduce the number of allocations and frees, I introduced a Pool struct in this PR. All alloc and free operations on the event nodes used in the CPU queues go through this struct. The pool saves freed nodes, and when an alloc is requested it returns a saved free node if there is one, or allocates a new one.
To avoid pooling a large number of nodes, the maximum pooled amount is the number of allocated nodes. If the pooled count exceeds it, the Pool frees half of the nodes pooled in it.
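A minimal sketch of such a pool is shown below. The `eventNode` and `nodePool` types are hypothetical, and the sketch assumes that "the number of allocated nodes" means the nodes currently handed out to callers; the PR's actual accounting may differ.

```go
package main

// eventNode is a hypothetical queue node; next doubles as the free-list link.
type eventNode struct {
	timestamp uint64
	next      *eventNode
}

// nodePool recycles eventNodes through a singly linked free list.
type nodePool struct {
	free      *eventNode // stack of reusable nodes
	pooled    int        // nodes currently held in the free list
	allocated int        // nodes currently handed out (assumed meaning)
}

// Alloc returns a pooled node when one is available, otherwise a new one.
func (p *nodePool) Alloc() *eventNode {
	p.allocated++
	if p.free == nil {
		return &eventNode{}
	}
	n := p.free
	p.free = n.next
	p.pooled--
	*n = eventNode{} // clear stale data before reuse
	return n
}

// Free puts n back in the pool. If the pool then holds more nodes than are
// handed out, half of the pooled nodes are released to the garbage collector.
func (p *nodePool) Free(n *eventNode) {
	p.allocated--
	*n = eventNode{}
	n.next = p.free
	p.free = n
	p.pooled++
	if p.pooled > p.allocated {
		p.dropHalf()
	}
}

// dropHalf unlinks half (rounded down) of the pooled nodes.
func (p *nodePool) dropHalf() {
	for drop := p.pooled / 2; drop > 0; drop-- {
		n := p.free
		p.free = n.next
		n.next = nil
		p.pooled--
	}
}
```

Because the trim rounds down, a pool holding a single node keeps it, so a steady alloc/free pattern still reuses the same node.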