remove some padding from Task type #58363
base: master
Conversation
```c
uint8_t pad0[3];
// === 64 bytes (cache line)
// === 56 bytes
uint64_t rngState[JL_RNG_SIZE];
```
As a down side, this now sits "across" a cache line -- which may not be a problem, just thought it should be mentioned explicitly :-)
Interestingly the rngState actually is split in "4+1" (xoshiro 4 + splitmix 1). If this was flipped (to 1+4) it would align more nicely with the cacheline... but since all of the rngState is accessed at once it probably won't matter either way.
```c
_Atomic(uint64_t) last_started_running_at;
// time this task has spent running; updated when it yields or finishes.
_Atomic(uint64_t) running_time_ns;
// === 64 bytes (cache line)
```
At first I thought: "OK, this comment doesn't make sense anymore, this is no longer at offset 2*64 but rather 56+64"... Looking closer, I think it actually was wrong before as well?!
JL_RNG_SIZE is 5, so rngState takes up 5*8 = 40 bytes; the removed metrics_enabled and pad1 fields another 4 bytes (and struct alignment rules may have led the compiler to insert another 4 bytes of padding before the atomics); then 3*8 = 24 bytes for the atomics. That means prior to your changes, this "section" took up 68 or even 72 bytes, so there was no cache-line "marker" here at all...
Of course, with your changes it is indeed now at offset 56+64, so that would suggest moving this comment down one line? Or deleting it ...
```c
uint16_t priority;
_Atomic(uint8_t) _isexception; // set if `result` is an exception to throw or that we exited with
// flag indicating whether or not to record timing metrics for this task
uint8_t metrics_enabled;
```
How about grouping reentrant_timing (which needs 4 bits), metrics_enabled (one bit) and sticky (also one bit) into a single byte? That frees up one byte here, so threadpoolid could then be moved here instead of dangling at the end.
Of course, that then makes access to these fields a bit more annoying. Not sure how problematic that is?
To be clear, I meant a C bitfield, i.e.

```c
struct {
    uint8_t reentrant_timing : 4;
    uint8_t sticky : 1;
    uint8_t metrics_enabled : 1;
};
int8_t threadpoolid;
```
This makes Task 8 bytes smaller and removes the annoying padding fields.
With another 8 bytes, we'd get into the next lower size class which would be more helpful. One way I found to do that is to move the copy_stack and started fields from jl_ucontext_t back into jl_task_t, avoiding the padding from the nested struct. That is a bit tricky though and might not make sense. Or, we could move the 2 dangling int8 fields (threadpoolid and reentrant_timing) inside jl_ucontext_t which doesn't make much sense either but is probably technically easy. Any other ideas?