refactor(profiling): consolidate memalloc global state #12845
The memory profiler has several bits of global state. There are the data structures to hold sampled allocations for the allocation and heap profilers. There are locks protecting those data structures. There is a scratch buffer used for building tracebacks. We can't completely avoid global state for the memory profiler since the allocators themselves are "global". But right now this state is spread across several different global objects accessed by different parts of the code. These objects are inherently interdependent. Data structures depend on their locks, and building profiles depends on all this state.

I'd like to consolidate this state and pass it around explicitly to the places where it's needed. This is mainly motivated by transitioning to C++ (or Rust). In C++, at least, managing lifetime and initialization for global state is painful. We need the state we use to actually be initialized by the time we use it, and it needs to stay alive until the last time we need it. This is more painful when there are multiple interdependent globals. It's easier if there is one object. We can construct it in one go with "new", let RAII set everything up, and store it in a single global pointer. Then we can pass that pointer around explicitly to the places where the profiler state is manipulated so that, even if it's global, we can be more confident the state is initialized and valid when we're looking at the code locally.

Admittedly the "passing things around explicitly" part is also a stylistic preference. Really the main thing is to encapsulate access to the shared profiler state so we can be more confident that it's managed correctly.

This PR doesn't consolidate everything. We currently need to leave the reentrancy guard as-is, since C structs can't have thread-local (or static) members.

This PR also picks up a stray change that's not necessarily worth submitting separately: the format strings for the memalloc_start error messages were using the wrong format specifiers. This was causing the "v1" unit tests to fail on my MacBook. We can use inttypes.h to get portable format specifiers. I just did this to make sure the tests were all green locally; it doesn't feel worth burning a bunch of CI time just for that by itself.
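For illustration, here is a minimal C sketch of the shape described above: one state object built in a single step, one global pointer to it, and callers that receive the pointer explicitly. Every name in it (`profiler_state_t`, `profiler_start`, `profiler_record`, `profiler_stop`, `MAX_NFRAME_LIMIT`, the field names, the limit of 600) is hypothetical and is not taken from the actual memalloc code. The error message uses `PRIu64` from inttypes.h because `uint64_t` is `unsigned long` on 64-bit Linux but `unsigned long long` on macOS, so a hard-coded `%lu` or `%llu` is only correct on one of them. Whether that is the exact mismatch this PR fixes is an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit and layout; not the real memalloc definitions. */
static const uint64_t MAX_NFRAME_LIMIT = 600;

typedef struct {
    pthread_mutex_t lock;      /* protects the sample table below */
    uint64_t* sample_sizes;    /* sampled allocation sizes */
    size_t sample_count;
    size_t sample_capacity;
    uintptr_t* scratch_frames; /* scratch buffer for building tracebacks */
    uint64_t max_nframe;
} profiler_state_t;

/* The single global pointer: NULL means "profiler not started". */
static profiler_state_t* global_state = NULL;

/* The reentrancy guard stays separate: a struct member cannot be thread-local. */
static _Thread_local int in_profiler = 0;

/* Build the whole state in one step. Returns 0 on success, or -1 with a
 * message written to err. */
int
profiler_start(uint64_t max_nframe, size_t capacity, char* err, size_t err_len)
{
    if (global_state != NULL) {
        snprintf(err, err_len, "profiler already started");
        return -1;
    }
    if (capacity == 0 || max_nframe < 1 || max_nframe > MAX_NFRAME_LIMIT) {
        /* PRIu64 expands to the right conversion specifier on each platform. */
        snprintf(err, err_len, "max_nframe must be in [1, %" PRIu64 "], got %" PRIu64,
                 MAX_NFRAME_LIMIT, max_nframe);
        return -1;
    }
    profiler_state_t* state = calloc(1, sizeof(*state));
    if (state == NULL) {
        snprintf(err, err_len, "out of memory");
        return -1;
    }
    state->sample_sizes = calloc(capacity, sizeof(*state->sample_sizes));
    state->scratch_frames = calloc((size_t)max_nframe, sizeof(*state->scratch_frames));
    if (state->sample_sizes == NULL || state->scratch_frames == NULL ||
        pthread_mutex_init(&state->lock, NULL) != 0) {
        free(state->sample_sizes);
        free(state->scratch_frames);
        free(state);
        snprintf(err, err_len, "failed to initialize profiler state");
        return -1;
    }
    state->sample_capacity = capacity;
    state->max_nframe = max_nframe;
    global_state = state; /* publish only once everything is initialized */
    return 0;
}

/* Callers get the state as an argument instead of reaching for globals.
 * Returns 1 if the sample was stored, 0 if it was dropped. */
int
profiler_record(profiler_state_t* state, uint64_t size)
{
    assert(state != NULL);
    if (in_profiler) {
        return 0; /* already inside the profiler on this thread */
    }
    in_profiler = 1;
    int recorded = 0;
    pthread_mutex_lock(&state->lock);
    if (state->sample_count < state->sample_capacity) {
        state->sample_sizes[state->sample_count++] = size;
        recorded = 1;
    }
    pthread_mutex_unlock(&state->lock);
    in_profiler = 0;
    return recorded;
}

/* Tear everything down together, clearing the global pointer first. */
void
profiler_stop(void)
{
    profiler_state_t* state = global_state;
    if (state == NULL) {
        return;
    }
    global_state = NULL;
    pthread_mutex_destroy(&state->lock);
    free(state->sample_sizes);
    free(state->scratch_frames);
    free(state);
}
```

Because the lock, the sample table, and the scratch buffer live and die together, a function that holds a non-NULL `profiler_state_t*` can assume all three are valid, which is the local-reasoning benefit the description is after. The sketch does not address synchronizing `profiler_stop` with threads that are still inside `profiler_record`.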
Just a note not totally related to the PR: if we want to support free-threaded mode in the future, we must ensure that native extensions have no global native state (it's fine for module objects to have a global Python state, as there would be copies of those for each sub-interpreter).
Thanks for the callout! Just to make sure I understand, would that mean in practice that we'd attach module-level state to the module object, i.e. the one we return from the module initialization function? Edit to add: I found this guide, which seems useful if that's the direction we should take: https://docs.python.org/3/howto/isolating-extensions.html
Bootstrap import analysis
Comparison of import times between this PR and main.
Summary
The average import time in this PR is: 236 ± 2 ms.
The average import time in main is: 238 ± 2 ms.
The import time difference between this PR and main is: -1.69 ± 0.09 ms.
Import time breakdown
The following import paths have grown:
Benchmarks
Benchmark execution time: 2025-03-21 19:45:58
Comparing candidate commit a3bc251 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 498 metrics, 2 unstable metrics.