refactor(profiling): consolidate memalloc global state #12845

Draft: wants to merge 1 commit into base: main

nsrip-dd
Contributor

The memory profiler has several bits of global state. There are the data
structures to hold sampled allocations for the allocation and heap
profilers. There are locks protecting those data structures. There is a
scratch buffer used for building tracebacks. We can't completely avoid
global state for the memory profiler since the allocators themselves are
"global". But right now this state is spread across several different
global objects accessed by different parts of the code. These objects
are inherently interdependent. Data structures depend on their locks,
and building profiles depends on all this state.

I'd like to consolidate this state and pass it around explicitly to the
places where it's needed. This is mainly motivated by transitioning to
C++ (or Rust). In C++, at least, managing lifetime and initialization
for global state is painful. We need the state we use to actually be
initialized by the time we use it, and it needs to stay alive until the
last time we need it. This is more painful when there are multiple
inter-dependent globals. It's easier if there is one object. We can
construct it in one go with "new", let RAII set everything up, and store
it in a single global pointer. Then we can pass that pointer around
explicitly to the places where the profiler state is manipulated so
that, even if it's global, we can be more confident the state is
initialized & valid when we're looking at the code locally.
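
As a rough illustration of the idea (not the code in this PR), here is a
minimal C sketch of one state object that owns a lock, a sample table, and
a traceback scratch buffer. All names (profiler_state_t, profiler_start,
profiler_state_record, and so on) are hypothetical, and the sample table is
a plain array standing in for the real data structures.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the actual ddtrace structures. */
#define SCRATCH_FRAMES 64

typedef struct {
    pthread_mutex_t lock;              /* protects the sample table below */
    uint64_t *samples;                 /* stand-in for the sampled-allocation table */
    size_t sample_count;
    size_t sample_capacity;
    uintptr_t scratch[SCRATCH_FRAMES]; /* scratch space for building tracebacks */
} profiler_state_t;

/* The single global pointer; NULL whenever the profiler is not running. */
static profiler_state_t *global_state = NULL;

/* Build the whole state in one go, or return NULL and leave nothing behind. */
profiler_state_t *profiler_state_new(size_t capacity)
{
    profiler_state_t *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->samples = calloc(capacity, sizeof(*s->samples));
    if (s->samples == NULL) {
        free(s);
        return NULL;
    }
    if (pthread_mutex_init(&s->lock, NULL) != 0) {
        free(s->samples);
        free(s);
        return NULL;
    }
    s->sample_capacity = capacity;
    return s;
}

void profiler_state_free(profiler_state_t *s)
{
    if (s == NULL)
        return;
    pthread_mutex_destroy(&s->lock);
    free(s->samples);
    free(s);
}

/* Takes the state explicitly. Returns 1 if recorded, 0 if the table is full. */
int profiler_state_record(profiler_state_t *s, uint64_t size)
{
    int recorded = 0;
    pthread_mutex_lock(&s->lock);
    if (s->sample_count < s->sample_capacity) {
        s->samples[s->sample_count++] = size;
        recorded = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return recorded;
}

/* Only these two functions touch the global pointer. Returns 0 on success. */
int profiler_start(size_t capacity)
{
    if (global_state != NULL)
        return -1; /* already running */
    global_state = profiler_state_new(capacity);
    return global_state != NULL ? 0 : -1;
}

void profiler_stop(void)
{
    profiler_state_free(global_state);
    global_state = NULL;
}
```

In this sketch only profiler_start and profiler_stop touch the global
pointer. Everything else receives the state as a parameter, so a reader of
profiler_state_record can see that the lock and the table it protects come
from the same object.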

Admittedly the "passing things around explicitly" part is also a
stylistic preference. Really the main thing is to encapsulate access to
the shared profiler state so we can be more confident that it's managed
correctly.

This PR doesn't consolidate everything. We currently need to leave the
reentrancy guard as-is, since C structs can't have thread-local (or
static) members.
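
To make the constraint concrete, here is a small sketch of a reentrancy
guard kept as a file-scope thread-local variable, assuming a C11 compiler.
The names in_profiler, guard_enter, and guard_exit are hypothetical and not
taken from the memalloc code.

```c
#include <stdbool.h>

/* Must be declared at file scope: C does not allow a storage-class
 * specifier such as _Thread_local or static on a struct member, so this
 * flag cannot move into a consolidated state struct. */
static _Thread_local bool in_profiler = false;

/* Returns true if the calling thread acquired the guard, false if this
 * thread is already inside the profiler (a reentrant call). */
bool guard_enter(void)
{
    if (in_profiler)
        return false;
    in_profiler = true;
    return true;
}

/* Releases the guard for the calling thread. */
void guard_exit(void)
{
    in_profiler = false;
}
```

Each thread sees its own copy of the flag, so an allocation made while the
profiler is already handling one on the same thread can be detected and
skipped without taking a lock.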

This PR also picks up a stray change that's not necessarily worth
submitting separately: the format strings for the memalloc_start error
messages were using the wrong format specifiers. This was causing the
"v1" unit tests to fail on my macbook. We can use inttypes.h to get
portable format specifiers. I just did this to make sure the tests were
all green locally; doesn't feel worth burning a bunch of CI time just
for that by itself.
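
For reference, a minimal sketch of the inttypes.h approach. The function
name format_range_error and the message text are hypothetical, not the
actual memalloc_start messages. The underlying problem is that a
fixed-width type such as uint64_t is typically unsigned long on 64-bit
Linux but unsigned long long on macOS, so a hard-coded %lu or %llu
mismatches on one of them.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats an out-of-range error into buf.
 * PRIu64 expands to the correct printf length modifier for uint64_t on
 * the current platform. Returns the snprintf result. */
int format_range_error(char *buf, size_t len, uint64_t value, uint64_t max)
{
    return snprintf(buf, len,
                    "value %" PRIu64 " exceeds maximum %" PRIu64,
                    value, max);
}
```

The macro is a string literal, so it is concatenated with the adjacent
pieces of the format string at compile time.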

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Contributor

CODEOWNERS have been resolved as:

ddtrace/profiling/collector/_memalloc.h                                 @DataDog/profiling-python
ddtrace/profiling/collector/_memalloc.c                                 @DataDog/profiling-python
ddtrace/profiling/collector/_memalloc_heap.c                            @DataDog/profiling-python
ddtrace/profiling/collector/_memalloc_heap.h                            @DataDog/profiling-python

@P403n1x87
Contributor

Just a note not totally related to the PR: if we want to support free-threaded mode in the future, we must ensure that native extensions have no global native state (it's fine for module objects to have a global Python state, as there would be copies of those for each sub-interpreter).

@nsrip-dd
Contributor Author

nsrip-dd commented Mar 21, 2025

> Just a note not totally related to the PR: if we want to support free-threaded mode in the future, we must ensure that native extensions have no global native state (it's fine for module objects to have a global Python state, as there would be copies of those for each sub-interpreter).

Thanks for the call out! Just to make sure I understand, would that mean in practice that we'd attach module-level state to the module object? i.e. the one we return from the module initialization function?

Edit to add: I found this guide, which seems useful if that's the direction we should take: https://docs.python.org/3/howto/isolating-extensions.html

Contributor

Bootstrap import analysis

Comparison of import times between this PR and main.

Summary

The average import time in this PR is: 236 ± 2 ms.

The average import time in main is: 238 ± 2 ms.

The import time difference between this PR and main is: -1.69 ± 0.09 ms.

Import time breakdown

The following import paths have grown:

ddtrace.auto 0.020 ms (0.01%)
ddtrace.bootstrap.sitecustomize 0.020 ms (0.01%)
ddtrace.bootstrap.preload 0.020 ms (0.01%)
ddtrace.internal.products 0.020 ms (0.01%)
ddtrace.settings.dynamic_instrumentation 0.020 ms (0.01%)

The following import paths have shrunk:

ddtrace.auto 1.909 ms (0.81%)
ddtrace.bootstrap.sitecustomize 1.251 ms (0.53%)
ddtrace.bootstrap.preload 1.251 ms (0.53%)
ddtrace.internal.products 1.251 ms (0.53%)
ddtrace.internal.remoteconfig.client 0.624 ms (0.26%)
ddtrace 0.658 ms (0.28%)

@pr-commenter

pr-commenter bot commented Mar 21, 2025

Benchmarks

Benchmark execution time: 2025-03-21 19:45:58

Comparing candidate commit a3bc251 in PR branch nick.ripley/consolidate-memalloc-state with baseline commit 00c254b in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 498 metrics, 2 unstable metrics.
