[PoC] Jemalloc profiler #3533

Closed
squadgazzz wants to merge 27 commits into main from jemalloc-profiler

@squadgazzz (Contributor) commented Jul 31, 2025

Description

This is a proof of concept for integrating the Jemalloc memory profiler into various services to collect memory dumps on demand. Profiling with the Jemalloc allocator is currently considered the most resource-efficient option (though this still needs to be validated in production), and it doesn’t require running additional applications, which is a major advantage.

The implementation is based on various open-source projects (e.g. https://github.com/tikv/tikv/blob/master/components/tikv_alloc/src/jemalloc.rs#L327).

Changes

  • Allocator selection: Mimalloc is currently used across all services and provides the best performance, so Jemalloc is only considered for collecting memory dumps. The allocator is chosen at compile time through a feature flag, and the Dockerfile is updated accordingly, which allows enabling Jemalloc per service. This PR contains the implementation only for autopilot; other crates will be supported in follow-up PRs. The major disadvantage of this approach is that the binary needs to be recompiled, since selecting the memory allocator after compilation seems to be impossible.
  • Profiler activation: The Jemalloc profiler only records allocations that occur while profiling is active, and profiling is disabled by default. This is useful when allocations during service warm-up consume most of the memory and the leak only grows slowly afterwards, because the warm-up allocations can be kept out of the dump.
  • Profiler control: This is implemented using a combination of the USR2 signal and environment variables (TODO: update it). This approach is much simpler than introducing an HTTP API with authentication:
    • Set the MEM_DUMP_PATH environment variable to specify the dump output directory.
    • Set the PROFILER_COMMAND environment variable to one of the following values (TODO: update it):
      • enable - activates profiling
      • disable - disables profiling
      • dump - stores the recorded dump
      • run_for(<Duration, e.g. 1h>) - activates profiling, records allocations for the provided duration, stores the dump, and disables profiling.
    • Send USR2 using kill -USR2 <pid> to execute the command specified in the previous step.
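
The command values listed above could be parsed roughly as follows. This is only a minimal sketch, not the code from this PR: the `ProfilerCommand` enum, the `parse_duration` helper, and the supported units (`s`, `m`, `h`) are assumptions made for illustration, since the description only shows `1h` as an example duration.

```rust
use std::str::FromStr;
use std::time::Duration;

/// Hypothetical representation of the PROFILER_COMMAND values.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProfilerCommand {
    Enable,
    Disable,
    Dump,
    RunFor(Duration),
}

/// Parses a simple duration such as "30s", "15m", or "1h".
/// Assumption: only a single integer followed by one unit is supported.
fn parse_duration(input: &str) -> Result<Duration, String> {
    let input = input.trim();
    // Index of the first non-digit character, i.e. where the unit starts.
    let split = input
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in duration '{}'", input))?;
    let (digits, unit) = input.split_at(split);
    let value: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number in duration '{}'", input))?;
    let seconds_per_unit: u64 = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3600,
        other => return Err(format!("unsupported duration unit '{}'", other)),
    };
    value
        .checked_mul(seconds_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("duration '{}' is too large", input))
}

impl FromStr for ProfilerCommand {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        match s {
            "enable" => Ok(Self::Enable),
            "disable" => Ok(Self::Disable),
            "dump" => Ok(Self::Dump),
            // Anything else must have the form `run_for(<duration>)`.
            _ => s
                .strip_prefix("run_for(")
                .and_then(|rest| rest.strip_suffix(')'))
                .ok_or_else(|| format!("unknown profiler command '{}'", s))
                .and_then(parse_duration)
                .map(Self::RunFor),
        }
    }
}
```

In such a setup, the USR2 handler would read PROFILER_COMMAND, call `parse::<ProfilerCommand>()` on its value, and dispatch on the resulting variant; unknown commands are rejected with an error instead of being silently ignored.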

Further automation

This control flexibility makes it possible to automate profiling in the infrastructure based on resource consumption. For example, start profiling once memory usage reaches 50% and record a dump when it reaches 90%.
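
The threshold example can be sketched as a small state machine. Everything here is hypothetical (the `MemoryWatcher` type, the `Action` enum, and the idea of feeding it memory usage as a fraction between 0.0 and 1.0); how usage is measured and how the resulting action reaches the service are left open by this PR.

```rust
/// What the automation should do after a memory usage observation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    None,
    StartProfiling,
    Dump,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Profiling,
    Dumped,
}

/// Hypothetical watcher that turns memory usage samples into profiler actions.
pub struct MemoryWatcher {
    start_at: f64,
    dump_at: f64,
    state: State,
}

impl MemoryWatcher {
    /// `start_at` and `dump_at` are usage fractions, e.g. 0.5 and 0.9.
    pub fn new(start_at: f64, dump_at: f64) -> Self {
        Self {
            start_at,
            dump_at,
            state: State::Idle,
        }
    }

    /// Feeds one usage sample and returns the action to take.
    /// Each action is emitted at most once; if usage jumps past both
    /// thresholds at once, the dump is requested on the next sample.
    pub fn observe(&mut self, usage: f64) -> Action {
        match self.state {
            State::Idle if usage >= self.start_at => {
                self.state = State::Profiling;
                Action::StartProfiling
            }
            State::Profiling if usage >= self.dump_at => {
                self.state = State::Dumped;
                Action::Dump
            }
            _ => Action::None,
        }
    }
}
```

With thresholds of 0.5 and 0.9, a sequence of samples such as 0.3, 0.55, 0.7, 0.92 would yield no action, a request to start profiling, no action, and a request to dump, in that order.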

How to test

Try running it locally and in staging. Staging requires an infra change that creates a PVC (this was already done for the Heaptrack profiler).

@squadgazzz squadgazzz requested a review from Copilot July 31, 2025 20:15

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces jemalloc memory profiling capabilities to replace the existing mimalloc allocator. The implementation adds signal-based memory profiling that can be triggered via SIGUSR2 to collect heap dumps for performance analysis.

  • Replaces mimalloc with jemalloc as the global allocator
  • Adds a JemallocMemoryProfiler that responds to SIGUSR2 signals to collect heap dumps
  • Integrates the profiler into the autopilot service startup process

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/shared/src/lib.rs | Adds new alloc module to shared library |
| crates/shared/src/alloc.rs | Implements JemallocMemoryProfiler with signal handling and dump functionality |
| crates/shared/Cargo.toml | Adds tikv-jemalloc-ctl dependency for profiling controls |
| crates/autopilot/src/run.rs | Integrates memory profiler into autopilot startup |
| crates/autopilot/src/main.rs | Switches global allocator from mimalloc to jemalloc |
| crates/autopilot/Cargo.toml | Replaces mimalloc with tikv-jemallocator dependency |
| Dockerfile | Adds make package for build dependencies |

github-actions bot commented Aug 9, 2025

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@squadgazzz squadgazzz changed the title from "Jemalloc profiler" to "[PoC] Jemalloc profiler" on Aug 22, 2025
squadgazzz added a commit that referenced this pull request Aug 25, 2025
# Description
Even after #3499, a memory leak [was
noticed](#3554) when UniV3 liquidity fetching is enabled in the
Baseline solver. Looking at the code, I don't see an obvious cause
other than the UniswapV3QuoterV2 contract instance being cloned
extensively on each `/solve` request.

# Changes

- Use `Arc`'ed UniswapV3QuoterV2 contract instance.

## How to test
I've tried to resurrect
[this](a9ff88f)
e2e test, but the liquidity fetching from the subgraph sometimes takes
a very long time, so the test is too flaky to enable.

## Follow-ups

In any case, I need to find a way to make the Jemalloc profiler
work (#3533) to collect memory dumps later in case the memory leak
happens again.

## Related Issues

#3554

github-actions bot commented Sep 4, 2025

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@github-actions github-actions bot added the stale label Sep 4, 2025
m-sz pushed a commit that referenced this pull request Sep 4, 2025

@MartinquaXD (Contributor) commented

Given that you are waiting for the maintainers to unblock you, can this PR and the other one temporarily be closed?

@squadgazzz squadgazzz closed this Sep 9, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Sep 9, 2025