Conversation
Pull Request Overview
This PR introduces jemalloc memory profiling capabilities to replace the existing mimalloc allocator. The implementation adds signal-based memory profiling that can be triggered via SIGUSR2 to collect heap dumps for performance analysis.
- Replaces mimalloc with jemalloc as the global allocator
- Adds a JemallocMemoryProfiler that responds to SIGUSR2 signals to collect heap dumps
- Integrates the profiler into the autopilot service startup process
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/shared/src/lib.rs | Adds new alloc module to shared library |
| crates/shared/src/alloc.rs | Implements JemallocMemoryProfiler with signal handling and dump functionality |
| crates/shared/Cargo.toml | Adds tikv-jemalloc-ctl dependency for profiling controls |
| crates/autopilot/src/run.rs | Integrates memory profiler into autopilot startup |
| crates/autopilot/src/main.rs | Switches global allocator from mimalloc to jemalloc |
| crates/autopilot/Cargo.toml | Replaces mimalloc with tikv-jemallocator dependency |
| Dockerfile | Adds the `make` package as a build dependency |
This pull request has been marked as stale because it has been inactive for a while. Please update this pull request or it will be automatically closed.
# Description

Even after #3499, a memory leak [was noticed](#3554) when the UniV3 liquidity fetching is enabled in the Baseline solver. Looking at the code, I don't see an obvious reason for it other than the UniswapV3QuoterV2 contract instance being extensively cloned on each `/solve` request.

# Changes

- Use an `Arc`'ed UniswapV3QuoterV2 contract instance.

## How to test

I've tried to resurrect [this](a9ff88f) e2e test, but the liquidity fetching from the subgraph sometimes takes a very long time, so the test is too flaky to enable.

## Follow-ups

In any case, I need to find a way to make the Jemalloc profiler work (#3533) to collect memory dumps later in case the memory leak happens again.

## Related Issues

#3554
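To illustrate the `Arc` change above: a minimal, self-contained sketch (the `Quoter` and `Solver` types here are hypothetical stand-ins for the real contract and solver types, which live in the solver crates) showing how wrapping a heavyweight instance in `Arc` turns per-request clones into a cheap reference-count bump:

```rust
use std::sync::Arc;

// Hypothetical stand-in for the UniswapV3QuoterV2 contract instance;
// the real type owns non-trivial state (address, ABI, client handles).
struct Quoter {
    abi_blob: Vec<u8>, // would be deep-copied by a plain `.clone()`
}

struct Solver {
    // Wrapping the instance in `Arc` makes cloning a pointer copy
    // plus an atomic count increment instead of a deep copy.
    quoter: Arc<Quoter>,
}

// Simulates one `/solve` request: only the `Arc` pointer is cloned.
fn handle_solve(solver: &Solver) -> Arc<Quoter> {
    Arc::clone(&solver.quoter)
}

fn main() {
    let quoter = Arc::new(Quoter { abi_blob: vec![0u8; 1024] });
    let solver = Solver { quoter: Arc::clone(&quoter) };

    let per_request = handle_solve(&solver);
    // `quoter`, `solver.quoter`, and `per_request` share one allocation.
    assert_eq!(Arc::strong_count(&quoter), 3);
    assert_eq!(per_request.abi_blob.len(), 1024);
    println!("strong_count = {}", Arc::strong_count(&quoter));
}
```

The design point is that `Arc<T>` clones never duplicate the underlying `T`, so repeated cloning per request can no longer account for unbounded allocation growth of the contract instance itself.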
Given that you are waiting for the maintainers to unblock you, can this PR and the other one temporarily be closed?
Description
This is a proof of concept for integrating the Jemalloc memory profiler into various services to collect memory dumps on demand. Profiling with the Jemalloc allocator is currently considered the most resource-efficient option (though this still needs to be validated in prod) and doesn’t require running additional applications, which is a major advantage.
The implementation is based on various open-source projects (e.g. https://github.com/tikv/tikv/blob/master/components/tikv_alloc/src/jemalloc.rs#L327).
Changes
- Jemalloc profiling is currently integrated only into the autopilot. Other crates will be supported in follow-up PRs. The major disadvantage of this approach is that the binary needs to be recompiled; selecting the memory allocator after compilation seems to be impossible.
- The profiler is controlled with a combination of a `USR2` signal and environment variables (TODO: update it). This approach is much easier than introducing an HTTP API with auth, etc.:
  1. Set the `MEM_DUMP_PATH` ENV param to specify the dump output directory.
  2. Set the `PROFILER_COMMAND` ENV param (TODO: update it) with one of the values:
     - `enable` - activates profiling
     - `disable` - disables profiling
     - `dump` - stores the recorded dump
     - `run_for(<Duration, e.g. 1h>)` - automatically activates profiling, records for the provided duration, stores the dump, and disables profiling.
  3. Run `kill -USR2 <pid>` to execute the command specified in the previous step.

Further automation
Given this control flexibility, some infra automation can be built on top of resource consumption. For example, once memory usage reaches 50%, start profiling, and record a dump when it reaches 90%.
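The threshold logic above can be sketched as a small state machine. This is a hypothetical, std-only illustration, not part of the PR: real memory readings would come from the metrics/infra stack, and the emitted actions would map to setting `PROFILER_COMMAND` and sending `SIGUSR2` as described under Changes.

```rust
// Hypothetical automation sketch: enable profiling at 50% memory
// usage, dump (and stop) at 90%. Readings are injected as percentages.
#[derive(Debug, PartialEq)]
enum Action {
    None,
    StartProfiling, // e.g. PROFILER_COMMAND=enable + kill -USR2 <pid>
    DumpAndStop,    // e.g. PROFILER_COMMAND=dump + kill -USR2 <pid>
}

struct Automation {
    profiling: bool,
}

impl Automation {
    fn new() -> Self {
        Automation { profiling: false }
    }

    // Called on each memory-usage sample; returns the action to take.
    fn on_memory_pct(&mut self, pct: f64) -> Action {
        if !self.profiling && pct >= 50.0 {
            self.profiling = true;
            Action::StartProfiling
        } else if self.profiling && pct >= 90.0 {
            self.profiling = false;
            Action::DumpAndStop
        } else {
            Action::None
        }
    }
}

fn main() {
    let mut auto = Automation::new();
    assert_eq!(auto.on_memory_pct(40.0), Action::None);
    assert_eq!(auto.on_memory_pct(55.0), Action::StartProfiling);
    assert_eq!(auto.on_memory_pct(70.0), Action::None);
    assert_eq!(auto.on_memory_pct(92.0), Action::DumpAndStop);
    println!("ok");
}
```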
How to test
Try running it locally and in staging. Staging would require an infra change to create a PVC (this was already done for the Heaptrack profiler).
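As a reference for the control protocol described under Changes, here is a hypothetical, std-only Rust sketch of parsing the `PROFILER_COMMAND` values (`enable`, `disable`, `dump`, `run_for(<duration>)`). The parser in crates/shared/src/alloc.rs may differ; the duration suffixes (`s`, `m`, `h`) are an assumption here:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ProfilerCommand {
    Enable,
    Disable,
    Dump,
    RunFor(Duration),
}

// Parses a PROFILER_COMMAND value as described in the PR; returns
// None for anything unrecognized.
fn parse_command(raw: &str) -> Option<ProfilerCommand> {
    match raw.trim() {
        "enable" => Some(ProfilerCommand::Enable),
        "disable" => Some(ProfilerCommand::Disable),
        "dump" => Some(ProfilerCommand::Dump),
        s if s.starts_with("run_for(") && s.ends_with(')') => {
            parse_duration(&s["run_for(".len()..s.len() - 1])
                .map(ProfilerCommand::RunFor)
        }
        _ => None,
    }
}

// Minimal duration parser for suffixes like "30s", "5m", "1h"
// (assumed grammar; "1h" is the only example given in the PR).
fn parse_duration(s: &str) -> Option<Duration> {
    let (num, unit) = s.split_at(s.len().checked_sub(1)?);
    let n: u64 = num.parse().ok()?;
    let secs = match unit {
        "s" => n,
        "m" => n * 60,
        "h" => n * 3600,
        _ => return None,
    };
    Some(Duration::from_secs(secs))
}

fn main() {
    assert_eq!(parse_command("dump"), Some(ProfilerCommand::Dump));
    assert_eq!(
        parse_command("run_for(1h)"),
        Some(ProfilerCommand::RunFor(Duration::from_secs(3600)))
    );
    assert_eq!(parse_command("bogus"), None);
    println!("parsed ok");
}
```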