EXPERIMENT: [C++] Access mimalloc through dynamically-resolved symbols #41128
6a53c52 to b40f113
@github-actions crossbow submit wheel-manylinux-2-28*
Revision: b40f113. Submitted crossbow builds: ursacomputing/crossbow @ actions-468528d58b
We can see the difference in the code generated for freeing mimalloc memory:
Also, I do not see any significant and robust regressions in our nano-benchmarks when running them locally.
Interestingly, the binary wheels built in the Crossbow builds above use a different indirection code (a direct call to
I've checked that |
Rationale for this change
The memray memory profiler works by interposing certain dynamic symbols in the profiled process, replacing them with its own functions that collect memory allocation data. To the best of my knowledge, it currently only recognizes system C calls such as malloc, mmap...
When a third-party allocator like mimalloc or jemalloc is being used, as Arrow does by default, memray does not see the logical allocation calls made through these allocators' APIs (because they are not interposed), but only the raw memory reservations that they issue using system routines.
This can lead people using memray to think that a given Arrow workload (or any workload using such allocators, really) uses an inordinate amount of memory, while the reported memory mostly represents non-committed virtual memory that the allocator keeps for performance reasons. Concrete example in GH-40301: we allocate a number of 1 KiB buffers from mimalloc, but memray sees a similar number of 64 MiB calls to mmap.
We discussed how to enhance memray so as to account for the corresponding logical allocations, and we came to the conclusion that it requires Arrow to expose API calls that can be dynamically interposed. Since we typically build against a static libmimalloc.a, the mimalloc symbols cannot be exposed (at least, I cannot seem to get this to work on Ubuntu). This means we need to define our own symbols wrapping the mimalloc APIs.
What changes are included in this PR?
Define public, interposable symbols that redirect into the mimalloc APIs that we use.
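As a rough illustration of what "public, interposable symbols" can mean, here is a minimal sketch, not the code in this PR. The names sketch_pool_malloc, sketch_pool_free and AllocateBuffer are hypothetical, and std::malloc / std::free stand in for the mimalloc calls (such as mi_malloc_aligned and mi_free) so that the snippet compiles without mimalloc installed.

```cpp
#include <cstddef>
#include <cstdlib>

// Default visibility keeps the symbol in the shared library's dynamic symbol
// table on GCC/Clang, so that a preloaded library can define the same name
// and take precedence at symbol resolution time.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_EXPORT extern "C" __attribute__((visibility("default")))
#else
#define SKETCH_EXPORT extern "C"
#endif

// Hypothetical wrapper symbols (assumption: names are for illustration only).
// In a real build the bodies would forward to the statically linked allocator;
// std::malloc / std::free are stand-ins here.
SKETCH_EXPORT void* sketch_pool_malloc(std::size_t size) {
  return std::malloc(size);
}

SKETCH_EXPORT void sketch_pool_free(void* ptr) { std::free(ptr); }

// Library-internal code allocates through the exported wrappers rather than
// calling the allocator directly, so an interposed wrapper sees every
// logical allocation.
inline unsigned char* AllocateBuffer(std::size_t size) {
  return static_cast<unsigned char*>(sketch_pool_malloc(size));
}
```

Whether calls from inside the same shared library actually reach an interposed definition depends on how the library is compiled and linked, so this only shows the shape of the wrappers.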
Are these changes tested?
Not for now. We could probably test them, at least on Linux, by compiling an almost trivial shared library and interposing it using LD_PRELOAD.
Are there any user-facing changes?
No. There should not be any noticeable performance regression, except perhaps on memory pool micro-benchmarks.
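Coming back to the LD_PRELOAD testing idea, the "almost trivial shared library" could look roughly like the sketch below. It assumes the wrapped library exports two hypothetical symbols, sketch_pool_malloc and sketch_pool_free (the actual names in this PR may differ). It counts calls and forwards to the next definition found with dlsym(RTLD_NEXT, ...). The fallback to std::malloc / std::free exists only so the snippet also runs on its own, when no later library defines those symbols.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes RTLD_NEXT on glibc
#endif
#include <dlfcn.h>

#include <atomic>
#include <cstddef>
#include <cstdlib>

namespace {
std::atomic<std::size_t> g_alloc_calls{0};
std::atomic<std::size_t> g_alloc_bytes{0};

using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

// Look up the next definition of `name` in library search order, i.e. the
// one this interposer is shadowing. Returns nullptr if there is none.
template <typename Fn>
Fn NextSymbol(const char* name) {
  return reinterpret_cast<Fn>(dlsym(RTLD_NEXT, name));
}
}  // namespace

// Same (hypothetical) name as the wrapper exported by the profiled library.
extern "C" void* sketch_pool_malloc(std::size_t size) {
  static MallocFn next = NextSymbol<MallocFn>("sketch_pool_malloc");
  g_alloc_calls.fetch_add(1);
  g_alloc_bytes.fetch_add(size);
  // Fallback is for standalone runs only (assumption of this sketch).
  return next ? next(size) : std::malloc(size);
}

extern "C" void sketch_pool_free(void* ptr) {
  static FreeFn next = NextSymbol<FreeFn>("sketch_pool_free");
  if (next) {
    next(ptr);
  } else {
    std::free(ptr);
  }
}

// Accessors a test could use to check that interposition happened.
extern "C" std::size_t sketch_interposed_calls() { return g_alloc_calls.load(); }
extern "C" std::size_t sketch_interposed_bytes() { return g_alloc_bytes.load(); }
```

On Linux this would typically be built with something like `g++ -shared -fPIC interpose.cc -o libinterpose.so -ldl` and activated with `LD_PRELOAD=./libinterpose.so` in front of the test program. A test could then assert that the counters are non-zero after exercising the memory pool.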