Allow aggregating allocations in memory in the tracked process #277

godlygeek · 2022-12-23T00:10:42Z

Relates-to: #68

src/memray/_memray/snapshot.h

pablogsal · 2023-01-03T14:32:57Z

Some profiling data of hotspots:

pablogsal · 2023-01-03T14:34:10Z

Seems that locking continues to be a pain (this is self time):

src/memray/_memray/snapshot.cpp

src/memray/_memray/record_writer.cpp

pablogsal · 2023-01-06T17:10:57Z

You need to fix the conflicts in src/memray/commands/run.py from your other PR :)

tests/integration/test_main.py

Previously we handled this by allowing clients of the `RecordWriter` to write each of the 3 types of records used for describing the memory mappings in the capture file. Instead, provide a method that takes all loaded segments and writes them. This is slightly less efficient, but it is a better abstraction and keeps the tracker from needing to know implementation details of how the writer represents memory mappings. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Cut and paste some functions into different locations in the file. This has no effect on the compiled code, but this prefactoring will make the next diff easier to read. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

This refactoring lays the groundwork for introducing an abstract base class that `RecordWriter` extends, and thereby a new type of writer. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

A future commit will make `RecordWriter` an abstract base class with multiple implementations to choose from. This means that every call will necessarily indirect through a vtable, and there will no longer be any advantage to having declared these methods as inline. In preparation for that change, move the methods into the .cpp file. There are no changes to the contents of any of these methods. The only changes are moving them and removing the `inline` declarations. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Going forward, we want `RecordWriter` to be the name of an abstract base class with multiple implementations. Call our original implementation `StreamingRecordWriter`, as it streams each record to the provided sink immediately, without any aggregations or pre-processing. Keep the original `RecordWriter` name as an alias for now, to avoid needing to change clients of the `RecordWriter`. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

This lays the groundwork for the reader to recognize and decode different file formats. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Keep the utility methods for writing objects of different types to the sink in the base class, so that they can be shared by different writers. Make `d_sink` a protected member of the parent class, so that these utility methods in the parent class have a sink to work with. Rather than construct instances of subclasses directly, expose a `createRecordWriter` factory function that creates and configures an appropriate `RecordWriter` instance based on the provided `FileFormat`. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
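The shape described above (shared write helpers and the sink in the base class, plus a factory keyed on the file format) can be sketched in Python. This is only an illustration, not the C++ code from this PR: the method names `write_allocation`, `finish`, and `_write_line` are hypothetical, the text-based record layout is invented, and the aggregating writer here simply sums per location rather than computing high water mark or leak data.

```python
import io
from abc import ABC, abstractmethod
from enum import Enum


class FileFormat(Enum):
    ALL_ALLOCATIONS = 1
    AGGREGATED_ALLOCATIONS = 2


class RecordWriter(ABC):
    """Base class: owns the sink and the helpers shared by all writers."""

    def __init__(self, sink):
        self._sink = sink  # plays the role of the protected `d_sink` member

    def _write_line(self, text):
        # Hypothetical shared helper; every subclass writes through it.
        self._sink.write(text + "\n")

    @abstractmethod
    def write_allocation(self, location, size):
        ...

    @abstractmethod
    def finish(self):
        ...


class StreamingRecordWriter(RecordWriter):
    """Writes each record to the sink as soon as it arrives."""

    def write_allocation(self, location, size):
        self._write_line(f"ALLOC {location} {size}")

    def finish(self):
        pass  # nothing is buffered


class AggregatingRecordWriter(RecordWriter):
    """Simplified stand-in: buffers per-location totals until finish()."""

    def __init__(self, sink):
        super().__init__(sink)
        self._totals = {}

    def write_allocation(self, location, size):
        count, total = self._totals.get(location, (0, 0))
        self._totals[location] = (count + 1, total + size)

    def finish(self):
        for location, (count, total) in sorted(self._totals.items()):
            self._write_line(f"AGG {location} {count} {total}")


def create_record_writer(sink, file_format):
    """Factory: callers never name a concrete writer class."""
    if file_format is FileFormat.ALL_ALLOCATIONS:
        return StreamingRecordWriter(sink)
    if file_format is FileFormat.AGGREGATED_ALLOCATIONS:
        return AggregatingRecordWriter(sink)
    raise ValueError(f"unknown file format: {file_format!r}")
```

With this layout, a caller that only holds a `RecordWriter` can switch capture formats by changing the `FileFormat` argument passed to the factory.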

Our existing aggregators don't need much information about the effects of any given `munmap()` call. One cares only about how it changed the total number of heap bytes tracked by memray, and another doesn't even care about that as long as the interval tree is updated to reflect the new set of allocations. We'll be introducing a new aggregator that needs additional information, though. Specifically, it needs to know which allocations were affected by the `munmap` (so that it can attribute the freed bytes to the correct location), and whether they were completely freed, shrunk, or split into two separate pieces (so that it can adjust the number of allocations associated with each location). Make `removeInterval` expose extra information about what effects the `munmap` call had on existing allocations, to support this new use case. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
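To make the three outcomes concrete, here is a minimal Python sketch of a removal that reports which intervals were fully freed, shrunk, or split. It uses a plain list with a linear scan instead of an interval tree, and the names `IntervalTable` and `RemovalStats` and their fields are hypothetical; the actual return type of `removeInterval` in this PR may look different.

```python
from dataclasses import dataclass, field


@dataclass
class RemovalStats:
    total_freed_bytes: int = 0
    freed: list = field(default_factory=list)   # (value, bytes) fully removed
    shrunk: list = field(default_factory=list)  # (value, bytes) cut from one end
    split: list = field(default_factory=list)   # (value, bytes) cut from the middle


class IntervalTable:
    """Stores half-open [begin, end) ranges, each tagged with a value."""

    def __init__(self):
        self._intervals = []

    def add(self, begin, size, value):
        self._intervals.append((begin, begin + size, value))

    def remove(self, begin, size):
        end = begin + size
        stats = RemovalStats()
        kept = []
        for lo, hi, value in self._intervals:
            if hi <= begin or end <= lo:
                kept.append((lo, hi, value))  # no overlap, untouched
                continue
            removed = min(hi, end) - max(lo, begin)
            stats.total_freed_bytes += removed
            if begin <= lo and hi <= end:
                # The whole interval is covered: it disappears.
                stats.freed.append((value, removed))
            elif lo < begin and end < hi:
                # The hole is strictly inside: two pieces remain.
                kept.append((lo, begin, value))
                kept.append((end, hi, value))
                stats.split.append((value, removed))
            else:
                # Exactly one end is cut off: one smaller piece remains.
                if lo < begin:
                    kept.append((lo, begin, value))
                else:
                    kept.append((end, hi, value))
                stats.shrunk.append((value, removed))
        self._intervals = kept
        return stats
```

An aggregator that counts allocations per location would leave its count unchanged for `shrunk`, decrement it for `freed`, and increment it for `split`, while subtracting the removed bytes in all three cases.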

By tracking the last known high water mark as of the last allocation or deallocation at any given location, we can report records contributing to the high water mark or records indicating leaked allocations by just looping over the history associated with each allocation location. Tracking all of the information that we need takes time proportional to the number of allocations, and reporting the high water mark or leaked records takes time proportional to the number of distinct allocation locations. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
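The bookkeeping described above can be sketched as follows. This is a simplified Python model written from the description in this commit message, not a transcription of the C++ `HighWaterMarkAggregator`; the class and field names are hypothetical. Each location lazily remembers its usage as of the most recent peak it knows about, plus the deltas recorded since then.

```python
from dataclasses import dataclass


@dataclass
class UsageHistory:
    last_known_peak: int = 0   # peak number the "at peak" totals refer to
    count_at_peak: int = 0
    bytes_at_peak: int = 0
    count_since: int = 0       # deltas recorded after that peak
    bytes_since: int = 0


class HighWaterMarkSketch:
    def __init__(self):
        self.histories = {}
        self.current_heap = 0
        self.heap_at_last_peak = 0
        self.peak_count = 0

    def _record(self, location, count_delta, bytes_delta):
        # The heap is about to shrink after exceeding the previous peak,
        # so the moment just before this call was a new peak.
        if bytes_delta < 0 and self.current_heap > self.heap_at_last_peak:
            self.peak_count += 1
            self.heap_at_last_peak = self.current_heap
        h = self.histories.setdefault(location, UsageHistory())
        if h.last_known_peak != self.peak_count:
            # Everything recorded so far happened before the newest peak.
            h.count_at_peak += h.count_since
            h.bytes_at_peak += h.bytes_since
            h.count_since = 0
            h.bytes_since = 0
            h.last_known_peak = self.peak_count
        h.count_since += count_delta
        h.bytes_since += bytes_delta
        self.current_heap += bytes_delta

    def allocate(self, location, size):
        self._record(location, 1, size)

    def deallocate(self, location, size):
        self._record(location, -1, -size)

    def high_water_mark(self):
        # If the heap is above the last committed peak, "now" is the peak.
        now_is_peak = self.current_heap > self.heap_at_last_peak
        result = {}
        for location, h in self.histories.items():
            count, size = h.count_at_peak, h.bytes_at_peak
            if now_is_peak or h.last_known_peak != self.peak_count:
                count += h.count_since
                size += h.bytes_since
            if count or size:
                result[location] = (count, size)
        return result

    def leaks(self):
        result = {}
        for location, h in self.histories.items():
            count = h.count_at_peak + h.count_since
            size = h.bytes_at_peak + h.bytes_since
            if count or size:
                result[location] = (count, size)
        return result
```

Each `allocate` or `deallocate` call does a constant amount of work, and both report methods make a single pass over the per-location histories, which matches the costs stated in the commit message.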

This allows callers to iterate over every node in the tree by calling `nextNode` for each valid index. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

This implementation of the `RecordWriter` interface processes the allocations fed to it in-memory in order to detect the high water mark records or the leaked records. Once all allocations have been processed, it writes information about either the high water mark records or the leaked records to the capture file. It does not retain history of each allocation that was performed, only the number of allocations and number of bytes that each allocation location contributed to the high water mark or to the leaked allocations. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

No interface changes are required to support returning aggregated allocations, as we were already returning an allocation object which included `n_allocations` as a field. However, interface changes are required to support the heap memory size over time graph. Up until now the `RecordReader` has emitted only the rss size and time, and the caller needed to augment those by tracking the heap size on its own and adding that field. With aggregated capture files the caller can't do this (since it has no way of knowing the heap size over time), so instead we need to emit a different type of record, `MemorySnapshot`. The new record type will come only from aggregated capture files, since capture files with all allocations don't contain enough information for the record reader to provide the heap size without tracking it itself. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
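The reason a caller can derive heap size from a full capture but not from an aggregated one can be shown with a small Python sketch. The event tuples and the field names `time_ms`, `rss`, and `heap` are assumptions made for this example; they are not the reader's actual record layout.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class MemorySnapshot:
    time_ms: int
    rss: int
    heap: int


def snapshots_from_all_allocations(events: Iterable[Tuple]) -> Iterator[MemorySnapshot]:
    """Augment RSS-only memory records with a heap size the caller tracks.

    `events` is an ordered stream of ("alloc", size), ("free", size), and
    ("memory", time_ms, rss) tuples, standing in for a capture file that
    contains every allocation.
    """
    heap = 0
    for event in events:
        kind = event[0]
        if kind == "alloc":
            heap += event[1]
        elif kind == "free":
            heap -= event[1]
        elif kind == "memory":
            _, time_ms, rss = event
            yield MemorySnapshot(time_ms, rss, heap)
```

An aggregated capture has no per-allocation event stream to replay like this, so the heap value has to be stored in the file when the memory record is written.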

Expose the `FileFormat` enum from the `memray._memray` module, and let users specify a file format to use when constructing a `Tracker`. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

The `AggregatingRecordWriter` throws the thread records it gets into an `unordered_map`, which can allocate memory. Don't try to capture that allocation. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Update `TestHighWatermark` and `TestLeaks` to exercise both capture file formats. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Apparently Alpine implements `calloc` in terms of `malloc`, and `posix_memalign` in terms of `aligned_alloc`. Ensure that we don't record duplicate allocations on Alpine. Our `ALL_ALLOCATIONS` capture file format incorrectly recorded both the `calloc` and the `malloc` it is implemented in terms of, but because the `malloc` happens first and returns the same pointer as the `calloc` later returns, the aggregator wound up overwriting the allocation record for the `malloc` with the one for the `calloc`, silently dropping it. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Add a test helper to `memray._memray` to help us exercise `HighWaterMarkAggregator`. I originally intended to put this in `_memray_test_utils.pyx`, but because that doesn't link in any `.cpp` files or `libunwind`, that turned out to require a lot of changes, and it was easier to just put this in `_memray.pyx`. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Some operations aren't supported when working with aggregated capture files, and we raise if you attempt them. Add tests exercising this. Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

pablogsal

LGTM

This PR is already gigantic, and I have reviewed it twice without finding anything that appears to be incorrect, so I prefer to land it now and fix things later if we discover that we missed something.

Excellent job, @godlygeek! 🚀

godlygeek self-assigned this Dec 23, 2022

godlygeek force-pushed the record_aggregated_allocations branch 9 times, most recently from 4c4716e to bbe7644 on December 24, 2022 06:31

godlygeek commented Dec 24, 2022


godlygeek force-pushed the record_aggregated_allocations branch from bbe7644 to e49af89 on December 30, 2022 23:37

godlygeek marked this pull request as ready for review December 30, 2022 23:49

godlygeek force-pushed the record_aggregated_allocations branch 2 times, most recently from 4afec07 to 127874a on December 31, 2022 01:45

godlygeek force-pushed the record_aggregated_allocations branch 4 times, most recently from e21c1ab to 1e805fe on January 4, 2023 23:39

pablogsal reviewed Jan 6, 2023


godlygeek force-pushed the record_aggregated_allocations branch 2 times, most recently from 3c5a151 to d895ff0 on January 7, 2023 01:46

pablogsal force-pushed the main branch from e5d4262 to 8f62df2 on January 13, 2023 22:33

godlygeek added 3 commits January 17, 2023 12:50

Rearrange code in record_writer.h

63515df


Drop public template functions in RecordWriter

2aa277f


godlygeek added 18 commits January 17, 2023 13:53

Add a file format enum to our record header

6cb7e81


Expose the size of the FrameTree graph

4603496


Let Tracker API users choose the file format

026e2a6


docs: Document the FileFormat enum

c7d4162

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Ignore allocations under registerThreadNameImpl

06f48b4


tests: Test memray parse on aggregated captures

67b3130

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

Give memray run a new --aggregate option

83efcd7

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>

tests: Exercise AGGREGATED_ALLOCATIONS captures

ebce280


test: Exercise unsupported FileReader operations

fb14387


godlygeek force-pushed the record_aggregated_allocations branch from d895ff0 to fb14387 on January 17, 2023 19:09

pablogsal mentioned this pull request Jan 18, 2023

Add the option to do statistical sampling of allocations #94

Closed

pablogsal approved these changes Jan 19, 2023


pablogsal merged commit 95e8210 into bloomberg:main Jan 19, 2023
