
ENH: memory usage profiling #80

Merged: 11 commits merged into main, Jul 18, 2024
Conversation

@tcpan (Collaborator) commented Jun 7, 2024

Added memory usage profiling for file writes and reads. The read-test memory usage is measured per function call and reported as min/mean/max.

Also reporting the average wall time for the read function calls. Comparing this to the "sec" times can indicate the potential presence of multithreaded code.
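
As a rough illustration of the approach (a minimal sketch with hypothetical names, not the benchmark's actual code), per-call memory can be sampled with memory_profiler's memory_usage and summarized as min/mean/max, alongside an average wall time for the same calls:

import time
from memory_profiler import memory_usage

def read_waveform():
    # placeholder workload; the real benchmark reads waveform chunks
    return [i * i for i in range(100_000)]

samples, wall_times = [], []
for _ in range(3):
    start = time.perf_counter()
    # memory_usage polls the process RSS (in MiB) while the callable runs;
    # interval controls the polling period
    samples.extend(memory_usage((read_waveform, (), {}), interval=0.01))
    wall_times.append(time.perf_counter() - start)

print(f"Mem MiB (min/mean/max): {min(samples):.4f}/"
      f"{sum(samples) / len(samples):.4f}/{max(samples):.4f}")
print(f"avg walltime: {sum(wall_times) / len(wall_times):.4f} sec")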

@briangow (Collaborator) commented Jun 7, 2024

Thanks @tcpan ! I started a test run of this on my Mac with the p169 MIMIC-IV waveform file from the repo:

Format: waveform_benchmark.formats.wfdb.WFDBFormat16
         (WFDB with 16-bit binary storage)
Record: /Users/briangow/Dropbox/MIT/chorus/data/mimic_iv/waveforms/waves/p169/p16955095/82284982/82284982
         604 seconds x 5 channels
         566400 timepoints, 562240 samples (99.3%)
________________________________________________________________________________________________________________________________
Read performance (median of N trials):
 #seek  #read      KiB      sec   walltime                Mem (min/mean/max)    [N]
   nan    nan      nan  17.9688     4.8178   (143.4844/144.1875/145.5625)    [3] read 1 x 604s, all channels
   nan    nan      nan  89.4565    23.2836   ( 90.2656/102.5750/145.5781)    [3] read 5 x 500s, all channels

It would be nice to have units for walltime and Mem. More importantly, the reported read times in the sec column were much slower than expected. I confirmed that they are indeed much faster when switching back to main and running the same file again:

________________________________________________________________
Read performance (median of N trials):
 #seek  #read      KiB      sec     [N]
   nan    nan      nan   0.0050  [1206] read 1 x 604s, all channels
   nan    nan      nan   0.0201   [361] read 5 x 500s, all channels

Do you know why the memory profiler seems to be interfering with the read time test?

@tcpan (Collaborator, Author) commented Jun 14, 2024

I can replicate this; I think it has to do with the number of times I call the memory_profiler function, and possibly with the package's internal function calls. I will look for an alternative solution.

@tcpan (Collaborator, Author) commented Jul 2, 2024

I am creating a new pull request on this topic.

@tcpan tcpan closed this Jul 2, 2024
@tcpan tcpan reopened this Jul 2, 2024
@tcpan (Collaborator, Author) commented Jul 2, 2024

Actually, I am reusing this pull request.

The new benchmark uses memory_profiler and tracemalloc, as well as the resource usage package (already part of the PerformanceCounter class). memory_profiler uses psutil internally and is applied at a coarse level (e.g. the 50 x 50s case), so the overhead is not zero but should be a lot lower.
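
Roughly, the three measurements can be combined per call along these lines (a sketch with my own naming, not the PR's code; it substitutes a single psutil RSS reading for memory_profiler's polling, and assumes Linux, where ru_maxrss is reported in KiB):

import resource
import tracemalloc

import psutil

def profile_call(fn, *args, **kwargs):
    proc = psutil.Process()                     # current process
    tracemalloc.start()
    result = fn(*args, **kwargs)
    # peak bytes allocated by Python code itself during the call
    _, malloced_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    used_mb = proc.memory_info().rss / 2**20    # resident set size right now
    # peak RSS over the process lifetime; KiB on Linux (bytes on macOS)
    maxrss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    return result, used_mb, maxrss_mb, malloced_peak / 2**20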

The results look like this:
========== combined: memory_profiler/rusage maxrss/tracemalloc(self)
Read performance (median of N trials):
 #seek  #read      KiB      sec   walltime   Mem MB (used/maxrss/malloced)    [N]
     0     -1      800   0.0055     0.0154   1927.1328/4093.8281/ 0.0281    [653] read 1 x 214981s, all channels
     0     -1     3988   0.0280     0.0750   1928.3281/4093.8281/ 0.0285    [134] read 5 x 500s, all channels
     0     -1    39844   0.2218     0.5299   1928.3516/4093.8281/ 0.0588     [20] read 50 x 50s, all channels
     0     -1   398452   2.2707     4.7811   1927.6992/4093.8281/ 0.0610      [3] read 500 x 5s, all channels
     0     -1      800   0.0048     0.0153   2917.5391/4093.8281/ 0.0280    [654] read 1 x 214981s, one channel
     0     -1     3976   0.0189     0.0477   1927.7422/4093.8281/ 0.0285    [212] read 5 x 500s, one channel
     0     -1    39856   0.2088     0.5028   1927.8086/4093.8281/ 0.0635     [21] read 50 x 50s, one channel
     0     -1   398108   2.0060     4.4357   1927.8242/4093.8281/ 0.0666      [3] read 500 x 5s, one channel

This was tested with the numpy format on my 8th-gen Intel laptop CPU. One interesting thing compared to the other formats is how low the memory usage is. My theory is that its use of memmap is helping; however, its memory usage may go up with a deep copy.
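
For illustration of that theory only (a toy sketch, not the benchmark's numpy format code; the file name and shape here are arbitrary):

import numpy as np

# create a small file to map, then reopen it read-only (illustrative only)
np.memmap("waveform.dat", dtype=np.int16, mode="w+", shape=(5, 566_400)).flush()
data = np.memmap("waveform.dat", dtype=np.int16, mode="r", shape=(5, 566_400))

view = data[:, 1000:26_000]   # still file-backed; little extra resident memory
copy = np.array(view)         # deep copy; the whole slice is materialized in RAM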

@briangow (Collaborator) commented:

@tcpan thanks for this update! This is running much faster now. However, it does still slow down the read test times somewhat. I think we discussed making the memory profiler optional to run; if you agree, I think we should do that.

@briangow briangow requested a review from bemoody July 11, 2024 18:32
@tcpan (Collaborator, Author) commented Jul 12, 2024

I've made memory profiling optional. I have also moved the walltime measurement into the PerformanceCounter class. Walltime is higher than CPU time because it can include I/O and other system waits.
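
A minimal illustration of that difference (not the PerformanceCounter implementation itself): perf_counter keeps ticking through sleeps and I/O waits, while process_time counts only CPU seconds.

import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.5)                      # stands in for I/O or other system waits
sum(i * i for i in range(10**6))     # stands in for actual CPU work

print(f"walltime: {time.perf_counter() - wall_start:.3f} sec")
print(f"cpu time: {time.process_time() - cpu_start:.3f} sec")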

@briangow (Collaborator) commented:

@tcpan thanks for the updates! I see that you now also do memory profiling for the waveform write. It takes significantly longer to write the waveform when the profiler is enabled; did you expect this?

Also, since this seems to be getting close to being merged it'd be good to add a section about how this works to https://github.com/chorus-ai/chorus_waveform/blob/main/BENCHMARK.md .

@tcpan (Collaborator, Author) commented Jul 17, 2024

Memory profiling for the write was there before. It is expected that the write will take longer: the Python memory_profiler package polls memory usage during execution. The PerformanceCounter class is lighter weight, but it can only report the maximum memory usage, not the memory usage during the call, hence memory_profiler is still being used.

I updated the BENCHMARK.md file.

@briangow (Collaborator) commented:

Thanks @tcpan !

@briangow briangow merged commit 93ce7aa into main Jul 18, 2024
1 check passed
@tompollard tompollard deleted the memory_profile branch July 18, 2024 18:17