Is multi-processing supported? #35

Closed
semaphore-egg opened this issue Apr 21, 2022 · 6 comments
@semaphore-egg

Thank you guys for this amazing beautiful cool tool!

Feature Request

I am dealing with some memory problems related to pytorch dataloader for several days. And just tried memray with a simple script below. I found that in the live mode, the information of main process is reported but all processes are detected as threads and no information is reported.

from torch.utils.data import Dataset, DataLoader
import numpy as np
import torch
import sys

class DataIter(Dataset):
    def __init__(self):
        n = int(2.4e7)
        self.data = [x for x in range(n)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        data = np.array([data], dtype=np.int64)
        return torch.tensor(data)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=12)

for i, item in enumerate(train_loader):
    if i % 1000 == 0:
        print(i, end='\t', flush=True)

Screenshot of the main process: [Screenshot from 2022-04-21 22-56-22]

Screenshot of a worker process: [Screenshot from 2022-04-21 22-50-01]

The following command was used: memray run --live simple_multi_worker.py

Is there a way to observe multi-processing information?

@godlygeek
Contributor

Is there a way to observe multi-processing information?

Not with live mode. We don't currently have a way for one UI to ingest data from multiple processes.

What we do have is the --follow-fork option for memray run. That will cause it to write one output file per child process, and you can then inspect each of those output files individually, for instance by using memray flamegraph to generate a flame graph for each that you can open up in a browser.

This will only work if it's forking and not exec'ing - meaning that it will be able to gather meaningful data if you use a multiprocessing.Pool, but not if you use a subprocess.run() call. As far as I can tell at a quick glance, though, DataLoader seems to be using multiprocessing, and so this ought to work.

--follow-fork mode is pretty new, so there may still be some kinks to work out - try it and let me know if you hit any issues.
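Roughly, the workflow looks like this (a sketch only; the capture file names below are illustrative, since memray picks them based on the script name and the PIDs of the processes involved):

# one capture file for the main process, plus one per forked worker
memray run --follow-fork simple_multi_worker.py

# then render each capture file separately (file names here are placeholders)
memray flamegraph memray-simple_multi_worker.py.<pid>.bin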

godlygeek added the question label on Apr 21, 2022
@rossjp

rossjp commented Apr 21, 2022

Are there any plans to create/extend a reporter to accept and integrate data from multiple capture files? I'm wrapping a multi-worker gunicorn process with memray and I end up with a capture file per worker. Inspecting them separately is useful, but inspecting them all merged together would also provide some insights.

@godlygeek
Contributor

There aren't any such plans. When we discussed amongst ourselves, the consensus was that trying to analyze information from multiple processes at the same time was likely to cause more confusion than anything else, and we had trouble coming up with any cases where seeing, say, multiple workers at once would tell you anything that you wouldn't be able to identify by analyzing them individually.

In fact, for the gunicorn case, I would think that what would make the most sense is just to drop the number of workers down to 1 while you're investigating it, so that all requests are reaching the same worker instance.
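Something along these lines, for instance (a sketch only; myapp.wsgi:application, the output path, and the exact way you launch gunicorn are placeholders for whatever your deployment actually uses):

# run gunicorn with a single worker under memray and write one capture file
memray run --output gunicorn-worker.bin $(which gunicorn) --workers 1 myapp.wsgi:application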

But you might be seeing something we didn't - can you describe a case where there's some interesting feature of the memory usage of a pool of worker processes that would be difficult to identify by looking at their allocations individually, but easy to identify by looking at their allocations in aggregate?

@semaphore-egg
Author

semaphore-egg commented Apr 22, 2022

Great, --follow-fork works! Here is another question.

The script I provided was meant to trace the copy-on-write caused by accessing Python objects from a forked process. Accessing a Python object from a forked process changes its reference count and thus triggers page duplication. It seems that memray does not report memory consumption related to COW.

So does that mean we cannot use memray to trace COW?

@pablogsal
Member

So does that mean we cannot use memray to trace COW?

Memray traces two things:

  • Requests for allocations to the system allocators: these include malloc, mmap, calloc, realloc, valloc... and a bunch more.
  • The resident size, sampled every few milliseconds directly from the kernel.

When a process is forked, the memory maps are shared between the parent and the child until a write happens, as you indicate. When the write happens, it triggers an implicit interrupt (a page fault) generated directly by the MMU, which in turn causes the kernel to update the page table with new (writable) pages, decrement the number of references, and perform the write.

This means that all of this happens in kernel space, and therefore memray cannot really "see" anything here. The only thing memray will be able to see is that the resident size is increased by the kernel when that happens. We don't really have a way to know what operation caused this, as it is deeply underneath us and would require instrumentation or similar.

So the answer is, sadly, that it is very unlikely you can use most common profilers to properly trace COW unless they allow instrumentation (like valgrind does).
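
For illustration, here is a minimal, Linux-only sketch of the effect (this is not memray output; it reads /proc/self/smaps_rollup, which needs a reasonably recent kernel, and the exact numbers will vary by system):

import os

def private_dirty_mb():
    # Memory this process has had to copy for itself (Linux >= 4.14).
    with open("/proc/self/smaps_rollup") as f:
        for line in f:
            if line.startswith("Private_Dirty:"):
                return int(line.split()[1]) / 1024  # value is reported in kB
    return 0.0

# A large list of distinct int objects, allocated before the fork.
data = list(range(10_000_000))

pid = os.fork()
if pid == 0:
    before = private_dirty_mb()
    # The child only *reads* the list, but iterating it increments each
    # element's refcount, dirtying the shared pages and forcing the kernel
    # to copy them into the child.
    total = sum(data)
    after = private_dirty_mb()
    print(f"child private dirty memory: {before:.0f} MiB -> {after:.0f} MiB (sum={total})")
    os._exit(0)
else:
    os.waitpid(pid, 0)

The child makes no new Python-level allocations, so allocator hooks see nothing, yet its private (copied) memory grows by roughly the size of the list.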

@semaphore-egg
Author

This is pretty reasonable. Thank you guys so much!
