
Why does load_balanced_view use so much memory? #539

Open
GF-Huang opened this issue Jul 28, 2021 · 14 comments

@GF-Huang

[screenshot: original report]

@minrk (Member) commented Aug 2, 2021

The IPython client is spending all its time serializing 100k messages here. Two things contribute. First, a load-balanced view creates one message per item by default, which is 100k tasks in this case. That's a lot! Second, IPython has special handling of numpy arrays: it preserves the arguments as numpy arrays through serialization and reconstructs the result as a numpy array. That means IPython is creating 100,000 single-element numpy arrays to send and then serializing them. This saves a lot when sending large arrays, but costs a lot when sending a very large number of tiny arrays. It's not sending 1 100k times, it's sending np.ndarray([1]) 100k times. That's where ~all the time and memory is being spent.

You'll see much better behavior if you don't use a numpy array for this very simple case, or if you use e.g. chunksize=1000 to reduce the number of messages.

You might also consider the new LoadBalancedView.imap which sends messages more efficiently when you have a very large input stream by only submitting a limited number of messages and waiting for results before preparing and serializing more.
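A minimal sketch of both suggestions, assuming ipyparallel 7's Cluster context manager (which later comments show in use); task and the counts here are illustrative placeholders, not code from the thread:

import ipyparallel as ipp

def task(x):
    # illustrative stand-in for the real per-item computation
    return x * 2

with ipp.Cluster(n=4) as rc:
    view = rc.load_balanced_view()

    # plain Python ints serialize cheaply (no per-item numpy arrays),
    # and chunksize=1000 batches 100k items into ~100 messages
    ar = view.map(task, range(100_000), chunksize=1000)
    results = ar.get()

    # imap streams the input instead, keeping only a bounded number
    # of messages in flight at a time
    for r in view.imap(task, range(100_000)):
        pass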

@GF-Huang (Author) commented Aug 4, 2021

How do I use wait_interactive for an imap result?

@minrk (Member) commented Aug 4, 2021

imap returns a generator, unlike other methods that return AsyncResult objects, so AsyncResult methods such as wait_interactive are not available. You can use tqdm directly, if you like:

import tqdm  # progress bar, standing in for wait_interactive

source = range(1024)
gen = view.imap(lambda x: x, source)
for result in tqdm.tqdm(gen, total=len(source)):
    ...

@GF-Huang (Author) commented Aug 5, 2021

I don't know why it takes a long time but shows no progress.

[screenshot]

You can see it takes only 30+ ms per combination locally.

[screenshot]

@minrk (Member) commented Aug 5, 2021

After you cancel, what do you get for c.queue_status()?
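A hedged sketch of that diagnostic, assuming rc is the connected Client (the c above); the shape in the comments is illustrative:

# assumes `rc` is the connected ipyparallel Client (the `c` above)
status = rc.queue_status()
print(status)
# illustrative shape: a dict keyed by engine id plus 'unassigned',
# e.g. {0: {'queue': 0, 'completed': 12, 'tasks': 0}, ..., 'unassigned': 0}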

@GF-Huang (Author) commented Aug 5, 2021

It has been more than 1 minute so far and it is still stuck.

[screenshot]

@GF-Huang (Author) commented Aug 5, 2021

I took a look at the Windows Task Manager: python shows no CPU usage, but it is still stuck.

@minrk (Member) commented Aug 5, 2021

Do simple executions work? rc[:].apply_sync(os.getpid) and view.apply_sync(os.getpid)?
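As a runnable snippet, assuming the rc and view from earlier in the thread:

import os

# sanity check that the controller and engines respond at all;
# the direct view returns one pid per engine, the load-balanced
# view a single pid from whichever engine ran the task
print(rc[:].apply_sync(os.getpid))
print(view.apply_sync(os.getpid))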

@GF-Huang (Author) commented Aug 5, 2021

It is still stuck. I think I should restart the kernel.

[screenshot]

@GF-Huang (Author) commented Aug 5, 2021

After cancelling, it is still stuck. Perhaps because the cluster has been released by the with ... as ... block?

[screenshot]
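If so, that would explain the hang: once the with block exits, the cluster is shut down and later client calls can fail or stall (minrk makes the same point below). A minimal sketch of the scoping, assuming ipyparallel's Cluster context manager:

import ipyparallel as ipp

with ipp.Cluster(n=2) as rc:
    view = rc.load_balanced_view()
    rc.queue_status()  # fine here: the controller is still running

# past this point the cluster is shut down, so further client calls
# (queue_status, apply_sync, ...) can fail or appear to hang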

@minrk (Member) commented Aug 5, 2021

Can you call getpid before the code that's causing the problem?

@GF-Huang (Author) commented Aug 5, 2021

Seems to work well.

[screenshot]

@minrk (Member) commented Aug 6, 2021

Does map work with a simpler operation (start with echo, maybe return the same data type as your real task)? I'm trying to isolate the issue. A hang is certainly weird unless the processes are actually stuck on something. It's very strange that queue_status() would hang, since that doesn't involve the engines at all. Make sure you are calling that within the context manager if you are using it, or otherwise while the client's connection is still open and the controller still running.
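A sketch of that isolation test, assuming the view from above; echo and the sizes are placeholders:

def echo(x):
    # trivial task: returns its input unchanged, so any remaining
    # slowness comes from messaging/serialization, not computation
    return x

# run inside the Cluster context manager, while the controller is up
ar = view.map(echo, range(1000), chunksize=100)
ar.wait_interactive()
print(ar.get()[:5])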

@GF-Huang (Author) commented Aug 6, 2021

It seems very slow. My machine has 24 cores.

[screenshot]

minrk added the question label Sep 3, 2021