Simulation: add max_calls arg to ray.remote to avoid Ray::IDLE processes on GPUs #1384
Conversation
Thanks for the PR @mofanv! CC'ing @pedropgusmao @jafermarq @Ryan0v0, thoughts?
Thanks for looking into this @mofanv. Have you noticed any slowdown in training when, for example, you use many workers (i.e. more than you can have running concurrently)? I suspect using `max_calls=1` could introduce some overhead in that setting.
Hi, I have not noticed any significant slowdown with the example at flower/examples/simulation_pytorch/main.py (Line 159 in aa9b946). However, I changed the model to mobilenetv3 and it started to give CUDA out-of-memory errors.
I'd imagine the slowdown will only happen in settings where you have N clients per round but your system (e.g. GPUs and CPUs) can only accommodate M at a given time, with M < N. For example, if you have 50 clients per round but your system can only run 4 concurrently, the remaining 46 will be scheduled by Ray to run later, once some resources have been freed. In this example, I would imagine that having `max_calls=1` could add some overhead, since the workers for the remaining clients have to be re-created rather than reused.
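For concreteness, here is a minimal sketch of how this N-clients-on-M-slots scheduling is typically configured in a Flower simulation; the numbers and the `ServerConfig` usage are illustrative and assume a recent flwr version:

```python
import flwr as fl

def client_fn(cid: str):
    """Hypothetical client factory; returns the Flower client for `cid`."""
    ...

# 50 clients per round, but each client reserves 0.25 of a GPU: on a
# single-GPU machine Ray runs at most 4 clients concurrently and queues
# the remaining 46 until resources are freed.
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=50,
    client_resources={"num_cpus": 1, "num_gpus": 0.25},
    config=fl.server.ServerConfig(num_rounds=3),
)
```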
Thanks for the clarification. I wonder whether such re-allocation would cause a significant overhead, as I imagine most of the execution time is spent on model training on the GPUs anyway. Also, if I understand correctly, clients (i.e. Ray workers) kept resting for later use still hold memory, and this may add up to a large amount when clients are training a large model. I tried 8 clients on 8 GPUs (i.e. one client per GPU), and I have also tested the original example.
Which version of Ray do you use? Ray 1.12 resulted in an issue similar to what you describe, with more and more Ray::IDLE processes accumulating.
I see. I was using Ray 1.13, and after downgrading it to 1.11.1, no Ray::IDLE processes accumulate any more.
I checked again. This OOM problem also exists in Ray 2.0.0.
@mofanv Where should the `max_calls` argument be added?
I'm not sure whether it is still an issue in the current version. I think the official suggestion is to downgrade the Ray version.
@mofanv Thank you. Another workaround that worked for me was to limit the number of CPU cores that Ray can utilize to be equal to the number of CPU cores used by the clients of a single round. This can be set in the initialization parameters of Ray, e.g. via `ray_init_args` in `start_simulation`.
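A minimal sketch of that workaround, assuming the `ray_init_args` parameter of `fl.simulation.start_simulation` (available in recent flwr versions); the CPU count of 5 is illustrative:

```python
import flwr as fl

def client_fn(cid: str):
    """Hypothetical client factory; returns the Flower client for `cid`."""
    ...

# Cap the CPU cores Ray may use so that no more worker processes stay alive
# than the number of clients actually running concurrently in a round.
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    client_resources={"num_cpus": 1, "num_gpus": 0.2},
    ray_init_args={"num_cpus": 5},
    config=fl.server.ServerConfig(num_rounds=3),
)
```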
Thank you very much, you really saved my life. I searched all night long and finally solved the problem: changing all `@ray.remote` decorators to `@ray.remote(max_calls=1)` worked for me. As for downgrading the Ray version, that didn't work for me.
Potentially useful: https://docs.ray.io/en/latest/ray-core/tasks/using-ray-with-gpus.html#workers-not-releasing-gpu-resources

In my case, a very simple FL run on a tiny dataset ran out of memory on my 24 GB RTX 3090 after about 15 clients / 5 rounds, because new worker processes were being launched for every round and not being freed. Additionally, this behaviour can vary from system to system, because Ray seems to keep idle workers around depending on the resources available.

As @vtsouval suggested, pointing users to limit the number of CPUs Ray can use would help.

(Caveat: I'm a first-time Flower user, so I could be off-base here. That being said, this was the first snag I hit using it and I'm probably not the only one, so +1 for finding a fix for this.)
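For reference, the pattern from the linked Ray docs page in plain Ray, independent of Flower (a sketch with a placeholder training body): with `max_calls=1`, the worker process exits after each task, so GPU memory cached by the ML framework is actually released.

```python
import ray

ray.init(num_gpus=1)

# max_calls=1 makes Ray tear the worker process down after a single task,
# forcing frameworks like PyTorch/TensorFlow to release the GPU memory they
# would otherwise keep cached inside a long-lived worker.
@ray.remote(num_gpus=1, max_calls=1)
def train_one_client(cid: int) -> int:
    # Placeholder for the actual per-client training code.
    return cid

results = ray.get([train_one_client.remote(cid) for cid in range(4)])
print(results)
```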
@mxbi if you use Ray 1.11 it will work just fine (as I indicated earlier in this thread: #1384 (comment)). Afaik this issue with Flower+Ray happens with Ray 2+. @pedropgusmao @danieljanes, why aren't we pinning Ray 1.11 for Flower simulation? I always downgrade Ray manually.
Tbh, I don't think keeping an older version as the default is good long-term, as this might lead to broken dependencies later on.
I'm running into this issue with flwr==1.3.0 and ray==2.2.0. I need to train 10 clients per round on 1 GPU, but only 5 clients will fit in my GPU memory. Ray keeps creating processes and leaving them "IDLE", consuming memory until I get a CUDA OOM error. I've tried changing the simulation arguments, but the problem persists.
@alexkyllo Although I am using Python 3.9 (ray 2.2.0, flwr 1.2.0), I have chosen not to downgrade the Ray version and still use ray 2.2.0. Firstly, open `ray_client_proxy.py` in the installed flwr package and add `max_calls=1` to the `@ray.remote` decorators there.
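A sketch of what that manual edit looks like; the file path and function name are recalled from flwr 1.2.x and may differ across versions:

```python
# flwr/simulation/ray_transport/ray_client_proxy.py (inside the installed package)
import ray

# Before: @ray.remote               -> idle workers keep GPU memory between rounds
# After:  @ray.remote(max_calls=1)  -> the worker process exits after one call
@ray.remote(max_calls=1)
def launch_and_fit(client_fn, cid, fit_ins):
    # ... original flwr implementation unchanged ...
    ...
```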
Thanks, this works, but I will need to fork the flwr package to make it work across my environments. I was also a little confused by @pedropgusmao's earlier comment. Would it be a good idea to enhance flwr to make this `max_calls` setting configurable?
Reference Issues/PRs
Fixes #1152 and #1376
What does this implement/fix? Explain your changes.
Issue: `@ray.remote` in `ray_client_proxy.py` is called repeatedly when running a simulation. By default, after each client finishes its training, the Ray worker still rests on the GPU as `Ray::IDLE`. This accumulates and causes CUDA memory to run out.

Change: By adding the argument `max_calls=1`, i.e. `@ray.remote(max_calls=1)`, each Ray worker is removed after its client finishes.

Any other comments?
None