problem with multi GPU execution #57
Hello,
Multi-GPU cases are started with mpirun, e.g., with 4 GPU's: mpirun -np 4 -hostfile machines interFoam -parallel -devices "(0 0 0 0)". To do this you will need a ThirdParty-dev with openmpi installed and set up for GPU's, like the one I posted here.
To be transparent about my system setup, most of my RapidCFD usage is on Ubuntu 16.04 machines, running CUDA 8.0, with K20 GPU's. The referenced damBreak tutorial is too large to run on a single K20. However, I can run the referenced tutorial just fine on two K20's on this Ubuntu 16.04, CUDA 8.0 setup. I just repeated running this case (for many time steps, not to completion) on two GPU's this morning to make sure that my information is up to date for you.
However, I do confirm that I see this error when attempting to repeat this damBreak tutorial with Ubuntu 18.04, CUDA 9.0 (installed from the repository) and a single Titan Black GPU. Other test cases I tried (albeit smaller in grid size / lower RAM requirements) work just fine on this machine with Ubuntu 18.04 and CUDA 9.0. I know from experience that this single Titan Black GPU has too little GPU RAM for the referenced damBreak case (a damBreak case with a smaller grid works fine), therefore I am suspicious that the error is simply a matter of insufficient GPU memory.
Therefore, I hope you are successful when running with more than one GPU.
Best regards,
Eric |
Thank you for the detailed reply. I ran it normally, without using the mpirun command, and maybe that is why I got the __copy::H->D error: it was using only one GPU, which has insufficient memory. I have two K20's to work with. I will run the case on them using mpirun and update you here. Thanks again! |
Good luck. I made a small mistake in the above; I left out some details. Some examples:
machines is a text file that contains the server name and the number of CPU's to use on each server.
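For illustration, a hypothetical machines file in the standard OpenMPI hostfile format (the hostnames and slot counts below are placeholders for your own network):

    # "machines" hostfile -- replace hostnames and slot counts with your own servers
    node1 slots=2
    node2 slots=2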
The above (number 3) command is what I just retested moments ago to ensure that the information I am giving you is accurate; however, this may be specific to my network. Good luck! |
Hi Eric, I am getting the following error when I am trying to run the second command given by you. I have done all the ThirdParty installations.
--> FOAM FATAL ERROR:
FOAM exiting
mpirun noticed that the job aborted, but has no info as to the process
|
Do I have to recompile RapidCFD after ThirdParty Installations? |
Yes, unfortunately, you will need to recompile again. |
Hi,
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create mesh for time = 0
PIMPLE: Operating solver in PISO mode
Reading field p_rgh
Reading field U
terminate called after throwing an instance of 'thrust::system::system_error'
I was not getting this error before the ThirdParty installation. Do you have any idea why this is caused and how to fix it?
Regards, |
I am not sure, Manasi, why this error now appears, but it certainly seems related to ThirdParty-dev. However, this is a copy of the ThirdParty-dev I use. Maybe check issue #2, download the ThirdParty-dev linked in the second post of that issue, and recompile RapidCFD (I know you've recompiled many times by now). Daniel noted that the path to CUDA is hard-coded in Allwmake, so please take care of this. Maybe it is better to use this version of ThirdParty-dev, since a person replied to issue #2 that this ThirdParty-dev worked. Good luck! |
Ok, recompiling with the ThirdParty given in #2. However, the given ThirdParty must have been written for CUDA 8.0, whereas I am using CUDA 9.2. This might also lead to some issues.
Also, can you tell me what the output of "echo $FOAM_MPI" is in your RapidCFD installation? The ThirdParty-dev you are using has openmpi-2.1.1, whereas the FOAM_MPI in RapidCFD is openmpi-1.8.4. I had to change the RapidCFD etc/config/settings.sh so that FOAM_MPI becomes openmpi-2.1.1, after which I was able to install the ThirdParty-dev you mentioned successfully, but it still gives the above error when running interFoam. Did you also have to change your RC etc/config/settings.sh to install the ThirdParty-dev you mentioned successfully?
Manasi |
After recompiling with the ThirdParty-dev in #2, I am still getting the same error for interFoam. However, I am able to run icoFoam on 2 GPU's using mpirun -np 2 icoFoam -parallel -devices "(0 1)". The test case is very small though (400 cells). Manasi |
Oh yes, you are correct, etc/config/settings.sh had to be updated for the openmpi version. So if I understand correctly, you can run a small case on two GPU's, but on a larger case you get the error quoted above? |
I tried to come up with detailed instructions on how to install and configure a ThirdParty-dev with openmpi-2.1.1... My system:
Step 1: Get the software
Note that I have multiple RapidCFD setups on my system, so I chose a customized RapidCFD-dev name (here, RapidCFD-dev6) so I do not change any other RapidCFD setup. This may not be needed on your computer, and you can work in the default-named RapidCFD-dev and ThirdParty-dev directories.
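For illustration, getting the software into custom-named directories might look something like this (the RapidCFD URL is the project repository referenced later in this thread; the ThirdParty-dev archive is the one linked from issue #2 and is not reproduced here):

    # clone RapidCFD into a custom-named directory so other installs are untouched
    git clone https://github.com/Atizar/RapidCFD-dev.git RapidCFD-dev6
    # download and unpack the ThirdParty-dev linked in issue #2 alongside it,
    # e.g. into a directory named ThirdParty-dev6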
Step 2: Set up the RapidCFD directory
Only needed if you have multiple RapidCFD installs:
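As a rough, hypothetical sketch (the exact variable names depend on your copy of etc/bashrc, which follows the usual OpenFOAM-2.3.x layout):

    # hypothetical: make the renamed install resolve correctly
    # 1. open RapidCFD-dev6/etc/bashrc
    # 2. if the install path is derived from WM_PROJECT_VERSION (as in stock
    #    OpenFOAM-2.3.x), change that value to dev6 so the scripts find the
    #    renamed directory
    # 3. then source the file (the path shown here is only illustrative):
    source $HOME/RapidCFD/RapidCFD-dev6/etc/bashrc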
Although I did not need to do this on my present machine, if you have a non-repository install of CUDA, maybe add something like the following to the end of bashrc:
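A typical illustration, assuming a CUDA install under /usr/local (adjust the version and path to match your system):

    # example for a non-repository CUDA install; adjust the version and path
    export PATH=/usr/local/cuda-8.0/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH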
More changes to RapidCFD files...
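Based on the FOAM_MPI discussion earlier in this thread, one of these changes is the openmpi version in etc/config/settings.sh; a sketch of the idea (the surrounding case statement may look slightly different in your copy):

    # in RapidCFD-dev6/etc/config/settings.sh, under the OPENMPI case,
    # point FOAM_MPI at the openmpi shipped with this ThirdParty-dev
    export FOAM_MPI=openmpi-2.1.1    # previously openmpi-1.8.4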
Step 3: Set up the ThirdParty directory
Edit the Allwmake file: find the right spot and add the --with-cuda support, e.g.:
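Something along these lines (an illustrative excerpt; as noted earlier the CUDA path is hard-coded, so use the path on your machine):

    # hypothetical excerpt -- in ThirdParty-dev6/Allwmake, find the openmpi
    # configure step and append CUDA support to the existing options:
    ./configure --prefix=$MPI_ARCH_PATH --with-cuda=/usr/local/cuda-8.0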
Step 4: Compile RapidCFD
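The usual OpenFOAM-style build sequence, roughly (paths are illustrative):

    # source the environment for this install, then build everything
    source $HOME/RapidCFD/RapidCFD-dev6/etc/bashrc
    cd $WM_PROJECT_DIR
    ./Allwmake 2>&1 | tee log.Allwmake    # this takes a long time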
|
When compilation completed, I tried a 2,268 cell damBreak case. I modified system/decomposeParDict to run with 2 processors. Then, using OpenFOAM 2.3.x, I ran the usual pre-processing, decomposed the case, and started the parallel solver run.
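A sketch of what that typically looks like for a damBreak-style case (standard OpenFOAM 2.3.x pre-processing; the device numbers are examples, not my exact command history):

    # standard damBreak-style workflow, shown for illustration
    blockMesh
    setFields                       # initialise the water column
    decomposePar                    # uses system/decomposeParDict (2 subdomains)
    # with a single GPU, both ranks can share device 0:
    mpirun -np 2 interFoam -parallel -devices "(0 0)"
    # on a two-GPU machine:
    # mpirun -np 2 interFoam -parallel -devices "(0 1)"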
I ran this small damBreak case only long enough to confirm that the error you received at startup is not shown on my system. Here, I have only 1 GPU installed. On your machine, you can use both GPU's via the -devices flag. So, I hope that I replicated a lot of the steps for your setup, noting that my re-install differs from yours in a few details (for example, only one GPU here).
I truly hope that this more "step-by-step" set of instructions helps you out. If you have persistent problems, please consider the option to keep your existing (working) 1-GPU RapidCFD intact, and set up a new RapidCFD install for > 1 GPU by installing RapidCFD in a differently named directory, as I described here. Best, |
Update: I tried running the 2k cell damBreak tutorial to completion with the same GPU:
It did not work out so well: at some point around 0.8 s the time steps became very small. The same behavior is not seen when running on 1 GPU. I repeated this same case with 2 GPU's on my Ubuntu 16.04 / CUDA 8 setup that was previously (and successfully) used for larger RapidCFD cases, and I got the same result. So, if you are trying to get two GPU's to run on a 2k cell damBreak case, maybe you will have this same problem. I decided not to troubleshoot because the problem occurred on a case far too small for RapidCFD, and because I ran into the same problem on a system that has successfully run much larger cases. I have the feeling that we are many time zones apart, so I thought I would preemptively let you know of my experience with multi-GPU on a very tiny case... Therefore, once you can start a case correctly with mpirun and your 2 GPU's, maybe you can move to your larger test cases and not worry about fixing more problems on small cases. |
Thanks a lot! Will try to install and see if it works. I get the above error while executing interFoam (with or without mpirun, doesn't matter). Manasi |
I am not sure I can explain why interFoam does not work (invalid device function error) but icoFoam does work. I have not seen such behavior before. Hopefully a clean install removes this problem. If it persists... I am not sure what to say. Did interFoam ever work in the past, or did the error only appear after one of your recompiles? If interFoam once worked but now it's broken, there's hope that something can be fixed with a recompile. However, if interFoam never worked... I do not know how to help, because I have used it successfully many times following the standard install instructions, even for 1 GPU. |
Yes it worked before. Doesn't work after the ThirdParty-dev installation. |
Hi, after the new installation in a different directory (RapidCFD-dev1 with ThirdParty-dev1), icoFoam works but interFoam still doesn't work. The following error appears: |
I have a couple more ideas in my quiver. Have you tried each GPU individually? And, are both GPU's the same (e.g., both K20's), or are they different GPU's? For example, please try running the solver on each device by itself, using the -devices flag to select one GPU at a time.
I experienced a similar problem when working with CUDA-10 on a single-GPU installation. The error was with icoFoam:
I made several changes at once to the c and c++ files (they are attached here as .txt files so GitHub will accept them; please remove the .txt extension if you wish to use them). The key thing I added was:
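An illustrative excerpt of the kind of change meant here (the attached files are not reproduced in this thread, and the exact file location and surrounding options in your tree may differ):

    # illustrative excerpt from the nvcc rules (the "c" and "c++" files, typically
    # under wmake/rules/linux64Nvcc/); the addition is the explicit -gencode flag
    CC          = nvcc -m64 -gencode arch=compute_52,code=sm_52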
I am not sure which of these additions allowed icoFoam to run with CUDA-10. The GPU I was using was compute capability 5.2... so replace the 52 with the right value for your GPU card. After making these changes, cleaning the install, and recompiling, RapidCFD worked with CUDA-10. As usual, if you make changes to c and c++, you need to clean and recompile. |
As a follow-up, when I've compiled RapidCFD to use with two different types of GPU's, I've set up the CC flags in the c and c++ files to cover multiple compute capabilities:
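For example, a multi-architecture variant of the same compiler line (illustrative; adjust the list to the cards you actually have):

    # one build that carries device code for several compute capabilities
    CC          = nvcc -m64 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60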
So, here, the compiled code will work on several different compute capability cards when running in either single-GPU or multi-GPU mode. |
I can try these but after a couple of days. |
Ok, sounds good. Hope your testing goes well. |
Hi, can you provide a link or reference for where you learned to write this command to run on the GPU?
The devices flag specifies which GPU to use on a computer that has more than one GPU. So, in the example you gave, GPU numbers 0 and 1 will be used. The use of this flag is discussed in some RapidCFD issues, like #16.
Thanks TonkomoLLC , |
My apologies - I have never used Slurm before
sir TonkomoLLC |
Hi,
There are a number of issues that describe the 'devices' flag in RapidCFD: #16, #24, #43, #46, #50, #57, #58, #75, and #76. I am pretty sure that I learned how to use the devices flag from posts #16 and #24, which pre-dated my involvement with RapidCFD support. Probably the most important post of all the above that gets to the root of how the devices flag works is from #16.
In particular, look at the link on the word "here" that pulls up the RapidCFD argList. Since I did not write the devices flag, and it is unique to RapidCFD, I believe that this is about all that I know about this topic. I really hope that this reply can help you find what you need. |
Dear sir,
You are indeed very helpful.
Thank you so much, sir.
I will look into it.
…On Tue, Dec 6, 2022, 7:50 PM Eric Daymo ***@***.***> wrote:
Hi,
There are a number of issues that describe the 'devices' flag in RapidCFD
issues:
- #16
- #24
- #43
- #46
- #50
- #57
- #58
- #75
- #76
I am pretty sure that I learned how to use the devices flag from posts #16 and #24, which pre-dated my involvement with RapidCFD support.
Probably the most important post of all the above that gets to the root of how the devices flag works is from #16:
OpenFOAM requires -parallel option to be included in the argument list
when running parallel cases.
The logic for GPU selection in parallel cases is defined here:
https://github.com/Atizar/RapidCFD-dev/blob/master/src/OpenFOAM/global/argList/argList.C#L774
In short if you have two GPUs and want to run case using both of them you
do not have to specify anything, the GPUs will be selected automatically.
If, for example, you have 4 GPUs and want to use those with CUDA IDs 2 and
3, the syntax should be:
mpirun -n 2 icoFoam -parallel -devices "(2 3)"
You are correct, RapidCFD includes only solvers, for preprocessing you
need original OpenFOAM.
In particular, look at the link on the word "here" that pulls up the
RapidCFD argList.
Since I did not write the devices flag, and it is unique to RapidCFD, I
believe that this is about all that I know about this topic. I really hope
that this reply can help you find what you need.
|
Hi,
I am trying to execute the multi-GPU case provided in the RapidCFD test case GitHub repository.
I am getting the following segmentation fault-
terminate called after throwing an instance of 'thrust::system::system_error'
what(): __copy:: H->D: failed: invalid argument
Aborted (core dumped)
I did not do anything special for the multi-GPU execution. I ran the interFoam solver the same way I ran icoFoam on a single GPU. Are there any special instructions for executing with multiple GPU's?
Thanks and Regards,
Manasi