problem with multi GPU execution #57

Open
manasi-t24 opened this issue Mar 15, 2019 · 29 comments

@manasi-t24

Hi,

I am trying to execute the multi GPU case provided in the RapidCFD test case github.
I am getting the following segmentation fault-

terminate called after throwing an instance of 'thrust::system::system_error'
what(): __copy:: H->D: failed: invalid argument
Aborted (core dumped)

I did not do anything special for the multi-GPU execution; I ran the interFoam solver the same way I ran icoFoam for a single GPU. Are there special instructions for executing with multiple GPUs?

Thanks and Regards,
Manasi

@TonkomoLLC
Contributor

TonkomoLLC commented Mar 15, 2019

Hello,

Multi-GPU cases are started with mpirun, e.g., with 4 GPUs:

mpirun -np 4 -hostfile machines interFoam -parallel -devices "(0 0 0 0)"

To do this you will need a ThirdParty-dev with openmpi installed and set up for GPUs, like the one I posted here.
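
One prerequisite worth spelling out, since RapidCFD ships only solvers: the case has to be decomposed with a stock OpenFOAM install before the -parallel run, and numberOfSubdomains in system/decomposeParDict must match both the -np count and the number of entries in -devices. A minimal sketch of the sequence (the hostfile and device numbers are placeholders):

# in the case directory, with a standard OpenFOAM environment loaded
decomposePar                       # uses system/decomposeParDict (numberOfSubdomains 4 here)
# then switch to the RapidCFD environment and start one MPI rank per GPU
mpirun -np 4 -hostfile machines interFoam -parallel -devices "(0 0 0 0)"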

To be transparent about my system setup: most of my RapidCFD usage is on Ubuntu 16.04 machines running CUDA 8.0 with K20 GPUs. The referenced damBreak tutorial is too large to run on a single K20. However, I can run the referenced tutorial just fine on two K20s on this Ubuntu 16.04, CUDA 8.0 setup. I just repeated running this case (for many time steps, not to completion) on two GPUs this morning to make sure that my information is up to date for you.

However, I can confirm that I see this error when attempting to repeat this damBreak tutorial with Ubuntu 18.04, CUDA 9.0 (installed from the repository) and a single Titan Black GPU. Other test cases I tried (albeit smaller in grid size / lower RAM requirements) work just fine on this machine with Ubuntu 18.04 and CUDA 9.0. I know from experience that this single Titan Black GPU has too little GPU RAM for the referenced damBreak case (a damBreak case with a smaller grid works fine), so I suspect that the __copy::H->D:failed error may be due to insufficient memory.

Therefore, I hope you are successful when running with more than one GPU.

Best regards,

Eric

@manasi-t24
Author

Thank you for the detailed reply. I ran it normally, without the mpirun command, and maybe that is why I got the __copy::H->D error: it was using only one GPU, which has insufficient memory.

I have two K20s to work with. I will run the case on them using mpirun and update you here.

Thanks again!
Manasi

@TonkomoLLC
Contributor

TonkomoLLC commented Mar 16, 2019

Good luck. I made a small mistake above: I left out -parallel, and I have updated my comment.

Some examples:

  1. Run interFoam in parallel on GPU device 1 on two machines connected by a network:

    mpirun -np 2 -hostfile machines interFoam -parallel -devices "(1 1)"

    machines is a text file that contains the server name and the number of CPUs, e.g.,

    machine1name cpu=1
    machine2name cpu=1

  2. Run interFoam on a single machine with 2 GPUs --> This is most likely what you are looking for:

    mpirun -np 2 interFoam -parallel -devices "(0 1)"

  3. In my particular case, I have openmpi configured for TCP and infiniband, but presently my infiniband network is off. Therefore, to force interFoam to run on a single machine in my present situation:

    mpirun --mca btl tcp,self -np 2 interFoam -parallel -devices "(0 1)"

The above (number 3) command is what I retested moments ago to ensure that the information I am giving you is accurate; however, this may be specific to my network.
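
If it helps, the numbers inside -devices are the CUDA device IDs on each host. A quick way to see how the driver numbers them (assuming the NVIDIA driver utilities are installed):

nvidia-smi -L    # lists each GPU with its index, e.g. "GPU 0: Tesla K20c (UUID: ...)"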

Good luck!

@manasi-t24
Author

Hi Eric,

I am getting the following error when trying to run the second command you gave. I have done all the ThirdParty installations.
Error:
--> FOAM FATAL ERROR:
Trying to use the dummy Pstream library.
This dummy library cannot be used in parallel mode

From function UPstream::init(int& argc, char**& argv)
in file UPstream.C at line 37.

FOAM exiting

--> FOAM FATAL ERROR:
Trying to use the dummy Pstream library.
This dummy library cannot be used in parallel mode

From function UPstream::init(int& argc, char**& argv)
in file UPstream.C at line 37.

FOAM exiting


mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

Do you have any idea how to fix this?

Thanks,
Manasi

@manasi-t24
Author

Do I have to recompile RapidCFD after the ThirdParty installation?

@TonkomoLLC
Contributor

Yes, unfortunately, you will need to recompile again.
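
In case it helps, the recompile I have in mind is just re-sourcing the environment and re-running the top-level build; a sketch, assuming RapidCFD-dev lives under your home directory (adjust the path to your install):

source ~/RapidCFD-dev/etc/bashrc     # pick up the new ThirdParty/openmpi settings
cd $WM_PROJECT_DIR
./Allwmake                           # the full rebuild takes a while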

@manasi-t24
Author

Hi,
After recompiling RapidCFD, I ran the interFoam solver on the damBreak case with 2,268 cells. I am getting the following error:
/---------------------------------------------------------------------------
| RapidCFD by simFlow (sim-flow.com) |
*---------------------------------------------------------------------------*/
Build : dev-f3775ac96129
Exec : interFoam
Date : Mar 18 2019
Time : 01:53:05
Host : "marskepler"
PID : 22656
Case : /home/manasi/OpenFOAM/OpenFOAM-2.3.0/tutorials/multiphase/interFoam/laminar/damBreak-rcfd
nProcs : 1
sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0

PIMPLE: Operating solver in PISO mode

Reading field p_rgh

Reading field U

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: invalid device function
Aborted (core dumped)

I was not getting this error before the ThirdParty installation. Do you have any idea why this happens and how to fix it?

Regards,
Manasi

@TonkomoLLC
Contributor

I am not sure, Manasi, why this error now appears, but it certainly seems related to ThirdParty-dev. However, that is a copy of the ThirdParty-dev I use. Maybe check issue #2, download the ThirdParty-dev linked in the second post of that issue, and recompile RapidCFD (I know you've recompiled many times by now). Daniel noted that the path to CUDA is hard-coded in Allwmake, so please take care of this. It may be better to use that version of ThirdParty-dev, since a person replied to issue #2 that it worked.

Good luck!

@manasi-t24
Author

OK, recompiling with the ThirdParty given in #2. However, that ThirdParty must have been written for CUDA 8.0, whereas I am using CUDA 9.2; this might lead to some issues as well.

Also, can you tell me the output of "echo $FOAM_MPI" in your RapidCFD installation? The ThirdParty-dev you are using has openmpi-2.1.1, whereas FOAM_MPI in RapidCFD is openmpi-1.8.4. I had to change RapidCFD's etc/config/settings.sh so that FOAM_MPI becomes openmpi-2.1.1, after which I was able to install the ThirdParty-dev you mentioned successfully, but it gives the above error when running interFoam. Did you also have to change your RC etc/config/settings.sh to install that ThirdParty-dev successfully?
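
Concretely, the change amounts to one line under the OPENMPI block of RapidCFD's etc/config/settings.sh (a sketch of the edited line; the stock value was openmpi-1.8.4):

export FOAM_MPI=openmpi-2.1.1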

Manasi

@manasi-t24
Author

After recompiling with the ThirdParty-dev in #2, I am still getting the same error for interFoam. However, I am able to run icoFoam on 2 GPUs using mpirun -np 2 icoFoam -parallel -devices "(0 1)". The test case is very small, though (400 cells).

Manasi

@TonkomoLLC
Contributor

Oh yes, you are correct, etc/config/settings.sh had to be updated for the openmpi version.
I forgot about this... I had not installed ThirdParty-dev in a long time...

So if I understand correctly, you can run a small case on two GPUs, but on a larger case you get the error:

what(): parallel_for failed: invalid device function

?

@TonkomoLLC
Contributor

TonkomoLLC commented Mar 18, 2019

I tried to come up with detailed instructions on how to install and configure a ThirdParty-dev with openmpi-2.1.1...

My system:

  • Ubuntu 18.04
  • CUDA installed from the repository (9.1)
    Since CUDA is from the repository, I do not have specialized CUDA environment settings

Step 1: Get the software

Note that I have multiple RapidCFD setups on my system, so I chose a customized RapidCFD-dev name (here, RapidCFD-dev6) so I do not change any other RapidCFD setup. This may not be needed on your computer, and you can work in the default-named RapidCFD-dev and ThirdParty-dev directories.

git clone https://github.com/Atizar/RapidCFD-dev.git RapidCFD-dev6
git clone https://github.com/OpenFOAM/ThirdParty-5.x.git ThirdParty-dev6

Step 2: Setup RapidCFD directory

Only needed if you have multiple RapidCFD installs:

cd RapidCFD-dev6
nano etc/bashrc  --> change `export WM_PROJECT_VERSION=dev` to `dev6`

Although I did not need to do this on my present machine, if you have a non-repository install of CUDA, maybe add to the end of bashrc something like

export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

More changes to RapidCFD files...

nano etc/config/settings.sh --> change `export FOAM_MPI=openmpi-1.8.4` to `openmpi-2.1.1`
source etc/bashrc
nano wmake/rules/linux64Nvcc/c  --> change sm_30 to sm_35 (specific to my GPU; set the right value for your GPU)
nano wmake/rules/linux64Nvcc/c++ --> change sm_30 to sm_35 (specific to my GPU; set the right value for your GPU)

export WM_NCOMPPROCS=16  (or whatever is needed for your computer)
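
For the sm_30 -> sm_35 edits above, a one-liner sketch (assuming GNU sed; check your card's compute capability first, e.g. with deviceQuery from the CUDA samples):

sed -i 's/sm_30/sm_35/g' wmake/rules/linux64Nvcc/c wmake/rules/linux64Nvcc/c++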

Step 3: Setup ThirdParty directory

cd ../ThirdParty-dev6
wget -N https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.1.tar.gz
tar -zxvf openmpi-2.1.1.tar.gz

Edit the Allwmake file

find the right spot and add the --with-cuda support:

e.g.,

        # start with GridEngine support,
        # it can be built without external libraries
        configOpt="--with-sge"

        # Add CUDA  ===>  add this here
# use the following if CUDA is installed from the repository, and the CUDA headers are not in /usr/local/cuda...
        configOpt="--with-cuda=/usr/include"
# use the following if CUDA is installed in /usr/local/cuda
#         configOpt="--with-cuda"

        # Infiniband support
        # if [ -d /usr/local/ofed -a -d /usr/local/ofed/lib64 ]
        # then
        #     configOpt="$configOpt --with-openib=/usr/local/ofed"
        #     configOpt="$configOpt --with-openib-libdir=/usr/local/ofed/lib64"
        # fi

        ./configure \
            --prefix=$MPI_ARCH_PATH \
            --disable-orterun-prefix-by-default \
            --enable-shared --disable-static \
            --enable-mpi-thread-multiple \
            --libdir=$MPI_ARCH_PATH/lib${WM_COMPILER_LIB_ARCH} \
            --enable-mpi-fortran=none \
            --disable-mpi-profile \
            $configOpt \
            ;
./Allwmake
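
Once openmpi has been built, one way to confirm that it really is CUDA-aware (this is the check from the Open MPI documentation; it assumes the ThirdParty ompi_info is the one found on your PATH after sourcing the RapidCFD bashrc):

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
# expect: mca:mpi:base:param:mpi_built_with_cuda_support:value:true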

Step 4: Compile RapidCFD

cd ../RapidCFD-dev6
./Allwmake

@TonkomoLLC
Contributor

TonkomoLLC commented Mar 18, 2019

When the compilation completed, I tried a 2,268-cell damBreak case.

I modified system/decomposeParDict to run with 2 processors. Then, using OpenFOAM 2.3.x, I ran decomposePar.
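
For reference, a minimal system/decomposeParDict for this two-rank run might look like the sketch below (the (2 1 1) split is an arbitrary choice; match numberOfSubdomains to the mpirun -np count):

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains 2;

method          simple;

simpleCoeffs
{
    n           (2 1 1);
    delta       0.001;
}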

Next, I did:

source /opt/RapidCFD-dev6/etc/bashrc

followed by:

mpirun -np 2 interFoam -parallel -devices "(0 0)"

I ran this small damBreak case only long enough to confirm that the error you received at startup is not shown on my system.

Here, I have only 1 GPU installed. On your machine, you can use (0 1) for your two devices, and hopefully it will work.

So, I hope that I have replicated a lot of the steps for your setup, noting that in my re-install:

  • I used a fresh install in directories named RapidCFD-dev6 and ThirdParty-dev6, so that I did not overwrite previous installs
  • You probably want to use configOpt="--with-cuda" in the ThirdParty-dev6 Allwmake instead of what I used, since you are not using a repository install of CUDA
  • I used a repository version of CUDA, so I did not need to further modify the RapidCFD bashrc to note the location of the CUDA install
  • I only had 1 GPU, so I could not truly test mpirun with two GPUs
  • I had previously modified CUDA so that I eliminated the device free error, as discussed in Slow startup / floating point error #39 and Thrust system error at the end #54

I truly hope that this more "step-by-step" set of instructions helps you out. If you have persistent problems, please consider keeping your existing (working) single-GPU RapidCFD intact and setting up a new RapidCFD install for more than one GPU in a differently named directory, as I described here.

Best,
Eric

@TonkomoLLC
Contributor

TonkomoLLC commented Mar 18, 2019

Update:

I tried running the 2k-cell damBreak tutorial to completion with the same GPU:

mpirun -np 2 interFoam -parallel -devices "(0 0)"

It did not work out so well: at some point around 0.8 s the time steps became very small. The same behavior is not seen when running on one GPU.

I repeated this same case with 2 GPUs on my Ubuntu 16.04 / CUDA 8 setup that was previously (and successfully) used for larger RapidCFD cases. I got the same result.

So, if you are trying to get two GPUs to run on a 2k-cell damBreak case, maybe you will have this same problem. I decided not to troubleshoot, because the problem occurred on a case far too small for RapidCFD and because I ran into the same problem on a system that has run much larger cases successfully.

I have the feeling that we are many time zones apart, so I thought I would preemptively let you know of my experience with multi-GPU on a very tiny case... Therefore, once you can start a case correctly with mpirun and your 2 GPUs, maybe you can move on to your larger test cases and not worry about fixing more problems on small cases...

@manasi-t24
Author

Thanks a lot! I will try the install and see if it works.
As for the error you asked about:
what(): parallel_for failed: invalid device function

I get the above error while executing interFoam (with or without mpirun, it doesn't matter).
However, icoFoam works fine (both with and without mpirun), though the case I tested was very small.

Manasi

@TonkomoLLC
Contributor

I am not sure I can explain why interFoam does not work (invalid device function error) but icoFoam does. I have not seen such behavior before. Hopefully a clean install removes this problem. If it persists... I am not sure what to say. Did interFoam ever work in the past, or did the error only appear after one of your recompiles? If interFoam once worked but is now broken, there is hope that something can be fixed with a recompile. However, if interFoam never worked... I do not know how to help, because I have used it successfully many times following the standard install instructions, even with 1 GPU.

@manasi-t24
Author

Yes, it worked before; it doesn't work after the ThirdParty-dev installation.
Anyway, I am doing a complete recompile of RapidCFD and hoping that works.

@manasi-t24
Author

Hi, after the new installation in a different directory (RapidCFD-dev1 with ThirdParty-dev1), icoFoam works but interFoam still doesn't work.

Following error-
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: invalid device function
Aborted (core dumped)

@TonkomoLLC
Contributor

I have a couple more ideas in my quiver.

Have you tried each GPU individually? And are both GPUs the same (e.g., both K20s), or are they different GPUs?

E.g., please try interFoam -device 0 and interFoam -device 1. If one device works but the other doesn't, that's a clue that the problem lies somewhere in the c++ and c rules.

I experienced a similar problem when working with CUDA-10 on a single GPU installation.

The error was with icoFoam:

Create time

Create mesh for time = 0

Reading transportProperties

Reading field p

Reading field U

Reading/calculating face flux field phi


Starting time loop

Time = 0.005

Courant Number mean: 0 max: 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: invalid device function
Aborted (core dumped)

I made several changes at once to c and c++ (they are attached here as .txt files so GitHub will accept them; please remove the .txt extension if you wish to use them).

The key thing I added was:

-gencode=arch=compute_52,code=sm_52 -arch=sm_52 -g -G

I am not sure which of these additions allowed icoFoam to run with CUDA-10. The GPU I was using was compute capability 5.2... so replace the 52 with the right value for your GPU card. After making these changes, cleaning the install, and recompiling, RapidCFD worked with CUDA-10.

As usual, if you make changes to c and c++, you need to wclean everything and recompile - unfortunately...

c.txt
c++.txt

@TonkomoLLC
Contributor

As a follow-up, when I've compiled RapidCFD for use with two different types of GPUs, I've set up the CC flags in the c++ file as follows:

CC          = nvcc -Xptxas -dlcm=cg -m64 -gencode=arch=compute_30,code=sm_30 \
              -gencode=arch=compute_35,code=sm_35 \
              -gencode=arch=compute_61,code=sm_61 \
              -gencode=arch=compute_35,code=compute_35

So, here, the compiled code will work on several cards of different compute capability, when running in either single-GPU or multi-GPU mode.

@manasi-t24
Author

I can try these, but only after a couple of days.
Just for your information, icoFoam works fine with the cavity test case (5M cells) using the command mpirun -np 2 icoFoam -parallel -devices "(0 1)".
So I think the problem is specific to interFoam.

@TonkomoLLC
Contributor

Ok, sounds good. Hope your testing goes well.

@Dcn303

Dcn303 commented Dec 1, 2022

I can try these, but only after a couple of days. Just for your information, icoFoam works fine with the cavity test case (5M cells) using the command mpirun -np 2 icoFoam -parallel -devices "(0 1)". So I think the problem is specific to interFoam.

Hi, can you provide a link or reference for where you learned to write this command to run on the GPU?
And what does "-devices (0 1)" signify in the command?
Thanks

@TonkomoLLC
Contributor

TonkomoLLC commented Dec 1, 2022

The -devices flag specifies which GPUs to use on a computer that has more than one GPU. So in the example you gave, GPU numbers 0 and 1 will be used. The use of this flag is discussed in some RapidCFD issues, like #16.

@Dcn303

Dcn303 commented Dec 2, 2022

The -devices flag specifies which GPUs to use on a computer that has more than one GPU. So in the example you gave, GPU numbers 0 and 1 will be used. The use of this flag is discussed in some RapidCFD issues, like #16.

Thanks, TonkomoLLC.
Can you kindly provide a script for using the GPU with Slurm?
And in that case, how should the -devices value be written?
Thanks

@TonkomoLLC
Contributor

My apologies - I have never used Slurm before.

@Dcn303

Dcn303 commented Dec 6, 2022

TonkomoLLC, can you please provide a reference, a link, or study material on how you learned to write
mpirun -np 2 interFoam -parallel -devices "(0 1)"
i.e., any reference on how you learned to use the -devices flag to run on the GPU?
Thank you

@TonkomoLLC
Contributor

Hi,
There are a number of RapidCFD issues that describe the 'devices' flag.

I am pretty sure that I learned how to use the devices flag from posts #16 and #24, which pre-dated my involvement with RapidCFD support.

Probably the most important post of all of those, the one that gets to the root of how the devices flag works, is from #16:

OpenFOAM requires the -parallel option to be included in the argument list when running parallel cases.

The logic for GPU selection in parallel cases is defined here. In short, if you have two GPUs and want to run a case using both of them, you do not have to specify anything; the GPUs will be selected automatically.
If, for example, you have 4 GPUs and want to use those with CUDA IDs 2 and 3, the syntax should be:
mpirun -n 2 icoFoam -parallel -devices "(2 3)"

You are correct, RapidCFD includes only solvers, for preprocessing you need original OpenFOAM.

In particular, look at the link on the word "here" that pulls up the RapidCFD argList.

Since I did not write the devices flag, and it is unique to RapidCFD, I believe that this is about all that I know about this topic. I really hope that this reply can help you find what you need.

@Dcn303

Dcn303 commented Dec 6, 2022 via email
