
Performance Testing with LBPM


For performance testing it is useful to be able to quickly generate synthetic test cases that assess the performance of LBPM without relying on large image sizes to construct the subdomains. A simple case can be constructed from a random close pack of 1896 spheres, which is available in the example/Sph1896 directory. To set up a simulation from the example, copy the contents of the example directory to a working location:

cp -r $LBPM_DIR/example/Sph1896 ./

An example input file input.db is included. The GenerateSphereTest executable converts the provided sphere packing into a binary format, discretizing the sphere system into a single 400x400x400 sub-domain. To generate the input data, run the command

[mcclurej@thor Sph1896]$ mpirun -np 1 $LBPM_DIR/bin/GenerateSphereTest input.db
********************************************************
Running Sphere Packing pre-processor for LBPM-WIA	
********************************************************
voxel length = 0.003125 micron 
Reading the sphere packing 
Reading the packing file...
Number of spheres extracted is: 1896
Domain set.
Sauter Mean Diameter (computed from sphere packing) = 34.151490 
Media porosity = 0.359970 
MorphOpen: Initializing with saturation 0.500000 
Media Porosity: 0.366707 
Maximum pore size: 14.075796 
Performing morphological opening with target saturation 0.500000 
Final saturation=0.503602
Final critical radius=4.793676

This will create a single input file ID.00000. The sphere pack is fully periodic, which means it can be tiled arbitrarily many times in any direction, with each process performing identical computations. To run this example on a 4x4x4 process grid, we can create 64 copies of the input file as follows

export NRANKS=64
export BASEID="ID.000"
# create ID.00001 ... ID.00064 as copies of the rank-0 file ID.00000
for i in `seq -w 1 $NRANKS`; do idfile="$BASEID$i"; echo $idfile; cp ID.00000 $idfile; done
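
Each MPI rank reads its own ID file, so a 4 x 4 x 4 process grid needs ID.00000 through ID.00063; the extra ID.00064 written by the loop is simply unused. A quick check that the copies are in place (run in the same working directory):

ls ID.* | wc -l     # expect 65 files: ID.00000 plus the 64 copies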

Note that the value of BASEID should be chosen so that the numeric suffix appended to the ID.xxxxx files is five digits wide (i.e. consistent with the seq -w output). This provides an identical input geometry for each MPI sub-domain. The input.db file can then be updated to reflect the desired domain structure by altering the process grid. Within the Domain section of the file, specify the 4 x 4 x 4 process grid as follows

    nproc = 4, 4, 4        // Number of processors (Npx,Npy,Npz)
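
If you prefer to script this step (for example when sweeping over several process grids), the same edit can be made from the shell. A minimal sketch, assuming the nproc line in input.db has the format shown above and GNU sed is available:

sed -i 's|^\( *nproc *=\).*|\1 4, 4, 4|' input.db   # set the process grid in the Domain section
grep nproc input.db                                 # verify the change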

You can then launch the simulation:

MPI_LAUNCHER=mpirun
MPI_NUMPROCS_FLAG="-n"
MPI_FLAGS="--bind-to core"
$MPI_LAUNCHER $MPI_NUMPROCS_FLAG 64 $MPI_FLAGS $LBPM_DIR/bin/lbpm_color_simulator input.db
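
With the values above this expands to

mpirun -n 64 --bind-to core $LBPM_DIR/bin/lbpm_color_simulator input.db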

The output below is from a four-GPU run on an IBM Power8 Minsky node with NVLink and four NVIDIA P100 GPUs.

mpirun -np 4 $MPIARGS  $LBPM_BIN/lbpm_color_simulator input.db
********************************************************
Running Color LBM	
********************************************************
MPI rank=0 will use GPU ID 0 / 4 
MPI rank=2 will use GPU ID 2 / 4 
MPI rank=3 will use GPU ID 3 / 4 
MPI rank=1 will use GPU ID 1 / 4 
voxel length = 0.001563 micron 
voxel length = 0.001563 micron 
Read input media... 
Initialize from segmented data: solid=0, NWP=1, WP=2 
Media porosity = 0.359970 
Initialized solid phase -- Converting to Signed Distance function 
Domain set.
Create ScaLBL_Communicator 
Set up memory efficient layout, 11795503 | 11795520 | 33386248 
Allocating distributions 
Setting up device map and neighbor list 
Component labels: 1 
   label=0, affinity=-1.000000, volume fraction==0.652189
Initializing distributions 
Initializing phase field 
********************************************************
No. of timesteps: 3000 
Affinities - rank 0:
Main: 0
Thread 1: 1
-------------------------------------------------------------------
********************************************************
CPU time = 0.034209 
Lattice update rate (per core)= 344.810959 MLUPS 
Lattice update rate (total)= 1379.243836 MLUPS 
********************************************************
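
The reported MLUPS (million lattice updates per second) figures can be cross-checked against the rest of the log. Assuming that "CPU time" is the wall time per timestep in seconds and that the middle value on the "Set up memory efficient layout" line (11795520) is the number of active lattice sites per rank, the per-rank rate is the site count divided by the time per step, and the total rate is that multiplied by the number of ranks:

awk 'BEGIN {
  sites = 11795520   # active lattice sites per rank (from the layout line above)
  t     = 0.034209   # seconds per timestep (from "CPU time")
  ranks = 4
  printf "per-rank: %.2f MLUPS, total: %.2f MLUPS\n", sites/(t*1e6), ranks*sites/(t*1e6)
}'

Up to rounding of the printed CPU time, this reproduces the 344.81 and 1379.24 MLUPS reported above.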

Example arguments for GPU-based Open MPI:

export MPIARGS="--bind-to core --mca pml ob1 --mca btl vader,self,smcuda,openib --mca btl_openib_warn_default_gid_prefix 0 --mca btl_smcuda_use_cuda_ipc_same_gpu 0 --mca btl_openib_want_cuda_gdr 0 --mca btl_openib_cuda_async_recv false --mca btl_smcuda_use_cuda_ipc 0 --mca btl_openib_allow_ib true --mca btl_openib_cuda_rdma_limit 1000 -x LD_LIBRARY_PATH"