2D Lattice Boltzmann fluid simulation. Achieves 5.7 GLUPS (giga lattice updates per second) on an RTX 2070, approximately 92% of the maximum achievable memory bandwidth. (Another test sustains a 4K lattice at 630 Hz.)
Achieves 14.3 GLUPS / 1.03 TBps on a single A100 80GB GPU, with INNER_TIMESTEPS=6 and BLOCKS_THREADS_TUNE_CONSTANT=12.
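For context, the bandwidth figures follow from the GLUPS numbers under one assumption (mine, not stated here): a double-buffered D2Q9 update in float32 reads nine populations and writes nine, i.e. 72 bytes per cell update. A back-of-envelope sketch:

```python
# Effective DRAM traffic implied by a GLUPS (giga lattice updates/s) figure,
# assuming double-buffered D2Q9 in float32: (9 reads + 9 writes) * 4 bytes.
BYTES_PER_UPDATE = 9 * 2 * 4  # 72 bytes per cell update

def implied_bandwidth_gbps(glups: float) -> float:
    return glups * BYTES_PER_UPDATE  # 1e9 updates/s * bytes/update -> GB/s

print(implied_bandwidth_gbps(5.7))   # ~410 GB/s, the RTX 2070 figure below
print(implied_bandwidth_gbps(14.3))  # ~1030 GB/s, i.e. the 1.03 TBps above
```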
YouTube videos
Older version screenshot (top is density, bottom is direction field):
My setup:
- AMD Ryzen 3700x
- Nvidia GeForce RTX 2070 with drivers 460.20 and CUDA 11.2
- Ubuntu 20.04 on WSL
- Python 3.7.4 using conda

I suspect, but cannot test, that this will work with much earlier versions / lower specs. (It previously ran on Ubuntu 18.04 pure Linux, with driver 440.59 and CUDA 10.2.)
To install, just:

sudo apt install cuda-toolkit-11-1

(plus all the rest of https://docs.nvidia.com/cuda/wsl-user-guide/index.html), then:

pip install pycuda imageio-ffmpeg
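To sanity-check the toolchain, here is a minimal pycuda smoke test (a standalone sketch, not part of this repo) that queries the device and runs a trivial kernel:

```python
# Minimal pycuda smoke test: query the device and run a trivial kernel.
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

dev = pycuda.autoinit.device
print(dev.name(), "compute capability", dev.compute_capability())

mod = SourceModule("""
__global__ void double_it(float *x) { x[threadIdx.x] *= 2.0f; }
""")
x = np.arange(8, dtype=np.float32)
mod.get_function("double_it")(drv.InOut(x), block=(8, 1, 1), grid=(1, 1))
print(x)  # expect [0. 2. 4. 6. 8. 10. 12. 14.]
```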
Achieved memory bandwidth is 406GBps, compared to 441GBps achieved by the bandwidthTest CUDA sample, and a 506GBps speed-of-light (SOL) bandwidth (the memory was overclocked to 7899MHz; stock is 7000MHz/448GBps).
To profile, run

python -m cProfile -s cumtime latticeboltzmann.py | less

for perhaps a minute.
To view the profile in kcachegrind:

python -m cProfile -s tottime -o profile_data.pyprof latticeboltzmann.py
pyprof2calltree -i profile_data.pyprof -k
For GPU profiling, first allow non-admin access to the NVIDIA profiling counters:

echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee /etc/modprobe.d/nsight-privilege.conf

and reboot. Then run:

nv-nsight-cu-cli --target-processes all python latticeboltzmann.py

A few seconds of samples will do.
nvvp is also nice, but you need to:

sudo apt install openjdk-8-jdk

and then launch it with:

nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
Using MSI Afterburner, I can get a +200MHz core overclock on my 2070, yielding almost no performance boost. Overclocking the memory has much larger gains; a +1100MHz overclock yields a nearly 20% performance boost.
Run the UnifiedMemoryPerf CUDA sample to get a sense of when we start encountering errors.
A useful command to dump all intermediate compilation products:

nvcc -keep -cubin --use_fast_math -O3 -Xptxas -O3,-v -arch sm_75 --extra-device-vectorization --restrict lb_cuda_kernel.cu && cuobjdump -sass lb_cuda_kernel.cubin | grep '\/\*0' > lb_cuda_kernel.sass
- Implemented in Python
- Heavily vectorized in numpy
- JavaScript in-browser implementation using compute APIs
- Cython implementation
- CUDA implementation using pycuda
- Julia implementation
  - With CUDANative.jl
  - With distributability
- PyTorch implementation (cf https://github.com/kobejean/tf-cfd?)
  - Failed; PyTorch has a 3-4x slowdown :/
- CUDA
  - Explore better simulation-time memory layouts (Morton order, tiling, SoA, etc. - it is unlikely that the display layout is the optimal computational layout; see the sketch after this list)
  - D2Q21 or similar - roughly equivalent to doing two D2Q9 timesteps in one go.
  - D3Q19
  - Write newcurr directly and get rid of the double buffer... though this may not help, since the kernel is memory-bound anyway? Might get some caching benefits though.
  - Try mixed-precision - implemented; 10% gain at the cost of extreme unphysical viscosity
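On the memory-layout item above: a minimal sketch of the AoS-vs-SoA distinction for D2Q9 in numpy (array names are hypothetical, not from this repo, and the CUDA kernel's actual layout may differ):

```python
import numpy as np

ny, nx, Q = 256, 256, 9  # D2Q9: nine populations per cell

# AoS: the nine populations of one cell are contiguous. Convenient for
# per-cell collision code, but a warp reading population q across adjacent
# cells strides through memory by Q floats.
f_aos = np.zeros((ny, nx, Q), dtype=np.float32)

# SoA: each population is a contiguous (ny, nx) plane, so adjacent threads
# reading population q at adjacent x get fully coalesced accesses.
f_soa = np.zeros((Q, ny, nx), dtype=np.float32)

# The same element under the two layouts:
y, x, q = 10, 20, 3
f_aos[y, x, q] = f_soa[q, y, x] = 1.0
assert f_aos[y, x, q] == f_soa[q, y, x]
```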