Faster and simpler marching-cubes implementation, added simplified marching-cubes for solid surface rendering, refactoring in OpenCL rendering kernels
ProjectPhysX committed May 1, 2024
1 parent 04ab760 commit 649fd40
Showing 2 changed files with 181 additions and 228 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -3,7 +3,7 @@
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via [OpenCL](https://github.com/ProjectPhysX/OpenCL-Wrapper "OpenCL-Wrapper"). Free for non-commercial use.

<a href="https://youtu.be/-MkRBeQkLk8"><img src="https://img.youtube.com/vi/o3TPN142HxM/maxresdefault.jpg" width="50%"></img></a><a href="https://youtu.be/oC6U1M0Fsug"><img src="https://img.youtube.com/vi/oC6U1M0Fsug/maxresdefault.jpg" width="50%"></img></a><br>
- <a href="https://youtu.be/XOfXHgP4jnQ"><img src="https://img.youtube.com/vi/XOfXHgP4jnQ/maxresdefault.jpg" width="50%"></img></a><a href="https://youtu.be/BStzTRmLW7Q"><img src="https://img.youtube.com/vi/BStzTRmLW7Q/maxresdefault.jpg" width="50%"></img></a>
+ <a href="https://youtu.be/XOfXHgP4jnQ"><img src="https://img.youtube.com/vi/XOfXHgP4jnQ/maxresdefault.jpg" width="50%"></img></a><a href="https://youtu.be/clAqgNtySow"><img src="https://img.youtube.com/vi/clAqgNtySow/maxresdefault.jpg" width="50%"></img></a>
(click on images to show videos on YouTube)

<details><summary>Update History</summary>
@@ -220,7 +220,7 @@ $$f_j(i\\%2\\ ?\\ \vec{x}+\vec{e}_i\\ :\\ \vec{x},\\ t+\Delta t)=f_i^\textrm{tem
- FluidX3D (D3Q19) requires only 55 Bytes/cell with [Esoteric-Pull](https://doi.org/10.3390/computation10060092)+[FP16](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats)<br>
- 🟧🟧🟧🟧🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟨🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩<br>(density 🟧, velocity 🟦, flags 🟨, DDFs 🟩; each square = 1 Byte)
- allows for 19 Million cells per 1 GB VRAM
- - in-place streaming with [Esoteric-Pull](https://doi.org/10.3390/computation10060092): eliminates redundant copy `B` of density distribution functions (DDFs) in memory; almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries; offers optimal memory access patterns for single-cell in-place streaming
+ - in-place streaming with [Esoteric-Pull](https://doi.org/10.3390/computation10060092): eliminates redundant copy of density distribution functions (DDFs) in memory; almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries; offers optimal memory access patterns for single-cell in-place streaming
- [decoupled arithmetic precision (FP32) and memory precision (FP32 or FP16S or FP16C)](https://www.researchgate.net/publication/362275548_Accuracy_and_performance_of_the_lattice_Boltzmann_method_with_64-bit_32-bit_and_customized_16-bit_number_formats): all arithmetic is done in FP32 for compatibility on all hardware, but DDFs in memory can be compressed to FP16S or FP16C: almost cuts memory demand in half again and almost doubles performance, without impacting overall accuracy for most setups
- <details><summary>only 8 flag bits per lattice point (can be used independently / at the same time)</summary>

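The 55 Bytes/cell figure and the "19 Million cells per 1 GB" claim in this hunk can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the commit: the per-field byte counts are read off the colored-square legend (4 density, 12 velocity, 1 flags, 38 DDFs), and it assumes that "1 GB" means 2^30 Bytes, which is what reproduces the 19 Million figure. All names are hypothetical.

```python
# Per-cell memory layout, read off the colored-square legend in the README hunk.
# Interpretation (assumption): density = 1 x 4 Bytes, velocity = 3 x 4 Bytes,
# flags = 8 bits, DDFs = 19 values (D3Q19) x 2 Bytes (FP16 memory precision).
BYTES_DENSITY = 4
BYTES_VELOCITY = 3 * 4
BYTES_FLAGS = 1
BYTES_DDFS = 19 * 2


def bytes_per_cell() -> int:
    """Total memory per lattice cell in Bytes."""
    return BYTES_DENSITY + BYTES_VELOCITY + BYTES_FLAGS + BYTES_DDFS


def cells_per_gib(cell_bytes: int) -> int:
    """Number of whole cells that fit into 2**30 Bytes (assumed meaning of '1 GB')."""
    return (1 << 30) // cell_bytes


if __name__ == "__main__":
    per_cell = bytes_per_cell()
    print(per_cell)                 # 55
    print(cells_per_gib(per_cell))  # 19522578, i.e. about 19 Million cells
```

With 10^9 Bytes instead of 2^30 the result would be about 18 Million cells, so the binary interpretation appears to be the one the README uses.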
@@ -460,6 +460,7 @@ Colors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly
| 🔴&nbsp;Radeon&nbsp;RX&nbsp;5700&nbsp;XT | 9.75 | 8 | 448 | 1368 (47%) | 3253 (56%) | 3049 (52%) |
| 🔴&nbsp;Radeon&nbsp;RX&nbsp;5600&nbsp;XT | 6.73 | 6 | 288 | 1136 (60%) | 2214 (59%) | 2148 (57%) |
| 🔴&nbsp;Radeon&nbsp;RX&nbsp;Vega&nbsp;64 | 13.35 | 8 | 484 | 1875 (59%) | 2878 (46%) | 3227 (51%) |
| 🔴&nbsp;Radeon&nbsp;RX&nbsp;590 | 5.53 | 8 | 256 | 1257 (75%) | 1573 (47%) | 1688 (51%) |
| 🔴&nbsp;Radeon&nbsp;RX&nbsp;580&nbsp;4GB | 6.50 | 4 | 256 | 946 (57%) | 1848 (56%) | 1577 (47%) |
| 🔴&nbsp;Radeon&nbsp;R9&nbsp;390X | 5.91 | 8 | 384 | 1733 (69%) | 2217 (44%) | 1722 (35%) |
| 🔴&nbsp;Radeon&nbsp;HD&nbsp;7850 | 1.84 | 2 | 154 | 112 (11%) | 120 ( 6%) | 635 (32%) |
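The percentages in round brackets in the table rows above appear to be consistent with a simple bandwidth-bound (roofline) estimate. The sketch below is a hedged illustration, not code from the repository. It assumes that the fourth numeric column is memory bandwidth in GB/s, that the last three columns are performance in MLUPs/s for FP32, FP16S and FP16C memory precision, and that one cell update moves 153 Bytes with FP32 DDFs (19 reads + 19 writes at 4 Bytes, plus 1 flag Byte) or 77 Bytes with FP16 DDFs. The column headers are not part of this hunk, so these readings are assumptions; the function name is hypothetical.

```python
# Assumed memory traffic per cell update: 19 DDF loads + 19 DDF stores + 1 flag Byte.
BYTES_PER_UPDATE_FP32 = 19 * 2 * 4 + 1  # 153 Bytes
BYTES_PER_UPDATE_FP16 = 19 * 2 * 2 + 1  # 77 Bytes


def roofline_efficiency(mlups: float, bandwidth_gb_s: float, bytes_per_update: int) -> float:
    """Fraction of the assumed peak memory bandwidth used at a given MLUPs/s rate."""
    used_bytes_per_s = mlups * 1e6 * bytes_per_update
    return used_bytes_per_s / (bandwidth_gb_s * 1e9)


if __name__ == "__main__":
    # Radeon RX 590 row from the table: 256 GB/s, 1257 / 1573 / 1688 MLUPs/s.
    for mlups, nbytes in [(1257, BYTES_PER_UPDATE_FP32),
                          (1573, BYTES_PER_UPDATE_FP16),
                          (1688, BYTES_PER_UPDATE_FP16)]:
        print(round(100 * roofline_efficiency(mlups, 256, nbytes)))
```

Under these assumptions the RX 590 row evaluates to 75, 47 and 51 percent, matching the bracketed values in the table.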