- Fast implementation of Naive SurfaceNets in Unity, using Burst and SIMD instructions.
- Roughly 0.1 - 0.4 ms per 32^3 voxel chunk, depending on SDF complexity (single-threaded, on a first-generation Ryzen 1700).
- Marching cubes version : https://github.com/bigos91/fastMarchingCubes
- Normals can be generated from the SDF values of each 2x2x2 cube, or regenerated from the triangles at the end of the job.
- The code may be hard to read (pointers, native collections, assembly intrinsics).
- Corner mask calculations are done with SIMD, 32 cubes at a time (32x2x2 voxels), reusing values calculated in previous loop steps.
- All SIMD code is commented to explain what it does and why, with links to the Intel intrinsics pages.
- Optimal mesh. All vertices are shared. There are no duplicates.
- Several SDF generation mechanisms (sphere, noise, simple terrain, and sphereblobs, which is something like noise but built from spheres). The default 3D noise values do not match real SDF values, so I just filled the volume with spheres of different sizes at different locations.
- Uses the advanced Mesh API for faster mesh uploads (SetVertexBufferData, etc.).
- Because it is done entirely on the CPU, the output mesh can easily be used for collisions.
- The meshed area must have 32 voxels in at least one dimension. (This implementation supports only 32^3 chunks, but it could be made to work with 32xNxM.)
- Unity (2020.3 works fine; earlier versions are untested)
- CPU with SSE4.1 support (available since around 2007)
- Clone, run, and open the scene [FastNaiveSurfaceNets/Scenes/SampleScene].
- Disable everything that makes Burst safe to make it faster :)
- https://github.com/TomaszFoster/NaiveSurfaceNets - I learnt the most from this, and used its algorithm for connecting vertices properly.
- https://github.com/Chaser324/unity-wireframe - for wireframe.
Naive Surface Nets works similarly to Marching Cubes: we iterate over the volume, collecting the 8 voxel samples of one 'cube' at a time. For each cube we need to calculate a 'corner mask', an 8-bit mask describing which corners are below or above the isosurface value. I decided to store voxel density data as signed bytes and use the sign bit to build the corner mask, so the isosurface is extracted at 0. This means we can load 32 voxels into two 128-bit SSE registers and use the movemask_epi8 intrinsic to extract the sign bits of 16 8-bit values at a time, so 32 voxels need two operations.
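The sign-bit extraction described above can be sketched in plain C with SSE2 intrinsics. This is a minimal illustration, not the project's Burst code: the function name `sign_mask_32` and the choice of putting voxel 0 in bit 0 are assumptions made for the example.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: returns a 32-bit mask in which bit i is set
   when voxels[i] is negative (its sign bit is 1). */
uint32_t sign_mask_32(const int8_t *voxels)
{
    /* Two unaligned 128-bit loads cover 32 signed bytes. */
    __m128i lo = _mm_loadu_si128((const __m128i *)voxels);
    __m128i hi = _mm_loadu_si128((const __m128i *)(voxels + 16));

    /* movemask_epi8 gathers the top bit of each of the 16 bytes
       into the low 16 bits of an int. */
    uint32_t lo_bits = (uint32_t)_mm_movemask_epi8(lo);
    uint32_t hi_bits = (uint32_t)_mm_movemask_epi8(hi);

    return lo_bits | (hi_bits << 16);
}
```

With densities stored as signed bytes, "negative" is simply "sign bit set", which is why no comparison instruction is needed when the isosurface sits at 0.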
These operations are performed 4 times, once for each of 4 groups of 32 voxels, creating four 32-bit masks. The masks are bit-reversed so that the first bits become the last. Then we can extract the highest bit of each of the 4 masks in the same way with movemask_ps, shift all 4 masks to the left with slli_epi32, and extract the next 4 bits to complete the 8-bit corner mask. Half of the corner mask can be reused while iterating (Z dimension), and 2 of the 32-bit masks can also be reused while iterating (Y dimension).
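A rough C sketch of how four row masks could be turned into 8-bit corner masks follows. Several details are assumptions made for illustration: the names `reverse_bits32` and `corner_masks_31`, a scalar bit reversal, a shift of one bit per step, the lane order of the four rows, and producing only the 31 cubes that fit entirely inside one 32-voxel row.

```c
#include <emmintrin.h>  /* SSE2 (also pulls in SSE for _mm_movemask_ps) */
#include <stdint.h>

/* Scalar bit reversal: bit 0 becomes bit 31, bit 1 becomes bit 30, ... */
uint32_t reverse_bits32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* rows[4]: sign masks of four neighbouring rows of 32 voxels
            (bit i of rows[r] describes voxel i of row r).
   out[31]: one corner mask per cube along the row. Bits 0..3 hold the
            four rows at position x, bits 4..7 the four rows at x + 1. */
void corner_masks_31(const uint32_t rows[4], uint8_t out[31])
{
    /* After reversal, voxel 0 of each row sits in the top bit of its lane. */
    __m128i m = _mm_set_epi32((int)reverse_bits32(rows[3]),
                              (int)reverse_bits32(rows[2]),
                              (int)reverse_bits32(rows[1]),
                              (int)reverse_bits32(rows[0]));

    /* movemask_ps collects the top bit of each 32-bit lane into 4 bits. */
    int low = _mm_movemask_ps(_mm_castsi128_ps(m));

    for (int x = 0; x < 31; ++x) {
        m = _mm_slli_epi32(m, 1);  /* bring voxel x + 1 into the top bit */
        int high = _mm_movemask_ps(_mm_castsi128_ps(m));
        out[x] = (uint8_t)(low | (high << 4));
        low = high;                /* carry this half over to the next cube */
    }
}
```

In this sketch the upper half of each mask becomes the lower half of the next one, so each step costs one shift and one movemask.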
Additionally, it is possible to check whether a whole column of 32x2x2 voxels is entirely above or entirely inside the surface: test_mix_ones_zeros can be used to check whether all four 32-bit masks have all their bits set to the same value.
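The uniform-column check could look roughly like this in C. The function name `column_is_uniform` is hypothetical, and the `target` attribute assumes GCC or Clang; it lets the SSE4.1 intrinsic compile without a global `-msse4.1` flag.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_test_mix_ones_zeros */
#include <stdint.h>

/* Returns 1 when the 128 bits in rows[0..3] are all zeros or all ones,
   meaning no surface crosses this 32x2x2 column, and 0 otherwise. */
__attribute__((target("sse4.1")))
int column_is_uniform(const uint32_t rows[4])
{
    __m128i m = _mm_loadu_si128((const __m128i *)rows);
    __m128i all_ones = _mm_set1_epi32(-1);

    /* With an all-ones selector, test_mix_ones_zeros returns 1 only
       when m contains both set and clear bits. */
    return !_mm_test_mix_ones_zeros(m, all_ones);
}
```

A column for which this returns 1 produces no vertices, so the per-cube work can be skipped for it.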
- A 16^3 version
- Maybe a 64^3 version, but using AVX
- Use of cmplt_epi8 for extracting isosurface at isovalues other than 0.
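As a sketch of the last idea, a signed byte comparison can stand in for the sign bit when the isovalue is not 0. This is only an illustration of the intrinsic, not code from the project; the name `below_iso_mask_32` and the parameter `iso` are assumptions.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmplt_epi8, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: bit i of the result is set when voxels[i] < iso. */
uint32_t below_iso_mask_32(const int8_t *voxels, int8_t iso)
{
    __m128i level = _mm_set1_epi8((char)iso);
    __m128i lo = _mm_loadu_si128((const __m128i *)voxels);
    __m128i hi = _mm_loadu_si128((const __m128i *)(voxels + 16));

    /* cmplt_epi8 writes 0xFF into every byte where voxel < iso (signed),
       so the top bit of each byte can be gathered with movemask_epi8. */
    uint32_t lo_bits = (uint32_t)_mm_movemask_epi8(_mm_cmplt_epi8(lo, level));
    uint32_t hi_bits = (uint32_t)_mm_movemask_epi8(_mm_cmplt_epi8(hi, level));

    return lo_bits | (hi_bits << 16);
}
```

With `iso` set to 0 this produces the same mask as reading the sign bits directly, at the cost of one extra comparison per register.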