"2014, Hoetzlein, Rama Karl. Fast Fixed-Radius Nearest Neighbors: Interactive Million-Particle Fluids. GPU Technology Conference, 2014. San Jose, CA. 2010-2014. Online at http://fluids3.com"
Morphogenesis started as a minimal variant of Rama Hoetzlein's Fluids SPH, cut down from gFluidSurface in gvdb-voxels, https://github.com/ramakarl/gvdb-voxels, which was in turn developed from Fluids-v3.
This is the cut down version of gFluidSurface. Dependence on gvdb-voxels library has been removed, and CMakeLists.txt has been rewritten. New output has been written to provide ascii .ply files for viewing in MeshLab.
This code compiles and runs with cmake 3.10, vtk-9.0, and Cuda 11.2 on Ubuntu 20.04 with GTX 980m. and on Suse Linux cluster with Cuda 9.1 and Tesla P100.
A morphogenesis simulator (in progress) , with soft-matter elasticity, diffusion of heat/chemicals/morphogens, epi-genetics and particle automata behaviour.
The notes below are rough working notes, and will change with development.
VTK is used for .vtp file output for visualization in Parraview. NB vtk-dev is broken in Ubuntu 18.04 LTS. You may need to build VTK from source. This code has been developed with VTK-9.0.1 .
***NB if you need to build VTK, please remember to ***
sudo make install
sudo ldconfig
so that the new library will be found at runtime.
This project uses Cmake. Create a build subdirectory. In the build subdirectory
cmake ../
make
make install
usage:
cd data
make_demo num_particles spacing x_dim y_dim z_dim demoType simSpace
where demoType(0:free falling, 1:remodelling & actuation, 2:diffusion & epigenetics.)
and simSpace{0:regression test, 1:Tower(256,128,256), 2:Wave pool(400,200,400), 3:Small dam break(80,60,80), 4:Dual-Wave pool(200,100,30), 5: Microgravity(160,100.160) }
demoType sets individual particle properties in "demo/particles_pos_vel_color100001.csv", especially epigenetic states, from 3D positions.
simSpace sets parameters in "SimParams." , especially gravity, and wavepool actuation.
e.g.
../build/install/bin/make_demo 125 1 6 6 6 0 5 // free falling
../build/install/bin/make_demo 120 1 2 2 30 1 5
../build/install/bin/make_demo 600 1 4 4 30 1 5 // remodelling & actuation, with fixed, bone, tendon, muscle, elastic, mesenchyme, external actuation
../build/install/bin/make_demo 2000 1 6 6 30 1 5 // "" , with reserve particles for growth.
../build/install/bin/make_demo 400 1 10 10 3 1 5 // diffusion & epigenetics, with reserve particles for growth.
../build/install/bin/make_demo 10000 1 100 10 10 1 1
../build/install/bin/make_demo 100000 1 100 10 100 1 1
../build/install/bin/make_demo 100000 1 100 100 10 1 1
../build/install/bin/make_demo 1000000 1 100 100 100 1 1
Factors affecting memory requrement for Morphogenesis:
0/ Available GPU memory depends on the device + other sofftware using the GPU, especially yuor desktop, GUIs, graphics and video.
To see your current GPU memory useage and total GPU memory, call "nvidia-smi" from the commandline.
NB When running a GUI onthe GPU it is advised to use models with fewer than 100k bins and particles.
1/ The number of bins in the simulation volume. i.e. volume/particle_radius^3
Dense arrays of bins are maintained for sorting particles. genes, remodelling.
These arrays are use to profuce the dense lists of particles for each kernel, for efficient computation.
2/ The number of particles in the simulation
3/ The number of genes in the simulation will increase the size of the bin arrays(1), and the data per particle(2).
4/ The number of active genes among the particles will increase the size of the dense lists of particles for each active gene (1).
5/ The number of particles subject to remodelling per time step, will incease the size of the dense lists for each remodelling kernel.
NB the simulation volume and particle radius are set in SimParams.txt by :
m_Param[PSIMSCALE], m_Param[PGRID_DENSITY], m_Param[PSMOOTHRADIUS],
m_Vec[PVOLMIN], m_Vec[PVOLMAX].
Where:
m_GridRes = (PVOLMAX-PVOLMIN)*PGRID_DENSITY*PSIMSCALE / (2*PSMOOTHRADIUS)
Total number of bins = m_GridRes.x * m_GridRes.y * m_GridRes.z
For default PGRID_DENSITY=2, PSIMSCALE=0.005, PSMOOTHRADIUS=0.015,
This gives m_GridRes = (PVOLMAX-PVOLMIN)/3
So
10x10x10 vol -> 333 bins
30x30x30 vol -> 9,000 bins
60x60x60 vol -> 72,000 bins
100x100x100 vol -> 333,3333 bins
150x150x150 vol -> 1,125,000 bins
1000x1000x1000 vol -> 333,333,333 bins. Too big for most GPUs, would need multi-GPU on cluster.
Factors affecting prcoessing speed include:
0/ Available GPU blocks and cores per block, after space taken by other GPU contxts for other programs.
1/ The number of bins. These have to be processesd every time step to sort the particle data and generate the dense lists.
2/ The number of particles.
NB there is the possiblity of "solid" and "living" particles into separate lists (using genes), to avoid running elastic and gene kernels on fluid and non-living tissue.
3/ The length of the "active gene" and "remodelling" particle lists per time step.
NB the system of dense lists, provides faster running for models with few active genes per particle, and few particles remodelling per time step.
CPU-only test program to generate example "SimParams.txt" and "particles_pos_vel_color100001.csv" files for specifying models, and a "particles_pos100001.ply" for viewing a model in e.g. Meshlab.
usage:
cd data
check_demo simulation_data_folder output_folder
e.g.
../build/install/bin/check_demo demo check
CPU-only test program to verify the ability to read and re-output models.
New launch program to load data from files, and run simulation on GPU.
usage:
cd data
../build/install/bin/load_sim demo out num_files steps_per_file freeze_steps save_ply(y/n) save_csv(y/n) save_vtp(y/n) debug(y/n) gene_activity(y/n) remodelling(y/n)
usage: load_sim simulation_data_folder output_folder num_files steps_per_file freeze_steps save_ply(y/n) save_csv(y/n) save_vtp(y/n) debug(0-5) gene_activity(y/n) remodelling(y/n)
debug: 0=full speed, 1=current special output, 2=host cout, 3=device printf, 4=SaveUintArray(), 5=save .csv after each kernel.
e.g.
cd data/test
./load_sim ../demo/ ../out/ 10 3 1 y y y
./load_sim ../demo/ ../out/ 10 1 6 n y y
./load_sim ../demo/ ../out/ 1000 10 0 n y y y y y // NB Now '100' frames per timestep x02-x20 snapshot after each kernel. x91 end of timestep. x00 begining of simulation.
./load_sim ../demo/ ../out/ 1000 1 10 n y y y n y // NB now 'freeze_steps' delay the start of particle movement, while heal() forms initial bonds.
./load_sim ../demo/ ../out/ 1000 1 10 n y y y n n
./load_sim ../demo/ ../out/ 1000 1 10 n y y n y y
./load_sim ../demo/ ../out/ 200 3 1 n n y 1 y y // Good launch option for profiling
./load_sim ../demo_10000_1_100_10_10/ ../out/ 100 30 1 n n y
./load_sim ../demo_100000_1_100_100_10/ ../out/ 100 30 1 n n y
./load_sim ../demo_1000000_1_100_100_100/ ../out/ 100 30 1 n n y
If you have Caua-toolkit installed, enter "nsight-sys" at the commandline to open "Nvidia Nsight Systems".
See https://developer.nvidia.com/nsight-systems , https://docs.nvidia.com/nsight-systems/index.html
Follow the nsight-systems InstallationGuide.
NB especially wrt setting /proc/sys/kernel/perf_event_paranoid
https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html#linux-requirements
For a an introduction to how to use Nsight Systems, see "Blue Waters Webinar: Introduction to NVIDIA Nsight Systems" https://www.youtube.com/watch?v=WA8C48FJi3c
Note, setting the "debug" value (0-5) when launching "load_sim" will have a big effect on the memory transfers, flies saved and console output, and therefore the performance.
The .ply files output can be viewed in MeshLab. It is recommended to select the following MeshLab options: NB need old MeshLab (Ubuntu18.04). Version on Ubuntu 20.04 does not work.
Render - Show vertex dots
Render - Render Mode - Wireframe
The .vtp files output can be viewed in Paraview. This allows visualization of all the parameters, and is especially relevant for diffusion of morphogens, epigenetic state, and material properties.
ParaView is a widely used scientific data visualization tool.
ParaView can be downloaded from https://www.paraview.org/
NB GPU conflict:
It is advised to exit Paraview before launching Morphogenesis.
Sometimes other runtime errors arrise if Paraview is still running when Morphogenesis is launched.
Both Paraview and Morphogenesis will both try to use your GPU.
This may result in an _"invalid device context"_ or _"There is no device supporting CUDA"_ error.
This seems to happen specifically where the computer suspends while ParaView is open.
If ParaView fails to release the GPU after being shut down, then it may be necessary to reboot.
This does not arise where the two programs are run on separate machines, as when Morphogenesis is run on a cluster.
Loading the data into ParaView:
load the .vtp file
select the file in the pipeline browser
From the top menu bar, select "Filters->Common->Treshold"
In Properties(Thresold2), in scalars, select FPARTICLE_ID.
Set Maximum to the number of active particles in the simulation.
click "Apply" (green button in Properties)
In the third row of the tool bar, click "zoom to data" icon (four arrows pointing inwards).
NB this is necessary, when unused particles are stored in one corner of the simulation, with FPARTICLE_ID = UINT_MAX.
In "Coloring" select the model parameter of interest
Adjust the coloring scale
Set the background colour:
Select Edit->Settings
In the pop-up window,
select Color Pallette->Background
Choose a colour, click Apply, Okay.
For Volume rendering:
In the top menu bar, select "Filters->Point Interpolation->Point Volume Interpolator"
Select the new "PointVolumeInterpolator" in the pipeline browser
In the Properties pane, select a kernel type, e.g. Gaussian Kernel or Shepard Kernel
In Coloring, select the model parameter of interest
Adjust the coloring scale
In "Volume Refinement", "Representation", select "Volume"
(Alternatively select these in the top menu, second row tool bar.)
In Volume Rendering (near the bottom of the Properties pane), in "Volume Rendering Mode", select "Smart" or "GPU"
Click Apply (at the toip of the Properties pane.
Also in "Volume Rendering"
In "Blend Mode" select between "Compostite/IsoSurface/Slice"
Colour key for F_TISSUE_TYPE: Preset colours
"Cool-warm" "Jet" "Black-body"
Mesenchyme/other tissueType =0 dark blue dark blue black
Tendon tissueType =6 pink yellow red-orange
Muscle tissueType =7 orange orange orange
Cartilage tissueType =8 red-orange yellow
Bone tissueType =9 red pale yellow
Elast lig tissueType =10 dark red dark red white
Note Tissue order in "Demo" from Pos.z=0 along z-axis.
Fixed particles,
mesenchyme,
bone,
tendon,
muscle,
elastic
tissue,
mesenchyme,
actuated particles.
See notes in fluid.h
#define FELASTIDX 14 //# uint[BONDS_PER_PARTICLE +1] 0=self UID, mass, radius. >0= modulus & particle UID
#define FNERVEIDX 15 //# uint
#define FCONC 16 //# float[NUM_TF] NUM_TF = num transcription factors & morphogens
#define FEPIGEN 17 //# uint[NUM_GENES]
FELASTIDX - gives the particle ID of those particles that have elastic bonds to this particle.
FNERVEIDX - gives the ID of the nerve that reads/writes to this particle.
FCONC - gives the concentration in this particle of each diffusable morphogen.
FEPIGEN - gives the epigenetic state of the genes of this particle.
// Buffers:
// NB one buffer created per particle parameter (marked '#'),
// & explicitly allocated in FluidSystem::AllocateParticles ( int cnt ).
//
// Need to list any new parameters here. e.g. for elasticity, diffusion, epigenetics, nerves & muscles.
// NB Epigenetics & diffusion only in morphogenesis mode.
// Elasticity:
// Elasticity requires a list of [3+] attached particles, and a uid for each particle, plus material type &/or modulus
// 32bit unsigned int for Uid - upto 4Bn (4,294,967,296) particles
// 32bit float for modulus (could be more complex e.g. fatigue, nonlinear) (lookup type)
// Require:
// 128bits = 4*32 extra bits/particle (upto 3 'bonds)
// Could use 32bit uint - 8 bits for 256 bond types (look up modulus on exp scale), and 2^24=16,777,216 particles.
// Use 8 bit of self-UID for 2^4=16 mass & fluid radius.
//
// # if elastic force is written to both interacting particles, then the effective number of bonds doubles.
// # i.e. each particle stores three bonds, but the average bonds per atom would be six.
#define BONDS_PER_PARTICLE 3 // +1 for self ID, mass & radius
// Sensorimotor:
// Sensory nerve endings & muscles need type and nerve uid to write/read.
// 32bit uint for i/d, but use 3 bits for type (0)muscle, (1)stretch, (2)pressure/vibration, (3)temp, (4-7)other..
// => 536,870,912 nerve UIDs
// ? 3 bits ? in particle UID => then read nerve ID from 32bit uint array.
// Diffusion:
// Diffusion of morphogens and accumulation of transcription factors, reqires concentration,
// (8bit integer 0-255, 4 per 32 bit unsigned int)
// Need _at_least_ 8 morphogens, and 8 transcription factors => 16/4=4 uints or 8*16= 128bits.
// + rule index (common to most cells, copied to 'local' SMP memory on GPU) => diffusion rate & breakdown rate.
// 128bits
#define NUM_TF 16 // minimum 8 transcription factors + 8 diffusable morphogens
// Epigenetic state requires for each gene:
// (i)current activation, (ii)available/silenced (bool), (iii)spread/stop (bool).
// Could use uint[num_genes=32], and use the 1st 2 bits for booleans & 14 bits for the value (0-16383).
// Or 2 genes per uint, with 4 boolean bits and 6-bit values (0-63).
// If there are 32 genes, then need 32*16bits = 512bits.
// Or 16 genes * 8bits = 128 bits
// (NB there are about 200 cell types in the human body => minimum 8 genes)
// 128bits
#define NUM_GENES 16 // >= NUM_TF NB each gene wll have a list of sensitivities to TFs & morphogens
#define BITS_PER_EPIGENETIC_STATE 8 //i.e. 2 boolean + 6bit activation 2^6=64.
// NB Genome defined at bottom of this file.
// Data costs:
// Lean simulation, 16M particles, 16 TF or morphogens, 16 genes:
// 128 + 128 + 128 = 384 additional bits for minimal morphogenesis at minimal resolution => 960 bits/particle.
// 3bonds*(8bits log modulus + 24bit uid) + self(4bit mass + 4bit radius + 24bit uid) +
// 16(TF + morphogens) * 8bit concentration + 16genes * (2 bool + 4bit activation)
// Elastic only simulation
// 128 additional bits/particle for minimal elastic simulation => 704 bits/particle.
// Extra bonds
// Could use 6 bonds per particle, and drop FNBRCNT,FNBRNDX : 224 + 128 + 128 + 576 - 32 = 1024 ie:1k/particle.
// Rich simulation 4Bn particles (7.5x spatial resolution), 32 genes etc.
// 8*64 + 256 + 256 = 1024 additional bits/particle for a 'rich' simulation => 1600 bits/particle.
// 8bonds * 32bit uid + 8bonds*32bit modulus + 32TF*8bits + 32genes*8bits
// NB target architectures:
// Tesla P100, 3584 cores, 1.Ghz, 16Gb, 732GB/sec, 10.6 TeraFLOPS single, 21.2 TeraFLOPS half precision, copmute capability 6.0
// RTX2080-Ti, 4352 cores, 1.545Ghz 11Gb, 616GB/sec, 11.75TFLOPS single, 23.5TFLOPS half
//
// base morphogenesis -> 16GB / 960 bits * 2 fullcopy = 8M particles on one card's memory.
// 'Rich' " -> 16GB / 1600 bits " = <5M
// speed 732GB/sec / 16GB = 45.75fps (upper bound data speed)
// NB spatial resolution:
// 4M^(1/3) = 158.7 , 8M^(1/3) = 200 particles per side of cube.
// 4M/card = 1,600,000,000, max Bracewell, => 1,169 particles per side of cube.
// bits 4*96 + 2*32 + 8*16 = 576bits/particle : fluid only
#define FPOS 0 //# 3DF 96 particle buffers //list of meaninful defines for the array of pointers in FBufs below.
#define FVEL 1 //# 3DF velocity
#define FVEVAL 2 //# 3DF half step in velocity - used in numerical integration
#define FFORCE 3 //# 3DF force
#define FPRESS 4 //# F 32 pressure
#define FDENSITY 5 //# F density
#define FAGE 6 //# Ushort 16 particle age.
#define FCLR 7 //# uint 16 colour
#define FGCELL 8 //# uint grid cell
#define FGNDX 9 //# uint grid index
#define FGNEXT 10 //# uint
#define FNBRNDX 11 //# uint particle neighbors index (optional)
#define FNBRCNT 12 //# uint particle neighbors count
#define FCLUSTER 13 //# uint
// additional buffers for morphogenesis
#define FELASTIDX 14 //# uint[BONDS_PER_PARTICLE +1] 0=self UID, mass, radius. >0= modulus & particle UID
//#define FELASTMOD //# uint[BONDS_PER_PARTICLE +1] modulus of bond (use a standard length) //not required
#define FNERVEIDX 15 //# uint
#define FCONC 16 //# float[NUM_TF] NUM_TF = num transcription factors & morphogens
#define FEPIGEN 17 //# uint[NUM_GENES]
// original buffers continued
#define FGRID 18 //! uniform acceleration grid
#define FGRIDCNT 19 //! grid count
#define FGRIDOFF 20 //! grid offset
#define FGRIDACT 21 //!
#define FSTATE 22 //# uint
#define FBRICK 23 //!
#define FPARAMS 24 //! fluid parameters
#define FAUXARRAY1 25 //! auxiliary arrays (prefix sums)
#define FAUXSCAN1 26 //!
#define FAUXARRAY2 27 //!
#define FAUXSCAN2 28 //!
#define MAX_BUF 29 //!
// Genome
// (common to most cells, copied to 'local' SMP memory on GPU) for each gene:
// (i) mutability, - Not used during morphogenesis
// (ii) delay/insulator, - Needs to set an epigenetic counter
// (iii) sensitivity to inputs - Transcription factors/morphogens, stress/strain cycles (stored in a TF)
// (a) array of sensitivities to each TF/Morphogen => increment activity of the gene (recorded in the epigenetics)
//
// (iv)cell actions - the function of the gene is called, with activation level as its parameter.
// - only functions of active genes are called.
// (a)secrete morphogen, - the product of the gene
// (b)move, - exert force on neighbouring cells, according to concetration gradient of a morphogen
// (c)adhere, - form elastic bonds - to a particular cell type.
// (d)divide - split/combine particles - NB elastic links.
// (e)secrete/resorb material - change mass, radius, modulus, viscosity of particle.
// NB we don't have a particle-wise viscosity param ...yet
struct FGenome{
uint mutability[NUM_GENES];
uint delay[NUM_GENES];
uint sensitivity[NUM_GENES][NUM_GENES]; // for each gene, its sensitivity to each TF or morphogen
uint difusability[NUM_GENES][2];// for each gene, the diffusion and breakdown rates of its TF.
//uint *function[NUM_GENES]; // Hard code a case-switch that calls each gene's function iff the gene is active.
}; // NB gene functions need to be in fluid_system_cuda.cu
// Multi-scale particles
// - use radius parameter in FELASTIDX buffer, 0=self UID, mass, radius.
// - requires a split/combine kernel
// 1st order - all spheres, not 2nd order - elipsoids with orientatoin and angular momentum.
// When finding particles in range for a small particle - consider (i) is the bin in range, (ii) iff particle x&y&z are in range
// When combining particles - (i) similar type ?, (ii) gradients, (iii) significance to simulation.
New kernels are being written force
-
elastic forces & breaking elastic bonds
-
diffusion
-
particle automata behaviour - controlled by epigenetics
-
connecting a nervous system
-
visualization