
Stepping algorithm and physics management


Guiding principles

  • Kernels should operate on minimal data to decrease register pressure.
  • Group kernels and code so that similar work from disparate processes/models can run inside a single parallel invocation rather than one invocation per model/process.
  • Kernel data dependencies are assumed to be on the previous kernel unless otherwise noted (i.e. launches are all synchronous).
  • I'm focusing only on the "inside" of the stepping loop, not on initialization of tracks from primaries/secondaries or on output of hits, so the device/host transfers here are limited.

There will be many ways we can restructure kernels (combining or extracting), and several "optimizations" I've proposed here can be omitted for a first pass.

Restrictions

We start based on the standard EM physics models and processes and don't worry about anything else for now. When we need a new set of capabilities, we revisit these assumptions for a batch of new abilities rather than one at a time, for efficiency.

  • Each particle has either zero or one multiple scattering (MSC) process. If MSC is enabled, it is invoked as part of every step.
  • The only other along-step processes are simple energy loss processes that locally deposit energy without emitting secondary particles.
  • There's no restriction on which processes have discrete interactions:
    • Energy loss processes emit delta rays occasionally

In this formulation, processes are loose groupings of models. Processes should be allowed to apply to multiple particle types, but each particle type should correspond to a set of models with contiguous ranges of energy applicability. So for a given particle, a single energy range should describe the range of validity for the process, and that range should be exactly divided into a set of models corresponding to subsets of that range. The exact value of zero for a truly "at rest" process should be a valid energy, so I suggest ranges be open on the lower end and closed on the upper end. This choice might also make threshold reactions less problematic (avoiding potential zeros in the exiting particle's energy).
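
To make this concrete, here is a minimal sketch (the `ModelRange` struct, ID convention, and `find_model` helper are all hypothetical) of how open-below/closed-above energy ranges could group models within a process:

```c++
#include <vector>

// Hypothetical representation of one model's applicability within a
// process: ranges are open on the lower end and closed on the upper end,
// i.e. (emin, emax], so that E = 0 can belong to an "at rest" range.
struct ModelRange
{
    int    model_id; // consecutive integer model ID
    double emin;     // exclusive lower energy bound
    double emax;     // inclusive upper energy bound
};

// Models for one (process, particle) pair, sorted so adjacent ranges
// share endpoints: the first emin and last emax bound the process itself.
inline int find_model(const std::vector<ModelRange>& models, double energy)
{
    for (const auto& m : models)
    {
        if (energy > m.emin && energy <= m.emax)
            return m.model_id;
    }
    return -1; // the process doesn't apply at this energy
}
```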

The EM processes have only "instantaneous" at-rest processes, rather than competing decay channels, so some of the discussion of decays here can be removed for the simple first pass.

Step lengths

Physical step lengths are determined by energy- and material-dependent properties. (Step lengths can also be limited geometrically, by the global geometry definition, tally meshes, etc.) The steps can be limited by:

  • Deterministic range limitation by an energy loss process -- using a range lookup table modified by a simple range limiter to ensure the validity of the approximation that the cross sections are approximately constant over a step
  • Stochastic sampling of distance for a discrete interaction
  • Stochastic sampling of time to a decay (if the particle is stopped)

Stepping algorithm

I expect each subsection below to be a single kernel (or, for the interaction, a series of kernels launched in parallel).

Interaction length

At the beginning of the step, tracks that have been stopped but not killed (i.e. "at rest", waiting to decay or annihilate) may still await an interaction.

State input:

  • Particle type and energy
  • RNG

Setup:

  • Material ID must be calculated from the current volume ID
  1. Sample the exponential distribution of interaction lengths to the next discrete interaction, and store it. (Don't resample if the stored value is valid.)
  2. Calculate the total mean free path for applicable discrete-interaction processes: loop over processes that apply to the particle, storing and accumulating the cross sections at the current energy in the current material. Calculate (from cross section and remaining MFP) and locally store the distance to the interaction. Save the cumulative path length (CPL) of the interaction to the state data.
  3. Calculate step limiters for applicable energy loss processes. Locally (within the kernel) find the minimum step length over these and the minimum distance to the next interaction.
  4. The minimum of these (the interaction distance or the step limiter) is used to save the CPL of the physics step's end point to the "sim" state data; see the sketch below.
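
A minimal sketch of the per-track logic above, assuming a flattened `PhysicsState` and precalculated inputs (the summed macroscopic cross section and the minimum energy loss step limit); the real data layout and accessors will certainly differ:

```c++
#include <algorithm>
#include <cmath>

// Hypothetical per-track physics state (see "Process manager" below).
struct PhysicsState
{
    double interaction_mfp; // remaining number of mean free paths
    double interaction_cpl; // CPL of the next discrete interaction
    double step_limit_cpl;  // CPL at the end of the allowed physics step
};

// u: uniform sample in (0, 1); xs_total: summed macroscopic cross section
// [1/cm]; min_limiter: minimum energy loss step limit [cm]; current_cpl:
// the track's cumulative path length [cm].
inline void calc_step_limits(double        u,
                             double        xs_total,
                             double        min_limiter,
                             double        current_cpl,
                             PhysicsState& state)
{
    // Step 1: sample the exponential distribution of interaction lengths,
    // reusing the stored value if it's still valid
    if (state.interaction_mfp <= 0)
        state.interaction_mfp = -std::log(u);

    // Step 2: convert remaining MFP to a physical distance and save the
    // CPL of the interaction point
    double distance = state.interaction_mfp / xs_total;
    state.interaction_cpl = current_cpl + distance;

    // Steps 3-4: the physics step ends at the nearer of the interaction
    // point and the energy loss step limit
    state.step_limit_cpl = current_cpl + std::min(distance, min_limiter);
}
```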

The mean free path and step limiter calculations (at least for EM physics) will both use the physics lookup tables exported from Geant4 and the same data structures for finding/interpolating (material, energy) -> value.

Assume cross sections always need to be recalculated. Transporting through multiple consecutive volumes with the same material should happen internal to the propagator.

At-rest processes

"Stopped" tracks (those that have slowed down to an energy of zero) should calculate decay constants \lambda [1/s] instead of cross sections \sigma [1/cm]. Sampling from this discrete distribution (weight proportional to fraction of lambda) is a fair method of determining the decay process among multiple competing decay channels. Instantaneous processes (such as annihilation) would have an infinite decay constant.

Tracks at rest will not move (step length of zero) so should have a minimum step of zero -- one easy way to do this would be to set the remaining "interaction length" to zero for stopped particles. With zero-length steps, propagation and other kernels should "naturally" do the right thing.
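
A hedged sketch of fair channel selection weighted by decay constant; the function name and flat array of λ values are illustrative, and instantaneous (infinite-λ) processes would need a short-circuit before sampling:

```c++
#include <cstddef>
#include <vector>

// Select a decay channel with probability lambda_i / sum(lambda).
// u: uniform sample in [0, 1). Assumes at least one channel and that
// instantaneous (infinite-lambda) channels are handled beforehand.
inline std::size_t sample_decay_channel(const std::vector<double>& lambdas,
                                        double u)
{
    double total = 0;
    for (double l : lambdas)
        total += l;

    double accum = 0;
    for (std::size_t i = 0; i < lambdas.size(); ++i)
    {
        accum += lambdas[i];
        if (u * total < accum)
            return i;
    }
    return lambdas.size() - 1; // guard against floating point roundoff
}
```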

Physics vector calculations

For now, start with simple static polymorphism for the tables/vectors. An enum value will correspond to a class (e.g. the uniform-in-log-E XS calculation method used in the Klein-Nishina interaction).
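
For example, a minimal sketch of what the enum-to-class dispatch might look like, with a placeholder grid layout (not the real table format) and the uniform-in-log-E interpolation as the first case:

```c++
#include <algorithm>
#include <cmath>

// Placeholder physics vector types; the real table format will differ.
enum class GridType
{
    uniform_log_e // e.g. the Klein-Nishina cross section calculation
};

struct UniformLogGrid
{
    double        log_emin;
    double        log_emax;
    const double* values; // tabulated values, `size` points
    int           size;
};

// Linear interpolation on a uniform-in-log-E grid.
inline double interpolate(const UniformLogGrid& g, double energy)
{
    double f = (std::log(energy) - g.log_emin)
               / (g.log_emax - g.log_emin) * (g.size - 1);
    int    i = std::max(0, std::min(static_cast<int>(f), g.size - 2));
    return g.values[i] + (f - i) * (g.values[i + 1] - g.values[i]);
}

// Static polymorphism: a switch over the enum selects the concrete
// calculation method, so no virtual functions are needed on device.
inline double calc_value(GridType type, const UniformLogGrid& g, double e)
{
    switch (type)
    {
        case GridType::uniform_log_e:
            return interpolate(g, e);
    }
    return 0; // unreachable for valid enum values
}
```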

Multiple scattering

Multiple scattering downsizes a physical step length to a "geometrical" step length: a single smooth curve (the particle's natural path, whether in a magnetic field or not) becomes a wiggly line (due to condensed collisions) with the same path length. The straight-line distance between the start and end points of this wiggly line is naturally shorter than the original step, and its end point gives the particle track a displaced position and a perturbed direction.

Geant4 models MSC with tight coupling between the geometry and the multiple scattering model. I'd like to avoid this tight coupling if possible, to reduce register pressure and maximize flexibility of the MSC model. One way to do this might be to have the MSC kernel calculate multiple possible step lengths (fractions of the proposed physics minimum step, e.g. 1/8, 1/4, 1/2) and interpolate the displacements/direction changes/energy losses based on the material boundaries in the geometry.
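
A hedged sketch of that decoupling idea, with a hypothetical MSC model interface (`true_to_geom`) providing the true-to-geometrical path conversion at fixed fractions of the proposed step; geometry code could later interpolate between the candidates that bracket the actual boundary distance:

```c++
#include <array>

// One candidate evaluation of the MSC step transformation.
struct MscCandidate
{
    double true_step; // physical ("true") path length
    double geom_step; // corresponding straight-line path length
};

// Evaluate the MSC transformation at fixed fractions of the proposed
// physics step (1/8, 1/4, 1/2, 1), without touching the geometry.
template<class MscModel>
std::array<MscCandidate, 4>
calc_msc_candidates(const MscModel& msc, double physics_step)
{
    const double fractions[] = {0.125, 0.25, 0.5, 1.0};
    std::array<MscCandidate, 4> result;
    for (int i = 0; i < 4; ++i)
    {
        double true_step = fractions[i] * physics_step;
        result[i] = {true_step, msc.true_to_geom(true_step)};
    }
    return result;
}
```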

Propagation and material update

The propagation kernel should handle the geometry aspects of moving along a step (so it's like the "forced" along step in Geant4).

The trick with propagation is that with magnetic fields in play, you can't a priori find the next geometry intersection point without interrogating the geometry along that path, and the easiest way to do that is to actually change the geometry state.

State input:

  • Physical step limit
  • Geometry state
  • Particle charge and speed (for magnetic field)

State change:

  • Geometry position, direction, volume ID

Output:

  • Physical distance traveled (for time updates, slowing down, CPL update)
  • Post-step material ID (or flag indicating it must be recalculated from the updated volume)
  1. Instantiate a propagator based on whether the track/particle is subject to a field.
  2. Propagate up to the remaining physical step limit. Propagation returns the new volume and the distance traveled.
  3. If the distance traveled is less than the remaining physical step, the particle has entered a new volume. Use the volume-to-material mapping to update the material ID, but temporarily store the previous ID for comparison. An "exterior" (invalid) volume ID should translate to an "exterior" material ID.
  4. If the material ID has changed from its previous state, propagation is complete (new material properties are needed to calculate range, cross sections, etc.).
  5. If the remaining physics interaction distance is zero, propagation is also complete.
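
A sketch of steps 1-5 above; the Propagator interface (`move` returning distance, boundary flag, and volume ID) and the volume-to-material array are hypothetical stand-ins. Per the earlier assumption, transporting through consecutive same-material volumes stays inside the loop:

```c++
// Result of moving one track along (up to) its physical step.
struct StepResult
{
    double traveled;    // physical distance traveled
    int    material_id; // post-step material ID
};

template<class Propagator>
StepResult propagate_step(Propagator& prop,       // field or linear (step 1)
                          double      step_limit, // physical step limit
                          int         material_id,
                          const int*  volume_to_material)
{
    StepResult result{0, material_id};
    while (result.traveled < step_limit)
    {
        // Step 2: move up to the remaining limit; the propagator returns
        // the distance traveled and the volume it ends in
        auto moved = prop.move(step_limit - result.traveled);
        result.traveled += moved.distance;
        if (!moved.crossed_boundary)
            break; // used the whole physical step (step 5 case)

        // Step 3: entered a new volume; map it to a material. An
        // "exterior" volume ID maps to an "exterior" material ID.
        int next_material = volume_to_material[moved.volume_id];
        if (next_material != result.material_id)
        {
            // Step 4: material changed, so propagation is complete (new
            // cross sections, ranges, etc. must be recalculated)
            result.material_id = next_material;
            break;
        }
        // Same material: continue through the consecutive volume
    }
    return result;
}
```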

Depending on how we implement sensitive detector volumes (tallies), we may need to stop at new-volume boundaries even if they have the same materials. (And it may be that only energy deposition matters, so if they are streaming without interacting that's OK.)

There may also be various advantages and disadvantages to where and when to use differential step lengths versus accumulated path length.

Slowing down

The "slowing down" step should encapsulate the physical changes to the track along the step.

State input:

  • Step length

State change:

  • Particle energy
  • Particle CPL
  • Particle time
  • Accumulated step energy deposition (?)
  1. Sum dE/dx over all continuous processes that apply to the particle. Multiply by the distance traveled in the step and subtract from the particle's energy, clamping to a minimum of zero.

    (QUESTION: for improved accuracy do we instead scale the step length backward if the particle slows to zero? Or would that require another iteration with geometry?)

  2. If the particle slowed to zero (and wasn't zero before the start of the step), cancel the potential applicability of the discrete process by setting its interaction length/CPL to infinity.

  3. Update the particle's simulation time based on the length traveled and the average particle speed.

  4. Update the interaction time for at-rest particles. We should be able to reuse the "remaining interaction lengths" sampled in the first kernel, dividing by the total calculated decay constant.

  5. Reduce the remaining physical path length and the remaining number of mean free paths.
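
A minimal sketch of steps 1-3 and 5 (the at-rest time update of step 4 is omitted), assuming a flattened track state and a precomputed sum of dE/dx over the applicable continuous processes:

```c++
#include <algorithm>
#include <limits>

// Hypothetical flattened view of the state touched by this kernel.
struct SlowingDownState
{
    double energy;          // particle kinetic energy
    double cpl;             // cumulative path length
    double time;            // simulation time
    double interaction_cpl; // CPL of the next discrete interaction
    double interaction_mfp; // remaining number of mean free paths
};

inline void apply_slowing_down(SlowingDownState& track,
                               double step,      // step length [cm]
                               double dedx_sum,  // summed dE/dx [MeV/cm]
                               double avg_speed, // average speed [cm/s]
                               double xs_total)  // total macro XS [1/cm]
{
    // Step 1: continuous energy loss, clamped to a minimum of zero
    bool was_moving = track.energy > 0;
    track.energy = std::max(0.0, track.energy - dedx_sum * step);

    // Step 2: a track that just slowed to zero can no longer undergo its
    // sampled in-flight discrete interaction
    if (was_moving && track.energy == 0)
        track.interaction_cpl = std::numeric_limits<double>::infinity();

    // Steps 3 and 5: update time, CPL, and remaining mean free path
    track.time += step / avg_speed;
    track.cpl += step;
    track.interaction_mfp -= step * xs_total;
}
```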

Discrete model selection

This can probably be integrated into the slowing-down kernel since it won't rely on any grid synchronization and will have the same extents (run over all tracks) and a subset of its data requirements.

State input:

  • Particle type, energy
  • Pre-step material
  • Previously calculated per-process cross sections (or decay constants)
  • RNG

State change:

  • Applicable model ID

Output:

  • Shared array of flags indicating applicable physics models

Setup:

  • Array of flags for applicable physics models must be initialized to zero before this kernel.
  1. If the particle CPL is equal to the discrete physics interaction CPL (should not be 'greater than'), the particle is interacting. Otherwise, save an unassigned ModelId{} to the selected model in the track state and exit.
  2. Sample from the previously calculated per-process cross section/decay to determine the applicable process ID.
  3. From the process ID and (post-slowing-down) particle energy, we obtain the applicable model ID. Save it to the track state, and set the shared flag for the model to "true" (no atomics are needed, since every writer stores the same value).

After kernel:

  • Device-to-host transfer of the shared array of selected models

The map of process + particle + energy -> model ID can be implemented as a linear search for now, but it could also be done with a hash table (with a grid lookup like we do in Shift for energy). However, in practice I think there will be few enough models that linear searches will be OK.
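
A sketch of steps 1-3 together with that lookup; the per-process cross section array, ID conventions, and `find_model` stub are all hypothetical:

```c++
#include <cstddef>

constexpr int unassigned_model = -1;

// Stand-in for the process + particle + energy -> model ID map; a real
// version would do a linear search over the models' energy ranges.
inline int find_model(std::size_t process, double /* energy */)
{
    return static_cast<int>(process); // placeholder only
}

// u: uniform sample in [0, 1); process_xs: this track's previously
// calculated per-process cross sections (num_processes >= 1);
// model_flags: shared array of model applicability flags.
inline int select_model(double        track_cpl,
                        double        interaction_cpl,
                        const double* process_xs,
                        std::size_t   num_processes,
                        double        u,
                        double        energy,
                        bool*         model_flags)
{
    // Step 1: if the track isn't at its interaction point, no model
    if (track_cpl != interaction_cpl)
        return unassigned_model;

    // Step 2: sample the process proportionally to its cross section
    double total = 0;
    for (std::size_t p = 0; p != num_processes; ++p)
        total += process_xs[p];
    std::size_t process = 0;
    double      accum = process_xs[0];
    while (process + 1 < num_processes && u * total >= accum)
        accum += process_xs[++process];

    // Step 3: look up the model and flag it as applicable (a benign
    // race: every writer stores the same value, so no atomics)
    int model = find_model(process, energy);
    model_flags[model] = true;
    return model;
}
```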

Interaction

The process manager launches the virtual interact() methods for all of the models that need to be applied, based on the array of selected models transferred back to the host. Since the "selected model" step is an optimization (with the trade-off that it requires a small but blocking device-to-host memory transfer), we can skip it for now and implement it later during the optimization phase.

The virtual host interact() kernel launch method will give the kernel itself a lot of flexibility. After implementing two or three kernels we should consider if we can create one or more templated launcher/kernel pairs to reduce code repetition, since I imagine most kernels will look like this:

  1. Sample the interacting element by calculating cross sections provided by the model.
  2. Instantiate and call the Interactor class.
  3. Save the resulting interaction.
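
A sketch of one possible templated launcher for that pattern; the Model concept (`model_id`, `sample_element`, nested `Interactor`) and the `Interaction` type are assumptions, and on device the loop body would become a single thread's work:

```c++
#include <cstddef>

// Placeholder for the saved interaction result (secondaries, energy
// change, direction change, ...).
struct Interaction
{
};

template<class Model>
void interact_all(const Model& model,
                  std::size_t  num_tracks,
                  const int*   selected_model, // per-track model ID
                  Interaction* interactions)   // track-sized result array
{
    for (std::size_t tid = 0; tid != num_tracks; ++tid)
    {
        if (selected_model[tid] != model.model_id())
            continue; // track selected another model (or isn't interacting)

        // 1. Sample the interacting element from model-provided XS
        auto element = model.sample_element(tid);
        // 2. Instantiate and call the Interactor
        typename Model::Interactor interact(model, element, tid);
        // 3. Save the resulting interaction
        interactions[tid] = interact();
    }
}
```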

There are a couple of different ways we could allocate/store interactions (since only a subset of tracks will actually be selected by an interaction kernel), but the easiest first approach is an array of interactions the size of the number of tracks.

A track-sized array will likely have plenty of padding for secondary production, and few if any of the interactions we'll initially implement will emit more than two secondaries. Therefore, if we reserve 2 * num_tracks space for secondaries, our initial implementation of the model kernels will not have to worry about saving the selected element and restoring it if secondary allocation fails.

Cutoffs and secondary processing

During the physics setup step, we should construct a list of particles that have at-rest processes (i.e. those which apply at zero energy). Stopped tracks that do not have any at-rest processes should be killed, I presume. (Or we just need error checking that no zero-energy particles without at-rest processes are produced -- some of the models explicitly kill the incident particles via absorption.)

  • Models have minimum energies for emitted particles (gammas at least)
  • Each material can have independent per-particle cutoff energies (applied in the post-step "do it")

State change:

  • Accumulated step energy deposition (?)

Kernel:

  1. A track with an energy below any material/particle cutoff should be marked for absorption.
  2. Interactions with post-interaction energy below the particle's cutoff should be absorbed (energy transferred to the accumulated step deposition).
  3. Update particle state from the interaction (change energy, set direction).
  4. If the particle is at rest and there are no at-rest processes that apply, kill the particle.
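
A sketch of the cutoff decisions in steps 1-2 and 4 (the state update of step 3 is elided); the cutoff value and at-rest flag are assumed to be precalculated lookups:

```c++
// Outcome of applying cutoffs to one track at the end of the step.
struct CutoffResult
{
    double energy;            // post-cutoff particle energy
    double energy_deposition; // locally deposited energy
    bool   alive;
};

inline CutoffResult apply_cutoffs(double post_energy, // post-interaction
                                  double cutoff, // material/particle cutoff
                                  bool   has_at_rest_process)
{
    CutoffResult result{post_energy, 0, true};
    if (result.energy < cutoff)
    {
        // Steps 1-2: absorb the particle, depositing its energy locally
        result.energy_deposition += result.energy;
        result.energy = 0;
    }
    // Step 4: a stopped particle with no at-rest processes is killed
    if (result.energy == 0 && !has_at_rest_process)
        result.alive = false;
    return result;
}
```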

We can probably eliminate explicit handling of particles with no at-rest processes by making sure there's a nonzero global energy cutoff for such particle types (electrons, gammas, ...).

Hit scoring and such

If the pre-step particle volume is sensitive (could just have a bit field marking "sensitive" volumes for easy O(1) access), then allocate a Hit object with the particle's end-of-step position, direction, energy, local energy deposition, event ID, track ID, etc.
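
A minimal sketch of such a bit field for O(1) sensitivity checks, assuming consecutive integer volume IDs:

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per volume ID: set during setup, queried during hit scoring.
class SensitiveVolumes
{
  public:
    explicit SensitiveVolumes(std::size_t num_volumes)
        : bits_((num_volumes + 63) / 64, 0)
    {
    }

    void mark(std::size_t vol)
    {
        bits_[vol / 64] |= std::uint64_t{1} << (vol % 64);
    }

    bool operator[](std::size_t vol) const
    {
        return (bits_[vol / 64] >> (vol % 64)) & 1;
    }

  private:
    std::vector<std::uint64_t> bits_;
};
```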

Data structures and design

Models

All models that we need to worry about are "interacting" models -- ones that produce secondaries or have discrete interactions. Each Model CPU class needs to provide:

  • A set of (particle type, emin, emax) tuples for which the model applies
  • A set of (particle type, emin) cutoffs for secondary production
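
A sketch of the host-side interface implied by these bullets; the Applicability struct and method names are placeholders:

```c++
#include <vector>

// (particle type, emin, emax) applicability; for secondary production
// cutoffs only the particle and emin fields are meaningful.
struct Applicability
{
    int    particle;
    double emin;
    double emax;
};

class Model
{
  public:
    virtual ~Model() = default;

    // Energy ranges over which this model applies
    virtual std::vector<Applicability> applicability() const = 0;
    // Minimum energies for emitted secondaries
    virtual std::vector<Applicability> secondary_cutoffs() const = 0;
    // Launch the device interaction kernel (see "Interaction" above)
    virtual void interact(/* state references */) const = 0;
};
```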

Process manager

Persistent (params) physics data owned by the manager class will include several maps, which in practice will probably be simple arrays due to the decision to use consecutive integer IDs to represent particles, materials, processes, and models:

  • {particle type: [applicable continuous, interacting process IDs], ...}
  • {interaction process ID: [(model ID, particle, emin, emax), ...], ...}
  • {(process ID, particle, material): (xs, de/dx, range, ...), ...}

State data:

  • Remaining number of interaction lengths to the next discrete process
  • Calculated cross sections for each discrete process
  • Cumulative path length (CPL) of the next discrete process
  • CPL or distance for the allowed physics step (minimum over the discrete interaction length and the range/step limiters from continuous processes)
  • Selected model ID (if interacting)
  • Selected element ID (during interaction)

Add to "sim" state data:

  • Current CPL for the track (starts with zero at birthplace)
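
Gathering the above, a sketch of the per-track physics and "sim" state (stored in practice as structure-of-arrays rather than an array of structs; all field names are illustrative):

```c++
// Per-track physics state; the per-process cross sections would live in
// a separate track-by-process array.
struct PhysicsTrackState
{
    double interaction_mfp; // remaining number of interaction lengths
    double interaction_cpl; // CPL of the next discrete interaction
    double step_limit_cpl;  // CPL at the end of the allowed physics step
    int    model_id;        // selected model ID (if interacting)
    int    element_id;      // selected element ID (during interaction)
};

// Addition to the "sim" state.
struct SimTrackState
{
    double cpl = 0; // cumulative path length, zero at the birthplace
};
```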