Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
Makefile
NOTES
README
hex.c
hex.h
hex.i
hex24.c
hex5.c
hydram.c
hydram.h
regul.c
regul.h
store.c
test.i
tools.c
tools.h
yhex.c
ylmdec.i

README

Overview
--------

This Yorick package tracks straight rays through a 3D mesh.  The
inputs are a mesh and a set of rays.  For each ray, the outputs are an
ordered list of the points along the ray where the ray cuts through
faces, and a corresponding list of the indices of the cells crossed.

To specify a ray, I need two 3-vectors p and q -- a point on the ray
and the ray direction, respectively.  (Internally, I normalize q to
unit length, but for convenience I do not require normalized q on
input.)  In the following formulas, I assume q is normalized.  The
distance s from p parameterizes the ray: r = p+q*s is a point on
the ray.

The result of the trace is a list of s values where the ray cuts cell
faces, and between each pair of s values, the index c of the cell the
ray is traversing there.  Hence, there is one more value of s than
value of c; I store the count of intersections ni as the first item in
the list of cell indices.  The result of tracking a ray through the
mesh consists of two lists: a list of ni c indices, and a list of ni s
values (where the first c value is ni).  When I track multiple rays, I
can combine the results into a single pair of long lists:

  s= [sA0, sA1, sA2, ..., sB0, sB1, sB2, ..., ..., sZ0, sZ1, sZ2, ...]
  c= [niA, cA1, cA2, ..., niB, cB1, cB2, ..., ..., niZ, cZ1, cZ2, ...]

where niA is the number of intersections for ray A, niB the number for
ray B, and so on, and c(L,m) is the cell index for ray L between
s(L,m-1) and s(L,m).

I use two conventions for exceptional cases: First, if ray M misses
the mesh entirely, I put niM==1 rather than 0, and use the value of
sM0 as a diagnostic flag to explain why no entry point was found.
Second, I use negative ni values to indicate the previous ray has
re-entered the mesh; the correct count is abs(niR) for the re-entrant
section of the ray.  For example, if ray B reenters twice, the c list
looks like this:

  [...,  niBa, cBa1, cBa2, ..., -niBb, cBb1, cBb2, ...,
        -niBc, cBc1, cBc2, ...,  ...]

Often the exact entry and exit points s are unimportant; all that
matters is the distance ds the ray spends inside each cell.  The full
two list solution can be reduced to three lists in that case: A list
of nc = ni-1 (or sum(abs(ni)-1) for a reentrant ray), the count of
cells traversed by the ray, and the corresponding cell index dx lists:

  nc = [ncA, ncB, ..., ncZ]
  cc = [  cA1,     cA2,   ...,   cB1,     cB2,   ..., ..., cZ1, cZ2, ...]
  ds = [sA1-sA0, sA2-sA1, ..., sB1-sB0, sB2-sB1, ..., ...,
         sZ1-sZ0, sZ2-sZ1, ...]

These would be appropriate for solving a transport equation; with the
cc array, I can look up the values of the emission and absorption;
with the ds array, I can form the transmission exp(-absorption*ds);
with the emission, transmission, and nc arrays, I can find the total
emission and transmission for each ray.

Note that this approach assumes emission and absorption are
cell-centered quantities.  The result of the tracking operation would
need to be much larger to be able to interpolate nodal emission and
absorption to the precise intersection points during the trace -- I'd
need to return four node indices and three node weight fractions for
each intersection point.  Also, I can't think of an easy way to
generalize such a scheme to a mesh with arbitrary cell shapes; the
cell-centered scheme I have implemented works just as well for
arbitrary polyhedral cells as for hexahedral cells.


Tetrahedra
----------

Tracking a straight ray through a tetrahedron is so easy, that the
fastest way to track a ray through an arbitrary polyhedron is probably
to decompose it into tets.

In a tetrahedral mesh, each cell face is a triangle shared by two
cells (or one cell if it is a mesh boundary).  The ray enters a tet
through one of its four faces, and exits through one of the remaining
three faces -- therein is the great simplifying feature of tets: the
ray will never hit the other two faces, nor can it exit through the
same face it entered.  The three nodes of the exit face are shared
with the tet the ray now enters; just replace the fourth node of the
exited tet by the fourth node of the entered tet to continue tracking.

To restate, the ray has just entered the face determined by nodes n0,
n1, and n2; the fourth node of the tet it is about to cross is n3.
Decide whether the ray exits through face 013, 123, or 203 -- which
you can specify compactly by saying whether node 2, 0, or 1,
respectively, is to be discarded.  Swap n3 with the discarded node, so
that n0, n1, n2 are now the exit face.  Finally, load n3 with the
fourth node of the tet you just entered, and you've tracked the ray
across one tet.

Although you are tracking through 3D, this tracking operation actually
takes place in 2D: Project the mesh into a plane perpendicular to the
ray.  In that plane the ray is a point, and all the tets are squashed
flat; the tracking problem becomes simply this: Given that the
ray-point is inside triangle 012, determine which of 013, 123, or 203
also contains the ray-point.  The fastest algorithm to make that
decision relies on knowing that triangle 012 has positive area (neither
zero nor negative), and on the ray-point being (0,0) -- both of which
are easy to arrange.  Then do this:

   is 0X3 negative? (X means 2D cross product, x0*y3-x3*y0 < 0?)
   if so,
      is 1X3 negative?
      if so,  discard 0, move n3 to n0
      if not, discard 2, move n3 to n2
   if not,
      is 2X3 negative?
      if so,  discard 1, move n3 to n1
      if not, discard 0, move n3 to n0

Thus, it costs only two 2D cross products (a total of 4 floating point
multiplies and 2 floating point adds) to get across the tetrahedron.
Note that moving n3 into the discarded point automatically preserves
the condition that 012 have positive area.

There is are two exceptional cases: one or both of the two cross
products will be zero if the ray exits exactly through an edge or the
apex.  In those cases, I discard the point that makes the ray exit the
face with the larger or largest (projected) area.  This prevents
exiting into a degenerate zero area triangle, but it is extremely
expensive relative to the above usual cases -- hopefully it will be
rare in practice.

Now, to actually advance into the next tet, you need to retrieve the
its fourth node, project it into the plane perpendicular to the ray,
and displace it to put the ray at (0,0).  If the point is x, this
operation is Q*(x-p), where p is the point on the ray, and Q is a
2-by-3 projection matrix.  If you do the obvious thing, this would
mean 6 multiplies and 7 adds -- substantially more expensive than
getting across the tet!

I do only a partial projection, in order to greatly reduce the cost
of retrieving the next node.  My idea is to use the two unrotated
mesh coordinates most nearly normal to the ray direction "as is",
only adjusting them by a shear operation so that the ray stays at
(0,0).  That is, instead of a full Q*(x-p), I do this:

   zz = z-pz
   yy = y-py - zz*qy/qz
   xx = x-px - zz*qx/qz

where p = (px,py,pz) is the point on the ray, q = (qx,qy,qz) is the
ray direction, and xyz have been cyclically permuted so that qz has
larger absolute value than either qx or qy.  Since I can precompute
the ratios qx/qz and qy/qz, this partial projection costs only 2
multiplies and 5 adds.

The 2D coordinates xx and yy are not an orthogonal coordinate system,
but they cannot be dangerously close to singular either, since qz was
chosen to have the largest absolute value.  The above test to
determine exit face works just as well in any coordinate system,
orthogonal or otherwise.

This is all you need to track the ray topologically through the mesh
-- that is, to generate the list of cells c the ray crosses.  However,
you also need the points s where the ray crosses the faces.  Note that
unless the original mesh was tets, you only need to compute the point
s (or record the cell c) when you cross a triangle which lies on a
face of the original mesh -- the tetrahedral decomposition will
usually lead to many triangles interior to a real mesh cell.  The
calculation of s turns out to be very expensive relative to the rest
of the tracking.  The partially projected coordinate system (xx,yy,zz)
is handy for that task as well.  In those coordinates, the parametric
equation of the ray is simply (0,0,s*qz); the question is where this
line intersects triangle 012 (xx0,yy0,zz0), (xx1,yy1,zz1), etc.
Parametrically, the triangle is:

   xx = xx0 + u*(xx1-xx0) + v*(xx2-xx0)
   yy = yy0 + u*(yy1-yy0) + v*(yy2-yy0)
   zz = zz0 + u*(zz1-zz0) + v*(zz2-zz0)

where 0<u<1, 0<v<1, and 0<u+v<1.  Thus, (0,0,anything) is at

   area = (xx1-xx0)*(yy2-yy0) - (xx2-xx0)*(yy1-yy0)
   u = (xx2*yy0 - xx0*yy2)/area
   v = (xx0*yy1 - xx1*yy0)/area

The ray intersection point is at:

   s = (zz0 + u*(zz1-zz0) + v*(zz2-zz0))/qz

which comes at the cost of 1 reciprocal, 11 multiplies, and 11 adds
(assuming you precompute 1/qz).  Note that u and v are really
interpolation coefficients, which allow you to compute any linear
function defined on the nodes, not just z.  They cost 1 reciprocal, 8
multiplies, and 7 adds by themselves.


Hexahedral mesh
---------------

A hexahedral mesh may include cells which are hexahedra, or degenerate
hexes, including tetrahedra, pyramids, or triangular prisms.  As long
as the fraction of degenerate cells is fairly small, special handling
of degenerate cases would not significantly reduce tracking time.

I coded three tetrahedral decompositions of a hexahedral mesh:

(1) The 5 tet decomposition.  When all hex faces are planar, this
choice is best.  When there are non-planar faces, the 5 tet
decomposition is not unique; there is a switch to try both choices.
The decomposition is as follows: Draw either diagonal of one face; in
the four adjacent faces, draw the diagonals which share a corner with
the original diagonal; in the opposite face, draw the diagonal
connecting the two corners shared with the adjacent diagonals.  Those
six diagonals are the six edges of the "body tet"; each of the four
corners not on any of those diagonals is the apex of a "corner tet".

The ray always enters a corner tet.  It can exit the hex immediately
(2 chances in 3) or enter the body tet (1 chance in 3), from which it
always enters a second corner tet, from which it always exits the hex.

(2) The face-centered 24 tet decomposition.  This usually slightly
faster than the body-centered 24 tet decomposition.  Either of the 24
tet decompositions gives a unique result, even when the faces are
non-planar.  Begin by finding the centroid of each of the six faces,
and connecting that centroid to each of the four vertices, dividing
each face into four triangles.  (Note that this decomposition involves
6 new nodes, which are not nodes of the original mesh.)  Each of the
24 surface triangles shares one edge with a triangle on the adjacent
face; these are two sides of 12 "edge tets".  Each of the 8 corners of
the original hex, together with the centroids of the three adjacent
faces form 8 "corner tets".  That leaves an octahedron at the center
of the hex (whose 6 corners are the face centroids), which can be
divided into 4 "body tets" in three different ways, by connecting any
of the three pairs of opposite face centroids.

The ray always enters through an edge tet.  It can exit the hex
immediately (1 chance in 3), or enter either of the corner tets
associated with the endpoints of the edge.  From those corner tets,
the ray may move to a new edge tet (2 chances in 3), or into a body
tet (1 chance in 3).  The ray crosses the central octahedron cutting
one, two, or three body tets, and emerges into a new corner tet, from
which it always enters a new edge tet.  From this edge, it can either
exit the hex, or move to yet another corner.  I'm not sure how long
this can go on; however, any tet can be entered at most once, and the
central octahedron can be entered at most once.

(3) The body-centered 24 tet decomposition.  This decomposition
generalizes to arbitrary polyhedral cells; it is unique, but requires
in addition to the face centroids, a seventh point at the body center
-- the mean of all 8 corners of the hex.  As for the face-centered 24
tet decomposition, begin with the 24 triangles, each face quartered by
connecting its vertices to its centroid.  This time, take each of
these 24 triangles as the base of a tet whose apex is at the body
center.

The ray enters the hex, crossing to at least one other before exiting.
Initially, there is 1 chance in 3 of stepping to the adjacent face, 2
chances in 3 of remaining on the same face (but moving to one of the
other three tets on that face).  Thereafter, there is 1 chance in 3 of
exiting the hex, with either 2 chances in 3 of remaining on the same
face (if you just came from the adjacent face), or 1 chance in 3 of
each of remaining on the same face or of stepping to the adjacent
face.  Again, this can go on for quite a while before exiting.

When the ray steps from one hex cell to its neighbor, I fetch the
opposite face of the new hex and perform the partial projection
operation for all four new corners.  Even though up to three of the
new corner coordinates might be unnecessary to track the ray, I judged
that the bookkeeping cost of fetching corners only "on demand" would
usually exceed the benefit.  For the 24 tet decompositions, I compute
the five new face centroids and the new body center after the partial
projection operation (of course); here it certainly pays to compute
all five or six points at once, since the optimizer should find the
many common subexpressions.

In addition to all this, I coded a far faster algorithm applicable to
meshes formed by three orthogonal sets of parallel planes aligned with
the xyz axes.  See regul.h and the reg_track function in hex.i for
details.


Entering the mesh
-----------------

In a 2D mesh with NxN cells, the mesh boundary has of order 4N edges,
while a typical ray might cut 2N edges.  In the past, we have done an
exhaustive boundary search, checking whether each ray intersects each
boundary edge.  A typical 2D ray trace therefore spends about equal
time finding the entry points as it does tracking the rays.

If we continued the exhaustive boundary search in 3D ray tracking, it
would become grossly more expensive than 2D tracking: Now the mesh has
NxNxN cells, with 6N^2 boundary faces, while a typical ray might cut
3N interior faces.  An exhaustive boundary search now costs N times as
much as in 2D -- so it gets relatively more and more expensive the
finer your resolution -- while the tracking operation itself has a
comparable cost as 2D.  In my judgment, this entry expense is
intolerable.

The only solution is to assume the mesh boundary is convex.  (If not,
the best solution appears to be to add a few layers of cells to the
mesh boundary to make it convex -- although I don't see a way to
automatically extend a mesh in this way.  Or see the following
section.)  For a convex boundary, there is an algorithm to find an
entry point requiring only order N operations, so that in the worst
case, the entry search cost will be comparable to an exhaustive search
in 2D.  By remembering the entry point found for the previous ray, I
dramatically reduce the cost of finding the next entry point, if the
two rays are near each other.  (If you need to trace a random set of
rays, you can speed up the entry search by a huge factor if you can
somehow sort them first -- but this never gains you more than about a
factor of 2 in overall performance, since in the worst case, the entry
search time is comparable to the tracking time.)

Here's how it works:

Begin with the boundary triangle you entered last time, or with the
first finite area boundary triangle in the mesh if this is the first
ray.  Since the triangle has finite area, you can always find a point
inside the triangle (in its own plane) which does not lie on the ray.
That point and the ray together determine a plane E.  The E plane cuts
the convex mesh boundary in a convex curve (it must cut the boundary
somewhere, since by construction it contains one point of a boundary
face).  The idea is to circumnavigate this curve, until you either
come to the place where the ray enters it, or prove that the ray never
enters.  If the ray enters the E plane curve, it must also exit the
curve, so you need to distinguish between those two cases.

The partially projected coordinates are once again ideal for
performing the entry search.  Again, the zz coordinate is irrelevant.
The E plane is a line through the (xx,yy) origin (i.e.- containing the
ray).  The E plane is therefore

   ex*xx + ey*yy = 0

where (ex,ey) are the partially projected coordinates of the normal
to the E plane.  Notice that this 2D dot product in the non-orthogonal
partially projected system has the same value as the ordinary 3D dot
product in the orthogonal mesh coordinate system:

   ex*xx + ey*yy = ex*(x-z*qx/qz) + ey*(y-z*qy/qz)
                 = x*ex + y*ey + z*(-ex*qx-ey*qy)/qz
                 = x*ex + y*ey + z*ez

where the the normal to E in mesh coordinates is

   (ex,ey,ez) = (ex,ey,(-ex*qx-ey*qy)/qz)

Note that (ex,ey,ez) dot (qx,qy,qz) is zero -- the E plane contains
the ray.  If (fxx,fyy) are the partially projected coordinates of the
point in the initial triangle which does not lie on the ray, then

   (ex,ey) = (-fyy,fxx)

is in fact a normal to E, even though (xx,yy) are not orthogonal
coordinates: If the point is (fx,fy,fz) in mesh coordinates (with the
point p at the origin for simplicity), then the cross product with
the ray direction q is normal to E, so (qy*fz-qz*fy, qz*fx-qx*fz,
qx*fy-qy*fx) = qz*(-fyy, fxx, (fyy*qx-fxx*qy)/qz) is in fact the
normal to E sought.

Now, the mesh boundary faces can be triangulated, just as the mesh
itself can be decomposed into tets.  In partially projected (xx,yy)
coordinates, each vertex lies either to the right or left of the E
plane according to whether ex*xx+ey*yy is positive or negative.  The E
plane cuts exactly two of the three edges of a boundary triangle.  If
the nodes of the current edge are n0 and n1, and the third point of
the triangle is n2, and if d0 = ex*xx0+ey*yy0 >= 0 and d1 =
ex*xx1+ey*yy1 <= 0 (and not both zero).  You then compute d2 =
ex*xx2+ey*yy2.  If d2>0, move to the 21 edge, swapping n2 for n0.  If
d2<0, move to the 02 edge, swapping n2 for n1.  In the degenerate case
that d2=0, I keep the point with larger absolute value, discarding the
smaller.

This topological part of the entry search algorithm is again extremely
cheap -- 2 multiplies and 1 add to cross each triangle.  After crossing
the triangle, it costs considerably more to fetch the next n2 node and
partially project its coordinates.  (For hex meshes, I fetch both points
of the opposite edge of the new quad face at the same time.  I always
use a simple diagonal triangulation of the quad for the entry search,
even for the 24 tet decompositions.  In that case, once I've entered
a face through a diagonal triangle, I can just cross one or two tets
to figure out which of the four face triangles I really entered.)

At each step, you need to check to see whether you stepped over the
entry point.  Since ex*xx+ey*yy=0 on the E plane, xx and yy are
proportional, and it suffices to keep track of only one -- the one
with the larger absolute value, which corresponds to the smaller
absolute value of ex or ey.  Suppose the larger absolute value is xx;
it's value at the current edge intersection is xx = (xx1*d0-xx0*d1) /
(d0-d1) -- another divide, 2 multiplies, and 2 adds.

Basically, you just wait until the sign of the intersection xx
changes.  However, there are two catches: First, the boundary curve in
the E plane has two branches, one corresponding to ray entry, the
other to exit.  Second, the ray may miss the mesh entirely, and you
need to detect that in order to avoid orbiting in the E plane forever.
The idea for coping with these complications is to notice that you can
figure out once and for all whether increasing or decreasing xx
corresponds to the entry point you seek.  Once you've done that, the
algorithm to check whether to halt the entry search at the current
edge is:

   dxx = xx_this_edge - xx_last_edge
   does sign(dxx) match ray_entry_sign?
   if so,
      does sign(xx_this_edge) differ from sign(xx_last_edge)?
      if so,
         QUIT - ray enters current boundary triangle
      if not, (**)
         does sign(dxx) match sign(xx_this_edge)?
         if so,  QUIT - ray misses the mesh
         if not, set entry_side_flag
   if not, (**)
      is entry_side_flag set?
      if so,
         QUIT - ray misses the mesh

Actually, I had serious problems with the branches marked (**): those
decisions are too sharp, and can be fooled by roundoff errors even in
very regular meshes.  To solve that problem, I needed to compute a
"typical very small" scale length for the mesh; I skip the (**)
branches unless abs(dxx) exceeds the precomputed mesh scale.

The remaining rather nasty problem is to compute the initial values of
ray_entry_sign and entry_side_flag.  If da is the area element of the
initial triangle, pointing toward the interior of the mesh, then
entry_side_flag is set if da dot q is positive, but not set if
negative or zero.  Actually, I don't set entry_side_flag unless the
dxx across the initial triangle exceeds the mesh scale length just
mentioned.

The initial value for ray_entry_sign is computable from the sign of
fxx (or ey), the sign of qz, and a knowledge of the initial
orientation of the first boundary triangle.  Working out this sign was
excruciatingly difficult for me, and I can't really describe (or
certify) my algorithm, although it's only four short lines of sign
comparisons.  The underlying idea is to look at the sign of the
dot product of the area element da with the ray direction, but the
case in which that dot product is zero (or even the wrong sign owing
to roundoff) also must and can work cleanly and unambiguously.  The
only case that becomes problematic is when the ray lies in the plane
of the initial triangle -- and that really should be an unescapable
problem; it basically means the ray will miss the mesh for sure.


Alternative ray start strategy
------------------------------

If you only need to track rays as far as their first exit from the
mesh, there is a fast alternative strategy for initializing the ray
tracking, which does not rely on the convexity of the mesh boundary.
The idea is to forget about finding where the ray enters the mesh, and
instead assume that the given point on the ray lies interior to the
mesh.  If you specify some cell deep inside the mesh, you can track a
pseudo ray from any point interior to that given cell (say its
centroid) to the given point on the ray.  As long as the segment
connecting the two points lies entirely within the mesh, this will
succeed.  Now you know what cell contains the given point on the ray,
and you can track it from there as usual.

To initialize both the pseudo-ray tracking and the actual ray tracking
by this method, you need to find the point at which a ray containing a
known interior point of a cell enters that cell.  The easiest way to
do this is to exhaustively search all of the face triangles for the
cell, selecting the one that the ray cuts earliest (at smallest s).
For the 5-tet decomposition this means checking 12 triangles; for the
24-tet decompositions, there are 24 triangles to check.

This strategy only results in tracking the ray beyond its given point;
to track it the other direction, you can track the ray with the same
given point but opposite direction.


Reflecting boundaries
---------------------

Many or most 3D meshes include reflecting boundaries.  When a ray hits
a cell face at such a boundary, I reflect the ray from the plane of
the triangle it hit.  This will do weird things if the face of the hex
is not planar.  When the ray reflects, both p and q change, which
means the partial projection of all of the nodes must be redone.
Recall that the partial projection also involves a permutation of the
mesh coordinates to give qz the largest absolute value; that
permutation may also change.

All this complexity means that reflection is far costlier than just
cutting through an ordinary cell face.  I don't see any way to avoid
this, except by slowing down the tracker on ordinary faces to the
speed at reflecting faces (for example, by doing a full projection
rather than partial projection).  I did make the reflection function
recognize as special cases reflections from a plane normal to one of
the coordinate axes -- so if you can line up all or most of your
reflecting boundaries with the coordinate axes, I'll track through
your mesh much faster.

Reflecting boundaries create a more serious problem: the entry search
becomes even more conceptually difficult.  If you zone up, say, an
octant of a sphere with three reflecting planes, you want to be able
to shoot rays at your mesh that miss the octant you actually zoned,
but "enter" one of the seven reflected images.  The number of images
can be very large -- 120 in one popular spherical zoning scheme.  I
don't want to attempt a "gestalt function" that tries to look at such
a mesh and figure out automatically what all the images look like;
there mightn't even be a finite number (e.g.- one radian in phi).

Instead, I treat the entry search the same way as the main trace: As I
walk along the boundary curve in the E plane, when I encounter a
reflecting boundary, I reflect the ray.  I also need to reflect the E
plane.  As in the main trace, all the partially projected mesh
coordinates must be recomputed, but now the state information in the
halting algorithm, xx_last_edge and ray_entry_sign must also be
recomputed.  Pretty tedious, but it works in the end, even for the
meshes with 120 reflected images, without any gestalt programming.

But worse is yet to come: Remember that one of the key parts of the
entry search algorithm was to use the entry point found for the
previous ray as the starting point for the entry search for the
current ray.  If the previous ray reflected thirty times before it
entered the mesh, I have to somehow remember those thirty reflections
and apply them to the new ray before I can even begin its entry
search.

For each reflection, I already need to update both p and q for the
ray, and during the entry search, I need to update the E plane normal
e as well.  If I remember the initial values of p, q, and e, and if I
also keep track of whether the total number of reflections has been
even or odd, I have enough information to reconstruct the net result
of all the reflection transforms, without doing any additional
calculations for the reflections themselves (other than to flip the
even-odd counter).  Here's how: With q, e, and the even-odd counter, I
can reconstruct a set of three orthonormal basis vectors -- the third
is (-1)^odd*(q cross e) -- and see how that basis set has been rotated
by all the reflections.  Meanwhile, from the change in p, I can
discover the net translation due to all the reflections.

Of course, nothing is ever quite that simple.  Before I found e for
the previous ray, I had already transformed it according to the
reflections of the ray before it.  Fortunately, that previous
transformation matrix is still available, so I can rotate e back to
the original coordinate system for p and q by its transpose, before I
update the transformation matrix for the next ray.  One residual
problem with all these transformations is that roundoff errors in the
transformation matrix seem to accumulate with alarming speed.  So far,
this hasn't been a major issue -- I think it will be manageable as
long as we don't track more than a hundred thousand or so rays in a
single group.


Organizational notes
--------------------

Read the *.h files for the details of the functions I wrote.  The
functions defined in tools.h constitute (I hope) a generic toolkit
useful for tracking rays through any mesh which can be decomposed into
tets.  The functions in hex.h are for dealing specifically with
hexahedral meshes.

Right now, I assume the mesh consists of a single ix by jx by kx block
of cells.  Some minor additions will be necessary to handle meshes
which consist of several such blocks stuck together; the main problem
is designing an interface to describe how the blocks fit together.

Currently, each of the six logical boundaries of the mesh can be
treated as an actual mesh boundary surface, a reflecting surface (each
face of the surface is a planar reflector), or a periodic boundary
(assuming the other two logical coordinates remain fixed).


Profiling and timing
--------------------

On an HP735 (laura), I can track through about 1.e5 cells/sec for the
5 tet decomposition.  If the machine were getting 35 Mflops (close to
its Livermore loops geometric mean), that would indicate 350 floating
point operations to get across each cell.

On my Linux/Pentium, the number is 4.e4 cells/sec on a nominally 7.5
Mflop machine, indicating 175 floating point ops per cell crossing.

On a DEC alpha (fajita), the rate is 2.6e5 cells/sec, for a machine
nominally rated at perhaps 85 Mflops, for 330 floating ops per cell.

For this test, crossing a hex cell required on average crossing 2
tets, or 8 multiplies and 4 adds.  The partial projection for four new
nodes of each cell costs 8 multiplies and 40 adds, and computing the s
value at each cell exit is another 1 reciprocal, 11 multiplies and 11
adds.  Thus, the minimum possible work the algorithm I'm using could
cross a cell in is 1 reciprocal, 27 multiplies, and 55 adds -- perhaps
a little under 100 floating point operations, maybe four times fewer
than the estimate of 350.  So I'm bogged down with a fair amount of
bookkeeping; a factor of two faster ray tracing code doesn't seem
beyond the realm of possibility.  A factor of ten improvement,
however, would certainly require a smarter algorithm.

The speed of the 5, 24f (face centered), and 24b (body centered)
tracking routines are in the following ratio:

   reg : 5 : 24f : 24b  =  0.15 : 1.00 : 1.3 : 1.3

The reg is the special case regular mesh code mentioned above.  The
ratios vary quite a bit from problem to problem, but those numbers are
typical.

Also of interest is the "typical" number of tets crossed in crossing
each hex cell.  I have numbers for the big_mesh and degen_test timing
routines in test.i.  Here "image" means the rays where parallel in a
2D array, while "random" means random rays:

     rect image  sphere random  sphere image
 5:     2.0           2.3            2.3
24f:    3.4           4.0            3.9
24b:    3.0           4.1            4.1

The practical difference between the 24f and 24b seems to be very
slight by any measure.  Probably that means I should prefer the 24b
algorithm, since that generalizes to an arbitrary mesh.  The 24f
decomposition is still my favorite, though.

I've profiled the code on a Linux/Pentium only.  In all cases, the
hex_face routine dominates (with hex24_face often second), using about
1/5 of the total time.  The tet_traverse and tri_intersect routines
are usually next.  Even for random rays, the time for the entry search
seems to be only perhaps 1/10 of the total cost; for image rays, the
entry search cost is negligible.  Almost certainly it makes no sense
to try to beat down the entry search cost further (for example, by
reversing every other row of a 2D image array).