index.idl

[meta
  title:"The Barnes-Hut Approximation"
  description:"Efficient computation of N-body forces" /]

[header
  title:"The Barnes-Hut Approximation"
  subtitle:"Efficient computation of N-body forces"
  author:"Jeffrey Heer"
  authorLink:"http://jheer.org" /]

[var name:"charge" value:-30 /]
[var name:"step" value:0 /]
[var name:"accum" value:0 /]
[var name:"theta" value:0 /]
[var name:"layout" value:true /]
[var name:"estimate" value:false /]
[var name:"focus" value:false /]

[fixed]
  [BarnesHut
    size:step
    theta:theta
    charge:charge
    layout:`layout && !(estimate || focus)`
    estimate:`estimate || focus;`
    accumulate:accum
  /]
[/fixed]

Computers can serve as exciting tools for discovery, with which we can
model and explore complex phenomena. For example, to test theories about
the formation of the universe, we can perform _simulations_ to predict
how galaxies evolve. To do this, we could gather the estimated mass and
location of stars and then model their gravitational interactions over time.

Another avenue for discovery is to _visualize_ complex information to
reveal structure and patterns. Consider this network diagram, showing
connections between people in a social network. We can use such diagrams
to examine community groups and identify people who bridge between them.

Though they may seem quite different at first, these two examples share
a common need: they require computing forces that arise from pairwise
interactions among a set of points, often referred to as an _N-body problem_.
In the case of astronomical simulation, we seek to model the
gravitational forces among stars. In the case of network visualization,
we compute the layout using a similar physical simulation:
nodes in the network act as charged particles that repel each other, while
links act as springs that pull related nodes together.

_To get a sense of how this force-directed layout works, drag the nodes or
use the slider to adjust the force strength._ Negative values indicate
repulsive forces, while positive values indicate attractive forces.

[p]
  [em]Force Strength [/em]
  [Range min:-30 max:10 step:1 value:charge /]
  [Display value:charge /]
[/p]

A straightforward approach to computing N-body forces is to consider all
pairs of individual points and add up the contributions of each
interaction. This naïve scheme has _quadratic complexity_: as the number
of points [i]n[/i] increases, the running time grows proportionally
to [i]n[/i][sup]2[/sup], quickly leading to intractably long calculations.
How might we do better?

# The Barnes-Hut Approximation

To accelerate computation and make large-scale simulations possible, the
astronomers Josh Barnes and Piet Hut devised a clever scheme.
The key idea is to approximate long-range
forces by replacing a group of distant points with their center
of mass. In exchange for a small amount of error,
this scheme significantly speeds up calculation, with
complexity [i]n log n[/i] rather than [i]n[/i][sup]2[/sup].

Central to this approximation is a _spatial index_: a "map" of space
that helps us model groups of points as a single center of mass. In
two dimensions, we can use a [quadtree](https://en.wikipedia.org/wiki/Quadtree)
data structure, which recursively
subdivides square regions of space into four equal-sized quadrants. (In three dimensions, one can use an analogous [octree](https://en.wikipedia.org/wiki/Octree)  that divides a cubic volume into eight sub-cubes.)

The Barnes-Hut approximation involves three steps:

1. Construct the spatial index (e.g., quadtree)
2. Calculate centers of mass
3. Estimate forces

Let's explore each step in turn. We will assume we are computing _repulsive_
forces for the purposes of network layout. This setup is akin to modeling
anti-gravity or electric forces with similarly-charged particles. While we
will use the term "center of mass", this could readily
be replaced with "center of charge".

[em]As you read through, click the [/em]
[action onClick:`alert('👍 🎉');`]action links[/action]
[em] to update the diagram![/em]


// Quadtree Construction

## Step 1: Construct the Quadtree

We begin with [action onClick:`layout=false; step=0;`]a set of
two-dimensional points as input[/action].
When we [action onClick:`layout=false; step=1;`]insert the first point into
the quadtree[/action], it is added to the top-level root cell of the tree.

[action onClick:`layout=false; step=2;`]Next we insert another point[/action],
which requires expanding the tree by subdiving the space.
Upon [action onClick:`layout=false; step=Math.min(step+1, 77);`]each subsequent
insertion[/action], more fine-grained cells may be added until all points
reside in their own cell.

_Advance the slider to add each point and produce the full quadtree_.

[p]
  [em]Inserted Points [/em]
  [Range min:0 max:77 step:1 value:step /]
  [Display value:`step;` format:"d" /]
[/p]


// Accumulate

## Step 2: Calculate Centers of Mass

After quadtree construction, [action
  onClick:`layout=false; accum=(accum+1)%2;`
] we calculate centers of mass for each cell of the tree[/action].
The center of mass of a quadtree cell is simply the weighted
average of the centers of its four child cells.

We visit the leaf node cells first and then visit subsequent parent
cells, merging data as we pass upwards through the tree.
Once the traversal completes, each cell has been updated with the
position and strength of its center of mass.


// Force Estimation

## Step 3: Estimate N-Body Forces

Now we are ready to estimate forces!

[action onClick:`theta=0; estimate=true;`]To measure forces at a given
point, let's add a "probe" to our diagram[/action] ([probe/]). The purple
line extending from the probe indicates the direction and magnitude of
the total force at that location. (To promote visibility,
the purple line is three times longer than the actual
pixel distance the probe would be moved in a single timestep of
the force simulation.) The dotted lines extending to the probe
represent the force components exerted by individual points.

_Move the probe (click or
drag in the visualization) to explore the force field_.

Ignoring the quadtree, we can naïvely calculate forces by summing the
contributions of _all_ individual points. Of course, we would instead like
to use the quadtree to accelerate calculation and approximate long-range
forces. Rather than compute interactions among individual
points, [action onClick:`theta=1; estimate=true;`]we can compute interactions
with centers of mass, using smaller quadtree cells for nearer points and
larger cells for more distant points.[/action]

At this point we've skipped a critical detail: what constitutes
"long-range" versus "short-range" forces? We consider both
the _distance_ to the center of a quadtree cell and that cell's _width_.
If the ratio _width / distance_ falls below a chosen threshold
(a parameter named _theta_), we treat the quadtree cell as a
source of long-range forces and use its center of mass.
Otherwise, we will recursively visit the child cells in the quadtree.

When theta = 1, a quadtree cell's center of mass will be used — and its
internal points ignored — if the distance from the sample point to the
cell's center is greater than or equal to the cell's width.

_Adjust the theta parameter to view its effect on force estimation_. How does
the number of considered points change based on the probe location and theta?
How does the direction and magnitude of the total force vary with theta?

[p]
  [em]Theta [/em]
  [Range min:0 max:2 step:0.1 value:theta /]
  [Display value:theta /]
[/p]

We can now perform force estimation for each individual point, using the
Barnes-Hut approximation to limit the total number of comparisons!

# Performance Analysis

To assess the performance of the Barnes-Hut approximation, let's look at
both the running time and accuracy of force estimation. We will compare
naïve ([i]n[/i][sup]2[/sup]) calculation to different settings of the theta
parameter.

We will take measurements using different point sets,
ranging from 500 to 10,000 points. For each point count,
we average the results from 50 separate runs of force estimation,
each time using a different set of points placed at uniformly
random coordinates within a 900 x 500 pixel rectangle.

[TimePlot focus:focus theta:theta /]

The running time results confirm that the Barnes-Hut
approximation can significantly speed-up computation.
As expected, the
[Theta value:0 theta:theta focus:focus /]
exhibits a quadratic relationship, whereas increasing the theta parameter
leads to faster calculations. A low setting of
[Theta value:0.5 theta:theta focus:focus /]
does not fare better than the naïve
approach until processing about 6,000 points. Until that point,
the overhead of quadtree construction and center of mass
calculation outstrips any gains in force
estimation. For
[Theta value:1 theta:theta focus:focus /] and
[Theta value:1.5 theta:theta focus:focus /],
however, we see a
significant improvement in running time, with similar
performance for each.

To evaluate approximation error, we measure the average vector
distance between the results of the naïve scheme and Barnes-Hut.
In the context of a force-directed graph layout, this error
represents the difference (in pixels) between node positions after
applying the naïve and approximate methods.

[ErrorPlot focus:focus theta:theta /]

Looking at the error results, we first see that the average
error is relatively small: only ~5% of a single pixel in
difference! However, we should take care in interpreting these
results, as we use the _average_ error per point and
the _maximum_ error may be substantially higher. While
[Theta value:1 theta:theta focus:focus /] and
[Theta value:1.5 theta:theta focus:focus /] exhibit similar _running times_,
here we see notably higher _error rates_ for
[Theta value:1.5 theta:theta focus:focus /]
versus
[Theta value:1 theta:theta focus:focus /] and
[Theta value:0.5 theta:theta focus:focus /].

These results suggest that a good default value for theta
— with low running time _and_ low approximation error
— is around 1.0. Indeed, in practice it is common to see default
settings slightly below 1. In
visualization applications, where errors on the order of a few
pixels are not a problem, even higher theta values may be used
without issue.

# Conclusion

The Barnes-Hut approximation has had a major impact on both physical
simulation and network visualization, enabling n-body calculations
to scale to much larger data sets than naïve force calculation permits.

[action onClick:`step=77; estimate=false, layout=true;`]Returning to our initial
network diagram[/action], we can use Barnes-Hut to efficiently compute
repulsive forces at each timestep. For each animation frame, we perform the
approximation anew, creating a new quadtree, accumulating centers of mass,
and (approximately) estimating forces.

Want to learn more or apply Barnes-Hut in your own work?

* The popular [D3.js](https://d3js.org) library by Mike Bostock includes Barnes-Hut for performing force-directed layout as part of the [d3-force module](https://github.com/d3/d3-force). The examples in this article are based on the D3 implementation.
* For more details, including pseudo-code and performance analysis, Prof. James Demmel of UC Berkeley has [online lecture notes on the Barnes-Hut approximation](https://people.eecs.berkeley.edu/~demmel/cs267/lecture26/lecture26.html). These notes provided my first introduction to the technique!
* While the error of the Barnes-Hut approximation is acceptable in a variety of applications, sometimes greater precision is needed. In such cases, more complicated schemes such as the [Fast Multipole Method](https://en.wikipedia.org/wiki/Fast_multipole_method) can be used in lieu of Barnes-Hut.

This article was created using [Idyll](http://idyll-lang.org/), with
visualizations powered by [D3](https://d3js.org/)
and [Vega](https://vega.github.io/vega/).
The [source code](https://github.com/jheer/barnes-hut) is available on GitHub.