Building on D3 and Vega
Our initial proof-of-concept app is a user-friendly editor based on Vega, a declarative grammar for driving D3 visualizations. This gives us a readily editable format for describing an illustration's data, visual style, and a wide variety of editorial decisions and ornamental effects. The result of these edits can be saved as JSON and shared easily with a CMS or even a simple key/value store.
Vega works by gathering all these choices into a spec or specification, which then drives D3 to create a layout and render it to SVG (or Canvas). Naturally, this includes the dataset(s) to be illustrated. The first data source we want to support is NEXson from the Open Tree docstore, but we should anticipate wanting to use NEXML and other common formats. We'll need a way to translate each of these formats into a uniform data model, a JS object (perhaps called phylotree?) with the basic structure required by D3's hierarchical layouts:
- The main JS object represents the topology and elements of a tree in a very general sense. It has properties that contain the expected fodder for tree visualization with D3:
  - phyloNodes is an array of tree nodes, connected via these properties:
    - a parent property pointing back to the parent node;
    - a children array for its own child nodes (or this can be empty); and
    - a depth property, reflecting its distance from the root (0-based, root node = 0).
  - phyloEdges is an array of edges for these nodes, each having these properties:
    - a source property pointing to the parent node in phyloNodes
    - a target property pointing to the child node
- The layout properties (x, y, etc) of these nodes and edges are expressed on a continuum from 0.0 to 1.0, so they can be projected to any size and coordinate space (eg, a circular layout) in later steps.
- These nodes and edges may contain any number of additional properties, which will become available to the user as inputs to style or layout. General properties or metadata should be added to the root object (TBD).
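As a rough illustration of the structure described above, here is a minimal sketch that builds such an object from a nested tree. The function name buildPhylotree, the name property, and the nested input shape are assumptions made for this example only; a real importer would start from NEXson or NEXML instead.

```javascript
// Hypothetical helper (not part of Vega or D3): build a phylotree-style
// object from a nested { name, children } structure.
function buildPhylotree(nested) {
  const phyloNodes = [];
  const phyloEdges = [];

  function visit(raw, parent, depth) {
    // Each node links back to its parent and forward to its children.
    const node = { name: raw.name, parent: parent, children: [], depth: depth };
    phyloNodes.push(node);
    if (parent) {
      parent.children.push(node);
      phyloEdges.push({ source: parent, target: node });
    }
    (raw.children || []).forEach(function (child) {
      visit(child, node, depth + 1);
    });
  }

  visit(nested, null, 0); // the root node has depth 0 and no parent
  return { phyloNodes: phyloNodes, phyloEdges: phyloEdges };
}

const phylotree = buildPhylotree({
  name: 'root',
  children: [
    { name: 'A' },
    { name: 'B', children: [{ name: 'B1' }, { name: 'B2' }] }
  ]
});

console.log(phylotree.phyloNodes.length); // 5
console.log(phylotree.phyloEdges.length); // 4
```

Note that the edges hold references to the same node objects that appear in phyloNodes, so any property added to a node is visible from both arrays.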
To take maximum advantage of Vega's dynamic loading and other features, these importers will ideally be implemented as transforms in the vg.data namespace, for example vg.data.nexson. The importer will use (or emulate) D3's cluster layout to generate these basic properties. (We formerly used the tree layout, but cluster moves leaf nodes to the farthest margin, and this seems appropriate in all cases.)
This also lets us chain additional data transforms after the initial import, to incorporate supporting data from other files, groom the data for more complex layouts, etc.
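The two ideas above (an importer that emulates a cluster layout, followed by further transforms) can be sketched in plain JavaScript. This is a simplified emulation under stated assumptions, not D3's cluster code and not the Vega transform interface; the names clusterLayout, flatten, projectRadial, and chain are hypothetical.

```javascript
// Simplified emulation of a cluster (dendrogram) layout; not D3's own code.
// Sets x and y in the range 0.0-1.0 on every node and returns the root.
function clusterLayout(root) {
  let leafCount = 0;

  // Post-order pass: leaves take consecutive slots; an internal node sits
  // at the mean slot of its children, one level above its tallest child.
  function place(node) {
    const kids = node.children || [];
    if (kids.length === 0) {
      node.slot = leafCount++;
      node.height = 0;
      return;
    }
    kids.forEach(place);
    node.slot = kids.reduce((sum, k) => sum + k.slot, 0) / kids.length;
    node.height = 1 + Math.max(...kids.map(k => k.height));
  }
  place(root);

  const lastSlot = Math.max(leafCount - 1, 1); // avoid dividing by zero
  const maxHeight = Math.max(root.height, 1);

  function normalize(node) {
    node.x = node.slot / lastSlot;
    node.y = 1 - node.height / maxHeight; // every leaf lands on y = 1
    (node.children || []).forEach(normalize);
  }
  normalize(root);
  return root;
}

// Later transform: flatten the tree into an array of nodes (pre-order).
function flatten(root) {
  const nodes = [];
  (function visit(node) {
    nodes.push(node);
    (node.children || []).forEach(visit);
  })(root);
  return nodes;
}

// Later transform: project normalized coordinates onto a circle,
// treating x as the angle and y as the distance from the center.
function projectRadial(cx, cy, radius) {
  return function (nodes) {
    nodes.forEach(function (n) {
      const angle = n.x * 2 * Math.PI;
      n.px = cx + n.y * radius * Math.cos(angle);
      n.py = cy + n.y * radius * Math.sin(angle);
    });
    return nodes;
  };
}

// Run a list of transforms in order, feeding each result to the next.
function chain(...transforms) {
  return data => transforms.reduce((current, t) => t(current), data);
}

const tree = {
  name: 'root',
  children: [
    { name: 'A' },
    { name: 'B', children: [{ name: 'B1' }, { name: 'B2' }] }
  ]
};

const nodes = chain(clusterLayout, flatten, projectRadial(100, 100, 50))(tree);
const a = nodes.find(n => n.name === 'A');
console.log(a.x, a.y, a.px, a.py); // 0 1 150 100
```

Leaf A sits only one level below the root, yet it ends up at y = 1 alongside the deeper leaves, which is the cluster behavior described above. In this sketch the first and last leaves map to the same angle; a real circular layout would likely reserve a gap between them.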
Vega has certain quirks and biases that complicate what we're trying to do. It's easy to extend, but we might need to fork the Vega code to finish the job. So far, these workarounds are fairly straightforward, but if they continue to accumulate, we might abandon Vega in favor of another solution.
Many of Vega's transforms and other features assume that a named dataset is an array. Attempting to use them with a JS object (as in our common phylotree model above) will fail.
We overcome this with a new pluck data transform that lets us easily pull a named property from an object, for example the phyloEdges array from our generic phylotree. If the plucked value is an array, we can then proceed with Vega in the usual way.
In the tree examples for Vega, it's common to import nodes and edges as two datasets from the same source file. In our setup, this would lead to lots of duplicated preprocessing of tree structure. Another Vega idiom is to copy one dataset to another via the source definition, but this won't work for us either, since it requires a deep copy of the source dataset. Our phylotree is a complex network of objects that includes circular references, so the copy inevitably fails.
Fortunately, we avoid all of this by using the pluck transform above.
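The circular-reference problem is easy to reproduce. The sketch below uses a naive JSON round trip as a stand-in for a deep copy; Vega's own copy routine may work differently, so treat this as an illustration of the general failure rather than of Vega's code.

```javascript
// Two nodes that point at each other: root -> leaf (children) and
// leaf -> root (parent). This is the cycle that defeats a naive deep copy.
const root = { name: 'root', parent: null, children: [] };
const leaf = { name: 'A', parent: root, children: [] };
root.children.push(leaf);

let copyError = null;
try {
  // Stand-in for a deep copy; JSON.stringify rejects circular structures.
  JSON.parse(JSON.stringify(root));
} catch (err) {
  copyError = err;
}
console.log(copyError instanceof TypeError); // true

// Sharing references instead of copying sidesteps the problem entirely.
const phylotree = {
  phyloNodes: [root, leaf],
  phyloEdges: [{ source: root, target: leaf }]
};
const edges = phylotree.phyloEdges;
console.log(edges[0].source === root); // true
```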
We want the ability to freely scale and rotate elements of an illustration. Sadly, Vega exposes only the lowest common denominator of its two rendering backends, SVG and Canvas. As a result, transformations like these need to be done very early (too early) in the tree definition, which doesn't give us the flexibility to import in one step, then do complex manipulations later.
SVG provides a perfect tool for this with its transform attribute, which we can assign to group marks to easily position, rotate, or distort the marks within. We can slightly modify the Vega core to pass these transforms through to the final rendering. NOTE that these effects won't appear in a Canvas rendering, but Canvas is useless for our purposes anyhow, since it's a raster display with no print capability. Painful, since this change would probably be unwelcome in the upstream Vega project, but worth it for the expanded flexibility it provides.
This took quite a bit of work, which is recorded in two pull requests (#11 and #12) from the 'vegap-2-updates' branch. See the commit history and messages in these for more details.
I've also added two pages to the vega wiki that discuss transforms, other v2 internals, and the new JS toolchain (node, npm, browserify, etc) in more detail: