Building on D3 and Vega

Our initial proof-of-concept app is a user-friendly editor based on Vega, a declarative grammar for driving D3 visualizations. This gives us a readily editable format for describing an illustration's data, visual style, and a wide variety of editorial decisions and ornamental effects. The result of these edits can be saved as JSON and shared easily with a CMS or even a simple key/value store.

Data importers

Vega works by gathering all these choices into a specification (or "spec"), which then drives D3 to create a layout and render it to SVG (or Canvas). Naturally, this includes the dataset(s) to be illustrated. The first data source we want to support is NEXson from the Open Tree docstore, but we should anticipate wanting to use NEXML and other common formats. We'll need a way to translate each of these formats into a uniform data model, a JS object (perhaps called phylotree?) with the basic structure required by D3's hierarchical layouts:

The main JS object represents the topology and elements of a tree in a very general sense. It has properties that contain the expected fodder for tree visualization with D3:
- phyloNodes is an array of tree nodes, connected via these properties:
  - a parent property pointing back to the parent node;
  - a children array of its own child nodes (empty for leaf nodes); and
  - a depth property, reflecting its distance from the root (0-based, root node = 0).
- phyloEdges is an array of edges for these nodes, each having these properties:
  - a source property pointing to the parent node in phyloNodes; and
  - a target property pointing to the child node in phyloNodes.
The layout properties (x, y, etc.) of these nodes and edges are normalized to the range 0.0 to 1.0, so they can be projected to any size and coordinate space (e.g., a circular layout) in later steps.
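As a hypothetical illustration of why normalized coordinates are convenient, the two helpers below (names invented here, not part of the design) place the same node in a rectangular frame and in a circular one, treating x as an angle and y as a distance from the center.

```javascript
// Hypothetical helpers: project a node's normalized layout coordinates
// (x and y between 0.0 and 1.0) into a concrete drawing space.
function projectRectangular(node, width, height) {
  // x spreads across the width; y runs from the root (0) to the tips (1).
  return { px: node.x * width, py: node.y * height };
}

function projectCircular(node, radius, cx, cy) {
  // x becomes an angle around the center and y a distance from it, so the
  // root (y = 0) lands on the center point. Note that x = 0 and x = 1
  // coincide on a full circle.
  const angle = node.x * 2 * Math.PI;
  const r = node.y * radius;
  return { px: cx + r * Math.cos(angle), py: cy + r * Math.sin(angle) };
}

// Usage: the same tip node placed in two coordinate spaces.
const tip = { x: 0.25, y: 1.0 };
console.log(projectRectangular(tip, 800, 600)); // { px: 200, py: 600 }
console.log(projectCircular(tip, 100, 150, 150)); // about { px: 150, py: 250 }
```

Neither helper knows anything about tree structure, which is the point: the importer's output stays reusable for any final geometry.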
These nodes and edges may contain any number of additional properties, which will become available to the user as inputs to style or layout. General properties or metadata should be added to the root object (TBD).
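The model above can be made concrete with a small builder. This is a minimal sketch, assuming format-specific parsing (e.g. of NEXson) has already produced a flat list of node ids plus (source, target) id pairs, which is how NeXML-style formats describe tree topology. The names `buildPhylotree`, `id`, and `rootNode` are hypothetical.

```javascript
// Hypothetical importer back end: assumes a format-specific parser has
// already extracted node ids and [sourceId, targetId] edge pairs.
function buildPhylotree(nodeIds, edgePairs) {
  const byId = new Map();
  const phyloNodes = nodeIds.map(id => {
    const node = { id: id, parent: null, children: [], depth: 0 };
    byId.set(id, node);
    return node;
  });

  // Each edge links real node objects (not ids), and also fills in the
  // parent/children references on the nodes themselves.
  const phyloEdges = edgePairs.map(([sourceId, targetId]) => {
    const source = byId.get(sourceId);
    const target = byId.get(targetId);
    if (!source || !target) {
      throw new Error(`Edge refers to an unknown node: ${sourceId} -> ${targetId}`);
    }
    target.parent = source;
    source.children.push(target);
    return { source: source, target: target };
  });

  // The root is the only node without a parent; a breadth-first walk from
  // it sets depth to the number of edges from the root.
  const roots = phyloNodes.filter(node => node.parent === null);
  if (roots.length !== 1) {
    throw new Error(`Expected exactly one root, found ${roots.length}`);
  }
  const queue = [roots[0]];
  while (queue.length > 0) {
    const node = queue.shift();
    node.children.forEach(child => {
      child.depth = node.depth + 1;
      queue.push(child);
    });
  }

  return { rootNode: roots[0], phyloNodes: phyloNodes, phyloEdges: phyloEdges };
}

// Usage: a four-node tree, root -> (a, b), b -> c.
const tree = buildPhylotree(
  ["root", "a", "b", "c"],
  [["root", "a"], ["root", "b"], ["b", "c"]]
);
console.log(tree.phyloNodes.map(node => `${node.id}:${node.depth}`).join(" "));
// root:0 a:1 b:1 c:2
```

Extra per-node or per-edge properties from the source format could be copied onto these objects at the same point, matching the open-ended model described above.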

To take maximum advantage of Vega's dynamic loading and other features, these importers will ideally be implemented as transforms in the vg.data namespace, for example vg.data.nexson. The importer will use (or emulate) D3's cluster layout to generate these basic properties. (We formerly used the tree layout, but the cluster layout aligns all leaf nodes at the farthest margin, which seems appropriate in all cases.)
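If we emulate the cluster layout rather than call D3, the core behavior is small. The following is a hypothetical, simplified stand-in (not D3's code): it spaces leaves evenly along x, ignoring D3's separation option for sibling versus non-sibling gaps, puts each parent at the mean x of its children, and sets y from each node's height so every leaf lands at y = 1.

```javascript
// Hypothetical emulation of a cluster-style layout; not D3's implementation.
// Expects nodes shaped like the phylotree model (every node has a
// `children` array) and writes normalized x and y (0.0 to 1.0) onto them.
function clusterLayout(rootNode) {
  // Pass 1 (post-order): number the leaves left to right and record each
  // node's height, the count of edges down to its deepest descendant leaf.
  let leafCount = 0;
  function measure(node) {
    node.children.forEach(measure);
    if (node.children.length === 0) {
      node.leafIndex = leafCount++;
      node.height = 0;
    } else {
      node.height = 1 + Math.max(...node.children.map(child => child.height));
    }
  }
  measure(rootNode);

  // Guard against division by zero for one-leaf or one-node trees.
  const lastLeaf = Math.max(leafCount - 1, 1);
  const rootHeight = Math.max(rootNode.height, 1);

  // Pass 2 (post-order): leaves are spaced evenly along x and each parent
  // sits at the mean x of its children. y is 0 at the root and 1 at every
  // leaf, which pushes all leaves to the far margin.
  function place(node) {
    node.children.forEach(place);
    node.x = node.children.length === 0
      ? node.leafIndex / lastLeaf
      : node.children.reduce((sum, child) => sum + child.x, 0) / node.children.length;
    node.y = 1 - node.height / rootHeight;
  }
  place(rootNode);
  return rootNode;
}

// Usage: leaf A hangs directly off the root, yet it still lands at y = 1.
const leafA = { name: "A", children: [] };
const leafB = { name: "B", children: [] };
const leafC = { name: "C", children: [] };
const inner = { name: "BC", children: [leafB, leafC] };
const root = { name: "root", children: [leafA, inner] };
clusterLayout(root);
[root, inner, leafA, leafB, leafC].forEach(n => console.log(n.name, n.x, n.y));
// root 0.375 0
// BC 0.75 0.5
// A 0 1
// B 0.5 1
// C 1 1
```

A tree-style layout would instead derive y from depth, leaving A halfway out; the cluster behavior is what aligns all tips on one margin.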

This also lets us chain additional data transforms after the initial import, to incorporate supporting data from other files, groom the data for more complex layouts, etc.
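Chaining can be pictured with plain functions that each take the phylotree and return it. In Vega itself these steps would be listed in a dataset's transform array; how our custom transforms receive their parameters there is not shown. Everything below (`annotateNodes`, `runTransforms`, the `label` join key, and the sample rows) is hypothetical.

```javascript
// Hypothetical post-import transform: copy columns from supporting rows
// (e.g. parsed from a separate file) onto nodes with a matching key.
function annotateNodes(tree, rows, key) {
  // Index the supporting rows by their join key for constant-time lookup.
  const rowsByKey = new Map(rows.map(row => [row[key], row]));
  tree.phyloNodes.forEach(node => {
    const row = rowsByKey.get(node[key]);
    if (!row) return;
    // Copy every column except the key itself onto the matching node.
    Object.keys(row).forEach(field => {
      if (field !== key) node[field] = row[field];
    });
  });
  return tree; // mutated in place, then handed to the next step
}

// Apply transforms in order, feeding each one the previous result.
function runTransforms(data, steps) {
  return steps.reduce((current, step) => step(current), data);
}

// Usage: attach a highlight flag from a supporting table to one node.
const imported = {
  phyloNodes: [{ label: "taxonA", children: [] }, { label: "taxonB", children: [] }],
  phyloEdges: []
};
const supportingRows = [{ label: "taxonB", highlight: true }];
const groomed = runTransforms(imported, [
  tree => annotateNodes(tree, supportingRows, "label")
]);
console.log(groomed.phyloNodes.map(node => node.highlight)); // [ undefined, true ]
```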

Vega limitations and workarounds

Vega has certain quirks and biases that complicate what we're trying to do. It's easy to extend, but we might need to fork the Vega code to finish the job. So far, these workarounds are fairly straightforward, but if they continue to accumulate, we might abandon Vega in favor of another solution.

Bias toward array data, vs. objects and hierarchies

Many of Vega's transforms and other features assume that a named dataset is an array. Attempting to use them with a JS object (as in our common phylotree model above) will fail.

We overcome this with a new pluck data transform that lets us easily pull a named property from an object, for example the phyloEdges array from our generic phylotree. If the plucked value is an array, we can then proceed with Vega in the usual way.
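A minimal sketch of such a transform, assuming a D3-style closure with a chainable setter. The setter name `field` is hypothetical, and registering the function as vg.data.pluck (and how Vega would pass spec parameters to it) is not shown.

```javascript
// Hypothetical pluck transform in the D3-style "configurable closure"
// pattern: pluck() returns a function, and .field(name) sets the property.
function pluck() {
  let field = null;

  function transform(data) {
    // With no field configured, pass the input through unchanged. Otherwise
    // return the property itself (not a copy), so nodes and edges plucked
    // from the same phylotree keep sharing their objects.
    return field === null ? data : data[field];
  }

  transform.field = function (name) {
    field = name;
    return transform; // return the transform so calls can be chained
  };

  return transform;
}

// Usage: pluck nodes and edges from one shared phylotree object.
const parent = { id: "p", parent: null, children: [], depth: 0 };
const child = { id: "c", parent: parent, children: [], depth: 1 };
parent.children.push(child);
const phylotree = {
  phyloNodes: [parent, child],
  phyloEdges: [{ source: parent, target: child }]
};

const nodes = pluck().field("phyloNodes")(phylotree);
const edges = pluck().field("phyloEdges")(phylotree);
console.log(Array.isArray(edges), edges[0].target === nodes[1]); // true true
```

Because the plucked value is returned by reference, each edge's source and target still point at the very node objects in phyloNodes.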

Bias toward multiple, simple datasets (versus rich, re-usable data)

In the tree examples for Vega, it's common to import nodes and edges as two datasets from the same source file. In our setup, this would lead to lots of duplicated preprocessing of the tree structure. Another Vega idiom is to copy one dataset to another via the source definition, but this won't work for us either, since it requires a deep copy of the source dataset. Our phylotree is a complex network of objects that includes circular references (each node points to its parent, and each parent to its children), so this copy inevitably fails.
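A small illustration, independent of Vega (whose own copy code is not reproduced here): a JSON round trip stands in for a generic deep copy and throws a TypeError as soon as it meets the parent/child cycle. A naive recursive copy would instead recurse until the stack overflows.

```javascript
// Two nodes shaped like the phylotree model already form a cycle:
// child.parent points to parent, and parent.children contains child.
const parent = { name: "parent", parent: null, children: [] };
const child = { name: "child", parent: parent, children: [] };
parent.children.push(child);

// A common deep-copy idiom: serialize to a JSON string, then parse it back.
function jsonCopy(value) {
  return JSON.parse(JSON.stringify(value));
}

let copyError = null;
try {
  jsonCopy(parent);
} catch (err) {
  copyError = err; // JSON.stringify rejects cyclic structures
}
console.log(copyError instanceof TypeError); // true
```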

Fortunately, we avoid all of this by using the pluck transform above.

Lack of support for SVG transforms

We want the ability to freely scale and rotate elements of an illustration. Sadly, Vega exposes only the lowest common denominator of its two rendering backends, SVG and Canvas. As a result, transformations like these must be applied very early (too early) in the tree definition, which prevents us from importing in one step and then doing complex manipulations later.

SVG provides a perfect tool for this with its transform attribute, which we can assign to group marks to easily position, rotate, or distort the marks within. We can slightly modify the Vega core to pass these transforms through to the final rendering. NOTE that these effects won't appear in a Canvas rendering, but Canvas is useless for our purposes anyhow, since it's a raster display with no print capability. Modifying the core is painful, since this change would probably be unwelcome in the upstream Vega project, but it's worth it for the expanded flexibility it provides.
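A minimal sketch of the value such a pass-through would carry: a helper (hypothetical, as is the idea of a `transform` property on group marks) that composes the string for an SVG transform attribute. In SVG the rightmost operation in the list is applied to the group's contents first, so this order scales, then rotates, then translates.

```javascript
// Hypothetical helper: build an SVG transform attribute value for a group,
// omitting operations that would have no effect.
function svgTransform({ translate = [0, 0], rotate = 0, scale = 1 } = {}) {
  const parts = [];
  if (translate[0] !== 0 || translate[1] !== 0) {
    parts.push(`translate(${translate[0]},${translate[1]})`);
  }
  if (rotate !== 0) parts.push(`rotate(${rotate})`); // degrees, per SVG
  if (scale !== 1) parts.push(`scale(${scale})`);
  return parts.join(" ");
}

// Usage: shrink a subtree by half, turn it a quarter, and move it.
console.log(svgTransform({ translate: [200, 150], rotate: -90, scale: 0.5 }));
// translate(200,150) rotate(-90) scale(0.5)
```

With D3, the rendered `<g>` element could then receive this string through `selection.attr("transform", value)`.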

Migration to Vega v2

This took quite a bit of work, which is recorded in two pull requests (#11 and #12) from the 'vegap-2-updates' branch. See the commit history and messages in these for more details.

I've also added two pages to the vega wiki that discuss transforms, other v2 internals, and the new JS toolchain (node, npm, browserify, etc.) in more detail: