
Stored illustrations

Jim Allman edited this page Jan 19, 2017 · 7 revisions

While the core of an illustration is a single JSON file, a fully reproducible and standalone illustration also needs to store its source data (for trees, annotations, style, etc.) and perhaps binary assets as well. An illustration is therefore a "folderish" document type that contains nested files and subfolders (or equivalent containers). This notion is used in content management systems (CMS) like Plone and Nuxeo.

Illustrations in storage

This folderish structure can manifest differently in storage (as a git subfolder, or by using path-like keys in a key/value store) or in transit (as an obscured ZIP archive or multi-part file). Anything goes, as long as the internal structure is preserved and each sub-resource (file) can be referenced using a "relative path".
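As a rough sketch of how these forms can be treated as equivalent, the Python below holds an illustration as a flat mapping from relative paths to bytes (the key/value form) and round-trips it through an in-memory ZIP archive (the transit form). The function names `pack_illustration` and `unpack_illustration` are hypothetical, and the file contents are placeholders rather than real Tree Illustrator data.

```python
import io
import zipfile


def pack_illustration(resources: dict[str, bytes]) -> bytes:
    """Write a path-keyed mapping into a ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for relative_path in sorted(resources):
            # Each key is stored as-is, so "inputs/x.tre" becomes a nested entry.
            archive.writestr(relative_path, resources[relative_path])
    return buffer.getvalue()


def unpack_illustration(zip_bytes: bytes) -> dict[str, bytes]:
    """Read a ZIP archive back into a mapping of relative path -> bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


# Placeholder content only; the real main.json layout is not specified here.
resources = {
    "main.json": b"{}",
    "inputs/pasted-tree-1.tre": b"((A,B),C);",
}
restored = unpack_illustration(pack_illustration(resources))
```

Because every sub-resource is addressed only by its relative path, the same mapping could just as well be written out as a git subfolder or stored under path-like keys.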

The current (Jan 2017) proposed structure looks something like this. A minimal illustration (with all resources linked via URL) might be simply:

my-illustration/
  main.json

...but even the introduction of a pasted Newick string would necessitate the creation of a second file, perhaps in an inputs subfolder:

my-illustration/
  main.json
  inputs/
    pasted-tree-1.tre

Once rendered in the browser, we might store the latest SVG output as well:

my-illustration/
  main.json
  main.svg
  inputs/
    pasted-tree-1.tre

As an illustration becomes more complex, incorporating multiple trees and annotations, the number of sub-resources increases and conventional paths start to seem like a good idea:

my-illustration/
  README.md
  main.json
  main.svg
  main.pdf
  inputs/
    full-bird-tree.tre
    comparison-trees.nex
    bootstrap-vals-3.csv
    support.csv
  transformed/
    full-bird-tree.to_nexson.json
  images/
    inst-logo.png
  fonts/
    helvetica-bold.ttf
  style/
    main.css
    nature-style.css

In this example, we've stored all input data locally, with source/provenance noted in main.json. We might also choose to save intermediate results of Vega transforms, shown here in the transformed subfolder. Fonts and styles (normally drawn from shared resources) can also be included if we want a truly standalone document for reproducibility.

The result is a document that can be easily unpacked and analyzed with command-line (text-oriented) tools, or simply re-opened and explored using the Tree Illustrator app. For the most reliable reproduction, the app itself can be loaded in the original version used for this illustration. Or we can use the latest version to take advantage of new features, rendering improvements, etc.
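To illustrate the kind of text-oriented analysis this layout allows, here is a minimal sketch that walks an unpacked illustration folder and counts files per top-level subfolder. The function name `summarize_illustration` is hypothetical, and the script assumes only that the illustration has been unpacked to an ordinary directory.

```python
from collections import Counter
from pathlib import Path


def summarize_illustration(root: Path) -> Counter:
    """Count files per top-level subfolder; top-level files count under '.'."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            parts = path.relative_to(root).parts
            # A single-part path (e.g. main.json) sits directly in the root.
            counts[parts[0] if len(parts) > 1 else "."] += 1
    return counts
```

Calling `summarize_illustration(Path("my-illustration"))` on the larger example above would report its files grouped under `inputs`, `transformed`, `images`, `fonts`, `style`, and `.` for the top level.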

Illustrations in use

In the client-side editor, we need to manage this structure using a client-side cache, for a couple of reasons:

  • We refresh our SVG view using the complete Vega rendering pipeline. For best performance, this should be done using cached data, esp. taking advantage of intermediate products (post-transform) where possible.

  • The user should be able to construct a new illustration entirely in the browser, without the chicken-and-egg problem common in content management systems: When adding annotations, etc. in a new illustration (i.e. the user hasn't yet saved the document), there's nowhere on the server to hold this data.

There are a few storage APIs that are widely supported in modern browsers (e.g. File, IndexedDB). Which is the most appropriate for our client-side cache? Ideally it would fulfill these requirements:

  • Generous storage capacity to hold large datasets. (Can we put a number to this?) Browser localStorage is typically limited to about 5 MB per origin.

  • Provision for storing binary as well as text data.

  • It does not need to preserve data across sessions, if we can save/load ZIP files instead. This seems preferable, since users can switch browsers, email the illustration to chosen collaborators, etc.

  • Organization of stored resources by "relative paths". Note that this might be done through nested containers or simply by using unique strings ("images/mammal.png") as keys in a key-value store.
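The path and binary/text requirements above can be sketched independently of any particular browser API. The Python below is a minimal in-memory model, not a proposal for the actual cache: the class `ResourceCache` and its method names are hypothetical, and a real client-side version would sit on top of whichever browser storage API is chosen.

```python
import posixpath
from typing import Union


class ResourceCache:
    """In-memory store keyed by normalized relative paths (hypothetical)."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        clean = posixpath.normpath(path)
        # Reject absolute paths and anything that escapes the illustration root.
        if clean in (".", "..") or clean.startswith(("/", "../")):
            raise ValueError(f"not a relative path inside the illustration: {path!r}")
        return clean

    def put(self, path: str, data: Union[bytes, str]) -> None:
        # Text is stored as UTF-8 bytes so binary and text share one code path.
        if isinstance(data, str):
            data = data.encode("utf-8")
        self._items[self._normalize(path)] = data

    def get_bytes(self, path: str) -> bytes:
        return self._items[self._normalize(path)]

    def get_text(self, path: str) -> str:
        return self.get_bytes(path).decode("utf-8")

    def list_paths(self, folder: str = "") -> list[str]:
        # "Folders" are just shared key prefixes in a flat key-value store.
        prefix = "" if folder == "" else self._normalize(folder) + "/"
        return sorted(p for p in self._items if p.startswith(prefix))


cache = ResourceCache()
cache.put("inputs/pasted-tree-1.tre", "((A,B),C);")
cache.put("images/./mammal.png", b"\x89PNG\r\n\x1a\n")  # placeholder bytes, not a full PNG
```

Normalizing keys on the way in means "images/./mammal.png" and "images/mammal.png" refer to the same resource, which keeps relative references from main.json unambiguous.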

In any case, the most important requirement is that the user can add/update data by "uploading" local files into the browser. The resulting dataset is recorded in the illustration's main JSON file, including user-submitted details about origin and provenance. If/when the user saves the illustration, any added or updated resources will be updated in storage.
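One way to model this upload-then-save flow is to track which paths have changed since the last save. The sketch below is only an assumption about how that might work: the class `IllustrationSession`, the `inputs` list, and the `path`/`origin` fields are all hypothetical and do not reflect the actual main.json schema.

```python
import json


class IllustrationSession:
    """Tracks uploaded resources and which ones still need saving (hypothetical)."""

    def __init__(self) -> None:
        self.main = {"inputs": []}  # hypothetical stand-in for main.json
        self.cache: dict[str, bytes] = {}
        self.dirty: set[str] = set()

    def upload(self, path: str, data: bytes, origin: str) -> None:
        # Cache the file and record its provenance in the main JSON document.
        self.cache[path] = data
        entries = [e for e in self.main["inputs"] if e["path"] != path]
        entries.append({"path": path, "origin": origin})
        self.main["inputs"] = entries
        self.dirty.update({path, "main.json"})

    def save(self, storage: dict[str, bytes]) -> list[str]:
        # Serialize main.json, then write only added or updated resources.
        self.cache["main.json"] = json.dumps(self.main, indent=2).encode("utf-8")
        written = sorted(self.dirty)
        for path in written:
            storage[path] = self.cache[path]
        self.dirty.clear()
        return written


session = IllustrationSession()
storage: dict[str, bytes] = {}  # stands in for server-side or ZIP storage
session.upload("inputs/support.csv", b"node,support\n", "uploaded from local disk")
first_save = session.save(storage)
second_save = session.save(storage)
```

Here the first save writes both the uploaded file and main.json, while the second writes nothing because no resource has changed in between.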