DevGuide ExportFormat

Oliver Kennedy edited this page Jan 19, 2022 · 3 revisions

Vizier uses two distinct data representations: its standard catalog model, with entities defined in info.vizierdb.catalog._ and stored via JDBC, and a more portable export format, with entities defined in info.vizierdb.export._. This document details the latter. If you aren't already familiar with the catalog format, read the catalog format documentation first.

Export Version 1

Identifiers (Project-local vs Export-local)

The following text makes a distinction between project-local identifiers (identifiers of objects encoded in the catalog format) and export-local identifiers (identifiers of objects in the exported file).

Export identifiers are entirely local to the export file. An identifier can be any unique ASCII-encoded string. When JSON encoded, identifiers MUST be strings. Vizier's existing exporter re-uses the project-local identifiers for entities. All entities are assigned fresh identifiers on import.
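The remapping on import can be pictured with a small sketch. This is not Vizier's importer: the class name IdentifierMap and the integer counter are hypothetical, and it assumes one map is kept per entity kind (modules, branches, and so on) so that identical strings used by different kinds do not collide.

```python
import itertools


class IdentifierMap:
    """Maps export-local identifiers (strings) to fresh local identifiers.

    Hypothetical helper for illustration only; a real importer would obtain
    fresh identifiers from the catalog rather than from a counter.
    """

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._fresh = {}

    def fresh(self, export_id):
        # The export format requires identifiers to be JSON strings.
        if not isinstance(export_id, str):
            raise TypeError("export-local identifiers must be strings")
        # The same export-local identifier always maps to the same fresh one.
        if export_id not in self._fresh:
            self._fresh[export_id] = next(self._counter)
        return self._fresh[export_id]
```

Looking up the same export-local identifier twice returns the same fresh identifier, which keeps references between exported entities intact after import.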

Terminology: Cell, Module, and Command

Export format version 1 is intended to be backwards compatible with the Python implementation of Vizier. The Python version uses a slightly different catalog model, so some of the terminology differs. Most notably:

  • What Vizier-Python and the export format call a Module is what Vizier-Scala calls a Cell
  • What Vizier-Python and the export format call a Command is what Vizier-Scala calls a Module

The .vizier file format

A .vizier file is a tar-format archive compressed with gzip compression (commonly referred to as a tarball). tar -zxvf my_export.vizier should open the file. The contents of this archive are as follows:

  • version.txt: A text file containing exactly the text 1.
  • fs: A directory containing FILE-type artifacts. Each file's name is the export-local artifact identifier.
  • project.json: A text file containing a single JSON object with the schema described below.
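As a minimal sketch of reading this layout with Python's standard tarfile module: the function name load_export is hypothetical, and the code assumes that member names may carry a leading "./" and that version.txt may end with a trailing newline, neither of which is stated above.

```python
import json
import tarfile


def load_export(path):
    """Open a .vizier tarball and return (version, project, file_ids)."""
    with tarfile.open(path, mode="r:gz") as archive:
        # Index members by name, tolerating a leading "./" (an assumption).
        members = {}
        for member in archive.getmembers():
            name = member.name[2:] if member.name.startswith("./") else member.name
            members[name] = member

        raw_version = archive.extractfile(members["version.txt"]).read()
        version = raw_version.decode("ascii").strip()
        if version != "1":
            raise ValueError(f"unsupported export version: {version!r}")

        project = json.load(archive.extractfile(members["project.json"]))

        # Each regular file under fs/ is named by its export-local artifact id.
        file_ids = sorted(
            name[len("fs/"):]
            for name, member in members.items()
            if name.startswith("fs/") and member.isfile()
        )
    return version, project, file_ids
```

The returned file_ids can then be compared against the id fields of the files array in project.json.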

The project.json schema

The project.json file contains a single object with fields as follows:

  • properties: Array of Property Objects with schema:
    • key: The identifier of a property
    • value: The value of the identified property
  • defaultBranch: The export-local identifier of the Branch object that is the active branch.
  • files: Array of FileSummary Objects with schema:
    • id: The export-local identifier of a file. This corresponds to a file in the fs directory of the tarball.
    • name: The human-readable name of the file
    • mimetype: The MIME type of the file
  • modules: Dictionary of Module Objects. Note (as per the note above) that this is what Vizier-Scala calls a Cell. The key is the export-local identifier of the Module ("Cell" in vizier-scala). Module objects have schema:
    • id: The export-local identifier of the Module ("Cell" in vizier-scala)
    • state: One of the following integers (typically 4):
      • 0: PENDING
      • 1: RUNNING
      • 2: CANCELLED
      • 3: ERROR
      • 4: DONE
      • 5: FROZEN
    • command: A Command object with the following schema:
      • id: The export-local identifier of the Command ("Module" in vizier-scala). Multiple Modules ("Cells" in vizier-scala) may have the same Command ("Module" in vizier-scala). If multiple Command ("Module" in vizier-scala) objects appear with the same id, the remaining fields MUST be identical as well.
      • packageId: The name of the package.
      • commandId: The name of the command.
      • arguments: An array of Arguments objects that encode the module arguments. Arguments objects have schema:
        • id: The identifier of a command argument
        • value: The value of a command argument
      • revisionOfId: The export-local identifier of the Command ("Module" in vizier-scala) that this command was derived from in a prior revision of a workflow.
      • properties: Ignored by the importer.
    • text: A human-readable description of the command (For backwards compatibility. The current implementation of Vizier ignores this field)
    • timestamps: A Timestamps object with the following schema:
      • createdAt: An ISO-8601 formatted datetime string indicating the time the Module ("Cell" in vizier-scala) was originally created.
      • startedAt: An ISO-8601 formatted datetime string indicating the time the Module ("Cell" in vizier-scala) started execution or null if it has not been started yet.
      • finishedAt: An ISO-8601 formatted datetime string indicating the time the Module ("Cell" in vizier-scala) finished execution or null if it has not been started yet or if it is still running.
      • lastModifiedAt: An ISO-8601 formatted datetime string indicating the last time the cell was modified.
  • branches: Array of Branch Objects with the following schema:
    • id: The export-local identifier for the Branch object
    • createdAt: An ISO-8601 formatted datetime string indicating the time the branch was originally created.
    • lastModifiedAt: An ISO-8601 formatted datetime string indicating the last time the branch was modified.
    • sourceBranch: An optional export-local identifier for the Branch object from which this branch was derived. If defined, sourceWorkflow must also be defined.
    • sourceWorkflow: An optional export-local identifier for the Workflow object from which this branch's head was derived. If defined, sourceBranch must also be defined, and sourceWorkflow must be a workflow in the branch identified by sourceBranch.
    • isDefault: True if this branch is the project's defaultBranch
    • properties: Array of Property Objects with schema:
      • key: The identifier of a property
      • value: The value of the identified property
    • workflows: Array of Workflow objects with schema:
      • id: The export-local identifier for this workflow.
      • createdAt: An ISO-8601 formatted datetime string indicating the time the workflow was originally created.
      • action: A string identifying the action that created the workflow. Must be one of the strings:
        • create: This is the first workflow of a branch.
        • append: The workflow was derived by appending a module to a prior workflow.
        • delete: The workflow was derived by deleting a module from a prior workflow.
        • insert: The workflow was derived by inserting a module at a specified position in a prior workflow.
        • update: The workflow was derived by modifying a module from a prior workflow.
        • freeze: The workflow was derived by freezing one or more modules from a prior workflow.
      • packageId: If action is append, insert, or update, the package name of the command inserted.
      • commandId: If action is append, insert, or update, the command name of the command inserted.
      • actionModule: If action is append, insert, or update, the export-local module identifier of the affected module in the new workflow. If action is delete, the export-local identifier of the affected module in the prior workflow.
      • modules: Array of export-local Module ("Cell" in vizier-scala) identifiers. Each Module must be defined at $.modules.{identifier} in this file.
  • createdAt: An ISO-8601 formatted datetime string indicating the time the project was originally created.
  • lastModifiedAt: An ISO-8601 formatted datetime string indicating the last time the project was modified.
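
Several of the constraints in this schema are cross-references that are easy to check mechanically. The following is a sketch of such a check over a parsed project.json, not the importer's actual validation: the names ModuleState, ACTIONS, and check_project are hypothetical, and the example input uses made-up identifiers and package/command names.

```python
from enum import IntEnum


class ModuleState(IntEnum):
    PENDING = 0
    RUNNING = 1
    CANCELLED = 2
    ERROR = 3
    DONE = 4
    FROZEN = 5


ACTIONS = {"create", "append", "delete", "insert", "update", "freeze"}
VALID_STATES = {state.value for state in ModuleState}


def check_project(project):
    """Return a list of problems found in a parsed project.json (empty if none)."""
    problems = []
    modules = project.get("modules", {})
    branches = project.get("branches", [])
    # Map each branch id to the set of workflow ids it contains.
    workflows_by_branch = {
        b["id"]: {w["id"] for w in b.get("workflows", [])} for b in branches
    }

    if project.get("defaultBranch") not in workflows_by_branch:
        problems.append("defaultBranch does not identify a branch")

    commands = {}  # command id -> first definition seen
    for key, module in modules.items():
        if module.get("state") not in VALID_STATES:
            problems.append(f"module {key}: unknown state")
        command = module.get("command", {})
        # Commands that share an id must be identical in every other field.
        first = commands.setdefault(command.get("id"), command)
        if first != command:
            problems.append(f"command {command.get('id')}: conflicting definitions")

    for branch in branches:
        src_branch = branch.get("sourceBranch")
        src_workflow = branch.get("sourceWorkflow")
        # sourceBranch and sourceWorkflow must be defined together.
        if (src_branch is None) != (src_workflow is None):
            problems.append(f"branch {branch['id']}: incomplete source reference")
        elif src_branch is not None:
            if src_workflow not in workflows_by_branch.get(src_branch, set()):
                problems.append(f"branch {branch['id']}: sourceWorkflow not in sourceBranch")
        for workflow in branch.get("workflows", []):
            if workflow.get("action") not in ACTIONS:
                problems.append(f"workflow {workflow['id']}: unknown action")
            # Every listed module must be defined at $.modules.{identifier}.
            for module_id in workflow.get("modules", []):
                if module_id not in modules:
                    problems.append(f"workflow {workflow['id']}: undefined module {module_id}")
    return problems


# Illustrative input; every identifier and name below is made up.
example = {
    "defaultBranch": "b1",
    "modules": {
        "m1": {
            "id": "m1",
            "state": 4,
            "command": {"id": "c1", "packageId": "pkg", "commandId": "cmd", "arguments": []},
        }
    },
    "branches": [
        {"id": "b1", "workflows": [{"id": "w1", "action": "create", "modules": ["m1"]}]}
    ],
}
print(check_project(example))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a hand-edited export in a single pass.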