Specification and validator for BlobToolKit BlobDir format

BlobDir specification

1. Overview

BlobDir is the BlobToolKit file format, currently written by BlobTools2 and parsed by BlobTools2 and the BlobToolKit Viewer.

2. Directory structure

A BlobDir is a standard filesystem directory that may optionally be tarred and gzipped. It should be named to match the primary identifier for the dataset.
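As a rough sketch of how a reader might accept either form, the helper below returns a plain directory whether it is given a directory or a tarred and gzipped archive. The function name `open_blobdir` is hypothetical, and the layout inside the archive is an assumption: the sketch simply looks for the first meta.json it finds after extraction.

```python
import tarfile
import tempfile
from pathlib import Path


def open_blobdir(path: str) -> Path:
    """Return a directory containing meta.json for a directory or tar archive."""
    source = Path(path)
    if source.is_dir():
        return source
    if not tarfile.is_tarfile(source):
        raise ValueError(f"{source} is neither a directory nor a tar archive")
    target = Path(tempfile.mkdtemp(prefix="blobdir_"))
    # Mode "r:*" lets tarfile detect gzip (or no) compression by itself.
    with tarfile.open(source, mode="r:*") as archive:
        archive.extractall(target)
    # Assumption: the archive may or may not wrap the files in a top-level
    # directory, so locate meta.json instead of guessing the layout.
    matches = sorted(target.rglob("meta.json"))
    if not matches:
        raise ValueError(f"no meta.json found in {source}")
    return matches[0].parent
```

Only archives from a trusted source should be extracted this way, since the sketch does not inspect member paths before unpacking.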

2.1 Minimal example

A minimal BlobDir must contain at least two JSON format files: meta.json containing dataset metadata and identifiers.json containing the primary identifiers for the constituent records.

+- meta.json
+- identifiers.json
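
The minimal requirement can be checked with a few lines of Python. This is an illustrative sketch only; the function name `missing_required_files` is hypothetical, and it tests that the two files exist without checking their contents.

```python
from pathlib import Path

# The two files that section 2.1 says every BlobDir must contain.
REQUIRED_FILES = ("meta.json", "identifiers.json")


def missing_required_files(blobdir: str) -> list[str]:
    """Return the required file names that are absent from the directory."""
    root = Path(blobdir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory meets the minimal file requirement.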

2.2 Typical example

Each additional field in the dataset must have a corresponding JSON format file in the BlobDir. In a typical BlobDir these will contain information derived from the constituent sequences of a genome assembly (GC content, contig/scaffold/chromosome lengths and number of Ns), details of read-mapping, taxonomic inference based on sequence-similarity searches and BUSCO results.

+- meta.json
+- identifiers.json
+- gc.json
+- length.json
+- ncount.json
+- {LIBRARYNAME}_cov.json
+- {LIBRARYNAME}_read_cov.json
+- {TAXRULE}_{RANK}.json
+- {TAXRULE}_{RANK}_cindex.json
+- {TAXRULE}_{RANK}_positions.json
+- {TAXRULE}_{RANK}_score.json
+- {LINEAGE}_busco.json
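
The naming patterns above can be used to group the files in a BlobDir by kind. The sketch below is one possible approach, not part of the specification: `classify` and the kind labels are hypothetical names, and files of the form {TAXRULE}_{RANK}.json fall into "other" because they cannot be told apart from arbitrary fields by suffix alone.

```python
from pathlib import Path

# File names that appear literally in the typical example.
FIXED_NAMES = {"meta.json", "identifiers.json", "gc.json", "length.json", "ncount.json"}

# Order matters: "_read_cov.json" also ends with "_cov.json".
SUFFIX_KINDS = (
    ("_read_cov.json", "read_cov"),
    ("_cov.json", "cov"),
    ("_cindex.json", "cindex"),
    ("_positions.json", "positions"),
    ("_score.json", "score"),
    ("_busco.json", "busco"),
)


def classify(filename: str) -> tuple[str, str]:
    """Return (kind, prefix) for a BlobDir file name."""
    name = Path(filename).name
    if name in FIXED_NAMES:
        return ("fixed", name[: -len(".json")])
    for suffix, kind in SUFFIX_KINDS:
        if name.endswith(suffix):
            # The prefix is the library name, taxrule/rank pair or lineage.
            return (kind, name[: -len(suffix)])
    return ("other", Path(name).stem)
```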

2.3 Summary and descriptive files

A BlobDir may contain additional files that provide a human-readable summary and/or description of the dataset in TSV (summary.tsv) and Markdown formats.

+- meta.json
+- identifiers.json
+- summary.tsv

2.4 Required files

As noted in section 2.1, a valid BlobDir can be created with only two files, meta.json and identifiers.json. This allows a BlobDir to be generated iteratively, with new fields added as the data become available. Specific uses of a BlobDir may require additional files. For example, the BlobToolKit Viewer requires at least a gc.json, a length.json and a {TAXRULE}_{RANK}.json file to produce meaningful plots.
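
A use-specific requirement such as the Viewer's can be expressed as a separate check from the minimal one. In this sketch the function name `missing_viewer_files` is hypothetical, and the taxrule and rank are supplied by the caller because this section does not say how they are recorded.

```python
from pathlib import Path


def missing_viewer_files(blobdir: str, taxrule: str, rank: str) -> list[str]:
    """Return the files named in section 2.4 that the Viewer would lack."""
    # The third name is built from the {TAXRULE}_{RANK}.json pattern.
    needed = ["gc.json", "length.json", f"{taxrule}_{rank}.json"]
    root = Path(blobdir)
    return [name for name in needed if not (root / name).is_file()]
```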

3. Metadata

A meta.json file is required and must contain metadata describing the dataset and its constituent fields in JSON format.

3.1 File structure

The top level of the meta.json file is a JSON object with

X. Validator

X.1 JSON Schema validation

  • meta.json
    • convert meta to run schema validation on meta fields
  • field data validation
    • generate schema for array/multiarray
    • set range as max and min for values
    • set minItems and maxItems to get array length
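
The field data items above could be sketched as a small schema generator. This is an assumption-laden illustration rather than the validator's actual code: `array_schema` is a hypothetical name, values are assumed to be numeric, the range is assumed to be a (min, max) pair, and the expected array length is assumed to be the number of records.

```python
import json


def array_schema(value_range: tuple[float, float], length: int, multi: bool = False) -> dict:
    """Build a JSON Schema for an array (or array of arrays) of numbers."""
    low, high = value_range
    # Range becomes minimum/maximum on each numeric value.
    number = {"type": "number", "minimum": low, "maximum": high}
    # For a multiarray, each item is itself an array of numbers.
    items = {"type": "array", "items": number} if multi else number
    return {
        "type": "array",
        "items": items,
        # Equal minItems and maxItems pin the array to an exact length.
        "minItems": length,
        "maxItems": length,
    }


if __name__ == "__main__":
    print(json.dumps(array_schema((0, 1), 3), indent=2))
```

The resulting dictionary uses only standard JSON Schema keywords, so it could be passed to any JSON Schema validator, for example the third-party `jsonschema` package for Python.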

X.2 Additional validation

  • check data files match fields
  • check length of value arrays (minItems, maxItems)
  • check taxids/taxa are valid?
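
The first of these checks might look like the sketch below. It assumes each field identifier maps to a file named after it with a .json extension, in line with section 2.2. The name `compare_fields_to_files` is hypothetical, and the field identifiers are passed in by the caller because the way they are stored in meta.json is not described here.

```python
from pathlib import Path

# Files that every BlobDir has and that are not field data files.
CORE_FILES = {"meta.json", "identifiers.json"}


def compare_fields_to_files(blobdir: str, field_ids: list[str]) -> tuple[list[str], list[str]]:
    """Return (files missing for listed fields, data files with no listed field)."""
    root = Path(blobdir)
    on_disk = {path.name for path in root.glob("*.json")} - CORE_FILES
    expected = {f"{field_id}.json" for field_id in field_ids}
    return sorted(expected - on_disk), sorted(on_disk - expected)
```

Both lists are empty when the data files and the listed fields agree. Non-JSON files such as summary.tsv are ignored.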