Specification and validator for BlobToolKit BlobDir format
example/FXWY01
schema
README.md
validate.py


BlobDir specification

1. Overview

BlobDir is the BlobToolKit file format, currently written by BlobTools2 and parsed by BlobTools2 and the BlobToolKit Viewer.

2. Directory structure

A BlobDir is a standard filesystem directory that may optionally be tarred and gzipped. The directory should be named to match the primary identifier for the dataset.
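Packing and unpacking can be done with Python's standard `tarfile` module. The sketch below is illustrative only: `pack_blobdir` and `unpack_blobdir` are hypothetical helper names, and naming the archive `{DatasetID}.tar.gz` with the BlobDir as its single top-level directory is an assumption, not something this specification prescribes.

```python
import tarfile
from pathlib import Path


def pack_blobdir(blobdir: str, out_dir: str = ".") -> Path:
    """Tar and gzip a BlobDir, keeping its directory name as the top-level entry."""
    src = Path(blobdir)
    # Assumption: archive is named {DatasetID}.tar.gz after the directory name.
    archive = Path(out_dir) / f"{src.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths relative, e.g. DatasetID/meta.json
        tar.add(src, arcname=src.name)
    return archive


def unpack_blobdir(archive: str, dest: str = ".") -> Path:
    """Extract a packed BlobDir and return the path of the extracted directory."""
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Basic safety check on member names; symlink targets are not inspected.
            if Path(member.name).is_absolute() or ".." in Path(member.name).parts:
                raise ValueError(f"unsafe path in archive: {member.name}")
        top_levels = {Path(m.name).parts[0] for m in members}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")
        tar.extractall(dest)
    return Path(dest) / top_levels.pop()
```

For example, `pack_blobdir("example/FXWY01")` would write `FXWY01.tar.gz` to the current directory.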

2.1 Minimal example

A minimal BlobDir must contain at least two JSON format files: meta.json containing dataset metadata and identifiers.json containing the primary identifiers for the constituent records.

DatasetID
+- meta.json
+- identifiers.json
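A check for these minimal requirements might look like the following sketch. `check_minimal_blobdir` is a hypothetical name, and it only confirms that both files exist and parse as JSON; it does not validate their contents against a schema.

```python
import json
from pathlib import Path

# The two files every BlobDir must contain (section 2.1).
REQUIRED_FILES = ("meta.json", "identifiers.json")


def check_minimal_blobdir(blobdir: str) -> list:
    """Return a list of problems; an empty list means the minimal requirements are met."""
    root = Path(blobdir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)  # only checks that the file is well-formed JSON
        except json.JSONDecodeError as err:
            problems.append(f"{name} is not valid JSON: {err}")
    return problems
```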

2.2 Typical example

Each additional field in the dataset must have a corresponding JSON format file in the BlobDir. In a typical BlobDir, these files contain information derived from the constituent sequences of a genome assembly (GC content, contig/scaffold/chromosome lengths and number of Ns), details of read mapping, taxonomic inferences based on sequence-similarity searches, and BUSCO results.

DatasetID
+- meta.json
+- identifiers.json
+- gc.json
+- length.json
+- ncount.json
+- {LIBRARYNAME}_cov.json
+- {LIBRARYNAME}_read_cov.json
+- {TAXRULE}_{RANK}.json
+- {TAXRULE}_{RANK}_cindex.json
+- {TAXRULE}_{RANK}_positions.json
+- {TAXRULE}_{RANK}_score.json
+- {LINEAGE}_busco.json
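Because several filenames are templates, it can help to expand them for a concrete dataset, for example to list what a typical BlobDir would contain. In this sketch `expected_field_files` is hypothetical, and the values used below (library `reads1`, taxrule `bestsumorder`, rank `phylum`, lineage `eukaryota_odb9`) are illustrative placeholders.

```python
from itertools import product


def expected_field_files(libraries=(), taxrule=None, ranks=(), lineages=()):
    """Expand the typical BlobDir filename templates for the given inputs (hypothetical helper)."""
    files = ["meta.json", "identifiers.json", "gc.json", "length.json", "ncount.json"]
    # {LIBRARYNAME}_cov.json and {LIBRARYNAME}_read_cov.json, one pair per library
    for lib in libraries:
        files += [f"{lib}_cov.json", f"{lib}_read_cov.json"]
    # {TAXRULE}_{RANK}.json plus its _cindex, _positions and _score companions
    if taxrule:
        for rank, suffix in product(ranks, ("", "_cindex", "_positions", "_score")):
            files.append(f"{taxrule}_{rank}{suffix}.json")
    # {LINEAGE}_busco.json, one per BUSCO lineage
    for lineage in lineages:
        files.append(f"{lineage}_busco.json")
    return files


files = expected_field_files(
    libraries=["reads1"], taxrule="bestsumorder", ranks=["phylum"], lineages=["eukaryota_odb9"]
)
```

The output mirrors the typical layout above; apart from meta.json and identifiers.json, none of these files is required by the specification.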

2.3 Summary and descriptive files

A BlobDir may also contain files that provide a human-readable summary and/or description of the dataset, in TSV (summary.tsv) and Markdown (description.md) formats respectively.

DatasetID
+- meta.json
+- identifiers.json
...
+- summary.tsv
+- description.md

2.4 Required files

As noted in section 2.1, a valid BlobDir can be created with only two files, meta.json and identifiers.json. This allows a BlobDir to be generated iteratively, with new fields added as the data become available. For specific uses of a BlobDir, additional files may be required, for example the BlobToolKit Viewer requires at least a gc.json, a length.json and a {TAXRULE}_{RANK}.json file to produce meaningful plots.
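A tool preparing data for the Viewer could check for these files before loading. This is a sketch with a hypothetical function name; the caller supplies the taxrule and rank because they cannot be reliably inferred from filenames alone (`{TAXRULE}_{RANK}.json` has the same shape as, for example, `{LIBRARYNAME}_cov.json`).

```python
from pathlib import Path


def missing_for_viewer(blobdir: str, taxrule: str, rank: str) -> list:
    """List files needed for meaningful Viewer plots that are absent from the BlobDir."""
    root = Path(blobdir)
    # Minimal files (section 2.1) plus the Viewer's additional requirements.
    needed = ["meta.json", "identifiers.json", "gc.json", "length.json", f"{taxrule}_{rank}.json"]
    return [name for name in needed if not (root / name).is_file()]
```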

3. Metadata

A meta.json file is required and must contain metadata describing the dataset and its constituent fields in JSON format.

3.1 File structure

The top level of the meta.json file is a JSON object with

X. Validator

X.1 JSON Schema validation

  • meta.json
    • convert meta.json so that schema validation can be run on its fields
  • field data validation
    • generate a schema for each array/multiarray field
    • use the field's range to set the minimum and maximum allowed values
    • set minItems and maxItems to enforce the expected array length
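The schema-generation steps above could be sketched as below, using only standard JSON Schema keywords (`type`, `items`, `minimum`, `maximum`, `minItems`, `maxItems`). `field_schema` is hypothetical, and two things are assumptions: that a field's values can be checked as a plain array with one entry per identifier, and that for multiarray fields the range applies to every number in each inner array.

```python
def field_schema(value_range, n_records, multiarray=False):
    """Build a JSON Schema (draft-07) for one field's value array (hypothetical helper).

    value_range: (min, max) pair for the field; n_records: number of identifiers.
    """
    low, high = value_range
    number = {"type": "number", "minimum": low, "maximum": high}
    # Assumption: in a multiarray field each record holds an array of numbers.
    items = {"type": "array", "items": number} if multiarray else number
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "array",
        "items": items,
        # minItems == maxItems pins the array to exactly one entry per record.
        "minItems": n_records,
        "maxItems": n_records,
    }
```

The resulting dict could then be passed, together with the values, to the third-party `jsonschema` package, e.g. `jsonschema.validate(instance=values, schema=schema)`.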

X.2 Additional validation

  • check that data files match the fields described in meta.json
  • check the length of value arrays (minItems, maxItems)
  • check that taxids/taxa are valid?
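The first two additional checks might be sketched as follows; checking taxids is left out, as the list above leaves it open. Both functions are hypothetical. `match_files_to_fields` assumes each field is stored as `{field_id}.json` and that the caller has already read the field ids from meta.json; `wrong_lengths` assumes each field's values have already been loaded as one list per field.

```python
from pathlib import Path

# Files that are not field data (sections 2.1 and 2.3).
NON_FIELD_FILES = {"meta.json", "identifiers.json", "summary.tsv", "description.md"}


def match_files_to_fields(blobdir: str, field_ids):
    """Return (fields without a data file, data files without a field), both sorted.

    Assumption: a field with id X is stored in X.json.
    """
    on_disk = {
        p.stem
        for p in Path(blobdir).iterdir()
        if p.is_file() and p.suffix == ".json" and p.name not in NON_FIELD_FILES
    }
    declared = set(field_ids)
    return sorted(declared - on_disk), sorted(on_disk - declared)


def wrong_lengths(field_values: dict, n_identifiers: int) -> dict:
    """Return {field_id: length} for value arrays without one entry per identifier."""
    return {fid: len(vals) for fid, vals in field_values.items() if len(vals) != n_identifiers}
```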