BIOM-LD

Summary

This is a first hacky attempt at a full mapping of the BIOM format to Linked Data, and beyond.

Rationale

The Biological Observation Matrix (BIOM) format is an HDF5- and JSON-based specification for representing biological observation tables. For example, it is used in the Earth Microbiome Project to store environmental metagenomics data in a common and efficient format. A BIOM table can be converted to a Linked Data dataset (from HDF5+JSON to RDF+OWL with HTTP-resolvable URIs), yielding a representation that can be plugged directly into the web of data.

Moving to a Linked Data representation has the following advantages, especially from the publication/interoperability point of view (though not from the efficient storage point of view):

Publishing our data in Linked Data means that other datasets can be linked to ours (i.e. our dataset becomes more "discoverable" over the web) or we can link our dataset to other datasets and integrate information easily.
Since we are using RDF, it is easy to merge our datasets with other datasets. This is especially interesting if common vocabularies like EnvO or NCBI taxonomy are used to represent row and column metadata, and/or SPARQL federated queries or SPARQL R are used to query the data.
Since the BIOM specification is represented as an OWL ontology, the specification, rather than being pure text, becomes computationally explicit: programs that consume BIOM data can be written more easily, reasoning can be used to check validity, specific validators (e.g. for metadata, value ranges, etc.) can be written more easily, and in general any programmatic endeavour becomes easier and more maintainable in the long term.
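
To make the merging point concrete, here is a minimal sketch using only the Python standard library. It treats an RDF graph as a set of (subject, predicate, object) tuples, so merging is a set union and a shared vocabulary term acts as the join point. All URIs are hypothetical placeholders (real data would use EnvO or NCBI taxonomy terms), and the helper names `merge` and `subjects_with` are made up for this illustration. A real setup would more likely use an RDF library or a SPARQL endpoint.

```python
# Hypothetical URIs: real data would use terms from EnvO, NCBI taxonomy, etc.
ENVIRONMENT = "http://example.org/vocab#environment"
SOIL = "http://example.org/envo/soil"


def merge(*graphs):
    """Merge graphs by set union (valid as long as no blank nodes are involved)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def subjects_with(graph, predicate, obj):
    """Return, sorted, every subject that has the given predicate and object."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


# Our dataset: one sample annotated with the shared "soil" term.
OURS = {("http://example.org/biom/sample/A", ENVIRONMENT, SOIL)}

# Someone else's dataset that happens to use the same term.
THEIRS = {
    ("http://example.org/other/site/42", ENVIRONMENT, SOIL),
    ("http://example.org/other/site/42", "http://example.org/vocab#ph", "6.5"),
}

MERGED = merge(OURS, THEIRS)

if __name__ == "__main__":
    # Both resources are found through the shared vocabulary term.
    print(subjects_with(MERGED, ENVIRONMENT, SOIL))
```

Because both datasets point at the same term URI, a single lookup over the merged graph finds resources from both, with no schema alignment step.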

The mapping

A BIOM file is basically a sparse table with some metadata attached to it: in this mapping, it becomes an instance that is a member of one of the subclasses of biom:Table, for example biom:OTUTable. The information about the table (generated by, table type, etc.) is translated to triples whose subject is the table instance. The cell values are also translated to triples that are linked to the table instance. An example is included in the ontology.
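
The following is a rough sketch of that translation, not the converter shipped in this repository. It takes a dict shaped like the sparse JSON form of a BIOM table and emits N-Triples lines using only the Python standard library. The namespace URIs and the property names (`generatedBy`, `hasCell`, `row`, `column`, `value`) are assumptions made for illustration; the actual ontology may name them differently. Row and column identifiers are assumed to be safe to embed in a URI.

```python
# All URIs and property names below are hypothetical placeholders,
# not necessarily the terms defined by the BIOM-LD ontology.
BIOM = "http://example.org/biom#"
BASE = "http://example.org/dataset/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Map the BIOM "type" field to a subclass of biom:Table (only one shown here).
TABLE_CLASSES = {"OTU table": "OTUTable"}


def uri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Serialise a value as an N-Triples literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def biom_to_ntriples(table):
    """Translate a sparse BIOM-style table (as a dict) into N-Triples lines."""
    table_uri = BASE + table["id"]
    lines = []

    def emit(subject, predicate, obj):
        lines.append(f"{uri(subject)} {uri(predicate)} {obj} .")

    # Table-level metadata: the subject is the table instance itself.
    table_class = TABLE_CLASSES.get(table["type"], "Table")
    emit(table_uri, RDF_TYPE, uri(BIOM + table_class))
    emit(table_uri, BIOM + "generatedBy", literal(table["generated_by"]))

    # Each sparse entry [row index, column index, value] becomes a cell resource
    # linked back to the table instance.
    for row_index, col_index, value in table["data"]:
        row_id = table["rows"][row_index]["id"]
        col_id = table["columns"][col_index]["id"]
        cell_uri = f"{table_uri}/cell/{row_id}/{col_id}"
        emit(table_uri, BIOM + "hasCell", uri(cell_uri))
        emit(cell_uri, BIOM + "row", uri(f"{table_uri}/row/{row_id}"))
        emit(cell_uri, BIOM + "column", uri(f"{table_uri}/column/{col_id}"))
        emit(cell_uri, BIOM + "value", literal(float(value), XSD_DOUBLE))
    return lines


# A tiny made-up table: two observations, one sample, two non-zero cells.
SAMPLE_TABLE = {
    "id": "t1",
    "type": "OTU table",
    "generated_by": "demo",
    "rows": [{"id": "OTU_1"}, {"id": "OTU_2"}],
    "columns": [{"id": "Sample_A"}],
    "data": [[0, 0, 5.0], [1, 0, 2.0]],
}

if __name__ == "__main__":
    print("\n".join(biom_to_ntriples(SAMPLE_TABLE)))
```

Giving each cell its own resource keeps the table sparse in RDF as well: only non-zero entries produce triples, at the cost of several triples per cell.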

Linked Data

Normal triple store for table metadata and rows and columns, SADI (BerkeleyDB) for cells, SHARE as client.
Linked Data Fragments as server (possibly decompose into normal triple store + BerkeleyDB as above) and client.
Normal setting: triple store, some special LD server + AJAX client a la LODestar.
Binary RDF (HDT).

