Epic 7: metadata annotation tools #63

jonc125 · 2019-08-07T13:30:10Z (edited by MichaelClerx)

Annotating datasets
- Design a useful data format WebLab#29
- Add data annotation tool WebLab#94
- Add data description objects WebLab#95
- Specifying column units
Annotating fitting parameters for models; see also Epic 4: Develop fitting spec & implement it using FC+PINTS #60

jonc125 · 2019-08-08T14:15:11Z

Email trail on annotating datasets:

From: Matthias König konigmatt@googlemail.com
Sent: 01 August 2019 14:52
To: Michael Clerx michael.clerx@cs.ox.ac.uk
Cc: Keating, Sarah s.keating@ucl.ac.uk
Subject: Re: Metadata for CSV

Hi Michael,

Yes, my plan was to try Frictionless Data via the respective Python libraries and see what works and what doesn't.
I think the most important step is to create some example use cases to see where the problems are.

I will definitely use CSV/TSV + JSON meta-information (this covers all my possible use cases and is probably also the easiest thing to prototype).
As soon as I have some examples/experiences I will share them with you.

Best Matthias

On Thu, Aug 1, 2019 at 2:11 PM Michael Clerx michael.clerx@cs.ox.ac.uk wrote:
Hi again,
This list is very convincing: https://frictionlessdata.io/software/
Do you plan to experiment with putting your data into this format? Wondering if the best way forward would be to just try it out over the next few months and see if we find anything lacking?
Best wishes,
Michael

On 24/07/2019 10:41, Matthias König wrote:
Hi Sarah and Michael,
this sounds great.

Basically, what I want is a way to annotate the columns in my CSV/TSV files.
The CSV will have a single header row which defines the ids of the columns, e.g.
study_id | sex | age | height | time | caffeine | ...

I want to have a simple way to add meta-information to these columns which consists of

- data types, human-readable names, descriptions
- additional restrictions, for instance the possible choices for a categorical column, or >= 0.0 constraints
- units (a standard way of representing units, which will make unit conversion much easier)
- classical annotations in the sense of a list of RDF triples consisting of a predicate (from the MIRIAM qualifiers, mostly BQB_IS) and an object, which is mostly an identifiers.org term (e.g., https://identifiers.org/CHEBI:000123), but could also be a general URI
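To make the last item concrete, here is a minimal Python sketch that turns per-column annotations into RDF triples in N-Triples syntax. It assumes the column itself is the subject, identified by a hypothetical URI (`https://example.org/data.csv#<column id>`); `BQB_IS` and two other qualifiers are expanded into the BioModels biology-qualifiers namespace, and the CHEBI identifier is just the placeholder from the list above. It only covers a subset of the qualifiers and is not tied to any particular RDF library.

```python
# Minimal sketch: emit per-column annotations as N-Triples lines.
# The subject namespace (BASE) is hypothetical; the predicate URIs use the
# BioModels/MIRIAM biology-qualifiers namespace.

BQB = "http://biomodels.net/biology-qualifiers/"

# MIRIAM-style qualifier names -> URI suffix (a small subset only).
QUALIFIERS = {
    "BQB_IS": "is",
    "BQB_IS_VERSION_OF": "isVersionOf",
    "BQB_HAS_PART": "hasPart",
}

BASE = "https://example.org/data.csv#"  # hypothetical subject namespace

# Column id -> list of (qualifier, object URI) pairs.
annotations = {
    "caffeine": [("BQB_IS", "https://identifiers.org/CHEBI:000123")],
}


def to_ntriples(column_annotations, base=BASE):
    """Return one N-Triples line per (column, qualifier, object) annotation."""
    lines = []
    for column, pairs in column_annotations.items():
        subject = f"<{base}{column}>"
        for qualifier, obj in pairs:
            predicate = f"<{BQB}{QUALIFIERS[qualifier]}>"
            lines.append(f"{subject} {predicate} <{obj}> .")
    return lines


for line in to_ntriples(annotations):
    print(line)
```

Keeping the annotations as plain (qualifier, object) pairs in the JSON and only expanding them to full triples on export means the sidecar file stays short and hand-editable.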

My preferred solution is a combination of:

- CSV
- a JSON Schema-like document

I need something that I can track in git, that is reasonably human-readable/editable, and that is supported by a wide range of tools/libraries.
JSON seems to be a good solution here.
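A sketch of what such a CSV + JSON pairing could look like for part of the example header, loosely following the shape of a Frictionless Table Schema descriptor (`fields` with `name`, `title`, `type` and `constraints`). The `unit` and `annotations` properties are hypothetical extensions not defined by the core spec, and the titles, enum values and units are invented for illustration. The small validator checks only `required`, `enum` and `minimum`, using the standard library; it is not a substitute for the Frictionless libraries.

```python
import csv
import io
import json

# Table Schema-style descriptor for part of the example header.
# "unit" and "annotations" are hypothetical extension properties.
schema = {
    "fields": [
        {"name": "study_id", "type": "string", "title": "Study ID",
         "constraints": {"required": True}},
        {"name": "sex", "type": "string", "title": "Sex",
         "constraints": {"enum": ["F", "M"]}},
        {"name": "age", "type": "number", "title": "Age", "unit": "year",
         "constraints": {"minimum": 0}},
        {"name": "caffeine", "type": "number", "title": "Caffeine concentration",
         "unit": "mg/l", "constraints": {"minimum": 0},
         "annotations": [["BQB_IS", "https://identifiers.org/CHEBI:000123"]]},
    ]
}

# Text that would be committed next to the CSV file.
schema_json = json.dumps(schema, indent=2)

CASTS = {"string": str, "number": float, "integer": int}


def validate(csv_text, schema):
    """Return (row number, column, message) tuples for constraint violations."""
    fields = {f["name"]: f for f in schema["fields"]}
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for name, field in fields.items():
            raw = row.get(name) or ""  # missing column or short row -> empty
            c = field.get("constraints", {})
            if raw == "":
                if c.get("required"):
                    errors.append((row_no, name, "missing required value"))
                continue
            try:
                value = CASTS[field["type"]](raw)
            except ValueError:
                errors.append((row_no, name, f"not a {field['type']}"))
                continue
            if "enum" in c and value not in c["enum"]:
                errors.append((row_no, name, f"{value!r} not in {c['enum']}"))
            if "minimum" in c and value < c["minimum"]:
                errors.append((row_no, name, f"{value} < {c['minimum']}"))
    return errors


data = "study_id,sex,age,caffeine\nS1,F,34,2.5\nS2,X,-1,0.8\n"
errors = validate(data, schema)
for error in errors:
    print(error)
```

Here the second data row is flagged twice (an unknown `sex` value and a negative `age`), while the schema itself stays a plain, diffable JSON document.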

I found the following very interesting:
https://frictionlessdata.io/data-packages/
https://frictionlessdata.io/specs/table-schema/
which look very close to what I want, with libraries available for Python, R and JavaScript (which would cover most of my use cases).
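A minimal sketch of a Frictionless-style Data Package on disk: the CSV and a `datapackage.json` descriptor side by side, both plain text and easy to diff in git. The package name, file names and fields are hypothetical; the descriptor uses the `name`, `profile`, `resources`, `path` and `schema` keys from the Data Package and Tabular Data Resource specs, and it is read back with the standard library only, as an assumption-light stand-in for the Frictionless libraries.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical package: one CSV resource plus its descriptor.
descriptor = {
    "name": "caffeine-study",
    "profile": "tabular-data-package",
    "resources": [
        {
            "name": "measurements",
            "path": "measurements.csv",
            "profile": "tabular-data-resource",
            "schema": {
                "fields": [
                    {"name": "study_id", "type": "string"},
                    {"name": "time", "type": "number"},
                    {"name": "caffeine", "type": "number"},
                ]
            },
        }
    ],
}


def write_package(root, descriptor, rows):
    """Write the first CSV resource and datapackage.json into directory `root`."""
    resource = descriptor["resources"][0]
    header = [f["name"] for f in resource["schema"]["fields"]]
    with open(root / resource["path"], "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    (root / "datapackage.json").write_text(json.dumps(descriptor, indent=2))


def read_package(root):
    """Load the descriptor and check that the CSV header matches its schema."""
    desc = json.loads((root / "datapackage.json").read_text())
    resource = desc["resources"][0]
    with open(root / resource["path"], newline="") as fh:
        header = next(csv.reader(fh))
    expected = [f["name"] for f in resource["schema"]["fields"]]
    return desc, header == expected


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_package(root, descriptor, [["S1", 0, 2.5], ["S1", 1, 1.9]])
    loaded, header_ok = read_package(root)
    files = sorted(p.name for p in root.iterdir())
    print(files, header_ok)
```

Checking only the header is deliberately minimal; full type and constraint validation is what the Frictionless libraries would add on top.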

Similar approaches, such as JSON-LD, also break things down into CSV + JSON (for rich description).
It would be great if we could find a common solution here.

Best Matthias

On Wed, Jul 24, 2019 at 10:30 AM Keating, Sarah s.keating@ucl.ac.uk wrote:
Hi Matthias

At COMBINE you seemed keen to add metadata to CSV files. Since WebLab/Michael Clerx (cc) also want to do this, it would be great to coordinate any efforts so that we are at least consistent.

It doesn't look like COMBINE is going to land on a standard form for data any time soon, so it would be good to establish some consistency in annotation, at least for CSV, which is commonly used 😊. That way we can then propose it to the COMBINE annotation list as a 'standard' way of annotating CSV.

Michael can add more technical stuff that he has looked into; I'm really the messenger at this point but will happily get involved.

Sarah

--
Matthias König, PhD.
Junior Group Leader LiSyM - Systems Medicine of the Liver
Humboldt Universität zu Berlin, Institute of Biology, Institute for Theoretical Biology
https://livermetabolism.com
konigmatt@googlemail.com
https://twitter.com/konigmatt
https://github.com/matthiaskoenig
Tel: +49 30 2093 98435

MichaelClerx · 2019-08-15T15:22:02Z

Eventually, all annotations should be based on community-agreed-upon ontologies, but the best strategy to achieve this might be:

- Make it up as we go along
- See if the system gets any uptake, leading to people with some stake in having a good ontology
- Discuss
- Map our ontology terms to existing ones
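The last step could be as simple as a lookup table from our provisional terms to community ones, applied when annotations are exported. In this sketch the `weblab:` prefix, the local term names and the table contents are all hypothetical, and the CHEBI identifier is the placeholder used earlier in the thread. Unmapped local terms are kept and reported rather than dropped, so the table can grow as agreement is reached.

```python
# Hypothetical mapping from provisional local terms to community terms.
TERM_MAP = {
    "weblab:caffeine": "https://identifiers.org/CHEBI:000123",
}

LOCAL_PREFIX = "weblab:"  # hypothetical prefix for made-up-as-we-go terms


def map_terms(annotations, term_map=TERM_MAP):
    """Replace mapped local terms; return (new annotations, unmapped local terms)."""
    mapped, unmapped = {}, set()
    for column, terms in annotations.items():
        new_terms = []
        for term in terms:
            if term in term_map:
                new_terms.append(term_map[term])
            else:
                new_terms.append(term)  # keep the provisional term for now
                if term.startswith(LOCAL_PREFIX):
                    unmapped.add(term)  # external URIs are not reported
        mapped[column] = new_terms
    return mapped, unmapped


local = {"caffeine": ["weblab:caffeine"], "height": ["weblab:body_height"]}
result, todo = map_terms(local)
print(result)
print(todo)
```

Because the original annotations are never overwritten in place, the same data can be re-exported whenever the mapping table is extended.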
