Epic 7: metadata annotation tools #63

Open
jonc125 opened this issue Aug 7, 2019 · 2 comments

jonc125 commented Aug 7, 2019

jonc125 commented Aug 8, 2019

Email thread on annotating datasets:

From: Matthias König konigmatt@googlemail.com
Sent: 01 August 2019 14:52
To: Michael Clerx michael.clerx@cs.ox.ac.uk
Cc: Keating, Sarah s.keating@ucl.ac.uk
Subject: Re: Metadata for CSV

Hi Michael,

Yes, my plan was to try Frictionless Data via the respective Python libraries and see what works and what does not.
I think the most important step is to create some example use cases to see where the problems are.

I will definitely use CSV/TSV + JSON meta information (this solves all my possible use cases and is probably also the easiest thing to prototype).
As soon as I have some examples/experiences I will share them with you.

Best Matthias

On Thu, Aug 1, 2019 at 2:11 PM Michael Clerx michael.clerx@cs.ox.ac.uk wrote:
Hi again,
This list is very convincing: https://frictionlessdata.io/software/
Do you plan to experiment with putting your data into this format? Wondering if the best way forward would be to just try it out over the next few months and see if we find anything lacking?
Best wishes,
Michael

On 24/07/2019 10:41, Matthias König wrote:
Hi Sarah and Michael,
this sounds great.

Basically, what I want is a way to annotate the columns in a CSV/TSV file.
The CSV will have a single header row which defines the ids of the columns, e.g.
study_id | sex | age | height | time | caffeine | ...

I want to have a simple way to add meta-information to these columns which consists of

  • data types, human-readable names, and descriptions
  • additional restrictions, for instance the possible choices for a categorical column or constraints such as >= 0.0
  • units (a standard way of representing a unit, which will make unit conversion much easier)
  • classical annotations in the sense of a list of RDF triples, each consisting of a predicate (from the MIRIAM qualifiers, mostly BQB_IS) and an object,
    which is mostly an identifiers.org term (e.g., https://identifiers.org/CHEBI:000123), but could also be a general URI
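
A minimal sketch of what such column metadata could look like as JSON, built here from a Python dict. The keys `name`, `type`, `title`, `description` and `constraints` follow the Frictionless Table Schema field descriptor; `unit` and `annotations` are custom properties assumed purely for illustration, as are the example values. The CHEBI identifier is the placeholder used in the list above, not a real term for caffeine.

```python
import json

# Hypothetical descriptors for two columns of the example header row.
# "unit" and "annotations" are NOT part of Table Schema; they are
# assumed custom properties for units and RDF-style annotations.
schema = {
    "fields": [
        {
            "name": "sex",
            "type": "string",
            "title": "Sex",
            "description": "Sex of the study participant",
            # possible choices for a categorical column
            "constraints": {"enum": ["M", "F", "NR"]},
        },
        {
            "name": "caffeine",
            "type": "number",
            "title": "Caffeine",
            "description": "Measured caffeine concentration",
            # ">= 0.0" restriction
            "constraints": {"minimum": 0.0},
            "unit": "mg/l",
            "annotations": [
                {
                    "predicate": "BQB_IS",
                    "object": "https://identifiers.org/CHEBI:000123",
                }
            ],
        },
    ]
}

# Stable key order and indentation keep diffs small when tracked in git.
schema_json = json.dumps(schema, indent=2, sort_keys=True)
print(schema_json)
```

Sorting keys and indenting the output is one way to keep the file both human-editable and diff-friendly.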

My preferred solution is a combination of

  • CSV
  • a JSON-schema-like document

I need something which I can track in git, which is reasonably human-readable/editable, and which is supported by a wide range of tools/libraries.
JSON seems to be a good solution here.
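
To illustrate how a CSV file and a JSON metadata document could work together, here is a rough sketch that checks rows against column constraints using only the Python standard library. The metadata layout, the column values and the `check_rows` helper are all hypothetical; this is not the API of any existing library.

```python
import csv
import io
import json

# Hypothetical JSON metadata; in practice it would sit in a file next to the CSV.
METADATA_JSON = """
{
  "fields": [
    {"name": "study_id", "type": "string"},
    {"name": "sex", "type": "string", "constraints": {"enum": ["M", "F"]}},
    {"name": "age", "type": "number", "constraints": {"minimum": 0.0}}
  ]
}
"""

# Hypothetical CSV content with a single header row of column ids.
CSV_TEXT = """study_id,sex,age
s1,M,34
s2,F,-2
s3,X,51
"""


def check_rows(csv_text, metadata):
    """Return (row_number, column, message) for every violated constraint."""
    fields = {f["name"]: f for f in metadata["fields"]}
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(fields) - set(reader.fieldnames or [])
    for name in sorted(missing):
        problems.append((1, name, "column missing from header"))
    for row_number, row in enumerate(reader, start=2):  # header is row 1
        for name, field in fields.items():
            if name in missing:
                continue
            value = row[name]
            constraints = field.get("constraints", {})
            if field["type"] == "number":
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    problems.append((row_number, name, "not a number"))
                    continue
                if "minimum" in constraints and number < constraints["minimum"]:
                    problems.append((row_number, name, "below minimum"))
            if "enum" in constraints and value not in constraints["enum"]:
                problems.append((row_number, name, "not an allowed choice"))
    return problems


problems = check_rows(CSV_TEXT, json.loads(METADATA_JSON))
for problem in problems:
    print(problem)
```

With the sample data above, the negative age in the second data row and the unknown sex code in the third are reported.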

I found the following very interesting:
https://frictionlessdata.io/data-packages/
https://frictionlessdata.io/specs/table-schema/
These look very close to what I want, with libraries available for Python, R and JavaScript (which would cover most of my use cases).
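
As a rough sketch of how these two specs fit together: a Data Package descriptor (conventionally a `datapackage.json` file) lists resources, and each tabular resource can carry a Table Schema under `schema`. The package name, resource name, file path and field types below are assumptions made up for this example.

```python
import json

# Hypothetical package describing one CSV file; only the key names
# ("name", "resources", "path", "schema", "fields", "type") come from the specs.
descriptor = {
    "name": "caffeine-study",
    "resources": [
        {
            "name": "measurements",
            "path": "measurements.csv",
            "schema": {
                "fields": [
                    {"name": "study_id", "type": "string"},
                    {"name": "time", "type": "number"},
                    {"name": "caffeine", "type": "number"},
                ]
            },
        }
    ],
}

# This text is what would be saved as datapackage.json next to the CSV.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```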

Similar approaches are things like JSON-LD, breaking things down to CSV + JSON (for rich description).
It would be great if we could find a common solution here.

Best Matthias

On Wed, Jul 24, 2019 at 10:30 AM Keating, Sarah s.keating@ucl.ac.uk wrote:
Hi Matthias

At COMBINE you seemed keen to add metadata to CSV files. Since WebLab/Michael Clerx (cc) are also wanting to do this, it would be great to coordinate any efforts so that we are at least consistent.

It doesn't look like COMBINE is going to land on a standard form for data any time soon, so it would be good to establish some consistency with annotation, at least for CSV, which is commonly used 😊. That way we can then propose it to the COMBINE annotation list as a 'standard' way of annotating CSV.

Michael can add more technical stuff that he has looked into; I'm really the messenger at this point but will happily get involved.

Sarah

--
Matthias König, PhD.
Junior Group Leader LiSyM - Systems Medicine of the Liver
Humboldt Universität zu Berlin, Institute of Biology, Institute for Theoretical Biology
https://livermetabolism.com
konigmatt@googlemail.com
https://twitter.com/konigmatt
https://github.com/matthiaskoenig
Tel: +49 30 2093 98435

@MichaelClerx

Eventually, all annotations should be based on community-agreed-upon ontologies, but the best strategy to achieve this might be:

  1. Make it up as we go along
  2. See if the system gets any uptake, leading to people with some stake in having a good ontology
  3. Discuss
  4. Map our ontology terms to existing ones

See also #21
