USE CASE: Small controlled taxonomies provided as tabular data #12

Open
nichtich opened this issue Apr 29, 2019 · 2 comments

@nichtich

nichtich commented Apr 29, 2019

Creator: Jakob Voß

Problem statement

We collect simple controlled vocabularies (term lists, classification schemes etc.) from users such as libraries and museums. Vocabularies can be (mono-)hierarchic so let's call them taxonomies. Users often manage their taxonomies without a dedicated vocabulary management tool. The most popular data format is a spreadsheet. For simplicity, each term/concept/class consists of an identifier and a label, together with hierarchy information. We came up with two CSV formats:

hierarchy information via level:

level,id,label
0,A,beings
1,A.1,animals
2,A.1.1,mammals
2,A.1.2,insects
1,A.2,humans

hierarchy information via parents:

id,label,parent
A,beings,
A.1,animals,A
A.1.1,mammals,A.1
A.1.2,insects,A.1
A.2,humans,A
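
The two formats carry the same information as long as the level-based rows are listed in tree order. As a minimal sketch (the function name `levels_to_parents` and the embedded sample are illustrative, not part of the proposal), the level format can be converted to the parent format by keeping a stack of the currently open ancestors:

```python
import csv
import io

# Sample data in the level-based format described above.
LEVEL_CSV = """level,id,label
0,A,beings
1,A.1,animals
2,A.1.1,mammals
2,A.1.2,insects
1,A.2,humans
"""

def levels_to_parents(text):
    """Convert level-based CSV text into a list of id/label/parent rows."""
    stack = []  # stack[i] is the id of the most recent row at level i
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        level = int(row["level"])
        if level < 0 or level > len(stack):
            raise ValueError(f"row {row['id']!r} has no parent one level up")
        del stack[level:]  # close ancestors at this level or deeper
        parent = stack[-1] if stack else ""
        rows.append({"id": row["id"], "label": row["label"], "parent": parent})
        stack.append(row["id"])
    return rows

parent_rows = levels_to_parents(LEVEL_CSV)
for r in parent_rows:
    print(f'{r["id"]},{r["label"]},{r["parent"]}')
```

The conversion only works because row order is significant in the level format; the parent format does not depend on row order.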

The question is how to communicate constraints of these formats.

Stakeholders

GLAM institutions that want to share simple taxonomies

Requirements

  • every concept must have a unique id
  • ids must match a regular expression (optional constraint)
  • every concept must have a label (with optional uniqueness constraint)
  • natural language of labels must be the same for all concepts
  • ids of child concepts must start with the id of their parent concept (optional constraint)
  • the level/parent information must form a (mono)hierarchy
    • every parent must be the id of another concept
    • cycles are not allowed
    • ...
  • ids should be sorted
  • ...
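
To make the list above concrete, here is a rough sketch of how most of these constraints could be checked on the parent-based format. The function `validate_taxonomy` and its parameters are hypothetical and only illustrate the checks; the language constraint is left out because it cannot be derived from the table itself.

```python
import re

def validate_taxonomy(rows, id_pattern=None, require_prefix=False):
    """Return a list of problems; an empty list means all checks passed.

    rows: dicts with keys "id", "label", "parent" ("" for top-level concepts).
    id_pattern and require_prefix switch on the optional constraints.
    """
    problems = []
    parents = {}
    for row in rows:
        cid, label, parent = row["id"], row["label"], row["parent"]
        if cid in parents:
            problems.append(f"duplicate id: {cid}")
            continue
        parents[cid] = parent
        if id_pattern and not re.fullmatch(id_pattern, cid):
            problems.append(f"id does not match pattern: {cid}")
        if not label.strip():
            problems.append(f"missing label: {cid}")
        if require_prefix and parent and not cid.startswith(parent):
            problems.append(f"id {cid} does not start with parent id {parent}")
    # Every parent must be the id of another concept.
    for cid, parent in parents.items():
        if parent and parent not in parents:
            problems.append(f"unknown parent {parent} of {cid}")
    # Walk up from each concept; meeting a node twice on one walk is a cycle.
    for cid in parents:
        seen = {cid}
        node = parents[cid]
        while node and node in parents:
            if node in seen:
                problems.append(f"cycle reached from {cid}")
                break
            seen.add(node)
            node = parents[node]
    return problems

sample = [
    {"id": "A", "label": "beings", "parent": ""},
    {"id": "A.1", "label": "animals", "parent": "A"},
    {"id": "A.2", "label": "humans", "parent": "A"},
]
print(validate_taxonomy(sample, id_pattern=r"[A-Z](\.[0-9]+)*", require_prefix=True))
```

A single parent column already rules out polyhierarchy, so only dangling parents and cycles need to be checked explicitly.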
@kcoyle
Collaborator

kcoyle commented May 22, 2019

@nichtich I'm trying to make these into AP requirements - and some seem to be requirements on the vocabularies themselves. So there's a need to translate these into what an AP can do to support the taxonomies. Much of what is here looks very much like SKOS, of course, although it is expressed in csv rather than RDF. It looks to me that:

  1. "concept" here is an entity using our current terminology
  2. a concept would have statements: level, label and id (I'm assuming that parent/child would be derived from levels)
  3. it needs to be possible to validate language tags across the entire profile
  4. the ids may need to conform to a pattern that is determined by levels
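
As an illustration of point 4, a level-dependent id pattern could look like the following sketch. It assumes the dotted notation of the sample data (one capital letter, then one numeric segment per level); real vocabularies may use a different scheme, and `pattern_for_level` is a hypothetical helper.

```python
import re

def pattern_for_level(level):
    """Build a regex for ids at the given level, e.g. level 2 -> A.1.1."""
    # Assumption: one capital letter followed by `level` dot-separated numbers.
    return re.compile(r"[A-Z]" + r"(?:\.[0-9]+)" * level)

def id_matches_level(concept_id, level):
    """True if the whole id matches the pattern expected at this level."""
    return pattern_for_level(level).fullmatch(concept_id) is not None

print(id_matches_level("A.1.2", 2))  # id with two numeric segments at level 2
print(id_matches_level("A.1", 2))    # id with too few segments for level 2
```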

I'm not sure what to say about "ids should be sorted" since that seems to be beyond the capacity of a profile itself as long as the sort element is identified. Also, it isn't clear to me what would be used for sorting: labels? levels?

In any case, does this capture your requirements?

@nichtich
Author

I moved some requirements to #21. In this use case a CSV table row corresponds to an entity with statements id, label, and either level (or parent). You wrote:

> it needs to be possible to validate language tags across the entire profile

The data does not include language tags but natural language strings that should all be in the same language. The profile could specify a language by its language tag but validation is not possible automatically.

> the ids may need to conform to a pattern that is determined by levels

Yes.

> I'm not sure what to say about "ids should be sorted"

If the data contains a level column, the CSV rows must be sorted the same way as the hierarchy tree would be shown (left-to-right depth-first tree traversal). A slightly simplified constraint would be: every parent entity must be given before its child entities.
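
Both variants of the ordering constraint can be sketched in a few lines. The example below states them on rows with an explicit parent column, which is an assumption made for illustration (with a level column the row order itself defines the hierarchy); the function names are hypothetical.

```python
def parents_come_first(rows):
    """Simplified constraint: every parent is listed before its children."""
    seen = set()
    for row in rows:
        if row["parent"] and row["parent"] not in seen:
            return False
        seen.add(row["id"])
    return True

def is_depth_first(rows):
    """Stricter constraint: rows follow a depth-first traversal of the tree."""
    stack = []  # ids of the currently open ancestors, root first
    for row in rows:
        parent = row["parent"]
        if parent:
            if parent not in stack:
                return False  # parent's subtree was already closed or never opened
            del stack[stack.index(parent) + 1:]  # close deeper branches
        else:
            stack.clear()  # a new top-level concept closes everything
        stack.append(row["id"])
    return True

def make_rows(pairs):
    """Helper for the example: build rows from (id, parent) pairs."""
    return [{"id": i, "parent": p} for i, p in pairs]

depth_first = make_rows([("A", ""), ("A.1", "A"), ("A.1.1", "A.1"), ("A.2", "A")])
breadth_first = make_rows([("A", ""), ("A.1", "A"), ("A.2", "A"), ("A.1.1", "A.1")])

print(parents_come_first(depth_first), is_depth_first(depth_first))
print(parents_come_first(breadth_first), is_depth_first(breadth_first))
```

The breadth-first listing satisfies the simplified constraint but not the strict one, which shows how the two differ.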

@tombaker added and removed the "requirements" label ("As derived from use cases") on Dec 4, 2020