Eaf controlled vocabulary #362

Open: wants to merge 6 commits into base: master

marianne-m
Contributor

@marianne-m marianne-m commented Feb 24, 2022

  • check that all the annotations use labels defined in the controlled vocabulary
  • import the description of each label of the controlled vocabulary and store it somewhere
  • documentation
  • tests
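
The first item in the checklist could be sketched roughly as follows. This is only an illustration, not the code from this PR: the `find_invalid_labels` helper, the tier name `vcm` and the labels are hypothetical, and the vocabulary is assumed to be already loaded as a mapping from tier name to allowed labels.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical structures: neither alias comes from the PR itself.
ControlledVocabulary = Dict[str, Set[str]]  # tier name -> allowed labels
Annotation = Tuple[str, str]                # (tier name, label)


def find_invalid_labels(
    annotations: Iterable[Annotation],
    vocabulary: ControlledVocabulary,
) -> List[Annotation]:
    """Return the annotations whose label is not allowed for their tier.

    Tiers that have no controlled vocabulary are left unchecked.
    """
    invalid = []
    for tier, label in annotations:
        allowed = vocabulary.get(tier)
        if allowed is not None and label not in allowed:
            invalid.append((tier, label))
    return invalid


# Made-up example data, for illustration only.
vocabulary = {"vcm": {"C", "N", "L", "Y"}}
annotations = [("vcm", "C"), ("vcm", "X"), ("transcription", "hello")]
invalid = find_invalid_labels(annotations, vocabulary)
print(invalid)
```

Returning the offending annotations rather than raising on the first one makes it possible to report every problem in a file at once; whether the importer should warn or fail is a separate choice.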

@marianne-m marianne-m linked an issue Feb 24, 2022 that may be closed by this pull request
@marianne-m marianne-m changed the title check if value is in controlled vocabulary for newtiers Eaf controlled vocabulary Feb 24, 2022
@marianne-m marianne-m marked this pull request as ready for review March 2, 2022 15:02
@lucasgautheron
Collaborator

lucasgautheron commented Mar 10, 2022

A few comments on this PR!

  • It's cool!
  • I was not convinced that the output belongs in metadata rather than in annotations/<set>, but I don't think it matters much!
  • I think the metadata file should contain information about the annotation set (annotation['set']) and the input annotation (annotation['raw_filename']). Maybe also the import date (just like for the annotations' index)?

What do you think?

Collaborator

@lucasgautheron lucasgautheron left a comment

My suggestion: add the annotation set (since different sets may have different schemes for the EAF, right?) and the raw filename of the EAF (mostly for debugging purposes) to the controlled vocabulary metadata.

(All this is up to you of course! Just my advice)

@William-N-Havard

I am in favour of putting the controlled vocabulary metadata inside the metadata folder. I find it simpler to have everything stored in one place rather than in several places.

I agree it's important to save the name of the annotation set in the metadata file. But I'm not convinced we should also save the raw_filename of all the files using this annotation scheme as all the files of a given set should be using it anyway...

@marianne-m
Contributor Author

For now the controlled vocabulary looks like this:

[image: controlled_voc]

It's a good idea to add the annotation set!
I can add a column annotation_sets, with a list of the sets in which the tier is used.

I'm not sure about the raw_filename (and importation_date); the table would be difficult to read...
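
A rough sketch of what the proposed annotation_sets column could look like, with one row per tier and label and a list of the sets that use it. Only the name annotation_sets comes from this thread; the `build_vocabulary_table` helper, the other column names, the set names and the labels are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (annotation set, tier, label, description)
Entry = Tuple[str, str, str, str]


def build_vocabulary_table(entries: Iterable[Entry]) -> List[Dict[str, object]]:
    """Collapse per-set entries into one row per (tier, label, description)."""
    sets_by_key = defaultdict(set)
    for annotation_set, tier, label, description in entries:
        sets_by_key[(tier, label, description)].add(annotation_set)

    # One row per distinct entry; the sets using it are listed in a single column.
    return [
        {
            "tier": tier,
            "label": label,
            "description": description,
            "annotation_sets": sorted(sets),
        }
        for (tier, label, description), sets in sorted(sets_by_key.items())
    ]


# Made-up example data, for illustration only.
entries = [
    ("eaf/an1", "vcm", "C", "canonical"),
    ("eaf/an2", "vcm", "C", "canonical"),
    ("eaf/an1", "vcm", "N", "non-canonical"),
]
table = build_vocabulary_table(entries)
for row in table:
    print(row)
```

With this layout the table grows with the number of distinct labels, not with the number of sets or files, which should keep it readable.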

@lucasgautheron
Collaborator

I agree that ideally all the files should share the same vocabulary within a set, but for debugging purposes it is still useful (I think) to keep track of where the information was retrieved. That does not mean keeping a duplicate entry for each file (so that should not pose any readability problem). Also, this should make it possible to spot mismatches between annotations within a set if they occur. If you can't see a situation where this may be useful then forget about it! It's just an idea :)
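
To make the idea concrete, here is a minimal sketch of how keeping the raw_filename could help spot mismatches within a set without storing one entry per file. The `find_mismatched_sets` helper and the record layout are hypothetical, and the file and set names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (annotation set, raw_filename, tier, label)
Record = Tuple[str, str, str, str]


def find_mismatched_sets(records: Iterable[Record]) -> Dict[str, List[List[str]]]:
    """Group the files of each set by the vocabulary they declare.

    Only sets whose files disagree are returned, as a list of file groups
    where each group shares one identical vocabulary.
    """
    # Vocabulary declared by each (set, file) pair.
    labels_by_file = defaultdict(set)
    for annotation_set, raw_filename, tier, label in records:
        labels_by_file[(annotation_set, raw_filename)].add((tier, label))

    # Within each set, files with an identical vocabulary share one entry.
    files_by_vocabulary = defaultdict(lambda: defaultdict(list))
    for (annotation_set, raw_filename), labels in labels_by_file.items():
        files_by_vocabulary[annotation_set][frozenset(labels)].append(raw_filename)

    return {
        annotation_set: sorted(sorted(files) for files in groups.values())
        for annotation_set, groups in files_by_vocabulary.items()
        if len(groups) > 1
    }


# Made-up example data: c.eaf lacks the "N" label that a.eaf and b.eaf declare.
records = [
    ("eaf/an1", "a.eaf", "vcm", "C"),
    ("eaf/an1", "a.eaf", "vcm", "N"),
    ("eaf/an1", "b.eaf", "vcm", "C"),
    ("eaf/an1", "b.eaf", "vcm", "N"),
    ("eaf/an1", "c.eaf", "vcm", "C"),
    ("eaf/an2", "d.eaf", "vcm", "C"),
]
mismatches = find_mismatched_sets(records)
print(mismatches)
```

Files that agree collapse into a single group, so a consistent set yields nothing at all, and an inconsistent one points directly at the files to inspect.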

Successfully merging this pull request may close these issues.

Add EAF controlled vocabulary to metadata
3 participants