Add EAF controlled vocabulary to metadata #344

Open
William-N-Havard opened this issue Jan 25, 2022 · 2 comments · May be fixed by #362

William-N-Havard commented Jan 25, 2022

Is your feature request related to a problem? Please describe.
An EAF tier can be assigned a controlled vocabulary, defined by the creator of the EAF file, which restricts the labels annotators can use during the annotation campaign. This prevents annotators from adding custom labels, whether intentionally or by mistake.

First, when importing annotations belonging to a new type of tier (see issue #343), it would be good to check that every annotation uses a label defined in the controlled vocabulary (better safe than sorry!).

Second, it would be nice to also import the description of each label of the controlled vocabulary and store it somewhere. This description is stored directly in the EAF file. Storing this description would allow users of the data set to understand the meaning of the codes used during the annotation campaign.

<CONTROLLED_VOCABULARY CV_ID="vcm">
    <DESCRIPTION LANG_REF="und">Simplified subset of infant vocal maturity classes (distinguishing between variegated and non-variegated syllables)</DESCRIPTION>
    <CV_ENTRY_ML CVE_ID="cveid_e7300257-f12a-479f-90f0-c2fefbf99a26">
        <CVE_VALUE DESCRIPTION="Crying" LANG_REF="und">Y</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_ae00bfde-d4bb-499e-8c63-81c4459f5b8a">
        <CVE_VALUE DESCRIPTION="Laughing" LANG_REF="und">L</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_df01bf24-04f4-4cff-9bc4-ca92a0ca945f">
        <CVE_VALUE DESCRIPTION="Non-canonical non-variegated syllable(s)" LANG_REF="und">A</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_8675a2cf-bb35-476c-a602-8b911eb2a845">
        <CVE_VALUE DESCRIPTION="Non-canonical variegated syllable(s)" LANG_REF="und">P</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_f1ad7cdd-4916-4914-a59a-a33d0d7052cc">
        <CVE_VALUE DESCRIPTION="Canonical variegated syllable(s)" LANG_REF="und">V</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_09a9bb98-31a9-4afd-9ed7-d4fc7af658a6">
        <CVE_VALUE DESCRIPTION="Canonical non-variegated syllable(s)" LANG_REF="und">W</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_ee07af47-c822-4fb3-80d3-d842d80272b7">
        <CVE_VALUE DESCRIPTION="Uncertain" LANG_REF="und">U</CVE_VALUE>
    </CV_ENTRY_ML>
</CONTROLLED_VOCABULARY>
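
To make the second point concrete, here is a rough sketch of how the label descriptions could be read out of an EAF file using only Python's standard library. The function name `read_controlled_vocabularies` and the abridged `EAF_SNIPPET` are made up for illustration; this is not the project's importer, just one way the extraction might look.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative EAF content (only two entries of the "vcm" vocabulary).
EAF_SNIPPET = """<ANNOTATION_DOCUMENT>
    <CONTROLLED_VOCABULARY CV_ID="vcm">
        <DESCRIPTION LANG_REF="und">Simplified subset of infant vocal maturity classes</DESCRIPTION>
        <CV_ENTRY_ML CVE_ID="cveid_e7300257-f12a-479f-90f0-c2fefbf99a26">
            <CVE_VALUE DESCRIPTION="Crying" LANG_REF="und">Y</CVE_VALUE>
        </CV_ENTRY_ML>
        <CV_ENTRY_ML CVE_ID="cveid_ae00bfde-d4bb-499e-8c63-81c4459f5b8a">
            <CVE_VALUE DESCRIPTION="Laughing" LANG_REF="und">L</CVE_VALUE>
        </CV_ENTRY_ML>
    </CONTROLLED_VOCABULARY>
</ANNOTATION_DOCUMENT>"""


def read_controlled_vocabularies(eaf_xml):
    """Map each CV_ID to a {label: description} dictionary (hypothetical helper)."""
    root = ET.fromstring(eaf_xml)
    vocabularies = {}
    for cv in root.iter("CONTROLLED_VOCABULARY"):
        entries = {}
        for value in cv.iter("CVE_VALUE"):
            # The element text is the label; DESCRIPTION holds its meaning.
            label = (value.text or "").strip()
            entries[label] = value.get("DESCRIPTION", "")
        vocabularies[cv.get("CV_ID")] = entries
    return vocabularies


vocabularies = read_controlled_vocabularies(EAF_SNIPPET)
print(vocabularies["vcm"])  # {'Y': 'Crying', 'L': 'Laughing'}
```

Keying the result by `CV_ID` keeps vocabularies apart when a file defines several of them.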

Describe the solution you'd like
Check the controlled vocabulary when importing an EAF file, and add the descriptions of the controlled vocabulary labels to the metadata.
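
The check itself could be as simple as the sketch below. The function name `find_invalid_labels` and the sample annotations are invented for the example; the allowed labels are the seven codes of the `vcm` vocabulary shown above. How the importer would report or reject offending annotations is left open.

```python
def find_invalid_labels(annotations, allowed_labels):
    """Return (index, label) pairs for annotations not in the vocabulary.

    Hypothetical helper: `annotations` is a plain list of label strings.
    """
    allowed = set(allowed_labels)
    return [
        (index, label)
        for index, label in enumerate(annotations)
        if label not in allowed
    ]


# Labels of the "vcm" controlled vocabulary.
vcm_labels = {"Y", "L", "A", "P", "V", "W", "U"}

# Made-up annotation values, two of which are not in the vocabulary.
annotations = ["Y", "W", "x", "U", "canonical"]

invalid = find_invalid_labels(annotations, vcm_labels)
if invalid:
    print(f"{len(invalid)} annotation(s) use labels outside the vocabulary: {invalid}")
```

Returning the positions along with the labels would let the importer point annotators to the exact segments that need fixing.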

@marianne-m (Contributor)

For the second part, where do you think we should store the descriptions?

@William-N-Havard (Author)

Good question! I'm not sure where it would be best to store them. I see two options:

  • either in a CSV file in the annotation folder `project/annotations/EAF/vocabulary.csv`
  • or in a CSV file in the metadata folder `/project/metadata/vocabulary_EAF.csv`

where EAF is the name of the directory containing the EAF files whose controlled vocabularies we want to store (a single EAF file can define more than one). I usually prefer to keep all the metadata in one place, so I'd personally go for the second option.
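
For what it's worth, the second option might look roughly like this. Everything here is an assumption for the sake of discussion: the helper name `write_vocabulary_csv`, the column names (`vocabulary`, `label`, `description`), and the use of a temporary directory standing in for the project root.

```python
import csv
import tempfile
from pathlib import Path


def write_vocabulary_csv(project_path, set_name, vocabularies):
    """Write all vocabularies of one set to metadata/vocabulary_<set_name>.csv.

    Hypothetical helper; `vocabularies` maps a CV_ID to {label: description}.
    """
    destination = Path(project_path) / "metadata" / f"vocabulary_{set_name}.csv"
    destination.parent.mkdir(parents=True, exist_ok=True)
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Assumed column layout: one row per label, tagged with its vocabulary.
        writer.writerow(["vocabulary", "label", "description"])
        for cv_id, entries in vocabularies.items():
            for label, description in entries.items():
                writer.writerow([cv_id, label, description])
    return destination


vocabularies = {"vcm": {"Y": "Crying", "L": "Laughing"}}

# A temporary directory stands in for the project root in this sketch.
with tempfile.TemporaryDirectory() as project:
    csv_path = write_vocabulary_csv(project, "EAF", vocabularies)
    content = csv_path.read_text(encoding="utf-8")

print(content)
```

A `vocabulary` column means one file per set is enough even when the EAF files define several controlled vocabularies.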

marianne-m linked a pull request Feb 24, 2022 that will close this issue