Various metadata-related fixes and improvements. #42

Closed
wants to merge 6 commits into from

tzakharko
Contributor

  • If a value list variable has no values (all missing), the json value list metadata
    is now serialized as an empty list [] for consistency
  • NPStructurePresence is no longer classified as a PerLanguageSummaries dataset
  • LID field was sometimes serialized as string, fixed
  • Missing glottocodes were sometimes serialized as explicit "NA" string, fixed
  • Multiple metadata fixes:
    • Added value list descriptions for PhonologicalFusion::FusionBinned6 and all variables that
      rely on it (such as GrammaticalMarkers::MarkerFusionBinned6)
    • Added value list descriptions for PositionalBehavior::MarkerBehaviorBinned4 and all variables
      that rely on it (such as GrammaticalMarkers::MarkerBehaviorBinned4)
    • Value list description for LocusOfMarking::LocusOfMarkingBinned5 was missing the value
      'FloatingorClitic', fixed (this also fixes all the variables that rely on it, such as
      GrammaticalMarkers::LocusOfMarkingBinned5)
    • Fixed value list description for GrammaticalMarkers::MarkerPositionBinned4
    • Fixed value list description for GrammaticalMarkers::MarkerPositionBinned5
    • Fixed data type of GrammaticalMarkers::MarkerExpressesMultipleCategories to be logical
    • Added value list descriptions for ClauseLinkage::IntuitiveClassification, value "?" is now
      recoded as NA (missing)
    • Added value list descriptions for multiple fields in ClauseLinkage where they were missing.
      The fields are: AnticipatoryArgumentMarking, CataphoraConstraints, CategoricalSymmetry,
      ClauseLayer, ClausePosition, Embedding, ExtractionConstraints, FinitenessSimplified,
      FocusMarkingInDependent, FocusMarking, IllocutionaryMarking, IllocutionaryScope,
      InterpropositionalSemanticRelation, ReferenceTrackingSystem, TenseMarking and
      TenseScope
    • Fixed the value list description for ClauseWordOrder::WordOrderAPLex
    • Fixed the value list description for SemanticClass::SemanticClassBinned
    • Removed invalid values from GrammaticalRelationsRaw::SelectedArguments::SemanticCondition
    • Fixed the value list description for Register::OriginContinent
    • Computed variables in GrammaticalMarkersPerLanguage now have correct value list metadata
    • Computed variables in LocusOfMarkingPerLanguage now have correct value list metadata
    • Computed variables MorphologyPerLanguage::HasAny* are now correctly annotated as logical
    • Computed variables NPStructurePerLanguage::NPHas* are now correctly annotated as logical
    • NPStructurePerLanguage::NPStructureID is now correctly annotated as integer
    • Computed variables in VerbInflection* summary datasets now have correct value list metadata
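
A minimal sketch of the two serialization fixes for `LID` and missing glottocodes, under stated assumptions: the record layout, the field name `Glottocode`, and the helper `normalize_record` are hypothetical and only illustrate the intended behavior; this is not the actual export code.

```python
import json


def normalize_record(record):
    """Return a copy of `record` with LID as an integer and a missing
    glottocode as None (hypothetical helper, not the real export code)."""
    out = dict(record)
    # LID was sometimes exported as a string such as "42"; coerce it to int.
    if out.get("LID") is not None:
        out["LID"] = int(out["LID"])
    # A missing glottocode was sometimes exported as the literal string "NA";
    # map it to None so that it serializes as JSON null.
    if out.get("Glottocode") == "NA":
        out["Glottocode"] = None
    return out


fixed = normalize_record({"LID": "42", "Glottocode": "NA"})
print(json.dumps(fixed))
```

With this approach a consumer can test for a missing glottocode with a plain null check instead of comparing against a sentinel string.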

# -- end of MarkerPositionForNPRelated

MarkerPositionForClassification:
description: >-
GrammaticalMarkers::MarkerPosition value for Classification
kind: computed data (aggregation-scripts/VerbInflectionPerLanguage.R)
data: logical
data: value-list
values: []

This is supposed to be a dictionary, not a list.

There's a couple more like this.

Contributor Author

Good point. Fixed in c52d68a
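
Since several entries had the same problem, a small check could locate every `values` field that is a list rather than a dictionary. The following is only a sketch: it assumes the metadata YAML has already been parsed into nested dicts and lists, the function name `find_list_valued` is made up for illustration, and `SomeOtherVariable` in the sample is hypothetical.

```python
def find_list_valued(metadata, path=()):
    """Yield the key path of every `values` entry that is a list, not a dict."""
    if isinstance(metadata, dict):
        for key, child in metadata.items():
            if key == "values" and isinstance(child, list):
                yield path + (key,)
            else:
                yield from find_list_valued(child, path + (key,))
    elif isinstance(metadata, list):
        for index, child in enumerate(metadata):
            yield from find_list_valued(child, path + (index,))


# Sample standing in for parsed metadata; the second entry is hypothetical.
sample = {
    "MarkerPositionForClassification": {
        "data": "value-list",
        "values": [],  # a list: should be flagged
    },
    "SomeOtherVariable": {
        "data": "value-list",
        "values": {"yes": "marker present"},  # a dictionary: fine
    },
}

offenders = list(find_list_valued(sample))
print(offenders)
```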

@xrotwang xrotwang left a comment

@tzakharko
Contributor Author

Updated the glottocodes for Ahom and Eastern Armenian in the database; the change will be reflected in the next data export. The only language with a missing glottocode is now Serbian Torlak, which I could not locate in Glottolog (although there is a note mentioning this variety under standard Serbian).

@xrotwang

@tzakharko
Contributor Author

> @tzakharko see glottolog/glottolog#815 (comment)

Thanks, I forwarded this to the team, will update the glottocode once a decision is made.

@tzakharko
Contributor Author

@xrotwang All the issues you have found so far should be fixed now. If you don't see any other low-hanging issues, I would like to publish a bugfix release based on the contents of this branch.

@xrotwang

It would be nice, if the issue(s) with the bibliography could be addressed, too. That would save me a couple of lines cleaning it up.
Otherwise, yes, all issues addressed.

@tzakharko
Contributor Author

> It would be nice, if the issue(s) with the bibliography could be addressed, too. That would save me a couple of lines cleaning it up.

Yes, of course! Adopted your version of the bibliography in the latest commit.

@xrotwang

So the bibliography isn't pulled out of AUTOTYP but only exists as this BibTeX file anyway?

@tzakharko
Contributor Author

It is pulled from the database, but that particular part of the database is currently on life support and is not actively updated. This is another part that will require an overhaul, as some sources are "hidden" in the comments or special fields of various data files and are not currently exported. It is a lot of legacy to deal with.

For now your curated file works great, and as we continue modernising the database structure we will put a new bibliography-generating mechanism in place. Making use of the excellent Glottolog reference database sounds like an obvious choice here.

@tzakharko tzakharko closed this Feb 24, 2022
@tzakharko tzakharko deleted the fixes-1.0.1 branch February 24, 2022 09:39