Various metadata-related fixes and improvements. #42
- If a value list variable has no values (all missing), the JSON value list metadata is now serialized as an empty list `[]` for consistency
- `NPStructurePresence` is no longer classified as a `PerLanguageSummaries` dataset
- `LID` field was sometimes serialized as string, fixed
- Missing glottocodes were sometimes serialized as explicit "NA" string, fixed
- Multiple metadata fixes:
  - Added value list descriptions for `PhonologicalFusion::FusionBinned6` and all variables that rely on it (such as `GrammaticalMarkers::MarkerFusionBinned6`)
  - Added value list descriptions for `PositionalBehavior::MarkerBehaviorBinned4` and all variables that rely on it (such as `GrammaticalMarkers::MarkerBehaviorBinned4`)
  - Value list description for `LocusOfMarking::LocusOfMarkingBinned5` was missing the value 'FloatingorClitic', fixed (this also fixes all the variables that rely on it, such as `GrammaticalMarkers::LocusOfMarkingBinned5`)
  - Fixed value list description for `GrammaticalMarkers::MarkerPositionBinned4`
  - Fixed value list description for `GrammaticalMarkers::MarkerPositionBinned5`
  - Fixed data type of `GrammaticalMarkers::MarkerExpressesMultipleCategories` to be `logical`
  - Added value list descriptions for `ClauseLinkage::IntuitiveClassification`, value "?" is now recoded as NA (missing)
  - Added value list descriptions for multiple fields in `ClauseLinkage` where they were missing. The fields are: `AnticipatoryArgumentMarking`, `CataphoraConstraints`, `CategoricalSymmetry`, `ClauseLayer`, `ClausePosition`, `Embedding`, `ExtractionConstraints`, `FinitenessSimplified`, `FocusMarkingInDependent`, `FocusMarking`, `IllocutionaryMarking`, `IllocutionaryScope`, `InterpropositionalSemanticRelation`, `ReferenceTrackingSystem`, `TenseMarking` and `TenseScope`
  - Fixed the value list description for `ClauseWordOrder::WordOrderAPLex`
  - Fixed the value list description for `SemanticClass::SemanticClassBinned`
  - Removed invalid values from `GrammaticalRelationsRaw::SelectedArguments::SemanticCondition`
  - Fixed the value list description for `Register::OriginContinent`
  - Computed variables in `GrammaticalMarkersPerLanguage` now have correct value list metadata
  - Computed variables in `LocusOfMarkingPerLanguage` now have correct value list metadata
  - Computed variables `MorphologyPerLanguage::HasAny*` are now correctly annotated as logical
  - Computed variables `NPStructurePerLanguage::NPHas*` are now correctly annotated as logical
  - `NPStructurePerLanguage::NPStructureID` is now correctly annotated as integer
  - Computed variables in `VerbInflection*` summary datasets now have correct value list metadata
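The `LID` and glottocode fixes above concern how individual values are serialized. As a rough illustration only (this is not the project's export code, and the `Glottocode` key name and the `normalize_record` helper are assumptions made for this sketch), a consumer of older exports could normalize such records like this:

```python
import json


def normalize_record(record):
    """Return a copy with LID as an integer and a missing glottocode as None.

    Hypothetical helper: the key names "LID" and "Glottocode" are assumed
    here and may not match the actual export.
    """
    out = dict(record)
    # LID was sometimes exported as a string; coerce it to an integer.
    if isinstance(out.get("LID"), str):
        out["LID"] = int(out["LID"])
    # A missing glottocode was sometimes exported as the literal string "NA".
    if out.get("Glottocode") == "NA":
        out["Glottocode"] = None
    return out


cleaned = normalize_record({"LID": "42", "Glottocode": "NA"})
print(json.dumps(cleaned))
```

With this input, `LID` becomes the integer 42 and the glottocode is emitted as JSON `null` rather than the string "NA".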
# -- end of MarkerPositionForNPRelated

MarkerPositionForClassification:
  description: >-
    GrammaticalMarkers::MarkerPosition value for Classification
  kind: computed data (aggregation-scripts/VerbInflectionPerLanguage.R)
  data: logical
  data: value-list
  values: []
This is supposed to be a dictionary, not a list.
There's a couple more like this.
Good point. Fixed in c52d68a
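A check along the following lines could catch the remaining cases mentioned in this thread. It is only a sketch: it assumes the metadata has already been loaded into nested Python dicts, that value-list variables are marked with `data: value-list`, and the function name `find_list_valued` is made up for this example.

```python
from collections.abc import Mapping


def find_list_valued(metadata, path=()):
    """Collect paths of value-list entries whose `values` is not a mapping."""
    problems = []
    if isinstance(metadata, Mapping):
        is_value_list = metadata.get("data") == "value-list"
        if is_value_list and not isinstance(metadata.get("values"), Mapping):
            problems.append("::".join(path))
        # Recurse into nested mappings (datasets, variables, sub-fields).
        for key, child in metadata.items():
            problems.extend(find_list_valued(child, path + (str(key),)))
    return problems


# Assumed, simplified metadata layout for illustration only.
example = {
    "MarkerPositionForClassification": {"data": "value-list", "values": []},
    "MarkerPositionForNPRelated": {"data": "value-list", "values": {}},
}
print(find_list_valued(example))
```

Here only `MarkerPositionForClassification` is reported, because its empty `values` is a list rather than a dictionary.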
Ahom has a glottocode: https://glottolog.org/resource/languoid/id/ahom1240
Updated the glottocode for Ahom and Eastern Armenian in the database, will be reflected on the next data export. The only language with a missing glottocode is now Serbian Torlak, which I could not locate in Glottolog (there is a note mentioning this variety under standard Serbian though).
Thanks, I forwarded this to the team, will update the glottocode once a decision is made.
@xrotwang All the issues you have found so far should be fixed now. If you don't see any other low-hanging issues, I would like to publish a bugfix release based on the contents of this branch.
It would be nice if the issue(s) with the bibliography could be addressed, too. That would save me a couple of lines cleaning it up.
Yes, of course! Adopted your version of the bibliography in the latest commit.
So the bibliography isn't pulled out of AUTOTYP but only exists as this BibTeX file anyway?
It is pulled from the database, but that particular part of the database is currently on life support and is not actively updated. This is another part that will require an overhaul, as some sources are "hidden" in the comments or special fields of various data files and are not currently exported. It is a lot of legacy to deal with. For now your curated file works great, and as we continue modernising the database structure we will put a new bibliography-generating mechanism in place. Making use of the excellent Glottolog reference database sounds like an obvious choice here.