Mixtecan Cognate Database (MixteCoDB)

This database contains lexical entries from Mixtecan languages, which are cognate-coded and standardized to IPA. It is a work in progress and is continuously updated. For use in your own research and for citation, please refer to the most recent release archived on Zenodo. The database is available under the Creative Commons Attribution-ShareAlike 4.0 International license. Questions, comments, corrections, and the like are most welcome! Please open an issue for them.

MixteCoDB 1.0

The initial creation of the database, which corresponds to its first release (1.0), is explained in:

  • Auderset, Sandra, Simon J. Greenhill, Christian T. DiCanio, Eric W. Campbell. 2023. Subgrouping in a 'dialect continuum': A Bayesian phylogenetic analysis of the Mixtecan language family. Journal of Language Evolution. 8(1). https://doi.org/10.1093/jole/lzad004
  • Auderset, Sandra, Simon J. Greenhill, Christian T. DiCanio, Eric W. Campbell. 2023. Supplementary Materials to "Subgrouping in a 'dialect continuum': A Bayesian phylogenetic analysis of the Mixtecan language family." Available on Zenodo at DOI

MixteCoDB 1.1 - October 2023

Files and content

coding details

file explaining the conversion from orthography to IPA and other details regarding the standardization

metadata

Metadata of the language sample:

  • DOCULECT = unique identifier for each Mixtec variety (containing only ASCII characters)
  • VillageName = name of the village where the variety is spoken
  • Abbreviation / MapAbbr = abbreviations of the varieties used for more legible plotting
  • AudersetGroup / AudersetGroupSub / AudersetGroupLow = (sub-)classification according to Auderset et al. 2023
  • JosserandArea / JosserandAreaSub = dialect area classification according to Josserand 1983
  • Latitude / Longitude = coordinates of the village
  • Glottocode (if applicable)
  • ISO639P3code (if applicable)
  • JosserandCode (if applicable) = code used in Josserand 1983
  • SOURCE = cite key of the source(s) of the data
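
Because DOCULECT appears both here and in the primary data, it can serve as a join key between the two tables. The following is a minimal sketch of that lookup, not part of the repository: it assumes tab-separated text, and the inline excerpts, variety names, and group labels are placeholders invented for illustration. Only the column names come from the descriptions in this README.

```python
import csv
import io

# Hypothetical excerpts: the delimiter and every value are placeholders,
# only the column names are taken from this README.
METADATA_TSV = (
    "DOCULECT\tVillageName\tAudersetGroup\tLatitude\tLongitude\n"
    "VarietyA\tVillage A\tGroup1\t17.0\t-97.5\n"
    "VarietyB\tVillage B\tGroup2\t17.5\t-98.0\n"
)
ENTRIES_TSV = (
    "ID\tCONCEPT\tDOCULECT\n"
    "1\tconceptX\tVarietyA\n"
    "2\tconceptX\tVarietyB\n"
    "3\tconceptY\tVarietyA\n"
)


def read_rows(text, delimiter="\t"):
    """Parse delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def index_by_doculect(metadata_rows):
    """Map each DOCULECT identifier to its metadata row."""
    return {row["DOCULECT"]: row for row in metadata_rows}


def attach_group(entries, metadata_index):
    """Add AudersetGroup to each entry; None if the DOCULECT is not in the metadata."""
    enriched = []
    for entry in entries:
        meta = metadata_index.get(entry["DOCULECT"])
        group = meta["AudersetGroup"] if meta else None
        enriched.append({**entry, "AudersetGroup": group})
    return enriched


metadata_index = index_by_doculect(read_rows(METADATA_TSV))
entries = attach_group(read_rows(ENTRIES_TSV), metadata_index)
```

To run this against the real files, replace the inline strings with the file contents and adjust the delimiter to match the actual format.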

mixtecan_cognate_database

contains all the primary language data coded for cognacy:

  • ID = a unique, arbitrary identifier for each entry (digits)
  • CONCEPT = concept (standardized meaning) of the entry
  • GLOSS = gloss(es)/meaning(s) in Spanish
  • COGIDS = unique identifier for cognate sets; each morpheme is assigned its own identifier, separated by a space (digits)
  • DOCULECT = identifier for variety of the entry (see also: metadata)
  • TOKENS = entry in standardized IPA representation, tokenized
  • SOURCE_ORIGINAL = entry exactly as provided in the source
  • SOURCE_ORTHOGRAPHIC = standardized entry as provided in source (stripped of special characters like brackets, etc.)
  • TOKENS_SOURCE = entry in IPA representation as converted from SOURCE_ORTHOGRAPHIC without standardization
  • COMMENT = comments and notes
  • SOURCE = cite key of source document for the entry
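
Since COGIDS holds one identifier per morpheme, separated by spaces, a multi-morpheme entry belongs to several cognate sets at once. The sketch below shows one way to group entries by cognate set under that reading. It assumes tab-separated text, and the rows are invented placeholders rather than data from the database.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: the delimiter and every value are placeholders.
ENTRIES_TSV = (
    "ID\tCONCEPT\tDOCULECT\tCOGIDS\n"
    "1\tconceptX\tVarietyA\t10 11\n"
    "2\tconceptX\tVarietyB\t10\n"
    "3\tconceptY\tVarietyA\t11 12\n"
)


def group_by_cognate_set(rows):
    """Map each cognate-set ID to the (DOCULECT, entry ID) pairs that contain it."""
    sets = defaultdict(list)
    for row in rows:
        # One identifier per morpheme, separated by spaces.
        for cogid in row["COGIDS"].split():
            sets[int(cogid)].append((row["DOCULECT"], row["ID"]))
    return dict(sets)


rows = list(csv.DictReader(io.StringIO(ENTRIES_TSV), delimiter="\t"))
cognate_sets = group_by_cognate_set(rows)
```

Here entry 1 is listed under both set 10 and set 11, because its COGIDS cell names two morphemes.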

protoforms

contains the reconstructed proto-Mixtec forms (where reconstruction was possible), as well as information about earlier reconstructions:

  • COGIDS_BROAD = unique identifier for cognate sets; each morpheme is assigned its own identifier, separated by a space (digits)
  • CONCEPT = concept (standardized meaning) of the entry
  • GLOSS = gloss(es)/meaning(s) in Spanish (for easier comparison with source material)
  • PMX = reconstructed proto-Mixtec form
  • COMMENT = further explanations
  • PMX_Josserand1983 = reconstruction as given by Josserand 1983 (if available) - cite key:josserand1983mixtec
  • JosserandID = number assigned to reconstructed form in Josserand 1983
  • PMX_Durr1987 = reconstruction as given by Dürr 1987 (if available) - cite key:durr1987preliminary
  • DurrID = number assigned to reconstructed form in Dürr 1987
  • PMX_SwantonMendoza2021 = reconstruction as given by Swanton & Mendoza Ruíz 2021 (if available) - cite key:swanton2021observaciones
  • SwantonMendoza_ID = number assigned to reconstructed form in Swanton & Mendoza Ruíz 2021
  • PMX_Swanton2021 = reconstruction as given by Swanton 2021 (if available) - cite key:swanton2021un-acercamiento
  • SwantonID = number assigned to reconstructed form in Swanton 2021
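
The earlier reconstructions are given only "if available", so a typical first step is to find which sources cover a given row. The sketch below collects the non-empty PMX_* columns per row. It assumes tab-separated text and that an unavailable reconstruction is an empty cell; both are assumptions, and the forms shown are placeholders, not actual reconstructions.

```python
import csv
import io

# Hypothetical excerpt: the delimiter, the empty-cell convention, and all
# values are placeholders; only the column names come from this README.
PROTOFORMS_TSV = (
    "COGIDS_BROAD\tCONCEPT\tPMX\tPMX_Josserand1983\tPMX_Durr1987\t"
    "PMX_SwantonMendoza2021\tPMX_Swanton2021\n"
    "10\tconceptX\t*form1\t*form1a\t\t\t*form1b\n"
    "12\tconceptY\t*form2\t\t\t\t\n"
)

EARLIER_COLUMNS = [
    "PMX_Josserand1983",
    "PMX_Durr1987",
    "PMX_SwantonMendoza2021",
    "PMX_Swanton2021",
]


def earlier_reconstructions(row):
    """Return the earlier reconstructions that are filled in for this row."""
    return {
        col: row[col]
        for col in EARLIER_COLUMNS
        if (row.get(col) or "").strip()
    }


rows = list(csv.DictReader(io.StringIO(PROTOFORMS_TSV), delimiter="\t"))
comparison = {row["COGIDS_BROAD"]: earlier_reconstructions(row) for row in rows}
```

If the real file marks missing values differently (for example with a dash or "NA"), the emptiness check in `earlier_reconstructions` needs to be adapted.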

sources.bib

bibtex file with all the sources of the data

wordlist_mesoamerica

a list of 209 basic vocabulary concepts tailored to the Mesoamerican cultural area, which was used to collect the lexical entries

CC BY-SA 4.0