Cross-Linguistic Transcription Systems

This repository provides the data underlying the "cross-linguistic transcription systems" project (CLTS [siː ɛl tʰiː ɛs]), which offers transcription systems and transcription data for various sources. Please see CONTRIBUTING.md for more information on how to contribute.

Master data

This repository contains files that are generated by running commands from the pyclts package, intended to help with curation. Thus, it is important to know where master (or authoritative) copies of certain data types live (i.e. where to edit data).

  • References: data/references.bib
  • Feature system: pkg/transcriptionsystems/features.json
  • BIPA transcription system: pkg/transcriptionsystems/bipa/
  • Index of source datasets: sources/index.tsv
  • Source datasets: sources/*/graphemes.tsv

CLDF Dataset

CLDF Metadata: cldf-metadata.json

Sources: data/references.bib

The Cross-Linguistic Transcription Systems (CLTS) project provides a catalog of speech sounds aggregated from (and linked to) phonetic notation systems from various sources.

| property | value |
| --- | --- |
| dc:conformsTo | CLDF Generic |
| dc:identifier | https://doi.org/10.5281/zenodo.3515744 |
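
As a rough illustration of how the metadata file ties the tables together, the sketch below reads a small, made-up excerpt shaped like CSVW-style metadata and summarizes each table. The excerpt is an assumption, not a copy of cldf-metadata.json: the pairing of row counts with table URLs is inferred from the table descriptions in this README, and the helper name `table_summary` is hypothetical.

```python
import json

# Hypothetical excerpt shaped like CSVW/CLDF metadata; the real file has more keys.
METADATA = json.loads("""
{
  "tables": [
    {"url": "sources/index.tsv", "dc:extent": 33,
     "tableSchema": {"columns": [{"name": "NAME", "datatype": "string"}],
                     "primaryKey": ["NAME"]}},
    {"url": "data/features.tsv", "dc:extent": 163,
     "tableSchema": {"columns": [{"name": "ID", "datatype": "string"}],
                     "primaryKey": ["ID"]}}
  ]
}
""")


def table_summary(metadata):
    """Return {table url: (row count, primary key columns)}."""
    summary = {}
    for table in metadata["tables"]:
        schema = table.get("tableSchema", {})
        key = schema.get("primaryKey", [])
        # CSVW allows primaryKey to be a single string or a list of strings.
        if isinstance(key, str):
            key = [key]
        summary[table["url"]] = (table.get("dc:extent"), key)
    return summary


summary = table_summary(METADATA)
```

To inspect the real dataset, the same function could be applied to the parsed contents of cldf-metadata.json, assuming it follows this layout.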

CLTS is compiled from many sources, such as phoneme inventory databases like PHOIBLE or relevant typological surveys, which provide information about transcriptions and how these relate to sounds.

| property | value |
| --- | --- |
| dc:extent | 33 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| NAME | string | Primary key |
| DESCRIPTION | string | |
| REFS | list of string (separated by ", ") | References data/references.bib::BibTeX-key |
| TYPE | string | Valid choices: td, ts, sc. CLTS groups transcription information into three categories: transcription systems (ts), transcription data (td), and soundclass systems (sc). |
| URITEMPLATE | string | Several CLTS sources provide an online catalog of the graphemes they describe. If this is the case, the URI template specified in this column was used to derive the URL column in graphemes.csv. |
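
The column description above can be turned into a small reader. This is a minimal sketch, assuming the index is a tab-separated file with a header row; the sample rows, the BibTeX keys, and the helper name `read_source_index` are made up for illustration.

```python
import csv
import io

VALID_TYPES = {"td", "ts", "sc"}


def read_source_index(handle):
    """Yield one dict per row of a tab-separated source index."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["TYPE"] not in VALID_TYPES:
            raise ValueError(f"unexpected TYPE {row['TYPE']!r} for {row['NAME']}")
        # REFS holds BibTeX keys separated by ", "; an empty cell means no references.
        row["REFS"] = row["REFS"].split(", ") if row["REFS"] else []
        yield row


# Made-up sample rows, for illustration only (URITEMPLATE left empty on purpose).
SAMPLE = (
    "NAME\tDESCRIPTION\tREFS\tTYPE\tURITEMPLATE\n"
    "exampledb\tA hypothetical inventory database\tKey2019, Other2020\ttd\t\n"
    "examplesc\tA hypothetical sound class system\t\tsc\t\n"
)

sources = list(read_source_index(io.StringIO(SAMPLE)))
```

For the real file, `open("sources/index.tsv", encoding="utf-8", newline="")` could be passed in place of the `io.StringIO` object.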

The feature system employed by CLTS describes sounds by assigning values for certain features (constrained by sound type). The permissible values per (feature, sound type) are listed in this table.

| property | value |
| --- | --- |
| dc:extent | 163 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | Primary key |
| TYPE | string | Valid choices: consonant, vowel, tone. CLTS distinguishes the basic sound types consonant, vowel, tone, and marker. Features are defined for consonants, vowels, and tones. |
| FEATURE | string | Note that CLTS features are not necessarily binary. |
| VALUE | string | |
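
Since this table lists the permissible values per (feature, sound type), one way to use it is as a lookup for validation. The following is a sketch under stated assumptions: the rows, IDs, and feature values are invented for illustration, and the function names are hypothetical. Note that the vowel feature in the sample has three values, reflecting that features need not be binary.

```python
from collections import defaultdict

# Illustrative rows (ID, TYPE, FEATURE, VALUE); the IDs and values are made up.
ROWS = [
    ("c-phonation-voiced", "consonant", "phonation", "voiced"),
    ("c-phonation-voiceless", "consonant", "phonation", "voiceless"),
    ("v-height-close", "vowel", "height", "close"),
    ("v-height-mid", "vowel", "height", "mid"),
    ("v-height-open", "vowel", "height", "open"),
]


def permissible_values(rows):
    """Group values by (sound type, feature)."""
    allowed = defaultdict(set)
    for _id, sound_type, feature, value in rows:
        allowed[(sound_type, feature)].add(value)
    return dict(allowed)


def is_permissible(allowed, sound_type, feature, value):
    """Check whether a value is listed for the given sound type and feature."""
    return value in allowed.get((sound_type, feature), set())


ALLOWED = permissible_values(ROWS)
```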

| property | value |
| --- | --- |
| dc:extent | 81895 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| PK | integer | Primary key |
| GRAPHEME | string | Grapheme used in a particular transcription to denote a sound |
| NAME | string | The ordered concatenation of feature values of the denoted sound. References data/sounds.tsv::NAME |
| BIPA | string | The grapheme for the denoted sound in the Broad IPA transcription system |
| DATASET | string | Links to the source of this grapheme. References sources/index.tsv::NAME |
| FREQUENCY | integer | |
| URL | anyURI | URL of the grapheme in its source online database |
| IMAGE | string | Image of the typeset grapheme. |
| SOUND | string | Audio recording of the sound being pronounced. |
| EXPLICIT | string | Indicates whether the mapping of grapheme to sound was done manually (explicitly, +) or whether it was inferred from the Grapheme. |
| FEATURES | string | Features of the sound as described in the local feature system of the source dataset |
| NOTE | string | |
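
Because NAME links each grapheme to a sound and DATASET links it to a source, the table can be used to collect the graphemes that different sources use for the same sound. The sketch below assumes a tab-separated file with a header row and treats any EXPLICIT value other than `+` as an inferred mapping; the sample rows, dataset names, and the helper name `graphemes_by_sound` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows with a subset of the columns, for illustration only.
SAMPLE = (
    "PK\tGRAPHEME\tNAME\tBIPA\tDATASET\tEXPLICIT\n"
    "1\tp\tvoiceless bilabial stop consonant\tp\texampledb\t+\n"
    "2\tP\tvoiceless bilabial stop consonant\tp\totherdb\t\n"
    "3\tb\tvoiced bilabial stop consonant\tb\texampledb\t+\n"
)


def graphemes_by_sound(handle, explicit_only=False):
    """Map each sound NAME to the (DATASET, GRAPHEME) pairs that denote it."""
    by_sound = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: "+" marks a manual mapping, anything else an inferred one.
        if explicit_only and row["EXPLICIT"] != "+":
            continue
        by_sound[row["NAME"]].append((row["DATASET"], row["GRAPHEME"]))
    return dict(by_sound)


all_mappings = graphemes_by_sound(io.StringIO(SAMPLE))
explicit = graphemes_by_sound(io.StringIO(SAMPLE), explicit_only=True)
```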

| property | value |
| --- | --- |
| dc:extent | 8765 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | |
| NAME | string | Ordered list of features + sound type. Primary key |
| FEATURES | list of string (separated by a space) | Ordered list of feature values for the sound. References data/features.tsv::ID |
| GRAPHEME | string | CLTS chooses the BIPA grapheme as the canonical representative of the graphemes mapped to a sound. |
| UNICODE | list of string (separated by /) | Unicode character names of the codepoints in GRAPHEME |
| GENERATED | boolean | Indicates whether the sound was inferred by our algorithmic procedure (which is active for all diphthongs, all cluster sounds, but also all sounds which we do not label explicitly) or whether no inference was needed, since the sound is explicitly defined. |
| TYPE | string | Valid choices: consonant, vowel, diphthong, tone, cluster. CLTS defines five sound types: consonant, vowel, tone, diphthong, and cluster. The latter two are always GENERATED. |
| NOTE | string | |
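
Two of the column descriptions above lend themselves to simple consistency checks. The sketch below derives a UNICODE-style value from a grapheme with Python's standard `unicodedata` module and flags diphthongs or clusters that are not marked as GENERATED. It assumes rows have already been parsed into dicts with a boolean GENERATED field; the sample sounds and both function names are hypothetical.

```python
import unicodedata


def unicode_names(grapheme):
    """Join the Unicode character names of a grapheme's codepoints with "/"."""
    return "/".join(unicodedata.name(ch) for ch in grapheme)


def ungenerated_composites(sounds):
    """Return names of diphthongs or clusters that are not marked GENERATED."""
    return [
        s["NAME"]
        for s in sounds
        if s["TYPE"] in {"diphthong", "cluster"} and not s["GENERATED"]
    ]


# Made-up, already-parsed rows; the real table has more columns.
SOUNDS = [
    {"NAME": "voiceless bilabial stop consonant", "GRAPHEME": "p",
     "TYPE": "consonant", "GENERATED": False},
    {"NAME": "hypothetical diphthong", "GRAPHEME": "ai",
     "TYPE": "diphthong", "GENERATED": True},
]

names = [unicode_names(s["GRAPHEME"]) for s in SOUNDS]
violations = ungenerated_composites(SOUNDS)
```

An empty `violations` list is what the TYPE description leads one to expect for the full table.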