Cross-Linguistic Transcription Systems

This repository provides the data underlying the "cross-linguistic transcription systems" project (CLTS [siː ɛl tʰiː ɛs]), which offers transcription systems and transcription data for various sources. Please see the contribution guidelines for more information on how to contribute.

Master data

This repository contains files that are generated by running commands from the pyclts package, intended to help with curation. Thus, it is important to know where master (or authoritative) copies of certain data types live (i.e. where to edit data).

  • References: data/references.bib
  • Feature system: pkg/transcriptionsystems/features.json
  • BIPA transcription system: pkg/transcriptionsystems/bipa/
  • Index of source datasets: sources/index.tsv
  • Source datasets: sources/*/graphemes.tsv

CLDF Dataset

CLDF Metadata: cldf-metadata.json

Sources: data/references.bib

The Cross-Linguistic Transcription Systems (CLTS) project provides a catalog of speech sounds aggregated from (and linked to) phonetic notation systems from various sources.

| property | value |
| --- | --- |
| dc:conformsTo | CLDF Generic |
| dc:bibliographicCitation | Johann-Mattis List, Cormac Anderson, Tiago Tresoldi, & Robert Forkel. (2023). CLTS. Cross-Linguistic Transcription Systems. Zenodo. |

Table sources/index.tsv

CLTS is compiled from information about transcriptions and how these relate to sounds from many sources, such as phoneme inventory databases like PHOIBLE or relevant typological surveys.

| property | value |
| --- | --- |
| dc:extent | 33 |


| Name/Property | Datatype | Description |
| --- | --- | --- |
| NAME | string | Primary key |
| REFS | list of string (separated by , ) | References data/references.bib::BibTeX-key |
| TYPE | string | CLTS groups transcription information into three categories: transcription systems (ts), transcription data (td), and soundclass systems (sc). |
| URITEMPLATE | string | Several CLTS sources provide an online catalog of the graphemes they describe. If this is the case, the URI template specified in this column was used to derive the URL column in data/graphemes.tsv. |
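
As a rough illustration of how the columns above might be consumed, the following sketch groups sources by their TYPE code and splits REFS into individual BibTeX keys. The sample rows, the source names (`exampledb`, `examplesystem`) and the helper `group_sources` are invented for illustration; only the column names and the three TYPE codes come from the table above.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; real rows live in sources/index.tsv.
HEADER = ["NAME", "REFS", "TYPE", "URITEMPLATE"]
ROWS = [
    ["exampledb", "Key2020, Other2021", "td", ""],
    ["examplesystem", "Key2019", "ts", ""],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)

# The three categories named in the TYPE column description.
TYPE_LABELS = {
    "ts": "transcription system",
    "td": "transcription data",
    "sc": "soundclass system",
}


def group_sources(handle):
    """Group source names by TYPE and split REFS into BibTeX keys."""
    by_type = defaultdict(list)
    refs = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        by_type[row["TYPE"]].append(row["NAME"])
        # REFS is a comma-separated list; strip whitespace around each key.
        refs[row["NAME"]] = [k.strip() for k in row["REFS"].split(",") if k.strip()]
    return dict(by_type), refs


by_type, refs = group_sources(io.StringIO(SAMPLE))
for code, names in by_type.items():
    print(TYPE_LABELS.get(code, code), names)
```

To run this against the actual file, replace the `io.StringIO(SAMPLE)` argument with an open handle on `sources/index.tsv`.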

Table data/features.tsv

The feature system employed by CLTS describes sounds by assigning values for certain features (constrained by sound type). The permissible values per (feature, sound type) are listed in this table.

| property | value |
| --- | --- |
| dc:extent | 161 |


| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | Primary key |
| TYPE | string | CLTS distinguishes the basic sound types consonant, vowel, tone, and marker. Features are defined for consonants, vowels, and tones. |
| FEATURE | string | Note that CLTS features are not necessarily binary. |
| VALUE | string | |
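
Since this table lists the permissible values per (feature, sound type), one possible use is a validation lookup. The sketch below builds such a lookup from tab-separated rows; the sample rows, the ID strings and the helper names `permissible_values` and `is_permissible` are assumptions made for illustration, not part of CLTS or pyclts.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; real rows live in data/features.tsv.
# The ID format shown here is a placeholder, not the actual CLTS scheme.
HEADER = ["ID", "TYPE", "FEATURE", "VALUE"]
ROWS = [
    ["f1", "consonant", "phonation", "voiced"],
    ["f2", "consonant", "phonation", "voiceless"],
    ["f3", "vowel", "height", "close"],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def permissible_values(handle):
    """Map each (sound type, feature) pair to its set of allowed values."""
    allowed = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        allowed[(row["TYPE"], row["FEATURE"])].add(row["VALUE"])
    return dict(allowed)


def is_permissible(allowed, sound_type, feature, value):
    """Return True if the value is listed for this feature and sound type."""
    return value in allowed.get((sound_type, feature), set())


allowed = permissible_values(io.StringIO(SAMPLE))
print(is_permissible(allowed, "consonant", "phonation", "voiced"))
print(is_permissible(allowed, "vowel", "phonation", "voiced"))
```

Keying the lookup on the pair rather than on the feature alone reflects the point that features are constrained by sound type, and storing sets rather than booleans accommodates features that are not binary.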

Table data/graphemes.tsv

| property | value |
| --- | --- |
| dc:extent | 80639 |


| Name/Property | Datatype | Description |
| --- | --- | --- |
| PK | integer | Primary key |
| GRAPHEME | string | Grapheme used in a particular transcription to denote a sound |
| NAME | string | The ordered concatenation of feature values of the denoted sound. References data/sounds.tsv::NAME |
| BIPA | string | The grapheme for the denoted sound in the Broad IPA transcription system |
| DATASET | string | Links to the source of this grapheme. References sources/index.tsv::NAME |
| URL | anyURI | URL of the grapheme in its source online database |
| IMAGE | string | Image of the typeset grapheme. |
| SOUND | string | Audio recording of the sound being pronounced. |
| EXPLICIT | string | Indicates whether the mapping of grapheme to sound was done manually (explicitly, +) or whether it was inferred from the grapheme. |
| FEATURES | string | Features of the sound as described in the local feature system of the source dataset |
| NOTE | string | |
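
A typical use of this table is to translate the graphemes of one source dataset into BIPA. The following minimal sketch does this for a single DATASET and can optionally keep only manually mapped graphemes (EXPLICIT equal to `+`). The sample rows, the dataset names and the helper `bipa_lookup` are invented for illustration, and the empty EXPLICIT value used for inferred mappings is an assumption.

```python
import csv
import io

# Invented rows for illustration only; real rows live in data/graphemes.tsv.
HEADER = ["PK", "GRAPHEME", "NAME", "BIPA", "DATASET", "URL",
          "IMAGE", "SOUND", "EXPLICIT", "FEATURES", "NOTE"]
ROWS = [
    ["1", "P", "voiceless bilabial stop consonant", "p", "exampledb",
     "", "", "", "+", "", ""],
    # Assumption: a mapping that was inferred carries no "+" marker.
    ["2", "B", "voiced bilabial stop consonant", "b", "exampledb",
     "", "", "", "", "", ""],
    ["3", "p", "voiceless bilabial stop consonant", "p", "otherdb",
     "", "", "", "+", "", ""],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def bipa_lookup(handle, dataset, explicit_only=False):
    """Map the graphemes of one source dataset to their BIPA graphemes."""
    lookup = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["DATASET"] != dataset:
            continue
        if explicit_only and row["EXPLICIT"] != "+":
            continue
        lookup[row["GRAPHEME"]] = row["BIPA"]
    return lookup


all_mappings = bipa_lookup(io.StringIO(SAMPLE), "exampledb")
manual_mappings = bipa_lookup(io.StringIO(SAMPLE), "exampledb", explicit_only=True)
print(all_mappings)
print(manual_mappings)
```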

Table data/sounds.tsv

| property | value |
| --- | --- |
| dc:extent | 8684 |


| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | |
| NAME | string | Ordered list of features + sound type. Primary key |
| FEATURES | list of string (separated by a space) | Ordered list of feature values for the sound. References data/features.tsv::ID |
| GRAPHEME | string | CLTS chooses the BIPA grapheme as the canonical representative of the graphemes mapped to a sound. |
| UNICODE | list of string (separated by /) | Unicode character names of the codepoints in GRAPHEME |
| GENERATED | boolean | Indicates whether the sound was inferred by our algorithmic procedure (applied to all diphthongs, all cluster sounds, and all sounds that we do not label explicitly) or is explicitly defined, in which case no inference was needed. |
| TYPE | string | CLTS defines five sound types: consonant, vowel, tone, diphthong, and cluster. The latter two are always GENERATED. |
| NOTE | string | |
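
The list-valued columns of this table can be split into Python lists, and the UNICODE column can be cross-checked against the codepoints in GRAPHEME using the standard `unicodedata` module. The sketch below assumes that FEATURES is separated by whitespace and UNICODE by `/`; the sample rows, the placeholder feature IDs (`f1`, `f2`, ...), the serialization of GENERATED and the helper names are invented for illustration.

```python
import csv
import io
import unicodedata

# Invented rows for illustration only; real rows live in data/sounds.tsv.
# Feature IDs and the "false" spelling of GENERATED are placeholders.
HEADER = ["ID", "NAME", "FEATURES", "GRAPHEME", "UNICODE",
          "GENERATED", "TYPE", "NOTE"]
ROWS = [
    ["s1", "voiceless bilabial stop consonant", "f1 f2 f3", "p",
     "LATIN SMALL LETTER P", "false", "consonant", ""],
    ["s2", "aspirated voiceless alveolar stop consonant", "f4 f1 f5 f3",
     "t\u02b0", "LATIN SMALL LETTER T/MODIFIER LETTER SMALL H",
     "false", "consonant", ""],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def parse_sounds(handle):
    """Read sound rows and split the list-valued columns into lists."""
    sounds = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["FEATURES"] = row["FEATURES"].split()
        row["UNICODE"] = [name.strip() for name in row["UNICODE"].split("/")]
        sounds.append(row)
    return sounds


def unicode_names_match(sound):
    """Check that UNICODE lists the character names of GRAPHEME in order."""
    actual = [unicodedata.name(char, "") for char in sound["GRAPHEME"]]
    return actual == sound["UNICODE"]


sounds = parse_sounds(io.StringIO(SAMPLE))
for sound in sounds:
    print(sound["ID"], sound["FEATURES"], unicode_names_match(sound))
```

A check of this kind can help spot visually similar but distinct codepoints in a grapheme, which are otherwise hard to detect by eye.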