Forms

Forms, i.e. written denotations of the linguistic sign (cf. GOLD's FormUnit), are stored in a FormTable in CLDF datasets (typically Wordlists).

Each form is stored as a separate row in this table. Some analyses, e.g. alignments, require segmented lexical forms. If these can be supplied, they should be added in a Segments column, by default as space-separated strings.
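
As a minimal sketch of what the default serialization implies, the following Python snippet splits a Segments value into a list and joins it back. The helper names and the sample value are illustrative assumptions, not part of CLDF.

```python
def parse_segments(value):
    """Split a whitespace-separated Segments cell into a list of segments."""
    return value.split()


def format_segments(segments):
    """Serialize a list of segments back to the space-separated default."""
    return " ".join(segments)


# Hypothetical segmented form: multi-character segments stay intact.
segments = parse_segments("tʃ aː t")
print(segments)
print(format_segments(segments))
```

Keeping segments as a list makes it straightforward to compare forms position by position, which is what alignment analyses need.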

The CLDF Ontology provides some more properties which may be supplied in corresponding columns of the FormTable:

A value column may be used to supply the raw value as found in the source, if it differs from Form. This is particularly useful for "retro-digitized" datasets, where the CLDF dataset is already the result of data clean-up.

As with any CLDF component,

  • comments and references to sources can be added via comment and source columns respectively,
  • additional data can be supplied in additional columns.

Example

Many examples of FormTable can be found in the datasets of the lexibank community.

The one for the Intercontinental Dictionary Series is described here: https://github.com/intercontinental-dictionary-series/ids/blob/v4.3/cldf/cldf-metadata.json#L59-L171

Datasets created using the lexibank workflow (implemented in the pylexibank package) derive the segmentation of a form using orthography profiles (see Moran and Cysouw 2018). The name of the profile used for a particular form is kept in the custom (non-CLDF) profile column.
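
To illustrate the idea of profile-based segmentation, here is a simplified greedy longest-match tokenizer in Python. It is not pylexibank's implementation. The `segment` function, the toy profile and the placeholder for uncovered characters are all assumptions made for this sketch.

```python
def segment(form, profile):
    """Greedily replace the longest matching grapheme at each position."""
    # Try longer graphemes first so that "aa" wins over "a".
    graphemes = sorted(profile, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(form):
        for grapheme in graphemes:
            if form.startswith(grapheme, i):
                segments.append(profile[grapheme])
                i += len(grapheme)
                break
        else:
            # Character not covered by the profile: emit a placeholder
            # (the choice of U+FFFD is specific to this sketch).
            segments.append("\ufffd")
            i += 1
    return segments


# Hypothetical profile mapping graphemes of an orthography to segments.
profile = {"ch": "tʃ", "aa": "aː", "a": "a", "t": "t"}
print(" ".join(segment("chaat", profile)))
```

The space-joined result is in the shape expected for a Segments column.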

FormTable: forms.csv

| Name/Property | Datatype | Cardinality | Description |
| --- | --- | --- | --- |
| ID | string | singlevalued | A unique identifier for a row in a table. To allow usage of identifiers as path components of URLs, IDs must only contain alphanumeric characters, underscore and hyphen. |
| Language_ID | string | singlevalued | A reference to a language (or variety) the form belongs to. References LanguageTable. |
| Parameter_ID | string | unspecified | A reference to the meaning denoted by the form. References ParameterTable. |
| Form | string | singlevalued | The written expression of the form. If possible, the transcription system used for the written form should be described in CLDF metadata (e.g. by adding a common property dc:conformsTo to the column description, using concept URLs of the GOLD Ontology (such as phonemicRep or phoneticRep) as values). |
| Segments | list of string (separated by `" "`) | multivalued | A list of segments (aka a sound sequence) is understood as the strict segmental representation of a form unit of a language, which is usually given in phonetic transcription. Suprasegmental elements of sound sequences, like tone or accent, are usually represented in a sequential form, although they are usually co-articulated along with the segmental elements of a sound sequence. Alternatively, suprasegmental aspects could also be represented as part of the prosodic structure of a word form. |
| Comment | string | unspecified | A human-readable comment on a resource, providing additional context. |
| Source | list of string (separated by ;) | multivalued | List of source specifications, of the form `<source_ID>[<source context>]`, e.g. http://glottolog.org/resource/reference/id/318814[34], or meier2015[3-12] where meier2015 is a citation key in the accompanying BibTeX file. |
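
The constraints on ID and Source can be checked with a few lines of Python. This is a sketch under stated assumptions: "alphanumeric" is read as ASCII letters and digits, the Source cell is split naively on ";", and the sample row, the helper names and both regular expressions are made up for illustration.

```python
import csv
import io
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
ID_PATTERN = re.compile(r"[A-Za-z0-9_\-]+")
# A source ID, optionally followed by a context in square brackets.
SOURCE_PATTERN = re.compile(r"(?P<id>[^\[\]]+)(\[(?P<context>[^\[\]]*)\])?")


def is_valid_id(value):
    """Return True if the ID is usable as a URL path component."""
    return ID_PATTERN.fullmatch(value) is not None


def parse_source(cell):
    """Split a Source cell into (source_ID, context) pairs."""
    refs = []
    for spec in cell.split(";"):
        spec = spec.strip()
        if not spec:
            continue
        match = SOURCE_PATTERN.fullmatch(spec)
        if match is None:
            raise ValueError(f"malformed source specification: {spec!r}")
        refs.append((match.group("id"), match.group("context")))
    return refs


# Hypothetical forms.csv content with a single row.
data = (
    "ID,Language_ID,Parameter_ID,Form,Segments,Comment,Source\n"
    "lang1-hand-1,lang1,hand,chaat,tʃ aː t,,"
    "meier2015[3-12];http://glottolog.org/resource/reference/id/318814[34]\n"
)

for row in csv.DictReader(io.StringIO(data)):
    if not is_valid_id(row["ID"]):
        print("invalid ID:", row["ID"])
    print(row["ID"], row["Segments"].split(" "), parse_source(row["Source"]))
```

A source without a context, such as a bare citation key, yields `None` as the second element of the pair.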