Data Structure

BrenBarn edited this page Feb 21, 2012 · 8 revisions

A tentative stab at what the structure of a text might look like:

  • Text
    • Metadata (Title, Language, TransliterationScheme?, Speakers?, MediaLink?...)
    • Utterances
      • Utterance (Collection of Words)
        • Word
          • Form
          • Analysis? (collection of Morphemes)
            • Morpheme
            • Glosses, Gramcat?
      • FreeTranslation
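As a rough illustration, the nested outline above could be written out as plain JavaScript objects, using the same object-literal notation as the recursive example further down. This is only a sketch. The camelCase field names and the `makeWord`/`makeMorpheme` helpers are hypothetical, and entries marked `?` are treated as optional and simply omitted. Placing `freeTranslation` on each utterance and `glosses`/`gramcat` on each morpheme is one possible reading of the indentation.

```javascript
// Sketch only: field names are hypothetical renderings of the outline entries.
// Entries marked "?" in the outline are optional and left off when absent.
function makeMorpheme(form, glosses, gramcat) {
  const morpheme = { form, glosses };
  if (gramcat !== undefined) morpheme.gramcat = gramcat; // Gramcat? is optional
  return morpheme;
}

function makeWord(form, morphemes) {
  const word = { form };
  if (morphemes) word.analysis = { morphemes }; // Analysis? is optional
  return word;
}

const text = {
  // Hypothetical metadata; Speakers?, MediaLink?, etc. omitted.
  metadata: { title: "Example text", language: "Spanish" },
  utterances: [
    {
      words: [
        makeWord("yo"),
        makeWord("quiero", [makeMorpheme("quier", ["want"]), makeMorpheme("o", ["1SG.PRES"])]),
        makeWord("tacos"),
      ],
      // Assumption: each utterance carries its own free translation.
      freeTranslation: "I want tacos",
    },
  ],
};

// Example query: the glosses of every analyzed morpheme, in text order.
function allGlosses(text) {
  const out = [];
  for (const utterance of text.utterances) {
    for (const word of utterance.words) {
      if (!word.analysis) continue; // unanalyzed words contribute nothing
      for (const m of word.analysis.morphemes) out.push(...m.glosses);
    }
  }
  return out;
}

console.log(allGlosses(text)); // logs the array ["want", "1SG.PRES"]
```

Each level in this layout has its own shape, so a query such as `allGlosses` needs a separate loop per level.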

A newer idea, based on a simple recursive structure:

  • LangUnit (or whatever we want to call this)
    • unitType --- could be "utterance", "word", "morpheme", etc.
    • targetLang --- data in the target language
    • metaLang --- gloss/translation in the metalanguage
    • optional additional metadata or tiers of info ("speaker" for utterances, "word class" for words, etc.)
    • parsed (a list of LangUnits representing subunits of this unit)
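A minimal sketch of the recursive `LangUnit` shape, assuming plain objects. The `makeUnit` and `countByType` helpers, the `speaker` value, and the `wordClass` key are all hypothetical. Optional tiers are passed in an `extra` object and copied onto the unit, and `parsed` is omitted for leaf units, as in the example that follows.

```javascript
// Hypothetical helper: builds one LangUnit. `extra` holds optional tiers such as
// { speaker: "A" } for utterances or { wordClass: "verb" } for words.
function makeUnit(unitType, targetLang, metaLang, parsed = [], extra = {}) {
  // Core fields are listed after `extra` so an extra tier cannot overwrite them.
  const unit = { ...extra, unitType, targetLang, metaLang };
  if (parsed.length > 0) unit.parsed = parsed; // leaf units have no `parsed`
  return unit;
}

// Every level has the same shape, so one recursive function walks any depth:
// here it counts how many units of each type the tree contains.
function countByType(unit, counts = {}) {
  counts[unit.unitType] = (counts[unit.unitType] || 0) + 1;
  for (const child of unit.parsed || []) countByType(child, counts);
  return counts;
}

const quiero = makeUnit("word", "quiero", "want.1SG", [
  makeUnit("morpheme", "quier", "want"),
  makeUnit("morpheme", "o", "1SG.PRES"),
], { wordClass: "verb" });

const utterance = makeUnit("utterance", "yo quiero tacos", "I want tacos", [
  makeUnit("word", "yo", "1SG.NOM"),
  quiero,
], { speaker: "A" });

console.log(countByType(utterance)); // { utterance: 1, word: 2, morpheme: 2 }
```

Because utterances, words, and morphemes share the same fields, the same traversal code would also handle any deeper level without changes.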

For example:

	const example = {
		unitType: "utterance",
		targetLang: "yo q- a yo quiero tacos",
		metaLang: "I q- um I want tacos",
		parsed: [
			{
				unitType: "word",
				targetLang: "yo",
				metaLang: "1SG.NOM"
			},
			{
				unitType: "word",
				targetLang: "quiero",
				metaLang: "want.1SG",
				parsed: [
					{
						unitType: "morpheme",
						targetLang: "quier",
						metaLang: "want"
					},
					{
						unitType: "morpheme",
						targetLang: "o",
						metaLang: "1SG.PRES"
					}
				]
			},
			// etc.
		]
	};
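Walking the tree is also enough to produce interlinear tiers. The sketch below assumes the example object is held in a variable named `example` (a hypothetical name). It collects every unit of a requested `unitType` in left-to-right order and lines up their `targetLang` and `metaLang` values in padded columns. It uses only the two words the example spells out and leaves the `// etc.` part alone. `unitsOfType` and `interlinear` are illustrative helpers, not part of any existing library.

```javascript
// The example unit, trimmed to the two words it actually shows.
const example = {
  unitType: "utterance",
  targetLang: "yo q- a yo quiero tacos",
  metaLang: "I q- um I want tacos",
  parsed: [
    { unitType: "word", targetLang: "yo", metaLang: "1SG.NOM" },
    {
      unitType: "word",
      targetLang: "quiero",
      metaLang: "want.1SG",
      parsed: [
        { unitType: "morpheme", targetLang: "quier", metaLang: "want" },
        { unitType: "morpheme", targetLang: "o", metaLang: "1SG.PRES" },
      ],
    },
  ],
};

// Depth-first, left to right: returns all units of `type` in document order.
function unitsOfType(unit, type) {
  const found = unit.unitType === type ? [unit] : [];
  for (const child of unit.parsed || []) found.push(...unitsOfType(child, type));
  return found;
}

// Two aligned tiers (target language over metalanguage) for one unit type,
// each column padded to the width of its longer entry.
function interlinear(unit, type) {
  const units = unitsOfType(unit, type);
  const widths = units.map((u) => Math.max(u.targetLang.length, u.metaLang.length));
  const tier = (key) => units.map((u, i) => u[key].padEnd(widths[i])).join(" ").trimEnd();
  return [tier("targetLang"), tier("metaLang")].join("\n");
}

console.log(interlinear(example, "word"));
// yo      quiero
// 1SG.NOM want.1SG
```

Asking for the `"morpheme"` tier of this example returns only `quier` and `o`, since `yo` has no `parsed` list; how unanalyzed words should appear in a morpheme tier is a design question this sketch does not settle.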