scription2dlx

A JavaScript library that converts linguistic texts in scription format to the Data Format for Digital Linguistics (DaFoDiL). This library is useful for language researchers who want to work with their data in text formats that are simple to type and read (scription), but want to convert their data for use in other Digital Linguistics tools.

Basic Usage

  1. Install the library using npm or yarn:
npm i @digitallinguistics/scription2dlx
yarn add @digitallinguistics/scription2dlx

Or download the latest release from the releases page.

  2. Import the library into your project:

Node:

import scription2dlx from '@digitallinguistics/scription2dlx';

HTML:

<script src=scription2dlx.js type=module></script>

  3. The library exports a single function which accepts a string and returns a DaFoDiL Text Object.

data.txt

---
title: How the world began
---
waxdungu qasi
one day a man

script.js

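const response = await fetch(`data.txt`);
const data = await response.text(); // the converter expects a string, not a Response object
const text = scription2dlx(data);

console.log(text.utterances[0].transcription); // "waxdungu qasi"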

You may also pass an options hash as the second argument. See the Options section below.

const text = scription2dlx(data, { /* options */ });
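Because the function takes a plain string, the input does not have to come from a file. The sketch below builds the same sample text in memory and collects the transcription of each utterance. The helper name listTranscriptions is hypothetical, and the converter is passed in as a parameter so the sketch makes no assumption about how the library is loaded. It also assumes that utterances is an array on the returned Text object.

```javascript
// The same scription text as data.txt, built as an in-memory string.
const scription = [
  '---',
  'title: How the world began',
  '---',
  'waxdungu qasi',
  'one day a man',
].join('\n');

// Hypothetical helper: returns the transcription of every utterance.
// `convert` stands in for the function exported by scription2dlx.
function listTranscriptions(convert, input, options = {}) {
  const text = convert(input, options);
  // Assumption: `utterances` is an array on the returned Text object.
  const utterances = Array.isArray(text.utterances) ? text.utterances : [];
  return utterances.map(utterance => utterance.transcription);
}
```

With the library imported as scription2dlx, this would be called as listTranscriptions(scription2dlx, scription).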

Notes

  • If your project does not support ES modules and/or the latest JavaScript syntax, you may need to transpile this library using tools like Babel, and possibly bundle the library using a JavaScript bundler.

  • The scription2dlx library does not perform validation on the text data. You should use a separate validator such as AJV to validate your data against the DLx DaFoDiL format.

  • In order to keep this library small and dependency-free, scription2dlx does not automatically parse the YAML header of a scription document. Instead, the header string is returned as a header property on the text object. If you would like scription2dlx to parse the header, pass a YAML parser to the parser option when calling the scription2dlx function:

    import yaml from 'yaml'; // use your preferred YAML parsing library
    
    const text = scription2dlx(data, { parser: yaml.parse });
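The parser option is used above with yaml.parse, a function that takes a string and returns an object. As a rough illustration of that contract, the sketch below is a tiny stand-in that handles only flat key: value lines. The name parseFlatHeader is hypothetical. Whether the header string includes its --- delimiters is not stated here, so the sketch simply skips them if present.

```javascript
// Hypothetical stand-in for a YAML parser. It handles only flat
// `key: value` lines, which is a small subset of YAML.
function parseFlatHeader(header) {
  const result = {};
  for (const rawLine of String(header).split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, comments, and `---` delimiters (in case they are included).
    if (line === '' || line === '---' || line.startsWith('#')) continue;
    const separator = line.indexOf(':');
    if (separator === -1) continue; // not a key/value line
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    if (key !== '') result[key] = value; // values are kept as plain strings
  }
  return result;
}
```

It could then be passed as { parser: parseFlatHeader }. Headers that use lists, nesting, or quoted strings need a real YAML library.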

Options

| Option | Default | Description |
|---|---|---|
| codes | {} | Custom backslash codes to use in your interlinear glosses. It should be a hash with the scription code as the key (without a leading backslash) and the custom code as the value; ex: "txn": "t" will allow you to write \t instead of \txn for transcription lines. |
| emphasis | true | Whether emphasis should be passed through as-is (true, default) or stripped from the data (false). |
| errors | "warn" | How to handle errors. If set to "warn" (the default), an utterance which throws an error is skipped and a warning is logged to the console. If set to "object", an error object with information is returned in the results array. If set to false, utterances with errors are skipped silently. If set to true, utterances with errors throw and stop further processing. |
| orthography | "default" | An abbreviation for the default orthography to use for transcriptions when one is not specified. |
| parser | undefined | A YAML parser to use in parsing the header of a scription document. If none is present, the header is provided as a string in the header property of the returned object. |
| utteranceMetadata | true | Whether to parse the utterance metadata line (the first line when it begins with #). If set to true, a metadata property is added to each utterance that has one. |
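
The defaults listed above can be written out as a plain options hash. The sketch below does that, overrides a few values, and wraps a call made with errors set to true in a try/catch, since that setting throws on a bad utterance. The names defaultOptions, buildOptions, strictOptions, and tryConvert are hypothetical, and the converter is passed in as a parameter rather than imported. The library applies its own defaults, so the merge is only for illustration.

```javascript
// Defaults as listed in the Options section.
const defaultOptions = {
  codes: {},
  emphasis: true,
  errors: 'warn',
  orthography: 'default',
  parser: undefined,
  utteranceMetadata: true,
};

// Hypothetical helper: overlay caller-supplied options on the documented defaults.
function buildOptions(overrides = {}) {
  return { ...defaultOptions, ...overrides };
}

// Example: write \t instead of \txn, strip emphasis, and throw on errors.
const strictOptions = buildOptions({
  codes: { txn: 't' },
  emphasis: false,
  errors: true,
});

// Hypothetical wrapper for `errors: true`, where a bad utterance throws.
// `convert` stands in for the function exported by scription2dlx.
function tryConvert(convert, input, options = strictOptions) {
  try {
    return { text: convert(input, options), error: null };
  } catch (error) {
    return { text: null, error };
  }
}
```

With the library imported as scription2dlx, tryConvert(scription2dlx, data) would return either the converted text or the error that stopped processing.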