Releases: digitallinguistics/tags2dlx
v0.4.0
v0.3.0
This is a breaking release which changes the method for determining utterances in a text. Utterances are now determined based on newlines rather than punctuation. This was motivated by the fact that some portions of major corpora (such as the Open American National Corpus) do not include punctuation.
The utteranceSeparators option has been removed, and the punctuation option has been updated so that the default list of punctuation now includes punctuation typically placed at the end of a sentence/utterance.
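As a rough illustration of newline-based utterance detection, the sketch below splits a text into one utterance per non-empty line. The helper name splitUtterances and the sample text are hypothetical and are not part of the tags2dlx API; the library's actual implementation may differ.

```typescript
// Hypothetical helper for illustration only; not the tags2dlx implementation.
// Treats each non-empty line of the input as one utterance.
function splitUtterances(text: string): string[] {
  return text
    .split(/\r?\n/)                    // break on Unix or Windows newlines
    .map(line => line.trim())          // drop surrounding whitespace
    .filter(line => line.length > 0);  // skip blank lines
}

// Sample input (assumed format): tagged words, no sentence-final punctuation.
const sample = "the_DT dog_NN barks_VBZ\n\nit_PRP runs_VBZ\n";
const utterances = splitUtterances(sample);
console.log(utterances);
```

Because the split relies only on line breaks, text without sentence-final punctuation is still segmented, provided each utterance sits on its own line.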
v0.2.1
v0.2.0
v0.1.1
v0.1.0
This is the initial release of the tags2dlx library.
NEW: convert text to a valid DLx Text object
NEW: tokenize text into utterances
NEW: tokenize utterances into word tokens
NEW: parse words into token and tag
NEW: option: metadata
NEW: option: punctuation
NEW: option: tagName
NEW: option: tagSeparator
NEW: option: utteranceSeparators
DOCS: add project readme
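The word-level steps listed above (tokenizing an utterance into words, then parsing each word into token and tag) can be sketched as follows. This is a minimal illustration under assumptions: the names parseWord and parseUtterance, the TaggedWord shape, and the use of "_" as the tag separator are all hypothetical and are not taken from the tags2dlx API or its defaults.

```typescript
// Hypothetical shape for a parsed word; not the DLx or tags2dlx data format.
interface TaggedWord {
  token: string;
  tag: string;
}

// Split one word on the LAST occurrence of the separator, so that tokens
// which themselves contain the separator keep it in the token part.
function parseWord(word: string, tagSeparator: string): TaggedWord {
  const i = word.lastIndexOf(tagSeparator);
  if (i === -1) return { token: word, tag: "" }; // no tag present
  return {
    token: word.slice(0, i),
    tag: word.slice(i + tagSeparator.length),
  };
}

// Split an utterance on whitespace, then parse each word.
function parseUtterance(utterance: string, tagSeparator: string): TaggedWord[] {
  return utterance
    .split(/\s+/)
    .filter(w => w.length > 0)
    .map(w => parseWord(w, tagSeparator));
}

// "_" is an assumed separator chosen for this example.
const words = parseUtterance("the_DT dog_NN barks_VBZ", "_");
console.log(words);
```

Splitting on the last separator is a design choice made for this sketch; a real tagger format may require different handling.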