Tools for Statistical Content Analysis
created at TU Dortmund University.
tosca is a framework for statistical methods in content analysis. We offer a pipeline for preprocessing, model text corpora using a link to the implemantation of Latent Dirichlet Allocation from the
lda package. Useful plot routines for both - pre- and post-modeled corpora - are given for the descriptive analysis of text corpora and topic models. Moreover, an implementation of Chang's intruder words and intruder topics is provided; as well as reasoned sampling of text ids to get effective sets of texts for human labeling/coding regarding accuracy of estimating Precision and Recall.
See examples how to use
tosca at the Vignette.
For a BibTeX entry please use
citation(package = "tosca").