tidystopwords: R package for multilingual stopwords

Authors: Silvie Cinková*, Maciej Eder
License: GPL-3

An R package containing customizable lists of stopwords in multiple languages; it attempts to follow tidy data principles.

The idea behind this package is to give the user control over the stopword selection. The core generate_stoplist() function relies on multilingual_stopwords(), a large data frame derived from the current release of the Universal Dependencies Treebanks. We have included all languages whose corpora total more than 10,000 tokens, which is large enough to cover all common closed-class words, such as prepositions, conjunctions, and auxiliary verbs. The data is encoded in UTF-8.
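To make the idea concrete, here is a minimal base-R sketch of how a stoplist can be drawn from a treebank-style table by filtering on closed-class part-of-speech tags. It does not call the package: the toy_tokens data frame, its column names, and the toy_stoplist() helper are all hypothetical and are not taken from the actual structure of multilingual_stopwords or the signature of generate_stoplist().

```r
# Toy stand-in for a treebank-derived table. The columns and rows are
# hypothetical and do not reflect the real multilingual_stopwords data.
toy_tokens <- data.frame(
  language = c("English", "English", "English", "English", "Czech", "Czech"),
  form     = c("of", "and", "treebank", "will", "na", "strom"),
  upos     = c("ADP", "CCONJ", "NOUN", "AUX", "ADP", "NOUN"),
  stringsAsFactors = FALSE
)

# Universal Dependencies tags for some closed-class word categories.
closed_class <- c("ADP", "CCONJ", "SCONJ", "AUX", "DET", "PRON")

# Hypothetical helper: return the unique word forms of one language
# whose part-of-speech tag is in the chosen set.
toy_stoplist <- function(tokens, lang, tags = closed_class) {
  keep <- tokens$language == lang & tokens$upos %in% tags
  unique(tokens$form[keep])
}

english_stops <- toy_stoplist(toy_tokens, "English")
print(english_stops)
```

Passing a narrower tags vector, for example only "ADP", is the kind of user-side control the package aims to offer; the real function may expose this differently.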

Installation

Install the package directly from the GitHub repository:

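# Requires devtools; if it is missing, run install.packages("devtools") first
library(devtools)
# Install from GitHub and build the vignettes along with the package
install_github("computationalstylistics/stopwoRds", build_vignettes = TRUE)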