Fast vectorization, topic modeling, distances and GloVe word embeddings in R.


You've just discovered text2vec!

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).

The goals we set for text2vec:

  • Concise - expose as few functions as possible
  • Consistent - expose unified interfaces, so there is no need to learn a new interface for each task
  • Flexible - make complex tasks easy to solve
  • Fast - maximize efficiency per single thread, and scale transparently to multiple threads on multicore machines
  • Memory efficient - use streams and iterators, and avoid keeping data in RAM where possible

Performance

This package is efficient because it is carefully written in C++, which also means that text2vec is memory friendly.

Embarrassingly parallel tasks (such as vectorization) can benefit from fork-based parallelism on UNIX-like systems, where they can achieve near-linear scalability with the number of available cores.

Finally, a streaming API means that users do not have to load all the data into RAM.
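
As a rough illustration of the iterator-based workflow, the sketch below builds a document-term matrix from a toy corpus. It assumes the `itoken`, `create_vocabulary`, `vocab_vectorizer`, and `create_dtm` functions exported by text2vec; the corpus, the document ids, and the chunk count are made up for the example, and a real pipeline would typically feed the iterator from files rather than from an in-memory vector.

```r
library(text2vec)

# Toy corpus (illustrative only); real data would usually be read in chunks
docs <- c("the quick brown fox", "the lazy dog", "quick quick fox")

# itoken() wraps the documents in an iterator that preprocesses and
# tokenizes them chunk by chunk instead of all at once
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             ids = paste0("doc", seq_along(docs)),
             n_chunks = 2,
             progressbar = FALSE)

# First pass over the iterator: collect the vocabulary
vocab <- create_vocabulary(it)

# Second pass: map each document onto the vocabulary as a sparse matrix
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

dim(dtm)  # documents x unique terms
```

The same iterator object is passed to both `create_vocabulary` and `create_dtm`, so the tokenization logic is defined once and reused for each pass.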

Contributing

The package has an issue tracker on GitHub, where I file feature requests and notes for future work. Any ideas are appreciated.

Contributors are welcome. You can help by:

License

GPL (>= 2)
