We need something like gensim's wikicorpus.py and make_wikicorpus.py for cleaning Wikipedia markup in R.
This would be a really useful feature for future experiments.
Contributions are very welcome. I believe it would not be too hard to implement something similar in R using the efficient stringi library.
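
As a starting point for discussion, here is a minimal sketch of what a regex-based cleaner built on stringi might look like. The function name `clean_wiki_markup` is hypothetical and does not exist in any package. The set of rules is an assumption about what "cleaning" should cover and is not a port of gensim's logic. Tables, magic words, and many edge cases are not handled.

```r
library(stringi)

# Hypothetical helper (an illustration only, not an existing API):
# strips a few common MediaWiki constructs from a character vector.
clean_wiki_markup <- function(text) {
  # HTML comments: <!-- ... -->
  text <- stri_replace_all_regex(text, "(?s)<!--.*?-->", "")
  # Self-closing references first, then paired <ref>...</ref>
  text <- stri_replace_all_regex(text, "(?s)<ref[^>]*?/>", "")
  text <- stri_replace_all_regex(text, "(?s)<ref[^>]*>.*?</ref>", "")
  # Any remaining HTML-like tags
  text <- stri_replace_all_regex(text, "<[^>]+>", "")

  # Templates {{...}}: remove innermost ones repeatedly so that simple
  # nesting is handled; the cap of 10 passes is an arbitrary assumption.
  tmpl <- "\\{\\{[^\\{\\}]*\\}\\}"
  for (i in seq_len(10)) {
    if (!any(stri_detect_regex(text, tmpl), na.rm = TRUE)) break
    text <- stri_replace_all_regex(text, tmpl, "")
  }

  # Internal links: [[target|label]] -> label, [[target]] -> target
  text <- stri_replace_all_regex(
    text, "\\[\\[(?:[^\\]\\|]*\\|)*([^\\]\\|]*)\\]\\]", "$1"
  )
  # Bold / italic quote markers
  text <- stri_replace_all_regex(text, "'{2,}", "")
  # Section headings: == Title == -> Title
  text <- stri_replace_all_regex(
    text, "(?m)^=+[ \\t]*(.*?)[ \\t]*=+[ \\t]*$", "$1"
  )
  # Collapse runs of spaces/tabs and trim
  text <- stri_replace_all_regex(text, "[ \\t]+", " ")
  stri_trim_both(text)
}

raw <- "'''R''' is a [[programming language|language]] for [[statistics]].<ref>cite</ref> {{cn}}"
clean_wiki_markup(raw)
```

Because the stringi functions are vectorized, the same helper can be applied to a whole character vector of articles at once. A full solution would still need a streaming XML reader for the dump itself, which this sketch leaves out.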