We need something like gensim's `wikicorpus.py` and `make_wikicorpus.py` for cleaning Wikipedia markup in R.
This would be a really useful feature for future experiments.
Contributions are very welcome. I believe it should not be too hard to implement something similar in R using the efficient stringi library.
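Below is a minimal sketch of what such a cleaner might look like with stringi. The helper name `clean_wiki_markup()` is hypothetical and not part of any existing package. It only roughly approximates gensim's regex-based markup filtering. It drops comments, `<ref>` footnotes, templates and HTML tags, and keeps the visible text of internal and external links. Tables, links nested inside file captions and many other MediaWiki constructs are not handled.

```r
library(stringi)

# Hypothetical helper (not an existing API): strips common MediaWiki markup.
# A rough sketch only; tables and many edge cases are NOT handled.
clean_wiki_markup <- function(text) {
  # HTML-style comments: <!-- ... --> (may span lines)
  text <- stri_replace_all_regex(text, "<!--[\\s\\S]*?-->", "")
  # Footnotes: self-closing <ref name="x"/> first, then <ref>...</ref> pairs
  text <- stri_replace_all_regex(text, "<ref[^>]*/>", "")
  text <- stri_replace_all_regex(text, "<ref[^>]*>[\\s\\S]*?</ref>", "")
  # Templates {{...}}: remove innermost ones until nothing changes,
  # so simple nesting such as {{a|{{b}}}} collapses completely
  repeat {
    stripped <- stri_replace_all_regex(text, "\\{\\{[^\\{\\}]*\\}\\}", "")
    if (identical(stripped, text)) break
    text <- stripped
  }
  # Internal links: [[target|label]] -> label, [[target]] -> target
  text <- stri_replace_all_regex(text, "\\[\\[(?:[^\\]]*\\|)?([^\\]]*)\\]\\]", "$1")
  # External links: [http://example.com label] -> label
  text <- stri_replace_all_regex(text, "\\[https?://[^\\s\\]]*\\s?([^\\]]*)\\]", "$1")
  # Bold/italic quote markup: ''x'' and '''x'''
  text <- stri_replace_all_regex(text, "'{2,}", "")
  # Any remaining HTML tags such as <br/> or <small>
  text <- stri_replace_all_regex(text, "<[^>]+>", "")
  # Collapse all whitespace (including newlines) and trim the ends
  stri_trim_both(stri_replace_all_regex(text, "\\s+", " "))
}
```

For example, `clean_wiki_markup("'''Paris''' is the [[capital city|capital]] of [[France]].<ref>Some source</ref>{{citation needed}}")` returns `"Paris is the capital of France."`. Collapsing whitespace suits bag-of-words pipelines but discards paragraph boundaries, so a real implementation may want to keep newlines.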