Functions for processing/cleaning Wikipedia dumps #32

@dselivanov

We need something like gensim's wikicorpus.py and make_wikicorpus.py, but in R, for cleaning Wikipedia markup.

This would be a really useful feature for future experiments.
Contributions are very welcome. I believe it will not be too hard to implement something similar in R using the efficient stringi library.
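
To make the request more concrete, here is a rough sketch of what such a cleaning function might look like with stringi. The function name `clean_wiki_markup` and the set of rules are assumptions made for illustration only. They are not taken from gensim or from any existing R package, and they cover just a handful of markup constructs (comments, `<ref>` footnotes, templates, simple links, bold/italic quotes).

```r
library(stringi)

# Hypothetical helper (illustration only): strips a few common wiki markup
# constructs from a character vector of article texts.
clean_wiki_markup <- function(text) {
  # Drop HTML comments; (?s) lets "." match newlines as well
  text <- stri_replace_all_regex(text, "(?s)<!--.*?-->", "")
  # Drop <ref>...</ref> footnotes, then self-closing <ref ... /> tags
  text <- stri_replace_all_regex(text, "(?s)<ref[^>/]*>.*?</ref>", "")
  text <- stri_replace_all_regex(text, "<ref[^>]*/>", "")
  # Drop {{templates}}: remove innermost ones repeatedly until nothing changes,
  # which unwraps nested templates one level per pass
  repeat {
    stripped <- stri_replace_all_regex(text, "\\{\\{[^\\{\\}]*\\}\\}", "")
    if (identical(stripped, text)) break
    text <- stripped
  }
  # [[target|label]] -> label, [[target]] -> target
  # (links with more than one "|", such as file links, are left untouched)
  text <- stri_replace_all_regex(
    text,
    "\\[\\[(?:[^\\[\\]\\|]*\\|)?([^\\[\\]\\|]*)\\]\\]",
    "$1"
  )
  # Drop bold/italic quote markers ('' and ''')
  text <- stri_replace_all_regex(text, "'{2,}", "")
  # Collapse runs of spaces/tabs and trim both ends
  text <- stri_replace_all_regex(text, "[ \\t]+", " ")
  stri_trim_both(text)
}

clean_wiki_markup("'''Paris''' is the capital of [[France]].<ref>Some source</ref> {{citation needed}}")
# "Paris is the capital of France."
```

Because every step is a vectorized stringi call, the same function can be applied to a whole character vector of articles at once. A real implementation would also need to parse the XML dump itself and handle tables, file links, and headings, which this sketch leaves out.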
