Extracts unformatted body text from MediaWiki wikitext

mwtextextractor

Build status: https://travis-ci.org/danmichaelo/mwtextextractor.png?branch=master

Coverage: https://coveralls.io/repos/danmichaelo/mwtextextractor/badge.png

mwtextextractor extracts plain body text from MediaWiki wikitext by stripping templates, HTML tags, tables, headers, and similar markup. The extracted text is suitable for word counting.

Example:

from mwtextextractor import get_body_text

# The {{ipsum}} template is stripped; only the body text is printed.
print(get_body_text('Lorem {{ipsum}} dolor'))
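
Since the extracted text is meant for word counting, the following is a minimal sketch of that step. The helper name `count_words` and the tokenization rule (runs of word characters, lowercased) are assumptions made for illustration and are not part of mwtextextractor. The sample string stands in for the output of `get_body_text`, because the exact whitespace the library leaves behind is not assumed here.

```python
import re
from collections import Counter


def count_words(text):
    """Return (total word count, per-word Counter) for plain text.

    Hypothetical helper, not part of mwtextextractor. A "word" is
    taken to be any run of word characters, compared case-insensitively.
    """
    words = re.findall(r"\w+", text.lower())
    return len(words), Counter(words)


# Stand-in for text already returned by get_body_text().
body = "Lorem dolor sit amet, lorem ipsum"
total, freq = count_words(body)
print(total)                 # 6
print(freq.most_common(1))   # [('lorem', 2)]
```

In practice the input would be `get_body_text(wikitext)` rather than a literal string. A different definition of a word (for example, one that keeps hyphenated terms together) would require a different regular expression.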