A simple library for efficiently scanning the Wikipedia XML database dump.
The latest English Wikipedia dump can be downloaded from https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
For more information on the database dumps, see https://en.wikipedia.org/wiki/Wikipedia:Database_download
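
To give a rough idea of what scanning the dump involves, here is a minimal sketch that streams pages out of the compressed file using only the Python standard library. It is not this library's API: the names `iter_pages` and `local_name` are hypothetical, and the sketch assumes the usual dump layout in which each `<page>` element contains a `<title>` and a `<revision>` with a `<text>` child.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> in a bz2-compressed dump.

    `source` may be a file path or a binary file object (hypothetical helper).
    """
    with bz2.open(source, "rb") as stream:
        # Parse incrementally so the whole dump is never held in memory.
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # first event is the start of the root element
        title, text = None, None
        for event, elem in context:
            if event != "end":
                continue
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, None
                root.clear()  # discard finished pages to keep memory flat
```

A call such as `for title, text in iter_pages("enwiki-latest-pages-articles.xml.bz2"): ...` would then visit one page at a time. Decompressing and parsing as a stream avoids unpacking the full dump to disk first, at the cost of a single sequential pass.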