MWDumpTemplateParser is a C++ wiki template parameter parser for a MediaWiki XML page dump.
It generates three files:
- Template parameter value file
- Template totals file
- Template offsets file (the start offset of each template in the template parameter value file)
Dependencies:
- expat - SAX XML parser. Must be compiled for 8-bit mode.
- pcre - Perl-compatible regular expression engine, version 1 (PCRE1). Must be compiled for 8-bit mode. Uses JIT if available.
/src - Source code
Compile with -std=c++11.
To run all tests:
cd /src
MWDumpTemplateParser -t - - -
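To process a dump, run these steps in order:
- bunzip2 -c enwiki-pages-articles.xml.bz2 | ./MWDumpTemplateParser -v - enwikiTemplateParams enwikiTemplateTotals &
  (the trailing & runs this step in the background; wait for it to finish before sorting)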
- LC_ALL=C sort -n -k 1,1 -k 2,2 enwikiTemplateParams >enwikiTemplateParams.sorted
- ./MWDumpTemplateParser -offsets enwikiTemplateParams.sorted enwikiTemplateOffsets
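
The sort step matters because an offsets index is only meaningful when all lines for one template are contiguous. The sketch below illustrates that idea and is not the tool's actual implementation. It assumes, without confirmation from this README, that each line of the sorted parameter file begins with a numeric template key. The function name buildStartOffsets is hypothetical.

```cpp
#include <cassert>   // only needed for the in-memory check described below
#include <cstdint>
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>   // only needed for the in-memory check described below
#include <string>

// Hypothetical helper: maps each leading numeric key to the byte offset of
// the first line that carries it.
// ASSUMPTION: the input is already sorted by that key (as the sort step
// produces), so all lines for one key form a single contiguous run.
// ASSUMPTION: lines end in '\n' and start with an integer key; the real
// file layout is not documented here.
std::map<long long, std::uint64_t> buildStartOffsets(std::istream& in) {
    std::map<long long, std::uint64_t> offsets;
    std::string line;
    std::uint64_t pos = 0;      // byte offset of the start of the current line
    bool havePrev = false;
    long long prevKey = 0;

    while (std::getline(in, line)) {
        // strtoll skips leading whitespace and parses a leading integer;
        // it yields 0 when the line does not start with a number.
        const long long key = std::strtoll(line.c_str(), nullptr, 10);
        if (!havePrev || key != prevKey) {
            // emplace does not overwrite, so the first offset for a key wins.
            offsets.emplace(key, pos);
            prevKey = key;
            havePrev = true;
        }
        pos += line.size() + 1; // +1 for the '\n' that getline consumed
    }
    return offsets;
}
```

With the made-up input "1 0 a\n1 1 b\n2 0 c\n10 0 d\n" wrapped in a std::istringstream, the returned map holds offset 0 for key 1, 12 for key 2, and 18 for key 10. A reader can then seek straight to a template's first line instead of scanning the whole file.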