Wikipedia Tools
---------------

WikiXMLSplit

Wiki XML Split splits a compressed (bz2) or uncompressed MediaWiki XML dump file into a set of XML files per page organized into subdirectories by page id number.

/preprocessing/wikixmlsplit/

WikiXML2HTML

Wiki XML to HTML converts the MediaWiki formatted text contained inside the split XML files produced by WikiXMLSplit, along with the title information, into a set of HTML files (XHTML 1.0) organized into directories by page id number.

/preprocessing/wikixml2html/

WikiHTML2Text

Wiki HTML to Text converts the HTML formatted text (in XHTML 1.0) contained inside the files produced by WikiXML2HTML, along with the title tag information, into a set of plain-text files (UTF-8) organized into directories by page id number.

/preprocessing/wikihtml2text/
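
The splitting step described for WikiXMLSplit can be illustrated with a short Python sketch. This is not the tool's actual implementation: the function names, the `bucket_size` parameter, and the directory layout (one subdirectory per block of 1000 page ids) are assumptions made for illustration, since the README only states that output is organized into subdirectories by page id number.

```python
import bz2
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from a tag such as '{ns}page'."""
    return tag.rsplit("}", 1)[-1]


def open_dump(path):
    """Open a dump for binary reading, decompressing if it ends in .bz2."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")


def page_dir(out_root, page_id, bucket_size=1000):
    """Assumed layout (hypothetical): one subdirectory per block of ids."""
    return os.path.join(out_root, str(page_id // bucket_size))


def split_dump(dump_path, out_root):
    """Write each <page> element to its own XML file; return the page count."""
    count = 0
    with open_dump(dump_path) as stream:
        # iterparse streams the dump, so the whole file is never held in memory.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            # The page id is the direct <id> child; revision ids are nested deeper.
            page_id = next(
                int(child.text) for child in elem if local_name(child.tag) == "id"
            )
            target = page_dir(out_root, page_id)
            os.makedirs(target, exist_ok=True)
            ET.ElementTree(elem).write(
                os.path.join(target, f"{page_id}.xml"),
                encoding="utf-8",
                xml_declaration=True,
            )
            elem.clear()  # drop the page's text and children once written
            count += 1
    return count
```

Calling `split_dump("dump.xml.bz2", "out")` (both paths are placeholders) would, under the assumed layout, write a page with id 2345 to `out/2/2345.xml`. Streaming with `iterparse` matters here because full MediaWiki dumps are typically far too large to parse into a single in-memory tree.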