twiki-to-mediawiki-xml

This is a new attempt at converting a TWiki to MediaWiki. The tool parses the TWiki data files and produces a MediaWiki XML dump. This import method differs from past open-source solutions, but it is the most native way to get data into MediaWiki, and it supports importing all possible features, such as revisions.
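For readers unfamiliar with the target format, the sketch below shows roughly what a MediaWiki XML dump with several revisions per page looks like, built with Python's standard library. It is not this tool's code: the `build_dump` function, its input shape (a title mapped to a list of `(unix_timestamp, username, wikitext)` tuples), and the `export-0.10` schema version are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed export schema version; match whatever your MediaWiki importer expects.
EXPORT_NS = "http://www.mediawiki.org/xml/export-0.10/"


def build_dump(pages):
    """Return a minimal MediaWiki XML dump as a string.

    `pages` (hypothetical shape) maps a page title to a list of
    (unix_timestamp, username, wikitext) revisions, oldest first.
    """
    root = ET.Element("mediawiki", {"xmlns": EXPORT_NS, "version": "0.10"})
    for title, revisions in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        # Namespace 2 is User:, 0 is the main namespace
        ET.SubElement(page, "ns").text = "2" if title.startswith("User:") else "0"
        for ts, user, text in revisions:
            rev = ET.SubElement(page, "revision")
            # Dumps use ISO 8601 UTC timestamps, e.g. 2009-02-13T23:31:30Z
            stamp = datetime.fromtimestamp(ts, tz=timezone.utc)
            ET.SubElement(rev, "timestamp").text = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
            contributor = ET.SubElement(rev, "contributor")
            ET.SubElement(contributor, "username").text = user
            ET.SubElement(rev, "model").text = "wikitext"
            ET.SubElement(rev, "format").text = "text/x-wiki"
            # ElementTree escapes <, > and & in the wikitext automatically
            ET.SubElement(rev, "text", {"xml:space": "preserve"}).text = text
    return ET.tostring(root, encoding="unicode")
```

A file like this can be loaded through Special:Import or the importDump.php maintenance script; a complete dump may also need a `<siteinfo>` block, which this sketch omits.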

This tool runs on Python 3.8+.

Much of the TWiki parsing logic is adapted from Ryan Castillo's Twiki-to-Mediawiki (written in Perl).

Planning

The current (tentative) plan is to split this tool into several sub-tools:

  • TWiki metadata parser: Given a TWiki data web directory, convert the data as-is into easily machine-readable JSON (including revisions).
  • TWiki to Wikitext converter: Convert the contents of pages from TWiki format to Wikitext format.
  • TWiki to Wikitext conventions converter: Take the parsed TWiki metadata and convert it in place to MediaWiki conventions (like WikiWords to Capital_Snake_Case).
  • TWiki JSON to MediaWiki XML: Given the converted JSON files, transform them into a MediaWiki XML dump.
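As an illustration of the first sub-tool: TWiki topic files store metadata as `%META:NAME{key="value" ...}%` lines alongside the topic text. The sketch below is a hypothetical `parse_topic` helper, not this project's parser, that splits those lines into JSON-ready dictionaries. It only handles the latest revision stored in a topic's `.txt` file; older revisions kept in the RCS `,v` files are not covered, and TWiki's escaping of special characters inside META values is not decoded.

```python
import json
import re

# Matches a whole META line, e.g. %META:TOPICINFO{author="X" date="123"}%
META_LINE = re.compile(r"^%META:(\w+)\{(.*)\}%$")
# Matches key="value" pairs inside the braces
META_ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_topic(raw):
    """Split a TWiki topic file into META records and body text (sketch)."""
    meta = {}
    body_lines = []
    for line in raw.splitlines():
        m = META_LINE.match(line)
        if m:
            name, attrs = m.groups()
            # Some META types (e.g. FIELD, FILEATTACHMENT) can repeat, so keep lists
            meta.setdefault(name, []).append(dict(META_ATTR.findall(attrs)))
        else:
            body_lines.append(line)
    return {"meta": meta, "text": "\n".join(body_lines)}


raw = (
    '%META:TOPICINFO{author="JaneDoe" date="1234567890" format="1.1" version="1.3"}%\n'
    '%META:TOPICPARENT{name="WebHome"}%\n'
    "Hello WikiWord\n"
)
print(json.dumps(parse_topic(raw), indent=2))
```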

Features

Current

  • Revisions
  • Preserve all timestamps
  • Page renaming (in titles, but not yet in body text), including User: pages
  • Preserve page renaming/movement history (NOTE: every former page name will redirect directly to the final page name, not through a chain of redirects)
  • Convert parent pages to subpages, preserving hierarchies of any depth
  • Capitalize usernames
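The subpage conversion can be pictured as walking each topic's `TOPICPARENT` chain upward and joining the names with `/`. The helpers below are hypothetical and not taken from this tool: `wikiword_to_title` assumes a simple lowercase-to-uppercase boundary rule for WikiWords, and `subpage_title` assumes a `parents` dict mapping each topic to its parent, treating `WebHome` as a root so it does not prefix every title.

```python
import re

# Hypothetical WikiWord rule: split wherever a lowercase letter or digit
# is immediately followed by an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def wikiword_to_title(word):
    """'LightingRig' -> 'Lighting_Rig' (Capital_Snake_Case)."""
    return _BOUNDARY.sub("_", word)


def subpage_title(topic, parents, roots=frozenset({"WebHome"})):
    """Build a 'Parent/Child' subpage title from TOPICPARENT links.

    `parents` maps each topic name to its parent (missing or None at the top).
    Topics listed in `roots` end the chain so they do not prefix every title.
    """
    chain = [topic]
    current = parents.get(topic)
    # Walk upward; the `not in chain` check guards against parent cycles
    while current and current not in roots and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return "/".join(wikiword_to_title(name) for name in reversed(chain))


parents = {"Equipment": "WebHome", "LightingRig": "Equipment", "ConsoleNotes": "LightingRig"}
print(subpage_title("ConsoleNotes", parents))  # Equipment/Lighting_Rig/Console_Notes
```

MediaWiki only treats `/` as a subpage separator in namespaces where subpages are enabled (`$wgNamespacesWithSubpages`), and the main namespace has them off by default.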

Planned

Possible but Unlikely

  • Support multiple TWiki webs
  • Unit testing

Development

Install Python 3.8 or later and pip. Clone the repository and change into its directory. Using a virtual environment (venv) is recommended. Then run:

pip install -e .

The twiki-to-mediawiki-xml command should now be available.

License

Copyright (C) 2022-present AB Tech, Carnegie Mellon University

Copyright (C) 2022 Perry Naseck (git@perrynaseck.com)

Copyright (C) 2011-2016 Ryan Castillo (some parts of files)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.