WikiTeam

We archive wikis, from Wikipedia to the tiniest wikis

WikiTeam software is a set of tools for archiving wikis. They work on MediaWiki wikis, but we want to expand to other wiki engines. As of 2025, WikiTeam has preserved more than 600,000 wikis, several wikifarms, regular Wikipedia dumps and 34 TB of Wikimedia Commons images.

There are thousands of wikis on the Internet. Every day some of them stop being publicly available and, for lack of backups, are lost forever. Millions of people download tons of media files (movies, music, books, etc.) from the Internet, serving as a kind of distributed backup. Wikis, most of them under free licenses, disappear from time to time because nobody grabbed a copy of them. That is a shame, and a problem we would like to solve.

WikiTeam is the Archive Team (GitHub) subcommittee on wikis. It was founded and originally developed by Emilio J. Rodríguez-Posada, a veteran Wikipedia editor and amateur archivist. Many people have helped by sending suggestions, reporting bugs, writing documentation, providing help on the mailing list and making wiki backups. Thanks to all, especially to: Federico Leva, Alex Buie, Scott Boyd, Hydriz, Platonides, Ian McEwen, Mike Dupont, balr0g and PiRSquared17.

Quick guide

This is a very quick guide to the most used features of the WikiTeam tools. For further information, read the tutorial and the rest of the documentation. You can also ask on the mailing list.

Requirements

Requires Python 2.7.

Install or upgrade the required packages:

pip install --upgrade -r requirements.txt

or, if you don't have enough permissions for the above,

pip install --user --upgrade -r requirements.txt
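Because the tools require Python 2.7, a wrapper script can check the interpreter version before doing anything else. The following is a minimal sketch, not part of WikiTeam; the helper name is_supported is hypothetical, and the code is written to run under both Python 2 and Python 3 so that the check itself never fails to start.

```python
import sys

# Version stated in the Requirements section above.
REQUIRED = (2, 7)


def is_supported(version_info):
    """Return True when (major, minor) of version_info equals 2.7.

    Hypothetical helper for illustration; WikiTeam does not ship it.
    """
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    if not is_supported(sys.version_info):
        # Only warn; what to do next is left to the caller.
        sys.stderr.write("Warning: WikiTeam tools require Python 2.7\n")
```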

Download any wiki

To download any wiki with complete XML histories and images, run:

python dumpgenerator.py http://wiki.domain.org --xml --images

If the script can't find the API and/or index.php paths by itself, you can provide them:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --index=http://wiki.domain.org/w/index.php --xml --images

If you only want the XML histories, just use --xml. For only the images, just --images. For only the current version of every page, --xml --curonly.
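The option combinations above can be captured in a small helper that assembles the argument list for dumpgenerator.py. This is only a sketch: the function build_dump_command and its parameters are hypothetical, it uses only the flags shown in this guide, and the rule that --curonly needs --xml is an assumption drawn from the wording above rather than from the script itself.

```python
def build_dump_command(api=None, index=None, xml=True, images=True,
                       curonly=False):
    """Build the argument list for a dumpgenerator.py run.

    Hypothetical helper; it only arranges flags named in this guide.
    """
    if not (xml or images):
        raise ValueError("choose at least one of xml or images")
    if curonly and not xml:
        # Assumption: the guide only shows --curonly together with --xml.
        raise ValueError("curonly is only meaningful together with xml")

    args = ["python", "dumpgenerator.py"]
    if api:
        args.append("--api=" + api)
    if index:
        args.append("--index=" + index)
    if xml:
        args.append("--xml")
    if curonly:
        args.append("--curonly")
    if images:
        args.append("--images")
    return args


# Current revisions only, no images.
print(" ".join(build_dump_command(
    api="http://wiki.domain.org/w/api.php", images=False, curonly=True)))
```

The resulting list can be passed to subprocess.call, which avoids shell quoting issues with unusual URLs.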

You can resume an aborted download:

python dumpgenerator.py --api=http://wiki.domain.org/w/api.php --xml --images --resume --path=/path/to/incomplete-dump
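When many wikis are downloaded in a loop, it can be convenient to add the resume flags automatically. The sketch below assumes that an existing directory at the given path is an incomplete dump, which is a simplification; the helper name resume_args is hypothetical, and only flags from the command above are used.

```python
import os


def resume_args(api_url, dump_path):
    """Return dumpgenerator.py arguments, resuming if dump_path exists.

    Assumption: any existing directory at dump_path is an incomplete dump.
    """
    args = ["python", "dumpgenerator.py", "--api=" + api_url,
            "--xml", "--images"]
    if os.path.isdir(dump_path):
        args += ["--resume", "--path=" + dump_path]
    return args
```

If the directory does not exist, the command falls back to a fresh download with the same options.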

See more options:

python dumpgenerator.py --help

Download Wikimedia dumps

To download Wikimedia XML dumps (Wikipedia, Wikibooks, Wikinews, etc.) for all projects, run:

python wikipediadownloader.py

See more options:

python wikipediadownloader.py --help

Download Wikimedia Commons images

There is a script for this, but we have uploaded the tarballs to the Internet Archive, so it's more useful to reseed their torrents than to re-generate old ones with the script.

Developers

You can run the tests easily with the tox command. It is probably already available on your operating system; you need version 1.6. If it is not, you can install it from PyPI with: pip install tox.

Example usage:

$ tox
py27 runtests: commands[0] | nosetests --nocapture --nologcapture
Checking http://wiki.annotation.jp/api.php
Trying to parse かずさアノテーション - ソーシャル・ゲノム・アノテーション.jpg from API
Retrieving image filenames
.    Found 266 images
.
-------------------------------------------
Ran 1 test in 2.253s

OK
_________________ summary _________________
  py27: commands succeeded
  congratulations :)
$

This use of GitHub is not an endorsement

This project is currently hosted by GitHub for legacy reasons. GitHub is not recommended as it's a service running on proprietary software and does not respect copyleft.

Free software needs free tools: support the campaign Give up GitHub from the Software Freedom Conservancy.

(This section is released under CC-0.)

