wikipedia dump downloader
Python
branch: master

Merge pull request #7 from dpinol/2015_dumps_new_index_format

Adapt to new format of dumps index page
latest commit 68cc6a43b5
@babilen authored
doc Added table for exit status
examples changed base_url
lib/wp_download Adapt to new format of dumps index page
pip wp-download v0.1.1
scripts Refactor code
test Refactor code
CHANGES wp-download v0.1.1
COPYING Initial commit
INSTALL Initial commit
MANIFEST.in Refactor code
README.markdown wp-download v0.1.1
setup.py Refactor code

README.markdown

wp-download

It is a cumbersome task to administer local Wikipedia databases, especially if you need access to multiple language versions of Wikipedia.

With wp-download you can automatically download the newest database dumps for every language edition you want:

$ wp-download --resume -v /path/to/wikipedia/dumps
Read configuration from: '/home/foobar/.wpdownloadrc'
Set timeout to 30s
Processing language: sw
Creating directory: /path/to/wikipedia/dumps/sw/20090821
Latest dump for (sw) is from Friday 21 August 2009
Skip: swwiki-20090821-redirect.sql.gz
Skip: swwiki-20090821-category.sql.gz
Resume: swwiki-20090821-pages-articles.xml.bz2
swwiki-20090821-pages-articles.xml.bz2 [****] 100% Time: 00:00:00   3.19 M/s
...
...
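The Skip and Resume lines in the output above suggest a per-file decision. The following is only a minimal sketch of how such a decision could be made by comparing file sizes. It is not wp-download's actual implementation, which this README does not show; the function name `plan_download` and the size-based rule are assumptions made for illustration.

```python
import os


def plan_download(local_path, remote_size):
    """Decide what to do with one dump file (hypothetical helper).

    Returns a tuple (action, offset) where action is "skip", "resume"
    or "download", and offset is the number of bytes already on disk
    that can be kept.
    """
    if not os.path.exists(local_path):
        # Nothing on disk yet: fetch the whole file.
        return "download", 0

    local_size = os.path.getsize(local_path)
    if local_size == remote_size:
        # Sizes match: assume the file is complete.
        return "skip", local_size
    if local_size < remote_size:
        # Partial file: continue from the bytes we already have.
        return "resume", local_size

    # Local file is larger than the remote one: start over.
    return "download", 0
```

With a decision like this, a resumed transfer would typically request only the missing bytes, for example through an HTTP `Range` header starting at the returned offset.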

Installation

wp-download does not use setuptools but plain distutils, which gives you the following installation options.

setup.py

Using setup.py means that you will download the source distribution file and run:

$ tar xjf wp-download-0.1.tar.bz2
# python setup.py install --prefix=/usr/local

which will install wp-download within /usr/local. You might have to include /usr/local/bin in your $PATH.

$ export PATH=/usr/local/bin:$PATH

To meet wp-download's requirements you also have to install progressbar (=2.2) with an installation tool of your choice.
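To confirm that the requirement is visible to your interpreter before running wp-download, a quick check like the one below may help. It is a sketch that assumes Python 3.4 or later (for `importlib.util.find_spec`) and that the package is importable under the name `progressbar`; the helper `is_importable` is hypothetical, and the check does not verify the version.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found by the interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("progressbar"):
        print("progressbar found")
    else:
        print("progressbar missing; install version 2.2 first")
```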

pip

pip is an excellent tool for installing Python software in a sane way. If you already use it, or plan to do so, you might be happy to hear that wp-download provides a pip requirements file that can be used to install wp-download and its requirements.

You should never install software into /usr, which is managed by the package management system of your distribution.

But fear not! virtualenv and pip make it easy to install software like this into a completely isolated environment.

# pip install -E wp-download -r requirements-0.1.1.txt

Documentation

Documentation for wp-download can be found here
