pyallris - an ALLRIS-Scraper in python

This scraper is a work in progress and builds on the information found in this wiki document: ScrapingAllrisHowto

It's based on the XML output of ALLRIS, which needs to be enabled on the server for the scraper to work. Unfortunately, not everything is accessible via the XML output.
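
As a rough illustration of what working with XML output looks like, the following sketch parses a small XML document with Python's standard library. The element names (`sessions`, `session`, `title`, `date`), the function name `parse_sessions` and the sample data are made up for this example and are not taken from ALLRIS or from pyallris; the real output is structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; real ALLRIS XML output uses different element names.
SAMPLE_XML = """<sessions>
  <session id="1">
    <title>Council meeting</title>
    <date>2013-01-15</date>
  </session>
  <session id="2">
    <title>Planning committee</title>
    <date>2013-02-03</date>
  </session>
</sessions>"""


def parse_sessions(xml_text):
    """Return a list of dicts, one per <session> element in xml_text."""
    root = ET.fromstring(xml_text)
    sessions = []
    for node in root.findall("session"):
        sessions.append({
            "id": node.get("id"),             # attribute on <session>
            "title": node.findtext("title"),  # text of the <title> child
            "date": node.findtext("date"),    # text of the <date> child
        })
    return sessions


if __name__ == "__main__":
    for session in parse_sessions(SAMPLE_XML):
        print(session)
```

Anything that is missing from the XML output cannot be reached this way and would have to be collected by other means.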

Installation / Requirements

The best way to make it work is to install virtualenv and create a virtual environment for the scraper to run in:

mkdir scraper
cd scraper
virtualenv .
source bin/activate

Then clone the repo:

git clone

and install it in development mode:

cd pyallris
python setup.py develop

This will install all requirements as well.

After that you can look into to check how it's supposed to work. There are also some experiments in the experiments/ folder.

Ubuntu 12.04 installation

Install Python 2.7 and the required development libraries:

sudo apt-get install python2.7-dev
sudo apt-get install libxml2-dev libxslt-dev

Install MongoDB:

sudo apt-get install mongodb

Install pip and use it to install virtualenv:

sudo apt-get install python-pip
sudo pip-2.7 install virtualenv

Create the virtual environment: navigate to the folder that will contain the project root folder and run:

virtualenv-2.7 .
source bin/activate

Clone the git project as a subdirectory of the virtualenv folder:

git clone

Initialize the bootstrap git submodule:

git submodule init
git submodule update


python setup.py develop

Now you can run the scrapers with:



Please note that the URLs are currently hard-coded for the ALLRIS instance in Aachen. This may change soon.
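
One way the hard-coded URLs could be made configurable is sketched below. This is not how pyallris does it: the function name `build_url`, the constant `DEFAULT_BASE_URL`, the host `allris.example.org` and the path `page.xml` are all placeholders invented for this example.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

# Placeholder host, not the real address of the Aachen installation.
DEFAULT_BASE_URL = "http://allris.example.org/bi/"


def build_url(path, base_url=DEFAULT_BASE_URL):
    """Join a relative path onto the base URL of an ALLRIS installation."""
    # urljoin drops the last path segment of the base unless it ends in "/".
    if not base_url.endswith("/"):
        base_url += "/"
    # A leading "/" would make urljoin discard the base path entirely.
    return urljoin(base_url, path.lstrip("/"))


if __name__ == "__main__":
    print(build_url("page.xml"))
    print(build_url("page.xml", base_url="http://other.example.org/ris"))
```

With the base URL passed in as a parameter, pointing the scraper at another city would only require a different `base_url` value, provided that installation exposes the same XML output.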