Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
lachambre.be_to_json
Python HTML
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
dierentheater
history
lachambre
pdfs
scheduler
scraper
templates
.env
.gitignore
COPYING
README.md
TODO
manage.py
requirements.txt

README.md

Dieren Theater

lachambre.be to json sausage machine.

Installation

You need pip and virtualenv (or buildout).

Under debian based distributions:

sudo apt-get install python-pip python-virtualenv

You also need mongodb:

sudo apt-get install mongodb

And pdftotext:

sudo apt-get install xpdf-utils

And libxml2 and libxslt for lxml:

sudo apt-get install libxml2-dev libxslt-dev

Once this is done create a virtualenv, for example:

virtualenv --distribute --no-site-packages ve

Then activate it:

source ve/bin/activate

Now, install the dependencies (this can take a little bit of time):

pip install -r requirements.txt

And install the indices:

python manage.py syncdb

Usage

Scraping

To launch the scraping:

python manage.py scrape

By default this launch all the parser, if you want to only launch some parser you can specify them on the CLI:

python manage.py scrape --deputies

Available options are:

--commissions         Parse commissions
--documents           Parse documents
--written_questions   Parse written_questions
--reports             Parse reports
--deputies            Parse deputies

If you have exception, remember that you need to passe the option --traceback to a django command to have the traceback (yes ...).

Debugging

If you want to end up in ipdb (a python interactive debugger) on exception, you can use the --ipdb option.

python manage.py scrape --ipdb

Dump

But aware that by default the parser store every page he has encountered in the folder dump but re-download it (for a maximum number of 2 more times) if it has problems with it. If you want to reset the cache simply run:

rm dump/*

Server

To launch the dev server:

python manage.py runserver

Public instance

http://www.dierentheater.be

Licence

agplv3+ for code and ODbL for data

Something went wrong with that request. Please try again.