Scraper of Slovak National Council for Visegrad+ project.



Scrapes MPs, their memberships, votes and debates, and stores the data in the Visegrad+ parliament API.




The scraper requires:

  • the lxml library to parse HTML documents,
  • LibreOffice core and unoconv to convert documents from RTF format,
  • some Python packages.

On Debian-based distributions, install the libraries:

$ sudo apt-get install libxml2-dev libxslt1-dev zlib1g-dev libreoffice-core unoconv


Get the scraper:

$ sudo mkdir -p /home/projects/scrapers
$ cd /home/projects/scrapers
$ sudo git clone sk_nrsr

Get the VPAPI client and the SSH certificate of the server:

$ cd sk_nrsr
$ sudo wget
$ sudo wget

Create a virtual environment for the scraper and install the required packages into it:

$ sudo virtualenv /home/projects/.virtualenvs/scrapers/sk_nrsr --no-site-packages
$ source /home/projects/.virtualenvs/scrapers/sk_nrsr/bin/activate
(sk_nrsr)$ sudo pip install -r requirements.txt
(sk_nrsr)$ deactivate
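
Before the first run it can be useful to confirm that the prerequisites are actually visible from inside the virtual environment. The following is only a minimal sketch: the function name `missing_prerequisites` is made up for illustration, and it assumes that lxml is one of the packages installed from requirements.txt and that unoconv is found on PATH.

```python
import importlib.util
import shutil


def missing_prerequisites(commands=("unoconv",), modules=("lxml",)):
    """Return names of required commands and Python modules that cannot be found."""
    missing = []
    for command in commands:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(command) is None:
            missing.append(command)
    for module in modules:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("missing: " + ", ".join(problems) if problems else "all prerequisites found")
```

Run it with the virtual environment activated, so that the module check looks at the packages installed there rather than at the system-wide ones.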


Check that SERVER_NAME and SERVER_CERT variables in have correct values.

Copy the file conf/private-example.json to conf/private.json and fill in your username and password for write access through the API. This sensitive data must not be committed to the repository.
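
The copy-and-fill step can also be scripted. The sketch below is an illustration, not part of the scraper: `write_private_config` is a hypothetical helper, and it assumes that conf/private-example.json is a flat JSON object. The real key names are whatever the example file contains; none are assumed here.

```python
import json
import os


def write_private_config(conf_dir, credentials):
    """Copy private-example.json to private.json with the given values filled in."""
    example_path = os.path.join(conf_dir, "private-example.json")
    private_path = os.path.join(conf_dir, "private.json")

    with open(example_path, encoding="utf-8") as f:
        config = json.load(f)

    # Only keys already present in the example file are accepted, so a typo
    # in a key name fails loudly instead of being written silently.
    unknown = set(credentials) - set(config)
    if unknown:
        raise KeyError("keys not in example file: " + ", ".join(sorted(unknown)))
    config.update(credentials)

    # A newly created file gets mode 0600 (owner read/write only),
    # further restricted by the process umask.
    fd = os.open(private_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return private_path
```

Call it as `write_private_config("conf", {...})` with the keys taken from the example file. Restricting the file mode is a design choice of this sketch, made because the file holds a password.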


Run the scraper in the virtual environment. See the scraper's help message for the parameters it accepts:

$ source /home/projects/.virtualenvs/scrapers/sk_nrsr/bin/activate
$ python --help

The unoconv listener must be running to scrape transcripts of older debates (election terms 1-4):

$ unoconv --listener &

Scrape people and their memberships first, then debates, and finally votes. Note that the initial scrape of debates deletes all existing sessions and sittings:

$ sudo -u visegrad python --people initial --debates none --votes none
$ sudo -H -u visegrad python --people none --debates initial --votes none
$ sudo -u visegrad python --people none --debates none --votes initial

The -H option in the debates command is needed because unoconv creates temporary files in HOME. Alternatively, run everything at once:

$ sudo -H -u visegrad python --people initial --debates initial --votes initial
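
The ordering of the three separate runs, and the fact that only the debates run needs -H, can be captured in a small helper. This is a sketch under stated assumptions: `scrape_commands` is a hypothetical function, and the script path is passed in as a parameter because the scraper's file name is not given here. The option names and values are the ones used in the commands above.

```python
def scrape_commands(script, user="visegrad"):
    """Return the three initial-scrape command lines in the required order."""
    steps = [
        {"people": "initial", "debates": "none", "votes": "none"},
        {"people": "none", "debates": "initial", "votes": "none"},
        {"people": "none", "debates": "none", "votes": "initial"},
    ]
    commands = []
    for step in steps:
        command = ["sudo"]
        # -H points HOME at the target user's home directory; only the
        # debates step needs it, because unoconv writes temporary files there.
        if step["debates"] != "none":
            command.append("-H")
        command += ["-u", user, "python", script]
        for option in ("people", "debates", "votes"):
            command += ["--" + option, step[option]]
        commands.append(command)
    return commands


if __name__ == "__main__":
    # "SCRAPER_SCRIPT" is a placeholder, not the real file name.
    for command in scrape_commands("SCRAPER_SCRIPT"):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)`, so that a failed people scrape stops the sequence before the debates and votes are touched.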

You can stop the unoconv listener unless it is needed for other scrapers or conversions:

$ sudo killall soffice.bin

Then schedule a periodic scrape:

$ sudo -u visegrad python --people recent --debates recent --votes recent

or, since recent is the default value, simply:

$ sudo -u visegrad python