This repository contains the code and documentation of a platform that scrapes, aggregates, analyses and visualises open data on public procurement by authorities, state entities and state-owned companies in Austria.

The platform focuses in particular on data on the award of contracts. Only data on contracts above EUR 50,000 is available. This data has to be published via the Austrian open data portal in line with the Federal Procurement Act 2018 (Bundesvergabegesetz 2018).

This project is made possible by support from Netidee.

More information on the project is available at (in German).

This is a project of Forum Informationsfreiheit – the Freedom of Information Forum Austria. In case you have any questions, feel free to write to

OffeneVergaben Scraper

A command line toolkit, written in PHP, for scraping raw XML data on tenders published under the BVergG 2018 (Ausschreibungen laut BVergG2018).

A fully functional scraper requires a connected MySQL (or equivalent MariaDB) database.

However, a download command is available that saves the XML files to a local directory; it requires no database setup.

Installation instructions



  1. Clone the GitHub repository.
  2. Navigate into the project root directory.
  3. Install the dependencies with composer install.
  4. Run the provided install.sql script in <projectroot>/sql to set up the database and the necessary tables.
  5. To configure the database connection, copy the provided example configuration file (cp .env.example .env) and fill in your database credentials.

Run the command line tool

Inside the project directory, run bin/console to start the command line tool in default mode, which prints all available commands.

Run bin/console <command> to run a specific command.

Primary commands





Additional Commands




Command Scrape All


A "meta" command that executes scrape:publishers and scrape:kerndaten consecutively. If you intend to run the scraper periodically, this is the command to plug into your crontab.

Command Scrape Publishers


Scrapes a list of data publishers from the open data portal, using its CKAN API. The main goal of this command is to retrieve the URLs of all available Kerndaten sources.

The command automatically checks whether each provided URL is a valid Kerndaten source. If the check fails, the publisher is marked as disabled in the database and is ignored during the actual Kerndaten scraping process.

Please note: if you don't use scrape:all, this command must be run before scrape:kerndaten.

Command Scrape Kerndaten


This command scrapes the actual Kerndaten (tender procedures) in a two-step process:

  1. Scrape one Kerndaten source XML to receive the complete list of Kerndaten item URLs. Compare the list of items with the already known Kerndaten (stored in the database) and ignore items that have not been updated since the last run.

  2. Iterate over the resulting list of URLs, scrape one Kerndaten XML at a time and store the XML string in the database.

(Both steps are repeated for each active publisher.)
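The comparison in step 1 can be pictured as a timestamp check. The following Python sketch is only an illustration and not the project's PHP code; the function name, the dictionary layout (item URL mapped to a last-modified timestamp) and the example.org URLs are assumptions made for this example.

```python
from datetime import datetime


def items_to_scrape(source_items, known_items):
    """Return the URLs that are new or updated since the last run.

    source_items: {item_url: last_modified} as listed by one Kerndaten source
    known_items:  {item_url: last_modified} as already stored in the database
    (both layouts are hypothetical, chosen for this illustration)
    """
    pending = []
    for url, last_modified in source_items.items():
        stored = known_items.get(url)
        # Scrape when the item is unknown or the source reports a newer version.
        if stored is None or last_modified > stored:
            pending.append(url)
    return pending


source = {
    "https://example.org/item/1.xml": datetime(2020, 2, 1),
    "https://example.org/item/2.xml": datetime(2020, 2, 10),
    "https://example.org/item/3.xml": datetime(2020, 2, 12),
}
known = {
    "https://example.org/item/1.xml": datetime(2020, 2, 1),  # unchanged
    "https://example.org/item/2.xml": datetime(2020, 2, 5),  # updated since
}

print(items_to_scrape(source, known))
```

In this sample, item 1 is skipped because it is unchanged, while items 2 (updated) and 3 (new) are queued for step 2.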

By default, the scraper waits 1.5 seconds between requests. This translates to a runtime of approximately 4 hours for the initial run (roughly 10,000 XML files as of Feb. 2020).
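The runtime figure follows from the waiting time alone. As a quick check, here is a small Python calculation; the function name is made up for this example, and download and processing time are deliberately left out, so the result is a lower bound.

```python
def estimated_runtime_hours(request_count, delay_seconds=1.5):
    """Waiting time only; the time spent on the requests themselves is excluded."""
    return request_count * delay_seconds / 3600


print(round(estimated_runtime_hours(10_000), 2))  # 4.17
```

10,000 requests at 1.5 seconds each add up to 15,000 seconds, which is a little over 4 hours.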

The scraping process can be stopped at any time with Ctrl+C. On the next call, the scraper resumes seamlessly without loss of data.

Command Download


The "quick & dirty" way to get the data. Executes a complete download (not version aware!) of all Kerndaten XML files available at that time.

By default the data will be downloaded into a timestamped directory inside the downloads directory.
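A timestamped target directory could be derived as in the Python sketch below. The timestamp format and the function name are assumptions for illustration only; the naming scheme the tool actually uses may differ.

```python
from datetime import datetime
from pathlib import Path


def timestamped_download_dir(base="downloads", now=None):
    """Build a path such as downloads/2020-02-15_103000 (format is assumed)."""
    now = now or datetime.now()
    return Path(base) / now.strftime("%Y-%m-%d_%H%M%S")


target = timestamped_download_dir(now=datetime(2020, 2, 15, 10, 30, 0))
print(target.as_posix())  # downloads/2020-02-15_103000
```

With a scheme like this, every run writes to its own directory, so an earlier download is not overwritten by a later one.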

Database setup is not required for this command to work.

Command List Publishers


Prints a list of all known publishers stored in the connected database to the console. For each publisher it shows the date it was added, the current status (enabled / disabled) and the key (the reference id from

Use the --check option to perform a live check of whether each data source is actually up and responding with XML content. This takes a moment or two, depending on the number of requests that have to be performed and the response time of each request.
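One simple way to test whether a response contains XML is to try parsing it. The Python sketch below shows that idea only; it is not the tool's PHP implementation, the function name and sample element names are hypothetical, and the real check may be stricter (for example, by also inspecting the document structure).

```python
import xml.etree.ElementTree as ET


def responds_with_xml(body):
    """Return True if the response body parses as well-formed XML."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


print(responds_with_xml(b"<kerndaten><item/></kerndaten>"))  # True
print(responds_with_xml(b"<html><body>Not Found"))           # False
```

An error page with unclosed tags or an empty body fails to parse and is therefore reported as not being XML.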

Command Enable Publisher

enable:publisher <reference_id of publisher>

Use this command to (re-)enable a previously disabled publisher. Only publishers that are enabled will be considered for scraping.

Command Disable Publisher

disable:publisher <reference_id of publisher>

Use this command to disable a publisher. Disabled publishers are simply ignored during scraping.


The license for the source code can be found in the LICENSE file.








