Various DCAT tools for updating data.gov.be
Latest commit c15682e Feb 20, 2017 @barthanssens barthanssens Small improvements in mappings
Signed-off-by:Bart Hanssens <bart.hanssens@fedict.be>

DCAT tools

Various DCAT tools for harvesting metadata from Belgian open data portals, converting metadata to DCAT-AP files and updating the Belgian data.gov.be portal.

The portal itself is a Drupal 7 website, based on Fedict's Openfed distribution plus two extra modules: RestWS and RestWS i18n.

Data

Only interested in the result? The N-Triples and XML files (DCAT-AP-ish) used to update data.gov.be can be found in the dcat repository.

Overview of the tools

Components

Requirements

These tools run on the Oracle Java runtime 1.8 (1.7 will probably work, but has not been tested). They are meant for a headless machine, i.e. there is no GUI.

An Internet connection is required, although a proxy can be used.

Binaries can be found in dist/bin. Compiling from source requires the Oracle JDK and Maven.
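As an illustration of the proxy remark above, here is a minimal sketch of how a Java program can route HTTP requests through a proxy using only the standard library. It is not taken from these tools: the class name ProxySketch and the host and port in the usage note are hypothetical, and how the tools themselves are told about a proxy (most likely through their properties files) is not described here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of the DCAT tools.
public class ProxySketch {
    /** Describes an HTTP proxy without performing a DNS lookup. */
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    /** Prepares a connection routed through the proxy; nothing is sent yet. */
    public static URLConnection open(String url, Proxy proxy) throws IOException {
        return new URL(url).openConnection(proxy);
    }
}
```

A caller would use something like ProxySketch.open("http://data.gov.be/", ProxySketch.httpProxy("proxy.example.org", 8080)) and then read from the returned connection.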

Main parts

  • Helper classes: for storing scraped pages locally, conversion tools etc.
  • Various scrapers: getting metadata from various repositories and websites, and turning the metadata into DCAT files
  • DCAT enhancers: for improving the DCAT files, e.g. mapping site-specific themes, adding missing properties and preparing the files for updating data.gov.be
  • Data.gov.be updater: update the data.gov.be (Drupal 7) website using the enhanced DCAT files
  • Some tools: link checker

There is also a separate, stand-alone RDF validator project, which can be used to validate DCAT metadata regardless of whether the metadata is to be published on data.gov.be.

Steps

  • The various portals should be harvested using the scrapers (all is not a portal: it is the name used for the merge step below)
  • The resulting RDF files must be improved using the enhancers
  • The enhanced files can be uploaded to the data.gov.be portal
  • Then use the all enhancer to merge the files from the portals into one file, datagovbe.nt
  • Convert the merged file to an XML file, datagovbe_edp.xml, using the EDP tool
  • Upload both datagovbe.nt and datagovbe_edp.xml to GitHub
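To make the merge step above more concrete, here is a rough sketch of combining several N-Triples files into one. It is not the all enhancer itself, which may well use an RDF library. It relies only on the fact that N-Triples holds one statement per line, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the actual "all" enhancer.
public class NTriplesMergeSketch {
    /** Merges statement lines, skipping blanks, comments and exact duplicates. */
    public static List<String> merge(List<List<String>> inputs) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> lines : inputs) {
            for (String line : lines) {
                String t = line.trim();
                if (t.isEmpty() || t.startsWith("#")) {
                    continue;
                }
                seen.add(t);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Reads each per-portal file and writes the merged statements to target. */
    public static void mergeFiles(List<Path> sources, Path target) throws IOException {
        List<List<String>> inputs = new ArrayList<>();
        for (Path p : sources) {
            inputs.add(Files.readAllLines(p, StandardCharsets.UTF_8));
        }
        Files.write(target, merge(inputs), StandardCharsets.UTF_8);
    }
}
```

One caveat of this line-based approach: blank node labels are only unique within a file, so two files that reuse the same label would have their blank nodes conflated. A merge done with an RDF library avoids that.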

Configuration

All configuration is done using Java (plain text) properties files. Some examples can be found in dist/cfg.
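For readers unfamiliar with the format, the sketch below shows how a Java properties file is typically read with java.util.Properties. The keys used in the usage note are made up for illustration; the real keys are the ones in the examples under dist/cfg.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; the class name and any keys passed in are illustrative.
public class ConfigSketch {
    /** Parses properties-file text into key/value pairs. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the value for key, or fallback when the key is absent. */
    public static String get(Properties props, String key, String fallback) {
        return props.getProperty(key, fallback);
    }
}
```

Given the text "example.url=http://portal.example.org/" on one line, ConfigSketch.parse(...) yields a Properties object in which getProperty("example.url") returns that URL. Lines starting with # are treated as comments.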

See also the Notes.