Fork of datagouvfr-rdf applied to data.gov.uk metadata

Semantic data.gov.uk (0.8.5)

This project is a fork of datagouvfr-rdf, adapted to the British Open Data portal metadata (data.gov.uk).

You can fire SPARQL queries on the endpoint here.
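To query the endpoint from a script instead of the web form, here is a minimal Python sketch of a SPARQL 1.1 Protocol GET request that asks for JSON results. The endpoint URL is a placeholder, because the real address is only behind the link in the sentence above. The query assumes that datasets are typed dcat:Dataset in the data model. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): substitute the real SPARQL endpoint URL.
ENDPOINT = "http://localhost:3030/datagovuk/query"

# Assumes datasets are typed dcat:Dataset in the data model.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT (COUNT(?ds) AS ?n) WHERE { ?ds a dcat:Dataset }
"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> list:
    """Send the request and return the result bindings (needs a live endpoint)."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Only prints the encoded URL; call run_query() against a live endpoint.
    print(build_sparql_request(ENDPOINT, QUERY).full_url)
```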

This script is fully functional (not alpha, beta or whatnot).

Update script

build.xml is an Apache Ant script that runs the following tasks:

  1. Downloading the latest metadata dumps from data.gov.uk (CSV)
  2. Cleaning the data dumps (empty lines, spaces in CSV headers, etc.)
  3. Converting the CSV into RDF (using TARQL)
  4. Uploading the RDF to a repository
  5. Converting text identifiers into URIs for better linking across the data
  6. Integrating the output of beheader into the graph (soon)
  7. Adding some metadata about the resulting data set (DCAT, VoID, PROV)
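Step 2 is handled in build.xml. As an illustration only, the same kind of cleaning can be sketched in Python. Replacing spaces in header names with underscores is an assumption about how the headers are normalized, presumably so that each column maps cleanly onto a SPARQL variable in TARQL. The function name is hypothetical.

```python
import csv
import io


def clean_csv(raw: str) -> str:
    """Drop empty rows and replace spaces in header names with underscores.

    Hypothetical stand-in for the cleaning step in build.xml.
    """
    # csv.reader yields [] for blank lines and ['', ''] for lines of bare commas;
    # both are dropped because none of their cells contain text.
    rows = [
        row
        for row in csv.reader(io.StringIO(raw))
        if any(cell.strip() for cell in row)
    ]
    if not rows:
        return ""
    # Assumption: spaces in header names become underscores.
    header = [name.strip().replace(" ", "_") for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()
```

Using the csv module instead of plain line filtering keeps quoted fields that contain commas or line breaks intact.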

This script is run every night to update the RDF metadata.

The data model can be seen here.

Requirements

  • Apache Ant, with [ANT INSTALL]/bin directory added to your PATH environment variable
  • cURL, with [CURL INSTALL] directory added to your PATH environment variable
  • TARQL by Richard Cyganiak (@cygri), with [TARQL INSTALL] directory added to your PATH environment variable
  • An RDF repository. Apache Fuseki is a good choice, but there are plenty of others.
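The README lists cURL as a requirement for the Ant script, and the exact upload call lives in build.xml. One common way to push Turtle into a store such as Fuseki is the SPARQL 1.1 Graph Store HTTP Protocol. The Python sketch below only builds such a request: an HTTP PUT that replaces one named graph, using HTTP Basic authentication. The store URL, graph URI and function name are hypothetical, and this is not necessarily what build.xml does.

```python
import base64
import urllib.parse
import urllib.request


def build_upload_request(
    store_url: str, graph_uri: str, turtle: str, user: str, password: str
) -> urllib.request.Request:
    """Build a Graph Store Protocol PUT that replaces the named graph graph_uri."""
    # The target graph is passed in the ?graph= query parameter.
    url = store_url + "?" + urllib.parse.urlencode({"graph": graph_uri})
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "text/turtle; charset=utf-8",
            "Authorization": "Basic " + token,
        },
    )


# Usage (hypothetical values; sending requires a running store):
# req = build_upload_request("http://localhost:3030/ds/data",
#                            "http://example.org/graph/datagovuk",
#                            open("out.ttl", encoding="utf-8").read(),
#                            "admin", "secret")
# urllib.request.urlopen(req)
```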

Configuration

  • Copy upload_template.properties and rename it upload.properties
  • Open it and fill in the values. As provided, the template assumes your repository requires a user:password combination.
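upload.properties is a Java-style properties file read by Ant. If you want to reuse the same values from another tool, a minimal parser like the Python sketch below handles simple key=value and key: value lines. It deliberately ignores escapes, line continuations and whitespace-only separators. The key names in the example are hypothetical, not the ones in upload_template.properties.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value or key: value lines, skipping blanks and comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with # or ! are ignored, as in Java.
        if not line or line[0] in "#!":
            continue
        # Split on whichever of '=' or ':' appears first, so that values
        # such as URLs may themselves contain ':'.
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            props[line] = ""
            continue
        i = min(seps)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props
```

For example, a line repo.url=http://localhost:3030/ds yields the key repo.url with the full URL as its value.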

Run it

  • If the requirements are fulfilled, just run ant in the datagovuk-rdf root folder.
  • If you have already run the process and just want to reload the data in the triple store, run ant quick.

Next steps

  • Tell me!

Contact

I would love to read your feedback/comments/suggestions!

If you have a GitHub account, you can create an issue.

Otherwise, you can reach me:

Change log

0.8.5
  • Fixed malformed URLs by trimming trailing space before upload
0.8.4
  • Detection of machine-readable resources (dgfr:machineReadable)
0.8.3
  • Added backup-repository and load-backup targets to enable the management of the repository as a service
  • Added data integration from beheader
0.8.2
  • Fixed dcat:downloadUrl
0.8.1
  • Fixed missing directories (csv and rdf)

0.8.0

  • Adapted scripts and queries to data.gov.uk setup (#1)

Pre-fork change log

0.7.0

  • Added properties dgfr:responseStatusCode, dgfr:responseTime and dgfr:availabilityCheckedOn to the ontology and API configuration
  • Added a direct link between organizations and published distributions (see the result in the data model)
  • Added a view for unavailable resources in the API (https://www.data.maudry.com/fr/resources/unavailable)
  • Icons for boolean values (true/false) are clearer now

0.6.0

0.5.0

  • Availability and unavailability count at dataset and organization levels
0.4.3
  • Made SPARQL endpoint configuration more flexible
0.4.2
  • Fixed errors in ontology
0.4.1
  • Disabled archiving of RDF due to disk space. Will enable again when I have a clearer archiving strategy.

0.4.0

  • Calculation of popularity points for all objects, and aggregate sums on organizations and datasets
  • Integration of the data collected by beheader (availability of the distributions, content type, content length)
0.3.3
  • Enabled running the ETL on previously downloaded data to get CasanovaLD up faster
0.3.2
  • Not much...
0.3.1

0.3.0

  • The RDF data is now loaded in a single atomic transaction in the repository
  • Switched from Dydra (http://dydra.com) to a local Apache Fuseki instance
  • Added organizations and reuses data, with all identifiers turned into URIs for full linking
0.2.1
  • That was a lame name. Say hi to CasanovaLD!
  • Improved documentation

0.2.0

  • The data.gouv.fr explorer app, with somewhat documented APIs, is live!
  • URIs have changed to match the domain of the app
  • Added dgfr:visits and dcterms:keywords (as a comma-separated list, meh) in the data
0.1.5
0.1.4
  • Fixed missing properties (mismatch at conversion stage). Still no tags
0.1.3
  • Fixed RDF dataset modification date
0.1.2
  • Fixed resources that have spaces in their URLs (URL-encoded)
  • Added dgfr:slug for datasets
0.1.1
  • Configured upload and update of VoID and PROV metadata (in default graph)
  • Enabled scheduled task to update data every day

0.1.0

  • Script to download/clean/convert/publish data.gouv.fr dataset metadata
  • Basic documentation