Semantic data.gov.uk (0.8.5)
This project is a fork of datagouvfr-rdf, adapted to the British Open Data portal metadata (data.gov.uk).
You can fire SPARQL queries on the endpoint here.
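For example, a query along these lines should list a handful of datasets, assuming the DCAT modelling described in the update script below; the exact classes, properties and graph layout exposed by the endpoint may differ.

```sparql
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

# List ten datasets with their titles (assumes DCAT-style metadata)
SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dcterms:title ?title .
}
LIMIT 10
```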
This script is fully functional (not beta or alpha or what not).
Update script
build.xml is an Apache Ant script that runs the following tasks:
- Downloading the latest metadata dumps from data.gov.uk (CSV)
- Cleaning the data dumps (empty lines, spaces in CSV headers, etc.)
- Converting the CSV into RDF (using TARQL; see the sketch after this list)
- Uploading the RDF to a repository
- Converting text identifiers into URIs for better linking across the data
- Integrating the output of beheader into the graph (soon)
- Adding some metadata about the resulting data set (DCAT, VoID, PROV)
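As a rough illustration of the conversion step, here is a minimal, hypothetical TARQL mapping (not the one used by this project). TARQL binds each CSV row to variables named after the cleaned column headers; ?id and ?title are assumed header names, and the base URI is a placeholder. The BIND shows how a plain-text identifier is turned into a URI, as in the linking step above.

```sparql
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

# Hypothetical mapping: ?id and ?title stand for CSV column headers
CONSTRUCT {
  ?uri a dcat:Dataset ;
       dcterms:identifier ?id ;
       dcterms:title ?title .
}
WHERE {
  # Build a URI from the plain-text identifier so datasets can be linked
  BIND (IRI(CONCAT("http://example.org/dataset/", ?id)) AS ?uri)
}
```

TARQL is run with such a mapping query and the CSV dump as input, and the resulting RDF is what gets uploaded to the repository.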
This script is run every night to update the RDF metadata.
The data model can be seen here.
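The last task in the list above attaches dataset-level metadata to the output. Purely as a hypothetical sketch (placeholder URIs and values, not the actual graph layout), that kind of statement could be added with a SPARQL update like this:

```sparql
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

# Hypothetical VoID description of the generated data set (placeholder values)
INSERT DATA {
  <http://example.org/void/datagovuk> a void:Dataset ;
      dcterms:title "data.gov.uk metadata as RDF" ;
      dcterms:modified "2016-01-01"^^xsd:date ;
      void:sparqlEndpoint <http://example.org/sparql> .
}
```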
Requirements
- Apache Ant, with [ANT INSTALL]/bin directory added to your PATH environment variable
- cURL, with [CURL INSTALL] directory added to your PATH environment variable
- TARQL by Richard Cyganiak (@cygri), with [TARQL INSTALL] directory added to your PATH environment variable
- An RDF repository. Apache Fuseki is a good choice, but there are plenty of others.
Configuration
- Copy upload_template.properties and rename it upload.properties
- Open it and fill it in. As-is, it assumes your repository requires a user:password combination
Run it
- If the requirements are fulfilled, just run ant in the datagovuk-rdf root folder.
- If you have already run the process and just want to reload the data into the triple store, run ant quick.
Next steps
- Tell me!
Contact
I would love to read your feedback/comments/suggestions!
If you have a Github account, you can create an issue.
Otherwise, you can reach me:
- by email: colin@maudry.com
- on Twitter: @CMaudry
Change log
0.8.5
- Fixed malformed URLs by trimming trailing space before upload
0.8.4
- Detection of machine-readable resources (dgfr:machineReadable)
0.8.3
- Added backup-repository and load-backup targets to enable the management of the repository as a service
- Added data integration from beheader
0.8.2
- Fixed dcat:downloadUrl
0.8.1
- Fixed missing directories (csv and rdf)
0.8.0
- Adapted scripts and queries to data.gov.uk setup (#1)
Pre-fork change log
0.7.0
- Added properties dgfr:responseStatusCode, dgfr:responseTime and dgfr:availabilityCheckedOn to the ontology and API configuration
- Added direct link between organizations and published distributions (see the result in the data model)
- Added a view for unavailable resources in the API (https://www.data.maudry.com/fr/resources/unavailable)
- Icons for boolean values (true/false) are clearer now
0.6.0
- Added ontology documentation (generated with OnToology, thanks @dgarijo). You can view it by following the ontology URI at http://colin.maudry.com/ontologies/dgfr/index.html
0.5.0
- Availability and unavailability count at dataset and organization levels
0.4.3
- Made SPARQL endpoint configuration more flexible
0.4.2
- Fixed errors in ontology
0.4.1
- Disabled archiving of RDF due to disk space constraints. Will re-enable it when I have a clearer archiving strategy.
0.4.0
- Calculation of popularity points for all objects, and aggregate sums on organisations and datasets
- Integration of the data collected by beheader (availability of the distributions, content type, content length)
0.3.3
- Enabled running the ETL with previously downloaded data, to bring CasanovaLD up quicker
0.3.2
- Not much...
0.3.1
- Updated the API documentation
- Updated VoID and PROV metadata to match the new repository location
0.3.0
- The RDF data is now loaded in a single atomic transaction in the repository
- Switch from Dydra (http://dydra.com) to a local Apache Fuseki instance
- Added organizations and reuses data, with all identifiers turned into URIs for full linking
0.2.1
- That was a lame name. Say hi to CasanovaLD!
- Improved documentation
0.2.0
- The data.gouv.fr explorer app, with somewhat documented APIs, is live!
- URIs have changed to match the domain of the app
- Added dgfr:visits and dcterms:keywords (as a comma-separated list, meh) to the data
0.1.5
- Redirections to the www. address were flaky on data.gouv.fr, so I had to specify the fully resolved address (e.g. http://www.data.gouv.fr/fr/datasets.csv)
0.1.4
- Fixed missing properties (mismatch at conversion stage). Still no tags
0.1.3
- Fixed RDF dataset modification date
0.1.2
- Fixed resources that have spaces in their URLs (by URL-encoding them)
- Added dgfr:slug for datasets
0.1.1
- Configured upload and update of VoID and PROV metadata (in default graph)
- Enabled scheduled task to update data every day
0.1.0
- Script to download/clean/convert/publish data.gouv.fr dataset metadata
- Basic documentation