TripleGeo is a utility developed by the Information Systems Management Institute at Athena Research Center under the EU/FP7 project GeoKnow: Making the Web an Exploratory for Geospatial Knowledge and the EU/H2020 Innovation Action SLIPO: Scalable Linking and Integration of big POI data. This general-purpose, open-source tool can be used for extracting features from geospatial files and databases and transforming them into RDF triples.
Initial releases of TripleGeo were based on the open-source utility geometry2rdf. Starting from version 1.2, the source code has been completely re-engineered, rewritten, and further enhanced towards scalable performance against big data volumes, as well as advanced support for more input formats and attribute schemata. TripleGeo is written in Java and is still under development; more enhancements will be included in future releases. However, all supported functionality has been tested and works smoothly on both MS Windows and Linux platforms.
- TripleGeo is a command-line utility and has several dependencies on open-source and third-party, freely redistributable libraries. The pom.xml file contains the project's configuration in Maven.
- Special note on JDBC drivers for database connections: If you wish to extract data from a geospatially-enabled DBMS (e.g., PostGIS), you have to either include the respective JDBC driver (e.g., postgresql-9.4-1206-jdbc4.jar) in the classpath at runtime or specify the respective dependency in the .pom file and then rebuild the application.
- Special note on manual installation of a JDBC driver for Oracle DBMS: Due to Oracle license restrictions, there are no public repositories that provide ojdbc7.jar (or any other Oracle JDBC driver) for enabling JDBC connections to an Oracle database. You need to download this jar from Oracle and install it in your local Maven repository using:
mvn install:install-file -Dfile=/<*YOUR_LOCAL_DIR*>/ojdbc7.jar -DgroupId=com.oracle -DartifactId=ojdbc7 -Dversion=188.8.131.52 -Dpackaging=jar
- Starting from version 1.3, TripleGeo includes support for custom transformation of thematic attributes according to the RDF Mapping Language (RML). To enable RML conversion mode, you need to install the RML-Mapper.jar specially prepared for TripleGeo execution in your local Maven repository using:
mvn install:install-file -Dfile=/<*YOUR_LOCAL_DIR*>/RML-Mapper.jar -DgroupId=be.ugent.mmlab.rml -DartifactId=rml-mapper -Dversion=0.3 -Dpackaging=jar
Building the application with maven:
mvn clean package
results in a jar file under the target directory, according to what has been specified in the pom.xml file.
TripleGeo supports two-way transformation of geospatial features:
- Transformation of geospatial datasets from various conventional formats into RDF data. TripleGeo supports mappings from the attribute schema of the input dataset to an ontology for RDF features that guides the transformation (i.e., creating RDF properties, constructing URIs, defining links between entities, etc.). Optionally, input features can also be classified into categories, provided that the user specifies a (possibly hierarchical, multi-tier) classification scheme (e.g., possible amenities for Points of Interest, a list of road types for a Road Network).
- Reverse Transformation of RDF data into de facto geospatial formats (currently, .CSV and ESRI shapefiles). TripleGeo retrieves data from a graph constructed on-the-fly from the RDF data and creates records with a geometry attribute and thematic attributes reflecting the underlying ontology of the input RDF data.
Explanation and usage tips for both transformation modules are given next. The current distribution (ver. 1.5) comes with dummy configuration templates
file_options.conf for geographical files (ESRI shapefiles, CSV, GPX, KML, etc.) and
dbms_options.conf for database contents (from PostGIS, Oracle Spatial, etc.). These files contain indicative values for the most important properties when accessing data from geographical files or a spatial DBMS. This release also includes a template
reverse_options.conf for reconverting RDF data back into geospatial file formats. Brief, self-contained instructions guide you through the extraction and reverse transformation processes.
Indicative configuration files and mappings for several cases are available here in order to assist you when preparing your own.
NOTE: All execution commands and configurations refer to the current version (TripleGeo ver. 1.5).
How to use TripleGeo in order to transform geospatial data into RDF triples:
- If triples are to be extracted from a geographical file (e.g., ESRI shapefiles) as specified in the user-defined configuration file in
./test/conf/shp_options.conf, and assuming that binaries are bundled together in
/target/triplegeo-1.5-SNAPSHOT.jar, give a command like this:
java -cp ./target/triplegeo-1.5-SNAPSHOT.jar eu.slipo.athenarc.triplegeo.Extractor ./test/conf/shp_options.conf
- If triples will be extracted from a geospatially-enabled DBMS (e.g., PostGIS), the command is essentially the same, but it specifies a suitable configuration file
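./test/conf/PostGIS_options.conf with all information required to connect and extract data from the DBMS, as well as runtime linking to the JDBC driver for enabling connections to PostgreSQL (assuming that this JDBC driver is located at ./lib/postgresql-9.4-1206-jdbc4.jar). The command below uses the MS Windows classpath separator (;); on Linux, use a colon (:) instead: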
java -cp ./lib/postgresql-9.4-1206-jdbc4.jar;./target/triplegeo-1.5-SNAPSHOT.jar eu.slipo.athenarc.triplegeo.Extractor ./test/conf/PostGIS_options.conf
- TripleGeo supports data in GML (Geography Markup Language) and KML (Keyhole Markup Language). It can also handle INSPIRE-aligned GML data for seven Data Themes (Annex I), as well as INSPIRE-aligned geospatial metadata. Any such transformation is performed via XSLT, as specified in the respective configuration settings (e.g.,
./test/conf/KML_options.conf) as follows:
java -cp ./target/triplegeo-1.5-SNAPSHOT.jar eu.slipo.athenarc.triplegeo.Extractor ./test/conf/KML_options.conf
Wait until the process finishes, and verify that the resulting output files conform to your specifications.
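The classpath separator differs between platforms (; on MS Windows, : on Linux), which matters whenever a JDBC driver is added to the classpath. The following is a minimal sketch, not part of TripleGeo, of how a launcher could assemble the same command portably using File.pathSeparator. The class name ExtractorCommand and the method build are hypothetical; the jar paths, main class, and configuration file are those used in the commands above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not shipped with TripleGeo) that assembles the
// "java -cp ..." command line with the platform's classpath separator.
public class ExtractorCommand {

    // File.pathSeparator is ";" on MS Windows and ":" on Linux,
    // so the resulting classpath is valid on either platform.
    static List<String> build(List<String> jars, String mainClass, String configFile) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(String.join(File.pathSeparator, jars));
        command.add(mainClass);
        command.add(configFile);
        return command;
    }

    public static void main(String[] args) {
        List<String> command = build(
                Arrays.asList("./lib/postgresql-9.4-1206-jdbc4.jar",
                              "./target/triplegeo-1.5-SNAPSHOT.jar"),
                "eu.slipo.athenarc.triplegeo.Extractor",
                "./test/conf/PostGIS_options.conf");
        // Only prints the command; handing the same list to
        // new ProcessBuilder(command).inheritIO().start() would launch it.
        System.out.println(String.join(" ", command));
    }
}
```

Printing the command first makes it easy to check the classpath before actually launching the extraction.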
How to use TripleGeo in order to transform RDF triples into a geospatial data file:
- In the configuration file, specify one or multiple files that contain the RDF triples that will be given as input to the reverse transformation process.
- You must specify a valid SPARQL SELECT query that will be applied against the RDF graph and will fetch the resulting records. The path to the file containing this SPARQL command must be specified in the configuration. It is assumed that the user is aware of the underlying ontology of the RDF graph. If the SPARQL query is not valid, then no results or only partial results may be retrieved. By default, the names of the variables in the SELECT clause will be used as attribute names in the output file.
- The current release of TripleGeo (ver. 1.5) supports .CSV delimited files and ESRI shapefiles as output formats for reverse transformation.
- In case of ESRI shapefile as output format, make sure that all input RDF geometries are of the same type (i.e., either points or lines or polygons), because shapefiles can only support a single geometry type in a given file.
- Once parameters have been specified in a suitable configuration file (e.g.,
./test/conf/shp_reverse.conf), execute the following command to launch the reverse transformation process:
java -cp ./target/triplegeo-1.5-SNAPSHOT.jar eu.slipo.athenarc.triplegeo.ReverseExtractor ./test/conf/shp_reverse.conf
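Since a shapefile can hold only one geometry type, it may help to inspect the input geometries before launching the reverse transformation. The sketch below is an illustration only and not part of TripleGeo: it assumes the geometries are available as WKT strings (optionally prefixed with a CRS URI in angle brackets, as in GeoSPARQL literals) and collects the distinct geometry type keywords. The class GeometryTypeCheck and its methods are hypothetical names. Note that this simple check compares keywords literally, so it treats, for example, POLYGON and MULTIPOLYGON as different types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-check (not shipped with TripleGeo): collects the
// distinct WKT geometry type keywords found in a list of geometries.
public class GeometryTypeCheck {

    // Returns the geometry type keyword (e.g., POINT) of a WKT string,
    // skipping an optional leading CRS URI such as <http://...>.
    static String typeOf(String wkt) {
        String s = wkt.trim();
        if (s.startsWith("<")) {
            s = s.substring(s.indexOf('>') + 1).trim();
        }
        int i = 0;
        while (i < s.length() && Character.isLetter(s.charAt(i))) {
            i++;
        }
        return s.substring(0, i).toUpperCase(Locale.ROOT);
    }

    // Returns the sorted set of distinct type keywords; a set with more
    // than one element means the input mixes geometry types.
    static Set<String> distinctTypes(List<String> wktGeometries) {
        Set<String> types = new TreeSet<>();
        for (String wkt : wktGeometries) {
            types.add(typeOf(wkt));
        }
        return types;
    }

    public static void main(String[] args) {
        List<String> geometries = Arrays.asList(
                "POINT (23.73 37.98)",
                "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT (21.73 38.24)",
                "LINESTRING (0 0, 1 1)");
        Set<String> types = distinctTypes(geometries);
        System.out.println("Geometry types found: " + types);
        System.out.println("Single type only: " + (types.size() <= 1));
    }
}
```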
The current version of TripleGeo utility can access geometries from:
- ESRI shapefiles, a widely used file-based format for storing geospatial features.
- Other widely used geographical file formats, including: GPX (GPS Exchange Format), GeoJSON, as well as OpenStreetMap (OSM) XML and PBF files.
- De facto data interchange formats with geometries specified as coordinate pairs: CSV (comma separated values), JSON.
- Geographical data stored in GML (Geography Markup Language) and KML (Keyhole Markup Language).
- INSPIRE-aligned datasets for seven Data Themes (Annex I) in GML format: Addresses, Administrative Units, Cadastral Parcels, Geographical Names, Hydrography, Protected Sites, and Transport Networks (Roads).
- Several geospatially-enabled DBMSs, including: Oracle Spatial and Graph, PostGIS extension for PostgreSQL, MySQL, Microsoft SQL Server, IBM DB2 with Spatial Extender, SpatiaLite, and ESRI Personal Geodatabases in Microsoft Access format.
Sample geographic datasets for testing are available in various file formats.
In terms of RDF serializations, triples can be obtained in one of the following formats: RDF/XML (default), RDF/XML-ABBREV, N-TRIPLES, N3, TURTLE (TTL).
Concerning geospatial representations, RDF triples can be exported according to these ontologies:
- the GeoSPARQL standard for several geometric types (including points, linestrings, and polygons);
- the WGS84 RDF Geoposition vocabulary for point features;
- the legacy Virtuoso RDF vocabulary for point features.
Resulting triples are written into local files, so that they can be readily imported into a triple store that supports the respective ontology.
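To illustrate how the first two representations differ for a point feature, the following sketch prints N-TRIPLES for one point using the GeoSPARQL properties hasGeometry and asWKT, and then using the lat and long properties of the WGS84 vocabulary. It is an illustration under stated assumptions, not TripleGeo's actual output: the feature URI under example.org, the "/geom" suffix for the geometry resource, the xsd:double datatype for coordinates, and the class name PointTriples are all hypothetical choices. The WKT literal lists longitude before latitude, as in the default CRS of GeoSPARQL.

```java
// Hypothetical illustration (not TripleGeo output): the same point
// expressed with the GeoSPARQL and the WGS84 vocabularies as N-TRIPLES.
public class PointTriples {

    static final String GEO = "http://www.opengis.net/ont/geosparql#";
    static final String WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // GeoSPARQL: the feature links to a geometry resource that carries
    // a WKT literal (longitude first, then latitude).
    static String geosparql(String featureUri, double lon, double lat) {
        String geomUri = featureUri + "/geom"; // assumed URI pattern
        return "<" + featureUri + "> <" + GEO + "hasGeometry> <" + geomUri + "> .\n"
             + "<" + geomUri + "> <" + GEO + "asWKT> \"POINT(" + lon + " " + lat
             + ")\"^^<" + GEO + "wktLiteral> .\n";
    }

    // WGS84 vocabulary: latitude and longitude are attached directly to
    // the feature (the xsd:double datatype is an assumption here).
    static String wgs84(String featureUri, double lon, double lat) {
        return "<" + featureUri + "> <" + WGS84 + "lat> \"" + lat + "\"^^<" + XSD + "double> .\n"
             + "<" + featureUri + "> <" + WGS84 + "long> \"" + lon + "\"^^<" + XSD + "double> .\n";
    }

    public static void main(String[] args) {
        String feature = "http://example.org/poi/1"; // hypothetical feature URI
        System.out.print(geosparql(feature, 23.73, 37.98));
        System.out.print(wgs84(feature, 23.73, 37.98));
    }
}
```

The GeoSPARQL form can describe linestrings and polygons as well by changing the WKT literal, whereas the WGS84 form is limited to points.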
TripleGeo has been used to transform a large variety of geospatial datasets into RDF. Amongst them:
- Exposing INSPIRE-aligned geospatial data and metadata for Greece as Linked Data through a SPARQL endpoint. This has been the first attempt to build an abstraction layer on top of the INSPIRE infrastructure based on GeoSPARQL concepts, thus making INSPIRE contents accessible and discoverable as linked data.
- Exposing Points of Interest (POI) as Linked Geospatial Data through this SPARQL endpoint. In this case, POI data extracted from OpenStreetMap across Europe has been transformed into RDF according to a comprehensive and vendor-agnostic OWL ontology for POI data, which enables modeling and representation of multifaceted and enriched POI profiles.
All Java classes and data structures developed for TripleGeo are fully documented in this Javadoc.
The contents of this project are licensed under the GPL v3 License.