Welcome to LinkedGeoData: Providing OpenStreetMap data as RDF

LinkedGeoData (LGD) is an effort to add a spatial dimension to the Web of Data / Semantic Web. LinkedGeoData uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. It interlinks this data with other knowledge bases in the Linking Open Data initiative.

The project web site can be found here. If you are running Ubuntu, this repository contains everything you need to transform OpenStreetMap data to RDF yourself. For other systems, please consider contributing adaptations of the existing scripts.

Debian package now available!

Technically, LinkedGeoData is a set of SQL files, database-to-RDF (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify. You can view the Sparqlify mappings for LinkedGeoData here. Therefore, if you want to install the LinkedGeoData Debian package, you also need the Sparqlify one.

For the latest version of the LinkedGeoData package, perform the following steps to set up the package source:

Create the file


and add the content

deb precise main contrib non-free

Import the public key with

wget -O - | apt-key add -

Now you can install LinkedGeoData using

sudo apt-get update
sudo apt-get install linkedgeodata

Alternatively, you can download both packages manually:

After installing these packages, the following essential commands will be available:

  • lgd-createdb (provided by linkedgeodata)
  • lgd-createdb-snapshot (provided by linkedgeodata)
  • sparqlify-tool (provided by sparqlify, supersedes the former lgd-query command)
  • Have a look at the section for additional tools

Read the section on data conversion for their documentation.

Alternative set up

In /bin you find the following setup helper scripts, which ease setting up LinkedGeoData directly from source, without a Debian package:

  • Installs all required system packages using apt-get (postgres, postgis, osmosis, git and maven)

The following scripts are just helpers to build and/or install the Sparqlify debian package. Mainly intended for development.

  • Builds a Sparqlify debian package from source and installs it.
  • Simply downloads and installs the latest Sparqlify debian package.

Do it yourself data conversion

This section describes how to create and query a LinkedGeoData database. After you have installed the LinkedGeoData scripts, you need to obtain the OpenStreetMap dataset you want to load. Note: Make sure to read the section on database tuning when dealing with larger datasets!

As for obtaining datasets, a very good source for OSM datasets in bite-size chunks is GeoFabrik. For full dumps, refer to the planet downloads.

In /bin you find several scripts. They are designed to work both from a cloned LinkedGeoData Git repo and when installed from the Debian package. All of them are configured via lgd.conf.dist. You can override the default settings without changing this file by creating an lgd.conf file. If you installed the Debian package, the file /etc/sparqlify/sparqlify.conf is used instead of lgd.conf.dist. If you are using the following scripts from the Git repo, invoke them with a leading ./ (don't forget the ./).
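
The override behaviour described above can be pictured as reading the defaults first and the local file second. The following is only a minimal sketch, assuming both files consist of plain shell variable assignments; the function name load_lgd_config and the variable names used in the usage note are hypothetical and not taken from the LGD scripts.

```shell
# Hypothetical helper (not part of LinkedGeoData): load the shipped defaults,
# then let an optional lgd.conf in the same directory override them.
load_lgd_config() {
    conf_dir="$1"
    # Defaults shipped with the repository.
    . "$conf_dir/lgd.conf.dist"
    # Local overrides; skipped when the file is absent.
    if [ -f "$conf_dir/lgd.conf" ]; then
        . "$conf_dir/lgd.conf"
    fi
}
```

With a lgd.conf.dist that sets two (made-up) variables dbHost and dbName, and a lgd.conf that sets only dbName, calling load_lgd_config on that directory leaves dbHost at its default and dbName at the overridden value.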

  • lgd-createdb-snapshot: A slightly experimental, but possibly much faster, version of the lgd-createdb script. Probably the lgd-createdb command will eventually refer to this version.

  • lgd-createdb: Creates and loads an LGD database

    • -h postgres host name
    • -d postgres database name
    • -U postgres user name
    • -W postgres password (added to ~/.pgpass if not already present)
    • -f .pbf file to load (other formats currently not supported)


lgd-createdb -h localhost -d lgd -U postgres -W mypwd -f bremen-latest.osm.pbf

The reason we chose Bremen for the example is simply that it is a small file (around 8MB).
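
The -W option stores the password in ~/.pgpass when it is not there yet. A rough sketch of such an "append if missing" step is shown below. It relies on the standard PostgreSQL password file format (host:port:database:user:password); the function name add_pgpass_entry and the explicit port argument are assumptions made for illustration, and this is not the code the LGD scripts actually run.

```shell
# Hypothetical helper (not part of LinkedGeoData): append a pgpass entry
# unless an identical line already exists.
# Arguments: pgpass file, host, port, database, user, password.
add_pgpass_entry() {
    pgpass_file="$1"                 # normally ~/.pgpass
    entry="$2:$3:$4:$5:$6"           # host:port:database:user:password
    touch "$pgpass_file"
    chmod 600 "$pgpass_file"         # libpq ignores the file if group/world can read it
    # -x matches whole lines, -F treats the entry as a fixed string.
    if ! grep -qxF -- "$entry" "$pgpass_file"; then
        printf '%s\n' "$entry" >> "$pgpass_file"
    fi
}
```

For the Bremen example this could be called as add_pgpass_entry ~/.pgpass localhost 5432 lgd postgres mypwd, where 5432 is the usual default PostgreSQL port. Passwords containing a colon or backslash would need escaping, which this sketch does not handle.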

  • sparqlify-tool: This is a small wrapper for the sparqlify command that adds a simple profile system for convenience.
    • -P profile name. Settings will be loaded from such a file (see below) and can be overridden by further options.
    • -h database host name
    • -d database name
    • -U database user name
    • -W database password (added to ~/.pgpass if not already present)
    • -Q SPARQL query string or named query

Here is an example of a profile file, which is assumed to be located at /etc/sparqlify/profiles.d/lgd-example.conf. This file will be deployed when installing the linkedgeodata debian package.


A named query is just a SPARQL query that is referenced by a name. The mapping of a name to a SPARQL query is configured via /etc/sparqlify/sparqlify.conf.

Currently, the following named queries exist:

  • ontology: Creates N-Triples output with all classes and properties
  • dump: Creates a full dump of the database


    sparqlify-tool -P lgd-example ontology
    sparqlify-tool -P lgd-example dump
    sparqlify-tool -h localhost -d lgd -U postgres -W mypwd -Q 'Construct { ?s ?p ?o } { ?s a <> . ?s ?p ?o }'
    sparqlify-tool -P lgd-example -Q 'Select * { ?s ?p ?o . Filter(?s = <>) }'

Note that Sparqlify is still in development and its supported features are currently limited. Still, basic graph patterns and equality constraints should work fine.

Additional tools

  • lgd-osm-replicate-sequences: Converts a timestamp to a sequence ID. This is similar to mazdermind's replicate sequences tool; however, our version does not require a local index. Instead, our tool combines binary search with linear interpolation: first, the two most recent state.txt files are fetched from the given repository URL, then the time difference between them is computed, and based on linear interpolation a sequence ID close to the given timestamp is computed. This process is repeated recursively.
lgd-osm-replicate-sequences -u "" -t "2017-05-28T15:00:00Z"

# The above command from the debian package is a wrapper for:

java -cp linkedgeodata-debian/target/linkedgeodata-debian-*-jar-with-dependencies.jar \
    "org.aksw.linkedgeodata.cli.command.osm.CommandOsmReplicateSequences" \
    -u "" -t "2017-05-28T15:00:00Z"

The output is (presently a subset of) the appropriate state.txt file, i.e. the one whose timestamp is strictly less than the timestamp given as the argument.
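
To illustrate the interpolation idea, here is a minimal sketch of a single estimation step. It assumes that state files are published at a roughly constant interval and that all timestamps have already been converted to epoch seconds; the function name estimate_sequence is hypothetical and this is not the tool's actual implementation, which additionally re-fetches state files and repeats the step until the result converges.

```shell
# Hypothetical helper (not part of LinkedGeoData): one interpolation step.
# Arguments: newest sequence id, its epoch timestamp, the epoch timestamp
# of the sequence published just before it, and the target epoch timestamp.
estimate_sequence() {
    newest_seq="$1"; newest_ts="$2"; previous_ts="$3"; target_ts="$4"
    interval=$((newest_ts - previous_ts))   # seconds between two state files
    behind=$((newest_ts - target_ts))       # how far the target lies in the past
    # Round the number of steps up so that, under the constant-interval
    # assumption, the estimate does not land after the target.
    steps=$(( (behind + interval - 1) / interval ))
    echo $((newest_seq - steps))
}
```

With GNU date, an ISO timestamp can be turned into epoch seconds via date -u -d "2017-05-28T15:00:00Z" +%s before calling the function. Because real publication intervals vary, the estimate is only a starting point for the subsequent search.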


Note that the timestamp format is compatible with osmconvert, which can report the timestamp of the most recent data item in an OSM data file. Hence, these tools can be combined in order to find the state.txt file from which to proceed with replication.

timestamp=`osmconvert --out-timestamp "data.osm.pbf"`
lgd-osm-replicate-sequences -u "url-to-repo" -t "$timestamp"
# Use the -d option to obtain the (d)uration between the most recently published files
lgd-osm-replicate-sequences -u "" -d
# This yields simply the output (possibly off by a few seconds)
# 86400

PostgreSQL Database Tuning

It is recommended to tune the database according to these recommendations. Here is a brief summary: Edit /etc/postgresql/9.1/main/postgresql.conf and set the following properties:

    shared_buffers       = 2GB #recommended values between 25% - 40% of available RAM, setting assumes 8GB RAM
    effective_cache_size = 4GB #recommended values between 50% - 75% of available RAM, setting assumes 8GB RAM
    checkpoint_segments  = 256
    checkpoint_completion_target = 0.9
    autovacuum = off # This can be re-enabled once loading has completed

    work_mem             = 256MB # Used for sorting; each connection may use this much per sort, so choose a significantly lower value if many connections sort concurrently
    maintenance_work_mem = 256MB

Furthermore, allow more shared memory, otherwise postgres won't start: Append the following line to /etc/sysctl.conf:

    #Use more shared memory max

    # Note: The amount (specified in bytes) for kernel.shmmax must be greater than the shared_buffers setting above
    #4GB = 4294967296
    #8GB = 8589934592

Make the changes take effect:

    sudo sysctl -p
    sudo service postgresql restart
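
The percentages in the comments above can be turned into concrete values for machines with a different amount of RAM. The sketch below takes the lower bounds (25% for shared_buffers, 50% for effective_cache_size) and sets kernel.shmmax to twice shared_buffers; that factor, like the function name suggest_pg_settings, is an assumption chosen for illustration and not a recommendation from the project.

```shell
# Hypothetical helper (not part of LinkedGeoData): print suggested settings
# for a machine with the given amount of RAM in megabytes.
suggest_pg_settings() {
    ram_mb="$1"
    shared_buffers_mb=$((ram_mb / 4))       # 25% of RAM
    effective_cache_mb=$((ram_mb / 2))      # 50% of RAM
    # kernel.shmmax is given in bytes and must exceed shared_buffers;
    # twice shared_buffers is an arbitrary margin chosen for this sketch.
    shmmax_bytes=$((shared_buffers_mb * 2 * 1024 * 1024))
    echo "shared_buffers = ${shared_buffers_mb}MB"
    echo "effective_cache_size = ${effective_cache_mb}MB"
    echo "kernel.shmmax = ${shmmax_bytes}"
}
```

For 8GB of RAM (suggest_pg_settings 8192) this reproduces the 2GB and 4GB values listed above, expressed in megabytes, together with a kernel.shmmax of 4294967296 bytes. On Linux, the RAM size in megabytes can be read with awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo.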


The contents of this project are licensed under the GPL v3 License.
