Linked Data Fragments Server

On today's Web, Linked Data is published in different ways, which include data dumps, subject pages, and results of SPARQL queries. We call each such part a Linked Data Fragment.

The issue with the current Linked Data Fragments is that they are either so powerful that their servers suffer from low availability rates (as is the case with SPARQL), or they don't allow efficient querying.

Instead, this server offers Triple Pattern Fragments. Each Triple Pattern Fragment offers:

data that corresponds to a triple pattern.
metadata that consists of the (approximate) total triple count.
controls that lead to all other fragments of the same dataset.
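
To make the idea of a triple pattern concrete, here is a minimal sketch of how a client could build the URL of one fragment. It assumes that the pattern is passed as subject, predicate, and object query parameters, that the datasource is exposed at http://localhost:5000/dbpedia, and that a recent Node.js version is used. The helper name fragmentUrl is hypothetical and not part of the server.

```javascript
// Sketch: build the URL of a Triple Pattern Fragment.
// Assumption: the pattern travels as subject/predicate/object query parameters.
function fragmentUrl(datasourceUrl, pattern) {
  const params = [];
  for (const key of ['subject', 'predicate', 'object']) {
    // Components that are left out act as wildcards, so only add the ones that are set.
    if (pattern[key]) params.push(key + '=' + encodeURIComponent(pattern[key]));
  }
  return params.length ? datasourceUrl + '?' + params.join('&') : datasourceUrl;
}

// Hypothetical local datasource URL; asks for all triples with the rdf:type predicate.
const url = fragmentUrl('http://localhost:5000/dbpedia', {
  predicate: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
});
console.log(url);
```

Requesting such a URL should return the matching triples together with the count metadata and the controls described above.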

An example server is available at data.linkeddatafragments.org.

Install the server

This server requires Node.js 0.10 or higher and is tested on OS X and Linux. To install, execute:

$ [sudo] npm install -g ldf-server

Use the server

Configure the data sources

First, create a configuration file config.json similar to config-example.json, in which you detail your data sources. For example, this configuration uses an HDT file and a SPARQL endpoint as sources:

{
  "title": "My Linked Data Fragments server",
  "datasources": {
    "dbpedia": {
      "title": "DBpedia 2014",
      "type": "HdtDatasource",
      "description": "DBpedia 2014 with an HDT back-end",
      "settings": { "file": "data/dbpedia2014.hdt" }
    },
    "dbpedia-sparql": {
      "title": "DBpedia 3.9 (Virtuoso)",
      "type": "SparqlDatasource",
      "description": "DBpedia 3.9 with a Virtuoso back-end",
      "settings": { "endpoint": "http://dbpedia.restdesc.org/", "defaultGraph": "http://dbpedia.org" }
    }
  }
}

The following sources are supported out of the box:

HDT files (HdtDatasource with file setting)
N-Triples documents (TurtleDatasource with url setting)
Turtle documents (TurtleDatasource with url setting)
JSON-LD documents (JsonLdDatasource with url setting)
SPARQL endpoints (SparqlDatasource with endpoint and optionally defaultGraph settings)
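
The list above can be turned into a quick sanity check before starting the server. The following is only a sketch: checkConfig and REQUIRED_SETTINGS are hypothetical names, the server itself may validate its configuration differently, and a type that is not in the table may simply be a custom datasource.

```javascript
// Required settings per built-in datasource type, taken from the list above.
const REQUIRED_SETTINGS = {
  HdtDatasource: ['file'],
  TurtleDatasource: ['url'],
  JsonLdDatasource: ['url'],
  SparqlDatasource: ['endpoint'], // defaultGraph is optional
};

// Returns a list of problems; an empty list means nothing suspicious was found.
function checkConfig(config) {
  const problems = [];
  const datasources = config.datasources || {};
  for (const name of Object.keys(datasources)) {
    const source = datasources[name];
    const required = REQUIRED_SETTINGS[source.type];
    if (!required) {
      // Not necessarily an error: it could be a custom Datasource implementation.
      problems.push(name + ': unknown type ' + source.type);
      continue;
    }
    for (const key of required) {
      if (!source.settings || !(key in source.settings))
        problems.push(name + ': missing setting "' + key + '"');
    }
  }
  return problems;
}

// Usage with an inline configuration; the second source lacks its endpoint.
console.log(checkConfig({
  datasources: {
    dbpedia: { type: 'HdtDatasource', settings: { file: 'data/dbpedia2014.hdt' } },
    'dbpedia-sparql': { type: 'SparqlDatasource', settings: {} },
  },
}));
```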

Support for new sources is possible by implementing the Datasource interface.

Start the server

After creating a configuration file, execute:

$ ldf-server config.json 5000 4

Here, 5000 is the HTTP port on which the server will listen, and 4 the number of worker processes.

Now visit http://localhost:5000/ in your browser.

Reload running server

You can reload the server without any downtime in order to load a new configuration or version.
To do this, you need the process ID of the server's master process.
One way to obtain it is from the server logs:

$ bin/ldf-server config.json
Master 28106 running.
Worker 28107 running on http://localhost:3000/.

If you send the server a SIGHUP signal:

$ kill -s SIGHUP 28106

it will reload by replacing its workers.

Note that crashed or killed workers are always replaced automatically.
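
The reload steps can also be scripted. The sketch below assumes the log format shown above ("Master <pid> running.") and a recent Node.js version; parseMasterPid and reloadServer are hypothetical helper names. It uses process.kill, which sends a signal rather than necessarily terminating the process.

```javascript
// Extract the master process ID from log output such as "Master 28106 running."
function parseMasterPid(logText) {
  const match = /^Master (\d+) running\.$/m.exec(logText);
  return match ? parseInt(match[1], 10) : null;
}

// Ask the master process to reload; equivalent to `kill -s SIGHUP <pid>`.
function reloadServer(logText) {
  const pid = parseMasterPid(logText);
  if (pid === null) throw new Error('No master process ID found in the log');
  process.kill(pid, 'SIGHUP');
  return pid;
}

// Only the parsing step is run here, using the sample log from above.
const sampleLog = 'Master 28106 running.\nWorker 28107 running on http://localhost:3000/.';
console.log(parseMasterPid(sampleLog)); // 28106
```

Call reloadServer only with the log of a server you actually run, since it signals whatever process has the extracted ID.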

Using the LiveHdtDatasource

The LiveHdtDatasource was developed by Pablo Estrada as part of a Google Summer of Code 2015 project. This datasource allows the Linked Data Fragments Server to serve an up-to-date version of DBpedia while remaining scalable.

The LiveHdtDatasource uses an HDT file as its data source. It periodically polls the DBpedia Live feed for updates to the DBpedia dataset. When there are new updates, it downloads them and inserts them into temporary databases, which are used alongside the base HDT file. It also periodically generates a new HDT file that incorporates the changesets it has downloaded previously.
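
One way to picture how the temporary databases are used along with the base HDT file is as set arithmetic: the current data is the base data minus the removed triples, plus the added triples. The sketch below illustrates that idea only. It is an assumption about the semantics, not the datasource's actual code, and it represents triples as plain strings in memory.

```javascript
// Sketch: combine base data with added/removed triples as (base - removed) + added.
// Assumption: a triple never appears in both the added and the removed set.
function currentTriples(baseTriples, addedTriples, removedTriples) {
  const removed = new Set(removedTriples);
  const result = new Set();
  for (const triple of baseTriples)
    if (!removed.has(triple)) result.add(triple);
  for (const triple of addedTriples) result.add(triple);
  return Array.from(result);
}

// Placeholder triples: 'a' is removed, 'c' is added.
console.log(currentTriples(['a', 'b'], ['c'], ['a'])); // [ 'b', 'c' ]
```

Regenerating the HDT file then corresponds to writing this combined set out as the new base and emptying the two temporary databases.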

The LiveHdtDatasource requires a 'workspace' directory for its temporary databases and the newly generated HDT files. After the first initialization, it is recommended to let the LiveHdtDatasource manage its workspace independently.

A LiveHdtDatasource requires the following settings to be configured to run properly:

pollingInterval - This is the period, in minutes, between polls for new changesets in the DBpedia Live feed.
regeneratingInterval - This is the period, in minutes, between regenerations of the HDT file.
file - This is the HDT file containing the data. The LiveHdtDatasource will ignore and delete this file once a new HDT file has been generated.
workspace - This is the workspace directory where the LiveHdtDatasource manages its data. The directory must exist.
latestChangeset - This is the latest changeset that has been added to the HDT file, written as 'Year/Month/Day/Number'. Once the LiveHdtDatasource has started polling and applying changesets on its own, it ignores this parameter. Nonetheless, it is very important to provide this information on first startup.
addedTriplesDb/removedTriplesDb - These are the databases of added and removed triples. They default to 'added.db' and 'removed.db' within the workspace. It is recommended to leave these properties unset.
regeneratorScript - This is the script that regenerates the HDT file. It defaults to ./consolidate.sh in the bin/ directory.
changesetThreshold - This is the maximum number of changesets to be applied in a single apply cycle. If more than this number of changesets is retrieved, they are applied in several cycles. Default is 500.
hourStep - This is the maximum number of hours to try to download at once. This generally does not matter, since the pollingInterval is smaller, but the limit plays a role when the HDT file needs to be updated. Default is 25.
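
The settings above can be summarized in a short sketch that fills in the documented defaults and shows how changesetThreshold splits a large batch into several cycles. This is an illustration under assumptions: withDefaults and splitIntoCycles are hypothetical helpers, settings without a documented default are treated as required, the sample values are made up, and regeneratorScript is left to the datasource.

```javascript
const path = require('path');

// Assumption: settings without a documented default must be provided.
const REQUIRED = ['pollingInterval', 'regeneratingInterval', 'file', 'workspace', 'latestChangeset'];

// Fill in the defaults named in the list above; explicit settings win.
function withDefaults(settings) {
  for (const key of REQUIRED)
    if (settings[key] === undefined) throw new Error('Missing setting: ' + key);
  return Object.assign({
    addedTriplesDb: path.join(settings.workspace, 'added.db'),
    removedTriplesDb: path.join(settings.workspace, 'removed.db'),
    changesetThreshold: 500,
    hourStep: 25,
  }, settings);
}

// Split retrieved changesets into cycles of at most `threshold` items each.
function splitIntoCycles(changesets, threshold) {
  const cycles = [];
  for (let i = 0; i < changesets.length; i += threshold)
    cycles.push(changesets.slice(i, i + threshold));
  return cycles;
}

// Hypothetical sample values, for illustration only.
const settings = withDefaults({
  pollingInterval: 5,
  regeneratingInterval: 600,
  file: 'data/dbpedia2014.hdt',
  workspace: 'data/workspace',
  latestChangeset: '2015/06/01/000001',
});
const pending = new Array(1200).fill('changeset');
console.log(splitIntoCycles(pending, settings.changesetThreshold).map(c => c.length)); // [ 500, 500, 200 ]
```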

The default regenerator script uses hdt-iris, so it is recommended to install it and to set its location in the bin/consolidate.sh file.

(Optional) Set up a reverse proxy

A typical Linked Data Fragments server will be exposed on a public domain or subdomain along with other applications. Therefore, you need to configure the server to run behind an HTTP reverse proxy.
To set this up, configure the server's public URL in your server's config.json:

{
  "title": "My Linked Data Fragments server",
  "baseURL": "http://data.example.org/",
  "datasources": { … }
}

Then configure your reverse proxy to pass requests to your server. Here's an example for nginx:

server {
  server_name data.example.org;

  location / {
    proxy_pass http://127.0.0.1:3000$request_uri;
    proxy_set_header Host $http_host;
    proxy_pass_header Server;
  }
}

Change the value 3000 to the port on which your Linked Data Fragments server runs.

If you would like to proxy the data in a subfolder such as http://example.org/my/data, modify the baseURL in your config.json to "http://example.org/my/data" and change location from / to /my/data (excluding a trailing slash).

License

The Linked Data Fragments server is written by Ruben Verborgh.

This code is copyrighted by iMinds – Ghent University and released under the MIT license.
