Imagine a jQuery-style autocompletion widget without hardcoded data options, built using Linked Data. This project contains a proof of concept of how the source data for such an application can be fragmented and hosted.
- Create a docker volume: `docker volume create fragments_volume`
- Optional: If a different volume name was chosen, update the volume mappings of both services in `docker-compose.yml`.
- Gather all input data sources into one directory
- In `docker-compose.yml`, replace `/data/dumps/path` on line 7 with the chosen directory. This directory will be mounted as `/input` in the `files` container (a sketch of this wiring follows the run instructions below)
- Create `files/config.json`, using `files/example_config.json` as a template (a hypothetical example follows this list).
  - `maxFileHandles` is the maximum number of file handles the fragmenter may have open; 1024 is a common limit set by operating systems.
  - `outDir` can remain unchanged; this is a mounted volume determined by `docker-compose.yml`
  - `domain` is used as the root URI on which every fragment's identifier is based, so it is technically not just the domain but also the protocol, the base path, ...
  - `tasks` is a list of all datasets, and how they should be processed:
    - `input` is the path to the file, which should be in the `/input` directory as determined by `docker-compose.yml`
    - `name` will become part of each fragment's path, to keep the fragmented datasets separate
    - `properties` is a list of all predicates (URIs) to fragment this dataset on
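
For illustration, here is a hypothetical `config.json` with all four fields; every value below (paths, domain, dataset name, property URI) is a placeholder, not something taken from this repository:

```json
{
  "maxFileHandles": 1000,
  "outDir": "/output",
  "domain": "https://data.example.org/autocomplete/",
  "tasks": [
    {
      "input": "/input/labels.nt",
      "name": "labels",
      "properties": ["http://www.w3.org/2000/01/rdf-schema#label"]
    }
  ]
}
```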
Running `docker-compose build; docker-compose up` will then fragment all the given datasets, and serve them on `localhost:80`.
Running `docker-compose up server` will skip fragmenting the data (again), and will only serve the existing data fragments.
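
The volume wiring described in the setup steps might look like the following minimal `docker-compose.yml` sketch. The `files` and `server` service names come from this repository, but the nginx content-root path and everything else below are assumptions:

```yaml
version: "3"
services:
  files:
    build: ./files
    volumes:
      - fragments_volume:/output        # outDir from config.json
      - /data/dumps/path:/input         # replace with your data directory
  server:
    build: ./server
    ports:
      - "80:80"
    volumes:
      - fragments_volume:/usr/share/nginx/html   # assumed nginx content root
volumes:
  fragments_volume:
    external: true   # created manually with `docker volume create`
```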
- `files/`
  - `Dockerfile`: creates a runnable docker container by compiling the Java sources and copying the `config.json` file
  - `example_config.json`: template to create a `config.json` file from
  - `src/`: Java sources of the fragmenter
- `server/`
  - `Dockerfile`: copies the local `nginx.conf` into the default nginx container
  - `nginx.conf`: enables gzip compression, CORS, and caching headers (see the sketch after this list)
- `docker-compose.yml`: ensures that the `files` container writes to the content root of the `server` container
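
To illustrate what `nginx.conf` enables, here is a sketch using standard nginx directives; the MIME types, paths, and cache lifetime are assumptions, not a copy of the repository's actual file:

```nginx
# inside the http { } context
gzip on;                                   # gzip compression
gzip_types application/ld+json text/turtle application/n-triples;

server {
    listen 80;
    root /usr/share/nginx/html;            # content root written by the files container

    location / {
        # CORS: let the autocompletion widget fetch fragments cross-origin
        add_header Access-Control-Allow-Origin "*";
        # Caching: fragments are static once written
        add_header Cache-Control "public, max-age=86400";
    }
}
```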
Input data is processed in 3 steps:
- The data is parsed as an RDF stream (Apache Jena's StreamRDF); each triple or quad is processed separately
- Discovered literals are processed to obtain a set of fragment files to pipe the triples to (see the sketch after this list):
  - The literal is normalized to NFKD; filtered to just the Letters (L), Numbers (N) and Separator (Z) Unicode classes; and then lowercased
  - The normalized literal is tokenized by whitespace
  - Prefixes are extracted from each token
  - A writable StreamRDF is returned for each prefix, and the triple/quad is written to them
- Once all triples are processed, hypermedia links are added to the fragments
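
Below is a minimal Java sketch of the literal-processing step. It assumes that prefixes of every length are kept for each token; the class and method names are illustrative, not the actual fragmenter sources under `files/src/`:

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

public class PrefixExtractor {

    /** Normalize to NFKD, keep only the Letter (L), Number (N) and
     *  Separator (Z) Unicode classes, then lowercase. */
    static String normalize(String literal) {
        String decomposed = Normalizer.normalize(literal, Normalizer.Form.NFKD);
        return decomposed.replaceAll("[^\\p{L}\\p{N}\\p{Z}]", "").toLowerCase();
    }

    /** Tokenize the normalized literal by whitespace and collect every
     *  prefix of every token; each prefix maps to one fragment file. */
    static Set<String> prefixes(String literal) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : normalize(literal).split("\\s+")) {
            for (int end = 1; end <= token.length(); end++) {
                result.add(token.substring(0, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Évora" decomposes to "E" + combining acute + "vora"; the accent
        // (a Mark, class M) is dropped, leaving "evora",
        // which yields: e, ev, evo, evor, evora
        System.out.println(prefixes("Évora"));
    }
}
```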