Data pipeline to convert TSV files from the Statistisches Amt Basel-Stadt to N-Triples files. The SPARQL interface for accessing the data is available at https://ld.data-bs.ch/sparql/.
All scripts are configured as npm scripts and can be run like `npm run $script`.
`*-config.json` files contain details about the pipeline steps, file patterns, URL patterns, etc.
The `indikatoren-fetch` script clones the Indikatoren repository into the local folder. The actual repository URL is read from a property in the config.
The `indikatoren-generate-csv-metadata` script generates CSVW metadata files based on the Indikatoren metadata. Whether a CSVW metadata file is generated for a dataset depends on its `publishLod` flag. The generated CSVW metadata files are stored at a path defined in the config.
The `indikatoren-convert` script runs all steps of the tasks defined in the config. Tasks whose `abstract` property is set to the boolean value `true` are ignored. Depending on the `publishLod` flag, additional tasks are added at runtime. For details, see the code.
These scripts combine the `output` files of all defined tasks into a single file. Dynamic tasks based on the `publishLod` flag are also included. The combined file is uploaded via the Graph Store protocol to the endpoint defined in the config, and the named graph is read from the config as well. Environment variables such as `SPARQL_PASSWORD` are used for authentication.
All tasks are defined in the config. The `steps` property is reserved for an array of operation descriptions, which defines the logic of the pipeline task. Other properties can be defined to use them as variables in the operation arguments. If the value of the `steps` property is a string, the steps are imported from the task whose key equals that string.
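A hypothetical config fragment may help illustrate this. All task keys, file names, and operation names below are invented for illustration; the second task reuses the steps of the first by referencing its key as a string:

```json
{
  "tasks": {
    "bevoelkerung": {
      "input": "indikatoren/bevoelkerung.tsv",
      "output": "target/bevoelkerung.nt",
      "steps": [
        { "operation": "readFile", "arguments": ["${this.input}"] },
        { "operation": "writeFile", "arguments": ["${this.output}"] }
      ]
    },
    "haushalte": {
      "input": "indikatoren/haushalte.tsv",
      "output": "target/haushalte.nt",
      "steps": "bevoelkerung"
    }
  }
}
```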
The operation for each step of each task is defined in the config, together with the arguments for the operation. The argument values are evaluated as ES6 template strings. The task (`tasks/*`) is used as the `this` context for the evaluation.
Template strings can be used to define reusable tasks. For example, properties like `output`, defined on the task, can be used as an argument to read or write a file.
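A minimal sketch of how such an evaluation could work — the helper below is an illustration, not the pipeline's actual code: the raw argument string is wrapped in backticks and evaluated with the task object bound as `this`.

```javascript
// Illustrative helper (an assumption, not the project's real implementation):
// evaluate an argument string as an ES6 template string with the task as `this`.
function evaluateArgument (template, task) {
  // Wrap the raw string in backticks and evaluate with `this` bound to the task
  return new Function('return `' + template + '`').call(task)
}

// A task object as it might appear under tasks/* in the config
const task = { output: 'target/bevoelkerung.nt' }
console.log(evaluateArgument('${this.output}', task))
```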
Each operation returns a stream, and all streams are combined into a single pipeline. Not all operations are covered in this documentation, as most of them have self-describing names.
Generates a triple with a WKT literal for each feature in the given GeoJSON.
Two arguments are required:
- The IRI of the subject
- The IRI of the predicate
Both arguments are evaluated as ES6 template strings.
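The idea can be sketched as follows. The function name, the example IRIs, and the restriction to `Point` geometries are assumptions for illustration; the real operation is not limited to points:

```javascript
// Illustrative sketch: emit one triple with a WKT literal per GeoJSON feature.
// Only Point geometries are handled here.
function featureToTriple (feature, subjectIri, predicateIri) {
  const [lng, lat] = feature.geometry.coordinates
  const wkt = `POINT (${lng} ${lat})`
  return `<${subjectIri}> <${predicateIri}> ` +
    `"${wkt}"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .`
}

const feature = { geometry: { type: 'Point', coordinates: [7.59, 47.56] } }
console.log(featureToTriple(
  feature,
  'https://ld.data-bs.ch/example/1',
  'http://www.opengis.net/ont/geosparql#asWKT'
))
```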
Generates metadata triples for the dataset based on the Indikatoren metadata.
Two arguments are required:
- The filename of the JSON file
- The IRI of the dataset
The CSVW metadata files are generated from the Indikatoren metadata JSON files and some static values. The value of the `delimiter` property is copied from the JSON file into the CSVW metadata; if the property is not defined, `\t` is used. The `tableSchema` property from the JSON file is merged with static values into the `tableSchema` property of the CSVW metadata.
The static values are:

- A virtual column for the type
- A virtual column for the `dataSet` link from the observation to the dataset
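A CSVW metadata fragment with these two static virtual columns could look roughly like this. The use of the RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`), the placement of the delimiter under `dialect.delimiter`, and the example dataset URL are assumptions, not taken from the source:

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "dialect": { "delimiter": "\t" },
  "tableSchema": {
    "columns": [
      {
        "virtual": true,
        "propertyUrl": "rdf:type",
        "valueUrl": "http://purl.org/linked-data/cube#Observation"
      },
      {
        "virtual": true,
        "propertyUrl": "http://purl.org/linked-data/cube#dataSet",
        "valueUrl": "https://ld.data-bs.ch/example/dataset"
      }
    ]
  }
}
```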
Keep in mind that the URL properties in the CSVW metadata files are processed as URI Templates, whose syntax differs from that of ES6 template strings.
The following properties are required. `aboutUrl` is used as the subject of the observation triples; therefore it should follow a pattern in which `$CSV_KEY_COLUMN_*` stands for the name of a column in the CSV file. All columns of the primary key must be included, and static values within the key are possible. See existing datasets for more details.
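An `aboutUrl` with a two-column primary key might look like this — the dataset URL and the column names `JAHR` and `WOHNVIERTEL_ID` are invented. Note the URI Template syntax `{COLUMN}`, as opposed to the `${…}` syntax of ES6 template strings:

```json
{
  "aboutUrl": "https://ld.data-bs.ch/example/{JAHR}/{WOHNVIERTEL_ID}"
}
```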
The value of the `columns` property is an array of triple descriptions. Each object is mapped into a triple with the `aboutUrl` as subject. The `propertyUrl` property is used for the predicate. The value of the `titles` property identifies the column whose value is used as the object of the triple. If the value of the column defined in `titles` is empty for a row, the triple description is ignored. The `virtual` property can be set to `true` to force the triple description anyway; this can be used for static values. Static values require a `valueUrl`, which is used to generate a named node object. For literal objects, the `datatype` property can be set to use a specific datatype.
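Two hypothetical entries of the `columns` array — a regular column producing an integer literal, and a virtual column producing a static named node. Column names, property URLs, and the type IRI are invented for illustration:

```json
{
  "columns": [
    {
      "titles": "ANZAHL",
      "propertyUrl": "https://ld.data-bs.ch/example/anzahl",
      "datatype": "integer"
    },
    {
      "virtual": true,
      "propertyUrl": "rdf:type",
      "valueUrl": "http://purl.org/linked-data/cube#Observation"
    }
  ]
}
```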
The configuration is read from a config file. The Indikatoren config is dynamically extended based on the `publishLod` flag. The Indikatoren repository contains all input data and metadata as TSV or JSON files, including a metadata file for each dataset. The `publishLod` flag in the Indikatoren metadata files controls whether a dataset is processed by the pipeline or ignored.
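A minimal Indikatoren metadata fragment with the flag set could look like this. Apart from `publishLod`, `delimiter`, and `tableSchema`, which are mentioned above, all properties and values are invented:

```json
{
  "id": "example-indikator",
  "delimiter": ";",
  "publishLod": true,
  "tableSchema": { "columns": [] }
}
```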