This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

GSS-Cogs/poc-pipelines

poc-pipelines

To run the project, set up a virtual environment and install the dependencies:

python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt

The main.py script is the entry point for the project.

It expects a .env file to be present in the root of the project. This file should contain the following environment variables:

PMD_AUTH0_ENDPOINT=""
PMD_API_ENDPONT=""
PMD_CLIENT_ID=""
PMD_CLIENT_SECRET=""
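
How main.py reads this file is not shown here, so the following is only a minimal standard-library sketch of checking that the file is complete. The helper names `parse_env` and `missing_vars` are hypothetical, and the variable names are copied verbatim from the list above (including the spelling `PMD_API_ENDPONT`).

```python
# Variable names exactly as listed in this README.
REQUIRED_VARS = (
    "PMD_AUTH0_ENDPOINT",
    "PMD_API_ENDPONT",
    "PMD_CLIENT_ID",
    "PMD_CLIENT_SECRET",
)


def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_vars(values):
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    # Assumes the script is run from the project root, where .env lives.
    with open(".env", encoding="utf-8") as handle:
        missing = missing_vars(parse_env(handle.read()))
    print("Missing or empty variables:", missing or "none")
```

Running this before main.py gives an early, readable failure when a credential has been left as an empty string.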

The docker-compose.yml file is taken from the Apache Airflow documentation.

To run Airflow for the first time, we must first initialise it, which creates an admin account:

docker compose up airflow-init

After this, we can start Airflow by bringing up all the containers:

docker compose up

Docs

Steps

An example CSV file and CSVW metadata file can be found in the example-files directory.

Given a CSV file and CSVW metadata file which describe a statistical dataset:

  • Get the dcat:Dataset's IRI.
  • Construct a dcat:CatalogRecord IRI as {DATASET_IRI}/record.
  • Add all triples from the CSVW metadata file to a named graph using the dcat:CatalogRecord IRI.
  • Explicitly insert triples adding classes for the csvw:Table, the csvw:TableSchema and each csvw:Column.
  • Construct PMDCAT metadata and add it to a named graph using the dcat:CatalogRecord IRI.
  • Run csv2rdf to convert the CSV file to RDF, producing RDF observations.
  • Get the qb:DataSet's IRI.
  • Add all observation triples to a named graph using the qb:DataSet IRI.
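
The IRI and named-graph steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in main.py: the function names are hypothetical, the SPARQL update assumes the metadata triples link tables to schemas with `csvw:tableSchema` and hold columns as an RDF list under `csvw:column`, and `to_nquads` assumes one N-Triples statement per line with no trailing comments.

```python
CSVW = "http://www.w3.org/ns/csvw#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def catalog_record_iri(dataset_iri):
    """Build the dcat:CatalogRecord IRI as {DATASET_IRI}/record."""
    return f"{dataset_iri}/record"


def add_classes_update(graph_iri):
    """SPARQL update that types the table, its schema and each column."""
    return f"""PREFIX csvw: <{CSVW}>
PREFIX rdf: <{RDF}>
INSERT {{
  GRAPH <{graph_iri}> {{
    ?table a csvw:Table .
    ?schema a csvw:TableSchema .
    ?column a csvw:Column .
  }}
}}
WHERE {{
  GRAPH <{graph_iri}> {{
    ?table csvw:tableSchema ?schema .
    ?schema csvw:column/rdf:rest*/rdf:first ?column .
  }}
}}"""


def to_nquads(ntriples, graph_iri):
    """Place every N-Triples statement in the given named graph."""
    quads = []
    for line in ntriples.splitlines():
        statement = line.strip()
        if not statement or statement.startswith("#"):
            continue
        if not statement.endswith("."):
            raise ValueError(f"Not a complete statement: {statement!r}")
        # Insert the graph IRI before the terminating full stop.
        quads.append(f"{statement[:-1].rstrip()} <{graph_iri}> .")
    return "".join(quad + "\n" for quad in quads)


if __name__ == "__main__":
    dataset_iri = "http://example.org/dataset/example"  # placeholder IRI
    record_iri = catalog_record_iri(dataset_iri)
    print(record_iri)
    print(add_classes_update(record_iri))
    print(to_nquads('<http://example.org/s> <http://example.org/p> "v" .', record_iri))
```

The same `to_nquads` helper would serve both graphs: the metadata triples go into the graph named by the dcat:CatalogRecord IRI, and the csv2rdf observations into the graph named by the qb:DataSet IRI.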

To do

  • Craft the functions in main.py into an Airflow TaskFlow pipeline, following best practices.
  • Apply a similar approach to concept schemes as for datasets:
    • We have to construct skos:narrower relationships between concepts.
    • We have to construct skos:hasTopConcept relationships between the concept scheme and its top concepts.
  • Run PMD validation tests and check the RDF Data Cube integrity constraints.
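
For the concept scheme item, one possible way to derive the two SKOS relationships is sketched below. It assumes, purely for illustration, that each concept's parent is already known (for example from skos:broader statements or a parent column in the source CSV); the function name and input shape are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_scheme_triples(scheme_iri, broader):
    """Derive skos:narrower and skos:hasTopConcept triples.

    `broader` maps each concept IRI to its parent concept IRI,
    or to None when the concept has no parent.
    """
    triples = []
    for concept in sorted(broader):
        parent = broader[concept]
        if parent is None:
            # A concept without a parent is a top concept of the scheme.
            triples.append((scheme_iri, SKOS + "hasTopConcept", concept))
        else:
            # skos:narrower is the inverse of skos:broader.
            triples.append((parent, SKOS + "narrower", concept))
    return triples


if __name__ == "__main__":
    # Placeholder IRIs, not taken from the example files.
    hierarchy = {
        "http://example.org/concept/uk": None,
        "http://example.org/concept/wales": "http://example.org/concept/uk",
    }
    for triple in concept_scheme_triples("http://example.org/scheme/geo", hierarchy):
        print(triple)
```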
