To run the project, set up a virtual environment and install the dependencies:

```sh
python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt
```

The `main.py` script is the entry point for the project.
It expects a `.env` file to be present in the root of the project. This file should contain the following environment variables:

```sh
PMD_AUTH0_ENDPOINT=""
PMD_API_ENDPOINT=""
PMD_CLIENT_ID=""
PMD_CLIENT_SECRET=""
```

The `docker-compose.yml` is taken from the Airflow documentation.
To run Airflow for the first time, we must initialise an admin account:

```sh
docker compose up airflow-init
```

After this, we can start Airflow by starting all the containers:

```sh
docker compose up
```

An example CSV file and CSVW metadata file can be found in the `example-files` directory.
Given a CSV file and CSVW metadata file which describe a statistical dataset:

- Get the `dcat:Dataset`'s IRI.
- Construct a `dcat:CatalogRecord` IRI as `{DATASET_IRI}/record`.
- Add all triples from the CSVW metadata file to a named graph using the `dcat:CatalogRecord` IRI.
- Explicitly insert triples adding classes for the `csvw:Table`, `csvw:TableSchema` and `csvw:Column`s.
- Construct `PMDCAT` metadata and add it to a named graph using the `dcat:CatalogRecord` IRI.
- Run `csv2rdf` to convert the CSV file to RDF, producing RDF observations.
- Get the `qb:DataSet`'s IRI.
- Add all observation triples to a named graph using the `qb:DataSet` IRI.
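The IRI construction and named-graph insertion in the steps above can be sketched as follows. The function names and the SPARQL Update string are illustrative assumptions, not the project's actual code:

```python
def catalog_record_iri(dataset_iri: str) -> str:
    """Construct the dcat:CatalogRecord IRI as {DATASET_IRI}/record."""
    return dataset_iri.rstrip("/") + "/record"

def insert_into_named_graph(graph_iri: str, ntriples: str) -> str:
    """Build a SPARQL Update that adds the given N-Triples
    to a named graph identified by graph_iri."""
    return f"INSERT DATA {{ GRAPH <{graph_iri}> {{\n{ntriples}\n}} }}"
```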
- Craft the functions in `main.py` into an Airflow TaskFlow pipeline, following best practices.
- Take a similar approach for concept schemes as for datasets:
  - We have to construct `skos:narrower` relationships between concepts.
  - We have to construct `skos:hasTopConcept` relationships between the concept scheme and its top concepts.
- Run PMD validation tests and RDF Data Cube integrity constraints.
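The concept-scheme items above can be sketched as a triple builder. The hierarchy representation (a parent-to-children mapping) and the function name are illustrative assumptions; top concepts are taken to be concepts that never appear as a child:

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

def build_concept_triples(scheme_iri, hierarchy):
    """Emit skos:narrower and skos:hasTopConcept triples (as N-Triples
    lines) from a {parent_iri: [child_iri, ...]} hierarchy."""
    all_children = {c for kids in hierarchy.values() for c in kids}
    triples = []
    for parent, kids in hierarchy.items():
        # Link each concept to its narrower concepts.
        for child in kids:
            triples.append(f"<{parent}> <{SKOS}narrower> <{child}> .")
        # Concepts with no parent are the scheme's top concepts.
        if parent not in all_children:
            triples.append(f"<{scheme_iri}> <{SKOS}hasTopConcept> <{parent}> .")
    return triples
```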