This repository was archived by the owner on Jul 12, 2022. It is now read-only.

bartkl/SchemaTransformer

Schema Transformer

⛔️ DEPRECATED

This repository is deprecated and will soon be deleted entirely.

The project is continued here.

Contributing

Please read CODE_OF_CONDUCT.md and CONTRIBUTING.md for details on the process for submitting pull requests to us.

Architecture of the generators

Architectural overview of the generators

Vocabulary and Constraints used to define a profile

To define the profile metadata the following constructs are used for the transformation:

Profile (based on dx-prof)
prof:Profile
prof:ResourceDescriptor
prof:ResourceRole
prof:hasResource
prof:hasArtifact
dct:format
role:constrains
role:vocabulary

Profile Diagram

To define the vocabulary of the profile, the following constructs will be used for the transformation:

Vocabulary (owl, rdfs, rdf)
owl:Class
owl:DatatypeProperty
owl:ObjectProperty
owl:NamedIndividual
rdf:type
rdfs:comment
rdfs:range
rdfs:domain
rdfs:label
rdfs:subClassOf
rdfs:subPropertyOf

Vocabulary Diagram

To define the constraints, the following constructs will be used for the transformation:

Constraints (shacl)
sh:NodeShape
sh:targetClass
sh:datatype
sh:property
sh:minCount
sh:maxCount
sh:path
sh:in
sh:node

Constraint Diagram

Mapping specifications from rdfs/owl+shacl to schemas

Example profile

ExampleProfile

The purpose of the Example Profile is to provide a common starting point for transformation to different schema technologies. You will see that different schema technologies support different levels of expressiveness, and some allow for richer data structures than others. Two examples to illustrate this:

  • Some schema definitions provide structures to model sub-typing or inheritance (like UML, XSD, JSON Schema), others don't (like Apache Avro and Shacl). There are ways to transform a profile in such a way that the structural definitions do not get lost (even if the taxonomical 'knowledge' is no longer represented in the schema).
  • There are many situations in which we need to model a relationship between two objects of the same type. A schema language such as Apache Avro does not support this.

The challenge in schema generation is to recognize the strengths and weaknesses of each of these schemas and to generate an artifact that represents the intent of the profile without introducing new definitions or constructs. The example profile is meant to 'put the finger on the sore spot' for frequently used schema definitions so that we can properly explore the best possible transformation.

Apache Avro

Introduction

Apache Avro schema is a schema definition that is primarily used for modelling events on the Apache Kafka message queue.

Technical considerations

Apache Avro supports

  • objects
  • primitive attributes
  • relationships between objects
  • sub type (rdfs:subClassOf and/or sh:and) relations
    • the schema itself doesn't understand this, but the mapping can be configured such that it displays the correct label for the relationship
  • limited cardinality (0,1,*)
  • tree structure
  • enumerations
  • type unions
  • documentation within the schema

Apache Avro does not support

  • inheritance/subtypes
  • cardinality beyond 0,1,*
  • graph structure
  • Relation 'loops' (from A to B to C back to A or any variant)
    • Relations between objects of the same type are a special case of this limitation

Additional requirements for automatic generation:

  • requires definition of a Root Object

Notes

Apache Avro requires the definition of a Root Object. In the example profile, "B" was taken as the root object. Because Avro provides a tree structure and does not support relation 'loops', "E" is ignored in this schema.
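The pruning described in this note can be sketched as a depth-first walk from the root object that drops every relation pointing back to a shape already on the current path. This is a minimal illustration, not the generator's actual code: the function `build_tree`, the `relations` dictionary and the shape names (`Order`, `Customer`, `Item`) are hypothetical and are not taken from the example profile.

```python
from typing import Dict, List, Tuple


def build_tree(root: str, relations: Dict[str, List[str]]) -> Tuple[dict, List[Tuple[str, str]]]:
    """Walk the relation graph from `root`; return (tree, dropped relations)."""
    dropped: List[Tuple[str, str]] = []

    def visit(shape: str, path: Tuple[str, ...]) -> dict:
        children = {}
        for target in relations.get(shape, []):
            if target == shape or target in path:
                # Following this relation would close a loop, so it is left out.
                dropped.append((shape, target))
                continue
            children[target] = visit(target, path + (shape,))
        return children

    return {root: visit(root, ())}, dropped


# Hypothetical profile: Order -> Customer -> Order is a loop, and
# Customer -> Customer relates two objects of the same type.
relations = {
    "Order": ["Customer", "Item"],
    "Customer": ["Order", "Customer"],
    "Item": [],
}
tree, dropped = build_tree("Order", relations)
print(tree)
print(dropped)
```

Choosing a different root gives a different tree and a different set of dropped relations, which is why the root object has to be defined explicitly.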

Mapping table

| Avro schema | Profile | Notes |
| --- | --- | --- |
| record | sh:NodeShape | |
| name of the record | sh:targetClass | |
| field | sh:property | |
| type (field) | sh:node | if type is complex |
| type (field) | sh:datatype | |
| type = union (null, *) | sh:minCount | if < 1, otherwise not supported |
| type = array | sh:maxCount | if maxCount > 1; only supports cardinality of * |
| name of field | sh:path | |
| enum | sh:in | |
| add fields | sh:and | apply all sh:properties in the target (range) of the target shape |
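As a rough illustration of the rows above, the following sketch turns a simplified, dictionary-based stand-in for a node shape into an Avro record schema. It is not the generator's implementation. The input layout (`targetClass`, `properties`, `path`, `datatype`, `node`, `in`, `minCount`, `maxCount`), the function names and the `<path>Enum` naming of enums are assumptions made for this example, and an absent `maxCount` is assumed to mean cardinality `*`. Datatypes are given as Avro primitive names directly, and `sh:and` is left out.

```python
import json
from typing import Any, Dict, List, Union

AvroType = Union[str, List[Any], Dict[str, Any]]


def field_type(prop: Dict[str, Any]) -> AvroType:
    """Derive the Avro type of one field from a simplified sh:property."""
    if "in" in prop:
        # sh:in -> enum. Avro enums need a name; "<path>Enum" is made up here.
        base: AvroType = {"type": "enum", "name": prop["path"] + "Enum",
                          "symbols": list(prop["in"])}
    elif "node" in prop:
        # sh:node -> nested record, because the type is complex.
        base = to_avro_record(prop["node"])
    else:
        # sh:datatype, assumed to be an Avro primitive name already.
        base = prop["datatype"]

    max_count = prop.get("maxCount")
    if max_count is None or max_count > 1:
        # Only cardinality * can be expressed, so the exact bound is lost.
        return {"type": "array", "items": base}
    if prop.get("minCount", 0) < 1:
        # Optional value -> union with null.
        return ["null", base]
    return base


def to_avro_record(shape: Dict[str, Any]) -> Dict[str, Any]:
    """sh:NodeShape -> record, sh:targetClass -> name, sh:property -> field."""
    return {
        "type": "record",
        "name": shape["targetClass"],
        "fields": [{"name": p["path"], "type": field_type(p)}
                   for p in shape["properties"]],
    }


# Hypothetical shape, not part of the example profile.
address_shape = {
    "targetClass": "Address",
    "properties": [{"path": "city", "datatype": "string", "minCount": 1, "maxCount": 1}],
}
person_shape = {
    "targetClass": "Person",
    "properties": [
        {"path": "name", "datatype": "string", "minCount": 1, "maxCount": 1},
        {"path": "nickname", "datatype": "string", "minCount": 0, "maxCount": 1},
        {"path": "emails", "datatype": "string", "minCount": 0},
        {"path": "status", "in": ["ACTIVE", "INACTIVE"], "minCount": 1, "maxCount": 1},
        {"path": "address", "node": address_shape, "minCount": 1, "maxCount": 1},
    ],
}
schema = to_avro_record(person_shape)
print(json.dumps(schema, indent=2))
```

Nesting the target record inline mirrors the tree structure that Avro imposes, which is also why looping relations have to be pruned before a function like this is applied.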

Primitive mapping

| Avro schema primitive | XSD primitive | Notes |
| --- | --- | --- |
| string | xsd:string | |
| boolean | xsd:boolean | |
| bytes | xsd:decimal | "logicalType": "decimal" |
| float | xsd:float | |
| double | xsd:double | |
| fixed | xsd:duration | "logicalType": "duration" |
| string (conforming to ISO 8601) | xsd:dateTime | It is possible to map these to Avro int/long with logical type "timestamp-millis" or "timestamp-micros". We have found this leads to much confusion among developers, so we recommend mapping to ISO format (string). |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |
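A lookup along the lines of this table might look as follows. This is a sketch under assumptions: the function name `xsd_to_avro`, the default `precision` and `scale` values and the fixed type name `Duration` are invented for the example. The Avro specification does require a precision for the decimal logical type and a 12-byte fixed type for the duration logical type. XSD types for which the table lists no Avro type are rejected rather than guessed.

```python
from typing import Any, Dict, Union

AvroType = Union[str, Dict[str, Any]]


def xsd_to_avro(xsd_type: str, precision: int = 18, scale: int = 4) -> AvroType:
    """Map an XSD primitive to an Avro type, following the table above."""
    if xsd_type in ("xsd:string", "xsd:boolean", "xsd:float", "xsd:double"):
        # These map one-to-one onto the Avro primitive of the same name.
        return xsd_type.split(":", 1)[1]
    if xsd_type == "xsd:decimal":
        # precision and scale defaults are assumptions of this sketch.
        return {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale}
    if xsd_type == "xsd:duration":
        # The duration logical type annotates a fixed type of size 12.
        return {"type": "fixed", "name": "Duration", "size": 12,
                "logicalType": "duration"}
    if xsd_type == "xsd:dateTime":
        # Recommended mapping: an ISO 8601 string, not timestamp-millis.
        return "string"
    raise ValueError(f"no Avro mapping listed for {xsd_type}")


print(xsd_to_avro("xsd:decimal"))
print(xsd_to_avro("xsd:dateTime"))
```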

The Avro schema that is generated according to these rules from the example profile can be found here.

SQL

Introduction

SQL is actually a query language, but the definition of relational database structures is part of that language. Relational databases are probably still the most common approach to storing and retrieving data.

Technical considerations

SQL supports

  • objects
  • inheritance
  • primitive attributes
  • relationships between objects
  • sub-relations
  • limited cardinality (0,1,*)
  • graph structure
  • enumerations

SQL does not support

  • cardinality beyond 0,1,*

Additional requirements for automatic generation:

  • requires explicit identification of primary keys (which is good practice anyway ;-) )

Mapping table

| SQL | Profile | Notes |
| --- | --- | --- |
| table | sh:NodeShape | |
| name of the table | sh:targetClass | |
| column | sh:property | |
| | sh:node | create foreign key constraint |
| datatype | sh:datatype | |
| | sh:minCount | |
| | sh:maxCount | if maxCount > 1, do not create a column; instead create a link table with columns 'domain' and 'range' |
| name of the column | sh:path | |
| | sh:in | create new table with 1 column and use elements as primary keys |
| | rdfs:subClassOf | for super-class, create table as union of subclasses |
| | rdfs:subPropertyOf | |
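The column, foreign key and link table rules above could be sketched like this. It is an illustration only, with several assumptions that are not in the table: every table is assumed to have a primary key column named `id` of type `VARCHAR(36)`, `minCount >= 1` is rendered as `NOT NULL`, an absent `maxCount` is treated like `maxCount > 1`, and the input dictionary layout and the name `to_ddl` are hypothetical. The `sh:in` and `rdfs:subClassOf` rows are not covered.

```python
from typing import Any, Dict, List

KEY_TYPE = "VARCHAR(36)"  # assumption: all primary keys are "id" columns of this type


def to_ddl(shape: Dict[str, Any]) -> List[str]:
    """Return CREATE TABLE statements for one simplified node shape."""
    table = shape["targetClass"]
    columns: List[str] = []
    constraints: List[str] = ["  PRIMARY KEY (id)"]
    link_tables: List[str] = []
    for prop in shape["properties"]:
        name = prop["path"]
        is_relation = "node" in prop
        sql_type = KEY_TYPE if is_relation else prop["datatype"]
        max_count = prop.get("maxCount")
        if max_count is None or max_count > 1:
            # No column: a link table with 'domain' and 'range' instead.
            # Both names are quoted because they can clash with SQL keywords.
            range_ref = f' REFERENCES {prop["node"]} (id)' if is_relation else ""
            link_tables.append(
                f"CREATE TABLE {table}_{name} (\n"
                f'  "domain" {KEY_TYPE} NOT NULL REFERENCES {table} (id),\n'
                f'  "range" {sql_type} NOT NULL{range_ref}\n'
                ");"
            )
            continue
        not_null = " NOT NULL" if prop.get("minCount", 0) >= 1 else ""
        columns.append(f"  {name} {sql_type}{not_null}")
        if is_relation:
            # sh:node -> foreign key constraint on the column.
            constraints.append(f"  FOREIGN KEY ({name}) REFERENCES {prop['node']} (id)")
    body = ",\n".join(columns + constraints)
    return [f"CREATE TABLE {table} (\n{body}\n);"] + link_tables


# Hypothetical shape, not part of the example profile.
person_shape = {
    "targetClass": "Person",
    "properties": [
        {"path": "id", "datatype": "VARCHAR(36)", "minCount": 1, "maxCount": 1},
        {"path": "name", "datatype": "VARCHAR(255)", "minCount": 1, "maxCount": 1},
        {"path": "employer", "node": "Organization", "minCount": 0, "maxCount": 1},
        {"path": "knows", "node": "Person", "minCount": 0},
    ],
}
statements = to_ddl(person_shape)
print("\n\n".join(statements))
```

Because the link table refers to both ends by key, a relation between two objects of the same type (`knows` in this example) needs no special handling in SQL.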

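Primitive mapping

| SQL | XSD primitive | Notes |
| --- | --- | --- |
| VARCHAR() | xsd:string | |
| BOOLEAN | xsd:boolean | |
| DECIMAL() | xsd:decimal | |
| | xsd:float | |
| DOUBLE() | xsd:double | |
| | xsd:duration | |
| DATETIME() | xsd:dateTime | |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |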

OpenAPI/JSON Schema

Introduction

OpenAPI is a specification for describing HTTP APIs. It uses JSON Schema, a vocabulary for describing and validating JSON documents, to define the structure of request and response payloads.

Technical considerations

OpenAPI/JSON Schema supports

  • objects
  • inheritance/subtypes
  • primitive attributes
  • relationships between objects
  • cardinality
  • tree structure
  • enumerations
  • type unions

OpenAPI/JSON Schema does not support

  • sub-relations
    • because inheritance is approached structurally, through the allOf array, the super-relation has to be used
  • graph structure
  • out of the box type-validation for data instances
    • this can be circumvented/hacked by using the @type keyword borrowed from JSON-LD.
  • documentation within the schema

Additional requirements for automatic generation:

  • Supports definition of a Root Object (not required)

Mapping table

| OpenAPI/JSON Schema | Profile | Notes |
| --- | --- | --- |
| Object | sh:NodeShape | |
| | sh:targetClass | |
| properties | sh:property | |
| if type is complex | sh:node | |
| if type is primitive | sh:datatype | |
| minItems | sh:minCount | |
| maxItems | sh:maxCount | |
| name of the property | sh:path | |
| enum | sh:in | |
| allOf | rdfs:subClassOf | |
| not supported | rdfs:subPropertyOf | the way allOf validates does not allow us to replace the super property by the sub-property |
| anyOf | | if >1 sh:node specified, use anyOf to list all target types |
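The rows above can be illustrated with a small sketch that builds a JSON Schema object from a simplified, dictionary-based stand-in for a node shape. It is not the generator's code. The input layout, the function names and the `#/components/schemas/` reference prefix (the usual OpenAPI 3 location for reusable schemas) are assumptions of this example. The table does not say whether a property with `maxCount` 1 becomes a scalar, so this sketch follows the `minItems`/`maxItems` rows literally and always emits an array.

```python
import json
from typing import Any, Dict

REF_PREFIX = "#/components/schemas/"  # assumed location of the other schemas


def ref(name: str) -> Dict[str, str]:
    return {"$ref": REF_PREFIX + name}


def item_schema(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Schema of a single value of the property."""
    if "in" in prop:
        return {"enum": list(prop["in"])}           # sh:in -> enum
    if "node" in prop:
        nodes = prop["node"]                        # list of target shape names
        if len(nodes) > 1:
            return {"anyOf": [ref(n) for n in nodes]}
        return ref(nodes[0])
    return {"type": prop["datatype"]}               # JSON Schema type name


def to_json_schema(shape: Dict[str, Any]) -> Dict[str, Any]:
    properties: Dict[str, Any] = {}
    for prop in shape["properties"]:
        array: Dict[str, Any] = {"type": "array", "items": item_schema(prop)}
        if "minCount" in prop:
            array["minItems"] = prop["minCount"]    # sh:minCount -> minItems
        if "maxCount" in prop:
            array["maxItems"] = prop["maxCount"]    # sh:maxCount -> maxItems
        properties[prop["path"]] = array            # sh:path -> property name
    own = {"type": "object", "properties": properties}
    supers = shape.get("subClassOf", [])
    if supers:
        # rdfs:subClassOf -> allOf: the super-class schemas plus this one.
        return {"allOf": [ref(s) for s in supers] + [own]}
    return own


# Hypothetical shape, not part of the example profile.
employee_shape = {
    "targetClass": "Employee",
    "subClassOf": ["Person"],
    "properties": [
        {"path": "name", "datatype": "string", "minCount": 1, "maxCount": 1},
        {"path": "role", "in": ["ENGINEER", "MANAGER"], "minCount": 1, "maxCount": 1},
        {"path": "worksFor", "node": ["Company", "Department"], "minCount": 1},
    ],
}
schema = to_json_schema(employee_shape)
print(json.dumps(schema, indent=2))
```

Since `allOf` only adds constraints, a sub-class schema built this way cannot swap a super-property for a sub-property, which matches the "not supported" row for rdfs:subPropertyOf.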

Primitive mapping

| JSON Schema primitive | XSD primitive | Notes |
| --- | --- | --- |
| string | xsd:string | |
| boolean | xsd:boolean | |
| number | xsd:decimal | |
| number | xsd:float | |
| number | xsd:double | |
| string | xsd:duration | |
| string | xsd:dateTime | |
| string | xsd:time | |
| string | xsd:date | |
| string | xsd:anyURI | |

The JSON-schema that is generated according to these rules from the example profile can be found here.

Generic Mapping Table

Use these table templates to define a mapping for other schemas. Feel free to augment them where this makes sense.

Mapping table

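| Some schema | Profile | Notes |
| --- | --- | --- |
| | sh:NodeShape | |
| | sh:targetClass | |
| | sh:property | |
| | sh:node | |
| | sh:datatype | |
| | sh:minCount | |
| | sh:maxCount | |
| | sh:path | |
| | sh:in | |
| | rdfs:subClassOf | |
| | rdfs:subPropertyOf | |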

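Primitive mapping

| Some schema primitive | XSD primitive | Notes |
| --- | --- | --- |
| | xsd:string | |
| | xsd:boolean | |
| | xsd:decimal | |
| | xsd:float | |
| | xsd:double | |
| | xsd:duration | |
| | xsd:dateTime | |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |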

About

(⛔️ DEPRECATED) Tool to transform dx-prof/CIM501 profiles to a variety of schema
