This repository is deprecated and will soon be deleted entirely.
The project is continued here.
Please read CODE_OF_CONDUCT.md and CONTRIBUTING.md for details on the process for submitting pull requests to us.
To define the profile metadata, the following constructs are used for the transformation:
| Profile (based on dx-prof) |
|---|
| prof:Profile |
| prof:ResourceDescriptor |
| prof:ResourceRole |
| prof:hasResource |
| prof:hasArtifact |
| dct:format |
| role:constrains |
| role:vocabulary |
To define the vocabulary of the profile, the following constructs will be used for the transformation:
| Vocabulary (owl, rdfs, rdf) |
|---|
| owl:Class |
| owl:DatatypeProperty |
| owl:ObjectProperty |
| owl:NamedIndividual |
| rdf:type |
| rdfs:comment |
| rdfs:range |
| rdfs:domain |
| rdfs:label |
| rdfs:subClassOf |
| rdfs:subPropertyOf |
To define the constraints, the following constructs will be used for the transformation:
| Constraints (shacl) |
|---|
| sh:NodeShape |
| sh:targetClass |
| sh:datatype |
| sh:property |
| sh:minCount |
| sh:maxCount |
| sh:path |
| sh:in |
| sh:node |
The purpose of the Example Profile is to provide a common starting point for transformation to different schema technologies. You will see that different schema technologies support different levels of expressiveness, and some allow for richer data structures than others. Two examples illustrate this:
- Some schema definitions provide structures to model sub-typing or inheritance (like UML, XSD, JSON Schema), others don't (like Apache Avro and SHACL). There are ways to transform a profile in such a way that the structural definitions do not get lost (even if the taxonomical 'knowledge' is no longer represented in the schema).
- There are many situations in which we need to model a relationship between two objects of the same type. A schema language like Apache Avro does not support this.
The challenge in schema generation is to recognize the strengths and weaknesses of each of these schemas and generate an artifact that represents the intent of the profile without introducing new definitions or constructs. The example profile is meant to 'put the finger on the sore spot' for frequently used schema definitions so that we can properly explore the best possible transformation.
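As a rough illustration of the first example (keeping structure when the target schema has no sub-typing), the sketch below copies inherited properties down into each class. The mini-profile, the class names `Animal` and `Dog`, and the helper `flattened_properties` are all hypothetical and not part of the example profile.

```python
from typing import Dict, List, Optional

# Hypothetical mini-profile: class name -> super-class (or None) and own properties.
# The names Animal/Dog and their properties are illustrative only.
PROFILE: Dict[str, dict] = {
    "Animal": {"super": None, "properties": ["name"]},
    "Dog": {"super": "Animal", "properties": ["breed"]},
}


def flattened_properties(class_name: str, profile: Dict[str, dict]) -> List[str]:
    """Collect a class's own properties plus those of all its super-classes."""
    collected: List[str] = []
    current: Optional[str] = class_name
    seen = set()  # guards against an accidental cycle in the hierarchy
    while current is not None and current not in seen:
        seen.add(current)
        entry = profile[current]
        # Prepend so that the most general properties come first.
        collected = entry["properties"] + collected
        current = entry["super"]
    return collected


print(flattened_properties("Dog", PROFILE))
```

The resulting schema for `Dog` would carry both `name` and `breed`, so the structure survives even though the fact that a `Dog` is an `Animal` is no longer stated.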
Apache Avro schema is a schema definition that is primarily used for modelling events on the Apache Kafka message queue.
Apache Avro supports
- objects
- primitive attributes
- relationships between objects
- sub type (rdfs:subClassOf and/or sh:and) relations
  - the schema itself doesn't understand this, but the mapping can be configured such that it displays the correct label for the relationship
- limited cardinality (0,1,*)
- tree structure
- enumerations
- type unions
- documentation within the schema
Apache Avro does not support
- inheritance/subtypes
- cardinality beyond 0,1,*
- graph structure
- Relation 'loops' (from A to B to C back to A or any variant)
  - Relations between objects of the same type are a special case of this limitation
Additional requirements for automatic generation:
- requires definition of a Root Object
Apache Avro requires the definition of a Root Object. In the example profile, "B" was taken as the root object. Because Avro provides a tree structure and does not support relation 'loops', "E" is ignored in this schema.
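One possible way to derive a tree from a root object is to walk the relations outward and drop every relation that would lead back to an ancestor. The sketch below assumes a hypothetical relation graph (the names `Root`, `Child`, `Leaf` and `Orphan` are invented and are not the classes of the example profile) and is not necessarily how the generator itself decides what to ignore.

```python
from typing import Dict, List, Set

# Hypothetical relation graph: class -> classes it points to.
# These edges are invented for illustration; they are not the example profile.
RELATIONS: Dict[str, List[str]] = {
    "Root": ["Child"],
    "Child": ["Leaf", "Root"],  # Child -> Root would close a loop
    "Leaf": [],
    "Orphan": ["Root"],         # never reached when starting from Root
}


def tree_from_root(root: str, relations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Walk outward from root, keeping only relations that do not revisit an ancestor."""
    tree: Dict[str, List[str]] = {}

    def visit(node: str, ancestors: Set[str]) -> None:
        # Drop self-relations and relations that point back up the current path.
        kept = [t for t in relations.get(node, []) if t != node and t not in ancestors]
        tree[node] = kept
        for target in kept:
            visit(target, ancestors | {node})

    visit(root, set())
    return tree


print(tree_from_root("Root", RELATIONS))
```

Classes that cannot be reached from the root, and relations that would close a loop, simply do not appear in the result.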
| Avro schema | Profile | notes |
|---|---|---|
| record | sh:NodeShape | |
| name of the record | sh:targetClass | |
| field | sh:property | |
| type (field) | sh:node | if type is complex |
| type (field) | sh:datatype | |
| type = union (null, *) | sh:minCount | if minCount < 1; otherwise not supported |
| type = array | sh:maxCount | if maxCount > 1; only supports cardinality of * |
| name of field | sh:path | |
| enum | sh:in | |
| add fields | sh:and | Apply all sh:properties in the target (range) of the target shape |
| Avro schema primitive | XSD Primitive | notes |
|---|---|---|
| string | xsd:string | |
| boolean | xsd:boolean | |
| bytes | xsd:decimal | "logicalType": "decimal" |
| float | xsd:float | |
| double | xsd:double | |
| fixed | xsd:duration | "logicalType": "duration" |
| string (conforming to ISO 8601) | xsd:dateTime | It is possible to map these to Avro int/long with logical type "timestamp-millis" or "timestamp-micros". We have found this leads to much confusion among developers, so we recommend mapping to the ISO format (string). |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |
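The two mapping tables above can be read as a small set of rules. The following is a minimal sketch of those rules in Python, assuming a shape has already been parsed into a plain dictionary. The dictionary layout, the `Person` shape and the helper names are hypothetical, and treating a missing `maxCount` as unbounded is an assumption.

```python
import json
from typing import Any, Dict

# Subset of the primitive table above (rows that map to a plain Avro type).
XSD_TO_AVRO = {
    "xsd:string": "string",
    "xsd:boolean": "boolean",
    "xsd:float": "float",
    "xsd:double": "double",
    "xsd:dateTime": "string",  # ISO 8601 string, as recommended in the table
}


def field_type(prop: Dict[str, Any]) -> Any:
    """Derive the Avro type of one sh:property following the mapping table."""
    if "in" in prop:        # sh:in -> enum
        base: Any = {"type": "enum", "name": prop["path"] + "Enum", "symbols": prop["in"]}
    elif "node" in prop:    # sh:node -> reference to another record by name
        base = prop["node"]
    else:                   # sh:datatype -> primitive
        base = XSD_TO_AVRO[prop["datatype"]]
    max_count = prop.get("maxCount")
    if max_count is None or max_count > 1:
        # Assumption: a missing maxCount means unbounded; only '*' is expressible.
        base = {"type": "array", "items": base}
    if prop.get("minCount", 0) < 1:
        base = ["null", base]  # optional -> union with null
    return base


def record_schema(shape: Dict[str, Any]) -> Dict[str, Any]:
    """sh:NodeShape -> record, sh:targetClass -> record name, sh:property -> field."""
    return {
        "type": "record",
        "name": shape["targetClass"],
        "fields": [{"name": p["path"], "type": field_type(p)} for p in shape["properties"]],
    }


# Hypothetical shape, not taken from the example profile.
shape = {
    "targetClass": "Person",
    "properties": [
        {"path": "name", "datatype": "xsd:string", "minCount": 1, "maxCount": 1},
        {"path": "nickname", "datatype": "xsd:string", "maxCount": 1},
        {"path": "tags", "datatype": "xsd:string", "minCount": 1},
        {"path": "status", "in": ["ACTIVE", "INACTIVE"], "minCount": 1, "maxCount": 1},
    ],
}
print(json.dumps(record_schema(shape), indent=2))
```

Rows that need a logical type (`xsd:decimal`, `xsd:duration`) and the `sh:and` rule are left out to keep the sketch short.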
The Avro schema that is generated according to these rules from the example profile can be found here.
SQL is actually a query language, but defining the structures of a relational database is embedded within the query language. Relational databases are probably still the most common approach to storing and retrieving data.
SQL supports
- objects
- inheritance
- primitive attributes
- relationships between objects
- sub-relations
- limited cardinality (0,1,*)
- graph structure
- enumerations
SQL does not support
- cardinality beyond 0,1,*
Additional requirements for automatic generation:
- requires explicit identification of primary keys (which is good practice anyway ;-) )
| SQL | Profile | notes |
|---|---|---|
| table | sh:NodeShape | |
| name of the table | sh:targetClass | |
| column | sh:property | |
| | sh:node | create foreign key constraint |
| datatype | sh:datatype | |
| | sh:minCount | |
| | sh:maxCount | if maxCount > 1, do not create a column; instead create a link table with columns 'domain' and 'range' |
| name of the column | sh:path | |
| | sh:in | create new table with 1 column and use elements as primary keys |
| | rdfs:subClassOf | for super-class, create table as union of subclasses |
| | rdfs:subPropertyOf | |
| SQL | XSD Primitive | notes |
|---|---|---|
| VARCHAR() | xsd:string | |
| BOOLEAN | xsd:boolean | |
| DECIMAL() | xsd:decimal | |
| | xsd:float | |
| DOUBLE() | xsd:double | |
| | xsd:duration | |
| DATETIME() | xsd:dateTime | |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |
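To make the SQL rules above more tangible, here is a minimal sketch that turns one shape into `CREATE TABLE` statements. The input dictionary, the `Person` shape, the `id` primary key column and the column lengths are assumptions made for the example. `sh:minCount` is left unmapped, as in the table, and a missing `maxCount` is treated as unbounded.

```python
from typing import Any, Dict, List

# Subset of the primitive table above; lengths and precisions are assumed defaults.
XSD_TO_SQL = {
    "xsd:string": "VARCHAR(255)",
    "xsd:boolean": "BOOLEAN",
    "xsd:decimal": "DECIMAL(18, 4)",
    "xsd:dateTime": "DATETIME",
}


def ddl_for_shape(shape: Dict[str, Any]) -> List[str]:
    """Return CREATE TABLE statements for one shape, following the mapping table."""
    table = shape["targetClass"]
    columns = ["id VARCHAR(255) PRIMARY KEY"]  # assumed primary key column
    link_tables: List[str] = []
    for prop in shape["properties"]:
        name = prop["path"]
        if "node" in prop:  # sh:node -> foreign key constraint
            sql_type = f"VARCHAR(255) REFERENCES {prop['node']}(id)"
        else:               # sh:datatype -> column datatype
            sql_type = XSD_TO_SQL[prop["datatype"]]
        max_count = prop.get("maxCount")
        if max_count is None or max_count > 1:
            # maxCount > 1: no column, but a link table with 'domain' and 'range'.
            # The identifiers are quoted because they may be reserved in some dialects.
            link_tables.append(
                f'CREATE TABLE {table}_{name} ('
                f'"domain" VARCHAR(255) REFERENCES {table}(id), '
                f'"range" {sql_type});'
            )
            continue
        columns.append(f"{name} {sql_type}")
    main = f"CREATE TABLE {table} ({', '.join(columns)});"
    return [main] + link_tables


# Hypothetical shape, not taken from the example profile.
shape = {
    "targetClass": "Person",
    "properties": [
        {"path": "name", "datatype": "xsd:string", "maxCount": 1},
        {"path": "address", "node": "Address", "maxCount": 1},
        {"path": "emails", "datatype": "xsd:string"},
    ],
}
for statement in ddl_for_shape(shape):
    print(statement)
```

The `sh:in` and `rdfs:subClassOf` rules are omitted here. They would add a lookup table and a union table respectively.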
OpenAPI describes the request and response payloads of HTTP APIs using a dialect of JSON Schema.
OpenAPI/JSON Schema supports
- objects
- inheritance/subtypes
- primitive attributes
- relationships between objects
- cardinality
- tree structure
- enumerations
- type unions
OpenAPI/JSON Schema does not support
- sub-relations
  - because inheritance is approached structurally, through the allOf array, the super-relation has to be used
- graph structure
- out of the box type-validation for data instances
  - this can be circumvented/hacked by using the @type keyword borrowed from JSON-LD.
- documentation within the schema
Additional requirements for automatic generation:
- Supports definition of a Root Object (not required)
| OpenAPI JSON Schema | Profile | notes |
|---|---|---|
| Object | sh:NodeShape | |
| | sh:targetClass | |
| properties | sh:property | |
| if type is complex | sh:node | |
| if type is primitive | sh:datatype | |
| minItems | sh:minCount | |
| maxItems | sh:maxCount | |
| name of the property | sh:path | |
| enum | sh:in | |
| allOf | rdfs:subClassOf | |
| not supported | rdfs:subPropertyOf | the way allOf validates does not allow us to replace the super property by the sub-property |
| anyOf | | if more than one sh:node is specified, use anyOf to list all target types |
| JSON Schema primitive | XSD Primitive | notes |
|---|---|---|
| string | xsd:string | |
| boolean | xsd:boolean | |
| number | xsd:decimal | |
| number | xsd:float | |
| number | xsd:double | |
| string | xsd:duration | |
| string | xsd:dateTime | |
| string | xsd:time | |
| string | xsd:date | |
| string | xsd:anyURI | |
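The JSON Schema rules above could be sketched as follows. The input dictionary, the `Dog` shape and the helper names are hypothetical. Modelling every property as an array (so that `minItems` and `maxItems` apply) is an assumption, and the `#/components/schemas/` prefix follows the usual OpenAPI 3 layout.

```python
import json
from typing import Any, Dict

# Primitive table above, expressed as a lookup.
XSD_TO_JSON = {
    "xsd:string": "string", "xsd:boolean": "boolean",
    "xsd:decimal": "number", "xsd:float": "number", "xsd:double": "number",
    "xsd:duration": "string", "xsd:dateTime": "string", "xsd:time": "string",
    "xsd:date": "string", "xsd:anyURI": "string",
}


def ref(name: str) -> Dict[str, str]:
    """Reference to another object definition (assumed OpenAPI 3 components layout)."""
    return {"$ref": f"#/components/schemas/{name}"}


def property_schema(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Map one sh:property to a JSON Schema array definition."""
    if "in" in prop:        # sh:in -> enum
        item: Dict[str, Any] = {"enum": prop["in"]}
    elif "node" in prop:    # sh:node -> reference; several targets -> anyOf
        nodes = prop["node"] if isinstance(prop["node"], list) else [prop["node"]]
        refs = [ref(n) for n in nodes]
        item = refs[0] if len(refs) == 1 else {"anyOf": refs}
    else:                   # sh:datatype -> primitive type
        item = {"type": XSD_TO_JSON[prop["datatype"]]}
    schema: Dict[str, Any] = {"type": "array", "items": item}
    if "minCount" in prop:  # sh:minCount -> minItems
        schema["minItems"] = prop["minCount"]
    if "maxCount" in prop:  # sh:maxCount -> maxItems
        schema["maxItems"] = prop["maxCount"]
    return schema


def object_schema(shape: Dict[str, Any]) -> Dict[str, Any]:
    """sh:NodeShape -> object; rdfs:subClassOf -> allOf with the super-class first."""
    own = {
        "type": "object",
        "properties": {p["path"]: property_schema(p) for p in shape["properties"]},
    }
    if shape.get("subClassOf"):
        return {"allOf": [ref(shape["subClassOf"]), own]}
    return own


# Hypothetical shape, not taken from the example profile.
dog_shape = {
    "targetClass": "Dog",
    "subClassOf": "Animal",
    "properties": [
        {"path": "name", "datatype": "xsd:string", "maxCount": 1},
        {"path": "owner", "node": ["Person", "Company"], "minCount": 1},
    ],
}
print(json.dumps(object_schema(dog_shape), indent=2))
```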
The JSON-schema that is generated according to these rules from the example profile can be found here.
Use these table templates to define a mapping for other schemas. Feel free to augment them where this makes sense.
| Some schema | Profile | notes |
|---|---|---|
| | sh:NodeShape | |
| | sh:targetClass | |
| | sh:property | |
| | sh:node | |
| | sh:datatype | |
| | sh:minCount | |
| | sh:maxCount | |
| | sh:path | |
| | sh:in | |
| | rdfs:subClassOf | |
| | rdfs:subPropertyOf | |
| Some schema primitive | XSD Primitive | notes |
|---|---|---|
| | xsd:string | |
| | xsd:boolean | |
| | xsd:decimal | |
| | xsd:float | |
| | xsd:double | |
| | xsd:duration | |
| | xsd:dateTime | |
| | xsd:time | |
| | xsd:date | |
| | xsd:anyURI | |

