Data specification

Yasunori edited this page Apr 15, 2022 · 1 revision

Data specification generated by umakaparser

umakaparser generates JSON data, which consists of the following fields.

  • meta_data
  • prefixes
  • classes
  • properties
  • inheritance_structure
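
As a rough sketch of how a consumer might check this top-level layout, the Python snippet below parses a JSON document and reports which of the five fields are missing. The function name missing_fields and the inline sample document are assumptions made for illustration; they are not part of umakaparser.

```python
import json

# The five top-level fields listed above.
EXPECTED_FIELDS = ("meta_data", "prefixes", "classes", "properties", "inheritance_structure")


def missing_fields(document):
    """Return the expected top-level fields absent from a parsed JSON object."""
    return [name for name in EXPECTED_FIELDS if name not in document]


# Hypothetical, heavily truncated document used only to exercise the check.
sample = json.loads('{"meta_data": {}, "prefixes": {}, "classes": {}}')
print(missing_fields(sample))
```

An empty result means every expected field is present; the check says nothing about the contents of each field.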

Each field is described below.

meta_data

meta_data has the following fields.

  • endpoint: URL of the target SPARQL endpoint
  • crawl_date: timestamp at which the crawl finished
  • triples: total triple count
  • classes: total class count
  • properties: total property count

umakaparser reads these values from a given file in the SPARQL Builder Metadata (SBM) format. The example below marks the triples from which each value is extracted.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .

<service_node> a sd:Service ;
	sd:endpoint <http://data.allie.dbcls.jp/sparql> ; #<<< endpoint
	sd:defaultDataset <dataset_node> .

<dataset_node> a sd:Dataset ;
	void:properties "33"^^xsd:long ;     #<<< properties
	void:classes "19"^^xsd:long ;        #<<< classes
	void:triples "150184110"^^xsd:long . #<<< triples
  
<dataset_node> sbm:crawlLog <crawllog_node> .

<crawllog_node> a sbm:CrawlLog ;
	sbm:crawlEndTime "2016-11-15T22:01:49.071+09:00"^^xsd:dateTime . #<<< crawl_date
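
For the SBM example above, the resulting meta_data object might look like the Python dictionary below. The field names and values come from the example; the value types (integers for the counts, an ISO 8601 string for crawl_date) are assumptions, since the exact serialization is not specified here.

```python
import json
from datetime import datetime

# Assumed shape of meta_data for the SBM example above.
# Value types are a guess: counts as integers, crawl_date as an ISO 8601 string.
meta_data = {
    "endpoint": "http://data.allie.dbcls.jp/sparql",
    "crawl_date": "2016-11-15T22:01:49.071+09:00",
    "triples": 150184110,
    "classes": 19,
    "properties": 33,
}

# If crawl_date is indeed an ISO 8601 string, it parses into a timezone-aware datetime.
crawled_at = datetime.fromisoformat(meta_data["crawl_date"])

print(json.dumps({"meta_data": meta_data}, indent=2))
print(crawled_at.year, crawled_at.utcoffset())
```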

prefixes

Prefixes are extracted from the given SBM file and from every ontology file passed to the build-index operation of umakaparser. They are used to display URIs as QNames and appear in the generated JSON file as follows.

{
  "prefixes": {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  }
}
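
To illustrate how such a prefix map can turn a full URI into a QName, here is a minimal Python sketch. The helper to_qname is hypothetical and is not umakaparser's own implementation; it simply picks the longest namespace that the URI starts with.

```python
def to_qname(uri, prefixes):
    """Shorten a URI to prefix:local form using the longest matching namespace.

    Returns the URI unchanged when no namespace in `prefixes` matches.
    """
    best_prefix, best_namespace = None, ""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and len(namespace) > len(best_namespace):
            best_prefix, best_namespace = prefix, namespace
    if best_prefix is None:
        return uri
    return f"{best_prefix}:{uri[len(best_namespace):]}"


# The prefix map from the example above.
prefixes = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

print(to_qname("http://www.w3.org/2000/01/rdf-schema#label", prefixes))
print(to_qname("http://example.org/unknown", prefixes))
```

Preferring the longest match keeps the result unambiguous when one namespace is a prefix of another.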

classes

classes are described as follows in the generated JSON file.

{
  "classes": {
    ":LongForm": {
      "entities": 2638336,
      "label": {
        "en": "LongForm"
      },
      "rhs": [
        [
          ":frequency",
          null
        ],
        [
          "rdf:type",
          "owl:Class"
        ],
        [
          "rdfs:label",
          null
        ]
      ],
      "lhs": [
        [
          ":EachPair",
          ":hasLongFormOf"
        ]
      ]
    }
  }
}

Each key of classes is a class URI, and its value is an object with the following fields:

  • entities: the number of instances of the class.
  • label: an object that maps a language tag to the label of the class in that language, extracted from a given ontology.
  • subClassOf: an array of parent classes extracted from rdfs:subClassOf properties in the ontology (a class can have multiple parents).
  • rhs: an array of [property, class] pairs, one for each property whose triples have a subject in the key class; the class is the one to which the objects of those triples belong (details in the properties section).
  • lhs: an array of [class, property] pairs, one for each property whose triples have an object in the key class; the class is the one to which the subjects of those triples belong (details in the properties section).
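
The element order inside rhs and lhs differs (property first in rhs, class first in lhs), which is easy to mix up. The Python sketch below reads both arrays from the :LongForm example; the helper names outgoing and incoming are made up for illustration, and None simply mirrors the JSON null in the example.

```python
# The ":LongForm" entry from the example above, as a Python dictionary.
classes = {
    ":LongForm": {
        "entities": 2638336,
        "label": {"en": "LongForm"},
        "rhs": [[":frequency", None], ["rdf:type", "owl:Class"], ["rdfs:label", None]],
        "lhs": [[":EachPair", ":hasLongFormOf"]],
    }
}


def outgoing(classes, class_uri):
    """Yield (property, object_class) pairs for triples whose subject is in class_uri."""
    for prop, object_class in classes[class_uri].get("rhs", []):
        yield prop, object_class


def incoming(classes, class_uri):
    """Yield (subject_class, property) pairs for triples whose object is in class_uri."""
    for subject_class, prop in classes[class_uri].get("lhs", []):
        yield subject_class, prop


for prop, object_class in outgoing(classes, ":LongForm"):
    print(f":LongForm --{prop}--> {object_class}")
for subject_class, prop in incoming(classes, ":LongForm"):
    print(f"{subject_class} --{prop}--> :LongForm")
```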

The SBM data from which these values are collected are assumed to look like the following example.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .

<service_node> a sd:Service ;
	sd:endpoint <http://data.allie.dbcls.jp/sparql> ; #<<< endpoint
	sd:defaultDataset <dataset_node> .

<dataset_node> void:classPartition <classpartition_node> .
<classpartition_node> a void:Dataset ;
	void:class <http://purl.org/allie/ontology/201108#LongForm> ; # <<< class URI
	void:entities "2638336"^^xsd:long . # <<< entities

properties

properties are described as follows in the generated JSON file.

{
  "properties": [
    {
      "uri": ":frequency",
      "triples": 8468287,
      "class_relations": [
        {
          "triples": 3109687,
          "object_class": "xsd:string",
          "object_datatype": null,
          "subject_class": ":EachPair"
        },
        {
          "triples": 2638336,
          "object_class": "xsd:string",
          "object_datatype": null,
          "subject_class": ":LongForm"
        },
        {
          "triples": 743574,
          "object_class": "xsd:string",
          "object_datatype": null,
          "subject_class": ":ShortForm"
        }
      ]
    }
  ]
}

properties is an array of objects with the following fields:

  • uri: the URI of a property
  • triples: the number of triples that have the property
  • class_relations: an array that breaks these triples down by the classes and datatypes of their subjects and objects; each element has the following fields:
    • subject_class: the URI of the class to which the subjects belong
    • object_class: the URI of the class to which the objects belong
    • object_datatype: the URI of the datatype when the objects are literals
    • triples: the number of triples that match this combination of subject and object
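
As a small illustration of how the two triples counts relate, the Python sketch below groups the class_relations of the :frequency example by subject class and compares their sum with the property total. The helper triples_by_subject_class is hypothetical. In this example the per-relation counts add up to less than the property total, so a consumer should not assume the two always match.

```python
from collections import defaultdict

# The ":frequency" entry from the example above, as a Python dictionary.
frequency = {
    "uri": ":frequency",
    "triples": 8468287,
    "class_relations": [
        {"triples": 3109687, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":EachPair"},
        {"triples": 2638336, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":LongForm"},
        {"triples": 743574, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":ShortForm"},
    ],
}


def triples_by_subject_class(prop):
    """Sum the class_relations triple counts of one property per subject class."""
    totals = defaultdict(int)
    for relation in prop["class_relations"]:
        totals[relation["subject_class"]] += relation["triples"]
    return dict(totals)


by_class = triples_by_subject_class(frequency)
covered = sum(by_class.values())
print(by_class)
print("triples not covered by any listed relation:", frequency["triples"] - covered)
```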

These data are extracted from SBM data that have the following structure.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .

<service_node> a sd:Service ;
	sd:endpoint <http://data.allie.dbcls.jp/sparql> ;
	sd:defaultDataset <dataset_node> .

<dataset_node> a sd:Dataset ;
	void:propertyPartition <propertypartition_node> .

<propertypartition_node> a void:Dataset ;
	void:property <http://purl.org/allie/ontology/201108#frequency> ; # <<< properties[].uri
	void:triples "8468287"^^xsd:long .                                # <<< properties[].triples

<propertypartition_node> sbm:classRelation <classrelation_node1> .

<classrelation_node1> a sbm:ClassRelation ;
	sbm:subjectClass <http://purl.org/allie/ontology/201108#EachPair> ; # <<< properties[].class_relations[].subject_class
	sbm:objectDatatype xsd:string ; # <<< properties[].class_relations[].object_datatype
	void:triples "3109687"^^xsd:long . # <<< properties[].class_relations[].triples

<propertypartition_node> sbm:classRelation <classrelation_node2> .

<classrelation_node2> a sbm:ClassRelation ;
	sbm:subjectClass <http://purl.org/allie/ontology/201108#LongForm> ; # <<< properties[].class_relations[].subject_class
	sbm:objectClass <http://purl.org/allie/ontology/201108#ResearchArea> ; # <<< properties[].class_relations[].object_class
	void:triples "2638336"^^xsd:long . # <<< properties[].class_relations[].triples

inheritance_structure

inheritance_structure is a tree of classes built from the rdfs:subClassOf properties found in the ontologies. These ontologies are the files passed to the build-index operation of umakaparser. Below is an example.

{
  "inheritance_structure": {
    "uri": ":Pair",
    "children": [
      {
        "uri": ":PairCluster"
      },
      {
        "uri": ":EachPair"
      }
    ]
  }
}
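
A tree of this shape can be traversed recursively. The Python sketch below assumes, as in the example, that the root is a single node object and that leaf nodes omit the children key; the generator name walk is made up for illustration.

```python
def walk(node, depth=0):
    """Yield (uri, depth) for a node and all of its descendants, depth-first."""
    yield node["uri"], depth
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# The tree from the example above, as a Python dictionary.
inheritance_structure = {
    "uri": ":Pair",
    "children": [
        {"uri": ":PairCluster"},
        {"uri": ":EachPair"},
    ],
}

# Print the hierarchy with two spaces of indentation per level.
for uri, depth in walk(inheritance_structure):
    print("  " * depth + uri)
```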