Data specification
umakaparser generates JSON data, which consists of the following fields.
- meta_data
- prefixes
- classes
- properties
- inheritance_structure
Each field is described below.
meta_data has the following fields.
- endpoint: URL of the target SPARQL endpoint
- crawl_date: timestamp at which the crawl finished
- triples: total triple count
- classes: total class count
- properties: total property count
umakaparser collects these data from a given file in the SPARQL Builder Metadata (SBM) format.
Below is an example showing where each value is extracted from.
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .
<service_node> a sd:Service ;
sd:endpoint <http://data.allie.dbcls.jp/sparql> ; #<<< endpoint
sd:defaultDataset <dataset_node> .
<dataset_node> a sd:Dataset ;
void:properties "33"^^xsd:long ; #<<< properties
void:classes "19"^^xsd:long ; #<<< classes
void:triples "150184110"^^xsd:long . #<<< triples
<dataset_node> sbm:crawlLog <crawllog_node> .
<crawllog_node> a sbm:CrawlLog ;
sbm:crawlEndTime "2016-11-15T22:01:49.071+09:00"^^xsd:dateTime . #<<< crawl_date
Prefixes are extracted from a given SBM file and from every ontology file passed to the build-index operation of umakaparser.
They are used to display URIs as QNames, and are defined as follows in the generated JSON file.
{
"prefixes": {
"xml": "http://www.w3.org/XML/1998/namespace",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#"
}
}
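As an illustration of how a consumer of this JSON might use the prefixes map, here is a minimal Python sketch that converts between QNames and full URIs. The function names `expand_qname` and `shorten_uri` are hypothetical and not part of umakaparser. The sketch also assumes that a QName with an unknown prefix (including the empty prefix used in `:LongForm`) should be left unchanged.

```python
def expand_qname(qname, prefixes):
    """Expand 'prefix:local' into a full URI using a prefixes mapping."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        # Assumption: anything we cannot resolve is returned untouched.
        return qname
    return prefixes[prefix] + local


def shorten_uri(uri, prefixes):
    """Shorten a full URI to a QName, preferring the longest matching namespace."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return uri
    return best[0] + ":" + uri[len(best[1]):]


# The prefixes object from the example above.
prefixes = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

print(expand_qname("rdfs:label", prefixes))
print(shorten_uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type", prefixes))
```

Choosing the longest matching namespace is a design choice of this sketch. It avoids ambiguous results when one namespace URI is a prefix of another.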
classes are described as follows in the generated JSON file.
{
"classes": {
":LongForm": {
"entities": 2638336,
"label": {
"en": "LongForm"
},
"rhs": [
[
":frequency",
null
],
[
"rdf:type",
"owl:Class"
],
[
"rdfs:label",
null
]
],
"lhs": [
[
":EachPair",
":hasLongFormOf"
]
]
}
}
}
Each key of classes is a class URI, and its value is an object with the following fields:
- entities: the number of instances of the class.
- label: an object that maps a language tag to the label of the class in that language, as extracted from a given ontology.
- subClassOf: an array of parent classes extracted from rdfs:subClassOf properties in the ontology (there can be multiple parent classes).
- rhs: an array of [property, class] pairs describing triples whose subjects belong to the key class; the class is the one to which the objects of those triples belong (details in the properties section).
- lhs: an array of [class, property] pairs describing triples whose objects belong to the key class; the class is the one to which the subjects of those triples belong (details in the properties section).
Below is an example of the SBM data from which these values are collected.
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .
<service_node> a sd:Service ;
sd:endpoint <http://data.allie.dbcls.jp/sparql> ; #<<< endpoint
sd:defaultDataset <dataset_node> .
<dataset_node> void:classPartition <classpartition_node> .
<classpartition_node> a void:Dataset ;
void:class <http://purl.org/allie/ontology/201108#LongForm> ; # <<< class URI
void:entities "2638336"^^xsd:long . # <<< entities
properties are described as follows in the generated JSON file.
{
"properties": [
{
"uri": ":frequency",
"triples": 8468287,
"class_relations": [
{
"triples": 3109687,
"object_class": "xsd:string",
"object_datatype": null,
"subject_class": ":EachPair"
},
{
"triples": 2638336,
"object_class": "xsd:string",
"object_datatype": null,
"subject_class": ":LongForm"
},
{
"triples": 743574,
"object_class": "xsd:string",
"object_datatype": null,
"subject_class": ":ShortForm"
}
]
}
]
}
properties has an array consisting of the following objects:
- uri: a URI of a property
- triples: the number of triples that have the property
- class_relations: an array that describes these triples grouped by the classes and datatypes of their subjects and objects, as follows:
  - subject_class: a URI of a class to which their subjects belong
  - object_class: a URI of a class to which their objects belong
  - object_datatype: a URI of a datatype when their objects are literals
  - triples: the number of triples that match the structure described by this entry
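As a small illustration of how the two triples counts relate, the Python sketch below groups the class_relations of the `:frequency` example by subject class and compares their sum with the property-level count. The function names are hypothetical. The specification does not say whether the class_relations counts are expected to add up to the property total, so the difference is only reported, not interpreted.

```python
def triples_by_subject(prop):
    """Sum class_relations triples per subject_class for one property object."""
    totals = {}
    for rel in prop.get("class_relations", []):
        subject = rel["subject_class"]
        totals[subject] = totals.get(subject, 0) + rel["triples"]
    return totals


def relation_coverage(prop):
    """Return (triples covered by class_relations, triples not covered)."""
    covered = sum(rel["triples"] for rel in prop.get("class_relations", []))
    return covered, prop["triples"] - covered


# The ":frequency" entry from the properties example above.
frequency = {
    "uri": ":frequency",
    "triples": 8468287,
    "class_relations": [
        {"triples": 3109687, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":EachPair"},
        {"triples": 2638336, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":LongForm"},
        {"triples": 743574, "object_class": "xsd:string",
         "object_datatype": None, "subject_class": ":ShortForm"},
    ],
}

per_subject = triples_by_subject(frequency)
covered, uncovered = relation_coverage(frequency)
print(per_subject)
print(covered, uncovered)
```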
These data are extracted from SBM data that have the following structure.
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sbm: <http://sparqlbuilder.org/2015/09/rdf-metadata-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .
<service_node> a sd:Service ;
sd:endpoint <http://data.allie.dbcls.jp/sparql> ;
sd:defaultDataset <dataset_node> .
<dataset_node> a sd:Dataset ;
void:propertyPartition <propertypartition_node> .
<propertypartition_node> a void:Dataset ;
void:property <http://purl.org/allie/ontology/201108#frequency> ; # <<< properties[].uri
void:triples "8468287"^^xsd:long . # <<< properties[].triples
<propertypartition_node> sbm:classRelation <classrelation_node1> .
<classrelation_node1> a sbm:ClassRelation ;
sbm:subjectClass <http://purl.org/allie/ontology/201108#EachPair> ; # <<< properties[].class_relations[].subject_class
sbm:objectDatatype xsd:string ; # <<< properties[].class_relations[].object_datatype
void:triples "3109687"^^xsd:long . # <<< properties[].class_relations[].triples
<propertypartition_node> sbm:classRelation <classrelation_node2> .
<classrelation_node2> a sbm:ClassRelation ;
sbm:subjectClass <http://purl.org/allie/ontology/201108#LongForm> ; # <<< properties[].class_relations[].subject_class
sbm:objectClass <http://purl.org/allie/ontology/201108#ResearchArea> ; # <<< properties[].class_relations[].object_class
void:triples "2638336"^^xsd:long . # <<< properties[].class_relations[].triples
inheritance_structure has a tree structure extracted from the rdfs:subClassOf properties in the given ontologies.
These ontologies are passed as files to the build-index operation of umakaparser.
Below is an example.
{
"inheritance_structure": {
"uri": ":Pair",
"children": [
{
"uri": ":PairCluster"
},
{
"uri": ":EachPair"
}
]
}
}
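A tree of this shape can be traversed recursively. The Python sketch below walks the example depth-first and prints it as an indented outline. The function name `walk` is hypothetical, and the sketch assumes that a node without a children field is a leaf, as in the example.

```python
def walk(node, depth=0):
    """Yield (depth, uri) for a node and all of its descendants, depth-first."""
    yield depth, node["uri"]
    # Assumption: a missing "children" key means the node is a leaf.
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# The inheritance_structure example from above.
tree = {
    "uri": ":Pair",
    "children": [
        {"uri": ":PairCluster"},
        {"uri": ":EachPair"},
    ],
}

rows = list(walk(tree))
for depth, uri in rows:
    print("  " * depth + uri)
```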