DataFAQs Core Services

Tim L edited this page Jul 7, 2014 · 125 revisions

What is first

What we will cover

This page provides details for each of the DataFAQs Core Services, which determine [which evaluation services to apply](Selecting the evaluation services to apply) and [which datasets to analyze](Selecting the datasets to analyze) during a given evaluation epoch. Five Dataset Selectors provide lists of datasets that should be evaluated during an epoch; four FAqT Service Selectors provide lists of evaluation services that can be applied to a dataset during an epoch; X Dataset Referencers provide pointers to other URLs that describe the dataset; and Y Dataset Augmenters provide supplemental dataset descriptions that can be used during evaluation.

For each of the Core Services (FAqT Service Selector, Dataset Selector, Dataset Referencer, and Dataset Augmenter), we'll show:

  • what it does,
  • how to use it (with an example),
  • where its source code lives,
  • where it is deployed (so you can call it right now, if you like)

So,

Let's get to it!

DataFAQs Core Services are [SADI](SADI Semantic Web Services framework) services, so to understand what they do we need to understand the RDF instance data that they accept and return. This page outlines each of the twelve DataFAQs Core Services, links to any deployed instances of the service, cites the input and output classes, and provides sample input and output instance data.

All twelve DataFAQs Core Services described on this page are listed in the SADI registry and can be queried with SPARQL at http://biordf.net/sparql in the named graph http://sadiframework.org/registry/. The implementations live in services/sadi/core. http://aquarius.tw.rpi.edu/projects/datafaqs/configure-epoch is an initial interface for configuring an epoch slice by selecting the services that list FAqT evaluation services and datasets to analyze.
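To peek at the registry graph mentioned above, one option is a plain SPARQL Protocol GET request. The sketch below only builds the request URL. The helper name `registry_query_url` and the generic `?s ?p ?o` pattern are illustrative assumptions, not part of DataFAQs, and the endpoint may no longer be online.

```python
from urllib.parse import urlencode

# Endpoint and named graph as given on this page.
ENDPOINT = "http://biordf.net/sparql"
REGISTRY_GRAPH = "http://sadiframework.org/registry/"


def registry_query_url(limit=10):
    """Build a SPARQL Protocol GET URL that samples subjects in the registry graph.

    Hypothetical helper for illustration; the query is deliberately generic.
    """
    query = (
        "select distinct ?s\n"
        "where {\n"
        f"   graph <{REGISTRY_GRAPH}> {{ ?s ?p ?o }}\n"
        "}\n"
        f"limit {int(limit)}"
    )
    # The SPARQL Protocol passes the query text in the 'query' parameter.
    return ENDPOINT + "?" + urlencode({"query": query})


url = registry_query_url(5)
print(url)
```

The printed URL can be pasted into a browser or handed to any HTTP client.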

The following namespaces are used throughout this page:

@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix dcat:     <http://www.w3.org/ns/dcat#> .
@prefix void:     <http://rdfs.org/ns/void#> .
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

select-faqts (4)

A select-faqts service returns FAqT Services that will evaluate the dcat:Datasets during the given evaluation epoch.

select-faqts/identity (1/4)

services/sadi/core/select-faqts/identity.py is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-faqts/identity and allows you to list the FAqT services directly. Although this is the most straightforward way to list the evaluation services that should be used, it can become cumbersome to explicitly maintain a large and dynamic list. Subsequent FAqT Service Selectors can be used to list according to queries.

  • input class: datafaqs:FAqTService
  • output class: datafaqs:FAqTServiceCollection

Sample input 1 (max-1-topic-tag.ttl):

...
<http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/lodcloud/max-1-topic-tag> a datafaqs:FAqTService . 

Sample output 1 (max-1-topic-tag.ttl.out):

<http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/lodcloud/max-1-topic-tag> 
   a datafaqs:FAqTServiceCollection;
   dcterms:hasPart <http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/lodcloud/max-1-topic-tag> .
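Since the input is just a list of typed URIs, it can be generated rather than written by hand. This is a minimal sketch, assuming you keep your FAqT service URIs in a Python list; `faqt_service_listing` is a hypothetical helper, not part of the DataFAQs code base.

```python
DATAFAQS = "http://purl.org/twc/vocab/datafaqs#"


def faqt_service_listing(service_uris):
    """Return Turtle that types each given URI as a datafaqs:FAqTService.

    Hypothetical helper: it only produces input for select-faqts/identity.
    """
    lines = [f"@prefix datafaqs: <{DATAFAQS}> .", ""]
    # De-duplicate and sort so the generated file is stable across runs.
    for uri in sorted(set(service_uris)):
        lines.append(f"<{uri}> a datafaqs:FAqTService .")
    return "\n".join(lines) + "\n"


services = [
    "http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/faqt/lodcloud/max-1-topic-tag",
]
turtle = faqt_service_listing(services)
print(turtle)
```

The resulting Turtle can be saved to a file and POSTed to the service in the same way as the curl calls shown elsewhere on this page.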

select-faqts/via-sparql-query (2/4)

services/sadi/core/select-faqts/via-sparql-query is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-faqts/via-sparql-query and accepts an RDF description of a SPARQL query to apply, and where to apply it.

  • input class: datafaqs:QueryToApply
  • output class: datafaqs:FAqTServiceCollection

Sample input 1 (from-official-sadi-registry.ttl):

...
   a datafaqs:SPARQLQuery;
   rdfs:comment "One could resolve the URI for this query, or use the given rdf:value";
   rdf:value """
...
select distinct ?service
where {
   graph <http://sadiframework.org/registry/> {
      ?service
         moby:hasOperation [
            a moby:operation;
            moby:inputParameter [
               moby:objectType void:Dataset;
            ];
            moby:outputParameter [
               moby:objectType datafaqs:EvaluatedDataset;
...

select-faqts/towards/ckan-group (3/4)

This service is inspired by the datahub's lod group, which contains datasets that intend to become part of the lodcloud group after official approval. Datasets that intend to be part of a target group can be analyzed with evaluation services that reflect the requirements for membership in that group. Failed evaluations then give a list of concrete problems to resolve. DataFAQs can be very beneficial here, since the requirements to become part of a curated group are often poorly documented and not mechanically checked and monitored.

select-faqts/visko-planner (4/4)

This service is inspired by Nick Del Rio's visualization planner, which returns a plan of services that can be composed to obtain a visualization suiting the requested characteristics. This will lead to DataFAQs composing services instead of performing the evaluations in parallel.

select-datasets (5)

A select-datasets service returns dcat:Datasets that will be evaluated by FAqT Services during a given evaluation epoch.

select-datasets/identity (1/5)

services/sadi/core/select-datasets/identity is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/identity and returns the list of datasets that it is given. This is similar to select-faqts/identity above, but is for datasets instead of evaluation services. The example below accepts the datasets and returns the same instances.

  • input class: dcat:Dataset
  • output class: dcat:Dataset

Sample input 1 (drug-molecules.ttl):

...
<http://atlas.bio2rdf.org/sparql>
    void:sparqlEndpoint <http://atlas.bio2rdf.org/sparql> ;
    a void:Dataset, dcat:Dataset .

<http://bind.bio2rdf.org/sparql>
    void:sparqlEndpoint <http://bind.bio2rdf.org/sparql> ;
    a void:Dataset, dcat:Dataset .
...

Sample output (drug-molecules.ttl.out):

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix void: <http://rdfs.org/ns/void#> .

<http://atlas.bio2rdf.org/sparql> a void:Dataset,
        dcat:Dataset .

<http://bind.bio2rdf.org/sparql> a void:Dataset,
        dcat:Dataset .
...

select-datasets/via-sparql-query (2/5)

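services/sadi/core/select-datasets/via-sparql-query.py is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/via-sparql-query. Like select-faqts/via-sparql-query above, it accepts a datafaqs:QueryToApply; it returns a datafaqs:DatasetCollection. (Full description and example: TODO.)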

  • input class: datafaqs:QueryToApply
  • output class: datafaqs:DatasetCollection

Sample input 1 (logd-converted-datasets-with-samples.ttl):

todo

Sample output 1:

todo

Note: make sure that the SD modeling conforms to the example approved by Greg.

select-datasets/by-ckan-group (3/5)

  • input class: datafaqs:CKANGroup
  • output class: datafaqs:DatasetCollection

Sample input 1 (thedatahub-datafaqs.ttl):

<http://ckan.net/group/datafaqs> a <http://purl.org/twc/vocab/datafaqs#CKANGroup> .

services/sadi/core/select-datasets/by-ckan-group is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/by-ckan-group and returns the list of datasets that are in a CKAN group at thedatahub.org. The first example retrieves the datasets in the "datafaqs" group on ckan.org, while the second retrieves the datasets in the "lodcloud" group (the one that produces the Linked Data diagram).

edu.rpi.tw.data.quality.sadi.faqt.core.select.datasets.ByCKANGroup is deployed at TODO and does the same as above.

curl -H "Content-type: text/turtle" -d ' <http://ckan.net/group/datafaqs> a <http://purl.org/twc/vocab/datafaqs#CKANGroup> .' http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/by-ckan-group

gives:

Sample output 1:

<http://ckan.net/group/datafaqs> 
   a datafaqs:DatasetCollection;
   dcterms:hasPart 
        <http://thedatahub.org/dataset/congresspeople>,
        <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
        <http://thedatahub.org/dataset/white-house-visitor-access-records> .

<http://thedatahub.org/dataset/congresspeople> 
   a datafaqs:CKANDataset .
<http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states> 
   a datafaqs:CKANDataset .
<http://thedatahub.org/dataset/white-house-visitor-access-records> 
   a datafaqs:CKANDataset .
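If you only need the member datasets out of a response like sample output 1, a rough sketch such as the following can pull them from the dcterms:hasPart list. It is a regular-expression shortcut that assumes the layout shown above (prefixed `dcterms:hasPart` followed by comma-separated `<IRI>` objects); it is not a Turtle parser, and a real client would be better served by an RDF library. The name `collection_members` is hypothetical.

```python
import re

# Sample output 1, abbreviated to the collection statement.
SAMPLE_OUTPUT = """
<http://ckan.net/group/datafaqs>
   a datafaqs:DatasetCollection;
   dcterms:hasPart
        <http://thedatahub.org/dataset/congresspeople>,
        <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
        <http://thedatahub.org/dataset/white-house-visitor-access-records> .
"""

# dcterms:hasPart followed by one or more comma-separated <IRI> objects.
HAS_PART = re.compile(r"dcterms:hasPart\s+((?:<[^>]+>\s*,?\s*)+)")


def collection_members(turtle_text):
    """Return the IRIs listed as dcterms:hasPart objects, in document order."""
    members = []
    for object_list in HAS_PART.findall(turtle_text):
        members.extend(re.findall(r"<([^>]+)>", object_list))
    return members


datasets = collection_members(SAMPLE_OUTPUT)
for dataset in datasets:
    print(dataset)
```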

Sample input 2 (thedatahub-lodcloud.ttl):

<http://ckan.net/group/lodcloud> a <http://purl.org/twc/vocab/datafaqs#CKANGroup> .

Sample output 2:

<http://ckan.net/group/lodcloud> 
   a datafaqs:Composite;
   dcterms:hasPart 
      <http://thedatahub.org/dataset/2000-us-census-rdf>,
      <http://thedatahub.org/dataset/aemet>,
      <http://thedatahub.org/dataset/agrovoc-skos>,
      <http://thedatahub.org/dataset/amsterdam-museum-as-edm-lod>,
      ...
.
<http://thedatahub.org/dataset/2000-us-census-rdf> 
   a datafaqs:CKANDataset .
<http://thedatahub.org/dataset/aemet> 
   a datafaqs:CKANDataset .
...

select-datasets/by-ckan-tag (4/5)

(java version: edu.rpi.tw.data.quality.sadi.faqt.core.select.datasets.ByCKANTag)

Sample input 1 (ckan-lod.ttl):

@prefix moat: <http://moat-project.org/ns#> .

<http://ckan.net/tag/lod>
    moat:name "lod" ;
    a moat:Tag .

  • input class: moat:Tag
  • output class: moat:Tag

services/sadi/core/select-datasets/by-ckan-tag is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/by-ckan-tag and returns the list of datasets in thedatahub.org that are tagged with the given moat:Tag. The first example retrieves the datasets with the "lod" tag on ckan.org. This tag is used for datasets that are on their way to joining the "lodcloud" group (the one that produces the Linked Data diagram) once they meet certain criteria.

curl -H "Content-type: text/turtle" -d @ckan-lod.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets/by-ckan-tag

Sample output:

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://ckan.net/tag/lod> a <http://moat-project.org/ns#Tag>;
    dcterms:hasPart 
        <http://thedatahub.org/dataset/2000-us-census-rdf>,
        <http://thedatahub.org/dataset/addgene>,
        <http://thedatahub.org/dataset/aemet>,
        <http://thedatahub.org/dataset/agrovoc-skos>,
...
<http://thedatahub.org/dataset/2000-us-census-rdf> 
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset>,
     <http://www.w3.org/ns/dcat#Dataset> .

<http://thedatahub.org/dataset/addgene> 
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset>,
     <http://www.w3.org/ns/dcat#Dataset> .

<http://thedatahub.org/dataset/aemet> 
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset>,
     <http://www.w3.org/ns/dcat#Dataset> .

<http://thedatahub.org/dataset/agrovoc-skos>
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset>,
     <http://www.w3.org/ns/dcat#Dataset> .

Sample input 2: get the datasets in thedatahub.org with the "lifesciences" tag. This example gives the POST content directly in the curl call. Be sure to include a space before the @ symbol; otherwise curl interprets the argument to -d as the name of a file to read.

curl \
  -H "Content-type: text/turtle" \
  -d ' @prefix moat: <http://moat-project.org/ns#> . <http://ckan.net/tag/lifesciences> moat:name "lifesciences" ; a moat:Tag .' \
  http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/select-datasets
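The same kind of call can be made from Python's standard library, which sidesteps curl's special handling of @. The sketch below only constructs the request and leaves the network call commented out, since the deployment may not be reachable. `tag_request` is a hypothetical helper, and the service URL is the by-ckan-tag deployment named earlier in this section.

```python
import urllib.request

# Deployment URL as stated for select-datasets/by-ckan-tag above.
SERVICE = (
    "http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/"
    "select-datasets/by-ckan-tag"
)


def tag_request(tag_name, service=SERVICE):
    """Build a POST request describing a moat:Tag, mirroring the curl calls."""
    body = (
        "@prefix moat: <http://moat-project.org/ns#> .\n"
        f'<http://ckan.net/tag/{tag_name}> moat:name "{tag_name}" ; a moat:Tag .\n'
    )
    return urllib.request.Request(
        service,
        data=body.encode("utf-8"),
        headers={"Content-type": "text/turtle"},
        method="POST",
    )


request = tag_request("lifesciences")
print(request.get_method(), request.full_url)

# To actually call the service (needs network access and a live deployment):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```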

by-ckan-with-sparql-endpoint

upcoming: https://github.com/timrdf/DataFAQs/wiki/FAqT-Services#datasets-by-ckan-sparql-endpoint

select-faqts/towards/by-ckan-tag (5/5)

TODO

augment-datasets (1)

augment-datasets services accept void:Datasets and return references to other locations that provide descriptions of the dataset. To get a fuller description of the dataset, these references should be retrieved.

augment-datasets/with-preferred-uri-and-ckan-meta-void (1/1)

services/sadi/core/augment-datasets/with-preferred-uri-and-ckan-meta-void is deployed at http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/augment-datasets/with-preferred-uri-and-ckan-meta-void and includes two references if available. The first is the con:preferredURI of the dataset and the second is the VoID file. Both of these are drawn from the original CKAN description.

  • input class: dcat:Dataset
  • output class: datafaqs:WithReferences

Sample input 1 (datafaqs-3.ttl):

...
<http://ckan.net/group/datafaqs> a <http://purl.org/twc/vocab/datafaqs#Composite>;
    dcterms:hasPart 
        <http://thedatahub.org/dataset/congresspeople>,
        <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
        <http://thedatahub.org/dataset/white-house-visitor-access-records> .

<http://thedatahub.org/dataset/congresspeople>                                
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset> .
<http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states> 
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset> .
<http://thedatahub.org/dataset/white-house-visitor-access-records>            
   a <http://purl.org/twc/vocab/datafaqs#CKANDataset> .

Sample output 1:

<http://thedatahub.org/dataset/congresspeople> 
    a datafaqs:WithReferences;
    rdfs:seeAlso 
    <http://logd.tw.rpi.edu/source/contactingthecongress/dataset/directory-for-the-112th-congress> 
.

<http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states> 
    a datafaqs:WithReferences;
    rdfs:seeAlso 
    <http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29>,
    <http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl> 
.

<http://thedatahub.org/dataset/white-house-visitor-access-records> 
    a datafaqs:WithReferences .
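A client that wants the fuller descriptions would then retrieve each rdfs:seeAlso reference. Here is a minimal sketch of that follow-up step, with the references from sample output 1 transcribed by hand into a dictionary. The helper `reference_requests` and the choice of a `text/turtle` Accept header are assumptions for illustration; the requests are built but not sent.

```python
import urllib.request

# rdfs:seeAlso references transcribed from sample output 1.
REFERENCES = {
    "http://thedatahub.org/dataset/congresspeople": [
        "http://logd.tw.rpi.edu/source/contactingthecongress/dataset/directory-for-the-112th-congress",
    ],
    "http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states": [
        "http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29",
        "http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl",
    ],
    # No references were available for this dataset.
    "http://thedatahub.org/dataset/white-house-visitor-access-records": [],
}


def reference_requests(references, accept="text/turtle"):
    """Yield (dataset, request) pairs, one per rdfs:seeAlso reference."""
    for dataset, urls in references.items():
        for url in urls:
            # The Accept header is an assumption; servers may ignore it.
            yield dataset, urllib.request.Request(url, headers={"Accept": accept})


pending = list(reference_requests(REFERENCES))
for dataset, request in pending:
    print(dataset, "->", request.full_url)
```

Datasets without references, such as white-house-visitor-access-records here, simply contribute no requests.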

What's next
