
FAqT Brick

Tim L edited this page Jul 6, 2014 · 130 revisions

What's first

What we'll cover

This page describes how to invoke a new FAqT Brick epoch, i.e. run an analysis by asking a set of evaluation services what they think about some datasets. The results of an analysis end up in a triple store and are thus available for query from the FAqT Brick Explorer.

Let's get to it

A diagram illustrating the directory conventions, the data flow, and the data element overlaps is available as OmniGraffle or PDF.

A FAqT brick contains the materials created when using FAqT Services to analyze a set of datasets. A FAqT brick starts as a directory structure whose contents are then loaded into a SPARQL endpoint. Each time an analysis is performed, a new slice is added to the brick for the current time frame, or epoch. The three dimensions of a FAqT brick are dataset, evaluation service, and epoch, as illustrated below.

The three dimensions of a FAqT brick are *dataset*, *evaluation service*, and *epoch*

Directory conventions

You can choose any location for a FAqT brick directory, and you can have many FAqT bricks for different purposes. A FAqT brick's root directory must be named faqt-brick, because the core services follow directory conventions rooted on this name. For example, we can create a FAqT brick directory with the following commands:

mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
df-epoch.sh --help

df-epoch.sh is available after Installing DataFAQs and prints usage similar to the following:

 usage: df-epoch.sh [-n] [--force-epoch | --reuse-epoch <existing-epoch>]
                                  [--faqts    <rdf-file> <service-uri>]
                                  [--datasets <rdf-file> <service-uri>]

            -n: perform dry run (not implemented yet).

       --faqts: override the service-uri and its input (to evaluate with a different set of FAqT evaluation 

    --datasets: override the service-uri and its input (to evaluate a different set of datasets).

 --force-epoch: force new epoch; ignore 'once per day' convention.

 --reuse-epoch: reapply FAqT evaluation services to datasets in existing epoch. Takes precedence over --force-epoch.

Creating a slice

Running df-epoch.sh will create a FAqT brick slice using a default configuration. Its output reports:

  • the name of the epoch it is going to create (e.g. 2012-01-13), then
  • the DataFAQs Core Service (e.g. via-sparql-query) that it will use to obtain a list of FAqT services to apply, then
  • the DataFAQs Core Service (e.g. by-ckan-group) that it will use to obtain a list of datasets to evaluate, and finally
  • the DataFAQs Core Service (e.g. with-preferred-uri-and-ckan-meta-void) to use to obtain descriptions for each dataset.

mkdir ~/lebo/Desktop/faqt-brick
cd ~/lebo/Desktop/faqt-brick
df-epoch.sh

[INFO] Using __PIVOT_epoch/2012-01-13 
[INFO] Requesting FAqT services from 
       http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Requesting datasets from 
       http://sparql.tw.rpi.edu/services/datafaqs/core/select-datasets/by-ckan-group
[INFO] Requesting dataset descriptions from 
       http://sparql.tw.rpi.edu/services/datafaqs/core/augment-datasets/with-preferred-uri-and-ckan-meta-void

After df-epoch.sh lists the FAqT Services and dataset URIs, it gathers RDF descriptions of the datasets. It shows the URIs that it requests to accumulate descriptions about each dataset, along with the first line of each response.

[INFO] 5 FAqT services will evaluate 3 datasets.

[INFO] FAqT Services:

[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples

[INFO] CKAN Datasets:

[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records

[INFO] Gathering information about FAqT evaluation services.

sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter (1/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop (2/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples (3/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties (4/5)
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag (5/5)

[INFO] Gathering information about CKAN Datasets, for input to FAqT evaluation services.

thedatahub.org/dataset/congresspeople (1/3)
   <?xml version="1.0" encoding="utf-8"?>
   1: http://logd.tw.rpi.edu/source/contactingthecongress/dataset/directory-for-the-112th-congress
      <?xml version="1.0" encoding="utf-8" ?>

thedatahub.org/dataset/farmers-markets-geographic-data-united-states (2/3)
   <?xml version="1.0" encoding="utf-8"?>
   1: http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29
      <?xml version="1.0" encoding="utf-8" ?>
   2: http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl
      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

thedatahub.org/dataset/white-house-visitor-access-records (3/3)
   <?xml version="1.0" encoding="utf-8"?>

The accumulated dataset description responses are then submitted to each FAqT service, so that each service has some basic information to start with when performing its evaluation. The RDF that each FAqT service returns is stored, and df-epoch.sh reports its size and format.

[INFO] Submitting CKAN dataset information to FAqT evaluation services.

[INFO] dataset 1/3, FAqT 1/5 (1/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 32K of  results

[INFO] dataset 1/3, FAqT 2/5 (2/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 32K of  results

[INFO] dataset 1/3, FAqT 3/5 (3/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of text/turtle results

[INFO] dataset 1/3, FAqT 4/5 (4/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of text/turtle results

[INFO] dataset 1/3, FAqT 5/5 (5/15 total)
[INFO] http://thedatahub.org/dataset/congresspeople
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of text/turtle results

[INFO] dataset 2/3, FAqT 1/5 (6/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 4.0K of  results

[INFO] dataset 2/3, FAqT 2/5 (7/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 4.0K of  results

[INFO] dataset 2/3, FAqT 3/5 (8/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 4.0K of  results

[INFO] dataset 2/3, FAqT 4/5 (9/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 4.0K of  results

[INFO] dataset 2/3, FAqT 5/5 (10/15 total)
[INFO] http://thedatahub.org/dataset/white-house-visitor-access-records
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 4.0K of  results

[INFO] dataset 3/3, FAqT 1/5 (11/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties
[INFO] 15M of  results

[INFO] dataset 3/3, FAqT 2/5 (12/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag
[INFO] 15M of  results

[INFO] dataset 3/3, FAqT 3/5 (13/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter
[INFO] 15M of  results

[INFO] dataset 3/3, FAqT 4/5 (14/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop
[INFO] 15M of  results

[INFO] dataset 3/3, FAqT 5/5 (15/15 total)
[INFO] http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states
[INFO] http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples
[INFO] 15M of  results

The following illustrates the process of

  • (1) obtaining a dataset list from CKAN,
  • (2) obtaining a list of FAqT evaluation services from the SADI registry,
  • (3) obtaining descriptions of the dataset via URI dereference and VoID files,
  • (4) obtaining (via GET) a description of the FAqT evaluation service, and
  • (5) POSTing the dataset description to each FAqT evaluation service to obtain an evaluation described in RDF.

This process is done for each dataset and FAqT evaluation service to create a single slice of the FAqT brick.
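The pairing of every dataset with every FAqT evaluation service can be sketched as a nested loop. This only illustrates the counting seen in the log output above (5 services and 3 datasets give 15 evaluations); it is not the actual df-epoch.sh implementation, and the function name list_evaluation_pairs is hypothetical.

```shell
# Hypothetical helper (not part of DataFAQs): print one line per
# (dataset, service) pair, numbered the way the df-epoch.sh log numbers them.
# Usage: list_evaluation_pairs "<dataset URIs, one per line>" "<service URIs, one per line>"
list_evaluation_pairs() {
   datasets=$1
   services=$2
   # Count the non-empty lines in each list.
   n_datasets=$(printf '%s\n' "$datasets" | grep -c .)
   n_services=$(printf '%s\n' "$services" | grep -c .)
   total=$((n_datasets * n_services))
   d=0
   k=0
   # URIs contain no whitespace, so word splitting yields one URI per iteration.
   for dataset in $datasets; do
      d=$((d + 1))
      s=0
      for service in $services; do
         s=$((s + 1))
         k=$((k + 1))
         echo "dataset $d/$n_datasets, FAqT $s/$n_services ($k/$total total) $dataset $service"
      done
   done
}
```

Each printed pair corresponds to one POST of a dataset description to one evaluation service, and so to one stored evaluation in the slice.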

dataset descriptions are collected before giving them to each FAqT service for evaluation

Storing the FAqT evaluation service descriptions

FAqT evaluation services describe themselves upon HTTP GET requests

When their URI is requested, FAqT evaluation services provide RDF descriptions of themselves. These are stored in a file faqt-service.ttl that is nested by both the faqt and the epoch. For example, the RDF that was returned by requesting the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 is stored at:

faqt-brick/
   sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/
      2012-01-13/faqt-service.ttl
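As a minimal sketch of this convention, the path can be derived from the service URI and the epoch name. The function name faqt_service_path is hypothetical and the scheme-stripping step is an assumption inferred from the example path above; df-epoch.sh may compute it differently.

```shell
# Hypothetical helper (not part of DataFAQs): map a FAqT service URI and an
# epoch name to the location of the stored self-description, relative to the
# faqt-brick root.
faqt_service_path() {
   service_uri=$1
   epoch=$2
   # Drop the scheme (e.g. "http://") so the URI can serve as a directory path.
   service_dir=${service_uri#*://}
   echo "$service_dir/__PIVOT_epoch/$epoch/faqt-service.ttl"
}

faqt_service_path http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples 2012-01-13
```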

Storing the CKAN dataset descriptions

dataset descriptions that will be POSTed to the FAqT evaluation service

The accumulated descriptions of the CKAN datasets are stored in a file post.ttl that is nested by both the epoch and the dataset. For example, the RDF that is POSTed to all FAqT services during epoch 2012-01-13 to evaluate dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:

faqt-brick/
   __PIVOT_epoch/2012-01-13/__PIVOT_dataset/
      thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl

The contents of post.ttl are the union of the files:

faqt-brick/
   __PIVOT_epoch/2012-01-13/__PIVOT_dataset/
      thedatahub.org/dataset/farmers-markets-geographic-data-united-states/part-*.{ttl,rdf,nt}

part-0.{ttl,rdf,nt} is the result of dereferencing the URI, while remaining part- files come from other resources such as the VoID file or dereferencing the dataset's con:preferredURIs (as provided by an augment-dataset service; see DataFAQs Core Services).
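To see which parts went into a given post.ttl, something like the following could be run from the faqt-brick root. It is a sketch under the directory convention described above; the function name list_description_parts is hypothetical and is not a DataFAQs command.

```shell
# Hypothetical helper (not part of DataFAQs): list the part-* files that were
# accumulated for one dataset during one epoch. Run from the faqt-brick root.
list_description_parts() {
   epoch=$1
   dataset_uri=$2
   # Drop the scheme (e.g. "http://") to obtain the dataset's directory path.
   dataset_dir="__PIVOT_epoch/$epoch/__PIVOT_dataset/${dataset_uri#*://}"
   find "$dataset_dir" -maxdepth 1 -type f -name 'part-*' \
      \( -name '*.ttl' -o -name '*.rdf' -o -name '*.nt' \) | sort
}
```

For example, `list_description_parts 2012-01-13 http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states` would print the part files for the dataset used in the example above, if any exist.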

Storing the FAqT evaluation results

POSTing a dataset description to a FAqT evaluation service will return an RDF description of its evaluation

When the RDF description of a dataset is POSTed to a FAqT evaluation service, the service returns an RDF evaluation of the dataset. The response from the FAqT evaluation service is stored in a file evaluation.ttl that is nested by the faqt, dataset, and epoch. For example, the RDF returned by the FAqT service http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples during epoch 2012-01-13 when evaluating dataset http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states is stored at:

faqt-brick/
   sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/
      thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/
         2012-01-13/evaluation.ttl
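The three-way nesting can be sketched the same way. The helper below is hypothetical (evaluation_path is not a DataFAQs command) and simply assembles the path from the pattern shown in the example above.

```shell
# Hypothetical helper (not part of DataFAQs): map a FAqT service URI, a dataset
# URI, and an epoch name to the location of the stored evaluation, relative to
# the faqt-brick root.
evaluation_path() {
   service_uri=$1
   dataset_uri=$2
   epoch=$3
   # "${uri#*://}" drops the scheme so each URI can serve as a directory path.
   echo "${service_uri#*://}/__PIVOT_dataset/${dataset_uri#*://}/__PIVOT_epoch/$epoch/evaluation.ttl"
}

evaluation_path \
   http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples \
   http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states \
   2012-01-13
```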

Forcing a new epoch slice

df-epoch.sh assumes that you wouldn't want more than one epoch per day. If that's not the case, go ahead and --force-epoch:

bash-3.2$ df-epoch.sh

An evaluation epoch has already been initiated today (2012-01-13).
Start one tomorrow, use --force-epoch to create another one today, or use --help.

bash-3.2$ df-epoch.sh --force-epoch
[INFO] Using __PIVOT_epoch/2012-01-13_17_49_46 
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
...
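The naming seen in this output can be sketched as follows. This is an assumption inferred from the two epoch names shown on this page (the date alone, and the date plus time of day for a forced epoch), not a copy of the df-epoch.sh logic; the function name next_epoch_name and its "yes"/"no" argument are hypothetical.

```shell
# Hypothetical helper (not part of DataFAQs): choose the name of the next
# epoch directory. Run from the faqt-brick root.
# Assumed naming: the date alone for the first epoch of the day, and the date
# plus the time of day when another epoch is forced on the same day.
next_epoch_name() {
   force=$1   # "yes" to allow another epoch on the same day
   today=$(date +%Y-%m-%d)
   if [ ! -d "__PIVOT_epoch/$today" ]; then
      echo "$today"
   elif [ "$force" = "yes" ]; then
      echo "${today}_$(date +%H_%M_%S)"
   else
      echo "An evaluation epoch has already been initiated today ($today)." >&2
      return 1
   fi
}
```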

Removing an epoch's slice

If you want to get rid of an epoch, first remove its directory from __PIVOT_epoch/, then use df-purge-orphaned-epochs.sh to take care of the rest:

bash-3.2$ rm -rf __PIVOT_epoch/2012-01-13_17_49_46/

bash-3.2$ df-purge-orphaned-epochs.sh
usage: df-purge-orphaned-epochs.sh <-n | -w>

  -n: perform dry run; do not modify anything.
  -w: remove all epochs that are not listed in __PIVOT_epoch/

bash-3.2$ df-purge-orphaned-epochs.sh -w
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
[INFO] Removing 2012-01-13_17_49_46
...

df-purge-orphaned-epochs.sh walks the rest of the FAqT brick and removes all materials created during epochs that are not listed in __PIVOT_epoch/. The example usage above removes the forced epoch that was created in the --force-epoch example earlier.
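The idea behind the dry run can be sketched with find. This is an illustration of the orphan check described above, not the source of df-purge-orphaned-epochs.sh; the function name list_orphaned_epochs is hypothetical, and it only prints candidates without deleting anything.

```shell
# Hypothetical dry run (not part of DataFAQs): print epoch directories nested
# under a FAqT service or dataset that no longer have a counterpart in the
# top-level __PIVOT_epoch/. Run from the faqt-brick root; nothing is modified.
list_orphaned_epochs() {
   # -mindepth 3 skips the top-level ./__PIVOT_epoch/<epoch> directories;
   # the second -path test skips anything below an epoch directory.
   find . -mindepth 3 -type d -path '*/__PIVOT_epoch/*' ! -path '*/__PIVOT_epoch/*/*' |
   while read -r dir; do
      epoch=$(basename "$dir")
      [ -d "__PIVOT_epoch/$epoch" ] || echo "$dir"
   done | sort
}
```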

Reapplying the FAqT service evaluations within an epoch

The CKAN dataset descriptions that were accumulated in an existing epoch can be reused to reapply the FAqT service evaluations within the same epoch. Because this replaces the results within the designated epoch, this should only be done for the latest epoch. The following usage shows that of the two epochs in the FAqT brick, the dataset listing and descriptions from the later one are reused.

ls __PIVOT_epoch/
2012-01-12		2012-01-13

df-epoch.sh --reuse-epoch datafaqs:latest
[INFO] Using __PIVOT_epoch/2012-01-13  (datafaqs:latest)
[INFO] Requesting FAqT services from http://sparql.tw.rpi.edu/services/datafaqs/core/select-faqts/via-sparql-query
[INFO] Reusing dataset listing and descriptions from __PIVOT_epoch/2012-01-13
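
How datafaqs:latest might be resolved can be sketched as below. Because epoch names begin with an ISO date, the last name in sorted order is the most recent one. The function name resolve_epoch is hypothetical, and whether df-epoch.sh resolves the keyword this way is an assumption.

```shell
# Hypothetical helper (not part of DataFAQs): resolve an epoch argument to a
# directory name. Run from the faqt-brick root.
# "datafaqs:latest" becomes the last entry of __PIVOT_epoch/ in sorted order;
# any other value is returned unchanged.
resolve_epoch() {
   requested=$1
   if [ "$requested" = "datafaqs:latest" ]; then
      ls __PIVOT_epoch/ | LC_ALL=C sort | tail -n 1
   else
      echo "$requested"
   fi
}
```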

Graph naming URI design and VoID hierarchy

For a given epoch, the following files contain graphs that are interesting for analysis. They need to be named and loaded into a triple store so that they can be available for SPARQL query.

(todo: describe rdf config with provo describing core services e.g.)

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/>;

The following files are produced by the DataFAQs Core Services.

__PIVOT_epoch/2012-01-19/faqt-services.ttl      # the evaluation services that were used.
                         datasets.ttl           # the datasets that were evaluated.
                         dataset-references.ttl # rdfs:seeAlso to more descriptions
# __PIVOT_epoch/2012-01-19/faqt-services.ttl.sd_name contains string:
#    http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services
# __PIVOT_epoch/2012-01-19/faqt-services.ttl.meta contains following graph.
#   (load all sd metadata into GRAPH sd:NamedGraph { })

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>;
   sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/faqt-services>
   a void:Dataset, sd:Graph;
   void:triples 6;
   void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/faqt-services.ttl>;
.

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>;
   sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/datasets>
   a void:Dataset, sd:Graph;
   void:triples 7;
   void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/datasets.ttl>;
.
 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>;
   sd:graph <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>;
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/config/dataset-references>
   a void:Dataset, sd:Graph;
   void:triples 6;
   void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_epoch/2012-01-19/dataset-references.ttl>;
.

The following files contain the FAqT evaluation services' descriptions of themselves:

sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_epoch/2012-01-19/faqt-service.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl
# sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl.sd_name
#   http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1
# sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl.meta
#   (load all sd metadata into GRAPH sd:NamedGraph { })

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
   sd:graph [ 
      a prov:Account, sd:Graph, void:Graph;
      void:triples 17;
      prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
      foaf:primaryTopic    <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>;
      void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>;
   ]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1>
   a datafaqs:FAqTService;
   prov:specializationOf <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
   dcterms:date "2012-01-19"^^xsd:date;
.

<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_epoch/2012-01-19/faqt-service.ttl>
   formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle> 
   rdfs:label "Turtle"; 
   dcterms:identifier "text/turtle";
.

The following files contain the dataset descriptions (including the additional references):

__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/post.ttl
__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl
__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/white-house-visitor-access-records/post.ttl
# load all sd metadata into GRAPH sd:NamedGraph { }

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
   sd:graph [ 
      a prov:Account, sd:Graph, void:Graph;
      void:triples 14861;
      prov:wasDerivedFrom 
         <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>,
         <http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29>,
         <http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl>;
      foaf:primaryTopic    <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>;
      void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>;
   ]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/dataset/1>
   a void:Dataset;
   prov:specializationOf <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>;
   dcterms:date "2012-01-19"^^xsd:date;
.

<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/__PIVOT_epoch/2012-01-19/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/post.ttl>
   formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle> 
   rdfs:label "Turtle"; 
   dcterms:identifier "text/turtle";
.

The following files contain the evaluation of each dataset from each evaluation service:

sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/lodcloud/max-1-topic-tag/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/predicate-counter/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/redirect-loop/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-properties/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/congresspeople/__PIVOT_epoch/2012-01-19/evaluation.ttl
sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl
# load all sd metadata into GRAPH sd:NamedGraph { }

@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix foaf:     <http://xmlns.com/foaf/0.1/> .
@prefix void:     <http://rdfs.org/ns/void#> .
@prefix sd:       <http://www.w3.org/ns/sparql-service-description#> .
@prefix formats:  <http://www.w3.org/ns/formats/> .
@prefix prov:     <http://www.w3.org/ns/prov-o/> .
@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .

 []
   a sd:NamedGraph;
   sd:name  <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
   sd:graph [ 
      a prov:Account, sd:Graph, void:Graph, datafaqs:Evaluation;
      void:triples 14;
      prov:wasAttributedTo <http://sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples>;
      foaf:primaryTopic    <http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>;
      void:dataDump <http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>;
   ]
.
<http://sparql.tw.rpi.edu/datafaqs/epoch/2012-01-19/faqt/1/dataset/1>
   a void:Dataset;
   prov:specializationOf <http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states>;
   dcterms:date "2012-01-19"^^xsd:date;
.

<http://sparql.tw.rpi.edu/datafaqs/dump/__PIVOT_faqt/sparql.tw.rpi.edu/services/datafaqs/faqt/void-triples/__PIVOT_dataset/thedatahub.org/dataset/farmers-markets-geographic-data-united-states/__PIVOT_epoch/2012-01-19/evaluation.ttl>
   formats:media_type <http://www.w3.org/ns/formats/Turtle>;
.
<http://www.w3.org/ns/formats/Turtle> 
   rdfs:label "Turtle"; 
   dcterms:identifier "text/turtle";
.

Finding Dataset Descriptions that were not valid RDF

(note: see df-find for a more complete and encapsulated way to find invalid results, etc.)

The following command will list the files returned that were not valid RDF.

find __PIVOT_epoch/2013-04-14/__PIVOT_dataset/ -name "augmentation-*" -o -name "reference-*" | xargs valid-rdf.sh -v | grep "^no"

For example, when reference-1 contains HTML, the corresponding get-reference-1.sh shows where it came from:

bash-3.2$ head -5  __PIVOT_epoch/2013-04-14/__PIVOT_dataset//thedatahub.org/dataset/webnmasunotraveler/reference-1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /metadata</title>
 </head>
bash-3.2$ cat __PIVOT_epoch/2013-04-14/__PIVOT_dataset//thedatahub.org/dataset/webnmasunotraveler/get-reference-1.sh 
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://webenemasuno.linkeddata.es/metadata > reference-1
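
When valid-rdf.sh is not at hand, a rough first pass for this particular failure (HTML returned instead of RDF) could look like the sketch below. It is a crude heuristic rather than an RDF validator, and the function name list_html_responses is hypothetical.

```shell
# Hypothetical helper (not part of DataFAQs): list reference-* and
# augmentation-* files under a directory whose first line looks like HTML
# rather than RDF. This only checks the first line; it does not validate RDF.
list_html_responses() {
   find "$1" -type f \( -name 'augmentation-*' -o -name 'reference-*' \) |
   while read -r file; do
      if head -n 1 "$file" | grep -Eiq '<!DOCTYPE html|<html'; then
         echo "$file"
      fi
   done | sort
}
```

For example, `list_html_responses __PIVOT_epoch/2013-04-14/__PIVOT_dataset/` would be expected to include the reference-1 file shown above, since its first line is an HTML doctype.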

Showing how we find out about a dataset

All of the files used to find out about a dataset:

ls __PIVOT_epoch/2014-07-04/__PIVOT_dataset/thedatahub.org/dataset/dbpedia

augmentation-1.rdf
dataset.ttl
get-augmentation-1.sh
get-reference-0.sh
get-reference-1.sh
get-references-1.sh
post.meta.ttl
post.nt
post.nt.rdf
post.nt.sd_name
post.nt.ttl
reference-0.rdf
reference-1.rdf
references-1.ttl
references.csv

# Print each get-*.sh script, framed by a header that names it.
for get in __PIVOT_epoch/2014-07-04/__PIVOT_dataset/thedatahub.org/dataset/dbpedia/get*.sh; do
   echo "============ $(basename "$get") ============"
   cat "$get"; echo '============================================'
   echo; echo
done

returns something like:

============ get-augmentation-1.sh ============
curl -s -H 'Content-Type: application/rdf+xml' -d @post.nt.rdf http://aquarius.tw.rpi.edu/projects/datafaqstest/sadi-services/lift-ckan > augmentation-1
============================================


============ get-reference-0.sh ============
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://thedatahub.org/dataset/dbpedia > reference-0
============================================


============ get-reference-1.sh ============
curl -s -L -H "Accept: application/rdf+xml, text/rdf;q=0.6, */*;q=0.1" http://dbpedia.org/void/Dataset > reference-1
============================================


============ get-references-1.sh ============
curl -s -H 'Content-Type: text/turtle' -d @dataset.ttl http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/core/augment-datasets/with-preferred-uri-and-ckan-meta-void > references-1
============================================

What's next
