From 429a80ecd39b083dec493e68c21ec30c81976148 Mon Sep 17 00:00:00 2001
From: brechtvdv
Date: Thu, 14 Mar 2019 13:29:46 +0100
Subject: [PATCH] Rework for eswc

---
 content/conclusion.md     | 16 ++++++++++++----
 content/demonstrator.md   |  2 +-
 content/implementation.md | 10 +++++-----
 content/index.md.erb      |  4 ++--
 content/introduction.md   |  6 ++++--
 content/sota.md           |  4 ++--
 content/styles/print.scss | 13 ++++++++++++-
 7 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/content/conclusion.md b/content/conclusion.md
index 9cc5282..dac3303 100644
--- a/content/conclusion.md
+++ b/content/conclusion.md
@@ -1,9 +1,17 @@
 ## Conclusion {:#conclusion}
 
-Data owners can publish their Linked Open Data very cost-efficient on their website with JSON-LD snippets. After an initial cost of adding this feature to their website, they can have an always up-to-date dataset with negligible maintenance costs. The cultural heritage website hetarchief.be showcases an official maintained paged collection of Linked Data Fragments about newspapers. By extending Comunica, in-depth data analysis and federated querying over this dataset is possible. To improve querying speed, Linked Data services ([SPARQL-endpoint](http://semanticweb.org/wiki/SPARQL_endpoint.html), [HDT](cite:cites Fernndez2013BinaryRR) file, TPF interface...) with a higher maintenance cost can be created on top of JSON-LD snippets. Such interfaces would suffer from scalability problems: Optical Character Recognition (OCR) texts have bad compression rates, and thus require gigabytes of disk space. With our solution, these OCR-text are published in a seperate document keeping the maintenance cost low while harvesting in an automated way is still possible. By using our demonstrator, non-technical users are able to extract a data dump from an enriched website.
+Data owners can publish their Linked Open Data very cost-efficiently on their website with JSON-LD snippets.
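As a language-neutral illustration of the approach the reworked conclusion describes — dataset metadata embedded as a JSON-LD snippet in an ordinary web page — the following sketch builds such a page. The vocabulary terms follow Schema.org and DCTerms as named in the paper, but the identifiers and values are invented, not taken from hetarchief.be.

```python
import json

# Hypothetical metadata for one newspaper page; property names follow
# Schema.org and Dublin Core Terms, concrete values are invented.
snippet = {
    "@context": {
        "schema": "http://schema.org/",
        "dcterms": "http://purl.org/dc/terms/"
    },
    "@id": "https://example.org/newspapers/1914-08-04",
    "@type": "schema:Newspaper",
    "schema:name": "Example newspaper issue",
    "dcterms:issued": "1914-08-04"
}

# Embed the snippet in a regular HTML page: a <script> tag with the
# JSON-LD media type, which browsers ignore but harvesters can pick up.
html_page = """<html><head>
<script type="application/ld+json">
{snippet}
</script>
</head><body>...</body></html>""".format(snippet=json.dumps(snippet, indent=2))

print(html_page)
```

Because the snippet lives next to the human-readable page, republishing the page republishes the data, which is the "negligible maintenance" property the conclusion claims.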
+After an initial cost of adding this feature to their website, they can have an always up-to-date dataset with negligible maintenance costs. However, machine clients that query and harvest websites can introduce unforeseen spikes of activity: data owners will need to extend their monitoring capabilities beyond human interaction (e.g. Google Analytics) and apply an HTTP caching strategy for stale resources.
 
-To gain traction with an international audience, e.g. the science stories platform ([http://sciencestories.io](http://sciencestories.io)), a reconciliation service could be created with knowledge bases (cfr. Wikidata).
-Next to embedding the data, hypermedia controls or search engine optimization features, also the [International Image Interoperability Framework](https://iiif.io/api/image/2.1/) (IIIF) Image API for sharing images could be described within a JSON-LD snippet for raising the discoverability of this service. IIIF API information already uses JSON-LD to describe its features such as tiling and licensing which makes this an excellent snippet addition helping an organization become more visible on the Web.
+Linked Data services ([HDT](cite:cites Fernndez2013BinaryRR) file, TPF interface...) with a higher maintenance cost can be created on top of JSON-LD snippets, but these would suffer from scalability problems: Optical Character Recognition (OCR) texts have bad compression rates, and thus require gigabytes of disk space. With our solution, these OCR texts are published in a separate document, keeping the maintenance cost low while harvesting in an automated way is still possible.
 
-In future work, extending Comunica for harvesting Hydra collections would help organizations to improve their collection management. These collections could be defined on their main page of their website improving Open Data discoverability.
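The HTTP caching strategy mentioned in the added paragraph can be sketched with conditional requests: the server tags each representation with an `ETag`, and revalidating harvesters get a cheap `304 Not Modified` instead of the full page. The hashing scheme and `max-age` value below are illustrative assumptions, not from the paper.

```python
import hashlib

def make_etag(body):
    # Strong ETag derived from the representation body (illustrative scheme).
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'

def respond(body, if_none_match=None):
    # Return (status, headers, payload) for a GET, honouring If-None-Match.
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=86400"}
    if if_none_match == etag:
        return 304, headers, ""  # client's cached copy is still fresh
    return 200, headers, body

page = "<html>...newspaper page with a JSON-LD snippet...</html>"
status1, headers1, _ = respond(page)              # first fetch: full body
status2, _, _ = respond(page, headers1["ETag"])   # revalidation: 304
print(status1, status2)
```

Since the newspaper pages change rarely, most harvester traffic collapses into revalidations, which blunts the activity spikes the paragraph warns about.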
Also work on supporting multiple views acting as indexes for collections would benefit querying performance on sorting or filtering operations on e.g. geospatial or temporal data.
\ No newline at end of file
+In future work, extending Comunica for harvesting Hydra collections would help organizations improve their collection management. These collections could be defined on the main page of their website, improving Open Data discoverability.
+
+
+
+
+
+
+
+
\ No newline at end of file

diff --git a/content/demonstrator.md b/content/demonstrator.md
index adf751d..c16e4ae 100644
--- a/content/demonstrator.md
+++ b/content/demonstrator.md
@@ -6,7 +6,7 @@
 The application is written with the front-end playground Codepen [https://codepe
-
+
 A spreadsheet is generated by entering a URL of a newspaper from hetarchief.be.

diff --git a/content/implementation.md b/content/implementation.md
index 8b69a9d..a4cc63c 100644
--- a/content/implementation.md
+++ b/content/implementation.md
@@ -5,8 +5,8 @@
 Every newspaper webpage is annotated with JSON-LD snippets containing domain-specific metadata and hypermedia controls. The former metadata is described using acknowledged vocabularies such as [Dublin Core Terms](http://dublincore.org/documents/dcmi-terms/) (DCTerms), [Friend of a Friend](http://xmlns.com/foaf/spec/) (FOAF), [Schema.org](https://schema.org/), etc. The latter is described using the [Hydra](https://www.hydra-cg.com/spec/latest/core) vocabulary for hypermedia-driven Web APIs. Although hetarchief.be contains several human-readable hypermedia controls (free text search bar, search facets, pagination for every [newspaper](https://hetarchief.be/nl/media/brief-van-den-soldaat-aan-zijne-verdrukte-medeburgers/I2STYUAOpmFKmbFRXNmV0PTp)), only Hydra's partial collection view controls are implemented: hydra:next describes the next newspaper and, vice versa, hydra:previous the previous one. Also, an estimate of the number of triples on a page is added using hydra:totalItems and void:triples. This helps user agents build more efficient query plans.
-````/code/hydra-partial-collection-view.txt````
+ ````/code/hydra-partial-collection-view.txt````
Every newspaper describes its next and previous newspaper using Hydra partial collection view controls. This wires Linked Data Fragments together into a dataset.
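The partial-collection-view controls this hunk describes can be exercised with a small traversal sketch: a harvester follows `hydra:next` links until the collection is exhausted. The page URLs and the in-memory `fetch` below are invented stand-ins for real dereferenceable documents on hetarchief.be.

```python
# Tiny in-memory stand-in for paged newspaper documents. Each page's
# JSON-LD carries Hydra partial-collection-view controls; URLs are invented.
pages = {
    "https://example.org/news/1": {"@id": "https://example.org/news/1",
                                   "hydra:next": "https://example.org/news/2",
                                   "hydra:totalItems": 3},
    "https://example.org/news/2": {"@id": "https://example.org/news/2",
                                   "hydra:previous": "https://example.org/news/1",
                                   "hydra:next": "https://example.org/news/3"},
    "https://example.org/news/3": {"@id": "https://example.org/news/3",
                                   "hydra:previous": "https://example.org/news/2"},
}

def harvest(start, fetch):
    # Follow hydra:next links from `start` until the collection is exhausted.
    url, visited = start, []
    while url is not None:
        doc = fetch(url)
        visited.append(doc["@id"])
        url = doc.get("hydra:next")
    return visited

order = harvest("https://example.org/news/1", pages.get)
print(order)
```

This is the sense in which the controls "wire Linked Data Fragments together into a dataset": the dataset is never materialized server-side, it is the closure of the next/previous links.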
@@ -32,13 +32,13 @@
 That is why we added an actor (`ActorRdfParseHtmlScript`) for parsing such HTML. This intermediate parser searches for data snippets and forwards these to their respective RDF parser. In case of a JSON-LD snippet, the body of a script tag `
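Comunica and its `ActorRdfParseHtmlScript` are written in JavaScript; the snippet below is only a language-neutral sketch of what such an intermediate parser does — locate `script` tags carrying the JSON-LD media type and hand their bodies to a JSON parser — not Comunica's actual implementation.

```python
from html.parser import HTMLParser
import json

class JsonLdExtractor(HTMLParser):
    # Collect the bodies of <script type="application/ld+json"> tags.
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self.snippets.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

page = """<html><head>
<script type="application/ld+json">{"@id": "https://example.org/news/1", "hydra:next": "https://example.org/news/2"}</script>
</head><body>...</body></html>"""

extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.snippets[0]["hydra:next"])
```

In Comunica the extracted body would go to an RDF (JSON-LD) parser rather than plain `json.loads`, but the dispatch pattern is the same.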