Related work

Tim L edited this page Jul 18, 2019 · 128 revisions

Datasets

Discussions

Towards a Dynamic Linked Data Observatory

https://notebooks.dataone.org/citsci-data/understanding-data-quality-mechanism-week-2/

daQ, an Ontology for Dataset Quality Information at http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_09.pdf, by Jeremy Debattista et al.

Analytics data of data.gov.uk: http://data.gov.uk/data/site-usage. They collect similar data with Google Analytics for the Swiss open data portal, but currently it's not clear if this is going to be published and how. Additionally they set up logging on Amazon S3 for all the primary data that is hosted there. They say it's important to not just look at the site statistics, but also at the downloads of the primary data. Typically a developer of an app or visualization uses the portal once to find the link to the interesting data. From that point on, she might only use the direct link, thus not showing up anymore in the logs of the portal. Stefan Oderbolz 22 Jan ckan-discuss

http://thomaslevine.com/open-data/

http://www.ldf.fi/

http://wiki.lib.sun.ac.za/index.php/OpenData

"Using http://data.police.uk/about/ as an example of how to do a really good statement about open data quality", JeniT

"the web is powered by feedback loops between people and information", Jon Kleinberg

The Amsterdam Manifesto on Data Citation Principles

There have been some discussions about the semantic web reaching the "slope of enlightenment" on the lifesci list.

http://semwebquality.org has some very nice materials regarding data quality.

http://qualitywebdata.org/ by Michael Hausenblas, but it hasn't been active for a while.

A very nice taxonomy of quality aspects by Hartig and Flemming.

The Pedantic Web Group works to get errors and bad practices fixed by engaging the data publishers.

Helena Deus announced her Survey of Linked Data Quality Metrics

Bernard Vatant's discussion about unresolvable vocabulary in LOD.

Gov LD cookbook

Survey of ontology libraries by Natasha Noy.

Aidan Hogan, Jürgen Umbrich, Andreas Harth, Richard Cyganiak, Axel Polleres, Stefan Decker "An empirical survey of Linked Data conformance" Journal of Web Semantics 2012 (to appear). http://sw.deri.org/~aidanh/docs/ldstudy12.pdf http://www.sciencedirect.com/science/article/pii/S1570826812000352?v=s5

http://slidewiki.org/application/questionnaire.php

http://en.wikipedia.org/wiki/OntoClean

LDBC project

http://blog.hubjects.com/2012/03/lov-stories-part-2-gardeners-and.html

"There should be official sem web tests, badges and achievements based on passing things like this. Described in linked data of course!" - Melvin Carvalho 4 Jan 2013 on public-lod

http://sindice.com/developers/publishing

http://nanopub.org/wordpress/?page_id=57

Clinical Quality Linked Data release from HDI 2011 mentioned at hhs challenge

survey of ontology use by Paul Warren Knowledge Media Institute (announcement)

Stian's responses to Sarven's provenance dataset: http://www.w3.org/mid/51364C90.4060501@csarven.ca

Why Linked Data is Not Enough for Scientists


http://sites.tufts.edu/liam/deliverables/prospectus-for-linked-archival-metadata-a-guidebook/

http://www.dbis.informatik.hu-berlin.de/fileadmin/research/papers/conferences/2009-ldow-hartig.pdf

Nine simple ways to make it easier to (re)use your data

http://www.data.gov/blog/under-hood-open-data-engine

Beyond Data: Building a Web of Needs

7 Points about RDF Validation

Paul Houle used wikipedia visit histories to measure importance in dbpedia, reported on the dbpedia list.

Sebastian

Dear all, I would like to wish you a Happy Easter. At the same time, I have an issue, which concerns LOD and data on the web in general.

As a community, we have all contributed to this image: http://lod-cloud.net/versions/2011-09-19/lod-cloud.html (which is now three years old) You can see a lot of eggs on it, but the metadata (the chicken) is:

  • inaccurate
  • out-dated
  • infeasible to maintain manually (this is my opinion. I find it hard to believe that we will start updating triple and link counts manually)

Here one pertinent example: http://datahub.io/dataset/dbpedia -> (this is still linking to DBpedia 3.5.1)

Following the foundation of the DBpedia Association we would like to start to solve this problem with the help of a new group called DataIDUnit (http://wiki.dbpedia.org/coop/DataIDUnit) using "rough consensus and working code" as their codex: http://wiki.dbpedia.org/coop/

The first goal will be to find some good existing vocabularies and then provide a working version for DBpedia. A student from Leipzig (Markus Freudenberg) will implement a "push DataId to Datahub via its API" feature. This will help us describe the chicken that laid all these eggs.

Happy Easter, all feedback is welcome, we hope not to duplicate efforts. Sebastian

-- Sebastian Hellmann AKSW/NLP2RDF research group DBpedia Association Institute for Applied Informatics (InfAI)

Surveys

Jul 15 2013

Dear Timothy Lebo,

With its increased rate of adoption, Linked Data is becoming a valuable commodity in numerous domains across the web. But, how valuable is Linked Data after all? How much did it cost to create and publish a dataset as RDF? What is the value of a dataset? To gather information on details of the creation of datasets and then estimate the value of Linked Open Data in terms of time and money, we are conducting a survey. We came across your dataset at [1]. Thus, we would like to request you to fill out the survey at: http://goo.gl/dLAl8.

This survey contains 23 questions and will take about 10-15 minutes to complete. The results of this survey will be summarized and used to estimate the value of Linked Data and will be made accessible to the survey participants as well as the general public. Please note: if you have more than one dataset that you have published, please fill the questionnaire separately for each of the datasets.

Thank you very much for your time.

[1] http://datahub.io/dataset/twc-ieeevis

Regards,
Ms. Amrapali Zaveri

University of Leipzig - Department of Computer Science
Paulinium 618, Augustusplatz 10, 04109 Leipzig, Germany
http://aksw.org/AmrapaliZaveri

Helena's survey

A lot of open data isn't openly licensed: http://thomaslevine.com/!/open-data-licensing/

Tools

Luzzu - A Quality Assessment Framework for Linked Data [1], now available on GitHub (https://github.com/EIS-Bonn/Luzzu). Luzzu is a Quality Assessment Framework for Linked Open Datasets. It is a generic framework based on the Dataset Quality Ontology (daQ) [2,3], allowing users to define their own quality metrics. Luzzu is an integrated platform that:

  • assesses Linked Data quality using a library of generic and user-provided domain-specific quality metrics in a scalable manner;
  • provides queryable quality metadata on the assessed datasets;
  • assembles detailed quality reports on assessed datasets.

Furthermore, the infrastructure:

  • scales for the assessment of big datasets;
  • can be easily extended by users creating their own custom and domain-specific pluggable metrics, either by employing a novel declarative quality metric specification language or conventional imperative plugins;
  • employs a comprehensive ontology framework for representing and exchanging all quality-related information in the assessment workflow;
  • implements quality-driven dataset ranking algorithms facilitating use-case driven discovery and retrieval.

[1] http://eis-bonn.github.io/Luzzu/ [2] http://purl.org/eis/vocab/daq [3] http://eis-bonn.github.io/Luzzu/papers/semantics2014.pdf [4] http://eis-bonn.github.io/Luzzu/howto.html

http://www.melissadata.com/index.htm

http://www.ddialliance.org/

  • DDI-RDF Discovery (Disco): Disco is designed to support the discovery of microdata sets and related metadata using RDF technologies in the Web of Linked Data.
  • Physical Data Description (PHDD): PHDD describes existing data in rectangular format and CSV format (character-separated values).
  • Extended Knowledge Organization System (XKOS): XKOS leverages the Simple Knowledge Organization System (SKOS) for managing statistical classifications and concept management systems, since SKOS is widely used.

https://github.com/AKSW/RDFUnit

QALD-4, the fourth in a series of evaluation campaigns on multilingual question answering over linked data http://sourceforge.net/mailarchive/message.php?msg_id=31925311

http://linter.structured-data.org/

http://vmwebsrv01.deri.ie/sites/default/files/publications/paperiswc.pdf

Databugger

SPARQL Endpoint Status (sparql-es)

http://graphite.ecs.soton.ac.uk/prov/

http://mappings.dbpedia.org/server/statistics/en/?show=100000

OOPS! OntOlogy Pitfall Scanner http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp

http://wiki.publicdata.eu/wiki/CSV2RDF_Application

https://github.com/cbaillie/QualityAssessmentFramework

  • Tomas Knap presented a poster on ODCleanStore at ISWC 2012. Some more documentation is here.

Mondeca hosts a dashboard showing SPARQL endpoint status for all SPARQL endpoints mentioned in http://thedatahub.org/group/lodcloud (Dr. Pierre-Yves Vandenbussche).

SEALS is a rather complete infrastructure, but focuses on tools not data.

  • García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.

  • http://openprovenance.org/

  • Jiao's validator

LODStats is a Python-based triple streaming processor. Does it give consumers a voice, or are they just centralized?

  • Jan Demter, Sören Auer, Michael Martin, Jens Lehmann: LODStats – An Extensible Framework for High-performance Dataset Analytics, submitted to ESWC2012
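The streaming idea behind LODStats can be sketched in a few lines: read an N-Triples dump line by line and accumulate statistics without ever loading the full graph into memory. The sketch below (not LODStats's actual code) tallies predicate usage with a deliberately naive whitespace split rather than a full N-Triples parser:

```python
from collections import Counter

def predicate_counts(lines):
    """Stream N-Triples lines and tally predicate usage.

    Naive parsing: split each line into subject, predicate, rest.
    Predicates are always IRIs, so the second token is safe to take;
    a real tool would use a proper N-Triples parser.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 2)  # subject, predicate, remainder
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts

# Tiny invented sample, stands in for a multi-gigabyte dump.
triples = [
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .',
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .',
]
print(predicate_counts(triples))
```

Because the input is consumed line by line, the same function works on a file handle or a decompressing stream, which is what makes this approach scale to large dumps.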

Linked Open Vocabularies http://labs.mondeca.com/dataset/lov/details/vocabulary_geosp.html accepts new vocabularies to evaluate at http://labs.mondeca.com/dataset/lov/suggest/ and has documentation for how to publish vocab.

URI debugger

Sindice Inspector Tool

http://ckan.org/2012/01/09/qa-on-thedatahub/

http://lod2.eu/Project/WIQA.html

http://swse.deri.org/RDFAlerts/ (Aidan Hogan) superseded by http://inspector.sindice.com/

https://github.com/cygri/make-void died in 2010; it is limited to files that fit in memory.

RDFStats http://rdfstats.sourceforge.net/ died in 2012

http://cs.univie.ac.at/research/research-groups/multimedia-information-systems/publikation/infpub/2910/

http://www.cs.ox.ac.uk/isg/tools/LogMap/ matches two given ontologies.

http://code.google.com/p/py-triple-simple/ by Janos Hajagos (paper, slides); cites Joslyn BTC 2010, which does "predicate bigrams".

https://github.com/kwijibo/void-import-to-thedatahub at http://keithalexander.co.uk/void-import-to-thedatahub/ imports void into ckan.

http://www.w3.org/2009/sparql/sdvalidator and http://validator.linkeddata.org/vapour give EARL with conneg.
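"EARL with conneg" means the validator will hand back its EARL report in an RDF serialization when the client asks for one via the Accept header. A minimal sketch of such a request (the validator URL and query parameter are hypothetical; the real services have their own URL schemes):

```python
import urllib.request

# Ask a (hypothetical) validator for its EARL report as Turtle
# instead of the default HTML page, via content negotiation.
req = urllib.request.Request(
    "http://validator.example.org/report?uri=http%3A%2F%2Fexample.org%2Fdata",
    headers={"Accept": "text/turtle"},
)
# urllib.request.urlopen(req) would perform the fetch; it is omitted
# here so the sketch stays network-free.
print(req.get_header("Accept"))
```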

http://hcls.sindicetech.com/explore/

http://linkeddata.informatik.hu-berlin.de/uridbg/

LOV:

State of LOD: http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms links into ckan.

http://ckan.org/2011/07/05/google-refine-extension-for-ckan/

CKAN already supports describing datasets in RDFa by using some commonly used vocabularies (e.g. DCat [1]). See an example by taking the URL of any dataset in CKAN and pasting it into the RDFa Distiller [2]. Given that the European Commission ADMS Working Group has recently published the related Repository, Asset, Distribution (RADion) vocabulary [3][4], I wonder if CKAN should support this vocabulary in RDFa as well. What do you think about RADion? Were any of you involved in its development? [1] http://www.w3.org/TR/vocab-dcat/ [2] http://www.w3.org/2007/08/pyRdfa/ [3] https://joinup.ec.europa.eu/asset/radion/home [4] http://www.w3.org/ns/radion Augusto Herrmann Open Data Team - dados.gov.br
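For a sense of what such a DCAT description contains, here is a dependency-free sketch that hand-rolls a minimal dcat:Dataset as N-Triples (rdflib would be the idiomatic choice, but this keeps the example stdlib-only; the dataset URI and title are invented):

```python
# Vocabulary namespaces used by DCAT descriptions.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def triple(s, p, o):
    """Format one N-Triples statement; o is pre-formatted (IRI or literal)."""
    return f"<{s}> <{p}> {o} ."

# Hypothetical dataset URI, standing in for a CKAN dataset page.
dataset = "http://example.org/dataset/demo"
lines = [
    triple(dataset, RDF + "type", f"<{DCAT}Dataset>"),
    triple(dataset, DCT + "title", '"A demo dataset"'),
    triple(dataset, DCAT + "keyword", '"linked-data"'),
]
print("\n".join(lines))
```

This is the same shape of description the RDFa Distiller extracts from a CKAN dataset page, just serialized as triples instead of embedded in HTML attributes.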

http://oeg-lia3.dia.fi.upm.es/oops/index-content.jsp

Query by temporal factors later, for example: 1) how the graph changed between 20th October 2012 and 30th October 2012 (I want to see all updates); 2) a snapshot of a particular node on 20th July 2012, 25th July 2012, etc.
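Both of those temporal queries fall out naturally if dated snapshots of the graph are kept as sets of triples: the "all updates" query is a set difference between two snapshots, and the per-node query is a filter on one snapshot. A toy sketch with invented dates and triples:

```python
from datetime import date

# Toy dated snapshots of a graph, each a set of (s, p, o) tuples.
snapshots = {
    date(2012, 10, 20): {("a", "knows", "b"), ("a", "name", "Alice")},
    date(2012, 10, 30): {("a", "knows", "b"), ("a", "name", "Alicia"),
                         ("b", "name", "Bob")},
}

def diff(d1, d2):
    """All updates between two snapshot dates: (added, removed)."""
    old, new = snapshots[d1], snapshots[d2]
    return new - old, old - new

def node_snapshot(d, node):
    """Triples about a node as of a given snapshot date."""
    return {t for t in snapshots[d] if t[0] == node}

added, removed = diff(date(2012, 10, 20), date(2012, 10, 30))
```

Storing full snapshots is the simplest scheme; a real system would more likely store the (added, removed) deltas and reconstruct snapshots on demand.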

EARL [2] ReSpec HTML+RDFa rollup reports

For help in publishing great data, be sure to visit the Sindice Web Data Inspector. The Web Data Inspector will assist you by providing interactive data visualization and validation services.

http://www.bioontology.org/wiki/index.php/Ontology_Metrics?pop=true

http://aers.data2semantics.org/yasgui/ pulled the list of LOD SPARQL endpoints from http://semantic.ckan.net/sparql (which is no longer supported).

ckanext-qa

http://purl.org/openorg/corrections

http://lod.openlinksw.com and http://lod.openlinksw.com/sparql hosts 51 Billion+ Triples culled from across the LOD Cloud. Basically, all the datasets that OpenLink can get our hands on. -Kingsley Jul 2013

Webtest

There are quite a few testing frameworks available but I think only two major ones make sense for CKAN. The first is Twill and the second is WebTest.

A few words about Twill:

Docs: http://twill.idyll.org/

Pros:

  • more straightforward language
  • record sequences of actions

Cons:

  • uses only beautiful soup
  • poor docs
  • not actively maintained (however, there is a retwill fork on github)

And a few words about WebTest:

Docs: http://webtest.pythonpaste.org/en/latest/index.html

Pros:

  • actively maintained
  • integrated into major python web frameworks, recommended for pylons
  • you can choose between lxml html, lxml xml, beautiful soup, pyquery and json
  • good documentation

Cons:

  • No real webtesting (in an actual browser) since js is ignored
  • sometimes a little bit difficult to understand how to select links/forms.

Overall, I think WebTest is the way to go, which is why I added a few quick examples that demonstrate how to use forms, clicks and xpaths/pyquery. Pull request: https://github.com/okfn/ckan/pull/130

The ticket for the whole thing is here: http://trac.ckan.org/ticket/2934

So I was quite happy with the way I was writing UI tests for ckanext-cmap: using paste.fixture.TestApp to request pages, then using BeautifulSoup to parse the results. Not the most concise thing in terms of saving on typing, but simple enough.

WebTest is based on paste.fixture.TestApp but apparently parts of it have been rewritten to use WebOb. As far as I can see the interfaces of the TestApp and Response objects are pretty much the same as those from paste. The documentation looks better, or at least easier to find. The response object has built-in convenience methods for getting a BeautifulSoup, ElementTree, LXML, or PyQuery parsed copy of the body, which as far as I know paste's Response object didn't have, but it was only one line of code to get it yourself anyway.

We would have to add webtest and I guess one of BeautifulSoup, ElementTree or LXML to pip-requirements-test.txt (anyone have a preference?)

So WebTest looks like just the thing to me.
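The core trick shared by paste.fixture.TestApp and WebTest is that no HTTP server is involved: the WSGI application is called directly with a synthetic environ dict. A stdlib-only sketch of that pattern (the tiny app and the helper are invented for illustration; WebTest wraps the same idea in much richer TestApp/Response objects):

```python
import io

def app(environ, start_response):
    # Minimal WSGI app standing in for CKAN in this sketch.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body><a href='/dataset'>datasets</a></body></html>"]

def get(app, path):
    """Call a WSGI app directly, the way WebTest's TestApp does."""
    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": path,
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "wsgi.input": io.BytesIO(),
        "wsgi.url_scheme": "http",
    }
    captured = {}
    def start_response(status, headers):
        captured["status"], captured["headers"] = status, headers
    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = get(app, "/")
```

This is also why WebTest can't exercise JavaScript: there is no browser anywhere in the loop, just a Python function call.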

I think it would probably be worthwhile for Dominik to write WebTest tests for a couple of parts of CKAN and get them reviewed and merged into master, and then copy a couple of examples from them and paste them into a new section in the CKAN Coding Standards, explaining what our best practice is for doing UI tests. I think that's important because using the right tool won't stop us from writing terrible tests with it.

P.S. For testing JavaScript, which WebTest doesn't do, there is actually a JavaScript test framework built into CKAN now, which came with the big demo merge.

P.P.S. Dominik, I see that you're calling these "integration" tests; in CKAN currently these kinds of UI tests (that use paste.fixture.TestApp) are called "functional" tests (see ckan/tests/functional). So maybe you just want to call yours functional tests, or maybe there's a distinction to be made between functional tests that test the contents of individual pages in detail and process or integration tests that test clicking through multiple pages but without checking the contents of each page in thorough detail.

Conferences and Workshops

http://www.semantics.cc/vocarnival

http://www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data

http://dataweb.medialab.ntua.gr/

http://www.w3.org/2013/04/odw/

http://www.iaria.org/conferences2013/ICIW13.html

http://data.semanticweb.org/usewod/2013/

http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2013_01_17

ISWC 2012:

http://www.iaria.org/conferences2013/CfPWEB13.html

18th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2012) National University of Ireland Galway Quadrangle October 8-12, 2012. http://ekaw2012.ekaw.org

1st International Workshop on Learning Analytics and Linked Data (#LALD2012) in conjunction with the 2nd Conference on Learning Analytics and Knowledge (LAK’12)

I-SEMANTICS 2012

  • Quality of Semantic Data on the Web
    • Provenance information for the Web of Data
    • Large scale ontology inspection and repair
    • Co-reference detection and dataset reconciliation
    • Maintenance of Linked Data models
    • Trust, privacy and security in Semantic Web applications

Linked Data on the Web (LDOW2012) at WWW http://events.linkeddata.org/ldow2012/

LDOW2012 workshop

  • evaluating quality and trustworthiness of Linked Data

International Conference on Dublin Core and Metadata Applications 2012

  • Metadata quality (methods, tools, and practices)

SePublica2012 an ESWC2012 Workshop

  • Provenance, quality, privacy and trust of scientific information

Journal of Web Semantics Special Issue on Evaluation of Semantic Technologies. Special Issue on Visualisation of and Interaction with Semantic Web Data. Special issue of the International Journal on Semantic Web and Information Systems http://www.ijswis.org/?q=node%2F41 Editors: Matthew Rowe , Aba-Sah Dadzie

http://ontologymatching.org/publications.html

Work before Linked Data

  • wang & strong (1996 – beyond accuracy) Recommended by Helena

chimaera papers - KSL-99-17 has more details on the tests.

KSL-00-08 http://ksl.stanford.edu/KSL_Abstracts/KSL-00-08.html

McGuinness, D. L.; Fikes, R.; Rice, J.; & Wilder, S. An Environment for Merging and Testing Large Ontologies. Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR2000), Breckenridge, Colorado, April, 2000. KSL-00-09 http://ksl.stanford.edu/KSL_Abstracts/KSL-00-09.html McGuinness, D. L.; Fikes, R.; Rice, J.; & Wilder, S. The Chimaera Ontology Environment. Proceedings of the The Seventeenth National Conference on Artificial Intelligence (AAAI 2000), July 30-August 3, 2000. KSL-99-17 http://ksl.stanford.edu/KSL_Abstracts/KSL-99-17.html Fikes, R. & Rice, J. The Stanford KSL Knowledge Base Merging Critical Component Experiment. Knowledge Systems Laboratory, October, 1999.

Sort

http://trustingwebdata.org/davide.html

http://openorg.ecs.soton.ac.uk/wiki/Namespace#Linked_Open_Data for referring to TimBL's 5-star scheme came up in Edinburgh last May.

We stubbed in something at http://logd.tw.rpi.edu/lab/project/logd_internaltional_ogd_catalog/metadata_design

http://eprints.soton.ac.uk/340068/

Ontology Support for Influenza Research and Surveillance, Joanne Luciano, PhD, Lynette Hirschman, PhD, Marc Colosimo, PhD. Approved for Public Release; Distribution Unlimited. 28 April 2008 Case Number 08-0738 http://www.ebi.ac.uk/industry/Documents/workshop-materials/DiseaseOntologiesAndInformation190608/The%20Influenza%20Infectious%20Disease%20Ontology%20(I-IDO)%20-%20Joanne%20Luciano.pdf

THE EVALUATION OF ONTOLOGIES: Toward Improved Semantic Interoperability Leo Obrst, Werner Ceusters, Inderjeet Mani, Steve Ray, Barry Smith in C. Baker and K.-H. Cheung, ed., Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, New York: Springer Verlag, 2006, 139-158. Chapter 6 http://ontology.buffalo.edu/smith/articles/EvaluationOfOntologies.pdf

A SURVEY OF ONTOLOGY EVALUATION TECHNIQUES Janez Brank, Marko Grobelnik, Dunja Mladenić, http://eprints.pascal-network.org/archive/00001198/01/BrankEvaluationSiKDD2005.pdf

Wilkinson Evaluating FAIR Maturity Through a Scalable, Automated, Community-Governed Framework 2019
