Untested claim, factual claim, and the recoding of provenance data #1808

Closed
mwjames opened this issue Aug 24, 2016 · 1 comment

mwjames commented Aug 24, 2016

Synopsis

For most applications or users that use Semantic MediaWiki, making untested claims is the desired (or convenient) method to record data or data snippets, yet some users require factual claims to be recorded.

This issue is about adding metadata to a value declaration in an intuitive (meaning using the common and existing Semantic MediaWiki edit patterns) and extensible way.

Preface

For clarification, the term "untested claim" as used in this issue is understood as an "incomplete claim for which evidence is yet unavailable" [0]. For example, when a user records a statement such as "John Doe's car is green", a reader has to trust (or believe) that the statement (with the claim that John's car is green) is true, without evidence or any possibility of verifying it. Whether such a statement corresponds to the real world is of secondary importance, and in some cases such correspondence is even undesirable.

Under the open world assumption [1], a statement is expected to be true and to represent a fact rather than just an opinion (what if the observer who made the claim has deuteranomaly [2]?), and is therefore taken to be true without evidence or a possibility to test the claim against a reciprocal entity.

Statements denoting authority data such as "Berlin has a population of 3,520,061" [3], where the claim "population of 3,520,061" is untested, may pose challenges for some users (#985). The objective of this issue is to present a nominal approach (different from what Wikidata provides, yet coherent with the existing Semantic MediaWiki data model) that would allow verification of individual claims.

If we understand a factual claim as "a statement that is supported by convergent evidence", then a statement like "Berlin has a population of 3,520,061" requires additional information [4] about when, where, and by whom a particular claim (population being 3,520,061) was made. Recording provenance metadata and tracing the source [5, 6] of the claim will help transform an untested claim into a factual one by allowing the claim to be verified.

Technical concept and realization

Technically, there are different approaches for recording "provenance" data on an individual value level to support "empirical or analytical" statements in the form of a reference to a claim. I omit describing them here and have selected only the least invasive one.

Isolating a value (object) "3,520,061" from a subject is the first necessary step for a technical realization, because the value must be represented as a single identifiable entity. We therefore require a new DataType called Reference. Reference allows defining the required or expected provenance data fields that may be entered while annotating a property value.

It builds on the Record construct, which requires the least amount of conceptual constraints and technical effort. By reusing Has fields to define the members of a data construct, Semantic MediaWiki is freed from explicit knowledge of a specific provenance model, and users are free to define their own requirements for representing factual relevance. For instance, if a user doesn't require a date and only wants to record a reference to a page or file name, this can be done without Semantic MediaWiki limiting access to or availability of that data.

Features and limitations

  • An annotation that requires (or expects) a reference is added as a concatenated string separated by ; and, if a value itself contains ; as part of its declaration, \; is to be used to distinguish it from the separator
  • Annotations of type Reference work in the same way other types do and can be combined with #set or #subobject
  • Type and position of provenance data fields are fixed by the position used in the Has fields declaration
  • By convention the first field of the Has fields declaration is expected to describe the value property (equivalent to just declaring [[Has type:: ... ]] without provenance data)
  • In the provided example, the second and third arguments specify when and from where the value was retrieved
  • Annotated values are internally tracked using a container. Visible elements are not expected to be distinguishable except for a ≡ marker (which can be modified using CSS) that indicates a tooltip showing additional information for any individual value marked as such
  • A reference to a value will in most cases (when displayed in-text or as the result of an #ask query) be displayed as a tooltip
  • Queries (as with those of the Record type) can specify the level of granularity with which an entity is expected to match ([[Has population::?;>2000]] vs. [[Has population::>30000000]])
  • Provenance data will be removed once a value statement is removed
  • For the ongoing implementation, the Exporter is not expected to create reification statements (aka statements about statements) [4] for the provenance metadata
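
The separator and positional rules listed above can be illustrated with a small Python sketch. This is not Semantic MediaWiki's implementation (which is written in PHP); the function names `split_reference_value` and `map_to_fields` are hypothetical, and the "Retrieved on" and "Retrieved from" values are placeholders chosen only to exercise the \; escape.

```python
from typing import Dict, List


def split_reference_value(raw: str) -> List[str]:
    """Split on ';' while treating '\\;' as a literal semicolon."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] == ";":
            # Escaped separator: keep the semicolon as part of the value.
            current.append(";")
            i += 2
        elif raw[i] == ";":
            # Unescaped separator: close the current field.
            parts.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(raw[i])
            i += 1
    parts.append("".join(current).strip())
    return parts


def map_to_fields(has_fields: str, raw_value: str) -> Dict[str, str]:
    """Pair each value with the field at the same position in Has fields.

    Assumption: trailing fields without a value are simply left out.
    """
    field_names = split_reference_value(has_fields)
    values = split_reference_value(raw_value)
    return dict(zip(field_names, values))


# Field list taken from Example 1; date and source are placeholder values.
record = map_to_fields(
    "Population number;Retrieved on;Retrieved from",
    "3,520,061;<some date>;Some source\\; with a semicolon",
)
print(record)
```

Because the position in the Has fields declaration determines the meaning of each value, the mapping needs no knowledge of a particular provenance model, which mirrors the design goal described in the technical concept.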

Examples

Example 1

Property:Has population

[[Has type::Reference]] [[Has fields::Population number;Retrieved on;Retrieved from]]

Example 2

Property:GDP

[[Has type::Reference]] [[Has fields::GDP (current US$);Retrieved on;Retrieved from;Published by]]
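
As a rough illustration of how a value for such a property might be assembled, the following Python sketch joins the field values in declaration order and escapes any semicolon inside a value. The helper names `escape_value` and `build_annotation` are hypothetical, the bracketed values are placeholders rather than real data, and the sketch assumes the usual [[Property::value]] in-text annotation form.

```python
from typing import Sequence


def escape_value(value: str) -> str:
    """Escape ';' inside a value so it is not read as a field separator."""
    return value.replace(";", "\\;")


def build_annotation(property_name: str, values: Sequence[str]) -> str:
    """Join values in Has fields order into one in-text annotation string."""
    joined = ";".join(escape_value(v) for v in values)
    return f"[[{property_name}::{joined}]]"


# Order follows the declaration above:
# GDP (current US$); Retrieved on; Retrieved from; Published by
annotation = build_annotation(
    "GDP",
    ["<amount>", "<date>", "<source; with semicolon>", "<publisher>"],
)
print(annotation)
```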

[0] http://www.auburn.edu/academic/education/reading_genie/Fact-opinion.html
[1] https://en.wikipedia.org/wiki/Open-world_assumption
[2] http://www.colourblindawareness.org/colour-blindness/types-of-colour-blindness/
[3] https://www.semantic-mediawiki.org/wiki/Berlin
[4] https://www.w3.org/TR/2004/REC-rdf-primer-20040210/#reification
[5] http://ceur-ws.org/Vol-526/InvitedPaper_1.pdf (A New Perspective on Semantics of Data Provenance)
[6] http://db.cis.upenn.edu/DL/fsttcs.pdf (Data Provenance: Some Basic Issues)

mwjames added a commit that referenced this issue Aug 24, 2016
Record type to support property name as index, refs #1808
mwjames added a commit that referenced this issue Sep 1, 2016
@mwjames mwjames closed this as completed Sep 8, 2016
@kghbln kghbln added the wikidocu missing Code changes (mostly features) what have not yet been documented label Sep 24, 2016
@kghbln kghbln removed the wikidocu missing Code changes (mostly features) what have not yet been documented label Mar 13, 2017