Ontology design tension - parsimony/simplicity can overload, and precision/expressivity can overspecify #161

@mrgarth

In two different software implementations, we have observed cases where the data standard represents in a single class or property data that a user wants to model as separate classes or properties.

A funder collects SELI-GLI survey responses. The SFF Companion Module models SELI-GLI as IndicatorReport, with an accompanying set of "standardized" SELI-GLI Theme, Outcome, and Indicator instances. The funder wants to hold this data separately from the other IndicatorReport and impact model data from reporting organizations, apparently seeing it as a qualitatively different type of data.

  • Does our choice to model SELI-GLI survey data as a logic model inappropriately "overload" the IndicatorReport class, if the result is aligned software writing custom parsing and/or filtering functions to treat this data differently?
  • What differentiates an IndicatorReport from other types of data? Is this relevant to the open question of how Observations (time-bound) and non-time-bound "facts" (Assessments) should be modelled?
  • If rdf:type is not enough for routing data to the right place, might we need "Application Profiles" to tell software how to handle different flavors of the same class? Can RML and/or SHACL shapes and/or SPARQL CONSTRUCT solve this, without resorting to creating multiple new classes/subclass or property/subproperty hierarchies?
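
To make the "Application Profile" idea concrete, here is a rough sketch of routing by profile rather than by class. It is only an illustration: the record keys (`type`, `indicator`), the `ex:` indicator IRIs, and the profile names are hypothetical placeholders, not terms from the standard or the SFF Companion Module. A real implementation would more likely express each profile as a SHACL shape or SPARQL query; plain Python predicates stand in for those here.

```python
# Hypothetical set of "standardized" SELI-GLI indicator identifiers.
# These IRIs are made up for illustration.
SELI_GLI_INDICATORS = {"ex:seli-gli-indicator-1", "ex:seli-gli-indicator-2"}


def is_seli_gli(report):
    """Profile predicate: the report points at a standardized SELI-GLI indicator."""
    return report.get("indicator") in SELI_GLI_INDICATORS


# Ordered list of (profile name, predicate). The first match wins,
# so the catch-all profile goes last.
PROFILES = [
    ("seli_gli_survey", is_seli_gli),
    ("general_indicator_report", lambda report: True),
]


def route_reports(reports):
    """Group IndicatorReport records by the first profile they match."""
    routed = {name: [] for name, _ in PROFILES}
    for report in reports:
        if report.get("type") != "IndicatorReport":
            continue  # not an IndicatorReport, so out of scope for these profiles
        for name, matches in PROFILES:
            if matches(report):
                routed[name].append(report)
                break
    return routed


reports = [
    {"id": "r1", "type": "IndicatorReport", "indicator": "ex:seli-gli-indicator-1"},
    {"id": "r2", "type": "IndicatorReport", "indicator": "ex:jobs-created"},
    {"id": "o1", "type": "Outcome"},
]
routed = route_reports(reports)
print({name: [r["id"] for r in items] for name, items in routed.items()})
```

The point of the sketch is that every record keeps the single IndicatorReport type, and the "flavor" is decided by a declarative rule that lives outside the class hierarchy. Whether that is less overhead for implementers than a subclass is exactly the open question.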

Another funder has mapped i72:hasNumericalValue (which allows xsd:string text) to a Data Value field in Airtable that is restricted to Numeric type data. This immediately highlights that data type matching, and not just semantic matching, is critical to include at the mapping stage; this detail was missed in this mapping process. The funder prefers not to change the data type for the field in their database; we assume this is because it could have implications for downstream aggregation and analysis functions.

  • Similar to the example above, if hasNumericalValue allows both text and numbers, is the property "overloaded"? If users consistently have a valid reason or requirement to handle text and numeric data values differently, should we not have separate properties for separate datatypes?
  • Should qualitative data potentially have a separate IndicatorReport subclass, or would additional sub-properties for separate datatypes be sufficient?
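
As a possible way to catch this at the mapping stage, the sketch below checks datatype compatibility alongside the semantic match, then shows the kind of value splitting an implementer ends up writing when one property carries both text and numbers. The datatype sets and function names are assumptions for illustration; in particular, the set of datatypes allowed for hasNumericalValue here is a guess based on the description above, not read from the ontology.

```python
from decimal import Decimal, InvalidOperation

# Assumed set of datatypes a numeric-only field (e.g. a Numeric column) can store.
NUMERIC_DATATYPES = {"xsd:integer", "xsd:decimal", "xsd:double", "xsd:float"}


def incompatible_datatypes(source_datatypes, target_accepts):
    """Return the source datatypes that the target field cannot store."""
    return sorted(set(source_datatypes) - set(target_accepts))


def split_values(values):
    """Separate string values that parse as numbers from those that do not."""
    numeric, text = [], []
    for value in values:
        try:
            numeric.append(Decimal(value.strip()))
        except InvalidOperation:
            text.append(value)
    return numeric, text


# Assumed range for hasNumericalValue: it allows text as well as numbers.
source_range = {"xsd:string", "xsd:decimal"}
problems = incompatible_datatypes(source_range, NUMERIC_DATATYPES)
print(problems)

numeric, text = split_values(["42", "3.5", "about half"])
print(numeric, text)
```

If `incompatible_datatypes` returns anything, the mapping needs a decision before data flows: either the target field changes, or non-numeric values are routed elsewhere, which is the custom handling that separate properties would make unnecessary.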

Key user needs: SPO users need as much simplicity as possible. The same kind of issue also comes up for units of measure. Some funders may be more tolerant of writing custom parsing code, but this is not always the case, and it may contribute to implementation overhead.
