Shortcuts for modeling information #200

Open
tmprd opened this issue Nov 15, 2023 · 3 comments
Labels
data modeling question Question concerning the use of the CCO to model data

Comments

@tmprd

tmprd commented Nov 15, 2023

CCO provides a robust way of modeling information with details that may not always be needed when used with less structured data. It would be easier to write and read (both for humans and machines) if more shortcuts were provided, which could expand to the full details when necessary.

For example, suppose we have data about someone's age in years and want to reason about whether or not they can vote, or whether they require a guardian's consent in some context, e.g. healthcare. As far as I can tell, an age is a measurement of a temporal region of someone's lifetime. This makes the example a little more convoluted than other types of information, like a person's name. (Also, in most use cases, age should be derived from a birthdate anyway.) But I'm using it to highlight the difference between a simple representation like "Bob is 18 years old" and the full representation in CCO. There are 11 RDF triples here:

```turtle
# The phrase "Bob is 18 years old"
:BobsAgeInYears a cco:DocumentField ; # a cco:InformationBearingEntity
    cco:has_integer_value 18 ;
    cco:uses_measurement_unit cco:YearMeasurementUnit ;
    bfo:0000101 :BobsAgeInformation . # is carrier of (at some time)

# The proposition expressed by "Bob is 18 years old" (or "Bob is 216 months old")
:BobsAgeInformation a cco:MeasurementInformationContentEntity ;
    cco:is_a_measurement_of :BobsLifetime .

# Bob's lifetime
:BobsLifetime a cco:MultiYearTemporalInterval ;
    cco:is_temporal_region_of :BobsLife .

# Bob's life
:BobsLife a bfo:0000015 . # process

# Bob himself
:Bob a cco:Person ;
    bfo:0000056 :BobsLife . # participates in (at some time)

# Note: replace `bfo:0000101` with `ro:0010002` ("is carrier of"), or `bfo:0000056` with `ro:0000056` ("participates in") if needed.
```
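To make the triple count and the "hops" concrete, here is a minimal Python sketch (standard library only, no RDF library assumed) that writes the 11 triples above as plain tuples and counts the fewest links between `:Bob` and the literal `18`. The tuple encoding and the `hops` helper are illustrative, not part of CCO; edge direction is ignored and `rdf:type` links are skipped so a path can't cut through a class.

```python
from collections import deque

# The full representation of "Bob is 18 years old" as (subject, predicate, object).
FULL_GRAPH = {
    (":BobsAgeInYears", "rdf:type", "cco:DocumentField"),
    (":BobsAgeInYears", "cco:has_integer_value", 18),
    (":BobsAgeInYears", "cco:uses_measurement_unit", "cco:YearMeasurementUnit"),
    (":BobsAgeInYears", "bfo:0000101", ":BobsAgeInformation"),
    (":BobsAgeInformation", "rdf:type", "cco:MeasurementInformationContentEntity"),
    (":BobsAgeInformation", "cco:is_a_measurement_of", ":BobsLifetime"),
    (":BobsLifetime", "rdf:type", "cco:MultiYearTemporalInterval"),
    (":BobsLifetime", "cco:is_temporal_region_of", ":BobsLife"),
    (":BobsLife", "rdf:type", "bfo:0000015"),
    (":Bob", "rdf:type", "cco:Person"),
    (":Bob", "bfo:0000056", ":BobsLife"),
}


def hops(graph, start, goal):
    """Fewest edges between start and goal (breadth-first search),
    ignoring edge direction and skipping rdf:type links."""
    neighbours = {}
    for s, p, o in graph:
        if p == "rdf:type":
            continue
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(len(FULL_GRAPH))               # 11
print(hops(FULL_GRAPH, ":Bob", 18))  # 5
```

The path is Bob → Bob's life → Bob's lifetime → the age information → the document field → 18, so five hops separate a person from one integer.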

The point here is that a lot of these details aren't necessary for common use cases of age data.

The annotation property cco:is_tokenized_by provides one kind of shortcut in cases where we need not model extra details about information bearers, such as their provenance. However, some of these details, such as units, then need to be added to the token (e.g. "10 years"). Worse, an annotation property can't be used in logical axioms. For example, I can't express that someone must be over 18 years old to vote using the "is_tokenized_by" annotation. But it's still useful as a shortcut.

This graph shows what all of this looks like and how many "hops" are involved. I added more classes for context:

```mermaid
graph

IBE[Information Bearing Entity]:::type -.-> DocumentField
DocumentField:::type -.-> BobsAgeInYears{Bob's Age\nIn Years}:::instance
Process:::type -.-> BobsLife{Bob's Life}:::instance
MYTI[MultiYearTemporalInterval]:::type -.-> BobsLifetime{Bob's Lifetime}:::instance
1DTR[1-Dimensional\nTemporal Region]:::type -.-> MYTI:::type
Person:::type -.-> Bob{Bob}:::instance
MICE[Measurement Information\nContent Entity]:::type -.-> BobsAgeMeasurement{Bob's Age\nMeasurement}:::instance
YearMeasurementUnit{Year Measurement Unit}:::instance
AgeValue{{18}}:::datatype

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
BobsAgeMeasurement --> |"is tokenized by (shortcut)"| AgeValue
Bob -->|participates in| BobsLife
BobsLifetime -->|temporal region of| BobsLife
%%% BobsLifetime --> |process started by| BobsBirth
BobsAgeMeasurement -->|measurement of| BobsLifetime
BobsAgeInYears -->|carrier of| BobsAgeMeasurement
BobsAgeInYears -->|uses measurement unit| YearMeasurementUnit

BobsAgeInYears -->|has integer value| AgeValue

classDef instance fill:#914585, color: white;
classDef type fill:#d6a500, color:white;
classDef datatype fill:#bb2f42, color: white;
linkStyle default stroke:#0079c0
linkStyle 7 stroke:#cb6b00
classDef dataProperty fill:#00a53c
```

Suggested shortcuts

With the age example, I find it more useful to model the information bearer than the information content. Information content about age is only useful if we want to represent, for example, that the phrases "18 years" and "216 months" mean the same thing, i.e. they bear/carry the same information content. Otherwise, we typically just want to reason about the integers and units associated with information bearers. In that case, the content is metaphysical baggage.

A property chain axiom attached to something like "carries information about" could provide a shortcut between a carrier of information and what that information is about. Similarly, we could make more specific subproperties like "carries measurement of" which would fit the example better.

```turtle
:carries_information_about rdf:type owl:ObjectProperty ;
    owl:inverseOf :is_subject_of_carrier ;
    rdfs:domain bfo:0000004 ; # independent continuant
    rdfs:range bfo:0000001 ; # entity
    owl:propertyChainAxiom ( bfo:0000101 # is carrier of (at some time)
                            cco:is_about
                            ) .
```
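A rough way to see what a property chain like this buys is a tiny forward-chaining sketch in Python. It assumes `cco:is_a_measurement_of` falls under `cco:is_about` in the property hierarchy (so the chain fires on the Bob example) and encodes only that one sub-property link; `infer` and the constant names are hypothetical, and in practice an OWL reasoner would do this work.

```python
# Sketch: forward chaining for the proposed property chain
#   :carries_information_about  <-  bfo:0000101 (is carrier of) o cco:is_about
# Triples are plain (subject, predicate, object) tuples of prefixed names.

# ASSUMPTION: cco:is_a_measurement_of is (directly or indirectly) a
# sub-property of cco:is_about; only that one edge is modelled here.
SUB_PROPERTY_OF = {"cco:is_a_measurement_of": "cco:is_about"}

CARRIER_OF = "bfo:0000101"
SHORTCUT = ":carries_information_about"
INVERSE = ":is_subject_of_carrier"


def super_properties(p):
    """Yield p followed by each property above it in SUB_PROPERTY_OF."""
    while p is not None:
        yield p
        p = SUB_PROPERTY_OF.get(p)


def infer(graph):
    """Return graph plus sub-property, property-chain and inverse entailments."""
    closed = set()
    # rdfs:subPropertyOf: (s p o) with p below q entails (s q o)
    for s, p, o in graph:
        for q in super_properties(p):
            closed.add((s, q, o))
    # owl:propertyChainAxiom: (x carrier-of y), (y is_about z) entails (x shortcut z)
    about = {(s, o) for s, p, o in closed if p == "cco:is_about"}
    derived = {
        (x, SHORTCUT, z)
        for x, p, y in closed if p == CARRIER_OF
        for y2, z in about if y2 == y
    }
    result = closed | derived
    # owl:inverseOf, applied to asserted and derived shortcuts alike
    result |= {(o, INVERSE, s) for s, p, o in result if p == SHORTCUT}
    return result


full = {
    (":BobsAgeInYears", CARRIER_OF, ":BobsAgeInformation"),
    (":BobsAgeInformation", "cco:is_a_measurement_of", ":BobsLifetime"),
}
shortcut_only = {(":BobsAgeInYears", SHORTCUT, ":BobsLifetime")}

print((":BobsAgeInYears", SHORTCUT, ":BobsLifetime") in infer(full))  # True
print(any(p == CARRIER_OF for _, p, _ in infer(shortcut_only)))       # False
```

The last line shows the one-way entailment: the chain yields the shortcut, but asserting only the shortcut never yields a carrier link.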

This would only be useful in reasoning if we already had the complex chain of relations, as in the Bob example, which would entail the shortcut, but not the other way around. (To go in the opposite direction, something like a SPARQL CONSTRUCT query could be used to expand these shortcut relations into the complex chain of relations involving the information contents, bearers, etc.)
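Going the other way, the expansion could look roughly like the following Python sketch, which mimics a SPARQL CONSTRUCT whose template contains a blank node (a fresh node per match). The `_:iceN` labels, the helper names, and typing the minted node only as `cco:InformationContentEntity` are assumptions: the shortcut doesn't record whether the content was a measurement, so the expansion can't recover that detail.

```python
import itertools

SHORTCUT = ":carries_information_about"


def expand(graph):
    """Rewrite each shortcut triple into the full chain
    carrier --bfo:0000101--> _:iceN --cco:is_about--> subject,
    minting a fresh blank-node label for the unnamed content entity."""
    fresh = itertools.count()
    out = set()
    for s, p, o in sorted(graph, key=repr):  # sorted for stable labels
        if p == SHORTCUT:
            ice = f"_:ice{next(fresh)}"
            out.add((s, "bfo:0000101", ice))
            out.add((ice, "rdf:type", "cco:InformationContentEntity"))
            out.add((ice, "cco:is_about", o))
        else:
            out.add((s, p, o))
    return out


def contract(graph):
    """Re-derive shortcut triples from carrier/aboutness chains."""
    about = {(s, o) for s, p, o in graph if p == "cco:is_about"}
    return {
        (x, SHORTCUT, z)
        for x, p, y in graph if p == "bfo:0000101"
        for y2, z in about if y2 == y
    }


shortcut = {(":BobsAgeInYears", SHORTCUT, ":BobsLifetime")}
expanded = expand(shortcut)
print(len(expanded))                   # 3
print(contract(expanded) == shortcut)  # True
```

`contract` recovering the original shortcut from the expanded chain is the sense in which the two representations stay interoperable.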

In any case, the shortcut is at least somewhat interoperable with the full CCO representation.

Using this in the previous example simplifies things a bit:

```turtle
:BobsAgeInYears a cco:DocumentField ; # a cco:InformationBearingEntity
    cco:has_integer_value 18 ;
    cco:uses_measurement_unit cco:YearMeasurementUnit ;
    :carries_information_about :BobsLifetime .

:BobsLifetime a cco:MultiYearTemporalInterval ;
    cco:is_temporal_region_of :BobsLife .

:BobsLife a bfo:0000015 . # process

:Bob a cco:Person ;
    bfo:0000056 :BobsLife . # participates in (at some time)
```
```mermaid
graph

IBE[Information Bearing Entity]:::type -.-> DocumentField
DocumentField:::type -.-> BobsAgeInYears{Bob's Age\nIn Years}:::instance
Process:::type -.-> BobsLife{Bob's Life}:::instance
MYTI[MultiYearTemporalInterval]:::type -.-> BobsLifetime{Bob's Lifetime}:::instance
1DTR[1-Dimensional\nTemporal Region]:::type -.-> MYTI:::type
Person:::type -.-> Bob{Bob}:::instance
YearMeasurementUnit{Year Measurement Unit}:::instance
AgeValue{{18}}:::datatype

Bob -->|participates in| BobsLife
BobsLifetime -->|is temporal region of| BobsLife
BobsAgeInYears -->|uses measurement unit| YearMeasurementUnit
BobsAgeInYears --> |"carries information about (shortcut)"| BobsLifetime
BobsAgeInYears -->|has integer value| AgeValue

classDef instance fill:#914585, color: white;
classDef type fill:#d6a500, color:white;
classDef datatype fill:#bb2f42, color: white;
classDef objectProperty fill:#0079c0
classDef dataProperty fill:#00a53c
linkStyle default stroke:#0079c0
```

Simplifying further, we get something that looks a little more familiar to someone working with more common data formats. This is probably what we want unless we need to reason about someone's life or the temporal region it occupies. More likely, we just need to reason about the integer value associated with their age, and associate that value with them.

```turtle
:BobsAgeInYears a cco:DocumentField ; # a cco:InformationBearingEntity
    cco:has_integer_value 18 ;
    cco:uses_measurement_unit cco:YearMeasurementUnit ;
    :carries_information_about :Bob .

:Bob a cco:Person .
```
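With this simplified graph, the voting question from the beginning reduces to one lookup and one comparison. A minimal Python sketch, assuming a voting age of 18 (a hypothetical threshold) and the hypothetical helpers `age_in_years` and `can_vote`. It only accepts values whose field uses `cco:YearMeasurementUnit`, so an age recorded in another unit isn't silently compared as years.

```python
# The simplified graph above, as (subject, predicate, object) tuples.
SIMPLE_GRAPH = {
    (":BobsAgeInYears", "rdf:type", "cco:DocumentField"),
    (":BobsAgeInYears", "cco:has_integer_value", 18),
    (":BobsAgeInYears", "cco:uses_measurement_unit", "cco:YearMeasurementUnit"),
    (":BobsAgeInYears", ":carries_information_about", ":Bob"),
    (":Bob", "rdf:type", "cco:Person"),
}

VOTING_AGE = 18  # hypothetical threshold, for illustration only


def age_in_years(graph, person):
    """Integer value of a year-denominated field that carries information
    about `person`, or None if the graph has no such field."""
    for field, p, subject in graph:
        if p != ":carries_information_about" or subject != person:
            continue
        if (field, "cco:uses_measurement_unit", "cco:YearMeasurementUnit") not in graph:
            continue  # value is in some other unit; don't treat it as years
        for s, p2, value in graph:
            if s == field and p2 == "cco:has_integer_value":
                return value
    return None


def can_vote(graph, person):
    age = age_in_years(graph, person)
    return age is not None and age >= VOTING_AGE


print(can_vote(SIMPLE_GRAPH, ":Bob"))    # True
print(can_vote(SIMPLE_GRAPH, ":Alice"))  # False (no age recorded)
```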

We could use cco:is_subject_of_field as a more specific subproperty of the inverse and use an anonymous node to get something that looks even more familiar. Now we have 5 triples instead of 11.

```turtle
:Bob a cco:Person ;
    :is_subject_of_field [ rdf:type cco:DocumentField ;
                           cco:uses_measurement_unit cco:YearMeasurementUnit ;
                           cco:has_integer_value 18
                         ] .
```
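To check the count, here is a small Python sketch that flattens the nested description the way a Turtle parser treats `[ ... ]`: each anonymous node gets a fresh blank-node label. The dict encoding and the `flatten` helper are assumptions for illustration; note that a dict can't hold repeated predicates on one subject, which real Turtle allows.

```python
import itertools


def flatten(subject, description, counter=None):
    """Turn {predicate: object} into triples. A dict object plays the role of
    a Turtle `[ ... ]` anonymous node and gets a fresh _:bN label."""
    counter = counter if counter is not None else itertools.count()
    triples = []
    for predicate, obj in description.items():
        if isinstance(obj, dict):
            node = f"_:b{next(counter)}"
            triples.append((subject, predicate, node))
            triples.extend(flatten(node, obj, counter))
        else:
            triples.append((subject, predicate, obj))
    return triples


bob = {
    "rdf:type": "cco:Person",
    ":is_subject_of_field": {
        "rdf:type": "cco:DocumentField",
        "cco:uses_measurement_unit": "cco:YearMeasurementUnit",
        "cco:has_integer_value": 18,
    },
}

triples = flatten(":Bob", bob)
print(len(triples))  # 5
```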

Please let me know if something is missing here or I'm misunderstanding anything.

@swartik

swartik commented Nov 15, 2023

@tmprd,

Let me be sure I understand what you're proposing. Is it to add object properties, defined through property chain axioms, that allow simplifications to graph traversal? There's a precedent for this. See Issue #155.

Can you explain something about your proposal? Consider the triple:

:BobsAgeInYears bfo:0000101 :BobsAgeInformation . # is carrier of at some time

When you use "is carrier of at some time", you sometimes need to qualify the triple by describing the temporal interval during which it is true. I don't know whether you intended :BobsAgeInYears to be Bob's current age, as opposed to Bob's age at the time the triple is asserted. If you did, that qualification is necessary. (If not, maybe the predicate should be "is carrier of at all times".) Putting this particular example aside, it may be tricky to incorporate temporal logic into shortcuts.
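One way to picture the difficulty is to give each link a validity interval and let a derived shortcut hold only where the intervals of its links overlap. The Python sketch below does that; the link format, the names `:AgeField`, `:Age17Info`, `:Age18Info`, and all dates are hypothetical, and treating "is about" as holding at all times is an assumption.

```python
from datetime import date

# Hypothetical link format: (subject, predicate, object, start, end),
# true on the half-open interval [start, end).
ALWAYS = (date.min, date.max)


def overlap(s1, e1, s2, e2):
    """Intersection of two half-open intervals, or None if they don't meet."""
    start, end = max(s1, s2), min(e1, e2)
    return (start, end) if start < end else None


def temporal_shortcuts(links):
    """Compose is_carrier_of and is_about links; each derived shortcut is
    valid only on the overlap of the two links' intervals."""
    derived = []
    for x, p1, y, s1, e1 in links:
        if p1 != "is_carrier_of":
            continue
        for y2, p2, z, s2, e2 in links:
            if p2 == "is_about" and y2 == y:
                span = overlap(s1, e1, s2, e2)
                if span is not None:
                    derived.append((x, "carries_information_about", z) + span)
    return derived


links = [
    # The same field carried "Bob is 17" for a year, then "Bob is 18".
    (":AgeField", "is_carrier_of", ":Age17Info", date(2022, 11, 10), date(2023, 11, 10)),
    (":AgeField", "is_carrier_of", ":Age18Info", date(2023, 11, 10), date.max),
    (":Age17Info", "is_about", ":BobsLifetime", *ALWAYS),
    (":Age18Info", "is_about", ":BobsLifetime", *ALWAYS),
]

for row in temporal_shortcuts(links):
    print(row)
```

Both derived triples are identical apart from their intervals, so a time-indexed shortcut has to carry its interval along; dropping it would merge two different facts into one.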

@tmprd
Author

tmprd commented Nov 15, 2023

@swartik Thanks for the feedback and context. That's right, I want to add defined object properties to simplify graph traversal.

I used "is carrier of at some time" because I heard that an upcoming version of BFO is dropping the "at some time" part for simplicity (I assume because it's already implied to be occurring at some time, but could be wrong about this). I also noted that RO's "is carrier of" could be used instead, but I'm not sure to what extent RO terms are being reclaimed by BFO or CCO. Would using "is carrier of" be more appropriate here? I do see the need to index a statement about someone's age to a particular time. Anyway, the overall goal here is to make some of these representations more practically useful for data processing systems.

@cameronmore
Copy link
Contributor

I like the direction you're going, Tim. I would just add that we often want to be more specific about the aboutness relation, so we might want "is carrier of measurement at some time" or "is carrier of prescription at some time". The information model needs some attention, and it is the first module we should consider updating in a substantial way, so this will help the conversation and effort.

@cameronmore cameronmore added the data modeling question Question concerning the use of the CCO to model data label Dec 1, 2023

3 participants