# Universal Ontologies

In the previous notebook, [Semantic Modeling](SemanticModeling.ipynb), we derived a motivation for ontologies. When we store our data in semantic graphs, we need a way to:
- search by specific type of object
- search for specific types of relationships
- define the rules of our system

We will see all three of those goals implemented in this notebook, and we will try to explore their consequences!

**Universal Ontologies** are attempts to define rules for *any* kind of system which we want to model. We won't implement a universal ontology here, as it is out of scope. Instead, we will make a version of an ontology to fit the MoMA collection dataset. 

Clay's opinion: defining a *true* universal ontology is the task of madmen and those with a budget to evaporate.

However, universal ontologies do exist, and they are helpful in the same way that **identifiers** are helpful, as we saw in the `Semantic Modeling` notebook. If our data spans multiple different semantic graphs and datasets, we can easily combine them if they all follow a universal ontology.
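
To make the "easy to combine" claim concrete, here is a minimal sketch of joining two sources on a shared identifier. The two MoMA rows use values from the Swiss artists table in this notebook; the second source (`catalogue_by_qid`) and its fields are invented purely for illustration.

```python
# Rows copied from the MoMA Swiss artists table used in this notebook.
moma_artists = [
    {"DisplayName": "Cuno Amiet", "Wiki QID": "Q566797", "BeginDate": 1868},
    {"DisplayName": "Theo H. Ballmer", "Wiki QID": "Q2416828", "BeginDate": 1902},
]

# A hypothetical second dataset, keyed by the same identifier.
# The field names and values here are made up for the example.
catalogue_by_qid = {
    "Q566797": {"catalogue_name": "Amiet, Cuno", "lots": 12},
}


def merge_on_qid(rows, extra_by_qid):
    """Attach the extra fields to every row whose QID appears in the other source."""
    merged = []
    for row in rows:
        extra = extra_by_qid.get(row["Wiki QID"], {})
        merged.append({**row, **extra})
    return merged


combined = merge_on_qid(moma_artists, catalogue_by_qid)
for row in combined:
    print(row)
```

Notice that the names are spelled differently in the two sources ("Cuno Amiet" vs. "Amiet, Cuno"), so joining on the name would fail. The shared identifier is what makes the merge trivial.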


## FDH Textbook's Definition

Let's look at the textbook's definition for a **Universal Ontology**:

> Universal ontologies represent an ambitious effort in the field of semantic modeling, aiming to create frameworks that are widely applicable across various domains. These ontologies are designed to be comprehensive, interoperable, and capable of addressing fundamental challenges in representing and linking diverse datasets.

In the textbook's definition, we see the key aspect we derived in `Semantic Modeling`: universal ontologies exist to "represent and *link* diverse datasets."

## What do Universal Ontologies Model?

FDH's Textbook breaks down Universal Ontologies into 4 modeling tasks. Quoting the list from the textbook:
1. Modeling Time: Creating representations that can account for historical, present, and future contexts, including temporal intervals, sequences, and overlaps.
2. Modeling Space: Addressing spatial relationships, including coordinates, regions, and topological connections.
3. Modeling Events: Defining events as interactions between entities, with attributes like time, location, and participating agents.
4. Modeling Relations: Capturing the diverse and complex relationships that exist between entities, such as hierarchical, causal, and associative links.
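
As a rough sketch of what these four tasks could look like in code, the toy classes below give one small type to each of them. None of these class or field names come from the textbook or from any real ontology; they are assumptions made for illustration. The dates and IDs are the ones WikiData reports for Cuno Amiet later in this notebook, and the year-level dates for Theo H. Ballmer are padded to full years.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """Time: a closed interval between two dates."""
    start: date
    end: date

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


@dataclass(frozen=True)
class Place:
    """Space: a location, referenced here only by an identifier."""
    identifier: str


@dataclass
class Event:
    """Events: something that happens at a time and place, with participants."""
    kind: str
    when: Interval
    where: Optional[Place] = None
    agents: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Relation:
    """Relations: a typed link between two entities."""
    subject: str
    predicate: str
    obj: str


amiet_life = Interval(date(1868, 3, 28), date(1961, 7, 6))
ballmer_life = Interval(date(1902, 1, 1), date(1965, 12, 31))  # year-level only

birth = Event(
    kind="birth",
    when=Interval(date(1868, 3, 28), date(1868, 3, 28)),
    where=Place("Q68965"),
    agents=["Q566797"],
)
born_in = Relation("Q566797", "P19", "Q68965")

print(amiet_life.overlaps(ballmer_life))
```

Even this toy version shows why the tasks are hard to separate: the birth event needs a time, a place, and an agent before it means anything.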


## Challenges with Universal Ontologies

The "universal" nature of "universal ontologies" creates some problems for them.  

By definition, universal ontologies want to encompass every kind of semantic dataset. In practice, however, this often results in bloated features that go unused in any specific use case. Furthermore, they require *instantaneous mass adoption* to be worth the overhead which they generate. Finally, they are very resistant to changes which cause `backwards-incompatibility`: when datasets modeled with newer versions of the ontology cannot be combined with datasets modeled with older versions.


## Constructing Universal Ontologies

Universal Ontologies are usually created in two ways: as a product of community participation, or by a specialized committee. We will look at both types!


## Collective Ontologies: WikiData
The most famous example of a collective ontology is the WikiData dataset. In `Semantic Modeling`, we quickly reviewed WikiData and saw it in the real world! Let's now give it the love it deserves.

We're going to use a Python client library for WikiData (the `Wikidata` package) in this notebook, but be aware there are other ways of querying the dataset.

In [None]:
%pip install Wikidata

In [6]:
# Some imports to get us started.
import wikidata
from wikidata.client import Client

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Load in our Swiss artists and artworks from MoMA.
# See the `Semantic Modeling` notebook if you want a more detailed explanation.
swiss_artists = pd.read_csv('Swiss_Artists.csv')
swiss_artworks = pd.read_csv('Swiss_Artworks.csv')

In [4]:
swiss_artists.head(3)

Unnamed: 0,ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
0,149,Cuno Amiet,"Swiss, 1868–1961",Swiss,male,1868,1961,Q566797,500005153.0
1,314,Theo H. Ballmer,"Swiss, 1902–1965",Swiss,male,1902,1965,Q2416828,500060722.0
2,348,Maurice Barraud,"Swiss, 1889–1955",Swiss,male,1889,1955,,
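
Notice that the third row has no `Wiki QID`. pandas reads an empty cell as `NaN` (a float), so it is worth filtering those out before asking WikiData about them. Below is a minimal sketch of such a check; `is_valid_qid` is a helper name made up for this example, and the pattern assumes item IDs look like `Q` followed by a positive integer.

```python
import re

# Assumed shape of a Wikidata item ID: "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(value) -> bool:
    """Return True only for strings shaped like an item ID, e.g. 'Q566797'."""
    return isinstance(value, str) and QID_PATTERN.fullmatch(value) is not None


# The 'Wiki QID' column of the three rows above; NaN stands in for the empty cell.
qid_column = ["Q566797", "Q2416828", float("nan")]

usable_qids = [qid for qid in qid_column if is_valid_qid(qid)]
print(usable_qids)
```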


Before we dive into a quick overview, know that I am referring to the [Wikidata documentation](https://wikidata.readthedocs.io/en/stable/).

In [None]:
# We begin by creating a client to query the database:
client = Client()

# Let's query the data for the artist "Cuno Amiet", who has a Wiki QID.
cuno_amiet_qid = swiss_artists[swiss_artists['DisplayName'] == 'Cuno Amiet']['Wiki QID'].values[0]

cuno_amiet = client.get(cuno_amiet_qid)

When we query the client, we pass it an entity ID, the foundation of WikiData. We get back a WikiData entity.

In [9]:
cuno_amiet

<wikidata.entity.Entity Q566797>

In [11]:
cuno_amiet.description

m'Swiss painter, draughtsman, graphic artist and sculptor (1868-1961)'

In [21]:
# Let's list every property of this entity, together with its values.
cuno_amiet.lists()

[(<wikidata.entity.Entity P19>, [<wikidata.entity.Entity Q68965>]),
 (<wikidata.entity.Entity P214>, ['7399068']),
 (<wikidata.entity.Entity P213>, ['0000000108665347']),
 (<wikidata.entity.Entity P244>, ['n83224756']),
 (<wikidata.entity.Entity P227>, ['118502557']),
 (<wikidata.entity.Entity P373>, ['Cuno Amiet']),
 (<wikidata.entity.Entity P269>, ['027277852']),
 (<wikidata.entity.Entity P268>, ['14969008f', '119354761']),
 (<wikidata.entity.Entity P20>, [<wikidata.entity.Entity Q33035234>]),
 (<wikidata.entity.Entity P27>, [<wikidata.entity.Entity Q39>]),
 (<wikidata.entity.Entity P569>, [datetime.date(1868, 3, 28)]),
 (<wikidata.entity.Entity P570>, [datetime.date(1961, 7, 6)]),
 (<wikidata.entity.Entity P31>, [<wikidata.entity.Entity Q5>]),
 (<wikidata.entity.Entity P646>, ['/m/03kw9m']),
 (<wikidata.entity.Entity P902>, ['021974']),
 (<wikidata.entity.Entity P135>, [<wikidata.entity.Entity Q382056>]),
 (<wikidata.entity.Entity P106>,
  [<wikidata.entity.Entity Q1028181>,
   <wik

Now is a good time to review the framework of WikiData. The [introduction](https://www.wikidata.org/wiki/Wikidata:Introduction) page is a great place to get a better overview.

But, the basics are this: WikiData is made up of [items](https://www.wikidata.org/wiki/Help:Items) which contain a unique identifier QID and properties. These properties have names, and connect the item to a value or another item.

For our Cuno Amiet example, take the following "Statement":
`(<wikidata.entity.Entity P21>, [<wikidata.entity.Entity Q6581097>])`

This tells us that Cuno Amiet is linked to an Entity with QID Q6581097 via property P21.

Huh?

Okay, so let's rewrite this using our RDF triplet framework. WikiData is completely compatible with this framework! See `Semantic Modeling` if you want more info on these triplets.

The basic form of a triplet is:

| Subject | Predicate | Object |
| ------- | --------- | ------ |

For our triplet we have:

| Subject | Predicate | Object |
| ------- | --------- | ------ |
| Q566797 | P21 | Q6581097 |

Here, 'Q566797' is our QID for Cuno Amiet. But what do the property `P21` and the other QID, `Q6581097`, mean?


In [26]:
# What does P21 mean?
property_entity = client.get('P21', load=True)
property_entity


<wikidata.entity.Entity P21 'sex or gender'>

Ah! Okay! So Cuno's `sex or gender` is the entity with QID `Q6581097`. Let's see what this entity is! 


#### Clay's side note
As we mentioned earlier, universal ontologies are very sensitive to backwards compatibility. Thus, when a label is created according to a conception of sex and gender from 20 years ago, it isn't clear how useful it is now. And, since universal ontologies are *very* afraid of backwards incompatibility, they can sacrifice modern understandings of concepts, such as the idea that sex is biological and gender is cultural. In this way, **universal ontologies are inherently conservative structures.**

As a footnote to my footnote, even the idea that sex is somehow outside of culture has been disputed since the early 1990s. See *Bodies That Matter: On the Discursive Limits of Sex* by Judith Butler for more.

In [27]:
value_entity = client.get('Q6581097', load=True)
value_entity

<wikidata.entity.Entity Q6581097 'male'>
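
Putting the pieces together, we can now read the whole triplet in plain language. The sketch below does this offline with a hand-written dictionary of the labels we just looked up; `describe` is a made-up helper name, and a real version would fetch the labels from the client instead of hard-coding them.

```python
# Labels copied from the outputs in this notebook (not fetched live).
labels = {
    "Q566797": "Cuno Amiet",
    "P21": "sex or gender",
    "Q6581097": "male",
}


def describe(triple, labels):
    """Swap each ID in a (subject, predicate, object) triple for its label, if known."""
    return tuple(labels.get(term, term) for term in triple)


triple = ("Q566797", "P21", "Q6581097")
readable = describe(triple, labels)
print(readable)  # ('Cuno Amiet', 'sex or gender', 'male')
```

Any ID missing from the dictionary is left as-is, so a partially labeled triplet stays readable instead of raising an error.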

With over 12,000 properties, WikiData is a serious attempt at a collective universal ontology. If we have 12,000 different kinds of properties, we can encode all kinds of attributes.

Values can also take the form of text or numbers, rather than a specific entity.
For example, Cuno Amiet's birthdate:

In [None]:
# The Property for "Birthdate"
birthdate_property_string = 'P569'

# Make sure to get the property entity from the client before indexing into Cuno!
birthdate_property = client.get(birthdate_property_string, load=True)

# Get the birthday using the birthdate property entity
cuno_birthday = cuno_amiet[birthdate_property]
cuno_birthday

datetime.date(1868, 3, 28)
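
Because the value comes back as an ordinary Python `datetime.date`, we can compute with it directly. As a small sketch, the code below works out Cuno Amiet's age at death from the two dates shown in the `lists()` output above (`P569` and `P570`). The dates are typed in by hand here rather than fetched, and `age_on` is a helper name made up for the example.

```python
from datetime import date

born = date(1868, 3, 28)  # value of P569 in the output above
died = date(1961, 7, 6)   # value of P570 in the output above


def age_on(day, birthday):
    """Whole years elapsed between `birthday` and `day`."""
    had_birthday_this_year = (day.month, day.day) >= (birthday.month, birthday.day)
    return day.year - birthday.year - (0 if had_birthday_this_year else 1)


age_at_death = age_on(died, born)
print(age_at_death)  # 93
```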

## Time

This is a great way for us to jump into the question of how universal ontologies encode *time.*

## 