# Ontology and Taxonomy

## Reference

* [WordNet Webpage](https://wordnet.princeton.edu)
* [WordNet Online](http://wordnetweb.princeton.edu/perl/webwn)
* [NLTK: WordNet Interface](https://www.nltk.org/howto/wordnet.html)

## Contents

* [Concepts](#Concepts)
  * [Ontology](#Ontology)
  * [Taxonomy](#Taxonomy)
* [WordNet](#WordNet)
  * [Installation](#Installation)
  * [Senses](#Senses)
  * [Synonyms](#Synonyms)
  * [Antonyms](#Antonyms)
  * [Lexical Relations](#Lexical-Relations)
  * [Hyponyms](#Hyponyms)
  * [Hypernyms](#Hypernyms)
  * [Similarities](#Similarities)
  * [Entailments](#Entailments)
  * [Troponyms](#Troponyms)

In [1]:
# left-align the tables below
from IPython.display import display, HTML
display(HTML("<style>table {margin-left: 0 !important;}</style>"))

## Concepts

### Ontology

* Nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations.
* Types, properties, and interrelationships of the entities that fundamentally exist for a particular domain of discourse.

### Taxonomy

* The science of classification according to a pre-determined system.
* Resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval.

<img src="res/taxonomy.png" style="float:left; width: 350px;"/>

## WordNet

A lexical database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), which are interlinked by conceptual-semantic and [lexical relations](#Lexical-Relations).

| Category | Words | Synsets | Senses |
|:---:|---:|---:|---:|
| Noun | 117,798 | 82,115 | 146,312 |
| Verb | 11,529 | 13,767 | 25,047 |
| Adjective | 21,479 | 18,156 | 30,002 |
| Adverb | 4,481 | 3,621 | 5,580 |
| Total | 155,287 | 117,659 | 206,941 |

### Installation

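If the download below fails with an SSL (Secure Sockets Layer) certificate error, the following workaround disables certificate verification for HTTPS connections (use it only on a trusted network):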

In [3]:
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

Download WordNet through [NLTK](https://www.nltk.org):

In [4]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /Users/jdchoi/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

### Senses

A word can have multiple meanings, called senses (e.g., [chair](http://wordnetweb.princeton.edu/perl/webwn?s=chair&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=)).

* How fine-grained do word senses need to be?
* Can we automatically distinguish word senses?

Retrieve all senses of a word as a list of [Synset](https://www.nltk.org/api/nltk.corpus.reader.html?highlight=synset#nltk.corpus.reader.wordnet.Synset) objects:

In [5]:
from nltk.corpus import wordnet as wn
wn.synsets('chair')

[Synset('chair.n.01'),
 Synset('professorship.n.01'),
 Synset('president.n.04'),
 Synset('electric_chair.n.01'),
 Synset('chair.n.05'),
 Synset('chair.v.01'),
 Synset('moderate.v.01')]

You can specify the part-of-speech (tag) of the word:

* `n`: noun
* `v`: verb
* `a`: adjective
* `r`: adverb

In [6]:
chairs = wn.synsets('chair', pos='n')
chairs

[Synset('chair.n.01'),
 Synset('professorship.n.01'),
 Synset('president.n.04'),
 Synset('electric_chair.n.01'),
 Synset('chair.n.05')]

Retrieve the synset directly from the sense ID:

In [7]:
chair_0 = wn.synset('chair.n.01')

print(chair_0 == chairs[0])
print(chair_0 == chairs[1])

True
False


### Synonyms

Each sense (synset) groups its own set of synonyms.

Retrieve all synonyms of a word sense as a list of [Lemma](https://www.nltk.org/api/nltk.corpus.reader.html?highlight=lemma#nltk.corpus.reader.wordnet.Lemma) objects:

In [8]:
dog_0 = wn.synset('dog.n.01')
dog_0_lemmas = dog_0.lemmas()
dog_0_lemmas

[Lemma('dog.n.01.dog'),
 Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

Retrieve only the surface forms (lemmas) of the synonyms:

In [7]:
dog_0_lemmas_forms = [l.name() for l in dog_0_lemmas]
dog_0_lemmas_forms

['dog', 'domestic_dog', 'Canis_familiaris']

Retrieve the `Lemma` object directly from its ID (the sense ID followed by the surface form):

In [8]:
lemma_dog = wn.lemma('dog.n.01.dog')
lemma_canis = wn.lemma('dog.n.01.Canis_familiaris')
print(lemma_dog)
print(lemma_canis)

Lemma('dog.n.01.dog')
Lemma('dog.n.01.Canis_familiaris')


Retrieve the frequency of the lemma (its count in the sense-tagged text distributed with WordNet):

In [9]:
print(lemma_dog.count())
print(lemma_canis.count())

42
0


### Exercise

Write a function that takes a word, an optional POS tag, and an optional minimum count, and returns the set of lemmas of all synonyms of the word:

In [53]:
from typing import Optional, Set

def synonyms(word: str, pos: Optional[str] = None, count: int = 0) -> Set[str]:
    """
    :param word: the word to retrieve synonyms for.
    :param pos: the part-of-speech tag of the word; if None, retrieve synonyms across all parts of speech.
    :param count: the minimum frequency of the synonym to be retrieved.
    :return: the lemma set of all synonyms of the specified word.
    """
    syns = set()
    # To be updated
    return syns

* [Default arguments in Python](https://docs.python.org/3.9/tutorial/controlflow.html#default-argument-values)
* [Typing in Python](https://docs.python.org/3/library/typing.html)

In [54]:
synonyms('dog', 'n')

{'Canis_familiaris',
 'andiron',
 'blackguard',
 'bounder',
 'cad',
 'click',
 'detent',
 'dog',
 'dog-iron',
 'domestic_dog',
 'firedog',
 'frank',
 'frankfurter',
 'frump',
 'heel',
 'hot_dog',
 'hotdog',
 'hound',
 'pawl',
 'weenie',
 'wiener',
 'wienerwurst'}

In [55]:
synonyms('dog', 'n', 1)

{'dog', 'hound'}

In [56]:
synonyms('dog', count=1)

{'chase', 'dog', 'go_after', 'hound', 'track', 'trail'}

### Antonyms

You can also retrieve the antonyms of a lemma:

In [10]:
buy = wn.synset('buy.v.01')
for l in buy.lemmas():
    print(l.name(), l.antonyms())

buy [Lemma('sell.v.01.sell')]
purchase []


### Lexical Relations

A comprehensive set of lexical relations is available in WordNet:

* Synonym
* Antonym
* Hyponym
* Hypernym
* Meronym

Slides: https://www.slideshare.net/jchoi7s/cs329-lexical-relations

### Hyponyms

`(E1) is a kind of (E2)`: `E1` is a hyponym of `E2`.
* A **horse** is a _kind of_ **animal**.
* **Ambling** is a _kind of_ **walking**.

A hyponym can have multiple hypernyms:
* A **mule** is a _kind of_ **donkey** and a _kind of_ **horse**.
* **Ambling** is a _kind of_ **walking** and a _kind of_ **being slow**.
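The "kind of" relation can be pictured as a small directed graph in which one concept may point to several parents. The sketch below is a toy illustration only, not WordNet data: `DIRECT_HYPERNYMS` and `is_kind_of` are hypothetical names, and the entries are hand-written from the examples in this section.

```python
from typing import Dict, List

# Hand-written toy graph (an assumption for illustration, not WordNet data):
# each concept maps to its direct hypernyms.
DIRECT_HYPERNYMS: Dict[str, List[str]] = {
    'mule': ['donkey', 'horse'],
    'donkey': ['animal'],
    'horse': ['animal'],
    'animal': [],
}

def is_kind_of(e1: str, e2: str) -> bool:
    """Return True if e1 is a direct or indirect hyponym of e2."""
    stack = list(DIRECT_HYPERNYMS.get(e1, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == e2:
            return True
        if node not in seen:
            seen.add(node)
            # keep climbing towards the root through every parent
            stack.extend(DIRECT_HYPERNYMS.get(node, []))
    return False

print(is_kind_of('mule', 'horse'))   # True: direct hypernym
print(is_kind_of('mule', 'animal'))  # True: indirect hypernym
print(is_kind_of('horse', 'mule'))   # False: the relation is not symmetric
```

The relation is transitive but not symmetric, which is why the last call returns `False`.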

Retrieve the direct hyponyms:

In [9]:
chair_0.hyponyms()

[Synset('armchair.n.01'),
 Synset('barber_chair.n.01'),
 Synset('chair_of_state.n.01'),
 Synset('chaise_longue.n.01'),
 Synset('eames_chair.n.01'),
 Synset('fighting_chair.n.01'),
 Synset('folding_chair.n.01'),
 Synset('highchair.n.01'),
 Synset('ladder-back.n.01'),
 Synset('lawn_chair.n.01'),
 Synset('rocking_chair.n.01'),
 Synset('straight_chair.n.01'),
 Synset('swivel_chair.n.01'),
 Synset('tablet-armed_chair.n.01'),
 Synset('wheelchair.n.01')]

In [10]:
dog_0.hyponyms()

[Synset('basenji.n.01'),
 Synset('corgi.n.01'),
 Synset('cur.n.01'),
 Synset('dalmatian.n.02'),
 Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'),
 Synset('hunting_dog.n.01'),
 Synset('lapdog.n.01'),
 Synset('leonberg.n.01'),
 Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'),
 Synset('pooch.n.01'),
 Synset('poodle.n.01'),
 Synset('pug.n.01'),
 Synset('puppy.n.01'),
 Synset('spitz.n.01'),
 Synset('toy_dog.n.01'),
 Synset('working_dog.n.01')]

### Hypernyms

Retrieve the direct hypernyms:

In [11]:
chair_0.hypernyms()

[Synset('seat.n.03')]

In [12]:
dog_0.hypernyms()

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

Retrieve all hypernym paths from the root to the synset, which include every indirect hypernym:

In [13]:
chair_0.hypernym_paths()

[[Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('object.n.01'),
  Synset('whole.n.02'),
  Synset('artifact.n.01'),
  Synset('instrumentality.n.03'),
  Synset('furnishing.n.02'),
  Synset('furniture.n.01'),
  Synset('seat.n.03'),
  Synset('chair.n.01')]]

In [14]:
dog_0.hypernym_paths()

[[Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('object.n.01'),
  Synset('whole.n.02'),
  Synset('living_thing.n.01'),
  Synset('organism.n.01'),
  Synset('animal.n.01'),
  Synset('chordate.n.01'),
  Synset('vertebrate.n.01'),
  Synset('mammal.n.01'),
  Synset('placental.n.01'),
  Synset('carnivore.n.01'),
  Synset('canine.n.02'),
  Synset('dog.n.01')],
 [Synset('entity.n.01'),
  Synset('physical_entity.n.01'),
  Synset('object.n.01'),
  Synset('whole.n.02'),
  Synset('living_thing.n.01'),
  Synset('organism.n.01'),
  Synset('animal.n.01'),
  Synset('domestic_animal.n.01'),
  Synset('dog.n.01')]]
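A synset with more than one direct hypernym has more than one path to the root, which is why `dog.n.01` yields two paths above. The following is a minimal sketch of how such paths could be enumerated on a toy graph. The graph is hand-written and heavily abridged (an assumption for illustration), and this `hypernym_paths` function is a hypothetical stand-in, not NLTK's implementation.

```python
from typing import Dict, List

# Hand-written toy fragment loosely modeled on the dog.n.01 output above
# (an assumption for illustration; most intermediate synsets are omitted).
DIRECT_HYPERNYMS: Dict[str, List[str]] = {
    'dog': ['canine', 'domestic_animal'],
    'canine': ['carnivore'],
    'carnivore': ['animal'],
    'domestic_animal': ['animal'],
    'animal': ['entity'],
    'entity': [],
}

def hypernym_paths(node: str) -> List[List[str]]:
    """Return every path that leads from a root down to the given node."""
    parents = DIRECT_HYPERNYMS[node]
    if not parents:
        # a root has exactly one path: itself
        return [[node]]
    paths = []
    for parent in parents:
        # extend each path of the parent with the current node
        for path in hypernym_paths(parent):
            paths.append(path + [node])
    return paths

for path in hypernym_paths('dog'):
    print(' -> '.join(path))
```

On this toy graph the function returns two paths, one through `canine` and one through `domestic_animal`.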

### Lowest Common Hypernyms

NLTK already provides a method to find the lowest common hypernyms:

In [15]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
dog.lowest_common_hypernyms(cat)

[Synset('carnivore.n.01')]
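One way to think about this result: collect the hypernyms of both senses, intersect them, and keep the deepest shared ones. The sketch below does this on a hand-written toy graph. The graph, the helper names, and the choice to rank by the longest distance from the root are all assumptions made for illustration, not a description of NLTK's internals.

```python
from typing import Dict, List, Set

# Hand-written toy fragment (an assumption for illustration, not WordNet data).
DIRECT_HYPERNYMS: Dict[str, List[str]] = {
    'dog': ['canine', 'domestic_animal'],
    'cat': ['feline'],
    'canine': ['carnivore'],
    'feline': ['carnivore'],
    'carnivore': ['animal'],
    'domestic_animal': ['animal'],
    'animal': ['entity'],
    'entity': [],
}

def ancestors(node: str) -> Set[str]:
    """Return the node itself plus all of its direct and indirect hypernyms."""
    found = {node}
    for parent in DIRECT_HYPERNYMS[node]:
        found |= ancestors(parent)
    return found

def max_depth(node: str) -> int:
    """Return the length (in edges) of the longest path from a root to the node."""
    parents = DIRECT_HYPERNYMS[node]
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)

def lowest_common_hypernyms(a: str, b: str) -> List[str]:
    """Return the deepest concepts shared by the hypernym sets of a and b."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted(n for n in common if max_depth(n) == deepest)

print(lowest_common_hypernyms('dog', 'cat'))
```

Here the shared hypernyms are `carnivore`, `animal`, and `entity`, and `carnivore` is the deepest of the three.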

### Exercise

Write a function that takes two sense IDs, finds their lowest common hypernyms (LCHs), and returns the path from the root to each lowest common hypernym:

In [60]:
from typing import List
from nltk.corpus.reader import Synset

def lch_paths(sense_0: str, sense_1: str) -> List[List[Synset]]:
    """
    :param sense_0: the ID of the first sense.
    :param sense_1: the ID of the second sense.
    :return: the list of LCH paths, where each path runs from the root to a lowest common hypernym (LCH).
    """
    paths = []
    # To be updated
    return paths

In [59]:
paths = lch_paths('dog.n.01', 'cat.n.01')
for path in paths:
    print(' -> '.join([syn.name() for syn in path]))

entity.n.01 -> physical_entity.n.01 -> object.n.01 -> whole.n.02 -> living_thing.n.01 -> organism.n.01 -> animal.n.01 -> chordate.n.01 -> vertebrate.n.01 -> mammal.n.01 -> placental.n.01 -> carnivore.n.01


### Similarities

How can we measure the similarity between two senses?

Slides: https://www.slideshare.net/jchoi7s/cs329-wordnet-similarities

### References

* [Verb Semantics and Lexical Selection](https://www.aclweb.org/anthology/P94-1019/), Wu and Palmer, 1994.
* [Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://arxiv.org/abs/cmp-lg/9511007), Resnik, 1995.
* [Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy](https://www.aclweb.org/anthology/O97-1002/), Jiang and Conrath, 1997.
* [Combining Local Context and WordNet Similarity for Word Sense Identification](https://ieeexplore.ieee.org/document/6287675/authors#authors), Leacock and Chodorow, 1998.
* [An Information-Theoretic Definition of Similarity](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.1832), Lin, 1998.

In [16]:
print('PTH: {}'.format(dog.path_similarity(cat)))
print('LCH: {}'.format(dog.lch_similarity(cat)))
print('WUP: {}'.format(dog.wup_similarity(cat)))

PTH: 0.2
LCH: 2.0281482472922856
WUP: 0.8571428571428571
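The three scores can be reproduced by hand from the hypernym paths. The sketch below uses the commonly cited forms of the path, Leacock-Chodorow, and Wu-Palmer measures. The function names are hypothetical, and the inputs are assumptions: the path length of 4 edges assumes `cat.n.01` sits two levels below `carnivore.n.01` (its path is not shown above), depths are counted in nodes from the root, and the taxonomy depth of 19 was chosen because it reproduces the LCH score printed above.

```python
import math

def path_similarity(distance: int) -> float:
    """Path measure: 1 / (shortest path length in edges + 1)."""
    return 1 / (distance + 1)

def lch_similarity(distance: int, taxonomy_depth: int) -> float:
    """Leacock-Chodorow: -log((path length + 1) / (2 * taxonomy depth))."""
    return -math.log((distance + 1) / (2 * taxonomy_depth))

def wup_similarity(depth_lch: int, depth_a: int, depth_b: int) -> float:
    """Wu-Palmer: 2 * depth(LCH) / (depth(a) + depth(b))."""
    return 2 * depth_lch / (depth_a + depth_b)

# Assumed shortest path: dog -> canine -> carnivore -> feline -> cat (4 edges).
distance = 4
# Assumed maximum depth of the noun taxonomy (picked to match the LCH output).
taxonomy_depth = 19
# Node counts from the root: carnivore is the 12th node and dog the 14th
# in the first dog.n.01 path; cat is assumed to be the 14th as well.
depth_lch, depth_dog, depth_cat = 12, 14, 14

print('PTH: {}'.format(path_similarity(distance)))                      # about 0.2
print('LCH: {}'.format(lch_similarity(distance, taxonomy_depth)))       # about 2.028
print('WUP: {}'.format(wup_similarity(depth_lch, depth_dog, depth_cat)))  # about 0.857
```

Note that the path and Wu-Palmer scores fall between 0 and 1, while the Leacock-Chodorow score is a negative log ratio and is not bounded by 1.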


### Entailments

`V1` entails `V2`: if (`V1` is true), then (`V2` must be true):
* If (A is **snoring**), then (A must be **sleeping**).

Unless `V1` and `V2` are synonyms, the converse does not hold:
* If (A is **sleeping**), then (A must be **snoring**) is not necessarily true.

The contrapositive always holds:
* If (A is **not sleeping**), then (A must **not** be **snoring**).

Temporal inclusion
* `T(V1)` &sube; `T(V2)`: If (A is **snoring**), then (A must be **sleeping**).
* `T(V1)` &supe; `T(V2)`: If (A **bought** B), then (A must have **paid** for B).
* `T(V1)` = `T(V2)`: If (A is **marching**), then (A must be **walking**).
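Temporal inclusion can be checked mechanically once each event is given a time span. The sketch below treats `T(V)` as a closed interval and tests containment. The `Interval` type, the `includes` helper, and the timestamps are all hypothetical and invented for illustration.

```python
from typing import Tuple

# (start, end) in hours; a hypothetical representation of T(V)
Interval = Tuple[float, float]

def includes(outer: Interval, inner: Interval) -> bool:
    """Return True if the inner interval lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# Invented timestamps: sleeping from 23:00 to 07:00 the next day,
# snoring for part of that time.
sleeping: Interval = (23.0, 31.0)
snoring: Interval = (24.5, 26.0)

print(includes(sleeping, snoring))  # True:  T(snore) is inside T(sleep)
print(includes(snoring, sleeping))  # False: T(sleep) is not inside T(snore)
```

Two intervals that include each other are equal, which corresponds to the `T(V1)` = `T(V2)` case.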

In [17]:
# wn.entailments('snore.v.01')
snore_ents = wn.synset('snore.v.01').entailments()
snore_ents

[Synset('sleep.v.01')]

### Troponyms

(To `V1`) is (to `V2`) in some particular manner.

* (To **shout**) is (to **talk**) _loudly_.
* (To **amble**) is (to **walk**) in a _slow, relaxed manner_.