# 1. NLTK - WordNet
[Reference1](https://xrds.acm.org/blog/2017/07/power-wordnet-use-python/)
and
[Reference2](http://www.nltk.org/howto/wordnet.html)

[WordNet](https://wordnet.princeton.edu/) is a large lexical database of English.

Take a look at the next two sentences.
1. “She went home and had <u>pasta</u>.”
2. “Then she cleaned the kitchen and sat on the <u>sofa</u>.”

In Natural Language Processing, we try to use computer programs to find the meaning of sentences. In the two sentences above, with the help of WordNet, a computer program will be able to identify the following:
1. “pasta” is a type of dish.
2. “kitchen” is a part of “home”.

Let’s get started with using WordNet in Python. It is included as part of the [NLTK](http://www.nltk.org/) corpus collection; if the data is not yet installed, a one-time `nltk.download('wordnet')` fetches it. To use it, we need to import it first.
```python
>>> from nltk.corpus import wordnet as wn
```
### Word
Look up a word using **synsets()**. This function has an optional pos argument (think of it as the word's type) that lets you constrain the part of speech of the word.
```python
>>> wn.synsets('dog') # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog', pos=wn.VERB)
[Synset('chase.v.01')]
```
The other parts of speech are NOUN, ADJ and ADV. A synset is identified with a 3-part name of the form: **word.pos.nn**:
```python
>>> wn.synset('dog.n.01')
Synset('dog.n.01')
>>> print(wn.synset('dog.n.01').definition())
a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
>>> len(wn.synset('dog.n.01').examples())
1
>>> print(wn.synset('dog.n.01').examples()[0])
the dog barked all night
>>> wn.synset('dog.n.01').lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> [str(lemma.name()) for lemma in wn.synset('dog.n.01').lemmas()]
['dog', 'domestic_dog', 'Canis_familiaris']
>>> wn.lemma('dog.n.01.dog').synset()
Synset('dog.n.01')
```
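
The three-part name can also be taken apart with plain string handling, which can be handy when synset names are stored as text. The sketch below is not part of NLTK: `split_synset_name` is a hypothetical helper, and it assumes the name always ends in `.pos.nn`.

```python
def split_synset_name(name):
    """Split a name like 'dog.n.01' into (word, pos, sense_number).

    Hypothetical helper, not an NLTK function. Splitting from the right
    keeps any dots inside the word part intact.
    """
    word, pos, number = name.rsplit(".", 2)
    return word, pos, int(number)


print(split_synset_name("dog.n.01"))    # ('dog', 'n', 1)
print(split_synset_name("chase.v.01"))  # ('chase', 'v', 1)
```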

### Synonyms
WordNet stores synonyms in the form of synsets, where every word in a synset shares the same meaning. Each synset contains one or more lemmas, each of which represents a specific sense of a specific word. In other words, a synset is a group of synonyms with a definition attached to it. Relations are stored between different synsets, as we’ll see in the next section. Let’s explore synsets with an example.

Take the word ‘sofa’. The following line in Python gives all the synsets for ‘sofa’.
```python
>>> wn.synsets('sofa') 
[Synset('sofa.n.01')]
```
We have only one synset for ‘sofa’ which means that it has only one context or meaning. Another word like ‘ocean’ will give two synsets because it has two meanings – one as ‘a water body’ and the other as ‘being limitless in quantity’.
```python
>>> wn.synsets('ocean') 
[Synset('ocean.n.01'), Synset('ocean.n.02')]
```
We can find the definition of a synset with the code given below.
```python
>>> wn.synset('sofa.n.01').definition()
'an upholstered seat for more than one person'
```
To find all the words that share the same meaning as sofa, we need to find all the synonyms in the sofa synset.
```python
>>> wn.synset('sofa.n.01').lemma_names()
['sofa', 'couch', 'lounge']
```
As can be seen above, ‘couch’ and ‘sofa’ are synonyms and share the same meaning.
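
When a word has several synsets, its synonyms are spread across all of them. The sketch below gathers the distinct lemma names from any list of objects that offer a `lemma_names()` method. `all_synonyms` and `FakeSynset` are hypothetical names introduced here; the stand-in class only exists so the example runs without the corpus.

```python
def all_synonyms(synsets):
    """Collect the distinct lemma names across several synsets, in order."""
    seen = []
    for synset in synsets:
        for name in synset.lemma_names():
            if name not in seen:
                seen.append(name)
    return seen


class FakeSynset:
    """Stand-in for an NLTK Synset (assumption: only lemma_names() is needed)."""

    def __init__(self, names):
        self._names = names

    def lemma_names(self):
        return list(self._names)


# The first list comes from the 'sofa' output above; the second is made up
# purely to show that duplicates are dropped.
demo = [FakeSynset(["sofa", "couch", "lounge"]), FakeSynset(["couch", "settee"])]
print(all_synonyms(demo))  # ['sofa', 'couch', 'lounge', 'settee']
```

With the corpus installed, the same function should accept the result of `wn.synsets('sofa')` directly.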

### Hyponyms and Hypernyms
A hyponym is a more specific concept, and a hypernym is a more general one.

For example, ‘beach house’ and ‘guest house’ are hyponyms of ‘house’: they are more specific kinds of ‘house’. Conversely, ‘house’ is a hypernym of ‘guest house’ because it is the more general concept.

‘Pasta’ is a hyponym of ‘dish’ and ‘dish’ is a hypernym of ‘pasta’.

Let’s look at the code. Once we have the synset we are interested in, **hyponyms()** and **hypernyms()** return its more specific and more general synsets.

```python
>>> wn.synset('pasta.n.01').hyponyms() 
[Synset('cannelloni.n.01'), Synset('lasagna.n.01'), 
Synset('macaroni_and_cheese.n.01'), Synset('spaghetti.n.01')]
```

```python
>>> wn.synset('pasta.n.01').hypernyms()
[Synset('dish.n.02')]
```

```python
>>> wn.synset('dish.n.02').definition()
'a particular item of prepared food'
```

As seen above, pasta has several hyponyms and a single hypernym, ‘dish’, which confirms that pasta is a type of dish.
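
Following hypernym links repeatedly answers "is X a kind of Y?" across more than one level. The sketch below does this on a hand-built map taken from the outputs above rather than on WordNet itself. `HYPERNYM`, `hypernym_chain` and `is_a` are hypothetical names, and the map assumes one hypernym per word, which real synsets do not guarantee.

```python
# Toy hypernym map built from the outputs above (child -> parent).
# Real WordNet synsets can have several hypernyms; this sketch assumes one.
HYPERNYM = {
    "spaghetti": "pasta",
    "lasagna": "pasta",
    "cannelloni": "pasta",
    "pasta": "dish",
}


def hypernym_chain(word):
    """Return the list of ancestors of `word`, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, ancestor):
    """True if `ancestor` appears somewhere above `word`."""
    return ancestor in hypernym_chain(word)


print(hypernym_chain("spaghetti"))  # ['pasta', 'dish']
print(is_a("lasagna", "dish"))      # True
print(is_a("dish", "pasta"))        # False
```

For real synsets, NLTK's `hypernym_paths()` method returns the full chains up to the root.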

### Meronyms and Holonyms
Meronyms and Holonyms represent the part-whole relationship. The meronym represents the part and the holonym represents the whole. For example, ‘kitchen’ is a meronym of ‘home’ (the kitchen is a part of the home), ‘bread’ is a meronym of ‘sandwich’, and ‘sandwich’ is a holonym of ‘bread’.

In Python, we have:

```python
>>> wn.synsets('kitchen') 
[Synset('kitchen.n.01')]
>>> wn.synset('kitchen.n.01').part_holonyms() 
[Synset('dwelling.n.01')]
>>> wn.synset('kitchen.n.01').part_meronyms() 
[]
```

From the above code, we see that ‘dwelling’ is a holonym of ‘kitchen’, and so ‘kitchen’ is a meronym (or a part) of ‘dwelling’.

The ‘dwelling’ synset contains ‘home’ as one of its synonyms as seen below.
```python
>>> wn.synset('dwelling.n.01').lemma_names()
['dwelling', 'home', 'domicile',
'abode', 'habitation', 'dwelling_house']
```
So we can figure out that the word ‘kitchen’ is a part of one’s ‘home’.
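
The two-step reasoning above (find the holonym, then check its synonyms) can be written as one small check. The sketch below uses plain dictionaries filled with the outputs shown above instead of live WordNet lookups. `PART_HOLONYMS`, `LEMMA_NAMES` and `is_part_of` are hypothetical names.

```python
# Data copied from the outputs above; plain dicts stand in for WordNet lookups.
PART_HOLONYMS = {"kitchen": ["dwelling"]}
LEMMA_NAMES = {
    "dwelling": ["dwelling", "home", "domicile",
                 "abode", "habitation", "dwelling_house"],
}


def is_part_of(part, whole):
    """True if `whole` names any holonym of `part`, directly or as a synonym."""
    for holonym in PART_HOLONYMS.get(part, []):
        if whole == holonym or whole in LEMMA_NAMES.get(holonym, []):
            return True
    return False


print(is_part_of("kitchen", "home"))      # True
print(is_part_of("kitchen", "dwelling"))  # True
print(is_part_of("kitchen", "sofa"))      # False
```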

### Similarity
```python
>>> dog = wn.synset('dog.n.01')
>>> cat = wn.synset('cat.n.01')
>>> hit = wn.synset('hit.v.01')
>>> slap = wn.synset('slap.v.01')
```
**synset1.path_similarity(synset2)**: Returns a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, and a score of 1 represents identity, i.e. comparing a sense with itself returns 1. By default, a fake root node is added for verbs, so cases where no path could previously be found (and None was returned) now return a value. The old behavior can be restored by setting simulate_root to False.
```python
>>> dog.path_similarity(cat)  # doctest: +ELLIPSIS
0.2...
>>> hit.path_similarity(slap)  # doctest: +ELLIPSIS
0.142...
>>> wn.path_similarity(hit, slap)  # doctest: +ELLIPSIS
0.142...
>>> print(hit.path_similarity(slap, simulate_root=False))
None
>>> print(wn.path_similarity(hit, slap, simulate_root=False))
None
```
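
The scores above can be reproduced by hand as 1 / (p + 1), where p is the number of edges on the shortest path. The sketch below applies that to a hand-built fragment of the noun taxonomy rather than to WordNet itself. The `HYPERNYM` map and the function names are hypothetical, and the fragment assumes one hypernym per node.

```python
# Hand-built fragment of an is-a taxonomy (child -> parent); an assumption
# for illustration, with a single hypernym per node.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
}


def distances_to_ancestors(node):
    """Map the node and each of its ancestors to its distance in edges."""
    dist = {node: 0}
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
        dist[node] = steps
    return dist


def shortest_path_length(a, b):
    """Edges on the shortest path through a common ancestor, or None."""
    da, db = distances_to_ancestors(a), distances_to_ancestors(b)
    common = set(da) & set(db)
    if not common:
        return None
    return min(da[n] + db[n] for n in common)


def path_score(a, b):
    """1 / (p + 1), or None when the two nodes are not connected."""
    p = shortest_path_length(a, b)
    return None if p is None else 1.0 / (p + 1)


print(path_score("dog", "cat"))  # 0.2
print(path_score("dog", "dog"))  # 1.0
```
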
**synset1.lch_similarity(synset2)**: Leacock-Chodorow similarity. Returns a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p/2d), where p is the shortest path length and d is the taxonomy depth.
```python
>>> dog.lch_similarity(cat)  # doctest: +ELLIPSIS
2.028...
>>> hit.lch_similarity(slap)  # doctest: +ELLIPSIS
1.312...
>>> wn.lch_similarity(hit, slap)  # doctest: +ELLIPSIS
1.312...
>>> print(hit.lch_similarity(slap, simulate_root=False))
None
>>> print(wn.lch_similarity(hit, slap, simulate_root=False))
None
```
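
As a worked example of -log(p/2d), the sketch below plugs in numbers by hand. Several things here are assumptions rather than facts read from the library: p is counted in nodes (edges + 1), the edge counts 4 and 6 are inferred from the path similarity scores, and the taxonomy depths 19 (nouns) and 13 (verbs, including the fake root) were chosen because they reproduce the outputs shown above. `lch_score` is a hypothetical name.

```python
import math


def lch_score(path_edges, taxonomy_depth):
    """-log(p / 2d), with p counted in nodes (edges + 1) so p is never zero."""
    p = path_edges + 1
    return -math.log(p / (2.0 * taxonomy_depth))


# Assumed inputs: 4 edges between dog and cat in a noun taxonomy of depth 19,
# 6 edges between hit and slap in a verb taxonomy of depth 13.
print(round(lch_score(4, 19), 3))  # 2.028
print(round(lch_score(6, 13), 3))  # 1.312
```
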
**synset1.wup_similarity(synset2)**: Wu-Palmer similarity. Returns a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (LCS, the most specific ancestor node). Note that at this time the scores given do _not_ always agree with those given by Pedersen's Perl implementation of WordNet Similarity.

The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not closest to the two senses. Typically, however, it will so feature. Where multiple candidates for the LCS exist, that whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.
```python
>>> dog.wup_similarity(cat)  # doctest: +ELLIPSIS
0.857...
>>> hit.wup_similarity(slap)
0.25
>>> wn.wup_similarity(hit, slap)
0.25
>>> print(hit.wup_similarity(slap, simulate_root=False))
None
>>> print(wn.wup_similarity(hit, slap, simulate_root=False))
None
```
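
The Wu-Palmer score is usually written as 2 * depth(LCS) / (depth(s1) + depth(s2)). The sketch below evaluates that expression for hand-picked depths. `wup_score` is a hypothetical name, depths are assumed to count the root as 1, and the second set of depths is an assumption chosen to match the dog/cat score above, not values read from WordNet.

```python
def wup_score(depth_lcs, depth_a, depth_b):
    """2 * depth(LCS) / (depth(a) + depth(b)); depths count the root as 1."""
    return 2.0 * depth_lcs / (depth_a + depth_b)


# Toy case: two leaves at depth 3 sharing an ancestor at depth 1.
print(round(wup_score(1, 3, 3), 3))     # 0.333
# Assumed depths (not read from WordNet) that give the dog/cat score above.
print(round(wup_score(12, 14, 14), 3))  # 0.857
```

A deeper common ancestor pushes the score towards 1, which is why closely related senses low in the taxonomy score higher than senses that only meet near the root.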