# WordNet and NLTK

**(C) 2016-2019 by [Damir Cavar](http://damir.cavar.me/) <<dcavar@iu.edu>>**

**Version:** 1.1, November 2019

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

This is a brief introduction to [WordNet](https://wordnet.princeton.edu/) in [NLTK](http://www.nltk.org/).

You will find more details on [WordNet](https://wordnet.princeton.edu/) itself on [the WordNet website](https://wordnet.princeton.edu/).

## Using WordNet

Some content and ideas in the following introduction are taken from the [NLTK-howto on WordNet](http://www.nltk.org/howto/wordnet.html).

Import the [WordNet](https://wordnet.princeton.edu/) corpus reader in [NLTK](http://www.nltk.org/) using this code:

In [1]:
from nltk.corpus import wordnet

[WordNet](https://wordnet.princeton.edu/) is a lexical resource that organizes nouns, verbs, adjectives, and adverbs into some form of taxonomy. Lexical items are, for example, organized in groups of synonyms. In [WordNet](https://wordnet.princeton.edu/) these synonym groups are called synsets, each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

In [2]:
wordnet.synsets('can')

LookupError: 
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  Attempted to load corpora/wordnet

  Searched in:
    - 'C:\\Users\\damir/nltk_data'
    - 'C:\\ProgramData\\Anaconda3\\nltk_data'
    - 'C:\\ProgramData\\Anaconda3\\share\\nltk_data'
    - 'C:\\ProgramData\\Anaconda3\\lib\\nltk_data'
    - 'C:\\Users\\damir\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************
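
The error above means that the WordNet data is not installed yet. A minimal sketch of a guard that downloads it only when it is missing could look as follows. The helper name `ensure_wordnet` is hypothetical; it relies on `nltk.data.find` raising a `LookupError` for a missing resource, as in the message above, and on `nltk.download('wordnet')`, which the message itself recommends.

```python
def ensure_wordnet():
    """Make sure the WordNet corpus is available locally.

    Returns True if the data was already installed and False if a
    download had to be triggered. (Hypothetical helper, not part of NLTK.)
    """
    # Imported inside the function so that merely defining the helper
    # does not require NLTK to be importable.
    import nltk

    try:
        # Raises LookupError if 'corpora/wordnet' is in none of the search paths.
        nltk.data.find("corpora/wordnet")
        return True
    except LookupError:
        # Same call that the error message recommends.
        nltk.download("wordnet")
        return False
```

Calling `ensure_wordnet()` once before the first `wordnet.synsets(...)` request should avoid the `LookupError`, assuming the machine has network access.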


Once the WordNet data is installed, the output is a list of all synsets that contain the word *can*. Each individual synset is identified by a dot-delimited triple that specifies a word from the synset, the [part-of-speech](https://en.wikipedia.org/wiki/Part_of_speech) ([PoS](https://en.wikipedia.org/wiki/Part_of_speech)) of that word, and a running number from 01 to n that distinguishes the synsets. The [PoS](https://en.wikipedia.org/wiki/Part_of_speech)-tag *n* stands for noun and the [PoS](https://en.wikipedia.org/wiki/Part_of_speech)-tag *v* for verb.
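
To make the structure of such an identifier concrete, here is a small sketch that splits it into its three parts. The names `SynsetName` and `parse_synset_name` are made up for this illustration and are not part of NLTK.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    """The three parts of an identifier such as 'can.v.01'."""
    word: str
    pos: str
    number: int


def parse_synset_name(name: str) -> SynsetName:
    # Split from the right, so that a word that itself contains a dot stays intact.
    word, pos, number = name.rsplit(".", 2)
    return SynsetName(word, pos, int(number))


print(parse_synset_name("can.v.01"))
```

The same function can be applied to the string returned by the `name()` method of a synset object.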

You can request the synset providing the full code:

In [None]:
wordnet.synset('can.v.01')

You can output the definition of any such synset:

In [None]:
wordnet.synset('can.n.03').definition()

You can request all synsets with a specific [PoS](https://en.wikipedia.org/wiki/Part_of_speech) by passing the word and the [PoS](https://en.wikipedia.org/wiki/Part_of_speech)-tag to the *synsets* function:

In [None]:
wordnet.synsets('can', pos=wordnet.VERB)

The possible PoS-tags are: ADJ, ADJ_SAT, ADV, NOUN, VERB.
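
Each of these constants is a one-letter string, the same letter that appears in the middle of a synset identifier. The following sketch lists the values as plain Python data; the dictionary and function names are chosen for this illustration only.

```python
# One-letter codes behind the NLTK constants wordnet.ADJ, wordnet.ADJ_SAT, ...
POS_CODES = {
    "ADJ": "a",
    "ADJ_SAT": "s",
    "ADV": "r",
    "NOUN": "n",
    "VERB": "v",
}

# Human-readable labels for the codes (wording is mine).
POS_LABELS = {
    "a": "adjective",
    "s": "satellite adjective",
    "r": "adverb",
    "n": "noun",
    "v": "verb",
}


def describe_pos(code):
    """Return a readable label for a one-letter PoS code."""
    return POS_LABELS.get(code, "unknown")


for constant, code in POS_CODES.items():
    print(constant, code, describe_pos(code))
```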

I will use the word *[lemmas](https://en.wikipedia.org/wiki/Headword)* as the plural of *lemma*, rather than *lemmata*.

[WordNet](https://wordnet.princeton.edu/) contains a list of lemmas for each synset. You can print out the lemmas using the following function: 

In [None]:
wordnet.synset('can.v.01').lemmas()

You can map the lemmas to a list of strings using the following list comprehension:

In [None]:
[str(lemma.name()) for lemma in wordnet.synset('can.v.01').lemmas()]
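
The same idea extends from one synset to all synsets of a word. The sketch below collects the distinct lemma names across every synset returned for a word. The function name `all_lemma_names` is hypothetical, and the reader is passed in as an argument; it is assumed to behave like `nltk.corpus.wordnet`, that is, to offer `synsets(word, pos=...)` returning objects with a `lemma_names()` method.

```python
def all_lemma_names(reader, word, pos=None):
    """Collect the distinct lemma names of every synset that contains `word`.

    `reader` is assumed to behave like nltk.corpus.wordnet.
    The order of first occurrence is preserved.
    """
    names = []
    for synset in reader.synsets(word, pos=pos):
        for name in synset.lemma_names():
            if name not in names:
                names.append(name)
    return names
```

With the import from above, a call would look like `all_lemma_names(wordnet, 'can', wordnet.VERB)`.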

The [NLTK](http://www.nltk.org/) [WordNet](https://wordnet.princeton.edu/) reader provides access to a multi-lingual [WordNet](https://wordnet.princeton.edu/), namely the Open Multilingual WordNet. The multi-lingual data is accessible using [ISO-639 language codes](http://www.iso.org/iso/home/standards/language_codes.htm) (see the [ISO-639 Wikipedia page](https://en.wikipedia.org/wiki/ISO_639-3)):

In [None]:
wordnet.langs()

To access the synsets of the Croatian (hrv) word *kuća*, you can pass the language code to the *synsets* function:

In [None]:
wordnet.synsets('kuća', lang='hrv')

We can even request the list of lemmas in a specific language for a given English synset. For example, the synset 01 for the noun *house* has the following lemmas in Croatian (hrv):

In [None]:
wordnet.synset('house.n.01').lemma_names('hrv')

The same synset has the following lemmas in Japanese (jpn):

In [None]:
wordnet.synset('house.n.01').lemma_names('jpn')

We can save the synset request in a variable called *house*:

In [None]:
house = wordnet.synset('house.n.01')

We can now request the [hypernyms](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy) for the word *house* using the variable:

In [None]:
house.hypernyms()

Try this for some other words like *trout* and *poodle*:

In [None]:
wordnet.synset('trout.n.01').hypernyms()

In [None]:
wordnet.synset('poodle.n.01').hypernyms()

We can also request the list of [hyponyms](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy) for a given word. Here we request the list of [hyponyms](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy) for *house*:

In [None]:
wordnet.synset('house.n.01').hyponyms()

In the same way we can now request the [holonyms](https://en.wikipedia.org/wiki/Holonymy) for certain words. For example, imagine we are interested in the [holonyms](https://en.wikipedia.org/wiki/Holonymy) for *dog*:

In [None]:
dog = wordnet.synset('dog.n.01')
dog.member_holonyms()

We can also request the root [hypernym](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy) for some word:

In [None]:
wordnet.synset('leg.n.01').root_hypernyms()

We can request the lowest common [hypernym](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy) for two words, here for example for *leg* and *arm*:

In [None]:
wordnet.synset('leg.n.01').lowest_common_hypernyms(wordnet.synset('arm.n.01'))
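
To illustrate what root hypernyms and lowest common hypernyms mean, here is a self-contained sketch on a tiny hand-made taxonomy. The hypernym links are invented for this example and are not WordNet data, and the functions only mirror the idea behind the NLTK methods of the same name, not their implementation.

```python
# Toy hypernym links (concept -> direct hypernyms); invented, not WordNet data.
TOY_HYPERNYMS = {
    "leg": ["limb"],
    "arm": ["limb"],
    "limb": ["body_part"],
    "tail": ["body_part"],
    "body_part": ["entity"],
    "entity": [],
}


def ancestors(node):
    """Return the node itself plus everything reachable via hypernym links."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in TOY_HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(node):
    """Length of the longest hypernym path from the node up to a root."""
    parents = TOY_HYPERNYMS[node]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def root_hypernyms(node):
    """Ancestors that have no hypernym themselves."""
    return sorted(n for n in ancestors(node) if not TOY_HYPERNYMS[n])


def lowest_common_hypernyms(a, b):
    """Shared ancestors that lie deepest in the taxonomy."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return []
    deepest = max(depth(n) for n in common)
    return sorted(n for n in common if depth(n) == deepest)


print(root_hypernyms("leg"))
print(lowest_common_hypernyms("leg", "arm"))
print(lowest_common_hypernyms("leg", "tail"))
```

In this toy taxonomy *leg* and *arm* meet at *limb*, while *leg* and *tail* only meet one level higher, at *body_part*.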

In addition to [hypernyms](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy), [hyponyms](https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy), and [holonyms](https://en.wikipedia.org/wiki/Holonymy), [WordNet](https://wordnet.princeton.edu/) also provides the means to request [antonyms](https://en.wikipedia.org/wiki/Opposite_(semantics)), derivationally related forms, and [pertainyms](http://www.isi.edu/~ulf/amr/lib/popup/pertainym.html). Consider for example the word *good*. Antonyms are requested for a lemma rather than for a synset, so we fetch all lemmas of the synset for *good* and request the antonyms of the first lemma:

In [None]:
wordnet.synset('good.a.01').lemmas()[0].antonyms()
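
Since antonyms are attached to lemmas, collecting the antonyms for a whole synset means looping over its lemmas. A minimal sketch, with the hypothetical helper name `antonym_names`, assuming the argument behaves like an NLTK synset (`lemmas()` returning objects with `antonyms()` and `name()`):

```python
def antonym_names(synset):
    """Return the sorted names of all antonyms of any lemma in `synset`.

    `synset` is assumed to behave like an NLTK WordNet synset.
    """
    names = set()
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            names.add(antonym.name())
    return sorted(names)
```

A call would look like `antonym_names(wordnet.synset('good.a.01'))`.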

We can fetch the Slovenian (slv) lemma names for a synset in the same way as before, here for the noun *cold*:

In [None]:
wordnet.synset('cold.n.01').lemma_names('slv')

Likewise, we can request the Spanish (spa) lemma names for a synset, here for the noun *dog*, and store them in a variable:

In [None]:
spa_dog = wordnet.synset('dog.n.01').lemma_names('spa')
print(spa_dog)

We can now request the derivationally related forms for a lemma. In this example we request the derivationally related forms for the adjective ([PoS](https://en.wikipedia.org/wiki/Part_of_speech): *a*) *vocal*, which yields the verb ([PoS](https://en.wikipedia.org/wiki/Part_of_speech): *v*) *vocalize*:

In [None]:
wordnet.lemma('vocal.a.01.vocal').derivationally_related_forms()

We can also request the [pertainyms](http://www.isi.edu/~ulf/amr/lib/popup/pertainym.html) for specific words:

In [None]:
wordnet.lemma('vocal.a.01.vocal').pertainyms()

For verbs we can, for example, request the verb frames from [WordNet](https://wordnet.princeton.edu/). In the following example we request the frames for all the different lemmas of the verb *say*:

In [None]:
# Print the frame IDs of the synset; a bare expression here would not be displayed.
print(wordnet.synset('say.v.01').frame_ids())
for lemma in wordnet.synset('say.v.01').lemmas():
    print(lemma, lemma.frame_ids())
    print(" | ".join(lemma.frame_strings()))

In the following example we request the verb-frames for the ditransitive verb *to give*:

In [None]:
# Print the frame IDs of the synset; a bare expression here would not be displayed.
print(wordnet.synset('give.v.01').frame_ids())
for lemma in wordnet.synset('give.v.01').lemmas():
    print(lemma, lemma.frame_ids())
    print(" | ".join(lemma.frame_strings()))

## Morphological Analysis and Lemmatization

For many tasks in NLP one needs a lemmatizer or morphological analyzer to map inflected word forms to lemmas. Morphy in the WordNet module of the NLTK can do that. To lemmatize a word, provide the word and the PoS to the morphy function in wordnet:

In [None]:
wordnet.morphy('calls', wordnet.NOUN)

Morphy can cope with surface forms that result from various rules of English word formation, for example *e*-insertion or consonant doubling:

In [None]:
wordnet.morphy('stopped', wordnet.VERB)
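
The general idea behind such a lemmatizer can be sketched in a few lines: look the form up in an exception list first, then try suffix-detachment rules and accept a candidate only if it is a known lemma. The rule list below follows the verb rules of the WordNet Morphy documentation as far as I recall them. The tiny lexicon and exception list are invented stand-ins, and resolving a doubled-consonant form like *stopped* through the exception list is an assumption of this sketch. It is not the NLTK implementation.

```python
# Suffix-detachment rules for verbs: (suffix to strip, replacement).
VERB_RULES = [
    ("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
    ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""),
]

# Invented stand-ins for the verb index and the exception list.
TOY_VERBS = {"stop", "call", "love", "try"}
TOY_EXCEPTIONS = {"stopped": "stop", "went": "go"}


def toy_morphy(form):
    """Return a lemma for a verb form, or None if nothing matches."""
    # 1. Irregular forms are resolved by direct lookup.
    if form in TOY_EXCEPTIONS:
        return TOY_EXCEPTIONS[form]
    # 2. The form may already be a lemma.
    if form in TOY_VERBS:
        return form
    # 3. Try each rule and keep the first candidate found in the lexicon.
    for suffix, replacement in VERB_RULES:
        if form.endswith(suffix):
            candidate = form[: -len(suffix)] + replacement
            if candidate in TOY_VERBS:
                return candidate
    return None


for form in ["stopped", "calls", "loved", "tries", "walked"]:
    print(form, toy_morphy(form))
```

The last form, *walked*, returns `None` here only because *walk* is missing from the toy lexicon.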

## Similarity of Words

...

