# Using WordNet

WordNet, as the name implies, is a network of words, in which words are connected by relations such as `hyponym`, `hypernym`, and `meronym`. You can use WordNet to generate wordlists (of, say, colors) or to categorize words. As usual, we'll start by importing the libraries we'll need.

In [109]:
from nltk.corpus import wordnet as wn
from nltk import word_tokenize, pos_tag
import pandas as pd
import numpy as np
from collections import Counter
%matplotlib inline

Now let's look up the synsets for a word, using a string. A synset is a set of synonyms: words that share a common meaning.

In [3]:
dogSyns = wn.synsets('dog')
dogSyns

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01'),
 Synset('chase.v.01')]

If we grab the first one of these, we can explore its properties: 

In [110]:
dogSyn = dogSyns[0]
dogSyn

Synset('dog.n.01')

In [111]:
dogSyn.definition()

'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
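To compare all the senses of a word at once, rather than one synset at a time, something like the following could work. It is a minimal sketch: the function names `describe_senses` and `print_senses` are made up for this example, and it relies only on the `.name()`, `.pos()`, and `.definition()` methods of NLTK's synset objects.

```python
def describe_senses(synsets):
    """Return (name, part of speech, definition) for each synset.

    Works on any objects exposing .name(), .pos() and .definition(),
    which NLTK's WordNet Synset objects do.
    """
    return [(s.name(), s.pos(), s.definition()) for s in synsets]


def print_senses(synsets):
    """Print one line per sense so the senses are easy to compare."""
    for name, pos, definition in describe_senses(synsets):
        print(f"{name:<15} {pos}  {definition}")
```

With the list from above, `print_senses(dogSyns)` would print one line for each of the eight senses of "dog".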

Hyponyms are more specific terms, lower down in the lexical tree. All of the synsets below are types of dogs, for instance:

In [112]:
dogHypo = dogSyn.hyponyms()
dogHypo

[Synset('basenji.n.01'),
 Synset('corgi.n.01'),
 Synset('cur.n.01'),
 Synset('dalmatian.n.02'),
 Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'),
 Synset('hunting_dog.n.01'),
 Synset('lapdog.n.01'),
 Synset('leonberg.n.01'),
 Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'),
 Synset('pooch.n.01'),
 Synset('poodle.n.01'),
 Synset('pug.n.01'),
 Synset('puppy.n.01'),
 Synset('spitz.n.01'),
 Synset('toy_dog.n.01'),
 Synset('working_dog.n.01')]

In [113]:
dogHypo[-4].definition()

'a young dog'

We can also get hypernyms, or higher-level abstractions/categories. A dog is a type of canine, for instance, and also a type of domestic animal.

In [15]:
dogSyn.hypernyms()

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

In [16]:
canine = dogSyn.hypernyms()[0]

In [18]:
canine.definition()

'any of various fissiped mammals with nonretractile claws and typically long muzzles'

In [20]:
carnivore = canine.hypernyms()[0]

In [21]:
carnivore

Synset('carnivore.n.01')

In [26]:
carnivore.hypernyms()[0].hypernyms()[0].hypernyms()[0].hypernyms()

[Synset('chordate.n.01')]
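Chaining `.hypernyms()[0]` by hand gets unwieldy quickly. A small loop can do the same climb. This is a sketch only: `first_hypernym_chain` is a hypothetical helper, `max_steps` is an arbitrary safety limit chosen here, and the function always takes the first listed hypernym, so it ignores any alternative parents (such as `domestic_animal.n.01` for the dog).

```python
def first_hypernym_chain(synset, max_steps=50):
    """Follow the first hypernym repeatedly, returning the synsets visited.

    `synset` can be any object with a .hypernyms() method returning a list.
    `max_steps` is a safety limit chosen for this sketch, not a WordNet value.
    """
    chain = [synset]
    for _ in range(max_steps):
        parents = chain[-1].hypernyms()
        if not parents:           # reached the top of the hierarchy
            break
        chain.append(parents[0])  # always take the first listed hypernym
    return chain
```

For example, `[s.name() for s in first_hypernym_chain(carnivore)]` would list every step from `carnivore.n.01` upward. NLTK synsets also provide a `hypernym_paths()` method, which returns all the paths between a synset and the root rather than just one.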

Let's try a color:

In [27]:
wn.synsets('yellow')

[Synset('yellow.n.01'),
 Synset('yellow.v.01'),
 Synset('yellow.s.01'),
 Synset('chicken.s.01'),
 Synset('yellow.s.03'),
 Synset('scandalmongering.s.01'),
 Synset('yellow.s.05'),
 Synset('jaundiced.s.01')]

In [34]:
yellow = wn.synsets('yellow', pos='n')[0]

In [31]:
color = wn.synsets('yellow', pos='n')[0].hypernyms()[0]
color

Synset('chromatic_color.n.01')

In [32]:
color.hyponyms()

[Synset('blond.n.02'),
 Synset('blue.n.01'),
 Synset('brown.n.01'),
 Synset('complementary_color.n.01'),
 Synset('green.n.01'),
 Synset('olive.n.05'),
 Synset('orange.n.02'),
 Synset('pastel.n.01'),
 Synset('pink.n.01'),
 Synset('purple.n.01'),
 Synset('red.n.01'),
 Synset('salmon.n.04'),
 Synset('yellow.n.01')]

Each synset has a list of lemma names, or synonyms, associated with that meaning: 

In [33]:
color.lemma_names()

['chromatic_color', 'chromatic_colour', 'spectral_color', 'spectral_colour']

In [36]:
yellow.lemma_names()

['yellow', 'yellowness']

In [37]:
yellow.lemmas()

[Lemma('yellow.n.01.yellow'), Lemma('yellow.n.01.yellowness')]
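Combining `.hyponyms()` with `.lemma_names()` is one way to build the kind of wordlist mentioned at the start. The sketch below is an illustration, not an NLTK function: `hyponym_wordlist` is a made-up name, and it walks every level below the starting synset, not just the immediate hyponyms.

```python
def hyponym_wordlist(synset):
    """Collect lemma names from every synset below `synset`.

    Uses only .hyponyms() and .lemma_names(), both shown above.
    """
    words = set()
    seen = set()
    stack = list(synset.hyponyms())
    while stack:
        current = stack.pop()
        if current in seen:       # a synset can be reachable by more than one path
            continue
        seen.add(current)
        # WordNet joins multiword lemmas with underscores.
        words.update(name.replace('_', ' ') for name in current.lemma_names())
        stack.extend(current.hyponyms())
    return sorted(words)
```

Calling `hyponym_wordlist(color)` with the `color` synset from above would return an alphabetical list of color words, including the more specific shades beneath each basic color.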

We can also walk up or down the tree of associations. The `.tree()` method takes a function that tells it which way to walk. You can either write a function that just returns the hypernyms of a synset, like this:

In [115]:
def getHypernyms(word):
    return word.hypernyms()
yellow.tree(getHypernyms)

[Synset('yellow.n.01'),
 [Synset('chromatic_color.n.01'),
  [Synset('color.n.01'),
   [Synset('visual_property.n.01'),
    [Synset('property.n.02'),
     [Synset('attribute.n.02'),
      [Synset('abstraction.n.06'), [Synset('entity.n.01')]]]]]]]]

Or you can write the same thing with a bit of shorthand called a lambda function, which is just a function with no name. 

In [38]:
yellow.tree(lambda x: x.hypernyms())

[Synset('yellow.n.01'),
 [Synset('chromatic_color.n.01'),
  [Synset('color.n.01'),
   [Synset('visual_property.n.01'),
    [Synset('property.n.02'),
     [Synset('attribute.n.02'),
      [Synset('abstraction.n.06'), [Synset('entity.n.01')]]]]]]]]

In [40]:
yellow.tree(lambda x: x.hyponyms())

[Synset('yellow.n.01'),
 [Synset('amber.n.01')],
 [Synset('brownish_yellow.n.01')],
 [Synset('canary_yellow.n.01')],
 [Synset('gamboge.n.02')],
 [Synset('greenish_yellow.n.01')],
 [Synset('old_gold.n.01')],
 [Synset('orange_yellow.n.01'), [Synset('ocher.n.01')]],
 [Synset('pale_yellow.n.01')]]
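The nested lists that `.tree()` returns can be hard to read once they get deep. One possible way to display them is to convert the nesting into indentation. In this sketch, `tree_to_lines` is a hypothetical helper, and it assumes the list shape shown in the outputs above: the first item is a node, and each remaining item is a subtree in the same format.

```python
def tree_to_lines(tree, depth=0):
    """Turn a nested [node, [child, ...], ...] list into indented lines.

    The first item of `tree` is the node itself; every later item is a
    subtree with the same structure.
    """
    node, *subtrees = tree
    lines = ['  ' * depth + str(node)]
    for subtree in subtrees:
        lines.extend(tree_to_lines(subtree, depth + 1))
    return lines
```

For instance, `print('\n'.join(tree_to_lines(yellow.tree(lambda x: x.hyponyms()))))` would print each hyponym of yellow on its own line, indented by its depth in the tree.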

Let's try to do this with many words at a time, using the first paragraph of "The Garden Party." 

In [116]:
gardenPara = """And after all the weather was ideal. They could not have had a more perfect day for a garden-party if they had ordered it. Windless, warm, the sky without a cloud. Only the blue was veiled with a haze of light gold, as it is sometimes in early summer. The gardener had been up since dawn, mowing the lawns and sweeping them, until the grass and the dark flat rosettes where the daisy plants had been seemed to shine. As for the roses, you could not help feeling they understood that roses are the only flowers that impress people at garden-parties; the only flowers that everybody is certain of knowing. Hundreds, yes, literally hundreds, had come out in a single night; the green bushes bowed down as though they had been visited by archangels."""

In [117]:
print(gardenPara)

And after all the weather was ideal. They could not have had a more perfect day for a garden-party if they had ordered it. Windless, warm, the sky without a cloud. Only the blue was veiled with a haze of light gold, as it is sometimes in early summer. The gardener had been up since dawn, mowing the lawns and sweeping them, until the grass and the dark flat rosettes where the daisy plants had been seemed to shine. As for the roses, you could not help feeling they understood that roses are the only flowers that impress people at garden-parties; the only flowers that everybody is certain of knowing. Hundreds, yes, literally hundreds, had come out in a single night; the green bushes bowed down as though they had been visited by archangels.


In [118]:
gardenTokens = word_tokenize(gardenPara)

In [119]:
len(gardenTokens)

152

Next, we tag each token with its part of speech and extract all the nouns:

In [131]:
gardenPOS = pos_tag(gardenTokens)

In [132]:
gardenPOS[:10]

[('And', 'CC'),
 ('after', 'IN'),
 ('all', 'PDT'),
 ('the', 'DT'),
 ('weather', 'NN'),
 ('was', 'VBD'),
 ('ideal', 'JJ'),
 ('.', '.'),
 ('They', 'PRP'),
 ('could', 'MD')]

In [133]:
gardenNouns = [pair[0] for pair in gardenPOS 
               if pair[1] in ['NNS', 'NN', 'NNP']]

In [134]:
gardenNouns

['weather',
 'day',
 'Windless',
 'warm',
 'sky',
 'cloud',
 'blue',
 'haze',
 'gold',
 'summer',
 'gardener',
 'dawn',
 'lawns',
 'grass',
 'rosettes',
 'daisy',
 'plants',
 'roses',
 'roses',
 'flowers',
 'people',
 'garden-parties',
 'flowers',
 'everybody',
 'Hundreds',
 'hundreds',
 'night',
 'bushes',
 'archangels']
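The tagger uses Penn Treebank tags such as `NN` and `VBD`, while WordNet uses single letters such as `'n'` for its parts of speech. A small mapping function is one way to bridge the two. This is a sketch with made-up names (`penn_to_wordnet`, `nouns_from_tagged`). Unlike the list comprehension above, it treats every tag beginning with `NN` as a noun, so it would also keep plural proper nouns (`NNPS`).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet part-of-speech letter, or None."""
    if tag.startswith('NN'):
        return 'n'   # noun
    if tag.startswith('VB'):
        return 'v'   # verb
    if tag.startswith('JJ'):
        return 'a'   # adjective
    if tag.startswith('RB'):
        return 'r'   # adverb
    return None      # no WordNet equivalent (determiners, prepositions, ...)


def nouns_from_tagged(tagged):
    """Keep the tokens whose tag maps to a noun."""
    return [word for word, tag in tagged if penn_to_wordnet(tag) == 'n']
```

`nouns_from_tagged(gardenPOS)` would select the nouns from the tagged paragraph. Note that the tagger itself can be wrong: in the output above, "Windless" and "warm" were tagged as nouns, so it is worth inspecting the list before relying on it.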

In [138]:
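synsets = []
hypers = []
for word in gardenNouns:
    ss = wn.synsets(word, pos='n')
    if len(ss) > 0:
        # Use the first listed noun sense of the word.
        synsets.append(ss[0])
        parents = ss[0].hypernyms()
        if len(parents) > 0:
            # Keep the hypernyms, then the hypernyms of the first hypernym.
            # Empty lists are skipped so that every sublist has an item.
            hypers.append(parents)
            grandparents = parents[0].hypernyms()
            if len(grandparents) > 0:
                hypers.append(grandparents)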

In [139]:
hypers

[[Synset('atmospheric_phenomenon.n.01')],
 [Synset('physical_phenomenon.n.01')],
 [Synset('time_unit.n.01')],
 [Synset('measure.n.02')],
 [Synset('atmosphere.n.05')],
 [Synset('gas.n.02')],
 [Synset('physical_phenomenon.n.01')],
 [Synset('natural_phenomenon.n.01')],
 [Synset('chromatic_color.n.01')],
 [Synset('color.n.01')],
 [Synset('aerosol.n.01')],
 [Synset('cloud.n.01')],
 [Synset('precious_metal.n.01')],
 [Synset('valuable.n.01')],
 [Synset('season.n.02')],
 [Synset('time_period.n.01')],
 [Synset('horticulturist.n.01')],
 [Synset('expert.n.01')],
 [Synset('hour.n.02')],
 [Synset('clock_time.n.01')],
 [Synset('field.n.01')],
 [Synset('tract.n.01')],
 [Synset('gramineous_plant.n.01')],
 [Synset('herb.n.01')],
 [Synset('adornment.n.01')],
 [Synset('decoration.n.01')],
 [Synset('flower.n.01')],
 [Synset('angiosperm.n.01')],
 [Synset('building_complex.n.01')],
 [Synset('structure.n.01')],
 [Synset('shrub.n.01')],
 [Synset('woody_plant.n.01')],
 [Synset('shrub.n.01')],
 [Synset('woody_p

This is a list of lists. Each sublist here contains only one item, so we can flatten it by taking the first item of each sublist:

In [140]:
flatHypers = [item[0] for item in hypers]

That allows us to do a quantitative analysis of sorts, counting which hypernyms occur most often with `Counter`.

In [141]:
Counter(flatHypers).most_common(5)

[(Synset('angiosperm.n.01'), 3),
 (Synset('woody_plant.n.01'), 3),
 (Synset('physical_phenomenon.n.01'), 2),
 (Synset('time_period.n.01'), 2),
 (Synset('shrub.n.01'), 2)]
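Taking only the first item of each sublist works here, but some synsets have more than one hypernym (the dog had two), and others have none. A variation that counts every item in every sublist might look like this. It is a sketch: `count_items` is a made-up name, and it uses `itertools.chain.from_iterable` from the standard library to do the flattening.

```python
from collections import Counter
from itertools import chain


def count_items(list_of_lists):
    """Count every item across all sublists, including empty or longer ones."""
    return Counter(chain.from_iterable(list_of_lists))
```

With the data above, `count_items(hypers).most_common(5)` would give the five most frequent hypernyms. For more readable output, each synset can be replaced by its name: `[(s.name(), n) for s, n in count_items(hypers).most_common(5)]`.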