# WordNet

WordNet, as the name implies, is a network of words, where words are connected by relations like `hyponym`, `hypernym`, `meronym`, and so on. You can use WordNet to generate wordlists (of, say, colors), or to categorize words. As usual, we'll start by importing the libraries we'll need. 

In [67]:
from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.wsd import lesk
import nltk
import collections

In [4]:
dogSyns=wn.synsets('dog')

In [5]:
dogSyns

[Synset('dog.n.01'),
 Synset('frump.n.01'),
 Synset('dog.n.03'),
 Synset('cad.n.01'),
 Synset('frank.n.02'),
 Synset('pawl.n.01'),
 Synset('andiron.n.01'),
 Synset('chase.v.01')]

In [6]:
dog=dogSyns[0]

In [7]:
dog.definition()

'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

In [9]:
dog.hyponyms() # 不会很全

[Synset('basenji.n.01'),
 Synset('corgi.n.01'),
 Synset('cur.n.01'),
 Synset('dalmatian.n.02'),
 Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'),
 Synset('hunting_dog.n.01'),
 Synset('lapdog.n.01'),
 Synset('leonberg.n.01'),
 Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'),
 Synset('pooch.n.01'),
 Synset('poodle.n.01'),
 Synset('pug.n.01'),
 Synset('puppy.n.01'),
 Synset('spitz.n.01'),
 Synset('toy_dog.n.01'),
 Synset('working_dog.n.01')]

In [10]:
dog.hypernyms()

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

In [11]:
dog.hypernyms()[0].hypernyms()

[Synset('carnivore.n.01')]

In [13]:
dog.hypernyms()[0].hypernyms()[0].hypernyms()

[Synset('placental.n.01')]

In [19]:
mammal=dog.hypernyms()[0].hypernyms()[0].hypernyms()[0].hypernyms()[0]
mammal

Synset('mammal.n.01')

In [22]:
dog.max_depth(),dog.min_depth() # 树深度

(13, 8)

In [23]:
mammal.hypernyms()

[Synset('vertebrate.n.01')]

In [28]:
def getHypernyms(word):
    print(word.max_depth())
    return word.hypernyms()
mammal.tree(getHypernyms)

9
8
7
6
5
4
3
2
1
0


[Synset('mammal.n.01'),
 [Synset('vertebrate.n.01'),
  [Synset('chordate.n.01'),
   [Synset('animal.n.01'),
    [Synset('organism.n.01'),
     [Synset('living_thing.n.01'),
      [Synset('whole.n.02'),
       [Synset('object.n.01'),
        [Synset('physical_entity.n.01'), [Synset('entity.n.01')]]]]]]]]]]

In [36]:
getDepthHypernym(mammal)

[Synset('whole.n.02')]

In [37]:
getDepthHypernym(dog)

[Synset('whole.n.02')]

In [38]:
house=wn.synsets('house')[0]

In [39]:
getDepthHypernym(house)

[Synset('whole.n.02')]

### do with The Garden Party

In [40]:
garden=open('garden.md').read()

In [70]:
gardenTokens=nltk.word_tokenize(garden)

In [43]:
gardenTags=nltk.pos_tag(garden)

In [71]:
# 在tagging后除掉stopwords
gardenNoStops=[token for token in gardenTags if token[0] not in stopwords('English')]

TypeError: 'WordListCorpusReader' object is not callable

In [46]:
wn.synsets('cold')[0].definition()

'a mild viral infection involving the nose and respiratory passages (but not the lungs)'

In [47]:
wn.synsets('cold')[3].definition()

'having a low or inadequate temperature or feeling a sensation of coldness or having been made cold by e.g. ice or refrigeration'

In [49]:
wn.synsets('cold',pos='n')

[Synset('cold.n.01'), Synset('coldness.n.03'), Synset('cold.n.03')]

In [50]:
wn.synsets('cold',pos='a')

[Synset('cold.a.01'),
 Synset('cold.a.02'),
 Synset('cold.s.03'),
 Synset('cold.s.04'),
 Synset('cold.s.05'),
 Synset('cold.s.06'),
 Synset('cold.s.07'),
 Synset('cold.s.08'),
 Synset('cold.s.09'),
 Synset('cold.s.10'),
 Synset('cold.s.11'),
 Synset('cold.s.12'),
 Synset('cold.s.13')]

In [55]:
# 所有的lever 4 nouns
categories=[]
for tokenPos in gardenTags:
    token = tokenPos[0] 
    pos = tokenPos[1] # the part of speech
    if pos in ['NN','NNS']:
        synset=wn.synsets(token,pos='n')
        if type(synset) is list and len(synset)>0:
            synset=synset[0]
            categories.append(getDepthHypernym(synset)[0])

In [57]:
collections.Counter(categories).most_common(10)

[(Synset('matter.n.03'), 5948),
 (Synset('definite_quantity.n.01'), 3543),
 (Synset('substance.n.07'), 3431),
 (Synset('time_unit.n.01'), 882),
 (Synset('whole.n.02'), 241),
 (Synset('signal.n.01'), 27),
 (Synset('event.n.01'), 8)]

In [58]:
def getDepth6Hypernym(word):
    if word.max_depth() >7:
        return getDepth6Hypernym(word.hypernyms()[0])
    return word.hypernyms()

In [61]:
# 所有的lever 6 nouns
categories=[]
for tokenPos in gardenTags:
    token = tokenPos[0] 
    pos = tokenPos[1] # the part of speech
    if pos in ['NN','NNS']:
        synset=wn.synsets(token,pos='n')
        if type(synset) is list and len(synset)>0:
            synset=synset[0]
            categories.append(getDepth6Hypernym(synset)[0])

In [66]:
collections.Counter(categories).most_common(10)
# he=>化学元素He 所以会出现chemical_element

[(Synset('chemical_element.n.01'), 3606),
 (Synset('inhibitor.n.01'), 2498),
 (Synset('chemical.n.01'), 1640),
 (Synset('radioactivity_unit.n.01'), 1243),
 (Synset('vitamin.n.01'), 933),
 (Synset('time_unit.n.01'), 882),
 (Synset('metric_capacity_unit.n.01'), 794),
 (Synset('metallic_element.n.01'), 702),
 (Synset('metric_weight_unit.n.01'), 375),
 (Synset('metric_linear_unit.n.01'), 373)]

In [65]:
help(lesk)

Help on function lesk in module nltk.wsd:

lesk(context_sentence, ambiguous_word, pos=None, synsets=None)
    Return a synset for an ambiguous word in a context.
    
    :param iter context_sentence: The context sentence where the ambiguous word
         occurs, passed as an iterable of words.
    :param str ambiguous_word: The ambiguous word that requires WSD.
    :param str pos: A specified Part-of-Speech (POS).
    :param iter synsets: Possible synsets of the ambiguous word.
    :return: ``lesk_sense`` The Synset() object with the highest signature overlaps.
    
    This function is an implementation of the original Lesk algorithm (1986) [1].
    
    Usage example::
    
        >>> lesk(['I', 'went', 'to', 'the', 'bank', 'to', 'deposit', 'money', '.'], 'bank', 'n')
        Synset('savings_bank.n.02')
    
    [1] Lesk, Michael. "Automatic sense disambiguation using machine
    readable dictionaries: how to tell a pine cone from an ice cream
    cone." Proceedings of the 5th Annu