# Wordnet sandbox

## Setup

Import WordNet, create a list of sample words, and examine their synsets.

In [17]:
from nltk.corpus import wordnet as wn
# Run nltk.download('wordnet') once if the WordNet corpus is not installed yet

# Assume 'scare' is a verb and the others are nouns
words = ['scare', 'ghost', 'fright', 'spook', 'koala']
synset_list = [wn.synsets(word) for word in words]
synset_list

[[Synset('panic.n.02'),
  Synset('scare.n.02'),
  Synset('frighten.v.01'),
  Synset('daunt.v.01')],
 [Synset('ghost.n.01'),
  Synset('ghostwriter.n.01'),
  Synset('ghost.n.03'),
  Synset('touch.n.03'),
  Synset('ghost.v.01'),
  Synset('haunt.v.02'),
  Synset('ghost.v.03')],
 [Synset('fear.n.01'), Synset('frighten.v.01')],
 [Synset('creep.n.01'), Synset('ghost.n.01'), Synset('spook.v.01')],
 [Synset('koala.n.01')]]
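Each synset name above follows the pattern `lemma.pos.nn` (for example `frighten.v.01`), so you can read the part of speech directly from the name. The sketch below uses plain strings copied from the output rather than live NLTK objects. It keeps only the senses that match the part of speech assumed in the comment above. The helper names are hypothetical. With real synsets, NLTK can do this filtering itself through `wn.synsets(word, pos=wn.VERB)`.

```python
# Synset names copied from the wn.synsets() output above (plain strings, not NLTK objects)
candidates = {
    "scare": ["panic.n.02", "scare.n.02", "frighten.v.01", "daunt.v.01"],
    "ghost": ["ghost.n.01", "ghostwriter.n.01", "ghost.n.03", "touch.n.03",
              "ghost.v.01", "haunt.v.02", "ghost.v.03"],
    "fright": ["fear.n.01", "frighten.v.01"],
    "spook": ["creep.n.01", "ghost.n.01", "spook.v.01"],
    "koala": ["koala.n.01"],
}

# Part of speech assumed for each word, as in the comment in the cell above
assumed_pos = {"scare": "v", "ghost": "n", "fright": "n", "spook": "n", "koala": "n"}


def split_synset_name(name):
    """Split 'lemma.pos.nn' into its parts; rsplit keeps any dots inside the lemma."""
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


def filter_by_pos(names, pos):
    """Keep only the synset names whose part-of-speech tag matches `pos`."""
    return [n for n in names if split_synset_name(n)[1] == pos]


filtered = {w: filter_by_pos(names, assumed_pos[w]) for w, names in candidates.items()}
for word, names in filtered.items():
    print(word, names)
```

This narrows 'scare' to two verb senses and 'fright' to a single noun sense, which leaves fewer definitions to read in the next step.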

## Print the definition of each synset to choose the correct one

In [18]:
# Print each candidate synset with its word index and definition
for i, synsets in enumerate(synset_list):
    for item in synsets:
        print(i, item, item.definition())

0 Synset('panic.n.02') sudden mass fear and anxiety over anticipated events
0 Synset('scare.n.02') a sudden attack of fear
0 Synset('frighten.v.01') cause fear in
0 Synset('daunt.v.01') cause to lose courage
1 Synset('ghost.n.01') a mental representation of some haunting experience
1 Synset('ghostwriter.n.01') a writer who gives the credit of authorship to someone else
1 Synset('ghost.n.03') the visible disembodied soul of a dead person
1 Synset('touch.n.03') a suggestion of some quality
1 Synset('ghost.v.01') move like a ghost
1 Synset('haunt.v.02') haunt like a ghost; pursue
1 Synset('ghost.v.03') write for someone else
2 Synset('fear.n.01') an emotion experienced in anticipation of some specific pain or danger (usually accompanied by a desire to flee or fight)
2 Synset('frighten.v.01') cause fear in
3 Synset('creep.n.01') someone unpleasantly strange or eccentric
3 Synset('ghost.n.01') a mental representation of some haunting experience
3 Synset('spook.v.01') frighten or scare, and 
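The notebook chooses senses by hand, but the same choice can be roughly automated. The sketch below follows the simplified Lesk idea: pick the sense whose definition shares the most content words with a context sentence. It runs on two 'ghost' glosses copied from the output above. The context sentences and the small stopword list are invented for illustration. NLTK ships its own variant as `nltk.wsd.lesk`, which this code does not reproduce.

```python
import re

# Glosses for two 'ghost' senses, copied from the definition() output above
glosses = {
    "ghost.n.01": "a mental representation of some haunting experience",
    "ghost.n.03": "the visible disembodied soul of a dead person",
}

# Tiny illustrative stopword list (an assumption; real systems use a larger one)
STOPWORDS = {"a", "an", "the", "of", "some", "in", "and", "to", "was", "his", "her"}


def content_words(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense name whose gloss shares the most content words with `context`."""
    ctx = content_words(context)
    # max() keeps the first listed sense on ties
    return max(senses, key=lambda name: len(ctx & content_words(senses[name])))


# Hypothetical context sentence
sentence = "The villagers swore they saw the soul of a dead person drifting past"
print(simplified_lesk(sentence, glosses))  # ghost.n.03
```

Gloss overlap is a weak signal. It works here only because the sentence reuses words from the definition, which is why choosing by hand is reasonable for a short word list like this one.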

## Choose the appropriate synset

Examine the output and select the relevant synset for each word:

Word   | Synset
------ | ------
scare  | Synset('frighten.v.01')
ghost  | Synset('ghost.n.03')
fright | Synset('fear.n.01')
spook  | Synset('ghost.n.01')
koala  | Synset('koala.n.01')

## Set a variable equal to each synset of interest

In [19]:
scare_synset = wn.synset('frighten.v.01')
ghost_synset = wn.synset('ghost.n.03')
fright_synset = wn.synset('fear.n.01')
spook_synset = wn.synset('ghost.n.01')
koala_synset = wn.synset('koala.n.01')

## Examine the lemmata for each synset

This is just for curiosity; the lemmas are not used in the processing below.

In [20]:
# Use a new name so the candidate list from the first cell is not overwritten
chosen_synsets = [scare_synset, ghost_synset, fright_synset, spook_synset, koala_synset]
for i, synset in enumerate(chosen_synsets):
    print(i, [lemma.name() for lemma in synset.lemmas()])

0 ['frighten', 'fright', 'scare', 'affright']
1 ['ghost']
2 ['fear', 'fearfulness', 'fright']
3 ['ghost', 'shade', 'spook', 'wraith', 'specter', 'spectre']
4 ['koala', 'koala_bear', 'kangaroo_bear', 'native_bear', 'Phascolarctos_cinereus']
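The same word can be a lemma of several synsets. This is why 'fright' and 'spook' returned senses listed under other headwords in the first cell. The sketch below performs a quick overlap check on the lemma lists copied from the output above, showing which of the chosen senses share a surface form.

```python
from itertools import combinations

# Lemma names for the five chosen synsets, copied from the output above
lemmas = {
    "frighten.v.01": ["frighten", "fright", "scare", "affright"],
    "ghost.n.03": ["ghost"],
    "fear.n.01": ["fear", "fearfulness", "fright"],
    "ghost.n.01": ["ghost", "shade", "spook", "wraith", "specter", "spectre"],
    "koala.n.01": ["koala", "koala_bear", "kangaroo_bear", "native_bear",
                   "Phascolarctos_cinereus"],
}

# Report every pair of synsets whose lemma lists overlap
shared = {}
for a, b in combinations(lemmas, 2):
    common = set(lemmas[a]) & set(lemmas[b])
    if common:
        shared[(a, b)] = common
        print(a, b, sorted(common))
```

Only two pairs overlap: 'fright' links the verb and noun fear senses, and 'ghost' links the two ghost senses.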


The similarity between a synset and itself is 1:

In [21]:
print(scare_synset.path_similarity(scare_synset))

1.0
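NLTK's `path_similarity` scores two synsets as 1 / (d + 1), where d is the length of the shortest path between them in the hypernym (is-a) hierarchy. A synset is at distance 0 from itself, hence the 1.0. The sketch below applies the formula to a tiny tree-shaped hierarchy. The node names are hypothetical and are not WordNet's real hypernym chain.

```python
from collections import deque

# Hypothetical toy taxonomy: child -> parent (is-a) links, NOT WordNet's real data
HYPERNYM = {
    "koala": "marsupial",
    "kangaroo": "marsupial",
    "marsupial": "mammal",
    "dog": "mammal",
    "mammal": "animal",
}


def neighbours(node):
    """Undirected neighbours: any children plus the parent, if there is one."""
    result = [child for child, parent in HYPERNYM.items() if parent == node]
    if node in HYPERNYM:
        result.append(HYPERNYM[node])
    return result


def shortest_path_distance(a, b):
    """Breadth-first search over is-a edges; returns None if no path exists."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """1 / (d + 1): 1.0 for identical nodes, smaller as the path grows."""
    d = shortest_path_distance(a, b)
    return None if d is None else 1 / (d + 1)


print(path_similarity("koala", "koala"))     # 1.0
print(path_similarity("koala", "kangaroo"))  # 0.3333333333333333
print(path_similarity("koala", "dog"))       # 0.25
```

This is a simplification. WordNet allows multiple hypernyms, and NLTK measures paths through shared ancestors.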


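Similarity is not always reciprocal. When the two synsets are different parts of speech, the result can depend on the order of comparison. Here, starting from the noun returns None, while starting from the verb returns 0.083 (this behavior may vary between NLTK versions):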

In [22]:
print(ghost_synset.path_similarity(scare_synset))
print(scare_synset.path_similarity(ghost_synset))

None
0.08333333333333333
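One possible workaround when the order of comparison matters is to query both directions and keep whichever score is defined. The sketch below does this with `symmetric_path_similarity`, a hypothetical helper that is not part of NLTK. Taking the maximum is an assumption; when both directions return a number they would normally agree. The helper works with any objects that have a `path_similarity` method, such as NLTK synsets. A stub class stands in for them here, so the snippet runs without the WordNet data.

```python
def symmetric_path_similarity(s1, s2):
    """Hypothetical helper: try both argument orders and return the larger defined score.

    Returns None only if neither direction finds a path.
    """
    scores = [s for s in (s1.path_similarity(s2), s2.path_similarity(s1)) if s is not None]
    return max(scores) if scores else None


class StubSynset:
    """Stand-in that mimics the asymmetric results printed above (not real NLTK)."""

    def __init__(self, label, scores):
        self.label = label
        self.scores = scores  # other label -> score (or None) in this direction

    def path_similarity(self, other):
        return self.scores.get(other.label)


ghost = StubSynset("ghost.n.03", {"frighten.v.01": None})
scare = StubSynset("frighten.v.01", {"ghost.n.03": 0.08333333333333333})

print(symmetric_path_similarity(ghost, scare))  # 0.08333333333333333
print(symmetric_path_similarity(scare, ghost))  # 0.08333333333333333
```

With real data, you would call `symmetric_path_similarity(ghost_synset, scare_synset)` in place of the stubs.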


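Path similarity is computed as 1 / (d + 1), where d is the length of the shortest path between the two synsets in the hypernym hierarchy. It therefore ranges from 1 (identical synsets) down toward 0 for very distant ones. 'scare' and 'ghost' are dissimilar because they are different parts of speech. Verbs and nouns live in separate hierarchies, so NLTK can only connect them through a simulated root node: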

In [23]:
print(scare_synset.path_similarity(ghost_synset))

0.08333333333333333
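The bounds follow directly from the definition. Here d is a non-negative whole number of edges, and no score is returned (None) when no path exists:

$$
\operatorname{sim}(s_1, s_2) = \frac{1}{d(s_1, s_2) + 1}, \qquad d \in \{0, 1, 2, \dots\} \;\Longrightarrow\; 0 < \operatorname{sim}(s_1, s_2) \le 1,
$$

with $\operatorname{sim} = 1$ exactly when $d = 0$. The score shrinks toward 0 as the path grows but never reaches it.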


Among the nouns, 'ghost' and 'koala' are not very similar:

In [24]:
print(ghost_synset.path_similarity(koala_synset))

0.05555555555555555


'ghost' and 'spook' are more similar, but only slightly, even though both chosen synsets include the lemma 'ghost':

In [25]:
print(ghost_synset.path_similarity(spook_synset))

0.0625
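Inverting 1 / (d + 1) makes the small differences between these scores easier to read as path lengths. The sketch below uses the three values printed above; the pair labels are just descriptive strings.

```python
# Scores copied from the path_similarity() outputs above
observed = {
    ("scare", "ghost"): 0.08333333333333333,
    ("ghost", "koala"): 0.05555555555555555,
    ("ghost", "spook"): 0.0625,
}

# path_similarity = 1 / (d + 1), so d = 1 / score - 1; round() absorbs float error
implied_distance = {pair: round(1 / score - 1) for pair, score in observed.items()}

# Print pairs from shortest to longest implied path
for pair, d in sorted(implied_distance.items(), key=lambda item: item[1]):
    print(pair, d)
```

The ghost/spook path is only two edges shorter than the ghost/koala path (15 vs 17). The cross-POS scare/ghost pair gets the shortest implied path (11), which suggests its score reflects the link through the simulated root rather than genuine relatedness. It should not be ranked against the noun-noun scores.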
