
# Using WordNet in Python

In [0]:
# use WordNet via nltk
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


## Synset
The word motorcar has only one meaning, car.n.01 (the first noun sense of car).

The entity car.n.01 is called a synset, or "synonym set", a collection of synonymous words (or "lemmas").

Synsets are described with a gloss (= definition) and some example
sentences.
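
The naming scheme can be made concrete with a small sketch that splits a synset name into its three parts. `parse_synset_name` and `SynsetName` are hypothetical helpers written for this illustration; they are not part of NLTK.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str  # head word, with underscores in place of spaces
    pos: str    # part-of-speech tag, e.g. "n" (noun) or "v" (verb)
    sense: int  # 1-based sense number


def parse_synset_name(name: str) -> SynsetName:
    """Split a name such as 'car.n.01' into (lemma, pos, sense)."""
    # Split from the right so that any dots inside the lemma are preserved.
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(sense))


print(parse_synset_name("car.n.01"))
print(parse_synset_name("cable_car.n.01"))
```

Read this way, car.n.01 is sense number 1 of the noun car.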


In [0]:
for synset in wn.synsets('motorcar'):
  print("\tSynset: {}".format(synset.name()))
  print("\tLemmas: {}".format(synset.lemmas()))
  print("\tDefinition: {}".format(synset.definition()))
  print("\tExamples: {}".format(synset.examples()))

	Synset: car.n.01
	Lemmas: [Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
	Definition: a motor vehicle with four wheels; usually propelled by an internal combustion engine
	Examples: ['he needs a car to get to work']


Unlike the words automobile and motorcar, which are unambiguous
and have one synset, the word car is ambiguous, having five synsets:


In [0]:
wn.synsets("car")

[Synset('car.n.01'),
 Synset('car.n.02'),
 Synset('car.n.03'),
 Synset('car.n.04'),
 Synset('cable_car.n.01')]

In [0]:
for synset in wn.synsets('car'):
  print("\tLemmas: {}".format(synset.lemma_names()))

  

	Lemmas: ['car', 'auto', 'automobile', 'machine', 'motorcar']
	Lemmas: ['car', 'railcar', 'railway_car', 'railroad_car']
	Lemmas: ['car', 'gondola']
	Lemmas: ['car', 'elevator_car']
	Lemmas: ['cable_car', 'car']


## Hypernyms and Hyponyms ("is-a" relation)
One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other.


*   Motor vehicle is a hypernym of motorcar
*   Ambulance is a hyponym of motorcar
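
The two relations are inverses of each other, which a toy mapping can illustrate. The dictionary below is a hand-picked fragment, not WordNet data: it assumes a single hypernym per word, whereas a real synset may have several. `invert` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-picked "is-a" links (hyponym -> hypernym); an illustrative fragment only.
HYPERNYM_OF = {
    "ambulance": "motorcar",
    "limousine": "motorcar",
    "motorcar": "motor_vehicle",
}


def invert(hypernym_of):
    """Derive the hyponym lists by reversing every hyponym -> hypernym link."""
    hyponyms = defaultdict(list)
    for child, parent in hypernym_of.items():
        hyponyms[parent].append(child)
    return {parent: sorted(children) for parent, children in hyponyms.items()}


HYPONYMS_OF = invert(HYPERNYM_OF)

print(HYPERNYM_OF["motorcar"])   # the more general concept
print(HYPONYMS_OF["motorcar"])   # the more specific concepts
```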



In [0]:
motorcar = wn.synset('car.n.01')
print("\tHypernyms: {}".format(motorcar.hypernyms()))
print("\tHyponyms: {}".format(motorcar.hyponyms()))

	Hypernyms: [Synset('motor_vehicle.n.01')]
	Hyponyms: [Synset('ambulance.n.01'), Synset('beach_wagon.n.01'), Synset('bus.n.04'), Synset('cab.n.03'), Synset('compact.n.03'), Synset('convertible.n.01'), Synset('coupe.n.01'), Synset('cruiser.n.01'), Synset('electric.n.01'), Synset('gas_guzzler.n.01'), Synset('hardtop.n.01'), Synset('hatchback.n.01'), Synset('horseless_carriage.n.01'), Synset('hot_rod.n.01'), Synset('jeep.n.01'), Synset('limousine.n.01'), Synset('loaner.n.02'), Synset('minicar.n.01'), Synset('minivan.n.01'), Synset('model_t.n.01'), Synset('pace_car.n.01'), Synset('racer.n.02'), Synset('roadster.n.01'), Synset('sedan.n.01'), Synset('sport_utility.n.01'), Synset('sports_car.n.01'), Synset('stanley_steamer.n.01'), Synset('stock_car.n.01'), Synset('subcompact.n.01'), Synset('touring_car.n.01'), Synset('used-car.n.01')]


In [0]:
types_of_motorcar = motorcar.hyponyms()
print(types_of_motorcar[15])
print(sorted([lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas()]))

Synset('limousine.n.01')
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon', 'wagon']


In [0]:
print(motorcar.hypernyms())
paths = motorcar.hypernym_paths()
print(len(paths))


[Synset('motor_vehicle.n.01')]
2


In [0]:
[synset.name() for synset in paths[0]]

['entity.n.01',
 'physical_entity.n.01',
 'object.n.01',
 'whole.n.02',
 'artifact.n.01',
 'instrumentality.n.03',
 'container.n.01',
 'wheeled_vehicle.n.01',
 'self-propelled_vehicle.n.01',
 'motor_vehicle.n.01',
 'car.n.01']

In [0]:
[synset.name() for synset in paths[1]]

['entity.n.01',
 'physical_entity.n.01',
 'object.n.01',
 'whole.n.02',
 'artifact.n.01',
 'instrumentality.n.03',
 'conveyance.n.03',
 'vehicle.n.01',
 'wheeled_vehicle.n.01',
 'self-propelled_vehicle.n.01',
 'motor_vehicle.n.01',
 'car.n.01']
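
Two paths exist because one node on the way up has two hypernyms. The sketch below rebuilds only the links visible in the two outputs above and enumerates the root-to-node paths recursively. It is a plain-Python illustration of the idea, not NLTK's implementation, and `hypernym_paths` here is a hypothetical stand-alone function.

```python
# Hypernym links (synset -> list of hypernyms) read off the two paths printed above.
HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    # The fork: wheeled_vehicle is both a container and a vehicle.
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],  # root: no hypernyms
}


def hypernym_paths(node, hypernyms):
    """Return every path from the root down to node, root first."""
    parents = hypernyms[node]
    if not parents:
        return [[node]]
    return [
        path + [node]
        for parent in parents
        for path in hypernym_paths(parent, hypernyms)
    ]


toy_paths = hypernym_paths("car.n.01", HYPERNYMS)
print(len(toy_paths))
for path in toy_paths:
    print(len(path), " > ".join(path))
```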

## Hyponyms and Instances
WordNet has both classes and instances. An instance is an individual: a proper noun that names a unique entity. For example, San Francisco is an instance of city, so it is linked through instance_hypernyms() rather than hypernyms().

In [0]:
wn.synset('san_francisco.n.01').hypernyms()


[]

In [0]:
wn.synset('san_francisco.n.01').instance_hypernyms()

[Synset('city.n.01'), Synset('port_of_entry.n.01')]

## Meronyms and Holonyms


A meronym denotes a part, substance, or member of a whole; the holonym is that whole.

*   branch is a meronym (part meronym) of tree
*   heartwood is a meronym (substance meronym) of tree
*   forest is a holonym (member holonym) of tree



In [0]:
wn.synset('tree.n.01').part_meronyms()

[Synset('burl.n.02'),
 Synset('crown.n.07'),
 Synset('limb.n.02'),
 Synset('stump.n.01'),
 Synset('trunk.n.01')]

In [0]:
wn.synset('tree.n.01').substance_meronyms()

[Synset('heartwood.n.01'), Synset('sapwood.n.01')]

In [0]:
wn.synset('tree.n.01').member_holonyms()

[Synset('forest.n.01')]

## Entailments
A verb Y is entailed by a verb X if doing X necessarily involves doing Y.
* to sleep is entailed by to snore

In [0]:
wn.synset("snore.v.01").entailments()

[Synset('sleep.v.01')]

In [0]:
wn.synset("walk.v.01").entailments()

[Synset('step.v.01')]

In [0]:
wn.synset("eat.v.01").entailments()

[Synset('chew.v.01'), Synset('swallow.v.01')]

In [0]:
wn.synset("tease.v.03").entailments()

[Synset('arouse.v.07'), Synset('disappoint.v.01')]

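## Antonymy
Antonymy holds between lemmas (individual words in a particular sense) rather than between whole synsets, so it is looked up on a Lemma object obtained with wn.lemma().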

In [0]:
wn.lemma("supply.n.02.supply").antonyms()

[Lemma('demand.n.02.demand')]

In [0]:
wn.lemma("rush.v.01.rush").antonyms()

[Lemma('linger.v.04.linger')]

In [0]:
wn.lemma("horizontal.a.01.horizontal").antonyms()

[Lemma('inclined.a.02.inclined'), Lemma('vertical.a.01.vertical')]

## More Lexical Relations
You can list the lexical relations and other methods defined on a synset using dir().

In [0]:
print(dir(wn.synsets("motorcar")[0]))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_all_hypernyms', '_definition', '_examples', '_frame_ids', '_hypernyms', '_instance_hypernyms', '_iter_hypernym_lists', '_lemma_names', '_lemma_pointers', '_lemmas', '_lexname', '_max_depth', '_min_depth', '_name', '_needs_root', '_offset', '_pointers', '_pos', '_related', '_shortest_hypernym_paths', '_wordnet_corpus_reader', 'also_sees', 'attributes', 'causes', 'closure', 'common_hypernyms', 'definition', 'entailments', 'examples', 'frame_ids', 'hypernym_distances', 'hypernym_paths', 'hypernyms', 'hyponyms', 'instance_hypernyms', 'instance_hyponyms', 'jcn_similarity', 'lch_similarity', 'lemma_names', 'lemmas', 'lexnam
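
Most of that listing consists of internal names that start with an underscore. A minimal sketch for keeping only the public names follows; `public_names` is a hypothetical helper, and `FakeSynset` is a stand-in class so the example runs without WordNet data.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class FakeSynset:
    """Hypothetical stand-in for a synset; not an NLTK class."""

    _name = "car.n.01"  # private attribute, filtered out below

    def hypernyms(self):
        return []

    def hyponyms(self):
        return []


print(public_names(FakeSynset()))
```

With NLTK loaded, the same helper could be applied to a real synset, for example `public_names(wn.synsets("motorcar")[0])`.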

## Semantic Similarity
Two synsets linked to the same root may have several hypernyms in common. If two synsets share a very specific hypernym (one that is low down in the hypernym hierarchy), they must be closely related.

In [0]:
right = wn.synset("right_whale.n.01")
orca = wn.synset("orca.n.01")
minke = wn.synset("minke_whale.n.01")
tortoise = wn.synset("tortoise.n.01")
novel = wn.synset("novel.n.01")

In [0]:
print(right.lowest_common_hypernyms(minke))
print(right.lowest_common_hypernyms(orca))
print(right.lowest_common_hypernyms(tortoise))
print(right.lowest_common_hypernyms(novel))


[Synset('baleen_whale.n.01')]
[Synset('whale.n.02')]
[Synset('vertebrate.n.01')]
[Synset('entity.n.01')]


We can quantify this concept of generality by looking up the depth of each synset.

In [0]:
print(wn.synset("baleen_whale.n.01").min_depth())
print(wn.synset("whale.n.02").min_depth())
print(wn.synset("vertebrate.n.01").min_depth())
print(wn.synset("entity.n.01").min_depth())

14
13
8
0
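
The link between a lowest common hypernym and its depth can be sketched on a toy taxonomy. The tree below is heavily compressed and hand-built (one parent per node, most intermediate levels omitted), so its depths do not match the WordNet values printed above. `ancestors`, `depth` and `lowest_common_hypernym` are hypothetical helpers, not NLTK functions.

```python
# Toy taxonomy (node -> parent); a compressed, hand-built fragment.
PARENT = {
    "entity": None,
    "vertebrate": "entity",
    "novel": "entity",
    "whale": "vertebrate",
    "tortoise": "vertebrate",
    "baleen_whale": "whale",
    "orca": "whale",
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
}


def ancestors(node):
    """Return the chain from node up to the root, node included."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Number of links between node and the root."""
    return len(ancestors(node)) - 1


def lowest_common_hypernym(a, b):
    """Return the deepest node that lies above (or at) both a and b."""
    above_a = set(ancestors(a))
    # Walking up from b, the first shared node is the lowest one.
    for node in ancestors(b):
        if node in above_a:
            return node
    return None


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    shared = lowest_common_hypernym("right_whale", other)
    print(other, shared, depth(shared))
```

The closer the second concept is to right_whale, the deeper the shared hypernym sits in the tree.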


### Path similarity
Path similarity measures defined over the collection of WordNet synsets build on this insight.
* path_similarity() assigns a score in the range 0-1 based on the shortest path that connects the concepts in the hypernym hierarchy
* None is returned in those cases where a path cannot be found
* Comparing a synset with itself will return 1
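
The score is the reciprocal of one plus the length of the shortest connecting path. The sketch below applies that formula to a hand-built fragment. The intermediate nodes are assumptions chosen for illustration, and the links are searched as undirected edges, which only matches a hypernym-based search because this fragment is tree-shaped. It returns `None` when no path exists, as recent NLTK releases do.

```python
from collections import deque

# Hand-built fragment (node -> hypernyms); intermediate nodes are assumptions.
HYPERNYMS = {
    "right_whale": ["baleen_whale"],
    "minke_whale": ["rorqual"],
    "rorqual": ["baleen_whale"],
    "baleen_whale": ["whale"],
    "orca": ["dolphin"],
    "dolphin": ["toothed_whale"],
    "toothed_whale": ["whale"],
    "whale": [],
}


def shortest_distance(a, b, hypernyms):
    """Breadth-first search for the number of links between a and b."""
    neighbours = {}
    for child, parents in hypernyms.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def toy_path_similarity(a, b, hypernyms=HYPERNYMS):
    """Score = 1 / (shortest distance + 1), or None if there is no path."""
    dist = shortest_distance(a, b, hypernyms)
    return None if dist is None else 1.0 / (dist + 1)


print(toy_path_similarity("right_whale", "right_whale"))
print(toy_path_similarity("right_whale", "minke_whale"))
print(toy_path_similarity("right_whale", "orca"))
print(toy_path_similarity("right_whale", "novel"))
```

A distance of 0 gives a score of 1, and every extra link on the path lowers the score.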



Potential NLP application: coreference resolution.

Example: I saw an **orca**. The **whale** was huge.

In [0]:
print(right.path_similarity(minke))
print(right.path_similarity(orca))
print(right.path_similarity(tortoise))
print(right.path_similarity(novel))


0.25
0.16666666666666666
0.07692307692307693
0.043478260869565216


There are many different ways to quantify similarity. For more details, please see http://www.nltk.org/howto/wordnet.html

## Word Sense Disambiguation (WSD)
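**Lesk Algorithm**: a classical algorithm for Word Sense Disambiguation (WSD), introduced by Michael E. Lesk in 1986. It assumes that a word’s dictionary definitions are good indicators of the senses they define, so it picks the sense whose definition overlaps most with the words in the surrounding context.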
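
The overlap idea can be sketched in a few lines of plain Python. This is a simplified variant for illustration, not NLTK's implementation: the two-sense inventory, the glosses and the stopword list are made up for the example, and `simplified_lesk` is a hypothetical function.

```python
# Hypothetical sense inventory: sense label -> dictionary-style gloss.
SENSES = {
    "bank.financial": "an institution that accepts deposits of money and makes loans",
    "bank.river": "sloping land beside a river or other body of water",
}

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "i", "of", "on", "or", "some", "that", "the", "to", "we"}


def simplified_lesk(context_tokens, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {token.lower() for token in context_tokens} - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        gloss_words = set(gloss.lower().split()) - STOPWORDS
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


money_sentence = "I went to the bank to deposit some money last Monday".split()
river_sentence = "We sat on the bank of the river watching the water".split()

print(simplified_lesk(money_sentence, SENSES))
print(simplified_lesk(river_sentence, SENSES))
```

Because the match is on exact word forms, "deposit" does not match "deposits" here; only "money" links the first sentence to the financial sense. Sparse overlap of this kind is one reason the algorithm can pick a surprising sense.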


In [0]:
from nltk import word_tokenize
from nltk.wsd import lesk
nltk.download('punkt')  # tokenizer models required by word_tokenize
sentence = "I went to the bank to deposit some money last Monday"
tokens = word_tokenize(sentence)
# disambiguate 'bank' as a noun ('n') using the surrounding tokens as context
print(lesk(tokens, 'bank', 'n'))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Synset('savings_bank.n.02')
