# WordNet #

## Senses and Synonyms ##

In [1]:
#synonyms
from nltk.corpus import wordnet as wn
wn.synsets('motorcar')

[Synset('car.n.01')]

Thus, motorcar has just one possible meaning and it is identified as car.n.01, the first noun sense of car. The entity car.n.01 is called a synset, or "synonym set", a collection of synonymous words (or "lemmas"):

In [2]:
wn.synset('car.n.01').lemma_names()

['car', 'auto', 'automobile', 'machine', 'motorcar']

Each word of a synset can have several meanings, e.g., car can also signify a train carriage, a gondola, or an elevator car. However, we are only interested in the single meaning that is common to all words of the above synset. Synsets also come with a prose definition and some example sentences:

In [3]:
wn.synset('car.n.01').definition()

'a motor vehicle with four wheels; usually propelled by an internal combustion engine'

In [4]:
wn.synset('car.n.01').examples()

['he needs a car to get to work']

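To eliminate ambiguity, we will identify these words as car.n.01.automobile, car.n.01.motorcar, and so on. This pairing of a synset with a word is called a lemma. We can get all the lemmas of a synset, look up a particular lemma, get the synset a lemma belongs to, and get the name of a lemma: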

In [5]:
wn.synset('car.n.01').lemmas()

[Lemma('car.n.01.car'),
 Lemma('car.n.01.auto'),
 Lemma('car.n.01.automobile'),
 Lemma('car.n.01.machine'),
 Lemma('car.n.01.motorcar')]

In [6]:
wn.lemma('car.n.01.automobile') 

Lemma('car.n.01.automobile')

In [7]:
wn.lemma('car.n.01.automobile').synset()

Synset('car.n.01')

In [8]:
wn.lemma('car.n.01.automobile').name()

'automobile'
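
The dotted names used so far follow a regular pattern: a head word, a part-of-speech code, a two-digit sense number and, for a lemma, the lemma name. As a rough illustration, the pieces can be pulled apart with a regular expression. `split_name` is a hypothetical helper written for this note, not an NLTK function, and it assumes the part-of-speech codes n, v, a, s and r.

```python
import re

# Hypothetical helper, not part of NLTK: splits names such as
# 'car.n.01' or 'car.n.01.automobile' into their parts.
# Assumes part-of-speech codes n, v, a, s, r and a two-digit sense number.
NAME_RE = re.compile(
    r"(?P<head>.+?)\.(?P<pos>[nvasr])\.(?P<sense>\d{2})(?:\.(?P<lemma>.+))?"
)

def split_name(name):
    """Return (head word, pos, sense number, lemma name or None)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a synset or lemma name: {name!r}")
    return (match["head"], match["pos"], int(match["sense"]), match["lemma"])

print(split_name("car.n.01"))             # ('car', 'n', 1, None)
print(split_name("car.n.01.automobile"))  # ('car', 'n', 1, 'automobile')
print(split_name("cable_car.n.01.car"))   # ('cable_car', 'n', 1, 'car')
```

The last call shows why the head word and the lemma name can differ: the synset is named after one of its lemmas, but every lemma in it gets its own dotted name.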

In [9]:
wn.synsets('car')

[Synset('car.n.01'),
 Synset('car.n.02'),
 Synset('car.n.03'),
 Synset('car.n.04'),
 Synset('cable_car.n.01')]

In [10]:
for synset in wn.synsets('car'):
    print(synset.lemma_names())

['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']


In [11]:
wn.lemmas('car')

[Lemma('car.n.01.car'),
 Lemma('car.n.02.car'),
 Lemma('car.n.03.car'),
 Lemma('car.n.04.car'),
 Lemma('cable_car.n.01.car')]

In [12]:
wn.synsets('dish')

[Synset('dish.n.01'),
 Synset('dish.n.02'),
 Synset('dish.n.03'),
 Synset('smasher.n.02'),
 Synset('dish.n.05'),
 Synset('cup_of_tea.n.01'),
 Synset('serve.v.06'),
 Synset('dish.v.02')]

In [13]:
wn.synset('dish.n.02').lemma_names()

['dish']

In [14]:
wn.synset('dish.n.02').definition()

'a particular item of prepared food'

In [15]:
wn.synset('dish.n.02').lemmas()

[Lemma('dish.n.02.dish')]

WordNet makes it easy to navigate between concepts. For example, given a concept like motorcar, we can look at the concepts that are more specific: its (immediate) hyponyms.

In [16]:
motorcar = wn.synset('car.n.01')
types_of_motorcar = motorcar.hyponyms()
types_of_motorcar[0]

Synset('ambulance.n.01')

In [None]:
sorted(lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas())
#['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon',
#'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible',
#'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car',
#'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap',
#'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover',
#'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car',
#'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer',
#'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan',
#'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car',
#'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car',
#'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon',
#'wagon']

In [None]:
motorcar.hypernyms()
#[Synset('motor_vehicle.n.01')]
paths = motorcar.hypernym_paths()
len(paths)
#2
[synset.name() for synset in paths[0]]
#['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
#'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01',
#'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
[synset.name() for synset in paths[1]]
#['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
#'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01',
#'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
motorcar.root_hypernyms()
#[Synset('entity.n.01')]
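
The synset car.n.01 has two hypernym paths because, according to the output above, wheeled_vehicle.n.01 sits under both container.n.01 and vehicle.n.01. The sketch below rebuilds the two paths from a small table of hypernym links copied by hand from that output. `toy_hypernym_paths` is an illustrative function, not the NLTK method, and the table is only a fragment of WordNet.

```python
# Hypernym links copied by hand from the two paths printed above
# (child -> list of direct hypernyms). A tiny fragment, not WordNet itself.
HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],
}

def toy_hypernym_paths(name):
    """List every path from a root down to `name`, root first."""
    parents = HYPERNYMS[name]
    if not parents:
        return [[name]]
    return [path + [name]
            for parent in parents
            for path in toy_hypernym_paths(parent)]

paths = toy_hypernym_paths("car.n.01")
print(len(paths))               # 2
print([len(p) for p in paths])  # [11, 12]
```

Each synset with more than one hypernym multiplies the number of paths, which is why a single synset can reach the same root in several ways.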

Hypernyms and hyponyms are called lexical relations because they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things they are contained in (holonyms). For example, the parts of a tree are its trunk, crown, and so on; these are returned by part_meronyms(). The substances a tree is made of include heartwood and sapwood; these are returned by substance_meronyms(). A collection of trees forms a forest; this is returned by member_holonyms(). The noun mint shows how these relations connect closely related senses: the leaves (mint.n.04) are part of the plant (mint.n.02) and are the substance from which the candy (mint.n.05) is made:

In [17]:
for synset in wn.synsets('mint', wn.NOUN):
    print(synset.name() + ':', synset.definition())

batch.n.02: (often followed by `of') a large number or amount or extent
mint.n.02: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
mint.n.03: any member of the mint family of plants
mint.n.04: the leaves of a mint plant used fresh or candied
mint.n.05: a candy that is flavored with a mint oil
mint.n.06: a plant where money is coined by authority of the government


In [18]:
wn.synset('mint.n.04').part_holonyms()

[Synset('mint.n.02')]

In [19]:
wn.synset('mint.n.04').substance_holonyms()

[Synset('mint.n.05')]
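
Holonymy is the inverse of meronymy: if the leaves are part of the plant, then the plant has the leaves as a part. The sketch below inverts a small meronym table whose entries are inferred from the two outputs above. The dictionary layout and the `invert` function are made up for illustration; NLTK stores these relations differently.

```python
from collections import defaultdict

# (whole, relation) -> parts, inferred from the two holonym outputs above.
# Illustrative data structure only, not how NLTK represents WordNet.
MERONYMS = {
    ("mint.n.02", "part"): ["mint.n.04"],       # the plant has leaves as a part
    ("mint.n.05", "substance"): ["mint.n.04"],  # the candy is made from leaves
}

def invert(meronyms):
    """Build a (part, relation) -> wholes table from a meronym table."""
    holonyms = defaultdict(list)
    for (whole, relation), parts in meronyms.items():
        for part in parts:
            holonyms[(part, relation)].append(whole)
    return dict(holonyms)

HOLONYMS = invert(MERONYMS)
print(HOLONYMS[("mint.n.04", "part")])       # ['mint.n.02']
print(HOLONYMS[("mint.n.04", "substance")])  # ['mint.n.05']
```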

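Verbs are related to one another as well. For example, the act of walking involves stepping, so walking entails stepping. Some verbs have several entailments:

In [20]: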
wn.synset('walk.v.01').entailments()

[Synset('step.v.01')]

In [21]:
wn.synset('eat.v.01').entailments()

[Synset('chew.v.01'), Synset('swallow.v.01')]

In [22]:
wn.synset('tease.v.03').entailments()

[Synset('arouse.v.07'), Synset('disappoint.v.01')]

Some lexical relationships hold between lemmas, e.g., **antonymy**:

In [23]:
wn.lemma('supply.n.02.supply').antonyms()

[Lemma('demand.n.02.demand')]

In [24]:
wn.lemma('rush.v.01.rush').antonyms()

[Lemma('linger.v.04.linger')]

In [25]:
wn.lemma('horizontal.a.01.horizontal').antonyms()

[Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]

In [26]:
wn.lemma('staccato.r.01.staccato').antonyms()

[Lemma('legato.r.01.legato')]

We can traverse the WordNet network to find synsets with related meanings. Knowing which words are semantically related is useful for indexing a collection of texts, so that a search for a general term like vehicle will match documents containing specific terms like limousine.
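
A minimal sketch of that kind of search, assuming a hand-written hyponym table and two made-up documents (none of this data comes from WordNet): the query is expanded to every term below it in the hierarchy before matching.

```python
# Toy hyponym table (general term -> more specific terms). The entries are
# hand-written for illustration and are far smaller than WordNet.
HYPONYMS = {
    "vehicle": ["car", "bicycle"],
    "car": ["limousine", "ambulance"],
}

# Made-up documents, used only to demonstrate the lookup.
DOCUMENTS = {
    "doc1": "the limousine waited outside",
    "doc2": "she read a novel at home",
}

def expand(term):
    """Return the term plus everything below it in the toy hierarchy."""
    terms = {term}
    for child in HYPONYMS.get(term, []):
        terms |= expand(child)
    return terms

def search(query):
    """Return ids of documents containing the query or any of its hyponyms."""
    wanted = expand(query)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if wanted & set(text.split()))

print(search("vehicle"))    # ['doc1']
print(search("limousine"))  # ['doc1']
print(search("bicycle"))    # []
```

With WordNet itself, the expansion step would walk hyponyms() of a synset instead of a dictionary, which first requires choosing the intended sense of the query word.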

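If two synsets share a very specific hypernym, one that is low down in the hypernym hierarchy, they must be closely related:

In [27]: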
right = wn.synset('right_whale.n.01')
orca = wn.synset('orca.n.01')
minke = wn.synset('minke_whale.n.01')
tortoise = wn.synset('tortoise.n.01')
novel = wn.synset('novel.n.01')
right.lowest_common_hypernyms(minke)

[Synset('baleen_whale.n.01')]

In [28]:
right.lowest_common_hypernyms(orca)

[Synset('whale.n.02')]

In [29]:
right.lowest_common_hypernyms(tortoise)

[Synset('vertebrate.n.01')]

In [30]:
right.lowest_common_hypernyms(novel)

[Synset('entity.n.01')]

The depth of each of these shared hypernyms quantifies how specific it is:

In [31]:
wn.synset('baleen_whale.n.01').min_depth()

14

In [32]:
wn.synset('whale.n.02').min_depth()

13

In [33]:
wn.synset('vertebrate.n.01').min_depth()

8

In [34]:
wn.synset('entity.n.01').min_depth()

0
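
To make the two ideas concrete, here is a sketch on a simplified, single-parent taxonomy. Most intermediate WordNet levels are left out, so the depths differ from the real values above, and the function names are illustrative rather than NLTK's. The lowest common hypernym is the first ancestor of one node that is also an ancestor of the other, and depth counts the edges up to the root.

```python
# Simplified single-parent taxonomy (child -> hypernym). Most intermediate
# WordNet levels are omitted, so depths differ from the real ones above.
PARENT = {
    "entity": None,
    "vertebrate": "entity",
    "whale": "vertebrate",
    "baleen_whale": "whale",
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "orca": "whale",
    "tortoise": "vertebrate",
    "novel": "entity",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Number of edges between node and the root."""
    return len(path_to_root(node)) - 1

def lowest_common_hypernym(a, b):
    """First ancestor of a (walking upward) that is also an ancestor of b."""
    ancestors_of_b = set(path_to_root(b))
    return next(n for n in path_to_root(a) if n in ancestors_of_b)

for other in ["minke_whale", "orca", "tortoise", "novel"]:
    shared = lowest_common_hypernym("right_whale", other)
    print(other, shared, depth(shared))
# minke_whale baleen_whale 3
# orca whale 2
# tortoise vertebrate 1
# novel entity 0
```

The ordering matches the real output: the more closely related the pair, the deeper the hypernym they share.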

Similarity measures that incorporate the above insight have been defined over the collection of WordNet synsets. For example, path_similarity assigns a score in the range 0–1 based on the shortest path that connects the concepts in the hypernym hierarchy (in NLTK 3, None is returned when no path can be found). Comparing a synset with itself returns 1.

In [35]:
right.path_similarity(minke)

0.25

In [36]:
right.path_similarity(orca)

0.16666666666666666

In [37]:
right.path_similarity(tortoise)

0.07692307692307693

In [38]:
right.path_similarity(novel)

0.043478260869565216
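
These scores are consistent with the formula 1 / (d + 1), where d is the number of edges on the shortest path between the two synsets. The sketch below inverts the formula to recover d from each printed score. The two helper functions are illustrative and operate on plain numbers, not on Synset objects.

```python
import math

def toy_path_similarity(distance):
    """Score for two synsets separated by `distance` edges: 1 / (d + 1)."""
    return 1 / (distance + 1)

def distance_from_score(score):
    """Invert the formula to recover the edge count from a score."""
    return round(1 / score) - 1

# Scores copied from the outputs above (right whale compared with each concept).
scores = {
    "minke": 0.25,
    "orca": 0.16666666666666666,
    "tortoise": 0.07692307692307693,
    "novel": 0.043478260869565216,
}

for name, score in scores.items():
    d = distance_from_score(score)
    # Check that the recovered distance reproduces the printed score.
    assert math.isclose(toy_path_similarity(d), score)
    print(name, d)
# minke 3
# orca 5
# tortoise 12
# novel 22
```

A distance of 0 gives a score of 1, which is why comparing a synset with itself returns 1.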

In [39]:
help(wn) 

Help on WordNetCorpusReader in module nltk.corpus.reader.wordnet object:

class WordNetCorpusReader(nltk.corpus.reader.api.CorpusReader)
 |  WordNetCorpusReader(root, omw_reader)
 |  
 |  A corpus reader used to access wordnet or its variants.
 |  
 |  Method resolution order:
 |      WordNetCorpusReader
 |      nltk.corpus.reader.api.CorpusReader
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, root, omw_reader)
 |      Construct a new wordnet corpus reader, with the given root
 |      directory.
 |  
 |  all_lemma_names(self, pos=None, lang='eng')
 |      Return all lemma names for all synsets for the given
 |      part of speech tag and language or languages. If pos is
 |      not specified, all synsets for all parts of speech will
 |      be used.
 |  
 |  all_synsets(self, pos=None)
 |      Iterate over all synsets with a given part of speech tag.
 |      If no pos is specified, all synsets for all parts of speech
 |      will be loaded.
 |  
 |  ...

In [41]:
import nltk  # needed so that the name nltk is defined below
from nltk.book import *
nltk.corpus.verbnet  # a corpus reader object, not a function, so it is not called

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908