# WordNet

## Import

In [1]:
import nltk

In [2]:
from nltk.corpus import wordnet

In [4]:
# Download the corpus data that WordNet needs
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\owo-a\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.


True

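## Getting definitions and synonyms

In WordNet, words are grouped into sets of synonyms called synsets.  
Taking "car" as an example, `wordnet.synsets()` returns the synsets it belongs to.  
Because one word can map to more than one sense (automobile, railway car, cable car, ...), several synsets come back, and you have to pick the one that matches the sense you mean.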

In [5]:
wordnet.synsets('car')

[Synset('car.n.01'),
 Synset('car.n.02'),
 Synset('car.n.03'),
 Synset('car.n.04'),
 Synset('cable_car.n.01')]

In `car.n.01`:
+ `car`: the word
+ `n`: the attribute (part of speech; `n` stands for noun)
+ `01`: the index of the sense within the group
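
The naming scheme described above can be taken apart with plain string handling. The following is a minimal sketch that needs no NLTK at all; `parse_synset_name` is a hypothetical helper introduced here for illustration, not part of the NLTK API.

```python
def parse_synset_name(name):
    """Split a synset name such as 'car.n.01' into (word, pos, index).

    Hypothetical helper for illustration; not part of NLTK.
    """
    # Split from the right so that a word containing dots stays intact.
    word, pos, index = name.rsplit(".", 2)
    return word, pos, int(index)


for name in ["car.n.01", "cable_car.n.01", "self-propelled_vehicle.n.01"]:
    print(parse_synset_name(name))
```

When working with real `Synset` objects, the same string is available through `synset.name()`.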

In [8]:
car = wordnet.synset('car.n.01')
car.definition() # get the definition of this sense of "car"

'a motor vehicle with four wheels; usually propelled by an internal combustion engine'

In [9]:
car.lemma_names() # get the names of the synonyms in this synset

['car', 'auto', 'automobile', 'machine', 'motorcar']

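## Word network

`hypernym_paths()` shows how a synset relates to more general and more specific terms.
> [Semantics] hypernym: a superordinate term, i.e. a word with a broader meaning

`hypernym_paths()` returns a list of paths, and `[0]` selects the first one. Here it is a single chain running entity → physical_entity → object → ... → motor_vehicle → car.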

In [10]:
car.hypernym_paths()[0]

[Synset('entity.n.01'),
 Synset('physical_entity.n.01'),
 Synset('object.n.01'),
 Synset('whole.n.02'),
 Synset('artifact.n.01'),
 Synset('instrumentality.n.03'),
 Synset('container.n.01'),
 Synset('wheeled_vehicle.n.01'),
 Synset('self-propelled_vehicle.n.01'),
 Synset('motor_vehicle.n.01'),
 Synset('car.n.01')]
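
To make the chain easier to read, the synset names can be joined into a single arrow-separated line. This is a small sketch using only the names copied from the output of `car.hypernym_paths()[0]` above; `format_path` is a hypothetical helper, not an NLTK function.

```python
def format_path(names):
    """Render synset names as 'a → b → c', dropping the '.pos.index' suffix.

    Hypothetical helper for illustration; not part of NLTK.
    """
    words = [name.rsplit(".", 2)[0] for name in names]
    return " → ".join(words)


# Names copied from the output of car.hypernym_paths()[0] shown above.
path_names = [
    "entity.n.01",
    "physical_entity.n.01",
    "object.n.01",
    "whole.n.02",
    "artifact.n.01",
    "instrumentality.n.03",
    "container.n.01",
    "wheeled_vehicle.n.01",
    "self-propelled_vehicle.n.01",
    "motor_vehicle.n.01",
    "car.n.01",
]

print(format_path(path_names))
# Number of hypernym steps between the root and car on this path.
print(len(path_names) - 1)
```

With NLTK loaded, the same helper could be fed `[s.name() for s in car.hypernym_paths()[0]]` instead of the hard-coded list.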

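## Semantic similarity

The `path_similarity()` method computes how similar two synsets are. It returns a real number between $0$ and $1$; the higher the value, the closer the two senses, and $1$ means they are identical.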

In [11]:
car = wordnet.synset('car.n.01')
novel = wordnet.synset('novel.n.01')
dog = wordnet.synset('dog.n.01')
motorcycle = wordnet.synset('motorcycle.n.01')

In [12]:
car.path_similarity(novel)

0.05555555555555555

In [13]:
novel.path_similarity(car)

0.05555555555555555

In [14]:
car.path_similarity(motorcycle)

0.3333333333333333
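
The scores above can be related back to distances in the hypernym graph. The sketch below assumes the score is computed as $1 / (d + 1)$, where $d$ is the length of the shortest path between the two synsets, which is how NLTK's path similarity is defined to the best of my knowledge. Both function names are hypothetical helpers, and the scores are copied from the outputs above.

```python
import math


def path_similarity_from_distance(distance):
    """Score for two synsets that are `distance` edges apart (assumed formula)."""
    return 1.0 / (distance + 1)


def distance_from_path_similarity(score):
    """Invert the assumed formula to recover the shortest-path length."""
    return round(1.0 / score - 1)


# Scores copied from the notebook outputs above.
reported = {
    "car vs novel": 0.05555555555555555,
    "car vs motorcycle": 0.3333333333333333,
}

for pair, score in reported.items():
    d = distance_from_path_similarity(score)
    # Recompute the score from the recovered distance as a consistency check.
    assert math.isclose(path_similarity_from_distance(d), score)
    print(f"{pair}: shortest path of {d} edges")
```

Under this assumption, car and motorcycle are only 2 edges apart while car and novel are 17 edges apart, which is why the first pair scores so much higher. The symmetry seen in cells 12 and 13 also follows, since the shortest path has the same length in either direction.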