# WordNet

In [None]:
import nltk
nltk.download('wordnet')

In [None]:
from nltk.corpus import wordnet as wn

WordNet is a lexical database that encodes semantic relations between words. It is available for several languages, but in this lab we will work with the English database.

WordNet contains nouns, verbs, adjectives, and adverbs, grouped into sets of words that share a meaning. The basic unit is the **synset**, a set of words that describe the same concept. A word can belong to several synsets, in which case it has several senses (it is polysemous).

In [None]:
# get all synsets for the word "school"
school_synsets = wn.synsets("school")
print(school_synsets)

[Synset('school.n.01'), Synset('school.n.02'), Synset('school.n.03'), Synset('school.n.04'), Synset('school.n.05'), Synset('school.n.06'), Synset('school.n.07'), Synset('school.v.01'), Synset('educate.v.03'), Synset('school.v.03')]


In [None]:
syn = school_synsets[1]  # school.n.02

For a synset, we can print the definition of that sense.

In [None]:
print(syn.definition())

a building where young people receive education


WordNet also contains usage examples for the words in a synset.

In [None]:
print(syn.examples())

['the school was built in 1932', 'he walked to school every morning']


For a synset, we can list all the lemmas (dictionary forms) of the words that have that sense.

In [None]:
print(syn.lemmas())
print(syn.lemma_names())

[Lemma('school.n.02.school'), Lemma('school.n.02.schoolhouse')]
['school', 'schoolhouse']


WordNet is more than a partition of words into senses, however. The resource is also valuable for the many relations it defines between synsets.

Synsets in WordNet are divided into synsets for nouns (*n*), adjectives (*a*, plus *s* for satellite adjectives), verbs (*v*), and adverbs (*r*). The main relations are defined between synsets of the same part of speech (PoS), but there are cross-PoS relations as well.
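The PoS letter also appears in synset names such as `school.n.02` or `educate.v.03`, which follow the pattern `lemma.pos.number`. The sketch below splits such a name into its parts. `parse_synset_name` and `POS_NAMES` are hypothetical helpers written for this illustration; they are not part of NLTK.

```python
# Hypothetical helper (not part of NLTK): readable names for the PoS letters.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Split a name of the form 'lemma.pos.number' into its three parts."""
    # Split from the right so that a lemma containing dots stays intact.
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, POS_NAMES[pos], int(number)


print(parse_synset_name("school.n.02"))   # ('school', 'noun', 2)
print(parse_synset_name("educate.v.03"))  # ('educate', 'verb', 3)
```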

## Relation types for nouns

### 1. Hypernyms/Hyponyms

We say that sense $s_1$ is a hypernym of sense $s_2$ if $s_1$ subsumes the meaning of $s_2$. In other words, $s_2$ is a kind of $s_1$ (*is a type of*).

The inverse of hypernymy is hyponymy ($s_2$ is a hyponym of $s_1$).

In [None]:
syn = wn.synset('school.n.01')
print("Hypernyms:", syn.hypernyms())
print("Hyponyms:", syn.hyponyms())

Hypernyms: [Synset('educational_institution.n.01')]
Hyponyms: [Synset('academy.n.03'), Synset('alma_mater.n.01'), Synset('conservatory.n.01'), Synset('correspondence_school.n.01'), Synset('crammer.n.03'), Synset('dance_school.n.01'), Synset('dancing_school.n.01'), Synset('day_school.n.02'), Synset('direct-grant_school.n.01'), Synset('driving_school.n.01'), Synset('finishing_school.n.01'), Synset('flying_school.n.01'), Synset('grade_school.n.01'), Synset('graduate_school.n.01'), Synset('language_school.n.01'), Synset('night_school.n.01'), Synset('nursing_school.n.01'), Synset('private_school.n.01'), Synset('public_school.n.01'), Synset('religious_school.n.01'), Synset('riding_school.n.01'), Synset('secondary_school.n.01'), Synset('secretarial_school.n.01'), Synset('sunday_school.n.01'), Synset('technical_school.n.01'), Synset('training_school.n.01'), Synset('veterinary_school.n.01')]
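Hypernymy and hyponymy are the same set of links read in opposite directions, and "is a type of" chains across several levels. The sketch below shows both ideas on a small hand-written map. `TOY_HYPERNYMS`, `invert`, and `all_hypernyms` are hypothetical names for this illustration, and the data is not read from WordNet.

```python
from collections import defaultdict

# Toy hypernym map (hand-written, NOT read from WordNet):
# each node points to its direct hypernyms.
TOY_HYPERNYMS = {
    "grade_school": ["school"],
    "night_school": ["school"],
    "school": ["educational_institution"],
    "educational_institution": ["institution"],
    "institution": [],
}


def invert(relation):
    """Build the inverse map: hypernym -> list of direct hyponyms."""
    inverse = defaultdict(list)
    for child, parents in relation.items():
        for parent in parents:
            inverse[parent].append(child)
    return dict(inverse)


def all_hypernyms(node, relation):
    """Collect every direct and indirect hypernym of node."""
    seen = []
    stack = list(relation.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(relation.get(current, []))
    return seen


TOY_HYPONYMS = invert(TOY_HYPERNYMS)
print(TOY_HYPONYMS["school"])
print(all_hypernyms("grade_school", TOY_HYPERNYMS))
```

On this toy map, the hyponyms of `school` are the two school types, and `grade_school` inherits every hypernym up to `institution`.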


### 2. Meronyms/Holonyms

We say that sense $s_1$ is a holonym of sense $s_2$ if $s_1$ "contains" $s_2$. Holonyms come in 3 kinds:
 - part ($s_2$ is a part of $s_1$)
 - substance ($s_1$ is made of $s_2$)
 - member ($s_2$ is a member of $s_1$)
 
Conversely, we say that $s_2$ is a meronym of $s_1$.

In [None]:
syn = wn.synset('air.n.01')
print(syn.substance_holonyms())
print(syn.substance_meronyms())

[Synset('wind.n.01')]
[Synset('argon.n.01'), Synset('krypton.n.01'), Synset('neon.n.01'), Synset('nitrogen.n.01'), Synset('oxygen.n.01'), Synset('xenon.n.01')]


In [None]:
syn = wn.synset('house.n.01')
print(syn.part_holonyms())
print(syn.part_meronyms())

[]
[Synset('library.n.01'), Synset('loft.n.02'), Synset('porch.n.01'), Synset('study.n.05')]


In [None]:
syn = wn.synset('tree.n.01')
print(syn.member_holonyms())
print(syn.part_meronyms())

[Synset('forest.n.01')]
[Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]


## Relation types for verbs

### 1. Hypernyms/Hyponyms

Similar to the description for nouns. Here, hyponyms can be seen as describing the same action in a narrower context. For example, "to jog" and "to sprint" are more specific variants of the verb "to run".

Verb hyponyms are also called troponyms.

In [None]:
syn = wn.synset('run.v.01')
print(syn.hypernyms())
print(syn.hyponyms())

[Synset('travel_rapidly.v.01')]
[Synset('hare.v.01'), Synset('jog.v.03'), Synset('lope.v.01'), Synset('outrun.v.01'), Synset('romp.v.02'), Synset('run.v.33'), Synset('run_bases.v.01'), Synset('rush.v.05'), Synset('scurry.v.01'), Synset('sprint.v.01'), Synset('streak.v.02'), Synset('trot.v.01')]


### 2. Entailment

For a given action, entailment specifies which other action must necessarily take place. For example, snoring entails sleeping.

In [None]:
print(wn.synset('snore.v.01').entailments())
print(wn.synset('buy.v.01').entailments())

[Synset('sleep.v.01')]
[Synset('choose.v.01'), Synset('pay.v.01')]


## Relation types for adjectives

### 1. Antonyms

Antonyms are defined at the lemma level (not at the synset level).

In [None]:
lem = wn.synset('good.a.01').lemmas()[0]
print(lem.antonyms())

[Lemma('bad.a.01.bad')]


### 2. Synonymy (similar to)

In [None]:
syn = wn.synset('strong.a.01')
print(syn.similar_tos())

[Synset('beardown.s.01'), Synset('beefed-up.s.01'), Synset('brawny.s.01'), Synset('bullnecked.s.01'), Synset('bullocky.s.01'), Synset('fortified.s.02'), Synset('hard.s.04'), Synset('industrial-strength.s.01'), Synset('ironlike.s.01'), Synset('knock-down.s.01'), Synset('noticeable.s.04'), Synset('reinforced.s.01'), Synset('robust.s.03'), Synset('stiff.s.02'), Synset('vehement.s.02'), Synset('virile.s.01'), Synset('well-knit.s.01')]


## Relation types for adverbs

### 1. Antonyms

In [None]:
lem = wn.synset('quickly.r.01').lemmas()[0]
print(lem.antonyms())

[Lemma('slowly.r.01.slowly')]


## Cross-PoS relation types

### 1. Attributes (nouns <-> adjectives)

This relation links a synset $s_1$ of a noun to a synset $s_2$ of an adjective if "$s_2$ can be a value of $s_1$".

In [None]:
print(wn.synset('strength.n.01').attributes())
print(wn.synset('strong.a.01').attributes())
print(wn.synset('weak.a.01').attributes())

[Synset('delicate.a.01'), Synset('rugged.a.01'), Synset('strong.a.01'), Synset('weak.a.01')]
[Synset('strength.n.01')]
[Synset('strength.n.01')]


### 2. Pertainyms (for adjectives and adverbs)

Returns the lemmas that the adjective or adverb pertains to, that is, the concepts behind the quality it describes. Like antonyms, pertainyms are looked up at the lemma level.

In [None]:
lem = wn.synset('technical.a.01').lemmas()[0]
print(lem.pertainyms())

[Lemma('technique.n.01.technique')]


In [None]:
lem = wn.synset('quickly.r.01').lemmas()[0]
print(lem.pertainyms())

[Lemma('quick.s.01.quick')]


## Visualization

You can use this platform to visualize the relations in WordNet: [http://wordvis.com/](http://wordvis.com/).

## The hypernym relation graph

If we treat synsets as nodes of a graph and, for every relation of the form "$s_1$ is a hypernym of $s_2$", add a directed edge from $s_2$ to $s_1$, we obtain a directed acyclic graph (DAG).

![](https://www.researchgate.net/profile/Zhao-Lu-3/publication/261351248/figure/fig1/AS:669012354691096@1536516383841/A-DAG-fragment-of-WordNet-30.ppm)

A hypernym path for a synset is a path in the graph from its node to a root node (a node with out-degree $0$). Note that a synset can have several such paths.

In [None]:
syn = wn.synset("water.n.01")
paths = syn.hypernym_paths()
print(len(paths))

3


In [None]:
path = paths[0][::-1]
print(" -> ".join(n.name() for n in path))

water.n.01 -> binary_compound.n.01 -> compound.n.02 -> chemical.n.01 -> material.n.01 -> substance.n.01 -> matter.n.03 -> physical_entity.n.01 -> entity.n.01
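To see why one synset can have several hypernym paths, the sketch below enumerates them on a small hand-written DAG in which one node has two direct hypernyms. `TOY_DAG` and this standalone `hypernym_paths` function are hypothetical and do not use WordNet data. The sketch lists each path from the node up to the root, matching the reversed order printed above.

```python
# Toy DAG (hand-written, NOT WordNet data): node -> direct hypernyms.
TOY_DAG = {
    "water": ["liquid", "compound"],
    "liquid": ["substance"],
    "compound": ["substance"],
    "substance": ["entity"],
    "entity": [],
}


def hypernym_paths(node, dag):
    """Return every path from node up to a root (a node with no hypernyms)."""
    parents = dag[node]
    if not parents:
        # A root is the end of every path that reaches it.
        return [[node]]
    paths = []
    for parent in parents:
        for tail in hypernym_paths(parent, dag):
            paths.append([node] + tail)
    return paths


for path in hypernym_paths("water", TOY_DAG):
    print(" -> ".join(path))
# water -> liquid -> substance -> entity
# water -> compound -> substance -> entity
```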


We define the depth of a synset as the length of its longest hypernym path.

The *lowest common hypernym* of two synsets ($s_1$ and $s_2$) is the synset (or synsets) of maximum depth that lies on at least one hypernym path of $s_1$ and on at least one hypernym path of $s_2$.

In the DAG fragment shown in the figure above, the *lowest common hypernym* of "bus \#1" and "engine" is "public transport".
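The two definitions can be traced on a small hand-written DAG. This is a minimal sketch under stated assumptions: `TOY_DAG`, `ancestors`, `depth`, and `lowest_common_hypernyms` are hypothetical names, the data is not read from WordNet, and depth is counted in edges (so a root has depth 0).

```python
# Toy DAG (hand-written, NOT WordNet data): node -> direct hypernyms.
TOY_DAG = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "domestic_animal": ["animal"],
    "animal": ["entity"],
    "entity": [],
}


def ancestors(node, dag):
    """Every node that lies on some hypernym path of node (node included)."""
    found = {node}
    for parent in dag[node]:
        found |= ancestors(parent, dag)
    return found


def depth(node, dag):
    """Number of edges on the longest path from node up to a root."""
    parents = dag[node]
    if not parents:
        return 0
    return 1 + max(depth(parent, dag) for parent in parents)


def lowest_common_hypernyms(a, b, dag):
    """Common ancestors of a and b that have the maximum depth."""
    common = ancestors(a, dag) & ancestors(b, dag)
    if not common:
        return []
    deepest = max(depth(n, dag) for n in common)
    return sorted(n for n in common if depth(n, dag) == deepest)


print(depth("dog", TOY_DAG))                           # 4
print(lowest_common_hypernyms("dog", "cat", TOY_DAG))  # ['carnivore']
```

Here `animal` and `entity` are also common hypernyms of `dog` and `cat`, but `carnivore` is deeper than both, so it is the only one returned.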

A similarity score between two synsets can also be computed from these paths.

In [None]:
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
plane = wn.synset('plane.n.01')

print("dog ~ cat:", dog.path_similarity(cat))
print("dog ~ plane:", dog.path_similarity(plane))

dog ~ cat: 0.2
dog ~ plane: 0.07142857142857142
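The scores above are consistent with a path-based formula of the form $1 / (d + 1)$, where $d$ is the number of edges on the shortest route between the two synsets through a common hypernym. The sketch below reproduces that computation on a hand-written toy DAG. It is an assumption-labelled illustration, not NLTK's implementation: `TOY_DAG`, `hypernym_distances`, and this standalone `path_similarity` are hypothetical, and the toy graph is far smaller than WordNet, so only the dog/cat fragment is meant to resemble the real hierarchy.

```python
from collections import deque

# Toy DAG (hand-written, NOT WordNet data): node -> direct hypernyms.
TOY_DAG = {
    "dog": ["canine", "domestic_animal"],
    "cat": ["feline"],
    "plane": ["aircraft"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "domestic_animal": ["animal"],
    "aircraft": ["artifact"],
    "animal": ["entity"],
    "artifact": ["entity"],
    "entity": [],
}


def hypernym_distances(node, dag):
    """Fewest edges from node to each of its direct and indirect hypernyms."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in dag[current]:
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances


def path_similarity(a, b, dag):
    """Assumed formula: 1 / (shortest distance via a common hypernym + 1)."""
    dist_a = hypernym_distances(a, dag)
    dist_b = hypernym_distances(b, dag)
    common = dist_a.keys() & dist_b.keys()
    if not common:
        return None
    shortest = min(dist_a[n] + dist_b[n] for n in common)
    return 1 / (shortest + 1)


print("dog ~ cat:", path_similarity("dog", "cat", TOY_DAG))  # dog ~ cat: 0.2
print("dog ~ plane:", path_similarity("dog", "plane", TOY_DAG))
```

In the toy graph, `dog` and `cat` are 4 edges apart (two up to `carnivore`, two down), which gives $1/5 = 0.2$. The dog/plane score is lower than the dog/cat score, but it does not match the WordNet value because the toy hierarchy is much shallower.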


# TASK
## Deadline: April 14, 23:59.

Homework submission form: https://forms.gle/ZRZR4Uh7xZsQvBvi6


1. Implement a function that, for a given word, prints the definitions of all the synsets the word belongs to.
2. Implement a function that checks whether two words $w_1$ and $w_2$ share at least one synset. In other words, check whether $w_1$ and $w_2$ are synonyms.
3. Implement a function that, for a given synset, prints all of its holonyms and all of its meronyms.
4. Implement a function that, for a given synset, prints all of its hypernym paths.
5. Implement a function that, for two synsets, determines the lowest common hypernym(s) and prints the definitions of these shared senses.
6. Implement a function that receives a synset $s$ and a list of synsets. Sort the list in descending order of similarity between $s$ and each of its elements.
7. Implement a function that, for a given word, prints its synonyms and its antonyms (across all senses of the word).