
# Wordnet

## Part 1

Wordnet is a project started by Princeton psychologist George Miller in the 1980s to hierarchically organize nouns, verbs, adjectives, and adverbs. The main relation between words is synonymy: words are grouped into sets of synonyms called synsets.

## Part 2

In [64]:
import nltk
from nltk.corpus import wordnet as wn

wn.synsets('house')

[Synset('house.n.01'),
 Synset('firm.n.01'),
 Synset('house.n.03'),
 Synset('house.n.04'),
 Synset('house.n.05'),
 Synset('house.n.06'),
 Synset('house.n.07'),
 Synset('sign_of_the_zodiac.n.01'),
 Synset('house.n.09'),
 Synset('family.n.01'),
 Synset('theater.n.01'),
 Synset('house.n.12'),
 Synset('house.v.01'),
 Synset('house.v.02')]

## Part 3

In [65]:
firm = wn.synset('firm.n.01')
print(firm.definition())
print(firm.examples())
print(firm.lemmas())
hyp = lambda s: s.hypernyms()
list(firm.closure(hyp))

the members of a business organization that owns or operates one or more establishments
['he worked for a brokerage house']
[Lemma('firm.n.01.firm'), Lemma('firm.n.01.house'), Lemma('firm.n.01.business_firm')]


[Synset('business.n.01'),
 Synset('enterprise.n.02'),
 Synset('organization.n.01'),
 Synset('social_group.n.01'),
 Synset('group.n.01'),
 Synset('abstraction.n.06'),
 Synset('entity.n.01')]

In Wordnet's hierarchy, hypernyms (more general terms) sit toward the top while hyponyms (more specific terms) sit toward the bottom. 'entity' is the topmost synset for all nouns in Wordnet.
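
The climb from a synset to the top of the noun hierarchy can be pictured with a plain dictionary. This is a minimal sketch, not NLTK's implementation: the `HYPERNYM` mapping is a hand-copied stand-in for the chain printed above, and `hypernym_chain` is a hypothetical helper that assumes each synset has at most one hypernym.

```python
# Toy stand-in for WordNet: each synset name maps to its single hypernym.
# The chain is copied from the closure output above; real synsets can have
# several hypernyms, which this sketch ignores.
HYPERNYM = {
    "firm.n.01": "business.n.01",
    "business.n.01": "enterprise.n.02",
    "enterprise.n.02": "organization.n.01",
    "organization.n.01": "social_group.n.01",
    "social_group.n.01": "group.n.01",
    "group.n.01": "abstraction.n.06",
    "abstraction.n.06": "entity.n.01",
}


def hypernym_chain(synset):
    """Return every ancestor of `synset`, nearest first (hypothetical helper)."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


chain = hypernym_chain("firm.n.01")
print(chain[0])   # the nearest, most specific ancestor
print(chain[-1])  # the last synset reached, which has no hypernym
```

Each step up the chain trades specificity for generality, and the walk stops once a synset has no hypernym left.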

## Part 4

In [66]:
print(firm.hypernyms())
print(firm.hyponyms())
print(firm.part_meronyms())
print(firm.part_holonyms())
print(firm.lemmas()[0].antonyms())
print(firm.lemmas()[1].antonyms())
print(firm.lemmas()[2].antonyms())

[Synset('business.n.01')]
[Synset('accounting_firm.n.01'), Synset('auction_house.n.01'), Synset('consulting_firm.n.01'), Synset('corporation.n.01'), Synset('dealer.n.02'), Synset('law_firm.n.01'), Synset('publisher.n.01')]
[]
[]
[]
[]
[]


## Part 5

In [67]:
wn.synsets('run')

[Synset('run.n.01'),
 Synset('test.n.05'),
 Synset('footrace.n.01'),
 Synset('streak.n.01'),
 Synset('run.n.05'),
 Synset('run.n.06'),
 Synset('run.n.07'),
 Synset('run.n.08'),
 Synset('run.n.09'),
 Synset('run.n.10'),
 Synset('rivulet.n.01'),
 Synset('political_campaign.n.01'),
 Synset('run.n.13'),
 Synset('discharge.n.06'),
 Synset('run.n.15'),
 Synset('run.n.16'),
 Synset('run.v.01'),
 Synset('scat.v.01'),
 Synset('run.v.03'),
 Synset('operate.v.01'),
 Synset('run.v.05'),
 Synset('run.v.06'),
 Synset('function.v.01'),
 Synset('range.v.01'),
 Synset('campaign.v.01'),
 Synset('play.v.18'),
 Synset('run.v.11'),
 Synset('tend.v.01'),
 Synset('run.v.13'),
 Synset('run.v.14'),
 Synset('run.v.15'),
 Synset('run.v.16'),
 Synset('prevail.v.03'),
 Synset('run.v.18'),
 Synset('run.v.19'),
 Synset('carry.v.15'),
 Synset('run.v.21'),
 Synset('guide.v.05'),
 Synset('run.v.23'),
 Synset('run.v.24'),
 Synset('run.v.25'),
 Synset('run.v.26'),
 Synset('run.v.27'),
 Synset('run.v.28'),
 Synset('run.v.

## Part 6

In [68]:
race = wn.synset('race.v.02')
print(race.definition())
print(race.examples())
print(race.lemmas())
hyp = lambda s: s.hypernyms()
list(race.closure(hyp))

compete in a race
['he is running the Marathon this year', "let's race and see who gets there first"]
[Lemma('race.v.02.race'), Lemma('race.v.02.run')]


[Synset('compete.v.01')]

Verbs do not have a single top synset, unlike nouns, which all have 'entity' at the top. Despite this, the same general hierarchy applies, with more general words at the top and more specific words at the bottom.
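
One way to see the difference is to follow hypernym links until none remain. The sketch below uses only two links taken from the outputs above and assumes one hypernym per synset; `root_of` is a hypothetical helper. NLTK offers a related method, `Synset.root_hypernyms()`, which is not used here.

```python
# Two hypernym links copied from the closure outputs above.
# Assumption: one hypernym per synset, which real WordNet does not guarantee.
HYPERNYM = {
    "abstraction.n.06": "entity.n.01",  # noun link
    "race.v.02": "compete.v.01",        # verb link
}


def root_of(synset):
    """Follow hypernym links until a synset with no hypernym is reached."""
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
    return synset


noun_root = root_of("abstraction.n.06")
verb_root = root_of("race.v.02")
print(noun_root)
print(verb_root)
```

The noun walk ends at the shared root, while the verb walk ends at whichever verb happens to have no hypernym.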

## Part 7

In [69]:
print(wn.morphy('race', wn.ADJ))
print(wn.morphy('race'))
print(wn.morphy('race', wn.VERB))
print(wn.morphy('race', wn.ADV))

None
race
race
None


## Part 8

In [99]:
from nltk.wsd import lesk
walk = wn.synset('walk.n.01')
strut = wn.synset('strut.n.01')
print(wn.wup_similarity(walk, strut))
sent = "I decided to walk while Leo decided to strut."
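# [*sent] unpacks the string into single characters, so lesk receives
# characters, not words, as its context
sentence = [*sent]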
print(lesk(sentence,'walk'))
print(lesk(sentence,'strut'))

0.9090909090909091
Synset('walk.v.10')
Synset('tittup.v.01')


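According to the Wu-Palmer similarity metric, walk and strut have a similarity of about 0.91 on a scale from 0 to 1. For the context sentence, the Lesk algorithm selected walk.v.10 for walk and tittup.v.01 for strut. Walk is the only word that fits its context, but tittup could be used in place of strut.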
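
The Wu-Palmer score depends only on how deep the two synsets sit in the hierarchy and how deep their lowest common subsumer (closest shared ancestor) is. A minimal sketch of the formula follows. `wu_palmer` is a hypothetical helper, and the depths are invented for illustration rather than read from WordNet.

```python
def wu_palmer(depth_lcs, depth_a, depth_b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth_lcs / (depth_a + depth_b)


# Hypothetical depths: two sibling senses at depth 11 that share a parent
# at depth 10. These numbers are assumptions, not values from WordNet.
score = wu_palmer(depth_lcs=10, depth_a=11, depth_b=11)
print(round(score, 4))
```

The deeper the shared ancestor is relative to the two senses, the closer the score gets to 1.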


## Part 9

SentiWordNet is a resource used to analyze whether a word has a positive, negative, or objective connotation. It assigns each synset a positive, a negative, and an objective score, and the three sum to 1. This can help a writer or speaker decide which words to use in specific situations.

In [80]:
import nltk
from nltk.corpus import sentiwordnet as swn

scrum = swn.senti_synsets('scrumptious', 'a')
for i in scrum:
  print(i)

<delectable.s.01: PosScore=0.75 NegScore=0.25>


The word scrumptious is a positive word. I'm surprised that it has a noticeable negative score, because it is rarely used with negative connotations.
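
The printed output shows only the positive and negative scores. Since the three SentiWordNet scores for a synset sum to 1, the objective score is whatever remains. A minimal sketch of that arithmetic, with `objective_score` as a hypothetical helper rather than part of NLTK:

```python
def objective_score(pos, neg):
    """Objectivity left over once the positive and negative scores are removed."""
    return 1.0 - (pos + neg)


# Scores taken from the delectable.s.01 line printed above.
obj = objective_score(0.75, 0.25)
print(obj)
```

For this sense nothing is left over, so it is scored as entirely subjective.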

## Part 10

Collocations are words that appear together to form a meaning that is not just the sum of their parts. The example I will use is 'General Government', which refers to the central government rather than a government that is merely general.

In [94]:
import nltk
from nltk.book import *
text4.collocations()


United States; fellow citizens; years ago; four years; Federal
Government; General Government; American people; Vice President; God
bless; Chief Justice; one another; fellow Americans; Old World;
Almighty God; Fellow citizens; Chief Magistrate; every citizen; Indian
tribes; public debt; foreign nations


In [96]:
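import math
text = ' '.join(text4.tokens)
# Note: counts are normalized by vocabulary size (unique tokens), not by the
# total number of tokens, and str.count also matches substrings.
vocab = len(set(text4))
gg = text.count('General Government')/vocab
gen = text.count('General')/vocab
gov = text.count('Government')/vocab
# PMI = log2( P(x, y) / (P(x) * P(y)) )
pmi = math.log2(gg/(gen*gov))
print(pmi)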

4.890435179947461


The term 'General Government' has high mutual information: a PMI of about 4.89 means the two words appear together roughly 2^4.89 ≈ 30 times more often than would be expected if they occurred independently. This suggests that 'General Government' is a collocation.
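
PMI can also be estimated from token-level counts instead of string matching. The sketch below differs from the cell above: it counts whole tokens, normalizes by the number of tokens, and runs on an invented token list, not on `text4`. The `pmi` function is a hypothetical helper and assumes the bigram occurs at least once.

```python
import math
from collections import Counter


def pmi(tokens, first, second):
    """Pointwise mutual information of the bigram (first, second).

    Unigram probabilities are estimated over the number of tokens and the
    bigram probability over the number of adjacent pairs. Assumes the
    bigram occurs at least once; otherwise log2(0) raises ValueError.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_first = unigrams[first] / len(tokens)
    p_second = unigrams[second] / len(tokens)
    p_pair = bigrams[(first, second)] / (len(tokens) - 1)
    return math.log2(p_pair / (p_first * p_second))


# Made-up token list for illustration only; it is not drawn from text4.
tokens = "the General Government and the people of the General Government".split()
score = pmi(tokens, "General", "Government")
print(round(score, 3))
```

A positive score means the pair occurs more often than independence would predict, and a score near zero means the words co-occur about as often as chance.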