# Lab 1: Introduction to Jupyter Notebook and Exploration of WordNet

Objectives for today:
- Understand how Jupyter Notebook and Google Colab work
- Explore WordNet with NLTK

## What is Jupyter Notebook?
Jupyter Notebook is an open-source web application that allows users to create and share documents that contain:

* Live code (in Python, R, Julia, etc.).
* Text (written in Markdown; see this [Markdown tutorial](https://www.ibm.com/docs/en/SSYKAV?topic=train-how-do-use-markdown)).
* Visualizations (like plots, charts).
* Mathematical equations (using LaTeX).

**Why Do We Use Jupyter Notebook?**

1. Interactive Coding: Run code in small chunks and see the results immediately, which is ideal for learning, testing, and prototyping.
2. Data Analysis and Visualization: Easily combine code, visualizations, and explanatory text in a single document.
3. Documentation and Sharing: Share research, teaching materials, or analyses with others using an easy-to-read format.
4. Reproducibility: It supports step-by-step execution of code, making workflows transparent and repeatable.

Further reading: https://www.geeksforgeeks.org/how-to-use-jupyter-notebook-an-ultimate-guide/

## Why do we use Google Colab?

Google Colab is a cloud-based Jupyter environment provided by Google.

Advantages:
- No setup required; runs in a web browser.
- Files are stored in Google Drive or can be downloaded.
- Real-time collaboration, like Google Docs.
- Leverages Google's computational resources (e.g., GPUs and TPUs, for free or with upgraded plans).
- User-friendly, although some advanced features are limited.

Other tools for running Jupyter notebooks locally (make sure your machine has enough computing resources):
- VSCode
- JupyterLab

## Introduction to WordNet with NLTK

### 1. Setting Up

Objective: Install NLTK and download WordNet data.

Topics Covered:

*   Installing the nltk library.
*   Downloading the WordNet dataset.


In [1]:
# Import nltk and download WordNet
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')  # Optional: For multilingual WordNet


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

### 2. Understanding WordNet Synsets

Objective: Understand the concept of Synsets (Synonym Sets) and how to access them.

Topics Covered:
* What are Synsets?
*   Fetching Synsets for a word.
*   Printing Synset names, parts of speech, and definitions.

In [2]:
from nltk.corpus import wordnet as wn

# Get synsets for a word
word = "bank"
synsets = wn.synsets(word)  # list of Synset objects for "bank"; each one exposes name(), pos(), definition(), ...
print(f"Synsets for '{word}':")
for synset in synsets:
    print(f"- {synset.name()}: {synset.definition()} (POS: {synset.pos()})")


Synsets for 'bank':
- bank.n.01: sloping land (especially the slope beside a body of water) (POS: n)
- depository_financial_institution.n.01: a financial institution that accepts deposits and channels the money into lending activities (POS: n)
- bank.n.03: a long ridge or pile (POS: n)
- bank.n.04: an arrangement of similar objects in a row or in tiers (POS: n)
- bank.n.05: a supply or stock held in reserve for future use (especially in emergencies) (POS: n)
- bank.n.06: the funds held by a gambling house or the dealer in some gambling games (POS: n)
- bank.n.07: a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force (POS: n)
- savings_bank.n.02: a container (usually with a slot in the top) for keeping money at home (POS: n)
- bank.n.09: a building in which the business of banking transacted (POS: n)
- bank.n.10: a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning) (POS:

### 3. Exploring Synset Attributes

Objective: Learn about Synset properties.

Topics Covered:

* Synset lemmas.
* Examples (usage) of the synset in a sentence.

In [3]:
synset1 = wn.synset("plant.n.01")
print(synset1.name())
print(synset1.definition())
print(synset1.examples())
print([lemma.name() for lemma in synset1.lemmas()])  # lemmas are the word forms that belong to this synonym set

plant.n.01
buildings for carrying on industrial labor
['they built a large plant to manufacture automobiles']
['plant', 'works', 'industrial_plant']


### 4. Relationships Between Synsets

Objective: Explore semantic relationships like hypernyms, hyponyms, meronyms, etc.

Topics Covered:
* Hypernyms (more general terms).
* Hyponyms (more specific terms).
* Holonyms (the whole that something is a part of).
* Meronyms (the parts that make up a whole).

In [4]:
synset = wn.synset("money.n.01")  # wn.synset() returns a single synset, identified by its name
print("Hypernyms which are terms more general")
print(synset.hypernyms())  # each hypernym is itself a Synset, with its own name, POS and lemmas
print("Hyponyms which are terms more specific")
first_hyponym = synset.hyponyms()[0]  # only the first hyponym is shown here

print([lemma.name() for lemma in first_hyponym.lemmas()])

Hypernyms which are terms more general
[Synset('medium_of_exchange.n.01')]
Hyponyms which are terms more specific
['token_money']


### 5. Lexical Relations for Words

Objective: Explore synonyms, antonyms, and related forms of words.

Topics Covered:

* Fetching synonyms and antonyms using lemmas.

In [5]:
# Get synonyms and antonyms
word = "cat"
synsets = wn.synsets(word)

for synset in synsets:  # loop over each synset
    for lemma in synset.lemmas():  # loop over each lemma of the synset
        print(f"{word} lemma is {lemma.name()}")
        if lemma.antonyms():  # antonyms are defined on lemmas, not on synsets
            print(f"the antonyme of {lemma.name()} is {lemma.antonyms()[0].name()}")

cat lemma is cat
cat lemma is true_cat
cat lemma is guy
cat lemma is cat
cat lemma is hombre
cat lemma is bozo
cat lemma is cat
cat lemma is kat
cat lemma is khat
cat lemma is qat
cat lemma is quat
cat lemma is cat
cat lemma is Arabian_tea
cat lemma is African_tea
cat lemma is cat-o'-nine-tails
cat lemma is cat
cat lemma is Caterpillar
cat lemma is cat
cat lemma is big_cat
cat lemma is cat
cat lemma is computerized_tomography
cat lemma is computed_tomography
cat lemma is CT
cat lemma is computerized_axial_tomography
cat lemma is computed_axial_tomography
cat lemma is CAT
cat lemma is cat
cat lemma is vomit
the antonyme of vomit is keep_down
cat lemma is vomit_up
cat lemma is purge
cat lemma is cast
cat lemma is sick
cat lemma is cat
cat lemma is be_sick
cat lemma is disgorge
cat lemma is regorge
cat lemma is retch
cat lemma is puke
cat lemma is barf
cat lemma is spew
cat lemma is spue
cat lemma is chuck
cat lemma is upchuck
cat lemma is honk
cat lemma is regurgitate
cat lemma is throw_
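
The loop above prints the lemma "cat" once per synset and keeps only the first antonym of each lemma. A minimal sketch that gathers unique names into sets instead is shown here. The helper name `collect_synonyms_antonyms` is hypothetical (it is not part of NLTK); it relies only on the `lemmas()`, `name()` and `antonyms()` calls already used in this section.

```python
def collect_synonyms_antonyms(synsets):
    """Return (synonyms, antonyms) as two sets of lemma names.

    `synsets` is any iterable of objects exposing lemmas(), for example
    the list returned by wn.synsets(word) in the cell above.
    """
    synonyms, antonyms = set(), set()
    for synset in synsets:
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            # antonyms() may return an empty list; keep every antonym, not just the first
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return synonyms, antonyms
```

With the notebook's `wn` import in place, `collect_synonyms_antonyms(wn.synsets("cat"))` would return each lemma name once, which makes the output easier to read than the repeated lines above.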

### 6. Path Similarity Between Synsets

Objective: Learn to compute semantic similarity between synsets.

Topics Covered:
* Path similarity.
* Other similarity metrics.

In [6]:
# Semantic similarity: path_similarity
synset1 = wn.synset("dog.n.01")
synset2 = wn.synset("cat.n.02")  # note: cat.n.02 is the informal "guy" sense (guy.n.01), not the animal (cat.n.01)

similarity = synset1.path_similarity(synset2)
print(f"Path similarity between '{synset1.name()}' and '{synset2.name()}': {similarity}")


Path similarity between 'dog.n.01' and 'guy.n.01': 0.125
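
Path similarity is computed as `1 / (shortest_path_distance + 1)`, so the score of 0.125 above corresponds to a shortest path of 7 edges between the two synsets. The sketch below illustrates the idea on a toy taxonomy; `TOY_HYPERNYMS` and both function names are assumptions made for illustration, not WordNet data or NLTK code.

```python
from collections import deque

# Hypothetical mini taxonomy: each word maps to its direct hypernyms
TOY_HYPERNYMS = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": [],
}

def hypernym_distances(node, hypernyms):
    """Map `node` and each of its ancestors to its distance from `node`, in edges."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances

def path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Return 1 / (d + 1), where d is the shortest path through a common ancestor."""
    dist_a = hypernym_distances(a, hypernyms)
    dist_b = hypernym_distances(b, hypernyms)
    common = dist_a.keys() & dist_b.keys()
    if not common:
        return None  # no shared ancestor, so no path
    shortest = min(dist_a[n] + dist_b[n] for n in common)
    return 1.0 / (shortest + 1)

print(path_similarity("dog", "cat"))  # 0.2 (4 edges via "carnivore")
print(path_similarity("dog", "dog"))  # 1.0 (identical nodes)
```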


### 7. Advanced Search Using Closures

Objective: Learn how to explore multiple levels of semantic relationships.

Topics Covered:
* Using closures to recursively explore relationships like hypernyms.

In [7]:
synset = wn.synset("panda.n.01")
hypernyms = lambda s: s.hypernyms()

# closure() applies the given relation repeatedly, so it returns every hypernym
# above the synset (not only the direct ones); each element is again a Synset,
# the basic unit of WordNet
closure_result = list(synset.closure(hypernyms))
for hypernym in closure_result:
    print(hypernym.name())

procyonid.n.01
carnivore.n.01
placental.n.01
mammal.n.01
vertebrate.n.01
chordate.n.01
animal.n.01
organism.n.01
living_thing.n.01
whole.n.02
object.n.01
physical_entity.n.01
entity.n.01
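
To make the recursion behind `closure` more concrete, here is a minimal sketch of a breadth-first transitive closure over a toy hypernym table. It is not NLTK's implementation; the `closure` function and `TOY_HYPERNYMS` below are assumptions for illustration, and the toy chain is much shorter than the real one printed above.

```python
from collections import deque

def closure(start, rel):
    """Yield every node reachable from `start` by applying `rel` repeatedly.

    Traversal is breadth-first and each node is yielded only once.
    """
    seen = set()
    queue = deque(rel(start))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        yield node
        queue.extend(rel(node))

# Hypothetical, shortened hypernym chain
TOY_HYPERNYMS = {
    "panda": ["procyonid"],
    "procyonid": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}

print(list(closure("panda", lambda s: TOY_HYPERNYMS.get(s, []))))
# ['procyonid', 'carnivore', 'mammal', 'animal']
```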


### 8. Multilingual WordNet

Objective: Explore translations and multilingual features.

Topics Covered:
* Using OMW (Open Multilingual WordNet) to look up a synset's lemmas in other languages.

In [8]:
# Multilingual support
from nltk.corpus import wordnet as wn

synset = wn.synset("love.n.01")
print(f"Translations for '{synset.name()}':")
print(synset.lemmas("spa"))  # Get Spanish lemmas


Translations for 'love.n.01':
[Lemma('love.n.01.amor')]


## Now it's your turn!

### Tasks


#### **1. Exploring Synsets**
**Word**: Use the word **"plant"** for this task (or replace it with a word you like).  
- **a)** Find and list all the synsets for the word.  
- **b)** For each synset, provide:
  - Its definition.
  - Its lemmas (alternative words).  
- **c)** Which synset do you think matches the most common meaning of your chosen word? Explain why.  

- Discuss whether the synsets cover all possible meanings. Are any definitions too vague, too specific, or missing entirely?
---

In [9]:
from nltk.corpus import wordnet as wn

In [10]:
word = "magic"
synsets = wn.synsets(word)  # already a list, so no conversion is needed
for synset in synsets:
    print(f"synset <{synset.name()}> has meaning: <{synset.definition()}> ")
    for lemma in synset.lemmas():
        print(f"with lemmas: <{lemma.name()}>")

synset <magic.n.01> has meaning: <any art that invokes supernatural powers> 
with lemmas: <magic>
with lemmas: <thaumaturgy>
synset <magic_trick.n.01> has meaning: <an illusory feat; considered magical by naive observers> 
with lemmas: <magic_trick>
with lemmas: <conjuring_trick>
with lemmas: <trick>
with lemmas: <magic>
with lemmas: <legerdemain>
with lemmas: <conjuration>
with lemmas: <thaumaturgy>
with lemmas: <illusion>
with lemmas: <deception>
synset <charming.s.02> has meaning: <possessing or using or characteristic of or appropriate to supernatural powers; ; ; ; - Shakespeare> 
with lemmas: <charming>
with lemmas: <magic>
with lemmas: <magical>
with lemmas: <sorcerous>
with lemmas: <witching>
with lemmas: <wizard>
with lemmas: <wizardly>


##### **Answer**
**Word**: magic
- **c)** There are 3 synsets for my word "magic". I think the most common meaning is `possessing or using or characteristic of or appropriate to supernatural powers`, because the word magic appears frequently in tales as a nonexistent power that should not appear in real life, a power that can accomplish the impossible. The other two definitions are too specific: `any art that invokes supernatural powers` and `an illusory feat; considered magical by naive observers`.

#### **2. Exploring Semantic Relationships**
- **a)** Choose the synset **"plant.n.01"** (or another synset of your choice).  
- **b)** Find its hypernym path (more general terms).  
- **c)** Find its hyponyms (more specific terms).  
- **d)** Find its holonyms (the wholes it belongs to) and meronyms (its parts).  

**Question**: Do you agree with these relationships? Provide examples of any relationship that seems incorrect or incomplete.

---


In [11]:
synset01 = synsets[0]  # first synset of "magic" from the previous cell
print(synset01.definition())
# hypernym_paths() returns every path from the root hypernym down to this synset,
# i.e. the full hierarchy above the word (comment written with help from DeepSeek's explanation).
# hypernyms() returns only the immediate hypernyms, e.g. canine for dog.
hyper = synset01.hypernym_paths()
print(f"hypernym path: {hyper}")
hypo = synset01.hyponyms()
print(f"hyponyms: {hypo}")
# A holonym is the whole that contains parts (a tree contains branches, roots, and so on);
# those parts are its meronyms.
holonyms = synset01.part_holonyms()  # member_holonyms() and substance_holonyms() also exist
print(f"holonyms: {holonyms}")
meronyms = synset01.part_meronyms()
print(f"meronyms: {meronyms}")
# The result is odd: both lists are empty for "magic", and the same happens for "branch".

any art that invokes supernatural powers
hypernym path: [[Synset('entity.n.01'), Synset('abstraction.n.06'), Synset('psychological_feature.n.01'), Synset('cognition.n.01'), Synset('content.n.05'), Synset('belief.n.01'), Synset('supernaturalism.n.01'), Synset('magic.n.01')]]
hyponyms: [Synset('white_magic.n.01'), Synset('juju.n.01'), Synset('mojo.n.01'), Synset('conjuring.n.01'), Synset('sorcery.n.01')]
holonyms: []
meronyms: []
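
The cell above checks only `part_holonyms()` and `part_meronyms()`. NLTK's `Synset` also provides `member_` and `substance_` variants of both, so an empty result for one variant does not mean that no part-whole relation exists. The sketch below calls all six methods on a synset; the helper name `part_whole_relations` and the constant `PART_WHOLE_METHODS` are hypothetical.

```python
# The six part-whole methods available on an NLTK Synset
PART_WHOLE_METHODS = (
    "part_holonyms", "member_holonyms", "substance_holonyms",
    "part_meronyms", "member_meronyms", "substance_meronyms",
)

def part_whole_relations(synset):
    """Return {method name: related synsets}, keeping only non-empty relations."""
    found = {}
    for method_name in PART_WHOLE_METHODS:
        related = getattr(synset, method_name)()
        if related:
            found[method_name] = related
    return found
```

For instance, the result of `part_whole_relations(wn.synset("tree.n.01"))` can be compared with the result for `magic.n.01` to see whether the empty lists come from the word itself or from the choice of method.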


##### **Answer**
- `Synset('entity.n.01')` is the root synset, the one with the most abstract definition.
- The word magic, in the sense `any art that invokes supernatural powers`, could plausibly be a meronym of a holonym such as "unreal power", and it could itself have meronyms such as alchemy. WordNet lists neither.

#### **3. Semantic Ambiguity**
- **a)** Compare the synsets of **"plant"** with those of the word **"factory"**.  
- **b)** Are there overlapping meanings between the two?  
- **c)** Do you agree that "factory" could be a synonym for "plant"? Why or why not?  

Discuss whether WordNet handles nuanced differences in meaning adequately.

---

In [12]:
word01 = "plant"
word02 = "factory"
synsets_plant = wn.synsets(word01)
synsets_factory = wn.synsets(word02)
# Collect the definition of every synset of each word
plant_definitions = [synset.definition() for synset in synsets_plant]
factory_definitions = [synset.definition() for synset in synsets_factory]
print(plant_definitions,
      factory_definitions)

['buildings for carrying on industrial labor', '(botany) a living organism lacking the power of locomotion', 'an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience', 'something planted secretly for discovery by another', 'put or set (seeds, seedlings, or plants) into the ground', 'fix or set securely or deeply', 'set up or lay the groundwork for', 'place into a river', 'place something or someone in a certain position in order to secretly observe or deceive', 'put firmly in the mind'] ['a plant consisting of one or more buildings with facilities for manufacturing']


- Yes, the meanings overlap: the first sense of "plant" and the only sense of "factory" both refer to industrial buildings, so "factory" can be a synonym for "plant" in that sense. However, WordNet does not take context into account: similarity is based on definitions alone, not on how the words are used in a sentence. Take the sentence `the plant that my father planted last year in the electronics plant is growing very well`. Here "plant" has three different meanings: the first is a living thing that grows in the ground, the second is a verb describing the action of putting a plant into the ground, and the last is a building, a factory. WordNet lists these senses but cannot tell which one applies in a given context.
- WordNet cannot capture the relation between words that are frequently associated in a phrase, such as coffee and cup: their definitions are very different, yet they often appear together.


#### **4. Similarity Measures**
- **a)** Use the **Wu-Palmer similarity** to calculate the similarity between (or with any other synsets but with different levels of similarities):  
  - **"plant.n.01"** and **"factory.n.01"**  
  - **"plant.n.01"** and **"tree.n.01"**  
  - **"plant.n.01"** and **"dog.n.01"**  
- **b)** Rank these pairs from most to least similar.  

**Question**: Does the Wu-Palmer similarity reflect your intuition? If not, why do you think it fails?

---

In [25]:
synset_plant = wn.synset("plant.n.01")
synset_factory = wn.synset("factory.n.01")
synset_tree = wn.synset("tree.n.01")
synset_dog = wn.synset("dog.n.01")

wup1 = synset_plant.wup_similarity(synset_factory)
print(f"similarity between plant.n.01 and factory.n.01 is {wup1}")
wup2 = synset_plant.wup_similarity(synset_tree)
wup3 = synset_plant.wup_similarity(synset_dog)

# Rank the three scores from most to least similar
result = sorted([wup1, wup2, wup3], reverse=True)
print(result)


similarity between plant.n.01 and factory.n.01 is 0.9411764705882353
[0.9411764705882353, 0.47058823529411764, 0.4444444444444444]


In [32]:
synsets = {
    "plant": wn.synset("plant.n.01"),
    "factory": wn.synset("factory.n.01"),
    "tree": wn.synset("tree.n.01"),
    "dog": wn.synset("dog.n.01")
}

# Same scores as in the previous cell, in the order factory, tree, dog
results = [
    synsets["plant"].wup_similarity(synsets["factory"]),
    synsets["plant"].wup_similarity(synsets["tree"]),
    synsets["plant"].wup_similarity(synsets["dog"]),
]
print(results)

results.sort(reverse=True)  # most similar first
# Common hypernyms help explain the scores: the deeper the shared ancestors, the higher the similarity
print(sorted(synsets["plant"].common_hypernyms(synsets["factory"])))
print(sorted(synsets["plant"].common_hypernyms(synsets["tree"])))
sorted(synsets["plant"].common_hypernyms(synsets["dog"]))

[0.9411764705882353, 0.4444444444444444, 0.47058823529411764]
[Synset('artifact.n.01'), Synset('building_complex.n.01'), Synset('entity.n.01'), Synset('object.n.01'), Synset('physical_entity.n.01'), Synset('plant.n.01'), Synset('structure.n.01'), Synset('whole.n.02')]
[Synset('entity.n.01'), Synset('object.n.01'), Synset('physical_entity.n.01'), Synset('whole.n.02')]


[Synset('entity.n.01'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('whole.n.02')]

In [38]:
# Number of edges on the shortest path between plant.n.01 and each other synset
print(synsets["plant"].shortest_path_distance(synsets["factory"]))
print(synsets["plant"].shortest_path_distance(synsets["tree"]))
print(synsets["plant"].shortest_path_distance(synsets["dog"]))

1
10
9


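- Ranking from most to least similar: plant/factory (0.94), plant/dog (0.47), plant/tree (0.44). `plant.n.01` shares many common hypernyms with `factory.n.01` and far fewer with the other two synsets, and the shortest path between plant and factory is a single step, so their meanings are treated as much closer than the others.
- Wu-Palmer similarity depends on the depth of the deepest common hypernym relative to the depths of the two synsets: the deeper the shared ancestor and the fewer steps needed to reach it, the higher the score. The low scores for tree and dog match intuition, because `plant.n.01` is the industrial sense, not the botanical one.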
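
The Wu-Palmer score is defined as `2 * depth(lcs) / (depth(s1) + depth(s2))`, where `lcs` is the deepest common hypernym of the two synsets. The sketch below shows how the three scores above can arise from that formula. The LCS depths (8 and 4) follow from the common-hypernym lists printed earlier, counting the root `entity.n.01` as depth 1; the depths of the individual synsets (8 for plant, 9 for factory, 10 for tree, 9 for dog) are assumptions chosen because they reproduce the printed scores, not values read from WordNet in this notebook.

```python
def wu_palmer(depth_lcs, depth_a, depth_b):
    """Wu-Palmer similarity computed from node depths (root counted as depth 1)."""
    return 2 * depth_lcs / (depth_a + depth_b)

# (depth of LCS, depth of plant.n.01, depth of the other synset): assumed values
pairs = {
    "plant-factory": wu_palmer(8, 8, 9),   # LCS assumed to be plant.n.01 itself
    "plant-tree": wu_palmer(4, 8, 10),     # LCS assumed to be whole.n.02
    "plant-dog": wu_palmer(4, 8, 9),       # LCS assumed to be whole.n.02
}

# Print the pairs from most to least similar
for name, score in sorted(pairs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
# plant-factory: 0.9412
# plant-dog: 0.4706
# plant-tree: 0.4444
```

Under these assumptions, dog scores slightly higher than tree even though neither is close to an industrial plant: both share the same deepest common hypernym, but tree sits deeper in the hierarchy, which enlarges the denominator.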