# Lab2: Assignment - the lexicon of emotionally loaded words

Emotions are difficult to grasp and our language to express emotions is subtle and nuanced. What are the words that we can use to express emotions?

In this assignment, you will find emotion-expressing words using WordNet (through NLTK) and word embeddings.

We start from the six basic emotions that Ekman and his team claimed to hold universally across the facial expressions of people from many different cultures:

<img src="images/ekman-faces.png" width="500" height="600">

The emotions are: anger, disgust, fear, joy, sadness and surprise. Many emotion detection modules in NLP use these six emotions.

Reference: Ekman, Paul, Wallace V. Friesen, Maureen O'Sullivan, Anthony Chan, Irene Diacoyanni-Tarlatzis, Karl Heider, Rainer Krause et al. "Universals and cultural differences in the judgments of facial expressions of emotion." Journal of Personality and Social Psychology 53, no. 4 (1987): 712.


## 1. Finding emotional words in WordNet

When you ask ChatGPT how to extract emotion words from WordNet-NLTK, it will give you more or less the following answer and code:

### Text generated by ChatGPT
To get **all words related to basic emotion terms** (like *sadness, anger, disgust, surprise, fear, joy*) using **WordNet (NLTK)**, you can:

1. Loop over each basic emotion term.
2. Query **adjective, noun, and verb synsets** (since emotions can span multiple parts of speech).
3. Collect **related words**:

   * Synonyms (`lemmas`)
   * Antonyms (`lemma.antonyms()`)
   * Similar adjectives (`similar_tos()` for adjectives)
   * Derivationally related forms
   * Hypernyms, hyponyms, meronyms, etc. (for nouns and verbs)

---


In [1]:
### Code generated by ChatGPT
### ✅ General Code to Expand Emotion Terms

from nltk.corpus import wordnet as wn

def expand_emotion_word(word):
    related_words = set()
    
    # Check all parts of speech
    for pos in [wn.NOUN, wn.ADJ, wn.VERB]:
        synsets = wn.synsets(word, pos=pos)
        for syn in synsets:
            # Synonyms
            for lemma in syn.lemmas():
                related_words.add(lemma.name())

                # Antonyms
                for antonym in lemma.antonyms():
                    related_words.add(antonym.name())

                # Derivationally related forms
                for deriv in lemma.derivationally_related_forms():
                    related_words.add(deriv.name())

            # Adjective similarity
            if pos == wn.ADJ:
                for similar in syn.similar_tos():
                    for lemma in similar.lemmas():
                        related_words.add(lemma.name())

            # Hypernyms, hyponyms, meronyms, holonyms
            for related_syn in syn.hypernyms() + syn.hyponyms() + syn.part_meronyms() + syn.member_holonyms():
                for lemma in related_syn.lemmas():
                    related_words.add(lemma.name())

    return sorted(related_words)

In [2]:
### Code generated by ChatGPT

basic_emotions = ['sadness', 'anger', 'disgust', 'surprise', 'fear', 'joy']

wordnet_emotion_expansions = {}

for emotion in basic_emotions:
    words = expand_emotion_word(emotion)
    wordnet_emotion_expansions[emotion] = words

for emotion, words in wordnet_emotion_expansions.items():
    print(f"{emotion.upper()}:{len(words)}")
    print(words)
    print()

SADNESS:37
['bereavement', 'cheerlessness', 'dejectedness', 'depression', 'desolation', 'dispiritedness', 'dolefulness', 'downheartedness', 'feeling', 'forlornness', 'gloominess', 'gloomy', 'happiness', 'heaviness', 'loneliness', 'low-spiritedness', 'lowness', 'lugubrious', 'lugubriousness', 'melancholy', 'misery', 'mourning', 'poignance', 'poignancy', 'regret', 'rue', 'ruefulness', 'sad', 'sadness', 'sorrow', 'sorrowful', 'sorrowfulness', 'tearfulness', 'uncheerfulness', 'unhappiness', 'unhappy', 'weepiness']

ANGER:57
['aggravate', 'anger', 'angriness', 'angry', 'annoyance', 'arouse', 'bad_temper', 'bridle', 'chafe', 'choler', 'choleric', 'combust', 'dander', 'deadly_sin', 'elicit', 'emotion', 'emotional_arousal', 'enkindle', 'enrage', 'enragement', 'evoke', 'exacerbate', 'exasperate', 'experience', 'feel', 'fire', 'fury', 'gall', 'hackles', 'huffiness', 'ill_temper', 'incense', 'indignation', 'infuriate', 'infuriation', 'ira', 'ire', 'irk', 'kindle', 'madden', 'madness', 'miff', 'mo

## ASSIGNMENT:

The basic emotion terms are all nouns, except for ```surprise``` which can also be a verb. This may have an impact on the expansion because the ```nets``` for each Part-of-Speech are poorly connected.
Answer the following questions:

1. Create another list called ```adjectival_basic_emotions``` that contains six adjectives that correspond with the six nouns.
2. Extend the dictionary ```wordnet_emotion_expansions``` with six adjectival expansions.
3. Which words in these pairs are inconsistent with the emotion or do not imply this emotion?
4. Which of the WordNet relations cause these inconsistencies in your opinion? How could you make the function more consistent?
5. For each pair of noun-adjective, how much overlap is there and is the overlap more consistent?

TIPS:

- List the nouns and adjectives in the same order and pair the lists using the ```zip``` function, e.g. ```for noun, adjective in zip(basic_emotions, adjectival_basic_emotions):```
- To get the overlap of two lists `a` and `b`, you can use the following expression: ```overlap = list(set(a)&set(b))```, which first turns the lists into sets and then applies the ```&``` (intersection) operator.
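
The two tips combine as follows. This is a minimal sketch with made-up toy word lists; the names `nouns`, `adjectives` and `toy_expansions` are illustrative and are not part of the assignment data:

```python
# Toy data only: these lists are invented for illustration,
# they are not real WordNet expansions.
nouns = ['sadness', 'joy']
adjectives = ['sad', 'happy']
toy_expansions = {
    'sadness': ['sad', 'sorrow', 'unhappiness'],
    'sad': ['sad', 'sorrowful', 'sorrow'],
    'joy': ['delight', 'joy', 'joyful'],
    'happy': ['glad', 'happy'],
}

overlaps = {}
# zip pairs the first noun with the first adjective, and so on.
for noun, adjective in zip(nouns, adjectives):
    a = toy_expansions[noun]
    b = toy_expansions[adjective]
    # Sets drop duplicates; & keeps only the words present in both.
    # sorted() gives a stable order, since sets are unordered.
    overlaps[(noun, adjective)] = sorted(set(a) & set(b))
    print(noun, adjective, overlaps[(noun, adjective)])
```

Sorting the overlap is optional, but it makes the printed result reproducible across runs.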

## [YOUR CODE AND DISCUSSION GO HERE]

## KEY:

In [4]:
#### Expanded to cover ADJECTIVES emotions
basic_emotions = ['sadness', 'anger', 'disgust', 'surprise', 'fear', 'joy']
adjectival_basic_emotions = ['sad', 'angry', 'disgusting', 'surprising', 'scary', 'happy']

for emotion in adjectival_basic_emotions:
    words = expand_emotion_word(emotion)
    wordnet_emotion_expansions[emotion] = words

for noun, adjective in zip(basic_emotions, adjectival_basic_emotions):
    nouns = wordnet_emotion_expansions[noun]
    adjectives = wordnet_emotion_expansions[adjective]
    print(f"{noun.upper()}:{len(nouns)}:{nouns}")
    print(f"{adjective.upper()}:{len(adjectives)}: {adjectives}")
    overlap =list(set(nouns) & set(adjectives))
    print(f"Overlap: {len(overlap)}: {overlap}")
    print()


SADNESS:37:['bereavement', 'cheerlessness', 'dejectedness', 'depression', 'desolation', 'dispiritedness', 'dolefulness', 'downheartedness', 'feeling', 'forlornness', 'gloominess', 'gloomy', 'happiness', 'heaviness', 'loneliness', 'low-spiritedness', 'lowness', 'lugubrious', 'lugubriousness', 'melancholy', 'misery', 'mourning', 'poignance', 'poignancy', 'regret', 'rue', 'ruefulness', 'sad', 'sadness', 'sorrow', 'sorrowful', 'sorrowfulness', 'tearfulness', 'uncheerfulness', 'unhappiness', 'unhappy', 'weepiness']
SAD:24: ['bad', 'bittersweet', 'deplorable', 'distressing', 'doleful', 'glad', 'heavyhearted', 'lament', 'lamentable', 'melancholic', 'melancholy', 'mournful', 'pensive', 'pitiful', 'sad', 'sadness', 'sorriness', 'sorrowful', 'sorry', 'tragic', 'tragical', 'tragicomic', 'tragicomical', 'wistful']
Overlap: 4: ['melancholy', 'sorrowful', 'sadness', 'sad']

ANGER:57:['aggravate', 'anger', 'angriness', 'angry', 'annoyance', 'arouse', 'bad_temper', 'bridle', 'chafe', 'choler', 'choler

## KEY:

Inconsistencies result from the antonym, hypernym and derivation relations. In addition, the function considers all the senses of a word; restricting it to the first sense may make the expansion more consistent.
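
One way to act on these observations is sketched below. It is an assumption about what a more consistent function could look like, not the reference solution: the hypothetical function `expand_first_sense` keeps only the first sense per part of speech and follows only synonyms, similar adjectives and hyponyms, skipping antonyms, hypernyms and derivations. The WordNet reader is passed in as an argument.

```python
def expand_first_sense(word, wordnet, pos_tags=('n', 'a', 'v')):
    """Expand `word` using only its first sense per part of speech.

    `wordnet` is expected to behave like `nltk.corpus.wordnet`;
    'n', 'a' and 'v' are the values of wn.NOUN, wn.ADJ and wn.VERB.
    """
    related_words = set()
    for pos in pos_tags:
        # synsets() lists the senses in WordNet's order; keep only the first.
        for syn in wordnet.synsets(word, pos=pos)[:1]:
            # Synonyms: the lemmas of the synset itself.
            for lemma in syn.lemmas():
                related_words.add(lemma.name())
            # Near-synonymous adjectives and more specific terms.
            # Antonyms, hypernyms and derivations are deliberately skipped.
            for related_syn in syn.similar_tos() + syn.hyponyms():
                for lemma in related_syn.lemmas():
                    related_words.add(lemma.name())
    return sorted(related_words)
```

With NLTK installed and the WordNet data downloaded, it would be called as `expand_first_sense('sadness', wn)` after `from nltk.corpus import wordnet as wn`. Whether the first sense is the emotional one still needs to be checked per word.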

## 2. Getting emotion words through word embeddings

## ASSIGNMENT:

Word embeddings can also be used to expand words to related words. Create two more dictionaries for emotion words derived from ```Wiki2Vec``` and ```Leipzig2Vec```.

1. Create a ```wiki2vec_emotion_dict``` dictionary by expanding the six nouns and six adjectives to the top 50 most similar using the Wiki2Vec embeddings. 
2. Create a ```leipzig2vec_emotion_dict``` dictionary by expanding the six nouns and six adjectives to the top 50 most similar using the Leipzig2Vec embeddings.
3. For each noun-adjective pair, print the 50 most similar words for each Word2Vec model and the overlap.
4. How consistent are these most similar words?
5. How different are the lists across the models and compared to the WordNet expansion?
6. For each pair of noun-adjective, how much overlap is there and is the overlap more consistent?

### 2.1 Wiki2Vec expansion:

Download the Wiki2Vec embeddings for your target language and load the model using the Gensim package. If you cannot load the complete model, load part of it.

## TIP: 
You need to check whether the emotion word is actually included in the vocabulary of the embedding model. If not, add an empty list of words to the dictionary.
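
The check in this tip could be wrapped in a small helper. The sketch below is an assumption, not the reference solution: the name `expand_with_embeddings` is hypothetical, and `model` is expected to offer `key_to_index` and `most_similar` in the way Gensim's `KeyedVectors` does.

```python
def expand_with_embeddings(words, model, topn=50):
    """Map each word to its `topn` nearest neighbours in `model`.

    Words missing from the model's vocabulary get an empty list,
    so later lookups in the dictionary never raise a KeyError.
    """
    expansions = {}
    for word in words:
        if word in model.key_to_index:
            # most_similar returns (word, similarity) pairs; keep the words.
            neighbours = model.most_similar(word, topn=topn)
            expansions[word] = [neighbour for neighbour, _ in neighbours]
        else:
            expansions[word] = []
    return expansions
```

Once a model has been loaded, a call such as `expand_with_embeddings(basic_emotions + adjectival_basic_emotions, model)` would fill the dictionary for the nouns and the adjectives in one pass.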

## [HERE COMES YOUR CODE TO EXPAND THE WORDS USING WIKI2VEC]

## KEY:

In [5]:
import gensim
from gensim.models import KeyedVectors
# Path to the local copy of a model built from wikipedia
MODEL_FILE='/Users/piek/Desktop/t-ONDERWIJS/data/word-embeddings/wiki2vec/enwiki_20180420_100d.txt'

### If you have a small computer you may want to limit the number of embeddings loaded as shown below:
## wiki2vec = KeyedVectors.load_word2vec_format(MODEL_FILE, limit=5000) 

### To load the full model you should drop the limit.
# Loading the full model can take a while.
wiki2vec = KeyedVectors.load_word2vec_format(MODEL_FILE)

Get a list of the words that are most similar to the six basic emotions: sadness, fear, surprise, anger, joy and disgust.

In [7]:
wiki2vec_emotion_dict = {}

for noun in basic_emotions:
    words = []
    if noun in wiki2vec.key_to_index:
        sim50 = wiki2vec.most_similar(noun, topn=50)
        for sim in sim50:
            word = sim[0]
            if not word.startswith('ENTITY'):
                words.append(sim[0])
    wiki2vec_emotion_dict[noun]=words

for adjective in adjectival_basic_emotions:
    words = []
    if adjective in wiki2vec.key_to_index:
        sim50 = wiki2vec.most_similar(adjective, topn=50)
        for sim in sim50:
            word = sim[0]
            if not word.startswith('ENTITY'):
                words.append(sim[0])
    wiki2vec_emotion_dict[adjective]=words


### 2.2 Embeddings from the Leipzig corpus:

Download a text corpus from the Leipzig Corpora Collection. Build an embedding model from that corpus as explained in the notebook **Lab2.3.Creating_Wordembeddings** or load the model from disk if you already built and saved it.

Get the words that are most similar to the six basic emotions in the same way as you did for the Wiki2vec embeddings. 

## TIP: 
You need to check whether the emotion word is actually included in the vocabulary of the embedding model. If not, add an empty list of words to the dictionary.

## [HERE COMES YOUR CODE TO EXPAND THE WORDS USING LEIPZIG2VEC]

## KEY:

In [8]:
leipzig2vec = KeyedVectors.load_word2vec_format('/Users/piek/Desktop/t-ONDERWIJS/data/leipzig-corpora/models/eng_news_2005_1M-sentences.txt') 

In [10]:
leipzig_emotion_dict = {}

for noun in basic_emotions:
    words = []
    if noun in leipzig2vec.key_to_index:
        sim50 = leipzig2vec.most_similar(noun, topn=50)
        for sim in sim50:
            words.append(sim[0])
    leipzig_emotion_dict[noun]=words

for adjective in adjectival_basic_emotions:
    words = []
    if adjective in leipzig2vec.key_to_index:
        sim50 = leipzig2vec.most_similar(adjective, topn=50)
        for sim in sim50:
            words.append(sim[0])
    leipzig_emotion_dict[adjective]=words


### 2.3. Comparison

## [HERE COMES YOUR CODE TO PRINT THE EXPANDED WORDS FOR EACH MODEL AND THE OVERLAP]

In [11]:
for noun, adjective in zip(basic_emotions, adjectival_basic_emotions):
    wiki2vec_words = wiki2vec_emotion_dict[noun]
    leipzig2vec_words = leipzig_emotion_dict[noun]
    print(noun.upper())
    print(f"Wiki2Vec:{len(wiki2vec_words)}:{wiki2vec_words}")
    print(f"Leipzig2Vec:{len(leipzig2vec_words)}: {leipzig2vec_words}")
    overlap =list(set(wiki2vec_words) & set(leipzig2vec_words))
    print(f"Overlap: {len(overlap)}: {overlap}")
    print()

    
    wiki2vec_words = wiki2vec_emotion_dict[adjective]
    leipzig2vec_words = leipzig_emotion_dict[adjective]
    print(adjective.upper())
    print(f"Wiki2Vec:{len(wiki2vec_words)}:{wiki2vec_words}")
    print(f"Leipzig2Vec:{len(leipzig2vec_words)}: {leipzig2vec_words}")
    overlap =list(set(wiki2vec_words) & set(leipzig2vec_words))
    print(f"Overlap: {len(overlap)}: {overlap}")
    print()

SADNESS
Wiki2Vec:50:['sorrow', 'anguish', 'loneliness', 'grief', 'despair', 'longing', 'yearning', 'feeling', 'despondency', 'hopelessness', 'regret', 'exhilaration', 'aushan', 'pensiveness', 'melancholy', 'anger', 'unhappiness', 'helplessness', 'weariness', 'ennui', 'gloominess', 'shame', 'hopefulness', 'frustration', 'feelings', 'remorse', 'anxieties', 'bewilderment', '解怨', 'elation', 'unease', 'disgust', 'exultation', 'emotion', 'optimism', 'bitterness', 'uneasiness', 'tempestuousness', 'restlessness', 'coldness', 'emotions', 'disillusionment', 'anxiousness', 'pessimism', 'disillusion', 'purposelessness', 'fear', 'contentment', 'dejection', 'tenderness']
Leipzig2Vec:0: []
Overlap: 0: []

SAD
Wiki2Vec:50:['okolinu', 'sirmijuma', 'geografske', 'karakteristike', 'rapajić', 'drugly', 'stanovnika', 'pronađi', 'bačke', 'zaboravit', 'juče', 'srema', 'verzija', 'gledaš', 'čoveka', 'dieingly', 'kasno', 'strukture', 'rastanak', 'podzemlje', 'uspomena', '슬픈', 'raickovic', 'tražim', 'banata', '

## [HERE COMES YOUR ANALYSIS AND DISCUSSION]
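
Raw overlap counts are hard to compare when the lists differ greatly in size. One option, offered here as a suggestion rather than as part of the assignment, is the Jaccard index: the size of the intersection divided by the size of the union. The helper name `jaccard` and the toy lists below are illustrative.

```python
def jaccard(a, b):
    """Jaccard index of two word lists: |A & B| / |A | B|, between 0 and 1."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty lists: define the score as 0.0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)

# Toy lists, invented for illustration: 2 shared words out of 5 distinct ones.
score = jaccard(['sorrow', 'grief', 'regret'],
                ['sorrow', 'regret', 'misery', 'rue'])
print(score)
```

A score close to 0 means the two expansions share almost nothing, which also happens when one of the lists is empty because the word is missing from a model.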

## NRC Emotion lexicon

http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

In [12]:
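def load_nrc_lexicon(filepath):
    """Load the NRC word-level lexicon as {emotion: sorted list of words}."""
    emotion_words = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            fields = line.strip().split('\t')
            # Skip blank or malformed lines instead of failing on unpacking.
            if len(fields) != 3:
                continue
            word, emotion, score = fields
            # A score of '1' marks the word as associated with the emotion.
            if score == '1':
                emotion_words.setdefault(emotion, []).append(word)
    return {emo: sorted(words) for emo, words in emotion_words.items()}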

filepath = "NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"
nrc_emotion_lexicon = load_nrc_lexicon(filepath)


In [13]:
for noun, adjective in zip(basic_emotions, adjectival_basic_emotions):
    wordnet_nouns = wordnet_emotion_expansions[noun]
    wiki2vec_nouns = wiki2vec_emotion_dict[noun]
    leipzig2vec_nouns = leipzig_emotion_dict[noun]
    wordnet_adjectives = wordnet_emotion_expansions[adjective]
    wiki2vec_adjectives = wiki2vec_emotion_dict[adjective]
    leipzig2vec_adjectives = leipzig_emotion_dict[adjective]
    wordnet = set(wordnet_nouns+wordnet_adjectives)
    embedding = set(wiki2vec_nouns+leipzig2vec_nouns+wiki2vec_adjectives+leipzig2vec_adjectives)
    nrc = nrc_emotion_lexicon[noun]

    print(f"{noun.upper()}:")
    print(f"WORDNET:{len(wordnet)}")
    print(f"EMBEDDING:{len(embedding)}")
    print(f"NRC:{len(nrc)}")
    print("Overlap", list(set(wordnet)&set(nrc)&set(embedding)))
    print()

SADNESS:
WORDNET:57
EMBEDDING:150
NRC:1187
Overlap ['melancholy', 'regret', 'sorrow', 'unhappiness', 'loneliness', 'feeling', 'bad']

ANGER:
WORDNET:92
EMBEDDING:188
NRC:1245
Overlap ['anger', 'furious', 'indignation', 'indignant', 'irate', 'outrage']

DISGUST:
WORDNET:51
EMBEDDING:100
NRC:1056
Overlap ['revulsion', 'loathsome', 'dislike']

SURPRISE:
WORDNET:49
EMBEDDING:178
NRC:532
Overlap ['surprise', 'surprising', 'shock', 'startling']

FEAR:
WORDNET:60
EMBEDDING:190
NRC:1474
Overlap ['worry', 'thrill']

JOY:
WORDNET:53
EMBEDDING:182
NRC:687
Overlap ['glad', 'excitement', 'blessed', 'exhilaration', 'joyful', 'happiness', 'delight', 'blissful']

