# Yoonhyuck WOO / Purdue University_Computer and Information Technology
# Title: Analyzing Contextualized Word Embeddings for knowledge of word-senses
# Professor: Julia Rayz, PhD


# Reference: By Kanishka Misra (kmisra@purdue.edu)


The goal of this assignment is to familiarize you with vectors computed by (roughly) state-of-the-art pre-trained language models.

Recall from Tuesday's lecture that language modeling is a commonly used method for training neural-network-based sequence models, and allows them to learn vector representations of words *in context.* For instance, every layer of the BERT model represents a word by relying on **all other words** in the sentence context that the word occurs in.

This may lead us to hypothesize that BERT could have attained a decent competency in representing lexical ambiguity -- a phenomenon in which the same word has multiple meanings.

---


## A brief primer on lexical ambiguity

Lexical ambiguity manifests in language in two elementary ways:

The first way is when the same word has multiple meanings that are related. In this case, what we have is an instance of **polysemy**:

Consider the many polysemous senses of the word **face** (taken from [WordNet](http://wordnetweb.princeton.edu/perl/webwn?s=face&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=1&o3=&o4=&h=000000)):
  1. (n) the front of the human head from the forehead to the chin and ear to ear. E.g., *his **face** was injured*
  2. (n) the feelings expressed on a person's face. E.g., *his angry **face**.*
  3. (n) the general outward appearance of something. E.g., *the **face** of the city is changing* (metaphorically related)
  4. ... (including anything else that is related to the three above)

The second way is when two or more distinct, unrelated meanings happen to have the same word form (this usually happens by coincidence). In this case, we have an instance of **homonymy** (when two words share sounds, it is called **homophony**).

Consider the following senses of the word form **bow**:

  1. (n) a slightly curved piece of resilient wood with taut horsehair strands; used in playing certain stringed instruments. *She checked on her **bow** before performing that night.*
  2. (n) bending the head or body or knee as a sign of reverence or submission or shame or greeting. *He dropped into a **bow** before them.*
  3. (n) a weapon for shooting arrows, composed of a curved piece of resilient wood with a taut cord to propel the arrow. *a **bow** and arrow.*

Coming back to our assignment -- our goal here is to analyze a given model's (or several models', up to you) behavior in representing the above lexical phenomena.

---

## Analysing lexical ambiguity in models

While there are several ways in which one can test for lexical ambiguity, we will be using the notion of vector space similarity.
It would be reasonable to suggest that vectors of words that have the same or related senses should be much closer together than vectors of words that do not. That is, the vector for the word **bank** in (1.) should be closer to that in (2.) than to that in (3.):

1. *I went to the **bank** to withdraw some cash.*
2. *John had an appointment with the manager of the **bank** yesterday.*
3. *They pulled the canoe up on the **bank**.*

This closeness can be measured by the cosine similarity:

$$
cos(\pmb x, \pmb y) = \frac {\pmb x \cdot \pmb y}{||\pmb x|| \cdot ||\pmb y||}
$$

Therefore a good model will show us the following result: $cos(bank_1, bank_2) > cos(bank_1, bank_3)$. This is exactly what we will be exploring in this homework.
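
To make the inequality concrete, here is a minimal sketch of the formula in plain Python. The three 2-dimensional vectors are invented stand-ins for the `bank` embeddings (real BERT vectors have hundreds of dimensions), and the helper name `cosine_pair` is only for illustration.

```python
import math

def cosine_pair(x, y):
    """Cosine similarity between two equal-length vectors given as lists."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)

# Toy (invented) vectors: bank_1 and bank_2 point in similar directions,
# bank_3 points elsewhere.
bank_1 = [1.0, 0.2]
bank_2 = [0.9, 0.3]
bank_3 = [0.1, 1.0]

same_sense = cosine_pair(bank_1, bank_2)
different_sense = cosine_pair(bank_1, bank_3)
print(same_sense > different_sense)
```

With these toy vectors the same-sense pair scores higher, which is the pattern we hope a good model shows on real sentences.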

---

## Deliverables

A document (pdf, or any other format) that has a description of your results and discussions. Each question following the demo has its own set of discussion content. **Code is optional, but feel free to include it. We will be mostly paying attention to the discussion and results.**

## Getting started



We will begin by installing my package, `minicons`.
If you are interested, check out its [documentation](https://minicons.kanishka.website) (still in active development). I have also made "getting started examples" for the package. They can be found [here](https://github.com/kanishkamisra/minicons/blob/master/examples/word_representations.md).

I wrote this package to make it easy to extract representations for words from pre-trained LMs. It also contains other very important utilities that we may use in this class at some point.

To install the package, click on the grey cell below, and either click "play" to the left of the cell, or hit `shift + enter`, which will run the cell and take you to the next code cell.

In [None]:
!pip install minicons # will show an error towards the end, but that's not an error in the installation, so worry not!



The above code may have shown an error saying `ERROR: pip's dependency resolver...`, but that is Google's internal problem. The package should be installed regardless.

Next, load some useful libraries that we'll need:

In [None]:
from minicons import cwe
import torch

We will then write code to compute the cosine similarity between two vectors (or tensors, in general; you do not have to worry about this for the purposes of this homework, but feel free to ask us questions separately).

In [None]:
def cosine(a: torch.Tensor, b: torch.Tensor, eps = 1e-8) -> torch.Tensor:
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.max(a_n, eps * torch.ones_like(a_n))
    b_norm = b / torch.max(b_n, eps * torch.ones_like(b_n))
    sims = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sims
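
For readers who want to see the arithmetic without tensors, the following plain-Python sketch mirrors what the function above computes: entry `[i][j]` of the result is the cosine between row `i` of `a` and row `j` of `b`, and each norm is clamped to at least `eps` so that a zero vector does not cause a division by zero. The name `cosine_matrix` and the toy inputs are illustrative assumptions, not part of minicons or torch.

```python
import math

def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    def unit(v):
        # Clamp the norm from below, as the torch version does with eps.
        norm = max(math.sqrt(sum(x * x for x in v)), eps)
        return [x / norm for x in v]

    a_unit = [unit(row) for row in a]
    b_unit = [unit(row) for row in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_unit] for u in a_unit]

a = [[1.0, 0.0], [0.0, 0.0]]               # second row is a zero vector
b = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
sims_toy = cosine_matrix(a, b)

# One row per row of a, one column per row of b.
print(len(sims_toy), len(sims_toy[0]))
```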

# Loading pre-trained models

We are now ready to load our first pre-trained model!

For simplicity, I will show this demo on the `bert-base-uncased` model, the smallest official BERT model released in the original paper.

Every pre-trained model that can be loaded by minicons is an instance of the `cwe.CWE` class. `CWE` stands for 'contextual word embeddings'.

BERT, RoBERTa, etc., are all instances of contextual word embeddings, since they emit vectors that take their input context into account.

In theory, any model that is part of the [huggingface hub](https://huggingface.co/models) can be loaded with this class.

To load bert-base, run the following code:

In [None]:
model = cwe.CWE('bert-base-uncased')

# Extracting representations of words (and phrases)

The function primarily used for extracting representations from models is `model.extract_representation()`. It accepts batches of instances represented in either of the following formats:

```
data = [
  (sentence_1, word_1),
  (sentence_2, word_2),
  ....
  (sentence_n, word_n)
]
```
where `word_i` is the word whose vector is to be extracted from its corresponding sentence (`sentence_i`)

or

```
data = [
  (sentence_1, (start_1, end_1)),
  (sentence_2, (start_2, end_2)),
  ....
  (sentence_n, (start_n, end_n))
]
```
where `(start_i, end_i)` are the character span indices for the target word in the ith sentence, i.e., `start_i` is the start index, and `end_i` is the end index.

For example, the entry `["I like reading books.", (15, 20)]` corresponds to the word `"books"`, because

```python
"I like reading books."[15:20] == "books"
```
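
If you prefer the span format, the indices can be computed rather than counted by hand. The sketch below uses Python's built-in `str.find`; the helper name `char_span` is hypothetical (it is not part of minicons), and it uses the first occurrence of the word in the sentence.

```python
def char_span(sentence, word):
    """Return (start, end) character indices of the first occurrence of word."""
    start = sentence.find(word)
    if start == -1:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    return (start, start + len(word))

sentence = "I like reading books."
span = char_span(sentence, "books")
print(span)                       # (15, 20)
print(sentence[span[0]:span[1]])  # books
```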

To keep things simple, I will be using sentences where a word only occurs once, and use the first method of representing the input.

In [None]:
data = [
        ['The quick brown fox jumped over the lazy dog.', 'fox'],
        ['The slow pink fox ran into the fast cat.', 'fox']
]

In [None]:
data[0]

['The quick brown fox jumped over the lazy dog.', 'fox']


The `model.extract_representation` function also takes another parameter, `layer`.

Recall from an earlier class that pre-trained LMs usually have multiple layers. To find out how many layers a model has, run:

In [None]:
model.layers

12

Note that the 12 layers here means that there have been 12 total "multi-headed self attention" operations. Apart from this, the model also contains a 0th layer, which consists of representations that are passed to the first self-attention layer. These representations are composed of the static embeddings of the model (one for each word) which are combined with the position and segment embeddings using vector-addition.

By default, minicons uses the model's last layer.


Let us now extract embeddings from the structure we created earlier (the sentences containing the word 'fox')

In [None]:
embedding = model.extract_representation(data)

In [None]:
embedding # embeddings of the word fox in the sentences contained in `data`

tensor([[-0.0571, -0.2493, -0.3621,  ..., -0.3584,  0.2380,  0.3966],
        [-0.1199, -0.7536, -0.3551,  ..., -0.5651,  0.4172,  0.6888]])

The result is interpreted as follows:

```
tensor([[-0.0571, -0.2493, -0.3621,  ..., -0.3584,  0.2380,  0.3966], <- embedding for the first instance
        [-0.1199, -0.7536, -0.3551,  ..., -0.5651,  0.4172,  0.6888]]) <- embedding for the second instance
```

`bert-base` encodes each word as a 768-dimensional vector. You can check the dimensions of the above result using:

In [None]:
embedding.shape # 2 vectors of 768 dimensions each

torch.Size([2, 768])

Let us now compute the cosine similarity of the two *fox* embeddings. While there are a number of different ways of doing this, we will take the similarity of the above result with itself:

In [None]:
pairwise = cosine(embedding, embedding)
pairwise # notice that cosine is symmetric

tensor([[1.0000, 0.9278],
        [0.9278, 1.0000]])

The top right (or bottom left) value is the similarity of the two fox-words. We can access it by:

In [None]:
pairwise[0,1].item() # similarity of fox in the  first sentence with that in second

0.9278209209442139

## An example

Let us now apply our knowledge about computing similarities with contextualized word representations to test how well BERT represents lexical ambiguity.

We will adopt the paradigm of defining a set of query instances (sentence-word pairs) and take each instance's similarity with a set of reference sentence-word pairs.

In the following case, we have (focus word bolded) the following queries:

1. Please just **book** me a place to stay already, will you!
2. I'll **reserve** those tickets shortly.
3. I liked reading that **book**.

Similarly, we have the following references:

1. My children said they will **book** us a trip to Hawaii!
2. Please just buy the **book** already, will you!
3. Lester, can you **book** my entire schedule for all of Monday?

**Exercise for the reader:** What words should be more similar to each other? (Notice the contexts for query 1 and reference 2)

In [None]:
query = [
         ["Please just book me a place to stay already, will you!", "book"],
         ["I'll reserve those tickets shortly.", "reserve"],
         ["I liked reading that book!", "book"] # Noun
] # question

reference = [
             ["My children said they will book us a trip to Hawaii!", "book"], # Verb
             ["Please just buy the book already, will you!", "book"], # Noun
             ["Lester, can you book my entire schedule for all of Monday?", "book"]
]

In [None]:
# Extract embeddings for each set of instances, for demonstration, let us look at the second last layer (11)
reference_emb = model.extract_representation(reference, layer = 11)
query_emb = model.extract_representation(query, layer = 11)

In [None]:
# Take the cosine of every query with every reference
sims = cosine(query_emb, reference_emb)

# explore the output:
sims

tensor([[0.7540, 0.4687, 0.6881],
        [0.5885, 0.3825, 0.6178],
        [0.5589, 0.8250, 0.5271]])

In [None]:
# To get the similarities between the first query, "Please just book me...", and every reference:
sims[0]

tensor([0.7540, 0.4687, 0.6881])

We see that the similarity of "book" in *Please just **book** me a place to stay...* with the:
1. first reference is 0.754
2. second reference is 0.469
3. third reference is 0.688

This means that the "book" in the first query is closest to the "book" in:

"My children said they will **book** us a trip to Hawaii!"


We can make the process of looking at "the closest" embedding a little easier:

In [None]:
# For the first query, what is the closest usage of the book in the reference set?

closest1 = reference[sims[0].argmax().item()] # argmax finds the index with the greatest value, in this case, the greatest similarity!

print(f"Query: {query[0]}\nClosest Reference: {closest1}")

Query: ['Please just book me a place to stay already, will you!', 'book']
Closest Reference: ['My children said they will book us a trip to Hawaii!', 'book']


In [None]:
# Repeating the same for the second query:
closest2 = reference[sims[1].argmax().item()] # argmax finds the index with the greatest value, in this case, the greatest similarity!

print(f"Query: {query[1]}\nClosest Reference: {closest2}")

Query: ["I'll reserve those tickets shortly.", 'reserve']
Closest Reference: ['Lester, can you book my entire schedule for all of Monday?', 'book']


In [None]:
sims

tensor([[0.7540, 0.4687, 0.6881],
        [0.5885, 0.3825, 0.6178],
        [0.5589, 0.8250, 0.5271]])

In [None]:
# Third query:
closest3 = reference[sims[2].argmax().item()] # argmax finds the index with the greatest value, in this case, the greatest similarity!

print(f"Query: {query[2]}\nClosest Reference: {closest3}")

Query: ['I liked reading that book!', 'book']
Closest Reference: ['Please just buy the book already, will you!', 'book']


We see here that in all cases, BERT-base (layer 11) prefers the correct reference! To draw broader conclusions, though, we'd need a large dataset of diverse sentences.
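
The three near-identical cells above can be folded into one loop. The sketch below is plain Python over the similarity values printed earlier (copied into a nested list so that it runs without loading a model); `closest_references` is an illustrative helper name, not a minicons function.

```python
def closest_references(sims_rows):
    """For each row of similarities, return the index of the largest value."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in sims_rows]

# Layer-11 similarities from the demo above (rows: queries, columns: references).
demo_sims = [
    [0.7540, 0.4687, 0.6881],
    [0.5885, 0.3825, 0.6178],
    [0.5589, 0.8250, 0.5271],
]

for q, r in enumerate(closest_references(demo_sims)):
    print(f"query {q + 1} -> reference {r + 1} (similarity {demo_sims[q][r]:.4f})")
```

With an actual tensor, `sims.argmax(dim=1)` gives the same indices in a single call.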

Now, it's your turn!

# Assignment objectives

Using the code from above, your objectives are as follows:

**Preliminary:** Select a model and layer of your choice. Here are some suggested options (`Name: <identifier to be used in cwe.CWE()>, <number of layers>`):
```
BERT-base: bert-base-uncased, 12 layers
BERT-large: bert-large-uncased, 24 layers
RoBERTa-base: roberta-base, 12 layers
RoBERTa-large: roberta-large, 24 layers
```

If you want to be a little adventurous, check out other models here: https://huggingface.co/models


## Question 1: Same words, different meanings

Analyze your model (and layer) on a new polysemous/homonymous word (should at least contain 2 different senses of the word) using the same format as above:

```
query = list of instances containing two distinct usages of the word.

reference = list of instances containing two distinct usages of the word,
with each having a similar usage with at least one instance in the query.

Example:

query = [
  ["i like books", "books"],
  ["please book me a hotel", "book"]
]

reference = [
  ["she read that book", "book"],
  ["I will book those tickets shortly", "book"]
]
```

The word you select should be different from the ones discussed in this file. Therefore, you cannot use: `face, book, bow, bank`. In all cases, the word being compared should be the same (different tense and number allowed: *books* vs. *book* or *book* vs *booked*)

In [None]:
query = [["he beat the table with his hand", "beat"], # Verb_1. strike (a person or an animal) repeatedly and violently so as to hurt or injure them, typically with an implement such as a club or whip
        ["He beat me at chess", "beat"], # Verb_2. defeat (someone) in a game or other competitive situation
        ["the music changed to a funky disco beat", "beat"]] # Noun. a main accent or rhythmic unit in music or poetry

reference = [
  ["The piece has four beats to the bar", "beats"], # n
  ["Their recent wins have proved they’re still the ones to beat", "beat"], # v2
  ["he beat his own world record", "beat"], # v2
  ["she beat her fists against the wood", "beat"], # v1
  ["he heard the beat of a drum", "beat"] #n
]

In [None]:
model = cwe.CWE('bert-base-uncased')
model_1 = cwe.CWE("roberta-base")
model_2 = cwe.CWE("roberta-large")
model_3 = cwe.CWE("bert-large-uncased")

In [None]:
reference_emb = model_1.extract_representation(reference)
query_emb = model_1.extract_representation(query)

reference_emb2 = model_1.extract_representation(reference, layer = 10)
query_emb2 = model_1.extract_representation(query, layer = 10)

# Model: Bert
reference_emb3 = model.extract_representation(reference)
query_emb3 = model.extract_representation(query)

reference_emb4 = model_2.extract_representation(reference)
query_emb4 = model_2.extract_representation(query)

reference_emb5 = model_3.extract_representation(reference)
query_emb5 = model_3.extract_representation(query)

In [None]:
# Take the cosine of every query with every reference
sims = cosine(query_emb, reference_emb)
sims2 = cosine(query_emb2, reference_emb2)
sims3 = cosine(query_emb3, reference_emb3)
sims4 = cosine(query_emb4, reference_emb4)
sims5 = cosine(query_emb5, reference_emb5)

# explore the output:
print('model:RoBERTa, layer = 12 \n', sims)
print('model:RoBERTa, layer = 10 \n', sims2)
print('model:BERT, layer = 12 \n', sims3)
print('model:RoBERTa, layer = 24 \n', sims4)
print('model:BERT, layer = 24 \n', sims5)

model:RoBERTa, layer = 12 
 tensor([[0.8928, 0.8752, 0.9254, 0.9567, 0.8991],
        [0.8606, 0.9054, 0.9513, 0.9092, 0.8763],
        [0.8861, 0.8930, 0.8863, 0.8948, 0.9333]])
model:RoBERTa, layer = 10 
 tensor([[0.8634, 0.8406, 0.9012, 0.9441, 0.8624],
        [0.8425, 0.8785, 0.9296, 0.8810, 0.8296],
        [0.8615, 0.8202, 0.8306, 0.8432, 0.8935]])
model:BERT, layer = 12 
 tensor([[0.4661, 0.4273, 0.4799, 0.7304, 0.4620],
        [0.3503, 0.5702, 0.6055, 0.4569, 0.3879],
        [0.4892, 0.4833, 0.3816, 0.3605, 0.4991]])
model:RoBERTa, layer = 24 
 tensor([[0.9774, 0.9685, 0.9865, 0.9905, 0.9754],
        [0.9739, 0.9737, 0.9904, 0.9794, 0.9686],
        [0.9789, 0.9696, 0.9746, 0.9703, 0.9808]])
model:BERT, layer = 24 
 tensor([[0.4456, 0.4985, 0.5038, 0.8303, 0.5251],
        [0.4829, 0.6406, 0.7208, 0.5317, 0.5447],
        [0.5390, 0.5323, 0.4377, 0.3503, 0.6388]])


In [None]:
closest1 = reference[sims[0].argmax().item()]
print(f"Query: {query[0]}\nClosest Reference: {closest1} \n")

closest2 = reference[sims[1].argmax().item()]
print(f"Query: {query[1]}\nClosest Reference: {closest2}\n")

closest3 = reference[sims[2].argmax().item()]
print(f"Query: {query[2]}\nClosest Reference: {closest3}\n")

Query: ['he beat the table with his hand', 'beat']
Closest Reference: ['she beat her fists against the wood', 'beat'] 

Query: ['He beat me at chess', 'beat']
Closest Reference: ['he beat his own world record', 'beat']

Query: ['the music changed to a funky disco beat', 'beat']
Closest Reference: ['he heard the beat of a drum', 'beat']



# 1. In your write-up
### 1.1 Write what word you chose.
### 1.2 The sentences you chose for the various senses of the word

I looked in the Oxford Press dictionary for a word that has two frequently used verb senses and one commonly used noun sense.

I chose the word 'beat'. Its senses are the following:
- Verb_1) **Strike** (a person or an animal) repeatedly and violently so as to hurt or injure them, typically with an implement such as a club or whip
> "He `beat` the table with his hand." <br>
> "She `beat` her fists against the wood"

- Verb_2) **Defeat** (someone) in a game or other competitive situation
> "He `beat` me at chess." <br>
> "Their recent wins have proved they’re still the ones to `beat`" <br>
> "He `beat` his own world record"

- Noun) A main **accent or rhythmic** unit in music or poetry
> "The music changed to a funky disco `beat`" <br>
> "The piece has four `beats` to the bar" <br>
> "He heard the `beat` of a drum"

### 1.3 What you found
After using one of the suggested models, I wondered about the performance of the other models. Therefore, I ran the following models with different layers: <br>
>model:RoBERTa, layer = 12 <br>
>model:RoBERTa, layer = 10 <br>
>model:BERT, layer = 12 <br>
>model:RoBERTa, layer = 24 <br>
>model:BERT, layer = 24 <br>

In general, RoBERTa showed better performance than BERT, in the sense of higher similarity scores, and the models with more layers scored higher still.

One of the most interesting points is that I had naturally assumed the cosine similarity would be high whenever the query and the reference share a part of speech or a sense. The closest reference is correct in every case; on closer inspection, however, that assumption holds only sometimes.

For example, I see the similarity of "beat"(v2) in *He beat me at chess* with:
1. first reference is 0.8606(N)
2. second reference is 0.9054(v2)
3. third reference is 0.9513(v2)
4. fourth reference is 0.9092(v1)
5. fifth reference is 0.8763(N) in the first model.

Therefore, of course, the model reported the third reference as the closest one. Looking in more detail, however, the fourth reference, which has the first verb sense, scores higher than the second reference, which has the same sense as the query.

Moreover, I had expected references with the same sense to differ by no more than about 0.00xx. However, the largest gap between same-sense references is 0.08, in the second query of the BERT-large model, and the smallest is 0.0019, in the third query of the RoBERTa-large model.
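
One way to make this comparison systematic is to rank the references for a query and print their sense labels next to the scores. The sketch below does that in plain Python for the second query, using the RoBERTa-base (layer 12) row reported above and the sense labels from the comments in the reference list; `rank_references` is an illustrative helper name.

```python
def rank_references(scores, labels):
    """Return (label, score) pairs sorted from most to least similar."""
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Sense labels of the five references, in the order they appear in `reference`.
reference_senses = ["n", "v2", "v2", "v1", "n"]

# Row for the query "He beat me at chess" (sense v2), RoBERTa-base, last layer.
query_scores = [0.8606, 0.9054, 0.9513, 0.9092, 0.8763]

ranking = rank_references(query_scores, reference_senses)
for label, score in ranking:
    print(f"{label}: {score:.4f}")
```

The top-ranked reference has the expected sense, but a `v1` reference sits between the two `v2` references.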

## Question 2: Different words, (related or same) meaning

For your set of sentences in question 2, come up with new reference instances that include words that are related to only one of the sentences. For example, if I were comparing book (novel) vs book (reserving something):

```
query = [
  ["i like books", "books"],
  ["please book me a hotel", "book"]
]

references = [
  ["that was a good novel", "novel"],
  ["i'd like to make a reservation", "reservation"]
]
```

Here, `novel` should be closer to the first query than to the second; similarly, `reservation` should be closer to the second than to the first.

**Same as above, discuss the stimuli you created, and what you found.**

In [None]:
q2_query = [["they beat me with a stick and punched me", "beat"], # Verb_1. strike (a person or an animal) repeatedly and violently so as to hurt or injure them, typically with an implement such as a club or whip
        ["he beat his own world record", "beat"], # Verb_2. defeat (someone) in a game or other competitive situation
        ["the glissando begins on the second beat", "beat"]] # Noun. a main accent or rhythmic unit in music or poetry

q2_reference = [
  ["he made her count beats to the bar and clap the rhythm", "rhythm"], # query_3
  ["she had still not quite admitted defeat", "defeat"], # query_2
  ["they've conquered new markets in Japan", "conquered"], # query_2
  ["a car hit the barrier", "hit"], # query_1
  ["he raised his hand, as if to strike me", "strike"], # query_1
  ["We walked at a fast tempo", "tempo"], # query_3
  ["all she could hear was the pounding of her heart ", "pounding"] # query_3
]

In [None]:
# Your code here:
q2_reference_emb = model_1.extract_representation(q2_reference)
q2_query_emb = model_1.extract_representation(q2_query)

q2_reference_emb2 = model_1.extract_representation(q2_reference, layer = 10)
q2_query_emb2 = model_1.extract_representation(q2_query, layer = 10)

# Model: Bert
q2_reference_emb3 = model.extract_representation(q2_reference)
q2_query_emb3 = model.extract_representation(q2_query)

q2_reference_emb4 = model_2.extract_representation(q2_reference)
q2_query_emb4 = model_2.extract_representation(q2_query)

q2_reference_emb5 = model_3.extract_representation(q2_reference)
q2_query_emb5 = model_3.extract_representation(q2_query)

In [None]:
# Take the cosine of every query with every reference
q2_sims = cosine(q2_query_emb, q2_reference_emb)
q2_sims2 = cosine(q2_query_emb2, q2_reference_emb2)
q2_sims3 = cosine(q2_query_emb3, q2_reference_emb3)
q2_sims4 = cosine(q2_query_emb4, q2_reference_emb4)
q2_sims5 = cosine(q2_query_emb5, q2_reference_emb5)

# explore the output:
print('model:RoBERTa, layer = 12 \n', q2_sims)
print('model:RoBERTa, layer = 10\n', q2_sims2)
print('model:BERT, layer = 12\n', q2_sims3)
print('model:RoBERTa, layer = 24\n', q2_sims4)
print('model:BERT, layer = 24\n', q2_sims5)

model:RoBERTa, layer = 12 
 tensor([[0.8345, 0.8232, 0.7949, 0.8815, 0.8875, 0.8074, 0.8255],
        [0.8497, 0.8437, 0.8371, 0.8915, 0.8640, 0.8071, 0.8121],
        [0.8865, 0.8417, 0.7975, 0.8689, 0.8589, 0.8334, 0.8094]])
model:RoBERTa, layer = 10
 tensor([[0.8039, 0.8105, 0.8256, 0.8421, 0.8687, 0.7788, 0.8287],
        [0.8139, 0.8416, 0.8797, 0.8630, 0.8361, 0.7972, 0.8228],
        [0.8311, 0.8074, 0.7920, 0.7748, 0.7958, 0.8104, 0.7924]])
model:BERT, layer = 12
 tensor([[0.2098, 0.1106, 0.3546, 0.4285, 0.4682, 0.2437, 0.3599],
        [0.2358, 0.1770, 0.4544, 0.4355, 0.2972, 0.2309, 0.3746],
        [0.6470, 0.4035, 0.2757, 0.2986, 0.2291, 0.5684, 0.3010]])
model:RoBERTa, layer = 24
 tensor([[0.9397, 0.9475, 0.9441, 0.9612, 0.9385, 0.9145, 0.9075],
        [0.9579, 0.9665, 0.9717, 0.9810, 0.9476, 0.9443, 0.9247],
        [0.9675, 0.9534, 0.9568, 0.9628, 0.9345, 0.9435, 0.9229]])
model:BERT, layer = 24
 tensor([[0.4674, 0.1582, 0.5238, 0.5196, 0.6433, 0.3106, 0.5279],
        

# model:RoBERTa, layer = 12

In [None]:
q2_closest1 = q2_reference[q2_sims[0].argmax().item()]
print(f"Query: {q2_query[0]}\nClosest Reference: {q2_closest1} \n")

q2_closest2 = q2_reference[q2_sims[1].argmax().item()]
print(f"Query: {q2_query[1]}\nClosest Reference: {q2_closest2}\n")

q2_closest3 = q2_reference[q2_sims[2].argmax().item()]
print(f"Query: {q2_query[2]}\nClosest Reference: {q2_closest3}\n")

Query: ['they beat me with a stick and punched me', 'beat']
Closest Reference: ['he raised his hand, as if to strike me', 'strike'] 

Query: ['he beat his own world record', 'beat']
Closest Reference: ['a car hit the barrier', 'hit']

Query: ['the glissando begins on the second beat', 'beat']
Closest Reference: ['he made her count beats to the bar and clap the rhythm', 'rhythm']



# model:RoBERTa, layer = 10

In [None]:
q2_closest1 = q2_reference[q2_sims2[0].argmax().item()]
print(f"Query: {q2_query[0]}\nClosest Reference: {q2_closest1} \n")

q2_closest2 = q2_reference[q2_sims2[1].argmax().item()]
print(f"Query: {q2_query[1]}\nClosest Reference: {q2_closest2}\n")

q2_closest3 = q2_reference[q2_sims2[2].argmax().item()]
print(f"Query: {q2_query[2]}\nClosest Reference: {q2_closest3}\n")

Query: ['they beat me with a stick and punched me', 'beat']
Closest Reference: ['he raised his hand, as if to strike me', 'strike'] 

Query: ['he beat his own world record', 'beat']
Closest Reference: ["they've conquered new markets in Japan", 'conquered']

Query: ['the glissando begins on the second beat', 'beat']
Closest Reference: ['he made her count beats to the bar and clap the rhythm', 'rhythm']



# model:BERT, layer = 12

In [None]:
q2_closest1 = q2_reference[q2_sims3[0].argmax().item()]
print(f"Query: {q2_query[0]}\nClosest Reference: {q2_closest1} \n")

q2_closest2 = q2_reference[q2_sims3[1].argmax().item()]
print(f"Query: {q2_query[1]}\nClosest Reference: {q2_closest2}\n")

q2_closest3 = q2_reference[q2_sims3[2].argmax().item()]
print(f"Query: {q2_query[2]}\nClosest Reference: {q2_closest3}\n")

Query: ['they beat me with a stick and punched me', 'beat']
Closest Reference: ['he raised his hand, as if to strike me', 'strike'] 

Query: ['he beat his own world record', 'beat']
Closest Reference: ["they've conquered new markets in Japan", 'conquered']

Query: ['the glissando begins on the second beat', 'beat']
Closest Reference: ['he made her count beats to the bar and clap the rhythm', 'rhythm']



# model:RoBERTa, layer = 24

In [None]:
q2_closest1 = q2_reference[q2_sims4[0].argmax().item()]
print(f"Query: {q2_query[0]}\nClosest Reference: {q2_closest1} \n")

q2_closest2 = q2_reference[q2_sims4[1].argmax().item()]
print(f"Query: {q2_query[1]}\nClosest Reference: {q2_closest2}\n")

q2_closest3 = q2_reference[q2_sims4[2].argmax().item()]
print(f"Query: {q2_query[2]}\nClosest Reference: {q2_closest3}\n")

Query: ['they beat me with a stick and punched me', 'beat']
Closest Reference: ['a car hit the barrier', 'hit'] 

Query: ['he beat his own world record', 'beat']
Closest Reference: ['a car hit the barrier', 'hit']

Query: ['the glissando begins on the second beat', 'beat']
Closest Reference: ['he made her count beats to the bar and clap the rhythm', 'rhythm']



# model:BERT, layer = 24

In [None]:
q2_closest1 = q2_reference[q2_sims5[0].argmax().item()]
print(f"Query: {q2_query[0]}\nClosest Reference: {q2_closest1} \n")

q2_closest2 = q2_reference[q2_sims5[1].argmax().item()]
print(f"Query: {q2_query[1]}\nClosest Reference: {q2_closest2}\n")

q2_closest3 = q2_reference[q2_sims5[2].argmax().item()]
print(f"Query: {q2_query[2]}\nClosest Reference: {q2_closest3}\n")

Query: ['they beat me with a stick and punched me', 'beat']
Closest Reference: ['he raised his hand, as if to strike me', 'strike'] 

Query: ['he beat his own world record', 'beat']
Closest Reference: ["they've conquered new markets in Japan", 'conquered']

Query: ['the glissando begins on the second beat', 'beat']
Closest Reference: ['he made her count beats to the bar and clap the rhythm', 'rhythm']



# 2. In your write-up
### 2.1 Write what word you chose. & 2.2 The sentences you chose for the various senses of the word
# `Beat`
Starting from the dictionary senses, I consulted the Oxford Press thesaurus for related words.
- Verb_1) **Strike** (a person or an animal) repeatedly and violently so as to hurt or injure them, typically with an implement such as a club or whip => `Strike`, `Hit`
> "He raised his hand, as if to `strike` me." <br>
> "A car `hit` the barrier"

- Verb_2) **Defeat** (someone) in a game or other competitive situation => `Defeat`, `Conquer`
> "She had still not quite admitted `defeat`." <br>
> "They've `conquered` new markets in Japan" <br>

- Noun) A main **accent or rhythmic** unit in music or poetry => `Rhythm`, `Tempo`, `Pounding`
> "He made her count beats to the bar and clap the `rhythm`" <br>
> "We walked at a fast `tempo`" <br>
> "All she could hear was the `pounding` of her heart"


### 2.3 What you found
In this problem, I found a situation very similar to the first write-up.

RoBERTa again performed better than BERT in terms of similarity scores, and more layers gave higher scores still. However, unlike in the first write-up, some models chose a reference with the wrong sense.

For example, I see the similarity of "beat" (V2) in *he beat his own world record* with:
1. first reference is 0.8497 (N / Rhythm)
2. second reference is 0.8437 (V2 / Defeat)
3. third reference is 0.8371 (V2 / Conquer)
4. fourth reference is 0.8915 (V1 / Hit)
5. fifth reference is 0.8640 (V1 / Strike)
6. sixth reference is 0.8071 (N / Tempo)
7. seventh reference is 0.8121 (N / Pounding) in the first model.

I expected the model to pick either the second or the third reference, but as you can see, the V1 scores are higher than the V2 scores. Also, even where a model picks the reference I expected, a closer look shows cases in which a reference with a different part of speech and a different sense scores highly.

Thus, I believe these results arise because the references use different words from the query, even though those words have a similar sense.
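
Whether a model picked a reference of the expected sense can also be counted directly. The sketch below does so in plain Python for the RoBERTa-base (layer 12) similarities printed above; the sense labels follow the comments in `q2_reference`, and `sense_accuracy` is an illustrative helper, not part of minicons.

```python
def sense_accuracy(sims_rows, query_senses, reference_senses):
    """Fraction of queries whose most similar reference has the same sense label."""
    hits = 0
    for row, expected in zip(sims_rows, query_senses):
        best = max(range(len(row)), key=lambda j: row[j])
        hits += reference_senses[best] == expected
    return hits / len(sims_rows)

q2_query_senses = ["v1", "v2", "n"]
q2_reference_senses = ["n", "v2", "v2", "v1", "v1", "n", "n"]

# RoBERTa-base, last layer (rows: queries, columns: references).
roberta_base_sims = [
    [0.8345, 0.8232, 0.7949, 0.8815, 0.8875, 0.8074, 0.8255],
    [0.8497, 0.8437, 0.8371, 0.8915, 0.8640, 0.8071, 0.8121],
    [0.8865, 0.8417, 0.7975, 0.8689, 0.8589, 0.8334, 0.8094],
]

accuracy = sense_accuracy(roberta_base_sims, q2_query_senses, q2_reference_senses)
print(f"{accuracy:.2f}")
```

The same function can be applied to the other models' matrices to compare them on equal terms, rather than by the raw size of the scores.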

## Question 3: Comparing cosine similarity across meanings (senses)

We explore whether instances of same sense of a word are more similar than different sense instances of a word. We hypothesize that the former is the case, but let's find out!

e.g. The word *'book'* has two senses (*novel* and *reserving something*) discussed in the previous questions. But are same sense instances of the word book (*novel* vs *novel*) more similar on average than different sense instances of the word book (*novel* vs *reserving something*)?

We specify two word-sense specific cosine relatedness measures, which you can use to perform comparisons.

**Same Sense similarity:** Average cosine similarity between same sense (meaning) instances of a word

\begin{equation}
\operatorname{SenSim}_{\ell}\left(w_{s}\right)=\frac{1}{m} \sum_{j} \sum_{k \neq j} \cos \left(v_{\ell}\left(w_{s_{j}}\right), v_{\ell}\left(w_{s_{k}}\right)\right)
\end{equation}


**Inter Sense similarity:** Average cosine similarity between different sense (meaning) instances of a word

\begin{equation}
\operatorname{InterSim}_{\ell}(w)=\mathbb{E}_{a, b \in S}\left[\frac{1}{m n} \sum_{j=1}^{m} \sum_{i=1}^{n} \cos \left(v_{\ell}\left(w_{a_{i}}\right), v_{\ell}\left(w_{b_{j}}\right)\right)\right]
\end{equation}
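
The two measures above can be sketched in plain Python on toy vectors. This is only an illustration under stated assumptions: the normaliser $m$ in SenSim is read as the number of ordered pairs $(j, k)$ with $j \neq k$, the expectation in InterSim is taken as a uniform average over unordered pairs of senses, and the function names `cos`, `same_sense_sim`, and `inter_sense_sim` and the 2-d vectors are hypothetical, not part of the notebook's helper code.

```python
import math
from itertools import combinations


def cos(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def same_sense_sim(vectors):
    """Average cosine over all ordered pairs (j, k) with j != k.

    Assumption: the 1/m factor in SenSim is the number of such pairs.
    """
    n = len(vectors)
    pairs = [(j, k) for j in range(n) for k in range(n) if j != k]
    return sum(cos(vectors[j], vectors[k]) for j, k in pairs) / len(pairs)


def inter_sense_sim(senses):
    """Average, over pairs of distinct senses, of the mean cross-sense cosine.

    Assumption: the expectation over (a, b) is a uniform average.
    """
    per_pair = []
    for a, b in combinations(senses, 2):
        total = sum(cos(u, v) for u in senses[a] for v in senses[b])
        per_pair.append(total / (len(senses[a]) * len(senses[b])))
    return sum(per_pair) / len(per_pair)


# Toy vectors standing in for layer-l embeddings of "book" in context.
toy = {
    "novel": [[1.0, 0.0], [1.0, 0.0]],
    "reserve": [[0.0, 1.0], [0.0, 1.0]],
}

print("SenSim(novel):", same_sense_sim(toy["novel"]))
print("InterSim(book):", inter_sense_sim(toy))
```

With real embeddings the vectors would come from the model's chosen layer; the toy values are chosen only so that same-sense instances are identical and different-sense instances are orthogonal.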

________________________

Now it's your turn!

Select three polysemous words (same word, multiple meanings) and perform same sense and inter sense similarity analysis like the examples given above. Compare results across at least two senses of each word and report your findings.

The words you select should be different from the ones discussed in this file.

# 1. Same Sense Similarity

*query: beat (defeat (someone) in a game or other competitive situation)*

references: beat: defeat (someone) in a game or other competitive situation
  - a. overcome (a problem or disease)
  - b. do or be better than (a record or score)

In [None]:
query_same_sense = [
         ["He beat me at chess", "beat"], # defeat (someone) in a game or other competitive situation
        ]

reference_same_sense = [
             ["the president said beating violent crime was his first priority", "beating"], # overcome (a problem or disease)
             ["he beat his own world record", "beat"], # do or be better than (a record or score)
             ["The young prodigy beat the odds and defeated the experienced grandmaster in a nail-biting game of Go",
              "beat"], # defeat (someone) in a game or other competitive situation
             ["Sarah used a brilliant strategy to beat her opponent and secure the championship", "beat"], # defeat (someone) in a game or other competitive situation
             ]

In [None]:
q3_1_reference_emb = model_1.extract_representation(reference_same_sense, layer = 10)
q3_1_query_emb = model_1.extract_representation(query_same_sense, layer = 10)

q3_1_reference_emb_1 = model_2.extract_representation(reference_same_sense)
q3_1_query_emb_1 = model_2.extract_representation(query_same_sense)

q3_1_reference_emb_2 = model_3.extract_representation(reference_same_sense)
q3_1_query_emb_2 = model_3.extract_representation(query_same_sense)


# Take the cosine of every query with every reference
q3_1_sims = cosine(q3_1_query_emb, q3_1_reference_emb)
q3_1_sims_1 = cosine(q3_1_query_emb_1, q3_1_reference_emb_1)
q3_1_sims_2 = cosine(q3_1_query_emb_2, q3_1_reference_emb_2)

# explore the output:
print("query:  beat (defeat (someone) in a game or other competitive situation)")
print("reference:  beat (defeat / overcome / do or be better than)")
print('Same sense similarity:', round(torch.mean(q3_1_sims).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims_1).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims_2).item(),3))

query:  beat (defeat (someone) in a game or other competitive situation)
reference:  beat (defeat / overcome / do or be better than)
Same sense similarity: 0.902
Same sense similarity: 0.976
Same sense similarity: 0.651


# 2. Same Sense Similarity

query: Rock: move gently to and fro or from side to side

references: Rock: move gently to and fro or from side to side
 - a. shake or cause to shake or vibrate, especially because of an impact, earthquake, or explosion
 - b. cause great shock or distress to (someone or something), especially so as to weaken or destabilize

In [None]:
query2_same_sense = [
         ["she rocked the baby in her arms", "rocked"], # move gently to and fro or from side to side
        ]

reference2_same_sense = [
             ["the vase rocked back and forth on its base", "rocked"], # move gently to and fro or from side to side
             ["minutes later a second blast rocked the city", "rocked"], # shake or cause to shake or vibrate
             ["The unexpected economic downturn has the potential to rock the stability of the financial markets",
              "rock"], # cause great shock or distress to (someone or something)
             ["The earthquake that hit the region not only caused physical damage but also emotionally rocked the local community.", "rocked"], # cause great shock or distress to (someone or something)
             ]

In [None]:
q3_1_reference_emb2 = model_1.extract_representation(reference2_same_sense, layer = 10)
q3_1_query_emb2 = model_1.extract_representation(query2_same_sense, layer = 10)

q3_1_reference_emb2_1 = model_2.extract_representation(reference2_same_sense)
q3_1_query_emb2_1 = model_2.extract_representation(query2_same_sense)

q3_1_reference_emb2_2 = model_3.extract_representation(reference2_same_sense)
q3_1_query_emb2_2 = model_3.extract_representation(query2_same_sense)


# Take the cosine of every query with every reference
q3_1_sims2 = cosine(q3_1_query_emb2, q3_1_reference_emb2)
q3_1_sims2_1 = cosine(q3_1_query_emb2_1, q3_1_reference_emb2_1)
q3_1_sims2_2 = cosine(q3_1_query_emb2_2, q3_1_reference_emb2_2)

# explore the output:
print("query:  Rock: move gently to and fro or from side to side")
print("reference:  Rock (move / shake / cause great shock or distress)")
print('Same sense similarity:', round(torch.mean(q3_1_sims2).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims2_1).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims2_2).item(),3))

query:  Rock: move gently to and fro or from side to side
reference:  Rock (move / shake / cause great shock or distress)
Same sense similarity: 0.873
Same sense similarity: 0.981
Same sense similarity: 0.621


# 3. Same Sense Similarity

query: Rest: cease work or movement in order to relax, sleep, or recover strength

references: Rest: cease work or movement in order to relax, sleep, or recover strength
 - a. allow to be inactive in order to regain strength or health
 - b. (of a body) lie buried
 - c. used euphemistically by actors to indicate that they are out of work
 - d. leave (a player) out of a team temporarily

In [None]:
query3_same_sense = [
         ["he needed to rest after the feverish activity", "rest"], # cease work or movement in order to relax, sleep, or recover strength
        ]

reference3_same_sense = [
             [" going to rest up before travelling to England", "rest"], # cease work or movement in order to relax, sleep, or recover strength
             ["her friend read to her while she rested her eyes", "rested"], # allow to be inactive in order to regain strength or health
             ["the king's body rested in his tomb", "rested"], # (of a body) lie buried
             ["she was an actress but doing domestic work while she was resting.", "resting"], # used euphemistically by actors to indicate that they are out of work
             ["both men were rested for the cup final", "rested"]] # leave (a player) out of a team temporarily

In [None]:
q3_1_reference_emb3 = model_1.extract_representation(reference3_same_sense, layer = 10)
q3_1_query_emb3 = model_1.extract_representation(query3_same_sense, layer = 10)

q3_1_reference_emb3_1 = model_2.extract_representation(reference3_same_sense)
q3_1_query_emb3_1 = model_2.extract_representation(query3_same_sense)

q3_1_reference_emb3_2 = model_3.extract_representation(reference3_same_sense)
q3_1_query_emb3_2 = model_3.extract_representation(query3_same_sense)


# Take the cosine of every query with every reference
q3_1_sims3 = cosine(q3_1_query_emb3, q3_1_reference_emb3)
q3_1_sims3_1 = cosine(q3_1_query_emb3_1, q3_1_reference_emb3_1)
q3_1_sims3_2 = cosine(q3_1_query_emb3_2, q3_1_reference_emb3_2)

# explore the output:
print("query:  Rest: cease work or movement in order to relax, sleep, or recover strength")
print("reference:  Rest ( cease / allow to be inactive / lie buried / used euphemistically / leave out of a team )")
print('Same sense similarity:', round(torch.mean(q3_1_sims3).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims3_1).item(),3))
print('Same sense similarity:', round(torch.mean(q3_1_sims3_2).item(),3))

query:  Rest: cease work or movement in order to relax, sleep, or recover strength
reference:  Rest ( cease / allow to be inactive / lie buried / used euphemistically / leave out of a team )
Same sense similarity: 0.895
Same sense similarity: 0.964
Same sense similarity: 0.679


# 1. Inter Sense Similarity

*query: Beat: defeat (someone) in a game or other competitive situation*

*references: Beat: a main accent or rhythmic unit in music or poetry*

In [None]:
query_different_sense = [["the swimmer finally beat the world record in the 100-meter freestyle event", "beat"], # Verb_2. defeat (someone) in a game or other competitive situation
                         ]

reference_different_sense = [
  ["The drummer set the tempo with a steady beat that echoed throughout the concert hall.", "beat"], # rhythmic unit
  ["The music had a catchy beat that had everyone dancing on the dance floor.", "beat"], # rhythmic unit
  ["The conductor directed the orchestra to emphasize the strong beat in the musical composition", "beat"], # rhythmic unit
  ["he made her count beats to the bar and clap the rhythm", "beats"], # rhythmic unit
  ["all she could hear was the beat of her heart ", "beat"] # rhythmic unit
]

In [None]:
q3_2_reference_emb = model_1.extract_representation(reference_different_sense, layer = 10)
q3_2_query_emb = model_1.extract_representation(query_different_sense, layer = 10)

q3_2_reference_emb_1 = model_2.extract_representation(reference_different_sense)
q3_2_query_emb_1 = model_2.extract_representation(query_different_sense)

q3_2_reference_emb_2 = model_3.extract_representation(reference_different_sense)
q3_2_query_emb_2 = model_3.extract_representation(query_different_sense)

# Take the cosine of every query with every reference
q3_2_sims = cosine(q3_2_query_emb, q3_2_reference_emb)
q3_2_sims_1 = cosine(q3_2_query_emb_1, q3_2_reference_emb_1)
q3_2_sims_2 = cosine(q3_2_query_emb_2, q3_2_reference_emb_2)

# explore the output:
print("query: beat (defeat (someone) in a game or other competitive situation)")
print("reference: beat (a main accent or rhythmic unit in music or poetry)")
print('Inter sense similarity:', round(torch.mean(q3_2_sims).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims_1).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims_2).item(),3))

Inter sense similarity: 0.831
Inter sense similarity: 0.965
Inter sense similarity: 0.486


# 2. Inter Sense Similarity

query: Rock: move gently to and fro or from side to side

references: Rock: the solid mineral material

In [None]:
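query_different_sense2 = [["she rocked the baby in her arms", "rocked"], # move gently to and fro or from side to side
                         ]

reference_different_sense2 = [
  ["the beds of rock are slightly tilted.", "rock"], # the solid mineral material
  ["There are dangerous rocks around the island.", "rocks"], # the solid mineral material
  ["This bread is (as) hard as a rock", "rock"], # the solid mineral material
  ["Moss can grow on bare rock", "rock"], # the solid mineral material
]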

In [None]:
q3_2_reference_emb2 = model_1.extract_representation(reference_different_sense2, layer = 10)
q3_2_query_emb2 = model_1.extract_representation(query_different_sense2, layer = 10)

q3_2_reference_emb2_1 = model_2.extract_representation(reference_different_sense2)
q3_2_query_emb2_1 = model_2.extract_representation(query_different_sense2)

q3_2_reference_emb2_2 = model_3.extract_representation(reference_different_sense2)
q3_2_query_emb2_2 = model_3.extract_representation(query_different_sense2)

# Take the cosine of every query with every reference
q3_2_sims2 = cosine(q3_2_query_emb2, q3_2_reference_emb2)
q3_2_sims2_1 = cosine(q3_2_query_emb2_1, q3_2_reference_emb2_1)
q3_2_sims2_2 = cosine(q3_2_query_emb2_2, q3_2_reference_emb2_2)

# explore the output:

print('Inter sense similarity:', round(torch.mean(q3_2_sims2).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims2_1).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims2_2).item(),3))

Inter sense similarity: 0.802
Inter sense similarity: 0.975
Inter sense similarity: 0.506


# 3. Inter Sense Similarity
query: Rest: cease work or movement in order to relax, sleep, or recover strength

references: Rest: the remaining part of something

In [None]:
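query_different_sense3 = [["he needed to rest after the feverish activity", "rest"], # cease work or movement in order to relax, sleep, or recover strength
                         ]

reference_different_sense3 = [
  ["What do you want to do for the rest of your life?", "rest"], # the remaining part of something
  ["I'll tell you the rest tomorrow night.", "rest"], # the remaining part of something
  ["The rest of us were experienced skiers", "rest"], # the remaining part of something
  ["We finished the rest of the cake.", "rest"] # the remaining part of something
  ]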

In [None]:
q3_2_reference_emb3 = model_1.extract_representation(reference_different_sense3, layer = 10)
q3_2_query_emb3 = model_1.extract_representation(query_different_sense3, layer = 10)

q3_2_reference_emb3_1 = model_2.extract_representation(reference_different_sense3)
q3_2_query_emb3_1 = model_2.extract_representation(query_different_sense3)

q3_2_reference_emb3_2 = model_3.extract_representation(reference_different_sense3)
q3_2_query_emb3_2 = model_3.extract_representation(query_different_sense3)

# Take the cosine of every query with every reference
q3_2_sims3 = cosine(q3_2_query_emb3, q3_2_reference_emb3)
q3_2_sims3_1 = cosine(q3_2_query_emb3_1, q3_2_reference_emb3_1)
q3_2_sims3_2 = cosine(q3_2_query_emb3_2, q3_2_reference_emb3_2)

# explore the output:

print('Inter sense similarity:', round(torch.mean(q3_2_sims3).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims3_1).item(),3))
print('Inter sense similarity:', round(torch.mean(q3_2_sims3_2).item(),3))

Inter sense similarity: 0.781
Inter sense similarity: 0.931
Inter sense similarity: 0.547


# 3. In your write-up
### 3.1 Write what word you chose. & 3.2 The sentences you chose for the different senses of the word

# Same sense
## beat: defeat (someone) in a game or other competitive situation
  - a. overcome (a problem or disease)
  - b. do or be better than (a record or score)

## Rock: move gently to and fro or from side to side
 - a. shake or cause to shake or vibrate, especially because of an impact, earthquake, or explosion
 - b. cause great shock or distress to (someone or something), especially so as to weaken or destabilize

## Rest: cease work or movement in order to relax, sleep, or recover strength
 - a. allow to be inactive in order to regain strength or health
 - b. (of a body) lie buried
 - c. used euphemistically by actors to indicate that they are out of work
 - d. leave (a player) out of a team temporarily

# Inter sense

Beat: defeat (someone) in a game or other competitive situation <br>
Beat: a main accent or rhythmic unit in music or poetry

Rock: move gently to and fro or from side to side <br>
Rock: the solid mineral material

Rest: cease work or movement in order to relax, sleep, or recover strength <br>
Rest: the remaining part of something

### 3.3 The differences you found in same sense and inter sense similarity across words.

I used only three models:
>model:RoBERTa, layer = 10 <br>
>model:RoBERTa, layer = 24 <br>
>model:BERT, layer = 24 <br>

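The ordering of the models does not change across experiments: for all three words, and for both same sense and inter sense similarity, the second model scores highest and the third scores lowest.

Same sense similarity is higher than inter sense similarity for every word and every model (for example, 0.902 vs. 0.831 for "beat" in the first model).

For the inter sense comparisons, I paired a verb sense (the query) with a noun sense (the references) of each word.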
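
To compare the two measures directly, the gap between same sense and inter sense similarity can be computed per word and per model. The sketch below only does arithmetic on the values printed in the cells above; the dictionary names are illustrative, and the three entries in each list follow the order in which the models were printed (`model_1`, `model_2`, `model_3`).

```python
# Mean similarities copied from the printed outputs above, in the order
# model_1, model_2, model_3.
same_sense = {
    "beat": [0.902, 0.976, 0.651],
    "rock": [0.873, 0.981, 0.621],
    "rest": [0.895, 0.964, 0.679],
}
inter_sense = {
    "beat": [0.831, 0.965, 0.486],
    "rock": [0.802, 0.975, 0.506],
    "rest": [0.781, 0.931, 0.547],
}

# Gap = same sense similarity minus inter sense similarity.
gaps = {
    word: [round(s - i, 3) for s, i in zip(same_sense[word], inter_sense[word])]
    for word in same_sense
}

for word, values in gaps.items():
    print(word, values)
```

On these numbers the gap is positive for every word and every model, and it is smallest for the second model, whose similarities are all above 0.93.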

**Optional question:** *Do you think word sense diversity plays a role in the gap between same sense and inter sense similarity values?*

## Question 4: Your turn!

Ask your own question! It could be about comparing the above results on different models, or different layers of the same model. Feel free to explore!

**Write about your analysis, what choices you made, and the results you got, and the conclusions you derived.**

# I reported my analysis in each write-up above