In this homework, you'll explore incorporating BERT and SentenceTransformers into the Lesk algorithm for word sense disambiguation.  (You'll likely want to run this on Colab.)

In [None]:
!pip install transformers
!pip install sentence-transformers

In [None]:
from transformers import BertModel, BertTokenizer
import numpy as np
from nltk.corpus import wordnet as wn
import nltk
from scipy.spatial.distance import cosine
import operator
import torch
from math import sqrt

In [None]:
nltk.download('wordnet')

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

**Q1:** The Lesk algorithm we discussed in class uses information about the *context* around a term when calculating the similarity between a word in a sentence and a word in a dictionary gloss.  For instance, [Basile et al. 2014](https://www.aclweb.org/anthology/C/C14/C14-1151.pdf) use static word vectors to provide this context-level information, where we measure the similarity between a gloss $g = \{ g_1, \ldots, g_G \}$ and a context $c = \{ c_1, \ldots, c_C \}$ as the cosine similarity between the sums of their distributed representations:

$$
\cos \left(\sum_{i=1}^G g_i, \sum_{i=1}^C c_i  \right)
$$
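
As a rough illustration of the formula above, the sketch below sums toy word vectors for a context and for two glosses, then compares them by cosine similarity. The three-dimensional vectors, the `toy_vectors` table, and the helper names are all invented for this example; they are not taken from Basile et al. or from any trained embedding model.

```python
from math import sqrt

# Toy 3-dimensional "static" word vectors. The values are made up for
# illustration and do not come from any trained model.
toy_vectors = {
    "financial": [0.9, 0.1, 0.0],
    "institution": [0.8, 0.2, 0.1],
    "river": [0.0, 0.9, 0.3],
    "side": [0.1, 0.7, 0.4],
    "money": [0.95, 0.05, 0.1],
    "withdrew": [0.7, 0.1, 0.2],
}

def sum_vectors(words):
    """Add up the toy vectors of the given words, dimension by dimension."""
    return [sum(dims) for dims in zip(*(toy_vectors[w] for w in words))]

def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Context words from the target sentence, and content words from two glosses.
context = sum_vectors(["withdrew", "money"])
gloss_finance = sum_vectors(["financial", "institution"])
gloss_river = sum_vectors(["side", "river"])

score_finance = cosine(gloss_finance, context)
score_river = cosine(gloss_river, context)
print("finance: %.3f  river: %.3f" % (score_finance, score_river))
```

With these hand-picked vectors the financial gloss scores higher than the river gloss, which is the behavior the method relies on; real static embeddings would of course give different numbers.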

However, over the past few weeks we've considered how contextual language models like BERT already provide sentence-level contextualization for words.  So, given a target sentence ("I withdrew money from the *bank*") with a target term (bank) and a list of dictionary glosses/examples corresponding to different senses ("A bank is a financial institution" = bank1; "A bank is the side of a river" = bank2), let's adapt the Lesk algorithm to simply calculate the similarity between the average BERT embedding of all words in the target sentence (including the [CLS] and [SEP] tokens) and the average BERT embedding of all words in the sense gloss (again including [CLS] and [SEP]):

$$
\cos \left(\frac{1}{G}\sum_{i=1}^G \textrm{BERT}(g_{i}), \frac{1}{C} \sum_{j=1}^C \textrm{BERT}(c_{j}) \right)
$$
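
The averaging step in the formula above can be tried out without loading BERT. The sketch below uses invented four-dimensional vectors as stand-ins for per-token BERT outputs (real `bert-base-uncased` vectors have 768 dimensions); `mean_pool`, `sum_pool`, and `cosine` are hypothetical helper names for this example only. It also checks one property worth knowing: because cosine similarity ignores vector length, averaging and summing the token vectors give the same score.

```python
from math import sqrt

def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into a single vector."""
    n = len(token_vectors)
    return [sum(dims) / n for dims in zip(*token_vectors)]

def sum_pool(token_vectors):
    """Sum a list of equal-length token vectors into a single vector."""
    return [sum(dims) for dims in zip(*token_vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Invented stand-ins for per-token embeddings: one inner list per token.
gloss_tokens = [
    [0.2, 0.1, 0.5, 0.3],
    [0.4, 0.0, 0.6, 0.1],
    [0.3, 0.2, 0.4, 0.2],
]
context_tokens = [
    [0.1, 0.3, 0.4, 0.2],
    [0.5, 0.1, 0.7, 0.0],
]

mean_score = cosine(mean_pool(gloss_tokens), mean_pool(context_tokens))
sum_score = cosine(sum_pool(gloss_tokens), sum_pool(context_tokens))
print("mean-pooled: %.6f  sum-pooled: %.6f" % (mean_score, sum_score))
```

In the actual assignment the token vectors would come from the BERT model's output for the tokenized sentence or gloss, with one vector per token including [CLS] and [SEP].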


* The gloss for a synset can be found in `synset.definition()`.
* A `cosine_similarity` function for two vectors is provided below.
* `wn.synsets(word, pos=part_of_speech)` gets you a list of the synsets for a word with a specific part of speech (e.g., "n" for noun).
* Feel free to draw on the code you've already seen for getting the BERT embeddings for words (e.g., `3.embeddings/BERT.ipynb`).

In [None]:
def cosine_similarity(vec1, vec2):
  return np.dot(vec1, vec2)/(sqrt(np.dot(vec1, vec1)) * sqrt(np.dot(vec2, vec2)))
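
For reference, the function above computes the standard cosine similarity, written here with $u$ and $v$ standing for `vec1` and `vec2`:

$$
\cos(u, v) = \frac{u \cdot v}{\sqrt{u \cdot u}\,\sqrt{v \cdot v}}
$$

The value lies between $-1$ and $1$ and is undefined when either vector is all zeros, since the denominator is then zero.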

In [None]:
def bert_lesk(word, sentence, part_of_speech):

    def get_bert_for_token(string, term):

        # your code here

        return


    context_vector=get_bert_for_token(sentence, word)

    synsets=...

    vals={}
    for synset in synsets:
        vector=get_bert_for_token(...)
        vals[synset]=cosine_similarity(...)

    sorted_x = sorted(vals.items(), key=operator.itemgetter(1), reverse=True)
    for k,v in sorted_x:
        print("%.3f\t%s\t%s"% (v,k,k.definition()))
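
The ranking step at the end of `bert_lesk` can be tried in isolation. The sketch below uses hypothetical scores and made-up sense labels as dictionary keys (in the function above, the keys are WordNet synsets); it sorts the senses from most to least similar and takes the top one as the predicted sense.

```python
import operator

# Hypothetical similarity scores keyed by made-up sense labels.
vals = {
    "bank (river side)": 0.412,
    "bank (financial)": 0.731,
    "bank (row of objects)": 0.298,
}

# Sort (label, score) pairs by score, highest first.
ranked = sorted(vals.items(), key=operator.itemgetter(1), reverse=True)
for sense, score in ranked:
    print("%.3f\t%s" % (score, sense))

# The first entry is the sense with the highest similarity.
best_sense = ranked[0][0]
```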


Execute the following two cells to check whether your implementation distinguishes between these two senses of "bank".

In [None]:
bert_lesk("bank", "I deposited my money into my savings account at the bank", "n")

In [None]:
bert_lesk("bank", "I ran along the river bank", "n")

**Q2:** Now do the same thing with SentenceBERT.  For a gloss $g$ and a target sentence $c$ containing the word to disambiguate, calculate their similarity as the cosine similarity between their SentenceBERT vectors:

$$
\cos \left(\textrm{SBERT}(g), \textrm{SBERT}(c) \right)
$$


In [None]:
from sentence_transformers import SentenceTransformer
sentence_model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')

In [None]:
def sentencebert_lesk(word, sentence, part_of_speech):

    # your code here

    vals={}

    # your code here

    sorted_x = sorted(vals.items(), key=operator.itemgetter(1), reverse=True)
    for k,v in sorted_x:
        print("%.3f\t%s\t%s"% (v,k,k.definition()))


Execute the following two cells to check whether your implementation of SentenceBERT-Lesk distinguishes between these two senses of "bank".

In [None]:
sentencebert_lesk("bank", "I deposited my money into my savings account at the bank", "n")

In [None]:
sentencebert_lesk("bank", "I ran along the river bank", "n")

To turn in:

- Go to `File > Download > Download .ipynb` and save your notebook.
- In your browser, print this page and save it as a PDF.
- Upload both your .ipynb and .pdf files to bCourses as usual.