In a previous homework, you constructed semantic axes using static word embeddings, defining a continuum for a particular concept (rich-poor, sad-happy, etc.) along which you can situate *any* word. As we've discussed, one issue with static embeddings is lexical *polysemy*: a static embedding must encode *all* meanings of a word, and a word like "rich" has other senses beyond describing wealth (e.g., "rich dessert"). In this homework, you'll address this by constructing *contextual* semantic axes.

This homework is inspired by the methods in the following paper, which you should read both to get a sense of what contextual vectors can be used for and as an example to guide your final project reports: Lucy et al. (2022), [Discovering Differences in the Representation of People using Contextualized Semantic Axes](https://aclanthology.org/2022.emnlp-main.228/).

In [None]:
!pip install transformers

In [None]:
from transformers import BertModel, BertTokenizer
import numpy as np

Use BERT base (`bert-base-uncased`, loaded below) on Colab; if you can't run on Colab, feel free to use one of the smaller BERT models we discussed in class.

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

Here are some functions that may be useful.

In [None]:
def cosine_similarity(a, b):
    return np.dot(a, b)/(np.linalg.norm(a)*np.linalg.norm(b))
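
As a quick sanity check on the helper above, here is a small sketch with made-up 2-dimensional vectors (not real embeddings). It illustrates that cosine similarity depends only on direction, not length, and ranges from -1 to 1.

```python
import numpy as np

def cosine_similarity(a, b):
    # Same formula as the helper above: dot product over the product of norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, invented for illustration only.
v = np.array([1.0, 0.0])
same_direction = np.array([3.0, 0.0])   # longer, but points the same way
orthogonal = np.array([0.0, 2.0])       # at a right angle to v
opposite = np.array([-2.0, 0.0])        # points the other way

print(cosine_similarity(v, same_direction))  # 1.0
print(cosine_similarity(v, orthogonal))      # 0.0
print(cosine_similarity(v, opposite))        # -1.0
```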

In [None]:
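def get_bert_for_token(string, term):
    """Return the final-layer BERT vector for the first occurrence of `term` in `string`.

    `term` must match a single WordPiece token exactly (lowercase for the uncased
    model); otherwise `tokens.index` raises a ValueError.
    """

    # tokenize the sentence ([CLS] and [SEP] are added automatically)
    inputs = tokenizer(string, return_tensors="pt")

    # convert input ids back to WordPiece tokens
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    print(tokens)

    # find the first location of the query term among those tokens (so we know which BERT rep to use)
    term_idx = tokens.index(term)

    outputs = model(**inputs)

    # last_hidden_state has shape (batch, sequence length, hidden size); take the vector
    # for that token index and convert the PyTorch tensor to a numpy array
    return outputs.last_hidden_state[0][term_idx].detach().numpy()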


In [None]:
query_rep=get_bert_for_token("I ate some jam with toast", "jam")
print(query_rep.shape)
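
One caveat: `tokens.index(term)` only works when the term survives tokenization as a single token. If the tokenizer splits your term into several word pieces, one option is to locate the whole run of pieces and average their vectors. The sketch below shows only the lookup step on a plain Python list. The helper name `find_subsequence` and the token list are hypothetical; the actual split depends on the tokenizer's vocabulary, and in practice you could get the pieces from `tokenizer.tokenize(term)`.

```python
def find_subsequence(tokens, pieces):
    """Return (start, end) of the first occurrence of `pieces` in `tokens`, or None."""
    n = len(pieces)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == pieces:
            return start, start + n
    return None

# Hypothetical WordPiece output, written by hand for illustration.
tokens = ["[CLS]", "the", "toast", "##ie", "was", "good", "[SEP]"]
pieces = ["toast", "##ie"]

span = find_subsequence(tokens, pieces)
print(span)  # (2, 4)
```

With a span `(start, end)` in hand, you could average the rows `start` through `end - 1` of `last_hidden_state[0]` rather than taking a single row. Averaging is one common choice, not the only one.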

With static semantic axes, you defined the endpoints by choosing sets of words (e.g., {happy, elated} vs. {sad, unhappy}).  Here you will define them with sentences that contain the term (e.g., "It was Saturday morning and John woke up **happy**.")  

**Q1**. Describe the concept for your contextual semantic axis and write 5 sentences for each of its two endpoints (10 sentences total), each containing a term that expresses that endpoint.

**Q2**. Now you will create your semantic axis. Use the `get_bert_for_token` code above to get the BERT embedding for the target word in each of your sentences. Create a positive vector $V^+$ as the average of the BERT embeddings for one endpoint, and a negative vector $V^-$ as the average of the BERT embeddings for the other endpoint. Following the SemAxis structure, your axis is then $V_\textrm{axis}=V^+-V^-$. If you are using BERT base, your axis should be a 768-dimensional vector.
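
To make the arithmetic concrete, here is a minimal sketch of the averaging and subtraction, using random toy vectors as stand-ins for BERT embeddings so that it runs without the model. The helper name `build_axis` and the variables `positive_reps` and `negative_reps` are hypothetical; in your notebook the lists would hold the outputs of `get_bert_for_token` for your own sentences.

```python
import numpy as np

# Random stand-ins for BERT embeddings: 5 vectors per endpoint.
# These are NOT real embeddings; they only make the sketch runnable.
rng = np.random.default_rng(0)
dim = 768  # hidden size of BERT base
positive_reps = [rng.normal(size=dim) for _ in range(5)]
negative_reps = [rng.normal(size=dim) for _ in range(5)]

def build_axis(pos_vectors, neg_vectors):
    """Return V_axis = mean(pos_vectors) - mean(neg_vectors)."""
    v_plus = np.mean(np.stack(pos_vectors), axis=0)    # V^+
    v_minus = np.mean(np.stack(neg_vectors), axis=0)   # V^-
    return v_plus - v_minus

axis = build_axis(positive_reps, negative_reps)
print(axis.shape)  # (768,)
```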

**Q3**. Now select 5 terms **in context** that you want to situate along that axis. For each term, construct a sentence that uses it, get the term's BERT embedding in that sentence, and take the cosine similarity of that embedding with the axis you've defined. E.g., for the term "wicked" in the context sentence "That homework is **wicked** hard", you will take the BERT embedding for "wicked" in that sentence and situate it along the axis you defined above.

$$
\textrm{score}= \textrm{cos}(\textrm{wicked}_\textrm{That homework is wicked hard}, V_\textrm{axis})
$$
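
The scoring step can be sketched with toy 3-dimensional vectors, which make the sign of the score easy to read: a positive score leans toward the $V^+$ end of the axis, a negative score toward the $V^-$ end, and a score near zero means the vector is roughly orthogonal to the axis. All names and values below (`v_axis`, `token_reps`, `term_a`, and so on) are invented for illustration and are not BERT output.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy axis and toy "token in context" vectors (hypothetical values).
v_axis = np.array([1.0, 0.0, 0.0])
token_reps = {
    "term_a": np.array([2.0, 0.0, 0.0]),    # points along the axis
    "term_b": np.array([-1.0, 0.0, 0.0]),   # points against the axis
    "term_c": np.array([0.0, 1.0, 1.0]),    # orthogonal to the axis
}

# Score each term, then list the terms from the V+ end to the V- end.
scores = {term: float(cosine_similarity(rep, v_axis)) for term, rep in token_reps.items()}
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:+.2f}")
```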

**check-plus**. You've now situated words in context along a semantic axis. How would you know whether this method is accurate? In 200 words, brainstorm an evaluation that you could carry out. Provide enough detail that we could carry it out from your description alone.

To turn in:

- Go to `File > Download > Download .ipynb` and save your notebook.
- In your browser, print this page to save as PDF.
- Upload both your .ipynb and .pdf files to bCourses as usual.