
## Lab 11: Exploring Word and Sentence Embeddings

In this session, we learn about representation learning: extracting useful features from text with deep neural models. Good representations are central to effective NLP applications. Instead of designing features by hand, we let pre-trained models capture complex patterns in text, including knowledge learned from corpora far larger than the data we are given.

Note: Please solve the exercises and turn in the IPYNB notebook under Canvas->Week11 no later than **end of today**.

## PART A: Exploring Word Representations (Features) from Pre-trained Models
## 1.1. Let's start with word vectors.

`Word2Vec` and `GloVe` are both popular algorithms used for obtaining word embeddings, which represent words as dense vectors in a continuous vector space. These methods help capture the semantic meanings of words based on their contexts in a corpus.

1. **Word2Vec**:

   Word2Vec is a shallow, two-layer neural network model that is trained to reconstruct linguistic contexts of words. It has two training methods: Continuous Bag of Words (CBOW) and Skip-gram. CBOW predicts the target word from its context, while Skip-gram predicts the context words from the target word.

2. **GloVe**:

   GloVe, short for Global Vectors, is an unsupervised learning algorithm for obtaining vector representations for words. It works on the co-occurrence statistics of words in a corpus, considering the global corpus statistics. GloVe combines the advantages of global matrix factorization and local context window methods.
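To make the two Word2Vec objectives and GloVe's co-occurrence statistics more concrete, here is a small pure-Python sketch. The function names (`skipgram_pairs`, `cbow_examples`, `cooccurrence_counts`), the toy sentence, and the window size of 1 are illustrative assumptions; neither algorithm's reference implementation builds its data with exactly this code, and the raw, unweighted counts used here are a simplification.

```python
from collections import Counter

def skipgram_pairs(tokens, window=1):
    """(target, context) pairs: Skip-gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """(context words, target) examples: CBOW predicts the target from its context."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

def cooccurrence_counts(tokens, window=1):
    """Word-word co-occurrence counts within the window (a global corpus statistic)."""
    return Counter(skipgram_pairs(tokens, window))

tokens = "the king rules the land".split()
print(skipgram_pairs(tokens)[:4])
print(cbow_examples(tokens)[1])
print(cooccurrence_counts(tokens)[("the", "king")])
```

Skip-gram turns each window into several (target, context) pairs, while CBOW keeps the whole context together as a single input; the co-occurrence counter aggregates the same windows over the entire corpus.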

Here's an example of how to compute word similarity using a pre-trained Word2Vec model in Python.


In [5]:
import gensim.downloader as api

# Download and load a pre-trained Word2Vec model (trained on Google News data)
w2v_model = api.load("word2vec-google-news-300")

# Compute similarity between words
def compute_similarity(model, word1, word2):
    try:
        return model.similarity(word1, word2)
    except KeyError:
        return 0  # return 0 if either of the words is not in the vocabulary

# Examples of word similarity
word1 = 'king'
word2 = 'queen'
word3 = 'tiger'

w2v_similarity1 = compute_similarity(w2v_model, word1, word2)
w2v_similarity2 = compute_similarity(w2v_model, word2, word3)
w2v_similarity3 = compute_similarity(w2v_model, word1, word3)

print(f"Word2Vec similarity between {word1} and {word2}: {w2v_similarity1}")
print(f"Word2Vec similarity between {word2} and {word3}: {w2v_similarity2}")
print(f"Word2Vec similarity between {word1} and {word3}: {w2v_similarity3}")

Word2Vec similarity between king and queen: 0.6510956883430481
Word2Vec similarity between queen and tiger: 0.08486607670783997
Word2Vec similarity between king and tiger: 0.1430056393146515


As expected, the word `king` is more similar to `queen` than to an animal such as `tiger`.

## 1.2. Finding Most Similar Words

We find the words most similar to a given word by identifying its nearest neighbors in the N-dimensional embedding space. This requires a distance or similarity metric (e.g., Euclidean distance or cosine similarity); gensim's `most_similar` ranks neighbors by cosine similarity.

In [11]:
# Helper: keep only those neighbors of target_word that also appear in word_list
def similar_words(word_list, target_word):
    return sorted((word, sim) for word, sim in w2v_model.most_similar(target_word) if word in word_list)

word_to_check = 'king'
k = 10

most_similar_words = w2v_model.most_similar(word_to_check,topn=k)
print (most_similar_words)


[('kings', 0.7138045430183411), ('queen', 0.6510956883430481), ('monarch', 0.6413194537162781), ('crown_prince', 0.6204220056533813), ('prince', 0.6159993410110474), ('sultan', 0.5864824056625366), ('ruler', 0.5797567367553711), ('princes', 0.5646552443504333), ('Prince_Paras', 0.5432944297790527), ('throne', 0.5422105193138123)]
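To show what a nearest-neighbor search by cosine similarity involves, here is a minimal NumPy sketch. The 3-dimensional vectors in `toy_vectors` are made up for illustration (they are not real Word2Vec values), and `nearest_neighbors` is a hypothetical helper, not part of gensim.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (made up for illustration, not real Word2Vec values)
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "tiger": np.array([0.1, 0.2, 0.9]),
    "lion":  np.array([0.2, 0.1, 0.9]),
}

def nearest_neighbors(word, vectors, topn=2):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word] / np.linalg.norm(vectors[word])
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue  # a word is trivially its own nearest neighbor, so skip it
        scores.append((other, float(query @ (vec / np.linalg.norm(vec)))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

print(nearest_neighbors("king", toy_vectors))
```

With these toy values, `queen` ranks first for `king` because their vectors point in almost the same direction, while the two animal words score much lower.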


We can also perform some conditional queries. E.g., `what word is most similar to "king" and "woman" but opposite of "man" (i.e., very dissimilar to "man")`.

In [15]:
sims = w2v_model.most_similar(positive=['king', 'woman'], negative=['man'])

print (sims)

[('queen', 0.7118193507194519), ('monarch', 0.6189674139022827), ('princess', 0.5902431011199951), ('crown_prince', 0.5499460697174072), ('prince', 0.5377321839332581), ('kings', 0.5236844420433044), ('Queen_Consort', 0.5235945582389832), ('queens', 0.5181134343147278), ('sultan', 0.5098593831062317), ('monarchy', 0.5087411999702454)]
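Such a query can be read as vector arithmetic: add the vectors of the positive words, subtract those of the negative words, and look for the closest remaining word. The sketch below illustrates that idea with hand-made 2-D vectors; the vectors and the `analogy` helper are assumptions for illustration and only mirror the spirit of the query, not gensim's exact implementation.

```python
import numpy as np

# Hand-made 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender" (illustrative only)
toy_vectors = {
    "king":   np.array([1.0,  1.0]),
    "queen":  np.array([1.0, -1.0]),
    "man":    np.array([0.0,  1.0]),
    "woman":  np.array([0.0, -1.0]),
    "prince": np.array([0.7,  1.0]),
}

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def analogy(positive, negative, vectors):
    """Return the word closest (by cosine) to sum(positive) - sum(negative),
    excluding the query words themselves."""
    target = sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative)
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: float(unit(vectors[w]) @ unit(target)))

print(analogy(["king", "woman"], ["man"], toy_vectors))
```

In this toy space, removing `man` and adding `woman` flips the "gender" axis of `king` while keeping its "royalty" component, so the closest remaining word is `queen`.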


In [16]:
# Function to extract sentence vector from word vectors by averaging word embeddings
from scipy.spatial.distance import cosine
import numpy as np

def extract_sentence_vector(sentence):
    words = sentence.split()
    word_vectors = [w2v_model[word] for word in words if word in w2v_model]
    if not word_vectors:
        return None  # Return None if no word vectors are found
    sentence_vector = np.mean(word_vectors, axis=0)
    return sentence_vector

# Example of extracting sentence vector using mean of word vectors
example_sentence1 = "Natural Language Processing is a branch of Artificial Intelligence".lower()
example_sentence2 = "The quick brown fox jumped over the laze dog".lower()
example_sentence3 = "In the jungle the mighty jungle the lion sleeps tonight".lower()

sentence_vector1 = extract_sentence_vector(example_sentence1)
sentence_vector2 = extract_sentence_vector(example_sentence2)
sentence_vector3 = extract_sentence_vector(example_sentence3)

def similarity(x1, x2):
    # scipy's cosine() returns the cosine distance, so similarity = 1 - distance
    return 1 - cosine(x1, x2)

sim1 = similarity(sentence_vector1, sentence_vector2)
sim2 = similarity(sentence_vector2, sentence_vector3)
sim3 = similarity(sentence_vector1, sentence_vector3)

print (f"Similarity between sentence 1 and 2, score:  {sim1}")
print (f"Similarity between sentence 2 and 3, score:  {sim2}")
print (f"Similarity between sentence 1 and 3, score:  {sim3}")

Similarity between sentence 1 and 2, score:  0.23709803819656372
Similarity between sentence 2 and 3, score:  0.56390380859375
Similarity between sentence 1 and 3, score:  0.265546590089798


## 1.3. Extracting Sentence Vectors from Word Vectors

We have done this quite a few times now. Extracting a sentence vector is easy: we take the mean (average) of the word vectors of all words in the sentence, skipping words that are missing from the model's vocabulary, as `extract_sentence_vector` in the cell above does.

## 1.4. Zero-shot Text Classification using Sentence Vectors

We have seen in Practicum 5 how word vectors such as GloVe can be used to classify text. We extract sentence vectors (by averaging word vectors) and treat them as features. We extract training and test feature sets from the train/test splits, then train a classifier (such as `LogisticRegression`) on the training features and labels and evaluate it on the test set.

Here, let's conceptualize the task of classification as a problem of finding similarity between the input sentence and some generic sentences representing class labels. Based on the similarity, we then come up with the class labels.
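As a rough sketch of this idea, the snippet below picks the label whose representative vector is most similar to the input vector. The 2-D vectors are made up and stand in for real sentence embeddings, and `zero_shot_label` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_label(input_vec, label_vecs):
    """Pick the label whose prototype vector is most similar to the input.
    No training data is involved, only similarity comparisons."""
    scores = {label: cosine_sim(input_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up 2-D vectors standing in for real sentence embeddings
label_vecs = {"positive": np.array([1.0, 0.2]), "negative": np.array([0.2, 1.0])}
review_vec = np.array([0.9, 0.4])

label, scores = zero_shot_label(review_vec, label_vecs)
print(label, scores)
```

Because the labels are just entries in a dictionary, the same function works unchanged for more than two classes.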

Since we do not use any training data in this setting, this can be referred to as zero-shot learning. Let's classify movie reviews here.

In [17]:
generic_sentence1 = "positive"
generic_sentence2 = "negative"

pos_vec = extract_sentence_vector(generic_sentence1)
neg_vec = extract_sentence_vector(generic_sentence2)


def predict_sentiment_zero_shot(sentence):
  vec = extract_sentence_vector(sentence)

  sim1 = similarity(pos_vec, vec)
  sim2 = similarity(neg_vec, vec)

  if sim1 > sim2:
    print (f"Predicted sentiment: 'positive' , score: POS: {sim1} NEG: {sim2}")
  else:
    # sim1 is always the POS score and sim2 the NEG score
    print (f"Predicted sentiment: 'negative' , score: POS: {sim1} NEG: {sim2}")


sentence1 = "The movie was amazing . I liked the use of CGI".lower()
sentence2 = "The movie was horrible . I did not like the use of CGI".lower()
sentence3 = "The movie was somewhat good . However , I did not find the acting appealing and won't recommend".lower()

predict_sentiment_zero_shot(sentence1)
predict_sentiment_zero_shot(sentence2)
predict_sentiment_zero_shot(sentence3)

Predicted sentiment: 'positive' , score: POS: 0.2235041707754135 NEG: 0.2126578837633133
Predicted sentiment: 'negative' , score: POS: 0.20714746415615082 NEG: 0.2664741575717926
Predicted sentiment: 'positive' , score: POS: 0.27916139364242554 NEG: 0.2693459689617157


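As we can see, the POS and NEG scores are close for every review, and the third review, which ends with "won't recommend", is still predicted as positive. Averaged word vectors pay little attention to context: each word contributes the same vector regardless of its neighbors or its position in the sentence. We also did no preprocessing (e.g., stop-word removal and lemmatization), which would have removed uninformative words such as forms of `be` and prepositions.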
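The order-insensitivity of averaged word vectors can be checked directly. The sketch below uses made-up 3-D word vectors and a hypothetical `mean_vector` helper that follows the same averaging logic as `extract_sentence_vector`; two sentences with opposite meanings end up with the same vector.

```python
import numpy as np

# Made-up word vectors (illustrative only, not real Word2Vec values)
toy_vectors = {
    "dog":   np.array([1.0, 0.0, 0.0]),
    "bites": np.array([0.0, 1.0, 0.0]),
    "man":   np.array([0.0, 0.0, 1.0]),
}

def mean_vector(sentence, vectors):
    """Average the vectors of all in-vocabulary words; None if no word is known."""
    found = [vectors[w] for w in sentence.split() if w in vectors]
    return np.mean(found, axis=0) if found else None

v1 = mean_vector("dog bites man", toy_vectors)
v2 = mean_vector("man bites dog", toy_vectors)

# Both sentences contain the same words, so their averaged vectors coincide
print(np.allclose(v1, v2))
```

This is one reason contextual models, which process each word in light of its neighbors, can separate sentences that a bag-of-vectors average cannot.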

## PART B: Exploring Sentence Representations (Features) from Pre-trained Models

Now, with the advent of deeper neural architectures such as transformers, it is possible to extract sentence representations directly from a pre-trained transformer network. An example of a pre-trained transformer-based model is BERT (Bidirectional Encoder Representations from Transformers).

Sentence embeddings in BERT are representations of entire sentences generated by the BERT model. These embeddings capture the semantic meaning and contextual information of the input sentences. BERT employs a transformer architecture that processes words in the context of their surrounding words, enabling it to create contextualized word embeddings. By pooling or averaging these word embeddings, BERT can generate a fixed-size vector representation for the entire sentence. This embedding encodes various aspects of the sentence's meaning, including syntax, semantics, and contextual information, making it useful for various natural language processing tasks such as sentence similarity, classification, and translation.
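The pooling step can be sketched in a few lines of NumPy. This is a simplified illustration, assuming we already have per-token embeddings and an attention mask marking real tokens versus padding; the array values are made up, and `mean_pool` is a hypothetical helper rather than a function from any library.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real tokens only, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden) array
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy batch: 2 sentences, 3 token slots, hidden size 2 (values are made up)
token_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],   # third slot is padding
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
])
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

print(mean_pool(token_embeddings, attention_mask))
```

Masking matters because sentences in a batch are padded to a common length; without it, padding vectors would leak into the sentence embedding.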

Let's try to repeat some of the above sections but with BERT based sentence embeddings this time.

Ingredients:
- the `sentence_transformers` library: A Python library built on top of the `transformers` library by Hugging Face. We need to install it, as it may not already be part of the Colab environment.

- the `bert-base-uncased` model: A BERT model variant, pre-trained on a large corpus of lowercased English text (BooksCorpus and English Wikipedia).





In [18]:
!pip install sentence_transformers



## 2.1 Computing sentence similarity using BERT based sentence vectors.

In [19]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

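# Load a pre-trained BERT model
# (bert-base-uncased is a plain BERT checkpoint, so sentence_transformers adds a mean-pooling layer on top of it)
model = SentenceTransformer('bert-base-uncased')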

# Example sentences

example_sentence1 = "Natural Language Processing is a branch of Artificial Intelligence".lower()
example_sentence2 = "The quick brown fox jumped over the laze dog".lower()
example_sentence3 = "In the jungle the mighty jungle the lion sleeps tonight".lower()

all_sentences = [example_sentence1, example_sentence2, example_sentence3]
sentence_embeddings = model.encode(all_sentences)

# Compute similarity between the first two sentences
similarity_score = similarity(sentence_embeddings[0],sentence_embeddings[1])
print(f"Similarity between the first two sentences: {similarity_score}")

# Compute similarity between the first and the third sentence
similarity_score = similarity(sentence_embeddings[0],sentence_embeddings[2])
print(f"Similarity between the first and the third sentence: {similarity_score:.4f}")

# Compute similarity between the second and the third sentence
similarity_score = similarity(sentence_embeddings[1],sentence_embeddings[2])
print(f"Similarity between the second and the third sentence: {similarity_score:.4f}")




Similarity between the first two sentences: 0.4472642242908478
Similarity between the first and the third sentence: 0.4397
Similarity between the second and the third sentence: 0.6144
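When there are more than a few sentences, it can be convenient to compute all pairwise similarities at once. The following is a minimal NumPy sketch, assuming the embeddings are stacked in a 2-D array with one row per sentence; the 2-D values below are made-up stand-ins for the real embeddings, and `pairwise_cosine` is a hypothetical helper name.

```python
import numpy as np

def pairwise_cosine(embeddings):
    """Return the (n, n) matrix of cosine similarities between all rows."""
    embeddings = np.asarray(embeddings, dtype=float)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Made-up 2-D stand-ins for three sentence embeddings
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sim_matrix = pairwise_cosine(toy_embeddings)
print(np.round(sim_matrix, 3))
```

Entry `[i, j]` of the matrix is the similarity between sentence `i` and sentence `j`, so the diagonal is always 1. The `cosine_similarity` function from `sklearn.metrics.pairwise`, imported in the cell above, should give the same matrix when passed a 2-D array of embeddings.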


## Exercise E1: Repeat Zero-shot Text Classification using Sentence Vectors (Section 1.4)

Repeat Section 1.4, "Zero-shot Text Classification using Sentence Vectors", but this time use BERT sentence vectors instead of averaged word embeddings.

Change the generic sentences from `positive` and `negative` to sentential forms like `This is a positive sentence` and `This is a negative sentence`.

Do you see any difference in predictions for the same examples we used in Section 1.4?

**Optional Exercise**: Load the `test.csv` file for IMDB movie review classification and compute the accuracy of the zero-shot classifier on the test data. Is it better or worse than the accuracy figures that we saw in our machine learning lab 7? Comment.




