# Practical 1 Notebook and Report
## I. A step-by-step guide to using a trained model
The first step is to load the model.

Information on how to use the repository is provided in the README.

In [1]:
import torch

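# Forward slashes avoid backslash escape sequences; on PyTorch >= 2.6, loading a full
# pickled model may also require weights_only=False (only for trusted checkpoints).
model = torch.load("model_checkpoint/BiLSTMConcat_checkpoint_epoch5.pt")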

Now that the model is loaded, we can give it two sentences and let it predict their relationship: entailment, neutral or contradiction.
First, however, the data needs to be pre-processed. Take the following two sentences:

"Harvey Specter is solving a case and he thinks about bribing the police."
"Harvey corrupts the police."

The premise entails the hypothesis, so the model should predict label 0 (the labels are encoded as entailment = 0, neutral = 1 and contradiction = 2).
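As a minimal sketch of this label encoding, the mapping can be written as a lookup table and used to decode the model's three-way output. `LABELS` and `predicted_label` are hypothetical helpers, not part of the repository; the example probabilities are the ones the model outputs for this pair in the softmax cell further down.

```python
# Hypothetical helper: position i of the model output corresponds to LABELS[i]
LABELS = ("entailment", "neutral", "contradiction")


def predicted_label(probs):
    """Return the name of the class with the highest probability."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best_index]


print(predicted_label([0.9438, 0.0391, 0.0171]))  # entailment
```

With the PyTorch output tensor of shape (1, 3), the same index is given by `probs.argmax(dim=1).item()`.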

In [30]:
import json

import nltk
from data import prepare_example  # helper module from this repository

premise = "Harvey Specter is solving a case and he thinks about bribing the police."
hypothesis = "Harvey corrupts the police."

# First, tokenize the sentences
premise = nltk.tokenize.word_tokenize(premise)
hypothesis = nltk.tokenize.word_tokenize(hypothesis)

# Then, lowercase all the tokens
premise = [word.lower() for word in premise]
hypothesis = [word.lower() for word in hypothesis]

# Load the vocabulary
with open("data/data.json", "r") as file:
    vocab = json.load(file)

# Map each token to its index in the embedding table used to train the model
premise = prepare_example(premise, vocab)
hypothesis = prepare_example(hypothesis, vocab)

print(premise)
print(hypothesis)

tensor([[35960,     0,    12,  8140,     2,  2828,    26,    73,  7664,   761,
             0,    35,  2271,    11]], device='cuda:0')
tensor([[35960,     0,    35,  2271,    11]], device='cuda:0')
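The exact behavior of `prepare_example` is defined in the repository's `data` module. As a rough sketch, assuming `vocab` is a plain dict from token to index and that out-of-vocabulary tokens fall back to index 0 (consistent with "specter" and "corrupts" both mapping to 0 above, but not confirmed), the lookup step could look like this. `to_indices` is a hypothetical name, and the sketch omits the batching and the move to the GPU that the real function performs.

```python
def to_indices(tokens, vocab, unk_index=0):
    """Map each token to its vocabulary index, using unk_index for unknown tokens.

    Hypothetical stand-in for the lookup part of prepare_example.
    """
    return [vocab.get(token, unk_index) for token in tokens]


# Toy vocabulary built from the indices printed above
toy_vocab = {"harvey": 35960, "the": 35, "police": 2271, ".": 11}

print(to_indices(["harvey", "corrupts", "the", "police", "."], toy_vocab))
# [35960, 0, 35, 2271, 11]
```

Under this assumption, every unknown word collapses onto the same index, so the model cannot tell a rare name from a rare verb.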


Running the cell below shows that the model assigns the highest probability (about 0.94) to the first class, which corresponds to entailment.

In [31]:
import torch.nn as nn

softmax = nn.Softmax(dim=1)

print(softmax(model(premise, hypothesis)))

tensor([[0.9438, 0.0391, 0.0171]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
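`nn.Softmax(dim=1)` turns each row of the model's raw output scores into a probability distribution over the three classes. A minimal pure-Python sketch of the same computation for a single row follows; the input scores are made-up values for illustration, since the notebook only prints the resulting probabilities. Subtracting the maximum before exponentiating avoids overflow and does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative (made-up) scores for entailment, neutral, contradiction
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])
```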


## II. Why does the model sometimes fail?

Premise - “Two men sitting in the sun”
Hypothesis - “Nobody is sitting in the shade”

Label - Neutral (likely predicts contradiction)

Premise - “A man is walking a dog”
Hypothesis - “No cat is outside”

Label - Neutral (likely predicts contradiction)

Can you think of a possible reason why the model would fail in such cases?


In [52]:
import json

import nltk
from data import prepare_example  # helper module from this repository

premise_1 = "Two men sitting in the sun"
hypothesis_1 = "Nobody is sitting in the shade"

premise_2 = "A man is walking a dog"
hypothesis_2 = "No cat is outside"

# First, tokenize the sentences
premise_1 = nltk.tokenize.word_tokenize(premise_1)
hypothesis_1 = nltk.tokenize.word_tokenize(hypothesis_1)

premise_2 = nltk.tokenize.word_tokenize(premise_2)
hypothesis_2 = nltk.tokenize.word_tokenize(hypothesis_2)

# Then, lowercase all the tokens
premise_1 = [word.lower() for word in premise_1]
hypothesis_1 = [word.lower() for word in hypothesis_1]

premise_2 = [word.lower() for word in premise_2]
hypothesis_2 = [word.lower() for word in hypothesis_2]

# Load the vocabulary
with open("data/data.json", "r") as file:
    vocab = json.load(file)

# Map each token to its index in the embedding table used to train the model
premise_1 = prepare_example(premise_1, vocab)
hypothesis_1 = prepare_example(hypothesis_1, vocab)

premise_2 = prepare_example(premise_2, vocab)
hypothesis_2 = prepare_example(hypothesis_2, vocab)

print(premise_1)
print(hypothesis_1)

tensor([[  83,  452,  102,   41,   35, 1370]], device='cuda:0')
tensor([[   2,   85,   12,  102,   41,   35, 3608]], device='cuda:0')


In [53]:
import torch.nn as nn

softmax = nn.Softmax(dim=1)
print("For premise_1 and hypothesis_1, model predicts : ")
print(softmax(model(premise_1, hypothesis_1)))

print("For premise_2 and hypothesis_2, model predicts : ")
print(softmax(model(premise_2, hypothesis_2)))

For premise_1 and hypothesis_1, model predicts : 
tensor([[1.3425e-06, 2.0507e-05, 9.9998e-01]], device='cuda:0',
       grad_fn=<SoftmaxBackward0>)
For premise_2 and hypothesis_2, model predicts : 
tensor([[1.6535e-07, 6.9126e-07, 1.0000e+00]], device='cuda:0',
       grad_fn=<SoftmaxBackward0>)


The model predicts contradiction for both pairs.

In both cases, the model is almost certain of this label, assigning it more than 99.99% probability. It should also be noted that the loaded model is our best-performing model on SNLI and SentEval, with an accuracy of roughly 85% on both.

I first compared typical sentences from the SNLI dataset with the premises and hypotheses provided. SNLI premises tend to be longer and more detailed, but some examples match the length of the two provided sentences.
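One simple way to make the length comparison concrete is to count tokens per sentence. The sketch below uses whitespace splitting as a stand-in for `nltk.tokenize.word_tokenize` and only covers the provided sentences; applying the same hypothetical `mean_length` function to SNLI premises would require loading the dataset, which is not shown here.

```python
def mean_length(sentences):
    """Average number of whitespace-separated tokens per sentence."""
    return sum(len(s.split()) for s in sentences) / len(sentences)


provided_premises = ["Two men sitting in the sun", "A man is walking a dog"]
provided_hypotheses = ["Nobody is sitting in the shade", "No cat is outside"]

print(mean_length(provided_premises))    # 6.0
print(mean_length(provided_hypotheses))  # 5.0
```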

One reason the model fails could be the negated subjects of the hypotheses. In both hypotheses, the subject is a negation ("Nobody" and "No cat"). The model may associate negation with contradiction and therefore predict the wrong label.
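To make this hypothesis concrete, here is a hypothetical shortcut rule of the kind the model might have picked up: predict contradiction whenever the hypothesis contains a negation word, and otherwise fall back to neutral. The word list and the fallback are assumptions for illustration only; they are not taken from the model or from SNLI statistics.

```python
# Hypothetical list of negation cues; "n't" is how NLTK's word_tokenize splits "isn't"
NEGATION_WORDS = {"no", "not", "nobody", "nothing", "none", "never", "n't"}


def has_negation(tokens):
    """Return True if any lowercased token is a negation cue."""
    return any(token in NEGATION_WORDS for token in tokens)


def negation_shortcut(hypothesis_tokens):
    """Toy rule: negation -> contradiction, otherwise neutral."""
    return "contradiction" if has_negation(hypothesis_tokens) else "neutral"


for hypothesis in ["Nobody is sitting in the shade", "No cat is outside"]:
    tokens = hypothesis.lower().split()
    print(hypothesis, "->", negation_shortcut(tokens))
```

Applied to the two failing pairs, this rule reproduces the model's contradiction predictions, which is what makes negation a plausible suspect.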

To test whether this is the reason, I removed the negation from the second example:

In [50]:
import json

import nltk
from data import prepare_example  # helper module from this repository

premise_2 = "A man is walking a dog"
hypothesis_2 = "A cat is outside"  # same hypothesis without the negation

# First, tokenize the sentences
premise_2 = nltk.tokenize.word_tokenize(premise_2)
hypothesis_2 = nltk.tokenize.word_tokenize(hypothesis_2)

# Then, lowercase all the tokens
premise_2 = [word.lower() for word in premise_2]
hypothesis_2 = [word.lower() for word in hypothesis_2]

# Load the vocabulary
with open("data/data.json", "r") as file:
    vocab = json.load(file)

# Map each token to its index in the embedding table used to train the model
premise_2 = prepare_example(premise_2, vocab)
hypothesis_2 = prepare_example(hypothesis_2, vocab)

In [51]:
import torch.nn as nn

softmax = nn.Softmax(dim=1)

print("For premise_2 and hypothesis_2, model predicts : ")
print(softmax(model(premise_2, hypothesis_2)))

For premise_2 and hypothesis_2, model predicts : 
tensor([[0.1744, 0.8138, 0.0118]], device='cuda:0', grad_fn=<SoftmaxBackward0>)


Without the negation, the output above actually favors neutral (0.81), the correct label, over contradiction (0.01). This result did not generalize, however. For this particular pair, I also suspected that "cat" and "dog" are close in the embedding space and may often appear as opposing subjects. I then tried "No crocodile is outside" and "A crocodile is outside" as hypotheses, and both were predicted as contradiction. I ran the same experiments on the first pair, replacing "Nobody" with "A woman" or "No woman", and all of these variants led to contradiction.
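The variants described above follow a simple grid in which each subject is tried with and without negation. A sketch of generating such probe pairs systematically, so that every combination goes through the same preprocessing and prediction steps, could look like this. `make_probes` and the template string are hypothetical; only the sentences themselves come from the experiments described above.

```python
from itertools import product


def make_probes(premise, subjects, determiners, template):
    """Build (premise, hypothesis) pairs for every determiner/subject combination."""
    return [
        (premise, template.format(det=det, subj=subj))
        for det, subj in product(determiners, subjects)
    ]


probes = make_probes(
    premise="A man is walking a dog",
    subjects=["cat", "crocodile"],
    determiners=["A", "No"],
    template="{det} {subj} is outside",
)
for premise, hypothesis in probes:
    print(premise, "|", hypothesis)
```

Each pair would then go through the same tokenization, `prepare_example` and softmax steps used in the cells above.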

In a large majority of the dataset examples, the subject of the hypothesis is referred to either by the same word as in the premise or by a "similar" word (for instance, an "old guy" in the premise may be referred to as a "man"). In the given hypotheses, however, the subjects do not appear in the premises at all, which is an uncommon sentence structure. This might be the reason why the model fails.
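One rough way to check this explanation on other examples is to test whether the hypothesis subject appears in the premise at all. The sketch below approximates the subject as the first token that is not a determiner, which is a crude heuristic (a dependency parser would be more reliable) and ignores near-synonyms such as "old guy" and "man". The determiner list and the function names are assumptions.

```python
import re

# Hypothetical list of words skipped when looking for the subject
DETERMINERS = {"a", "an", "the", "no", "some", "every"}


def tokenize(sentence):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def subject_in_premise(premise, hypothesis):
    """Return True if the first non-determiner hypothesis token occurs in the premise."""
    premise_tokens = set(tokenize(premise))
    for token in tokenize(hypothesis):
        if token not in DETERMINERS:
            return token in premise_tokens
    return False


pairs = [
    ("Harvey Specter is solving a case and he thinks about bribing the police.",
     "Harvey corrupts the police."),
    ("Two men sitting in the sun", "Nobody is sitting in the shade"),
    ("A man is walking a dog", "No cat is outside"),
]
for premise, hypothesis in pairs:
    print(hypothesis, "->", subject_in_premise(premise, hypothesis))
```

On these three pairs, the heuristic separates the correctly classified example from the two failures.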