# Natural Language Inference (NLI) Results and Error Analysis

| Model                   | SNLI Test Accuracy |
|-------------------------|--------------------|
| Basic Encoder           | 61.93              |
| LSTM                    | 77.64              |
| BiLSTM                  | 76.81              |
| BiLSTM with Max Pooling | 80.2               |

In this Jupyter Notebook, we load the best-performing pretrained NLI model (BiLSTM with max pooling), demonstrate its usage on custom examples, present an overview of the results, and perform an error analysis.

## Load a Pretrained Model

In [2]:
import torch
import models

config_nli_model = {
    'n_words'      : 37927,
    'word_emb_dim' : 300,
    'enc_lstm_dim' : 2048,
    'dpout_model'  : 0.0,
    'fc_dim'       : 512,
    'bsize'        : 16,
    'n_classes'    : 3,
    'encoder_type' : 'biLSTMMaxPoolEncoder',
    'use_cuda'     : torch.cuda.is_available(),  # follow the same CPU fallback as `device` below
}

model_path = "savedir/maxpool_bilstm_model.pickle"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state_dict = torch.load(model_path, map_location=device)

# Create an NLIClassifier instance (defined in the local models module) and load the saved weights
model_instance = models.NLIClassifier(config_nli_model)
model_instance.load_state_dict(state_dict)
model_instance.to(device)
model_instance.eval()

NLIClassifier(
  (encoder): biLSTMMaxPoolEncoder(
    (enc_lstm): LSTM(300, 2048, batch_first=True, bidirectional=True)
  )
  (classifier): Sequential(
    (0): Linear(in_features=16384, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=512, bias=True)
    (2): Linear(in_features=512, out_features=3, bias=True)
  )
)
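
The `in_features=16384` of the first linear layer is consistent with a bidirectional encoder of hidden size 2048 whose two sentence vectors are combined into four feature blocks. The sketch below only checks that arithmetic. The four-block combination `[u, v, |u - v|, u * v]` is an assumption borrowed from InferSent-style classifiers and is not shown in this notebook.

```python
# Values taken from config_nli_model and the printed model summary.
enc_lstm_dim = 2048
num_directions = 2  # bidirectional=True

# Each sentence embedding concatenates the forward and backward LSTM states.
sentence_dim = enc_lstm_dim * num_directions

# Assumption: the classifier input is [u, v, |u - v|, u * v], i.e. four blocks.
num_feature_blocks = 4
classifier_in = sentence_dim * num_feature_blocks

print(sentence_dim, classifier_in)
```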

## Demonstrate Model Usage

In [3]:
from data_preprocess import NLIDataset, build_vocab, get_nli, collate_fn
import numpy as np

LABELS = ['entailment', 'neutral', 'contradiction']

def predict_entailment(model, premise, hypothesis):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data_path = "data/"
    glove_path = "glove/glove.840B.300d.txt"

    # Note: this reloads the SNLI data and the GloVe vocabulary on every call, which is slow.
    train_data, valid_data, test_data = get_nli(data_path)
    word_vec = build_vocab(train_data['s1'] + train_data['s2'], glove_path)
    dataset = NLIDataset(train_data, word_vec)

    s1_idx, s1_length = dataset._get_sentence_indices(premise)
    s2_idx, s2_length = dataset._get_sentence_indices(hypothesis)
    s1_tensor, s1_length = s1_idx.unsqueeze(0).to(device), torch.tensor([s1_length]).to("cpu")
    s2_tensor, s2_length = s2_idx.unsqueeze(0).to(device), torch.tensor([s2_length]).to("cpu")
    s1 = (s1_tensor, s1_length)
    s2 = (s2_tensor, s2_length)

    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        logits = model(s1, s2)
    prediction = np.argmax(logits.cpu().numpy(), axis=1)

    return prediction


In [4]:
premise = "Two men sitting in the sun"
hypothesis = "Nobody is sitting in the shade"

prediction = predict_entailment(model_instance, premise, hypothesis)
print("Prediction:", LABELS[prediction[0]])

Found 37325/62999 words with glove vectors
Vocab size : 37325
Prediction: contradiction


In [5]:
premise = "A man is walking a dog"
hypothesis = "No cat is outside"

prediction = predict_entailment(model_instance, premise, hypothesis)
print("Prediction:", LABELS[prediction[0]])

Found 37325/62999 words with glove vectors
Vocab size : 37325
Prediction: contradiction
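
Both calls above print the GloVe loading message, which suggests that the data and vocabulary are rebuilt on every prediction. One possible way to avoid this is to cache the loaded resources so they are built only once. The sketch below illustrates the idea with `functools.lru_cache`. The names `expensive_build` and `get_resources` are hypothetical, and `expensive_build` is a stand-in for the `get_nli`, `build_vocab`, and `NLIDataset` calls rather than the real loading code.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive step actually runs


def expensive_build(data_path, glove_path):
    """Hypothetical stand-in for get_nli + build_vocab + NLIDataset."""
    CALLS["count"] += 1
    return {"data_path": data_path, "glove_path": glove_path}


@lru_cache(maxsize=None)
def get_resources(data_path="data/", glove_path="glove/glove.840B.300d.txt"):
    # The result is cached per argument combination, so repeated calls
    # with the same paths reuse the first result.
    return expensive_build(data_path, glove_path)


first = get_resources()
second = get_resources()
print(first is second, CALLS["count"])  # True 1
```

With a cache like this, `predict_entailment` could fetch the dataset from `get_resources()` instead of rebuilding it, assuming the dataset object is safe to reuse across calls.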


For the first example, the model predicts contradiction. The relationship between the premise and the hypothesis is subtle, so there are a few reasons why it could be difficult for the model to make the correct inference:
1. There is little meaningful lexical overlap. Apart from function words such as "in" and "the," the only word shared by the premise and the hypothesis is "sitting." The model needs to understand the relationship between "sun" and "shade," which are antonyms but never appear together in the same sentence.
2. The correct label relies on understanding that, although sitting in the sun implies that the two men are not in the shade, the premise says nothing about anyone else. The two sentences are therefore not otherwise related, which would make neutral the expected label. The model might not be able to make this inference because the information is not explicitly stated in the premise.
3. The hypothesis is ambiguous. "Nobody is sitting in the shade" could refer only to the two men or to everybody in the world.
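
To make the lexical-overlap point concrete, the following sketch lists the tokens shared by each premise and hypothesis and computes their Jaccard similarity. The helper name `token_overlap` and the naive lowercase whitespace tokenization are assumptions for illustration and do not reflect the notebook's actual preprocessing.

```python
def token_overlap(premise, hypothesis):
    """Return the shared tokens (sorted) and the Jaccard similarity.

    Uses naive lowercase whitespace tokenization, for illustration only.
    """
    p_tokens = set(premise.lower().split())
    h_tokens = set(hypothesis.lower().split())
    shared = p_tokens & h_tokens
    union = p_tokens | h_tokens
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


examples = [
    ("Two men sitting in the sun", "Nobody is sitting in the shade"),
    ("A man is walking a dog", "No cat is outside"),
]

results = [token_overlap(p, h) for p, h in examples]
for (premise, hypothesis), (shared, jaccard) in zip(examples, results):
    print(f"{premise!r} vs {hypothesis!r}: shared={shared}, jaccard={jaccard:.3f}")
```

For the first pair the shared tokens are "in", "sitting", and "the", and for the second pair only "is", so surface overlap gives the model little to work with in either case.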

For the second example, with the dog and the cat, the model again predicts contradiction. The two sentences are unrelated, but the model fails to recognize this. A likely explanation is that "dog" appears in the premise and "cat" in the hypothesis, and the model treats this pairing as a signal of contradiction.


