# A solution to Watson's contradictions
### From Kaggle's recurring challenge "Contradictory, My Dear Watson"


This is a preliminary model for a Natural Language Inference (NLI) problem: given two sentences, a premise and a hypothesis, it decides whether the premise entails, contradicts, or is neutral toward the hypothesis.

See the challenge on Kaggle [here](https://www.kaggle.com/c/contradictory-my-dear-watson/overview).

In [5]:
import numpy as np
import pandas as pd
from transformers import BertTokenizer, TFBertModel
import matplotlib.pyplot as plt
import tensorflow as tf



## Set up the TPU

In [6]:
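try:
  # Detect and initialise a TPU if one is attached
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
  tf.config.experimental_connect_to_cluster(tpu)
  tf.tpu.experimental.initialize_tpu_system(tpu)
  strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
  # No TPU found: fall back to the default (CPU/GPU) strategy
  strategy = tf.distribute.get_strategy()

# Report the replica count whether or not a TPU was found
print('Number of replicas:', strategy.num_replicas_in_sync)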

Number of replicas: 1


## Load the data

From Kaggle's [tutorial notebook](https://www.kaggle.com/anasofiauzsoy/tutorial-notebook):
>The training set contains a premise, a hypothesis, a label (0 = entailment, 1 = neutral, 2 = contradiction), and the language of the text.


In [7]:
train = pd.read_csv("./input/train.csv")
train.head()

| | id | premise | hypothesis | lang_abv | language | label |
|---|---|---|---|---|---|---|
| 0 | 5130fd2cb5 | and these comments were considered in formulat... | The rules developed in the interim were put to... | en | English | 0 |
| 1 | 5b72532a0b | These are issues that we wrestle with in pract... | Practice groups are not permitted to work on t... | en | English | 2 |
| 2 | 3931fbe82a | Des petites choses comme celles-là font une di... | J'essayais d'accomplir quelque chose. | fr | French | 0 |
| 3 | 5622f0c60b | you know they can't really defend themselves l... | They can't defend themselves because of their ... | en | English | 0 |
| 4 | 86aaa48b45 | ในการเล่นบทบาทสมมุติก็เช่นกัน โอกาสที่จะได้แสด... | เด็กสามารถเห็นได้ว่าชาติพันธุ์แตกต่างกันอย่างไร | th | Thai | 1 |


Looking at one pair of sentences, we have:

In [10]:
print('Premise:')
print(train.premise.values[1])
print('\nHypothesis:')
print(train.hypothesis.values[1])
print('\nRelationship:')
print(train.label.values[1])

Premise:
These are issues that we wrestle with in practice groups of law firms, she said. 

Hypothesis:
Practice groups are not permitted to work on these issues.

Relationship:
2
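
The relationship printed above is an integer code. Here is a minimal sketch of decoding it, assuming the mapping quoted from the tutorial notebook (0 = entailment, 1 = neutral, 2 = contradiction). The names `LABEL_NAMES` and `describe_label` are introduced here for illustration and are not part of the original notebook.

```python
# Mapping taken from the tutorial quote; the names below are illustrative only.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

def describe_label(label):
    """Return the human-readable relationship for an integer label."""
    return LABEL_NAMES[int(label)]

print(describe_label(2))  # contradiction
```

So the example pair is labelled as a contradiction: the premise says practice groups wrestle with these issues, while the hypothesis says they are not permitted to work on them.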


## Data abstraction with pretrained model

Use the tokenizer of a multilingual BERT (Bidirectional Encoder Representations from Transformers) model to split the sentences, which come in many languages, into tokens and map each token to its vocabulary ID.

In [11]:
model_name = 'bert-base-multilingual-cased'
tokenizer = BertTokenizer.from_pretrained(model_name)



In [23]:
def encode_sentence(s):
  tokens = list(tokenizer.tokenize(s))
  tokens.append('[SEP]')
  return tokenizer.convert_tokens_to_ids(tokens)

def demo_encode():
  example = 'Hello ML world!'
  tokens = list(tokenizer.tokenize(example))
  tokens.append('[SEP]')
  print('Demo using:\n\"%s\"\n' % example)
  print(tokens)
  print(tokenizer.convert_tokens_to_ids(tokens))

demo_encode()

Demo using:
"Hello ML world!"

['Hello', 'ML', 'world', '!', '[SEP]']
[31178, 75920, 11356, 106, 102]


In [24]:
def bert_encode(premises, hypotheses, tokenizer):
  # Tokenize each sentence; encode_sentence ends every sequence with [SEP]
  sentence1 = tf.ragged.constant([
    encode_sentence(s)
    for s in np.array(premises)
  ])
  sentence2 = tf.ragged.constant([
    encode_sentence(s)
    for s in np.array(hypotheses)
  ])

  # Prepend the [CLS] token (brackets included) to every pair
  cls = [tokenizer.convert_tokens_to_ids(['[CLS]'])]*sentence1.shape[0]
  input_word_ids = tf.concat([cls, sentence1, sentence2], axis=-1)

  # 1 for real tokens, 0 for the padding added by to_tensor()
  input_mask = tf.ones_like(input_word_ids).to_tensor()

  # Segment IDs: 0 for [CLS] and the first sentence, 1 for the second sentence
  type_cls = tf.zeros_like(cls)
  type_s1 = tf.zeros_like(sentence1)
  type_s2 = tf.ones_like(sentence2)
  input_type_ids = tf.concat([type_cls, type_s1, type_s2], axis=-1).to_tensor()

  inputs = {
    'input_word_ids': input_word_ids.to_tensor(),
    'input_mask': input_mask,
    'input_type_ids': input_type_ids
  }

  return inputs

In [25]:
train_input = bert_encode(train.premise.values, train.hypothesis.values, tokenizer)
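
To make the three inputs concrete, here is a small pure-Python sketch of the standard BERT layout for a sentence pair: `[CLS] premise [SEP] hypothesis [SEP]`, with segment ID 0 for the first sentence and 1 for the second. The helper `build_pair_inputs` is hypothetical, and it pads or truncates to a fixed length, which `bert_encode` itself does not do. The ID 102 for `[SEP]` matches the demo output earlier; 101 for `[CLS]` and 0 for padding are assumptions about this vocabulary.

```python
# 102 matches the demo output; 101 ([CLS]) and 0 ([PAD]) are assumed values
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def build_pair_inputs(premise_ids, hypothesis_ids, max_len):
    """Lay out one pair as [CLS] premise [SEP] hypothesis [SEP], padded to max_len."""
    first = [CLS_ID] + premise_ids + [SEP_ID]
    second = hypothesis_ids + [SEP_ID]
    # Naive truncation: an over-long pair may lose its final [SEP]
    word_ids = (first + second)[:max_len]
    type_ids = ([0] * len(first) + [1] * len(second))[:max_len]
    pad = max_len - len(word_ids)
    return {
        'input_word_ids': word_ids + [PAD_ID] * pad,
        'input_mask': [1] * len(word_ids) + [0] * pad,  # 1 = real token
        'input_type_ids': type_ids + [0] * pad,
    }

# Token IDs borrowed from the "Hello ML world!" demo, split into two toy sentences
example = build_pair_inputs([31178, 75920], [11356, 106], max_len=10)
for name, values in example.items():
    print(name, values)
```

Fixing the length this way keeps every batch at the same width, at the cost of discarding tokens beyond `max_len`.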

## Creating and training model

In [28]:
max_len = 50

def build_model():
  bert_encoder = TFBertModel.from_pretrained(model_name)
  input_word_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
  input_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
  input_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_type_ids")

  # Index 0 of the encoder output is the per-token hidden states
  embedding = bert_encoder([input_word_ids, input_mask, input_type_ids])[0]
  # Classify from the hidden state of the first ([CLS]) token into 3 labels
  output = tf.keras.layers.Dense(3, activation='softmax')(embedding[:,0,:])

  model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=output)
  model.compile(tf.keras.optimizers.Adam(learning_rate=1e-5), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

  return model
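
The final `Dense(3, activation='softmax')` layer turns the `[CLS]` embedding into three class probabilities, and `sparse_categorical_crossentropy` scores those probabilities against the integer label. Here is a minimal pure-Python sketch of the two formulas. The helper names and logits are made up for illustration, and Keras's own implementation is vectorised and handles numerical edge cases differently.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

# A classifier that is equally unsure about all three classes
uniform = softmax([0.0, 0.0, 0.0])
chance_loss = sparse_categorical_crossentropy(2, uniform)
print(round(chance_loss, 4))  # 1.0986, i.e. ln(3)
```

A loss near ln(3) ≈ 1.10 is what a three-class model produces when it guesses uniformly, which gives a useful baseline when reading training logs.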

In [31]:
with strategy.scope():
  model = build_model()
  model.summary()
  tf.keras.utils.plot_model(model, 'tf_bert_model.png', show_shapes=True)

Some layers from the model checkpoint at bert-base-multilingual-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-multilingual-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


Model: "functional_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_word_ids (InputLayer)     [(None, 50)]         0                                            
__________________________________________________________________________________________________
input_mask (InputLayer)         [(None, 50)]         0                                            
__________________________________________________________________________________________________
input_type_ids (InputLayer)     [(None, 50)]         0                                            
__________________________________________________________________________________________________
tf_bert_model_3 (TFBertModel)   TFBaseModelOutputWit 177853440   input_word_ids[0][0]             
                                                                 input_mask[0][0]      

In [32]:
model.fit(train_input, train.label.values, epochs=2, verbose=1, batch_size=64, validation_split=0.2)

Epoch 1/2
  1/152 [..............................] - ETA: 5s - loss: 1.2153 - accuracy: 0.3125

KeyboardInterrupt: 
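
The `1/152` in the progress bar counts batches per epoch. Here is a rough sketch of where that figure could come from, assuming the training file has 12,120 rows (an assumption, since the row count is not shown in this notebook) and that the `validation_split` fraction is held out before batching. `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

def steps_per_epoch(n_examples, batch_size, validation_split):
    """Batches per epoch after holding out a validation fraction."""
    n_train = int(n_examples * (1.0 - validation_split))
    return math.ceil(n_train / batch_size)

# 12120 is an assumed row count for train.csv, not taken from the notebook output
print(steps_per_epoch(12120, batch_size=64, validation_split=0.2))  # 152
```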